Commit graph

82 commits

Author SHA1 Message Date
71bf88d09b feat: implement results table, rating buttons, export UI, and a11y polish 2026-05-11 08:16:52 -07:00
bc4ca1095c feat: add embed-compare route, sidebar nav entry, and full input UI 2026-05-11 08:14:30 -07:00
258bbdc0af chore(deps): fix 10 Dependabot CVEs — vite 7.3.2, defu 6.1.7, yaml 2.8.4, picomatch 4.0.4, undici 7.25.0 2026-05-06 08:41:05 -07:00
32872d1ec6 fix: assigned-only state, remove dead HfNodeModelPanel prop, deduplicate yaml example 2026-05-05 22:11:02 -07:00
1521198cb1 fix: code quality fixes from review (SSE abort, aria-live, shared types, type safety)
- Add AbortController to SSE pull stream in OllamaModelPanel; abort on unmount
- Fix SSE loop: break on success/error events, call fetchModels() after the loop
- Add AbortController to fetchModels() and fetchProfile() one-shot fetches
- Add onUnmounted cleanup to both panel components
- Extract GpuEntry, ServiceInfo, NodeSummary to web/src/types/nodes.ts
- Remove duplicate interface definitions from NodeCard, GpuRow, NodeManagementView
- Fix aria-live regions: persistent container with v-if on inner span (avoids
  screen reader announcement miss on initial mount)
- Tighten STATE_LABELS/STATE_ICONS to Record<ServiceState, string> for exhaustiveness
- Add explicit (await r.json()) as NodeSummary[] cast in fetchNodes()
2026-05-05 21:35:13 -07:00
8dda040480 fix: move /nodes route immediately after /fleet per spec 2026-05-05 21:29:35 -07:00
bf675ed1f6 feat: add OllamaModelPanel and HfNodeModelPanel Vue components 2026-05-05 21:24:38 -07:00
0efd1aedbe feat: add NodeCard, GpuRow, ServiceBadge Vue components 2026-05-05 21:24:32 -07:00
4c225b94f5 feat: add /nodes route, AppSidebar nav item, and NodeManagementView 2026-05-05 21:24:27 -07:00
bce932461a feat: plans benchmark harness — model scoring for CF planning prompts
Adds benchmark_plans.py script, plans_bench API router, PlansBenchTab Vue
component, and registers /api/plans-bench in api.py. Also extends models
registry (cf-text catalog integration), cforch client, LlmEvalTab, and
ModelsView with cf-orch fleet support. Wires Planning mode into BenchmarkView.
2026-05-02 23:36:04 -07:00
e11db5ccd9 fix: align train job/results API envelope, config_json key, progress SSE, dashboard model_key
- GET /api/train/jobs now returns {"jobs":[...]} instead of bare array
- GET /api/train/results now returns {"results":[...]} instead of bare array
- POST /api/train/jobs body key renamed config -> config_json to match Pydantic model
- SSE log handler now handles 'progress' event type (backend never emits 'log')
- Dashboard _get_active_jobs() adds model_key to SELECT and return dict
- corrections.py docstring updated: both /api/corrections and /api/sft prefixes noted
- test_train.py assertions updated to unwrap new envelope shapes
2026-05-02 21:22:18 -07:00
13d1a394d5 fix: add loading state, widen nullable types, add API response guard in TrainResultsView 2026-05-02 20:49:34 -07:00
b077371107 feat: add TrainResultsView with training history table and Fleet registration links 2026-05-02 20:46:03 -07:00
53b25b27ab fix: surface cancel errors, fix SSE sentinel scroll, add missing test coverage in TrainJobsView 2026-05-02 20:33:03 -07:00
e014da2dec feat: add TrainJobsView with job queue, form submission, cancel, and SSE log streaming 2026-05-02 20:28:19 -07:00
c48db45d91 test: fix async flush and add mode-switch coverage in BenchmarkView 2026-05-02 19:35:02 -07:00
d0ba75b995 feat: extract CompareView at /eval/compare; remove Compare tab from BenchmarkView 2026-05-02 18:03:13 -07:00
a134af8b7b feat: add DashboardView with flywheel stage cards and CTA nudges 2026-05-02 16:50:24 -07:00
6ef6f06023 feat: restructure AppSidebar into two-domain nav with section headers and flywheel signal badges 2026-05-02 13:52:45 -07:00
5bdb095235 feat: restructure router into /data/* /eval/* /train/* domains with backward-compat redirects
- Export named `routes` array from router/index.ts for testability
- Move label/fetch/corrections/imitate under /data/* namespace
- Move benchmark/compare under /eval/* namespace
- Add /train/jobs and /train/results under /train/* namespace
- Add / -> DashboardView and /fleet -> ModelsView (replaces old / -> LabelView)
- Add backward-compat redirects for all old flat paths (/benchmark, /models, /stats, /label, /fetch, /corrections, /imitate)
- Add stub views for DashboardView, CompareView, TrainJobsView, TrainResultsView (implemented in later tasks)
- Add router.test.ts: 16 tests covering route structure and redirect targets
2026-05-02 13:00:04 -07:00
ea3da701c6 feat(models): extended model registry + manage.sh benchmark subcommands
- app/models.py: add StyleModel and VoiceModel entries; expand cf-text and
  benchmark model metadata (vram_mb, description, tags)
- tests/test_models.py: coverage for new model types and registry helpers
- ModelsView.vue: updated model browser with style/voice filter tabs
- manage.sh: add benchmark-style and benchmark-voice subcommands
- config/label_tool.yaml.example: add style + voice benchmark config stubs
- web/.gitignore: add node_modules and dist entries
2026-04-24 14:56:24 -07:00
ddb56efb89 refactor(bench): extract benchmark tabs — classifier, compare, llm-eval, style, voice
- BenchmarkView.vue: convert from monolithic view to tabbed shell; each tab is
  now its own component (ClassifierTab, CompareTab, LlmEvalTab, StyleTab, VoiceTab)
- StyleTab + VoiceTab: new benchmark modes for style and voice model evaluation
- app/style.py: FastAPI router for style imitation benchmarks
- app/voice.py: FastAPI router for voice benchmark endpoints
- scripts/benchmark_style.py + benchmark_voice.py: headless runner scripts
2026-04-24 14:56:17 -07:00
cc24cd0d7d feat(imitate): parallel cf-text fanout workers + signal-based cold-start detection
Backend:
- Run all cf-text model allocations concurrently via ThreadPoolExecutor + as_completed
- Announce model_start events upfront so the UI can show loading states immediately
- Replace timer-based startup polling with coordinator state signals: waits for
  state=="running" (success) or state=="stopped" (fail-fast) on the matching
  node/gpu instance; falls back to health poll after 6 consecutive probe misses
- Add /api/cforch/catalog endpoint: fetches live cf-text model list from cf-orch,
  filtering out proxy entries (ollama://, vllm://, http://) so only loadable models
  are returned

Frontend (ImitateView.vue):
- Show per-model loading spinners as results arrive via SSE stream
- Display cold-start badge when coordinator signals the model was freshly loaded
2026-04-24 14:56:09 -07:00
3299c0e23a feat: Imitate tab — pull CF product samples, compare LLM responses
Backend (app/imitate.py):
- GET /api/imitate/products — reads imitate: config, checks online status
- GET /api/imitate/products/{id}/sample — fetches real item from product API
- GET /api/imitate/run (SSE) — streams ollama responses for selected models
- POST /api/imitate/push-corrections — queues results in SFT corrections JSONL

Frontend (ImitateView.vue):
- Step 1: product picker grid (online/offline status, icon from config)
- Step 2: raw sample preview + editable prompt textarea
- Step 3: ollama model multi-select, temperature slider, SSE run with live log
- Step 4: response cards side by side, push to Corrections button

Wiring:
- app/api.py: include imitate_router at /api/imitate
- web/src/router: /imitate route + lazy import
- AppSidebar: Imitate nav entry (mirror icon)
- config/label_tool.yaml.example: imitate: section with peregrine example
- 16 unit tests (100% passing)

Also: BenchmarkView.vue Compare panel — side-by-side run diff for bench results
2026-04-09 20:12:57 -07:00
a271278dc9 feat(#10): env var LLM config + cf-orch coordinator auth
- _load_cforch_config() falls back to CF_ORCH_URL / CF_LICENSE_KEY /
  OLLAMA_HOST / OLLAMA_MODEL env vars when label_tool.yaml cforch: key
  is absent or empty (yaml wins when both present)
- CF_LICENSE_KEY forwarded to benchmark subprocess env so cf-orch agent
  can authenticate without it appearing in command args
- GET /api/cforch/config endpoint — returns resolved connection state;
  redacts license key (returns license_key_set bool only)
- SettingsView: connection status pill (cf-orch / Ollama / unconfigured)
  loaded from /api/cforch/config on mount; shows env vs yaml source
- .env.example documenting all relevant vars
- config/label_tool.yaml.example: full cforch: section with all keys
- environment.yml: add circuitforge-core>=0.9.0 dependency
- .gitignore: add .env
- 4 new tests (17 total in test_cforch.py); 136 passing overall

Closes #10
2026-04-09 12:26:44 -07:00
dffb1d0d7a feat: cf-orch LLM benchmark integration (Phase 1)
Backend (app/cforch.py — new APIRouter at /api/cforch):
- GET /tasks — reads bench_tasks.yaml, returns tasks + deduplicated types
- GET /models — reads bench_models.yaml, returns model list with service/tags
- GET /run — SSE endpoint; spawns cf-orch benchmark.py subprocess with
  --filter-tasks, --filter-tags, --coordinator, --ollama-url; strips ANSI
  codes; emits progress/result/complete/error events; 409 guard on concurrency
- GET /results — returns latest bench_results/*/summary.json; 404 if none
- POST /cancel — terminates running benchmark subprocess
- All paths configurable via label_tool.yaml cforch: section
- 13 tests; follows sft.py/models.py testability seam pattern

Frontend:
- BenchmarkView: mode toggle (Classifier / LLM Eval); LLM Eval panel with
  task picker (by type, select-all + indeterminate), model picker (by service),
  SSE run log, results table with best-per-column highlighting
- StatsView: LLM Benchmark section showing quality_by_task_type table across
  models; hidden when no results; fetches /api/cforch/results on mount

SFT candidate pipeline: cf-orch runs that produce sft_candidates.jsonl are
auto-discovered by the existing bench_results_dir config in sft.py — no
additional wiring needed.
2026-04-09 10:46:06 -07:00
ce12b29c94 feat: model compatibility warning on HF lookup
- GET /api/models/lookup now returns compatible: bool and warning: str|null
- compatible=false + warning when pipeline_tag is absent (no task tag on HF)
  or present but not in the supported adapter map
- Warning message names the unsupported pipeline_tag and lists supported types
- ModelsView: yellow compat-warning banner below preview description;
  Add button relabels to "Add anyway" with muted styling when incompatible
- test_models: accept 405 for path-traversal DELETE tests (StaticFiles mount
  returns 405 for non-GET methods when web/dist exists)
2026-04-09 09:48:55 -07:00
7c304ebc45 feat: benchmark model picker, category grouping, stats benchmark results
Backend (app/api.py):
- GET /api/benchmark/models — returns installed models grouped by adapter
  type (ZeroShotAdapter, RerankerAdapter, GenerationAdapter, Unknown);
  reads _MODELS_DIR via app.models so test overrides are respected
- GET /api/benchmark/run — add model_names query param (comma-separated);
  when set, passes --models <names...> to benchmark_classifier.py
- GET /api/stats — add benchmark_results field from benchmark_results.json

Frontend:
- BenchmarkView: collapsible Model Selection panel with per-category
  checkboxes, select-all per category (supports indeterminate state),
  collapsed summary badge ("All models (N)" or "N of M selected");
  model_names only sent when a strict subset is selected
- StatsView: Benchmark Results table (accuracy, macro_f1, weighted_f1)
  with best-model highlighting per metric; hidden when no results exist
2026-04-08 23:03:56 -07:00
b6b3d2c390 feat: HuggingFace model management tab
- New /api/models router: HF lookup, approval queue (JSONL persistence),
  SSE download progress via snapshot_download(), installed model listing,
  path-traversal-safe DELETE
- pipeline_tag → adapter type mapping (zero-shot-classification,
  sentence-similarity, text-generation)
- 27 tests covering all endpoints, duplicate detection, path traversal
- ModelsView.vue: HF lookup + add, approval queue, live download progress
  bars via SSE, installed model table with delete
- Sidebar entry (🤗 Models) between Benchmark and Corrections
2026-04-08 22:32:35 -07:00
9633d9a535 feat: add failure_category field to SFT corrections (#16)
Adds optional failure_category to SubmitRequest and candidate records so
reviewers can classify why a model response was wrong, not just what to do
with it. Enables the fine-tune harness to filter training data by failure
type (e.g. exclude scoring artifacts, train only on genuine wrong answers).

Taxonomy: scoring_artifact | style_violation | partial_answer |
          wrong_answer | format_error | hallucination

- app/sft.py: FailureCategory Literal type; SubmitRequest.failure_category;
  stored on candidate record in POST /submit correct branch
- tests/test_sft.py: 3 new tests (stores value, null round-trip, 422 on invalid)
- stores/sft.ts: SftFailureCategory type exported; SftQueueItem + SftLastAction
  updated; setLastAction accepts optional category param
- SftCard.vue: chip-group selector shown during correct/discard/flag flow;
  two-step confirm for discard/flag reveals chips before emitting; category
  forwarded in all emit payloads
- CorrectionsView.vue: handleCorrect/Discard/Flag accept and forward category
  to POST /api/sft/submit body and store.setLastAction
- SftCard.test.ts: 11 new tests covering chip visibility, selection,
  single-active enforcement, pending-action flow, emit payloads, cancel
2026-04-08 22:10:26 -07:00
f17aae3bd2 feat: add dev command for hot-reload (uvicorn --reload + Vite HMR)
- manage.sh: dev command starts uvicorn --reload on :8503 and Vite dev
  server (auto-port from 5173); kills API on EXIT/INT/TERM trap
- manage.sh: ENV_UI defaults to 'cf' env (overridable via AVOCET_ENV)
- vite.config.ts: add server.proxy to forward /api to :8503 so Vite
  dev server can reach the backend without CORS issues
2026-04-08 19:43:40 -07:00
09e334359f fix: pessimistic submit/undo, config null-safe, load config on mount
- sft.py GET /config: use `or {}` guard so `sft: ~` (null YAML) doesn't
  return None instead of the default empty config
- CorrectionsView: convert handleCorrect/Discard/Flag and handleUndo from
  optimistic to pessimistic — queue mutation only happens after server
  confirms; failures leave item in queue so user can retry cleanly
- SettingsView: call loadSftConfig() on mount so saved bench_results_dir
  is populated instead of always starting empty
2026-04-08 18:49:38 -07:00
353d0a47a0 feat: Corrections tab — router, sidebar, settings, SFT config endpoints
- Add /corrections route to Vue router (lazy-loaded CorrectionsView)
- Add Corrections nav item (✍️) to AppSidebar after Benchmark
- Add cf-orch Integration section to SettingsView with bench_results_dir
  field, run scanner, and per-run import table
- Add GET /api/sft/config and POST /api/sft/config endpoints to app/sft.py
2026-04-08 18:29:22 -07:00
e63d77127b feat: CorrectionsView and useSftKeyboard composable 2026-04-08 15:26:13 -07:00
03e5f9f9b4 fix: guard null failure_reason render, fix mid-quality test description
- Add v-if guard on failure-reason <p> so null renders no element (not literal "null")
- Clarify mid-quality test description: score is 0.4 to <0.7 (exclusive upper bound)
- Add test: renders nothing for failure_reason when null (+1 → 14 SftCard tests)
2026-04-08 15:23:19 -07:00
e16ea95dcc fix: guard aria-describedby from rendering undefined string 2026-04-08 15:22:12 -07:00
8873920b83 feat: SftCard — quality chip, prompt collapsible, action buttons, correction area slot 2026-04-08 15:19:37 -07:00
2d939b77f9 feat: SftCorrectionArea — inline correction text area component 2026-04-08 15:16:45 -07:00
137a9dbb8e fix: nullable failure_reason, factory fixture for sft store tests 2026-04-08 15:14:29 -07:00
9c11916d81 feat: useSftStore — SftQueueItem type and Pinia store 2026-04-08 15:11:17 -07:00
0d252da2a0 feat(avocet): add cancel buttons for benchmark and fine-tune runs 2026-03-15 18:15:35 -07:00
5d68b0706f fix(avocet): use startsWith for error class in ft-log (consistent with benchmark log) 2026-03-15 16:14:47 -07:00
65548f4ddb feat(avocet): add fine-tune section and trained models badge row to BenchmarkView 2026-03-15 16:09:51 -07:00
a53f3a7341 feat(avocet): benchmark UI, label fixes, BenchmarkView with charts and SSE run 2026-03-15 09:39:37 -07:00
ce1b8c2215 fix(avocet): reset card element state when new item loads to clear previous animation inline styles 2026-03-08 07:44:02 -07:00
f1933ab51c feat(avocet): badge pop via Anime.js spring transition hook 2026-03-08 07:35:49 -07:00
6a898bbdee fix(avocet): constrain grid-active to 640px on wide viewports using left/right offsets 2026-03-08 07:26:46 -07:00
efc2d33de2 feat(avocet): animate bucket grid rise with Anime.js spring 2026-03-08 07:17:56 -07:00
5c6aa02998 fix(avocet): restore drag aura color feedback via updateAura in useCardAnimation 2026-03-08 07:14:24 -07:00
9302644259 feat(avocet): wire Anime.js card animation into EmailCardStack
Replace CSS keyframe dismiss classes and inline cardStyle/deltaX/deltaY
with useCardAnimation composable — pickup/setDragPosition/snapBack/animateDismiss
are now called from pointer event handlers and a dismissType watcher.
2026-03-08 07:07:58 -07:00