Adds corpus.db (corpus_sources, corpus_batches, corpus_entries), a FastAPI router
at /api/corpus with receive/label/skip/stats/export endpoints, and seeds consent
tokens for xanderland + orchard nodes from label_tool.yaml. PII flag excludes
entries from JSONL export. Closes avocet#61.
- dashboard: eval card now shows last run + score for all bench types
(classifier, LLM, style, plans) via new _get_recent_bench_runs()
- dashboard: skip cforch LLM-bench list summaries when scanning for
classifier best_macro_f1 (fixes _find_latest_classifier_bench)
- cforch: stale _BENCH_RUNNING flag now auto-resets if process exited;
idle timeout (120s via select) kills hung benchmark if node crashes
- api: add /api/finetune/{run,cancel} backward-compat shims while
ClassifierTab fine-tune section is migrated to TrainJobsView
- ClassifierTab: migrate all /api/benchmark/* paths to /api/cforch/*;
fix null-safety on results.models access; load fine-tuned models from
/api/train/results instead of /api/finetune/status
- CompareTab: extend model picker to include vllm + cf-text alongside
ollama, grouped by service; pre-select all LLM_SERVICES on load
- LlmEvalTab: null-safety on quality_by_task_type lookups
- models: AVOCET_MODELS_DIR env var overrides default models/ path
- Create app/nodes.py with _CONFIG_DIR testability seam, _load_config,
_profiles_dir, _profile_path, _load_profile, _get_ollama_url helpers,
and stub list_nodes endpoint returning [] when no coordinator_url is set
- Mount nodes router at /api/nodes-mgmt in app/api.py
- Add profiles_dir comment to config/label_tool.yaml.example cforch section
- Create tests/test_nodes.py with autouse fixture and two passing tests
Adds benchmark_plans.py script, plans_bench API router, PlansBenchTab Vue
component, and registers /api/plans-bench in api.py. Also extends models
registry (cf-text catalog integration), cforch client, LlmEvalTab, and
ModelsView with cf-orch fleet support. Wires Planning mode into BenchmarkView.
Replace 149-line api.py (with inline helpers, JSONL utilities, and ad-hoc
router registrations) with a 57-line pure factory. All business logic was
already extracted to domain modules in B1-B7; this removes the dead code
and adds the /api/corrections/* prefix alongside the /api/sft/* backward-
compat alias. Smoke tests updated to cover the new /api/corrections/ingest
and /api/dashboard routes.
Replaces the ad-hoc _running_procs dict in api.py with a persistent,
inspectable SQLite job queue. Removes old /api/finetune/* routes and
_best_cuda_device from api.py. Adds /api/train/* routes (list, create,
get, cancel, run SSE, results). 16 new tests all passing.
Backend:
- Run all cf-text model allocations concurrently via ThreadPoolExecutor + as_completed
- Announce model_start events upfront so the UI can show loading states immediately
- Replace timer-based startup polling with coordinator state signals: waits for
state=="running" (success) or state=="stopped" (fail-fast) on the matching
node/gpu instance; falls back to health poll after 6 consecutive probe misses
- Add /api/cforch/catalog endpoint: fetches live cf-text model list from cf-orch,
filtering out proxy entries (ollama://, vllm://, http://) so only loadable models
are returned
Frontend (ImitateView.vue):
- Show per-model loading spinners as results arrive via SSE stream
- Display cold-start badge when coordinator signals the model was freshly loaded
Backend (app/imitate.py):
- GET /api/imitate/products — reads imitate: config, checks online status
- GET /api/imitate/products/{id}/sample — fetches real item from product API
- GET /api/imitate/run (SSE) — streams ollama responses for selected models
- POST /api/imitate/push-corrections — queues results in SFT corrections JSONL
Frontend (ImitateView.vue):
- Step 1: product picker grid (online/offline status, icon from config)
- Step 2: raw sample preview + editable prompt textarea
- Step 3: ollama model multi-select, temperature slider, SSE run with live log
- Step 4: response cards side by side, push to Corrections button
Wiring:
- app/api.py: include imitate_router at /api/imitate
- web/src/router: /imitate route + lazy import
- AppSidebar: Imitate nav entry (mirror icon)
- config/label_tool.yaml.example: imitate: section with peregrine example
- 16 unit tests (100% passing)
Also: BenchmarkView.vue Compare panel — side-by-side run diff for bench results
Backend (app/api.py):
- GET /api/benchmark/models — returns installed models grouped by adapter
type (ZeroShotAdapter, RerankerAdapter, GenerationAdapter, Unknown);
reads _MODELS_DIR via app.models so test overrides are respected
- GET /api/benchmark/run — add model_names query param (comma-separated);
when set, passes --models <names...> to benchmark_classifier.py
- GET /api/stats — add benchmark_results field from benchmark_results.json
Frontend:
- BenchmarkView: collapsible Model Selection panel with per-category
checkboxes, select-all per category (supports indeterminate state),
collapsed summary badge ("All models (N)" or "N of M selected");
model_names only sent when a strict subset is selected
- StatsView: Benchmark Results table (accuracy, macro_f1, weighted_f1)
with best-model highlighting per metric; hidden when no results exist
- except clause in cancel_benchmark/cancel_finetune narrowed from Exception
to _subprocess.TimeoutExpired (C1)
- _cancelled_jobs.discard() called after registering new proc to prevent
a stale flag from a prior run masking errors (I2)
- local `import subprocess` removed from run_benchmark and
run_finetune_endpoint; all Popen calls updated to _subprocess.Popen (I1)
- test patch targets updated from subprocess.Popen to app.api._subprocess.Popen;
cancelled-event tests updated to set flag in proc.wait() side-effect so
the discard-on-new-run logic is exercised correctly
Adds POST /api/benchmark/cancel and POST /api/finetune/cancel endpoints
that terminate the running subprocess (kill on 3s timeout), and updates
the run generators to emit a cancelled SSE event instead of error when
the job was intentionally stopped.
Two bugs fixed:
1. Blank white page after vue SPA rebuild: browsers cached old index.html
referencing old asset hashes. Assets are deleted on rebuild, causing
404s for JS/CSS -> blank page. Fix: serve index.html with
Cache-Control: no-cache so browsers always fetch fresh HTML.
Hashed assets (/assets/chunk-abc123.js) remain cacheable forever.
2. Queue draining to empty on skip/discard: handleSkip and handleDiscard
never refilled the local queue buffer. After enough skips, store.current
went null and the empty state showed (blank-looking). Fix: both handlers
now call fetchBatch() when queue drops below 3, matching handleLabel.
Also: sync classifier_adapters LABELS to match current 10-label schema
(new_lead + hired, remove unrelated).
48 Python tests pass, 48 frontend tests pass.
- Add _item_id() (content hash) + _normalize() to map legacy JSONL fields
(from_addr/account/no-id) to Vue schema (from/source/id)
- All mutating endpoints now look up by _normalize(x)[id] — handles both
stored-id (test fixtures) and content-hash (real data) transparently
- Change uvicorn bind from 127.0.0.1 to 0.0.0.0 so LAN clients can connect