Commit graph

38 commits

Author SHA1 Message Date
391ebb3cd1 feat(recipe-scan): labeling UI for Kiwi vision training pipeline (closes #65)
- POST /api/recipe-scan/import — bulk ingest from Kiwi scanner pipeline, idempotent by item id
- GET /api/recipe-scan/next — oldest-first pending item for review
- POST /api/recipe-scan/items/{id}/approve|edit|reject — label actions
- GET /api/recipe-scan/stats — counts by status and modality
- GET /api/recipe-scan/export — JSONL training pairs (messages chat format, Option B: correction prompt + extracted draft → corrected ground truth)
- GET /api/recipe-scan/image — path-traversal-safe image serving from /Library/Assets/kiwi/
- SQLite at data/recipe_scan.db with WAL mode; separate from corpus.db lifecycle
- set_db_path() testability seam; 18 tests, all passing
- RecipeScanView.vue: two-column review UI (image left, JSON diff right), keyboard shortcuts A/E/R, toast feedback, stats header, export download
- Route /data/recipe-scan and sidebar nav entry added
2026-05-17 12:22:15 -07:00
2b990a603a feat: log corpus receiver — accept Turnstone push batches and label for logreading fine-tune
Adds corpus.db (corpus_sources, corpus_batches, corpus_entries), a FastAPI router
at /api/corpus with receive/label/skip/stats/export endpoints, and seeds consent
tokens for xanderland + orchard nodes from label_tool.yaml. PII flag excludes
entries from JSONL export. Closes avocet#61.
2026-05-11 17:07:54 -07:00
9fdaeeb3d6 feat: multi-bench dashboard, API path migration, benchmark reliability fixes
- dashboard: eval card now shows last run + score for all bench types
  (classifier, LLM, style, plans) via new _get_recent_bench_runs()
- dashboard: skip cforch LLM-bench list summaries when scanning for
  classifier best_macro_f1 (fixes _find_latest_classifier_bench)
- cforch: stale _BENCH_RUNNING flag now auto-resets if process exited;
  idle timeout (120s via select) kills hung benchmark if node crashes
- api: add /api/finetune/{run,cancel} backward-compat shims while
  ClassifierTab fine-tune section is migrated to TrainJobsView
- ClassifierTab: migrate all /api/benchmark/* paths to /api/cforch/*;
  fix null-safety on results.models access; load fine-tuned models from
  /api/train/results instead of /api/finetune/status
- CompareTab: extend model picker to include vllm + cf-text alongside
  ollama, grouped by service; pre-select all LLM_SERVICES on load
- LlmEvalTab: null-safety on quality_by_task_type lookups
- models: AVOCET_MODELS_DIR env var overrides default models/ path
2026-05-11 09:05:12 -07:00
95afddb772 feat: add nodes.py scaffold with set_config_dir and router mount
- Create app/nodes.py with _CONFIG_DIR testability seam, _load_config,
  _profiles_dir, _profile_path, _load_profile, _get_ollama_url helpers,
  and stub list_nodes endpoint returning [] when no coordinator_url is set
- Mount nodes router at /api/nodes-mgmt in app/api.py
- Add profiles_dir comment to config/label_tool.yaml.example cforch section
- Create tests/test_nodes.py with autouse fixture and two passing tests
2026-05-05 19:35:28 -07:00
bce932461a feat: plans benchmark harness — model scoring for CF planning prompts
Adds benchmark_plans.py script, plans_bench API router, PlansBenchTab Vue
component, and registers /api/plans-bench in api.py. Also extends models
registry (cf-text catalog integration), cforch client, LlmEvalTab, and
ModelsView with cf-orch fleet support. Wires Planning mode into BenchmarkView.
2026-05-02 23:36:04 -07:00
0904967320 feat: slim api.py to factory-only; all domain routes in dedicated modules
Replace 149-line api.py (with inline helpers, JSONL utilities, and ad-hoc
router registrations) with a 57-line pure factory. All business logic was
already extracted to domain modules in B1-B7; this removes the dead code
and adds the /api/corrections/* prefix alongside the /api/sft/* backward-
compat alias. Smoke tests updated to cover the new /api/corrections/ingest
and /api/dashboard routes.
2026-05-02 09:55:58 -07:00
aa742bcfc0 feat: add GET /api/dashboard flywheel aggregate endpoint 2026-05-01 23:30:04 -07:00
766fbafa02 feat: build SQLite-backed train job queue in app/train/train.py
Replaces the ad-hoc _running_procs dict in api.py with a persistent,
inspectable SQLite job queue. Removes old /api/finetune/* routes and
_best_cuda_device from api.py. Adds /api/train/* routes (list, create,
get, cancel, run SSE, results). 16 new tests all passing.
2026-05-01 23:05:11 -07:00
bccb385f61 feat: build app/eval/cforch.py aggregating eval benchmark routers 2026-05-01 22:23:06 -07:00
2054866ff1 feat: extract fetch routes and IMAP helpers into app/data/fetch.py 2026-05-01 21:57:31 -07:00
cbec776ef1 fix: restore ensure_ascii=False in utils jsonl helpers; remove dead _last_action from api.py 2026-05-01 20:59:44 -07:00
167d7351e3 feat: extract label queue API into app/data/label.py 2026-05-01 18:48:14 -07:00
cc24cd0d7d feat(imitate): parallel cf-text fanout workers + signal-based cold-start detection
Backend:
- Run all cf-text model allocations concurrently via ThreadPoolExecutor + as_completed
- Announce model_start events upfront so the UI can show loading states immediately
- Replace timer-based startup polling with coordinator state signals: waits for
  state=="running" (success) or state=="stopped" (fail-fast) on the matching
  node/gpu instance; falls back to health poll after 6 consecutive probe misses
- Add /api/cforch/catalog endpoint: fetches live cf-text model list from cf-orch,
  filtering out proxy entries (ollama://, vllm://, http://) so only loadable models
  are returned

Frontend (ImitateView.vue):
- Show per-model loading spinners as results arrive via SSE stream
- Display cold-start badge when coordinator signals the model was freshly loaded
2026-04-24 14:56:09 -07:00
3299c0e23a feat: Imitate tab — pull CF product samples, compare LLM responses
Backend (app/imitate.py):
- GET /api/imitate/products — reads imitate: config, checks online status
- GET /api/imitate/products/{id}/sample — fetches real item from product API
- GET /api/imitate/run (SSE) — streams ollama responses for selected models
- POST /api/imitate/push-corrections — queues results in SFT corrections JSONL

Frontend (ImitateView.vue):
- Step 1: product picker grid (online/offline status, icon from config)
- Step 2: raw sample preview + editable prompt textarea
- Step 3: ollama model multi-select, temperature slider, SSE run with live log
- Step 4: response cards side by side, push to Corrections button

Wiring:
- app/api.py: include imitate_router at /api/imitate
- web/src/router: /imitate route + lazy import
- AppSidebar: Imitate nav entry (mirror icon)
- config/label_tool.yaml.example: imitate: section with peregrine example
- 16 unit tests (100% passing)

Also: BenchmarkView.vue Compare panel — side-by-side run diff for bench results
2026-04-09 20:12:57 -07:00
dffb1d0d7a feat: cf-orch LLM benchmark integration (Phase 1)
Backend (app/cforch.py — new APIRouter at /api/cforch):
- GET /tasks — reads bench_tasks.yaml, returns tasks + deduplicated types
- GET /models — reads bench_models.yaml, returns model list with service/tags
- GET /run — SSE endpoint; spawns cf-orch benchmark.py subprocess with
  --filter-tasks, --filter-tags, --coordinator, --ollama-url; strips ANSI
  codes; emits progress/result/complete/error events; 409 guard on concurrency
- GET /results — returns latest bench_results/*/summary.json; 404 if none
- POST /cancel — terminates running benchmark subprocess
- All paths configurable via label_tool.yaml cforch: section
- 13 tests; follows sft.py/models.py testability seam pattern

Frontend:
- BenchmarkView: mode toggle (Classifier / LLM Eval); LLM Eval panel with
  task picker (by type, select-all + indeterminate), model picker (by service),
  SSE run log, results table with best-per-column highlighting
- StatsView: LLM Benchmark section showing quality_by_task_type table across
  models; hidden when no results; fetches /api/cforch/results on mount

SFT candidate pipeline: cf-orch runs that produce sft_candidates.jsonl are
auto-discovered by the existing bench_results_dir config in sft.py — no
additional wiring needed.
2026-04-09 10:46:06 -07:00
7c304ebc45 feat: benchmark model picker, category grouping, stats benchmark results
Backend (app/api.py):
- GET /api/benchmark/models — returns installed models grouped by adapter
  type (ZeroShotAdapter, RerankerAdapter, GenerationAdapter, Unknown);
  reads _MODELS_DIR via app.models so test overrides are respected
- GET /api/benchmark/run — add model_names query param (comma-separated);
  when set, passes --models <names...> to benchmark_classifier.py
- GET /api/stats — add benchmark_results field from benchmark_results.json

Frontend:
- BenchmarkView: collapsible Model Selection panel with per-category
  checkboxes, select-all per category (supports indeterminate state),
  collapsed summary badge ("All models (N)" or "N of M selected");
  model_names only sent when a strict subset is selected
- StatsView: Benchmark Results table (accuracy, macro_f1, weighted_f1)
  with best-model highlighting per metric; hidden when no results exist
2026-04-08 23:03:56 -07:00
b6b3d2c390 feat: HuggingFace model management tab
- New /api/models router: HF lookup, approval queue (JSONL persistence),
  SSE download progress via snapshot_download(), installed model listing,
  path-traversal-safe DELETE
- pipeline_tag → adapter type mapping (zero-shot-classification,
  sentence-similarity, text-generation)
- 27 tests covering all endpoints, duplicate detection, path traversal
- ModelsView.vue: HF lookup + add, approval queue, live download progress
  bars via SSE, installed model table with delete
- Sidebar entry (🤗 Models) between Benchmark and Corrections
2026-04-08 22:32:35 -07:00
597ffc7324 feat: sft router skeleton — /api/sft/runs and /api/sft/import 2026-04-08 13:54:58 -07:00
e38a28dcc3 fix(avocet): narrow cancel except clause, clear stale cancel flags on new run
- except clause in cancel_benchmark/cancel_finetune narrowed from Exception
  to _subprocess.TimeoutExpired (C1)
- _cancelled_jobs.discard() called after registering new proc to prevent
  a stale flag from a prior run masking errors (I2)
- local `import subprocess` removed from run_benchmark and
  run_finetune_endpoint; all Popen calls updated to _subprocess.Popen (I1)
- test patch targets updated from subprocess.Popen to app.api._subprocess.Popen;
  cancelled-event tests updated to set flag in proc.wait() side-effect so
  the discard-on-new-run logic is exercised correctly
2026-03-15 18:13:01 -07:00
0ab49609c0 feat(avocet): add cancel endpoints for benchmark and finetune jobs
Adds POST /api/benchmark/cancel and POST /api/finetune/cancel endpoints
that terminate the running subprocess (kill on 3s timeout), and updates
the run generators to emit a cancelled SSE event instead of error when
the job was intentionally stopped.
2026-03-15 18:09:20 -07:00
cbc382cc88 fix(avocet): reduce deberta-small VRAM + auto-select freest GPU for training
- deberta-small: batch_size 16→8 + grad_accum 1→2 (same effective batch),
  gradient_checkpointing=True (fp16 stays off: DeBERTa v3 disentangled
  attention overflows fp16 at the gather step)
- api: _best_cuda_device() picks highest free-VRAM GPU via nvidia-smi;
  sets CUDA_VISIBLE_DEVICES in subprocess env to prevent DataParallel
  replication across both GPUs; adds PYTORCH_ALLOC_CONF=expandable_segments
- SSE log now reports which GPU was selected
2026-03-15 17:09:06 -07:00
dd352f07cd fix(avocet): _MODELS_DIR overridable in tests; sanitize score paths against path traversal 2026-03-15 16:07:27 -07:00
903624a4b8 feat(avocet): add /api/finetune/status and /api/finetune/run endpoints 2026-03-15 16:04:34 -07:00
a53f3a7341 feat(avocet): benchmark UI, label fixes, BenchmarkView with charts and SSE run 2026-03-15 09:39:37 -07:00
07407117a5 feat: add GET /api/fetch/stream SSE endpoint for real-time IMAP progress 2026-03-04 12:05:23 -08:00
e5e66b09cc feat: add POST /api/accounts/test endpoint 2026-03-04 12:04:42 -08:00
47a2178ee4 feat: add GET /api/stats and GET /api/stats/download endpoints 2026-03-04 12:04:11 -08:00
3f0cd7e837 feat: add GET/POST /api/config endpoints for IMAP account management 2026-03-04 12:03:40 -08:00
3788254abd fix: prevent blank page on rebuild and queue drain on skip/discard
Two bugs fixed:

1. Blank white page after vue SPA rebuild: browsers cached old index.html
   referencing old asset hashes. Assets are deleted on rebuild, causing
   404s for JS/CSS -> blank page. Fix: serve index.html with
   Cache-Control: no-cache so browsers always fetch fresh HTML.
   Hashed assets (/assets/chunk-abc123.js) remain cacheable forever.

2. Queue draining to empty on skip/discard: handleSkip and handleDiscard
   never refilled the local queue buffer. After enough skips, store.current
   went null and the empty state showed (blank-looking). Fix: both handlers
   now call fetchBatch() when queue drops below 3, matching handleLabel.

Also: sync classifier_adapters LABELS to match current 10-label schema
(new_lead + hired, remove unrelated).

48 Python tests pass, 48 frontend tests pass.
2026-03-03 19:26:34 -08:00
cb9ebb805c fix(avocet): normalize queue schema + bind to 0.0.0.0 for LAN access
- Add _item_id() (content hash) + _normalize() to map legacy JSONL fields
  (from_addr/account/no-id) to Vue schema (from/source/id)
- All mutating endpoints now look up by _normalize(x)[id] — handles both
  stored-id (test fixtures) and content-hash (real data) transparently
- Change uvicorn bind from 127.0.0.1 to 0.0.0.0 so LAN clients can connect
2026-03-03 18:43:00 -08:00
01cc908eab fix(avocet): undo — commit-then-clear order, empty-records guard, skip dedup, stronger test 2026-03-03 15:41:58 -08:00
f4facc6484 feat(avocet): discard, undo, labels config, static serving — backend complete 2026-03-03 15:35:01 -08:00
5912b73705 feat(avocet): POST /api/skip endpoint 2026-03-03 15:21:32 -08:00
efd9d69692 fix(avocet): store original item in _last_action; add requirements.txt 2026-03-03 15:16:54 -08:00
ce202d97ea feat(avocet): POST /api/label endpoint 2026-03-03 15:14:04 -08:00
8898258055 feat(avocet): GET /api/queue endpoint 2026-03-03 15:00:59 -08:00
d36d0be166 fix(avocet): _write_jsonl empty-list writes empty file; add reset_last_action helper 2026-03-03 14:36:18 -08:00
f06114e648 feat(avocet): FastAPI skeleton + JSONL helpers 2026-03-03 13:30:28 -08:00