Commit graph

63 commits

Author SHA1 Message Date
f2f150b4fb feat(classifier): implement EmbeddingKNNAdapter.load() and unload() 2026-05-05 07:12:53 -07:00
72449561cf feat(classifier): add EmbeddingKNNAdapter skeleton and constructor tests 2026-05-05 06:08:21 -07:00
c177fb1628 fix(classifier): quality fixes for DEFAULT_EXEMPLARS — remove forward __all__ entry, tighten tests, fix survey exemplar 2026-05-04 20:03:18 -07:00
3be5055e31 feat(classifier): add DEFAULT_EXEMPLARS for embedding k-NN fallback 2026-05-04 17:44:44 -07:00
78b64d007d feat(classifier): add _cosine() helper for embedding similarity 2026-05-04 17:41:45 -07:00
bce932461a feat: plans benchmark harness — model scoring for CF planning prompts
Adds benchmark_plans.py script, plans_bench API router, PlansBenchTab Vue
component, and registers /api/plans-bench in api.py. Also extends models
registry (cf-text catalog integration), cforch client, LlmEvalTab, and
ModelsView with cf-orch fleet support. Wires Planning mode into BenchmarkView.
2026-05-02 23:36:04 -07:00
e11db5ccd9 fix: align train job/results API envelope, config_json key, progress SSE, dashboard model_key
- GET /api/train/jobs now returns {"jobs":[...]} instead of bare array
- GET /api/train/results now returns {"results":[...]} instead of bare array
- POST /api/train/jobs body key renamed config -> config_json to match Pydantic model
- SSE log handler now handles 'progress' event type (backend never emits 'log')
- Dashboard _get_active_jobs() adds model_key to SELECT and return dict
- corrections.py docstring updated: both /api/corrections and /api/sft prefixes noted
- test_train.py assertions updated to unwrap new envelope shapes
2026-05-02 21:22:18 -07:00
0904967320 feat: slim api.py to factory-only; all domain routes in dedicated modules
Replace 149-line api.py (with inline helpers, JSONL utilities, and ad-hoc
router registrations) with a 57-line pure factory. All business logic was
already extracted to domain modules in B1-B7; this removes the dead code
and adds the /api/corrections/* prefix alongside the /api/sft/* backward-
compat alias. Smoke tests updated to cover the new /api/corrections/ingest
and /api/dashboard routes.
2026-05-02 09:55:58 -07:00
8fda821e15 feat: add POST /ingest endpoint to corrections API with Bearer auth
Adds IngestRequest model and POST /api/sft/ingest route to
app/data/corrections.py. Sibling CF products (Peregrine, Kiwi, etc.)
can push pre-approved corrections via Bearer token auth
(AVOCET_INGESTION_SECRET). Records land as status=approved in both
sft_candidates.jsonl and sft_approved.jsonl immediately.

7 tests in tests/test_data_corrections.py cover 503 (secret unset),
401 (missing/malformed header), 403 (wrong secret), happy-path writes
to both files, and optional label field.
2026-05-02 09:07:10 -07:00
aa742bcfc0 feat: add GET /api/dashboard flywheel aggregate endpoint 2026-05-01 23:30:04 -07:00
766fbafa02 feat: build SQLite-backed train job queue in app/train/train.py
Replaces the ad-hoc _running_procs dict in api.py with a persistent,
inspectable SQLite job queue. Removes old /api/finetune/* routes and
_best_cuda_device from api.py. Adds /api/train/* routes (list, create,
get, cancel, run SSE, results). 16 new tests all passing.
2026-05-01 23:05:11 -07:00
bccb385f61 feat: build app/eval/cforch.py aggregating eval benchmark routers 2026-05-01 22:23:06 -07:00
d74ad3f972 feat: move imitate API into app/data/imitate.py 2026-05-01 22:12:19 -07:00
99ea39fe38 feat: move SFT corrections API into app/data/corrections.py 2026-05-01 22:02:22 -07:00
2054866ff1 feat: extract fetch routes and IMAP helpers into app/data/fetch.py 2026-05-01 21:57:31 -07:00
cbec776ef1 fix: restore ensure_ascii=False in utils jsonl helpers; remove dead _last_action from api.py 2026-05-01 20:59:44 -07:00
167d7351e3 feat: extract label queue API into app/data/label.py 2026-05-01 18:48:14 -07:00
ea3da701c6 feat(models): extended model registry + manage.sh benchmark subcommands
- app/models.py: add StyleModel and VoiceModel entries; expand cf-text and
  benchmark model metadata (vram_mb, description, tags)
- tests/test_models.py: coverage for new model types and registry helpers
- ModelsView.vue: updated model browser with style/voice filter tabs
- manage.sh: add benchmark-style and benchmark-voice subcommands
- config/label_tool.yaml.example: add style + voice benchmark config stubs
- web/.gitignore: add node_modules and dist entries
2026-04-24 14:56:24 -07:00
3299c0e23a feat: Imitate tab — pull CF product samples, compare LLM responses
Backend (app/imitate.py):
- GET /api/imitate/products — reads imitate: config, checks online status
- GET /api/imitate/products/{id}/sample — fetches real item from product API
- GET /api/imitate/run (SSE) — streams ollama responses for selected models
- POST /api/imitate/push-corrections — queues results in SFT corrections JSONL

Frontend (ImitateView.vue):
- Step 1: product picker grid (online/offline status, icon from config)
- Step 2: raw sample preview + editable prompt textarea
- Step 3: ollama model multi-select, temperature slider, SSE run with live log
- Step 4: response cards side by side, push to Corrections button

Wiring:
- app/api.py: include imitate_router at /api/imitate
- web/src/router: /imitate route + lazy import
- AppSidebar: Imitate nav entry (mirror icon)
- config/label_tool.yaml.example: imitate: section with peregrine example
- 16 unit tests (100% passing)

Also: BenchmarkView.vue Compare panel — side-by-side run diff for bench results
2026-04-09 20:12:57 -07:00
dc246df42d test: fix test_tasks_parses_yaml for TaskEntry schema
TaskEntry now includes prompt/system fields (default ""). Switch from
exact dict comparison to field-by-field assertions so the test is
forward-compatible with optional schema additions.
2026-04-09 20:11:01 -07:00
891142570b feat(#14): default bench_results_dir + testability seam
- sft.py: _DEFAULT_BENCH_RESULTS_DIR set to circuitforge-orch bench
  results path; set_default_bench_results_dir() seam for test isolation
- test fixture resets default to tmp_path to avoid real-fs interference
- 136 tests passing

Closes #14
2026-04-09 12:28:38 -07:00
a271278dc9 feat(#10): env var LLM config + cf-orch coordinator auth
- _load_cforch_config() falls back to CF_ORCH_URL / CF_LICENSE_KEY /
  OLLAMA_HOST / OLLAMA_MODEL env vars when label_tool.yaml cforch: key
  is absent or empty (yaml wins when both present)
- CF_LICENSE_KEY forwarded to benchmark subprocess env so cf-orch agent
  can authenticate without it appearing in command args
- GET /api/cforch/config endpoint — returns resolved connection state;
  redacts license key (returns license_key_set bool only)
- SettingsView: connection status pill (cf-orch / Ollama / unconfigured)
  loaded from /api/cforch/config on mount; shows env vs yaml source
- .env.example documenting all relevant vars
- config/label_tool.yaml.example: full cforch: section with all keys
- environment.yml: add circuitforge-core>=0.9.0 dependency
- .gitignore: add .env
- 4 new tests (17 total in test_cforch.py); 136 passing overall

Closes #10
2026-04-09 12:26:44 -07:00
dffb1d0d7a feat: cf-orch LLM benchmark integration (Phase 1)
Backend (app/cforch.py — new APIRouter at /api/cforch):
- GET /tasks — reads bench_tasks.yaml, returns tasks + deduplicated types
- GET /models — reads bench_models.yaml, returns model list with service/tags
- GET /run — SSE endpoint; spawns cf-orch benchmark.py subprocess with
  --filter-tasks, --filter-tags, --coordinator, --ollama-url; strips ANSI
  codes; emits progress/result/complete/error events; 409 guard on concurrency
- GET /results — returns latest bench_results/*/summary.json; 404 if none
- POST /cancel — terminates running benchmark subprocess
- All paths configurable via label_tool.yaml cforch: section
- 13 tests; follows sft.py/models.py testability seam pattern

Frontend:
- BenchmarkView: mode toggle (Classifier / LLM Eval); LLM Eval panel with
  task picker (by type, select-all + indeterminate), model picker (by service),
  SSE run log, results table with best-per-column highlighting
- StatsView: LLM Benchmark section showing quality_by_task_type table across
  models; hidden when no results; fetches /api/cforch/results on mount

SFT candidate pipeline: cf-orch runs that produce sft_candidates.jsonl are
auto-discovered by the existing bench_results_dir config in sft.py — no
additional wiring needed.
2026-04-09 10:46:06 -07:00
ce12b29c94 feat: model compatibility warning on HF lookup
- GET /api/models/lookup now returns compatible: bool and warning: str|null
- compatible=false + warning when pipeline_tag is absent (no task tag on HF)
  or present but not in the supported adapter map
- Warning message names the unsupported pipeline_tag and lists supported types
- ModelsView: yellow compat-warning banner below preview description;
  Add button relabels to "Add anyway" with muted styling when incompatible
- test_models: accept 405 for path-traversal DELETE tests (StaticFiles mount
  returns 405 for non-GET methods when web/dist exists)
2026-04-09 09:48:55 -07:00
b6b3d2c390 feat: HuggingFace model management tab
- New /api/models router: HF lookup, approval queue (JSONL persistence),
  SSE download progress via snapshot_download(), installed model listing,
  path-traversal-safe DELETE
- pipeline_tag → adapter type mapping (zero-shot-classification,
  sentence-similarity, text-generation)
- 27 tests covering all endpoints, duplicate detection, path traversal
- ModelsView.vue: HF lookup + add, approval queue, live download progress
  bars via SSE, installed model table with delete
- Sidebar entry (🤗 Models) between Benchmark and Corrections
2026-04-08 22:32:35 -07:00
9633d9a535 feat: add failure_category field to SFT corrections (#16)
Adds optional failure_category to SubmitRequest and candidate records so
reviewers can classify why a model response was wrong, not just what to do
with it. Enables the fine-tune harness to filter training data by failure
type (e.g. exclude scoring artifacts, train only on genuine wrong answers).

Taxonomy: scoring_artifact | style_violation | partial_answer |
          wrong_answer | format_error | hallucination

- app/sft.py: FailureCategory Literal type; SubmitRequest.failure_category;
  stored on candidate record in POST /submit correct branch
- tests/test_sft.py: 3 new tests (stores value, null round-trip, 422 on invalid)
- stores/sft.ts: SftFailureCategory type exported; SftQueueItem + SftLastAction
  updated; setLastAction accepts optional category param
- SftCard.vue: chip-group selector shown during correct/discard/flag flow;
  two-step confirm for discard/flag reveals chips before emitting; category
  forwarded in all emit payloads
- CorrectionsView.vue: handleCorrect/Discard/Flag accept and forward category
  to POST /api/sft/submit body and store.setLastAction
- SftCard.test.ts: 11 new tests covering chip visibility, selection,
  single-active enforcement, pending-action flow, emit payloads, cancel
2026-04-08 22:10:26 -07:00
07807f0d05 feat: sft router — /export and /stats endpoints 2026-04-08 14:46:08 -07:00
f19cab60f7 feat: sft router — /queue, /submit, /undo endpoints 2026-04-08 14:22:06 -07:00
b330e84111 fix: sft router — yaml error handling, none filter, shared jsonl utils, fixture restore 2026-04-08 14:07:09 -07:00
597ffc7324 feat: sft router skeleton — /api/sft/runs and /api/sft/import 2026-04-08 13:54:58 -07:00
bbfae1a622 fix: log warning when sft record is missing id field 2026-04-08 07:30:46 -07:00
03dac57fd9 feat: sft_import.py — run discovery and JSONL deduplication 2026-04-08 07:13:37 -07:00
25880e377d refactor: consolidate HTML extraction into app/utils.py
Rename _strip_html/_extract_body to strip_html/extract_body (public API).
Remove duplicate _TextExtractor, strip_html, and _extract_body from
imap_fetch.py; import from app.utils instead. Update test_label_tool.py
to use the new public names.
2026-04-08 06:52:15 -07:00
ae0ac19505 chore: retire Streamlit app, scaffold sft branch
- Delete app/label_tool.py (Streamlit UI retired; Vue SPA is sole UI)
- Extract _strip_html and _extract_body into app/utils.py (stdlib-only, reusable)
- Update tests/test_label_tool.py import to app.utils
- Rename start-api/stop-api/restart-api/open-api → start/stop/restart/open in manage.sh
- Remove STREAMLIT variable and all Streamlit-specific case blocks from manage.sh
- Update manage.sh usage section to reflect Vue+FastAPI-only commands
- Add data/sft_candidates.jsonl and data/sft_approved.jsonl to .gitignore
- Add sft.bench_results_dir key to config/label_tool.yaml.example
2026-04-08 06:18:12 -07:00
e38a28dcc3 fix(avocet): narrow cancel except clause, clear stale cancel flags on new run
- except clause in cancel_benchmark/cancel_finetune narrowed from Exception
  to _subprocess.TimeoutExpired (C1)
- _cancelled_jobs.discard() called after registering new proc to prevent
  a stale flag from a prior run masking errors (I2)
- local `import subprocess` removed from run_benchmark and
  run_finetune_endpoint; all Popen calls updated to _subprocess.Popen (I1)
- test patch targets updated from subprocess.Popen to app.api._subprocess.Popen;
  cancelled-event tests updated to set flag in proc.wait() side-effect so
  the discard-on-new-run logic is exercised correctly
2026-03-15 18:13:01 -07:00
0ab49609c0 feat(avocet): add cancel endpoints for benchmark and finetune jobs
Adds POST /api/benchmark/cancel and POST /api/finetune/cancel endpoints
that terminate the running subprocess (kill on 3s timeout), and updates
the run generators to emit a cancelled SSE event instead of error when
the job was intentionally stopped.
2026-03-15 18:09:20 -07:00
dd352f07cd fix(avocet): _MODELS_DIR overridable in tests; sanitize score paths against path traversal 2026-03-15 16:07:27 -07:00
903624a4b8 feat(avocet): add /api/finetune/status and /api/finetune/run endpoints 2026-03-15 16:04:34 -07:00
48e02f2ed6 fix(avocet): move TorchDataset import to top; split sample_count into total+train 2026-03-15 16:02:43 -07:00
939ce06f45 feat(avocet): run_finetune, CLI, multi-score-file merge with last-write-wins dedup
- load_and_prepare_data() now accepts Path | list[Path]; single-Path callers unchanged
- Dedup by MD5(subject + body[:100]); last file/row wins (lets later runs correct labels)
- Prints summary line when duplicates are dropped
- Added _EmailDataset (TorchDataset wrapper), run_finetune(), and argparse CLI
- run_finetune() saves model + tokenizer + training_info.json with score_files provenance
- Stratified split guard: val set size clamped to at least n_classes (handles tiny example data)
- 3 new unit tests (merge, last-write-wins dedup, single-Path compat) + 1 integration test
- All 16 tests pass (15 unit + 1 integration)
2026-03-15 15:52:41 -07:00
4e70e79b26 fix(avocet): tighten body truncation test to exact 400-char assertion 2026-03-15 15:44:19 -07:00
de5794611b feat(avocet): add finetune data pipeline, class weights, WeightedTrainer
Implements load_and_prepare_data (JSONL ingestion with class filtering),
compute_class_weights (inverse-frequency, div-by-zero safe), compute_metrics_for_trainer
(macro F1 + accuracy), and WeightedTrainer.compute_loss (**kwargs-safe for
Transformers 4.38+ num_items_in_batch). All 12 tests pass.
2026-03-15 15:38:45 -07:00
d1a36bfd63 fix(avocet): guard discover_finetuned_models against malformed/incomplete training_info.json 2026-03-15 15:18:13 -07:00
df37a8e16d feat(avocet): auto-discover fine-tuned models in benchmark harness 2026-03-15 11:59:13 -07:00
179cb67e1c fix(avocet): FineTunedAdapter GPU device routing + precise body truncation test 2026-03-15 10:56:47 -07:00
dc321de59f feat(avocet): add FineTunedAdapter for local checkpoint inference 2026-03-15 10:54:38 -07:00
07407117a5 feat: add GET /api/fetch/stream SSE endpoint for real-time IMAP progress 2026-03-04 12:05:23 -08:00
e5e66b09cc feat: add POST /api/accounts/test endpoint 2026-03-04 12:04:42 -08:00
47a2178ee4 feat: add GET /api/stats and GET /api/stats/download endpoints 2026-03-04 12:04:11 -08:00
3f0cd7e837 feat: add GET/POST /api/config endpoints for IMAP account management 2026-03-04 12:03:40 -08:00