avocet

pyr0ball dffb1d0d7a feat: cf-orch LLM benchmark integration (Phase 1)	2026-04-09 10:46:06 -07:00

Backend (app/cforch.py — new APIRouter at /api/cforch):
- GET /tasks — reads bench_tasks.yaml, returns tasks + deduplicated types
- GET /models — reads bench_models.yaml, returns model list with service/tags
- GET /run — SSE endpoint; spawns cf-orch benchmark.py subprocess with --filter-tasks, --filter-tags, --coordinator, --ollama-url; strips ANSI codes; emits progress/result/complete/error events; 409 guard on concurrency
- GET /results — returns latest bench_results/*/summary.json; 404 if none
- POST /cancel — terminates running benchmark subprocess
- All paths configurable via label_tool.yaml cforch: section
- 13 tests; follows sft.py/models.py testability seam pattern

Frontend:
- BenchmarkView: mode toggle (Classifier / LLM Eval); LLM Eval panel with task picker (by type, select-all + indeterminate), model picker (by service), SSE run log, results table with best-per-column highlighting
- StatsView: LLM Benchmark section showing quality_by_task_type table across models; hidden when no results; fetches /api/cforch/results on mount

SFT candidate pipeline: cf-orch runs that produce sft_candidates.jsonl are auto-discovered by the existing bench_results_dir config in sft.py — no additional wiring needed.
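The GET /run behavior described in the commit (strip ANSI codes from subprocess output, then emit typed SSE events) could look roughly like the sketch below. This is not the code in app/cforch.py: the helper names `strip_ansi`, `sse_event`, and `stream_lines`, and the JSON payload shape with a `type` field, are assumptions made for illustration only.

```python
import json
import re

# Matches ANSI CSI escape sequences such as "\x1b[32m" (color) or "\x1b[2K" (erase line).
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")


def strip_ansi(line: str) -> str:
    """Remove terminal escape sequences so the log renders cleanly in a browser."""
    return ANSI_CSI.sub("", line)


def sse_event(event_type: str, **fields) -> str:
    """Format one Server-Sent Events frame: a 'data:' line followed by a blank line.

    The payload shape {"type": ..., ...} is hypothetical, not taken from the repo.
    """
    payload = json.dumps({"type": event_type, **fields})
    return f"data: {payload}\n\n"


def stream_lines(lines):
    """Turn raw subprocess output lines into SSE frames, ending with a 'complete' event."""
    for raw in lines:
        text = strip_ansi(raw).rstrip("\n")
        if text:  # skip lines that are empty once escape codes are removed
            yield sse_event("progress", message=text)
    yield sse_event("complete")
```

In a FastAPI router, a generator like `stream_lines` would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`; whether the actual endpoint does so is not stated in the commit message.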
__init__.py	feat: initial avocet repo — email classifier training tool	2026-02-27 14:07:38 -08:00
test_api.py	fix(avocet): narrow cancel except clause, clear stale cancel flags on new run	2026-03-15 18:13:01 -07:00
test_benchmark_classifier.py	fix(avocet): guard discover_finetuned_models against malformed/incomplete training_info.json	2026-03-15 15:18:13 -07:00
test_cforch.py	feat: cf-orch LLM benchmark integration (Phase 1)	2026-04-09 10:46:06 -07:00
test_classifier_adapters.py	fix(avocet): FineTunedAdapter GPU device routing + precise body truncation test	2026-03-15 10:56:47 -07:00
test_finetune.py	fix(avocet): move TorchDataset import to top; split sample_count into total+train	2026-03-15 16:02:43 -07:00
test_imap_fetch.py	feat: extract IMAP logic to app/imap_fetch.py for reuse by API	2026-03-04 11:42:22 -08:00
test_label_tool.py	refactor: consolidate HTML extraction into app/utils.py	2026-04-08 06:52:15 -07:00
test_models.py	feat: model compatibility warning on HF lookup	2026-04-09 09:48:55 -07:00
test_sft.py	feat: add failure_category field to SFT corrections (#16)	2026-04-08 22:10:26 -07:00
test_sft_import.py	fix: log warning when sft record is missing id field	2026-04-08 07:30:46 -07:00