avocet

pyr0ball 7c304ebc45 feat: benchmark model picker, category grouping, stats benchmark results	2026-04-08 23:03:56 -07:00

Backend (app/api.py):
- GET /api/benchmark/models: returns installed models grouped by adapter type (ZeroShotAdapter, RerankerAdapter, GenerationAdapter, Unknown); reads _MODELS_DIR via app.models so test overrides are respected
- GET /api/benchmark/run: add model_names query param (comma-separated); when set, passes --models <names...> to benchmark_classifier.py
- GET /api/stats: add benchmark_results field from benchmark_results.json

Frontend:
- BenchmarkView: collapsible Model Selection panel with per-category checkboxes, select-all per category (supports indeterminate state), collapsed summary badge ("All models (N)" or "N of M selected"); model_names only sent when a strict subset is selected
- StatsView: Benchmark Results table (accuracy, macro_f1, weighted_f1) with best-model highlighting per metric; hidden when no results exist
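
The model_names handling described in this commit can be sketched roughly as follows. This is only an illustration, not the code in app/api.py: the helper names build_benchmark_args and model_names_param are hypothetical, and treating an empty selection the same as "all models" is an assumption the commit message does not confirm.

```python
from typing import List, Optional


def build_benchmark_args(model_names: Optional[str]) -> List[str]:
    """Turn the comma-separated model_names query value into CLI arguments.

    Hypothetical helper: the real endpoint may build its command differently.
    """
    args = ["benchmark_classifier.py"]
    if model_names:
        # Split on commas, trim whitespace, and drop empty entries.
        names = [n.strip() for n in model_names.split(",") if n.strip()]
        if names:
            args += ["--models", *names]
    return args


def model_names_param(selected: List[str], installed: List[str]) -> Optional[str]:
    """Return a model_names value only when a strict subset is selected.

    Assumption: an empty selection is treated like "all models" (no filter).
    """
    wanted = set(selected)
    chosen = [m for m in installed if m in wanted]
    if not chosen or len(chosen) == len(installed):
        return None  # send no model_names parameter at all
    return ",".join(chosen)
```

With this sketch, a request carrying model_names=a,b would yield the arguments benchmark_classifier.py --models a b, while a request without the parameter runs the script with no --models flag.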
api.py	feat: benchmark model picker, category grouping, stats benchmark results	2026-04-08 23:03:56 -07:00
imap_fetch.py	refactor: consolidate HTML extraction into app/utils.py	2026-04-08 06:52:15 -07:00
models.py	feat: HuggingFace model management tab	2026-04-08 22:32:35 -07:00
sft.py	feat: add failure_category field to SFT corrections (#16)	2026-04-08 22:10:26 -07:00
utils.py	fix: sft router — yaml error handling, none filter, shared jsonl utils, fixture restore	2026-04-08 14:07:09 -07:00