avocet/app
pyr0ball 7c304ebc45 feat: benchmark model picker, category grouping, stats benchmark results
Backend (app/api.py):
- GET /api/benchmark/models — returns installed models grouped by adapter
  type (ZeroShotAdapter, RerankerAdapter, GenerationAdapter, Unknown);
  reads _MODELS_DIR via app.models so test overrides are respected
- GET /api/benchmark/run — add model_names query param (comma-separated);
  when set, passes --models <names...> to benchmark_classifier.py
- GET /api/stats — add benchmark_results field from benchmark_results.json
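The model_names handling described above can be sketched roughly as follows. This is a minimal illustration, not the code in app/api.py: the helper names parse_model_names and build_benchmark_cmd are hypothetical, the script path is assumed to be relative, and any other flags the real endpoint passes are omitted. Only the --models flag and the comma-separated model_names parameter come from the commit message.

```python
import sys
from typing import List, Optional


def parse_model_names(model_names: Optional[str]) -> List[str]:
    """Split a comma-separated query value, dropping blanks and stray whitespace."""
    if not model_names:
        return []
    return [name.strip() for name in model_names.split(",") if name.strip()]


def build_benchmark_cmd(
    model_names: Optional[str] = None,
    script: str = "benchmark_classifier.py",  # assumed path, for illustration only
) -> List[str]:
    """Build the argv list; --models is appended only when names were supplied."""
    cmd = [sys.executable, script]
    names = parse_model_names(model_names)
    if names:
        cmd += ["--models", *names]
    return cmd
```

Passing the names as separate argv entries (rather than one joined string) matches the `--models <names...>` form and avoids shell quoting issues if the list is handed to subprocess without a shell.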

Frontend:
- BenchmarkView: collapsible Model Selection panel with per-category
  checkboxes, select-all per category (supports indeterminate state),
  collapsed summary badge ("All models (N)" or "N of M selected");
  model_names only sent when a strict subset is selected
- StatsView: Benchmark Results table (accuracy, macro_f1, weighted_f1)
  with best-model highlighting per metric; hidden when no results exist
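The selection summary and best-model highlighting above come down to a few small rules. The sketch below restates them in Python for consistency with the backend, although the actual views are frontend components. All function names are hypothetical, the results shape (model name mapped to a dict of metrics) is an assumption about benchmark_results.json, and omitting model_names for an empty selection is also an assumption.

```python
from typing import Dict, List, Optional, Set

METRICS = ("accuracy", "macro_f1", "weighted_f1")


def summary_label(selected: Set[str], all_models: List[str]) -> str:
    """Collapsed badge text: 'All models (N)' or 'N of M selected'."""
    total = len(all_models)
    chosen = sum(1 for m in all_models if m in selected)
    if chosen == total:
        return f"All models ({total})"
    return f"{chosen} of {total} selected"


def model_names_param(selected: Set[str], all_models: List[str]) -> Optional[str]:
    """Return the query value only when a strict subset is selected."""
    chosen = [m for m in all_models if m in selected]
    # Assumption: an empty selection also omits the parameter.
    if not chosen or len(chosen) == len(all_models):
        return None
    return ",".join(chosen)


def best_per_metric(results: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Map each metric to the model with the highest score (first wins on ties)."""
    best: Dict[str, str] = {}
    for metric in METRICS:
        scored = {name: r[metric] for name, r in results.items() if metric in r}
        if scored:
            best[metric] = max(scored, key=scored.get)
    return best
```

An empty results mapping yields an empty dict from best_per_metric, which lines up with hiding the table when no results exist.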
2026-04-08 23:03:56 -07:00
api.py feat: benchmark model picker, category grouping, stats benchmark results 2026-04-08 23:03:56 -07:00
imap_fetch.py refactor: consolidate HTML extraction into app/utils.py 2026-04-08 06:52:15 -07:00
models.py feat: HuggingFace model management tab 2026-04-08 22:32:35 -07:00
sft.py feat: add failure_category field to SFT corrections (#16) 2026-04-08 22:10:26 -07:00
utils.py fix: sft router — yaml error handling, none filter, shared jsonl utils, fixture restore 2026-04-08 14:07:09 -07:00