5 commits

Author SHA1 Message Date
7c304ebc45 feat: benchmark model picker, category grouping, stats benchmark results
Backend (app/api.py):
- GET /api/benchmark/models — returns installed models grouped by adapter
  type (ZeroShotAdapter, RerankerAdapter, GenerationAdapter, Unknown);
  reads _MODELS_DIR via app.models so test overrides are respected
- GET /api/benchmark/run — add model_names query param (comma-separated);
  when set, passes --models <names...> to benchmark_classifier.py
- GET /api/stats — add benchmark_results field from benchmark_results.json
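
A minimal sketch of how the comma-separated model_names param could be turned into the --models arguments described above. The helper name build_benchmark_cmd, the whitespace stripping, and the way the script is invoked are assumptions for illustration; this is not the code in app/api.py.

```python
import sys
from typing import List, Optional


def build_benchmark_cmd(model_names: Optional[str],
                        script: str = "benchmark_classifier.py") -> List[str]:
    """Build the argv list for a benchmark run (hypothetical helper)."""
    cmd = [sys.executable, script]
    if model_names:
        # Split on commas and drop empty entries such as "a,,b" or trailing commas.
        names = [n.strip() for n in model_names.split(",") if n.strip()]
        if names:
            # Each name becomes its own argv item: --models <names...>
            cmd += ["--models", *names]
    return cmd
```

When model_names is missing or empty, no --models flag is added, so the script presumably falls back to its own default set of models.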

Frontend:
- BenchmarkView: collapsible Model Selection panel with per-category
  checkboxes, select-all per category (supports indeterminate state),
  collapsed summary badge ("All models (N)" or "N of M selected");
  model_names only sent when a strict subset is selected
- StatsView: Benchmark Results table (accuracy, macro_f1, weighted_f1)
  with best-model highlighting per metric; hidden when no results exist
2026-04-08 23:03:56 -07:00
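
The frontend rules in the commit above (summary badge text, sending model_names only for a strict subset, best-model highlighting per metric) can be sketched as plain functions. This is written in Python for consistency with the backend, not in the frontend's own language. All function names, the assumed shape of the results mapping, and the choice to omit the param for an empty selection are assumptions.

```python
from typing import Dict, List, Optional

METRICS = ("accuracy", "macro_f1", "weighted_f1")


def selection_summary(selected: int, total: int) -> str:
    """Text for the collapsed summary badge."""
    if selected == total:
        return f"All models ({total})"
    return f"{selected} of {total} selected"


def model_names_param(selected: List[str], installed: List[str]) -> Optional[str]:
    """Return the model_names value, or None when the param should be omitted."""
    wanted = set(selected)
    chosen = [m for m in installed if m in wanted]
    # Omit the param when everything is selected.
    # Assumption: an empty selection is omitted as well.
    if not chosen or len(chosen) == len(installed):
        return None
    return ",".join(chosen)


def best_per_metric(results: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Map each metric to the model with the highest score (assumed result shape)."""
    best: Dict[str, str] = {}
    for metric in METRICS:
        scored = {name: r[metric] for name, r in results.items() if metric in r}
        if scored:
            best[metric] = max(scored, key=scored.get)
    return best
```

A metric that no model reports is left out of the mapping, which fits a table that is hidden when no results exist.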
0d252da2a0 feat(avocet): add cancel buttons for benchmark and fine-tune runs 2026-03-15 18:15:35 -07:00
5d68b0706f fix(avocet): use startsWith for error class in ft-log (consistent with benchmark log) 2026-03-15 16:14:47 -07:00
65548f4ddb feat(avocet): add fine-tune section and trained models badge row to BenchmarkView 2026-03-15 16:09:51 -07:00
a53f3a7341 feat(avocet): benchmark UI, label fixes, BenchmarkView with charts and SSE run 2026-03-15 09:39:37 -07:00