feat: benchmark model picker, category grouping, stats benchmark results #20

Merged
pyr0ball merged 1 commit from feat/benchmark-model-picker into main 2026-04-08 23:07:10 -07:00

Stacks on #19 (HuggingFace model management). Retarget to main after #19 merges.

Summary

  • GET /api/benchmark/models — installed models grouped by adapter type (ZeroShotAdapter / RerankerAdapter / GenerationAdapter / Unknown)
  • GET /api/benchmark/run — new model_names query param (comma-separated); passes --models flag to benchmark script when a subset is selected
  • GET /api/stats — new benchmark_results field surfacing benchmark_results.json content
  • BenchmarkView: collapsible Model Selection panel with per-category checkboxes + select-all (with indeterminate state); badge shows "All models (N)" or "N of M selected"
  • StatsView: Benchmark Results table (accuracy / macro_f1 / weighted_f1) with best-model highlight; hidden when no results
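The adapter-type grouping behind `GET /api/benchmark/models` reduces to bucketing each installed model by its adapter class name, with anything unrecognized falling into `Unknown`. A minimal sketch (function and input shape are illustrative, not the actual endpoint code):

```python
# Hypothetical sketch of the grouping used by GET /api/benchmark/models.
# The real handler reads _MODELS_DIR via app.models; names here are illustrative.
from collections import defaultdict

KNOWN_ADAPTERS = ("ZeroShotAdapter", "RerankerAdapter", "GenerationAdapter")

def group_models_by_adapter(models):
    """Group (model_name, adapter_type) pairs into the response shape,
    bucketing unrecognized adapter types under "Unknown"."""
    grouped = defaultdict(list)
    for name, adapter in models:
        key = adapter if adapter in KNOWN_ADAPTERS else "Unknown"
        grouped[key].append(name)
    return dict(grouped)

print(group_models_by_adapter([
    ("bart-large-mnli", "ZeroShotAdapter"),
    ("bge-reranker-base", "RerankerAdapter"),
    ("mystery-model", "SomeOtherAdapter"),
]))
```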

Test plan

  • 66/66 tests pass (pytest tests/test_api.py tests/test_models.py)
  • Model Selection panel appears collapsed on Benchmark tab
  • After adding a model via Models tab: panel shows categories + checkboxes
  • Deselect some models, run benchmark — only selected models run
  • After benchmark run, Stats tab shows Benchmark Results table with highlighted best model
  • No benchmark_results.json → Benchmark Results section hidden on Stats
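A client exercising the `model_names` parameter only needs to attach it when a strict subset is selected, mirroring the BenchmarkView behaviour above. A small sketch (base URL and model names are placeholders):

```python
# Illustrative client-side URL construction for GET /api/benchmark/run.
from urllib.parse import urlencode

def benchmark_run_url(base, selected, total):
    """Append model_names only when a strict subset is selected;
    selecting all models omits the parameter entirely."""
    if selected and len(selected) < total:
        return f"{base}/api/benchmark/run?" + urlencode(
            {"model_names": ",".join(selected)})
    return f"{base}/api/benchmark/run"

print(benchmark_run_url("http://localhost:8000", ["model-a", "model-b"], 3))
print(benchmark_run_url("http://localhost:8000", ["a", "b", "c"], 3))
```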
pyr0ball added 1 commit 2026-04-08 23:04:19 -07:00
Backend (app/api.py):
- GET /api/benchmark/models — returns installed models grouped by adapter
  type (ZeroShotAdapter, RerankerAdapter, GenerationAdapter, Unknown);
  reads _MODELS_DIR via app.models so test overrides are respected
- GET /api/benchmark/run — add model_names query param (comma-separated);
  when set, passes --models <names...> to benchmark_classifier.py
- GET /api/stats — add benchmark_results field from benchmark_results.json
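The `--models` flag construction described above might look like the following sketch (the actual invocation in the handler and the CLI of `benchmark_classifier.py` may differ):

```python
# Hypothetical sketch of building the benchmark subprocess argv from a
# comma-separated model_names query parameter.
def build_benchmark_cmd(model_names=None):
    """Append --models <names...> only when a non-empty
    comma-separated subset was supplied."""
    cmd = ["python", "benchmark_classifier.py"]
    if model_names:
        names = [n.strip() for n in model_names.split(",") if n.strip()]
        if names:
            cmd += ["--models", *names]
    return cmd

print(build_benchmark_cmd("model-a, model-b"))
print(build_benchmark_cmd(None))
```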

Frontend:
- BenchmarkView: collapsible Model Selection panel with per-category
  checkboxes, select-all per category (supports indeterminate state),
  collapsed summary badge ("All models (N)" or "N of M selected");
  model_names only sent when a strict subset is selected
- StatsView: Benchmark Results table (accuracy, macro_f1, weighted_f1)
  with best-model highlighting per metric; hidden when no results exist
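The per-metric best-model highlight amounts to an argmax over models for each metric. A minimal sketch, assuming `benchmark_results.json` maps model names to metric dicts (the real file layout may differ):

```python
# Illustrative per-metric best-model selection, as in the StatsView highlight.
METRICS = ("accuracy", "macro_f1", "weighted_f1")

def best_models(results):
    """For each metric, return the model name with the highest score.
    `results` maps model name -> {metric: value}; missing metrics lose."""
    return {
        metric: max(results, key=lambda m: results[m].get(metric, float("-inf")))
        for metric in METRICS
    }

print(best_models({
    "model-a": {"accuracy": 0.91, "macro_f1": 0.88, "weighted_f1": 0.90},
    "model-b": {"accuracy": 0.89, "macro_f1": 0.92, "weighted_f1": 0.91},
}))
```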
pyr0ball changed target branch from feat/hf-model-queue to main 2026-04-08 23:07:02 -07:00
pyr0ball merged commit 49ec85706c into main 2026-04-08 23:07:10 -07:00
Reference: Circuit-Forge/avocet#20