feat: benchmark model picker, category grouping, stats benchmark results #20
Stacks on #19 (HuggingFace model management). Retarget to main after #19 merges.
Summary
- `GET /api/benchmark/models`: installed models grouped by adapter type (ZeroShotAdapter / RerankerAdapter / GenerationAdapter / Unknown)
- `GET /api/benchmark/run`: new `model_names` query param (comma-separated); passes the `--models` flag to the benchmark script when a subset is selected
- `GET /api/stats`: new `benchmark_results` field surfacing `benchmark_results.json` content

Test plan
- pytest tests/test_api.py tests/test_models.py

Backend (app/api.py):
- GET /api/benchmark/models: returns installed models grouped by adapter type (ZeroShotAdapter, RerankerAdapter, GenerationAdapter, Unknown); reads _MODELS_DIR via app.models so test overrides are respected
- GET /api/benchmark/run: adds a model_names query param (comma-separated); when set, passes --models <names...> to benchmark_classifier.py
- GET /api/stats: adds a benchmark_results field populated from benchmark_results.json

Frontend:
- BenchmarkView: collapsible Model Selection panel with per-category checkboxes, select-all per category (supports indeterminate state), and a collapsed summary badge ("All models (N)" or "N of M selected"); model_names is only sent when a strict subset is selected
- StatsView: Benchmark Results table (accuracy, macro_f1, weighted_f1) with best-model highlighting per metric; hidden when no results exist
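
For reviewers, here is a minimal sketch of how the comma-separated model_names value could be turned into the --models argument list. It is an illustration under stated assumptions, not the code in app/api.py: the helper names parse_model_names and build_benchmark_cmd are hypothetical, the script is assumed to be invoked by bare filename with the current interpreter, and dropping blank or duplicate names is an assumed behavior that the PR does not specify.

```python
import sys
from typing import List, Optional

# Assumption: the script is addressed by bare filename; the real path may differ.
BENCHMARK_SCRIPT = "benchmark_classifier.py"


def parse_model_names(model_names: Optional[str]) -> List[str]:
    """Split a comma-separated query value into clean, unique names.

    Hypothetical helper: blanks and repeats are dropped, order is preserved.
    """
    if not model_names:
        return []
    names: List[str] = []
    for raw in model_names.split(","):
        name = raw.strip()
        if name and name not in names:
            names.append(name)
    return names


def build_benchmark_cmd(model_names: Optional[str] = None) -> List[str]:
    """Build the argv list for the benchmark script.

    --models is appended only when at least one name was supplied, so an
    absent or empty parameter runs the script with no model filter.
    """
    cmd = [sys.executable, BENCHMARK_SCRIPT]
    names = parse_model_names(model_names)
    if names:
        cmd += ["--models", *names]
    return cmd
```

Building the command as a list rather than a shell string means each model name is passed as a single argument, so names containing spaces or shell metacharacters are not reinterpreted when the list is handed to subprocess.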