- dashboard: eval card now shows last run + score for all bench types
(classifier, LLM, style, plans) via new _get_recent_bench_runs()
- dashboard: skip cforch LLM-bench list summaries when scanning for
classifier best_macro_f1 (fixes _find_latest_classifier_bench)
- cforch: stale _BENCH_RUNNING flag now auto-resets if process exited;
idle timeout (120s via select) kills hung benchmark if node crashes
- api: add /api/finetune/{run,cancel} backward-compat shims while
ClassifierTab fine-tune section is migrated to TrainJobsView
- ClassifierTab: migrate all /api/benchmark/* paths to /api/cforch/*;
fix null-safety on results.models access; load fine-tuned models from
/api/train/results instead of /api/finetune/status
- CompareTab: extend model picker to include vllm + cf-text alongside
ollama, grouped by service; pre-select all LLM_SERVICES on load
- LlmEvalTab: null-safety on quality_by_task_type lookups
- models: AVOCET_MODELS_DIR env var overrides default models/ path
Adds benchmark_plans.py script, plans_bench API router, PlansBenchTab Vue
component, and registers /api/plans-bench in api.py. Also extends models
registry (cf-text catalog integration), cforch client, LlmEvalTab, and
ModelsView with cf-orch fleet support. Wires Planning mode into BenchmarkView.
- GET /api/models/lookup now returns compatible: bool and warning: str|null
- compatible=false + warning when pipeline_tag is absent (no task tag on HF)
or present but not in the supported adapter map
- Warning message names the unsupported pipeline_tag and lists supported types
- ModelsView: yellow compat-warning banner below preview description;
Add button relabels to "Add anyway" with muted styling when incompatible
- test_models: accept 405 for path-traversal DELETE tests (StaticFiles mount
returns 405 for non-GET methods when web/dist exists)