Circuit-Forge/avocet

Commit graph

Author	SHA1	Message	Date

pyr0ball

9fdaeeb3d6

feat: multi-bench dashboard, API path migration, benchmark reliability fixes

- dashboard: eval card now shows last run + score for all bench types
  (classifier, LLM, style, plans) via new _get_recent_bench_runs()
- dashboard: skip cforch LLM-bench list summaries when scanning for
  classifier best_macro_f1 (fixes _find_latest_classifier_bench)
- cforch: stale _BENCH_RUNNING flag now auto-resets if process exited;
  idle timeout (120s via select) kills hung benchmark if node crashes
- api: add /api/finetune/{run,cancel} backward-compat shims while
  ClassifierTab fine-tune section is migrated to TrainJobsView
- ClassifierTab: migrate all /api/benchmark/* paths to /api/cforch/*;
  fix null-safety on results.models access; load fine-tuned models from
  /api/train/results instead of /api/finetune/status
- CompareTab: extend model picker to include vllm + cf-text alongside
  ollama, grouped by service; pre-select all LLM_SERVICES on load
- LlmEvalTab: null-safety on quality_by_task_type lookups
- models: AVOCET_MODELS_DIR env var overrides default models/ path
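
The cforch reliability fix above (stale running flag, idle timeout via select) could look roughly like the following. This is a minimal sketch, not the code from this commit: `run_with_idle_timeout` and `flag_is_stale` are hypothetical names, the 120 s default only mirrors the figure in the message, and `select` on a pipe is assumed to be available (POSIX only).

```python
import os
import select
import subprocess
import sys


def run_with_idle_timeout(cmd, idle_timeout=120.0):
    """Run cmd and collect its output; kill it if it stays silent too long.

    Hypothetical helper: returns (lines, timed_out).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    fd = proc.stdout.fileno()
    buf = b""
    timed_out = False
    while True:
        # Wait until the pipe is readable or idle_timeout seconds pass.
        ready, _, _ = select.select([fd], [], [], idle_timeout)
        if not ready:
            # No output within the idle window: treat the run as hung.
            proc.kill()
            timed_out = True
            break
        chunk = os.read(fd, 4096)
        if not chunk:
            # EOF: the child closed its stdout (normally on exit).
            break
        buf += chunk
    proc.wait()
    proc.stdout.close()
    return buf.decode(errors="replace").splitlines(), timed_out


def flag_is_stale(running_flag, proc):
    """True if the flag claims a run is active but the process is gone."""
    return bool(running_flag and (proc is None or proc.poll() is not None))


if __name__ == "__main__":
    lines, timed_out = run_with_idle_timeout(
        [sys.executable, "-c", "print('bench done')"], idle_timeout=5.0
    )
    print(lines, timed_out)
```

A caller would check `flag_is_stale(...)` before refusing a new run and reset the flag when it returns true. Reading with `os.read` rather than a buffered `readline()` keeps `select` and the reads looking at the same file descriptor state.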

2026-05-11 09:05:12 -07:00

pyr0ball

ddb56efb89

refactor(bench): extract benchmark tabs — classifier, compare, llm-eval, style, voice

- BenchmarkView.vue: convert from monolithic view to tabbed shell; each tab is
  now its own component (ClassifierTab, CompareTab, LlmEvalTab, StyleTab, VoiceTab)
- StyleTab + VoiceTab: new benchmark modes for style and voice model evaluation
- app/style.py: FastAPI router for style imitation benchmarks
- app/voice.py: FastAPI router for voice benchmark endpoints
- scripts/benchmark_style.py + benchmark_voice.py: headless runner scripts

2026-04-24 14:56:17 -07:00

2 commits