avocet

Author	SHA1	Message	Date
pyr0ball	bce932461a	feat: plans benchmark harness — model scoring for CF planning prompts Adds benchmark_plans.py script, plans_bench API router, PlansBenchTab Vue component, and registers /api/plans-bench in api.py. Also extends models registry (cf-text catalog integration), cforch client, LlmEvalTab, and ModelsView with cf-orch fleet support. Wires Planning mode into BenchmarkView.	2026-05-02 23:36:04 -07:00
pyr0ball	e11db5ccd9	fix: align train job/results API envelope, config_json key, progress SSE, dashboard model_key - GET /api/train/jobs now returns {"jobs":[...]} instead of bare array - GET /api/train/results now returns {"results":[...]} instead of bare array - POST /api/train/jobs body key renamed config -> config_json to match Pydantic model - SSE log handler now handles 'progress' event type (backend never emits 'log') - Dashboard _get_active_jobs() adds model_key to SELECT and return dict - corrections.py docstring updated: both /api/corrections and /api/sft prefixes noted - test_train.py assertions updated to unwrap new envelope shapes	2026-05-02 21:22:18 -07:00
pyr0ball	13d1a394d5	fix: add loading state, widen nullable types, add API response guard in TrainResultsView	2026-05-02 20:49:34 -07:00
pyr0ball	b077371107	feat: add TrainResultsView with training history table and Fleet registration links	2026-05-02 20:46:03 -07:00
pyr0ball	53b25b27ab	fix: surface cancel errors, fix SSE sentinel scroll, add missing test coverage in TrainJobsView	2026-05-02 20:33:03 -07:00
pyr0ball	e014da2dec	feat: add TrainJobsView with job queue, form submission, cancel, and SSE log streaming	2026-05-02 20:28:19 -07:00
pyr0ball	c48db45d91	test: fix async flush and add mode-switch coverage in BenchmarkView	2026-05-02 19:35:02 -07:00
pyr0ball	d0ba75b995	feat: extract CompareView at /eval/compare; remove Compare tab from BenchmarkView	2026-05-02 18:03:13 -07:00
pyr0ball	a134af8b7b	feat: add DashboardView with flywheel stage cards and CTA nudges	2026-05-02 16:50:24 -07:00
pyr0ball	6ef6f06023	feat: restructure AppSidebar into two-domain nav with section headers and flywheel signal badges	2026-05-02 13:52:45 -07:00
pyr0ball	5bdb095235	feat: restructure router into /data/* /eval/* /train/* domains with backward-compat redirects - Export named `routes` array from router/index.ts for testability - Move label/fetch/corrections/imitate under /data/* namespace - Move benchmark/compare under /eval/* namespace - Add /train/jobs and /train/results under /train/* namespace - Add / -> DashboardView and /fleet -> ModelsView (replaces old / -> LabelView) - Add backward-compat redirects for all old flat paths (/benchmark, /models, /stats, /label, /fetch, /corrections, /imitate) - Add stub views for DashboardView, CompareView, TrainJobsView, TrainResultsView (implemented in later tasks) - Add router.test.ts: 16 tests covering route structure and redirect targets	2026-05-02 13:00:04 -07:00
pyr0ball	0904967320	feat: slim api.py to factory-only; all domain routes in dedicated modules Replace 149-line api.py (with inline helpers, JSONL utilities, and ad-hoc router registrations) with a 57-line pure factory. All business logic was already extracted to domain modules in B1-B7; this removes the dead code and adds the /api/corrections/* prefix alongside the /api/sft/* backward-compat alias. Smoke tests updated to cover the new /api/corrections/ingest and /api/dashboard routes.	2026-05-02 09:55:58 -07:00
pyr0ball	8fda821e15	feat: add POST /ingest endpoint to corrections API with Bearer auth Adds IngestRequest model and POST /api/sft/ingest route to app/data/corrections.py. Sibling CF products (Peregrine, Kiwi, etc.) can push pre-approved corrections via Bearer token auth (AVOCET_INGESTION_SECRET). Records land as status=approved in both sft_candidates.jsonl and sft_approved.jsonl immediately. 7 tests in tests/test_data_corrections.py cover 503 (secret unset), 401 (missing/malformed header), 403 (wrong secret), happy-path writes to both files, and optional label field.	2026-05-02 09:07:10 -07:00
pyr0ball	0853ed7d56	fix: add logger.warning to silent except blocks in dashboard._find_latest_eval	2026-05-01 23:36:19 -07:00
pyr0ball	aa742bcfc0	feat: add GET /api/dashboard flywheel aggregate endpoint	2026-05-01 23:30:04 -07:00
pyr0ball	32d3436bbd	fix: path traversal guard, python_bin config, completed_at on Popen failure	2026-05-01 23:24:00 -07:00
pyr0ball	766fbafa02	feat: build SQLite-backed train job queue in app/train/train.py Replaces the ad-hoc _running_procs dict in api.py with a persistent, inspectable SQLite job queue. Removes old /api/finetune/* routes and _best_cuda_device from api.py. Adds /api/train/* routes (list, create, get, cancel, run SSE, results). 16 new tests all passing.	2026-05-01 23:05:11 -07:00
pyr0ball	d432026fd7	fix: restore real plans_bench.py (was accidentally stubbed)	2026-05-01 22:25:22 -07:00
pyr0ball	bccb385f61	feat: build app/eval/cforch.py aggregating eval benchmark routers	2026-05-01 22:23:06 -07:00
pyr0ball	d74ad3f972	feat: move imitate API into app/data/imitate.py	2026-05-01 22:12:19 -07:00
pyr0ball	99ea39fe38	feat: move SFT corrections API into app/data/corrections.py	2026-05-01 22:02:22 -07:00
pyr0ball	2054866ff1	feat: extract fetch routes and IMAP helpers into app/data/fetch.py	2026-05-01 21:57:31 -07:00
pyr0ball	cbec776ef1	fix: restore ensure_ascii=False in utils jsonl helpers; remove dead _last_action from api.py	2026-05-01 20:59:44 -07:00
pyr0ball	167d7351e3	feat: extract label queue API into app/data/label.py	2026-05-01 18:48:14 -07:00
pyr0ball	6689ff07b1	chore: gitignore .worktrees/ directory	2026-05-01 12:25:23 -07:00
pyr0ball	0745bc3f70	refactor: import detect_byok from cf-core, remove local copy	2026-04-25 16:45:47 -07:00
pyr0ball	2891606765	feat(cloud_session): add session resolution + forward user_id to cf-orch imitate app/cloud_session.py: - Thin wrapper around cf_core.cloud_session.CloudSessionFactory - BYOK detection reads ~/.config/circuitforge/llm.yaml (same path as other products) - get_session: FastAPI dependency, returns CloudUser (user_id, tier, has_byok) - require_tier: dependency factory for tier-gated routes app/imitate.py: - _run_cftext gains user_id: str | None param; non-None values included in the cf-orch ServiceAllocateRequest so premium users get their custom models - run_imitate injects session via Depends(_get_imitate_session); extracts user_id, filters out local/anon sessions (they get the shared catalog), passes real cloud user_id to the ThreadPoolExecutor fanout - _get_imitate_session wraps get_session with a try/except so imitate keeps working in envs where cloud_session deps aren't installed	2026-04-24 16:41:45 -07:00
pyr0ball	5a0ba92fc6	chore: add README + gather_corpus.py script	2026-04-24 15:29:26 -07:00
pyr0ball	ea3da701c6	feat(models): extended model registry + manage.sh benchmark subcommands - app/models.py: add StyleModel and VoiceModel entries; expand cf-text and benchmark model metadata (vram_mb, description, tags) - tests/test_models.py: coverage for new model types and registry helpers - ModelsView.vue: updated model browser with style/voice filter tabs - manage.sh: add benchmark-style and benchmark-voice subcommands - config/label_tool.yaml.example: add style + voice benchmark config stubs - web/.gitignore: add node_modules and dist entries	2026-04-24 14:56:24 -07:00
pyr0ball	ddb56efb89	refactor(bench): extract benchmark tabs — classifier, compare, llm-eval, style, voice - BenchmarkView.vue: convert from monolithic view to tabbed shell; each tab is now its own component (ClassifierTab, CompareTab, LlmEvalTab, StyleTab, VoiceTab) - StyleTab + VoiceTab: new benchmark modes for style and voice model evaluation - app/style.py: FastAPI router for style imitation benchmarks - app/voice.py: FastAPI router for voice benchmark endpoints - scripts/benchmark_style.py + benchmark_voice.py: headless runner scripts	2026-04-24 14:56:17 -07:00
pyr0ball	cc24cd0d7d	feat(imitate): parallel cf-text fanout workers + signal-based cold-start detection Backend: - Run all cf-text model allocations concurrently via ThreadPoolExecutor + as_completed - Announce model_start events upfront so the UI can show loading states immediately - Replace timer-based startup polling with coordinator state signals: waits for state=="running" (success) or state=="stopped" (fail-fast) on the matching node/gpu instance; falls back to health poll after 6 consecutive probe misses - Add /api/cforch/catalog endpoint: fetches live cf-text model list from cf-orch, filtering out proxy entries (ollama://, vllm://, http://) so only loadable models are returned Frontend (ImitateView.vue): - Show per-model loading spinners as results arrive via SSE stream - Display cold-start badge when coordinator signals the model was freshly loaded	2026-04-24 14:56:09 -07:00
pyr0ball	e6b64d6efe	fix: imitate extractor + health_path — support CF cloud API shapes - _extract_sample: add saved_searches, entries, calls, records as recognized list-wrapper keys (snipe/osprey response shapes) - _is_online: accept health_path param (default /api/health) so products using /api/v1/health/ (kiwi) report correctly - products endpoint: pass health_path from config into _is_online	2026-04-09 20:24:26 -07:00
pyr0ball	fee0cdb4a8	Merge pull request 'feat: Imitate tab — pull CF product samples, compare LLM responses' (#23) from feat/imitate into main	2026-04-09 20:13:20 -07:00
pyr0ball	3299c0e23a	feat: Imitate tab — pull CF product samples, compare LLM responses Backend (app/imitate.py): - GET /api/imitate/products — reads imitate: config, checks online status - GET /api/imitate/products/{id}/sample — fetches real item from product API - GET /api/imitate/run (SSE) — streams ollama responses for selected models - POST /api/imitate/push-corrections — queues results in SFT corrections JSONL Frontend (ImitateView.vue): - Step 1: product picker grid (online/offline status, icon from config) - Step 2: raw sample preview + editable prompt textarea - Step 3: ollama model multi-select, temperature slider, SSE run with live log - Step 4: response cards side by side, push to Corrections button Wiring: - app/api.py: include imitate_router at /api/imitate - web/src/router: /imitate route + lazy import - AppSidebar: Imitate nav entry (mirror icon) - config/label_tool.yaml.example: imitate: section with peregrine example - 16 unit tests (100% passing) Also: BenchmarkView.vue Compare panel — side-by-side run diff for bench results	2026-04-09 20:12:57 -07:00
pyr0ball	dc246df42d	test: fix test_tasks_parses_yaml for TaskEntry schema TaskEntry now includes prompt/system fields (default ""). Switch from exact dict comparison to field-by-field assertions so the test is forward-compatible with optional schema additions.	2026-04-09 20:11:01 -07:00
pyr0ball	7a392df492	Merge pull request 'feat: env var LLM config, cf-orch coordinator auth, SFT default bench path (#10, #14)' (#22) from feat/env-config-sft-import into main	2026-04-09 12:30:56 -07:00
pyr0ball	891142570b	feat(#14): default bench_results_dir + testability seam - sft.py: _DEFAULT_BENCH_RESULTS_DIR set to circuitforge-orch bench results path; set_default_bench_results_dir() seam for test isolation - test fixture resets default to tmp_path to avoid real-fs interference - 136 tests passing Closes #14	2026-04-09 12:28:38 -07:00
pyr0ball	a271278dc9	feat(#10): env var LLM config + cf-orch coordinator auth - _load_cforch_config() falls back to CF_ORCH_URL / CF_LICENSE_KEY / OLLAMA_HOST / OLLAMA_MODEL env vars when label_tool.yaml cforch: key is absent or empty (yaml wins when both present) - CF_LICENSE_KEY forwarded to benchmark subprocess env so cf-orch agent can authenticate without it appearing in command args - GET /api/cforch/config endpoint — returns resolved connection state; redacts license key (returns license_key_set bool only) - SettingsView: connection status pill (cf-orch / Ollama / unconfigured) loaded from /api/cforch/config on mount; shows env vs yaml source - .env.example documenting all relevant vars - config/label_tool.yaml.example: full cforch: section with all keys - environment.yml: add circuitforge-core>=0.9.0 dependency - .gitignore: add .env - 4 new tests (17 total in test_cforch.py); 136 passing overall Closes #10	2026-04-09 12:26:44 -07:00
pyr0ball	dffb1d0d7a	feat: cf-orch LLM benchmark integration (Phase 1) Backend (app/cforch.py — new APIRouter at /api/cforch): - GET /tasks — reads bench_tasks.yaml, returns tasks + deduplicated types - GET /models — reads bench_models.yaml, returns model list with service/tags - GET /run — SSE endpoint; spawns cf-orch benchmark.py subprocess with --filter-tasks, --filter-tags, --coordinator, --ollama-url; strips ANSI codes; emits progress/result/complete/error events; 409 guard on concurrency - GET /results — returns latest bench_results/*/summary.json; 404 if none - POST /cancel — terminates running benchmark subprocess - All paths configurable via label_tool.yaml cforch: section - 13 tests; follows sft.py/models.py testability seam pattern Frontend: - BenchmarkView: mode toggle (Classifier / LLM Eval); LLM Eval panel with task picker (by type, select-all + indeterminate), model picker (by service), SSE run log, results table with best-per-column highlighting - StatsView: LLM Benchmark section showing quality_by_task_type table across models; hidden when no results; fetches /api/cforch/results on mount SFT candidate pipeline: cf-orch runs that produce sft_candidates.jsonl are auto-discovered by the existing bench_results_dir config in sft.py — no additional wiring needed.	2026-04-09 10:46:06 -07:00
pyr0ball	ce12b29c94	feat: model compatibility warning on HF lookup - GET /api/models/lookup now returns compatible: bool and warning: str|null - compatible=false + warning when pipeline_tag is absent (no task tag on HF) or present but not in the supported adapter map - Warning message names the unsupported pipeline_tag and lists supported types - ModelsView: yellow compat-warning banner below preview description; Add button relabels to "Add anyway" with muted styling when incompatible - test_models: accept 405 for path-traversal DELETE tests (StaticFiles mount returns 405 for non-GET methods when web/dist exists)	2026-04-09 09:48:55 -07:00
pyr0ball	49ec85706c	Merge pull request 'feat: benchmark model picker, category grouping, stats benchmark results' (#20) from feat/benchmark-model-picker into main	2026-04-08 23:07:10 -07:00
pyr0ball	478a47f6e0	Merge pull request 'feat: HuggingFace model management tab' (#19) from feat/hf-model-queue into main	2026-04-08 23:06:54 -07:00
pyr0ball	7c304ebc45	feat: benchmark model picker, category grouping, stats benchmark results Backend (app/api.py): - GET /api/benchmark/models — returns installed models grouped by adapter type (ZeroShotAdapter, RerankerAdapter, GenerationAdapter, Unknown); reads _MODELS_DIR via app.models so test overrides are respected - GET /api/benchmark/run — add model_names query param (comma-separated); when set, passes --models <names...> to benchmark_classifier.py - GET /api/stats — add benchmark_results field from benchmark_results.json Frontend: - BenchmarkView: collapsible Model Selection panel with per-category checkboxes, select-all per category (supports indeterminate state), collapsed summary badge ("All models (N)" or "N of M selected"); model_names only sent when a strict subset is selected - StatsView: Benchmark Results table (accuracy, macro_f1, weighted_f1) with best-model highlighting per metric; hidden when no results exist	2026-04-08 23:03:56 -07:00
pyr0ball	b6b3d2c390	feat: HuggingFace model management tab - New /api/models router: HF lookup, approval queue (JSONL persistence), SSE download progress via snapshot_download(), installed model listing, path-traversal-safe DELETE - pipeline_tag → adapter type mapping (zero-shot-classification, sentence-similarity, text-generation) - 27 tests covering all endpoints, duplicate detection, path traversal - ModelsView.vue: HF lookup + add, approval queue, live download progress bars via SSE, installed model table with delete - Sidebar entry (🤗 Models) between Benchmark and Corrections	2026-04-08 22:32:35 -07:00
pyr0ball	a7cb3ae62a	Merge pull request 'feat: SFT failure_category — classify why a model response was wrong' (#17) from feat/sft-failure-category into main	2026-04-08 22:19:20 -07:00
pyr0ball	c5eaacc767	Merge pull request 'feat: Corrections tab — SFT candidate import, review, and JSONL export' (#15) from feat/sft-corrections into main	2026-04-08 22:19:01 -07:00
pyr0ball	9633d9a535	feat: add failure_category field to SFT corrections (#16) Adds optional failure_category to SubmitRequest and candidate records so reviewers can classify why a model response was wrong, not just what to do with it. Enables the fine-tune harness to filter training data by failure type (e.g. exclude scoring artifacts, train only on genuine wrong answers). Taxonomy: scoring_artifact | style_violation | partial_answer | wrong_answer | format_error | hallucination - app/sft.py: FailureCategory Literal type; SubmitRequest.failure_category; stored on candidate record in POST /submit correct branch - tests/test_sft.py: 3 new tests (stores value, null round-trip, 422 on invalid) - stores/sft.ts: SftFailureCategory type exported; SftQueueItem + SftLastAction updated; setLastAction accepts optional category param - SftCard.vue: chip-group selector shown during correct/discard/flag flow; two-step confirm for discard/flag reveals chips before emitting; category forwarded in all emit payloads - CorrectionsView.vue: handleCorrect/Discard/Flag accept and forward category to POST /api/sft/submit body and store.setLastAction - SftCard.test.ts: 11 new tests covering chip visibility, selection, single-active enforcement, pending-action flow, emit payloads, cancel	2026-04-08 22:10:26 -07:00
pyr0ball	f17aae3bd2	feat: add dev command for hot-reload (uvicorn --reload + Vite HMR) - manage.sh: dev command starts uvicorn --reload on :8503 and Vite dev server (auto-port from 5173); kills API on EXIT/INT/TERM trap - manage.sh: ENV_UI defaults to 'cf' env (overridable via AVOCET_ENV) - vite.config.ts: add server.proxy to forward /api to :8503 so Vite dev server can reach the backend without CORS issues	2026-04-08 19:43:40 -07:00
pyr0ball	09e334359f	fix: pessimistic submit/undo, config null-safe, load config on mount - sft.py GET /config: use `or {}` guard so `sft: ~` (null YAML) doesn't return None instead of the default empty config - CorrectionsView: convert handleCorrect/Discard/Flag and handleUndo from optimistic to pessimistic — queue mutation only happens after server confirms; failures leave item in queue so user can retry cleanly - SettingsView: call loadSftConfig() on mount so saved bench_results_dir is populated instead of always starting empty	2026-04-08 18:49:38 -07:00
pyr0ball	353d0a47a0	feat: Corrections tab — router, sidebar, settings, SFT config endpoints - Add /corrections route to Vue router (lazy-loaded CorrectionsView) - Add Corrections nav item (✍️) to AppSidebar after Benchmark - Add cf-orch Integration section to SettingsView with bench_results_dir field, run scanner, and per-run import table - Add GET /api/sft/config and POST /api/sft/config endpoints to app/sft.py	2026-04-08 18:29:22 -07:00
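
Commit a271278dc9 describes _load_cforch_config() falling back to CF_ORCH_URL / CF_LICENSE_KEY / OLLAMA_HOST / OLLAMA_MODEL when the cforch: section of label_tool.yaml is absent or empty. It also says the yaml section wins when both are present, and that GET /api/cforch/config reports only a license_key_set flag. Below is a minimal standalone sketch of that precedence and redaction rule, not the repository's code. The function names resolve_cforch_config and public_config, and the keys coordinator_url, license_key, ollama_url, ollama_model and source, are hypothetical.

```python
import os
from typing import Any, Dict, Mapping, Optional

# Env var names come from commit a271278dc9; the dict keys on the left are
# hypothetical and need not match the real cforch: section in label_tool.yaml.
ENV_FALLBACKS = {
    "coordinator_url": "CF_ORCH_URL",
    "license_key": "CF_LICENSE_KEY",
    "ollama_url": "OLLAMA_HOST",
    "ollama_model": "OLLAMA_MODEL",
}


def resolve_cforch_config(
    yaml_section: Optional[Mapping[str, Any]],
    env: Mapping[str, str] = os.environ,
) -> Dict[str, Any]:
    """Return the effective cf-orch settings plus where they came from."""
    if yaml_section:
        # A non-empty yaml section wins outright, even if env vars are also set.
        return {**dict(yaml_section), "source": "yaml"}
    # Absent (None) or empty ({}) section: use whichever env vars are set.
    from_env = {key: env[name] for key, name in ENV_FALLBACKS.items() if env.get(name)}
    return {**from_env, "source": "env" if from_env else "unconfigured"}


def public_config(resolved: Mapping[str, Any]) -> Dict[str, Any]:
    """Shape safe to return from a config endpoint: the key is never echoed."""
    safe = {k: v for k, v in resolved.items() if k != "license_key"}
    safe["license_key_set"] = bool(resolved.get("license_key"))
    return safe
```

The whole-section rule, rather than a per-key merge, follows the commit's wording ("when ... cforch: key is absent or empty"). A per-key merge would be an equally plausible design if partial yaml sections need env fill-in.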

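Commit 8fda821e15 lists the response codes for POST /api/sft/ingest:
- 503 when AVOCET_INGESTION_SECRET is unset
- 401 for a missing or malformed Authorization header
- 403 for a wrong secret

Below is a framework-free sketch of that decision order. The function name ingest_auth_status is hypothetical. Accepting the "Bearer" scheme case-insensitively is an assumption; the real route may compare it exactly.

```python
import hmac
from typing import Optional


def ingest_auth_status(authorization: Optional[str], secret: Optional[str]) -> int:
    """Return the status an ingest request would get before anything is written.

    `secret` would normally be os.environ.get("AVOCET_INGESTION_SECRET").
    """
    if not secret:
        return 503  # ingestion is disabled on this instance
    if not authorization:
        return 401  # no Authorization header at all
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return 401  # header present but not of the form "Bearer <token>"
    # Constant-time comparison so response timing does not leak the secret.
    if not hmac.compare_digest(token.strip().encode(), secret.encode()):
        return 403
    return 200
```

Checking for an unset secret first means a misconfigured server answers 503 for every caller, so it never turns a configuration error into a misleading 401 or 403.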

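Commits b6b3d2c390 ("path-traversal-safe DELETE") and 32d3436bbd ("path traversal guard") don't spell out the check itself. One common pathlib approach is to resolve the requested name under the models directory and require a strict descendant. The sketch below uses that technique as an assumption; safe_model_path and delete_model are hypothetical names.

```python
import shutil
from pathlib import Path


def safe_model_path(models_dir: Path, name: str) -> Path:
    """Resolve `name` under `models_dir`, refusing anything that escapes it."""
    root = models_dir.resolve()
    # resolve() normalises "..", follows symlinks, and an absolute `name`
    # replaces `root` entirely when joined, so every escape route ends up
    # outside `root` after this line.
    candidate = (root / name).resolve()
    # Require a strict descendant: the models dir itself is never deletable.
    if candidate == root or root not in candidate.parents:
        raise ValueError(f"refusing path outside models dir: {name!r}")
    return candidate


def delete_model(models_dir: Path, name: str) -> None:
    """Delete one installed model directory after the traversal check."""
    target = safe_model_path(models_dir, name)
    if not target.is_dir():
        raise FileNotFoundError(target)
    shutil.rmtree(target)
```

Because resolve() follows symlinks, a symlink inside the models directory that points elsewhere is also rejected. That is usually the desired behaviour for a delete endpoint.
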
156 commits