Commit graph

89 commits

Author SHA1 Message Date
391ebb3cd1 feat(recipe-scan): labeling UI for Kiwi vision training pipeline (closes #65)
- POST /api/recipe-scan/import — bulk ingest from Kiwi scanner pipeline, idempotent by item id
- GET /api/recipe-scan/next — oldest-first pending item for review
- POST /api/recipe-scan/items/{id}/approve|edit|reject — label actions
- GET /api/recipe-scan/stats — counts by status and modality
- GET /api/recipe-scan/export — JSONL training pairs (messages chat format, Option B: correction prompt + extracted draft → corrected ground truth)
- GET /api/recipe-scan/image — path-traversal-safe image serving from /Library/Assets/kiwi/
- SQLite at data/recipe_scan.db with WAL mode; separate from corpus.db lifecycle
- set_db_path() testability seam; 18 tests, all passing
- RecipeScanView.vue: two-column review UI (image left, JSON diff right), keyboard shortcuts A/E/R, toast feedback, stats header, export download
- Route /data/recipe-scan and sidebar nav entry added
2026-05-17 12:22:15 -07:00
9bb88b168f feat(corpus): pipeline log ingest from shared dir (closes #67)
Pull-side companion to kiwi#141. Ingests structured JSONL pipeline logs
from /Library/Assets/logs/pipeline/ into the log corpus for Turnstone
logreading model training.

- app/data/log_corpus.py: add ingested_pipeline_files tracking table,
  _pipeline_ingest_dir() config helper, _ingest_one_file() parser, and
  POST /api/corpus/pipeline-ingest endpoint
- source_host = "pipeline_scrape"; source_id from logger field; extra
  dict stored as matched_patterns; batch_type = "pipeline_log"
- Idempotent by filename: skips files already in ingested_pipeline_files
- config/label_tool.yaml.example: add corpus section with pipeline_ingest_dir
  and push sources comment block
- tests/test_log_corpus.py: 8 new tests covering ingest, idempotency,
  non-JSONL filtering, malformed line resilience, incremental runs
2026-05-17 11:28:33 -07:00
13ca082a43 chore(models): refresh model registries with current cluster catalog
Replace stale llama/mistral/phi model refs with models active on the
cluster: deepseek-r1 (1.5b, 7b-4bit, 0528-qwen3-8b-gguf), granite-4.1-8b,
qwen2.5 (3b, 7b), capybarahermes-2.5-mistral-7b, darwin-9b-opus. Update
benchmark_plans.py doc examples to match.
2026-05-17 11:24:03 -07:00
d416ef8aa4 feat(imitate): task-model assignment routing via cf-orch
Add _resolve_task_model() helper that looks up a product.task assignment
from the coordinator and resolves its service_type from the model registry.
Add task_ids param to run_imitate() (comma-separated "product/task" strings)
so the imitate harness can dispatch to models chosen by the assignment layer
rather than requiring explicit model IDs.
2026-05-17 11:23:55 -07:00
79b9ccbd3d feat(fleet): profile editor, assignments tab, node management polish
Backend:
- app/nodes.py: fix coordinator response envelope (.get("nodes"/"services"))
- app/nodes.py: add PUT /nodes/{id}/profile (atomic YAML write + reload)
- app/nodes.py: add POST /nodes/{id}/profile/generate (coordinator-seeded skeleton)
- tests/test_nodes.py: fix mock envelopes; add deploy model + profile tests

Frontend:
- NodeManagementView: tab bar switching nodes / assignments panels
- AssignmentsTab: full product.task → model routing UI (add/edit/delete)
- ProfileEditorPanel: full YAML profile editor with GPU + service sections
- CatalogEntryFormModal: add/edit model catalog entries per service
- ServiceFormModal: add/edit service config blocks
- NodeCard, GpuRow, ServiceBadge, OllamaModelPanel, HfNodeModelPanel: polish pass
- ModelsView: model download additions
- nodes.ts: extend types for full profile editing (ServiceManaged, CatalogEntryFull)
2026-05-17 11:23:47 -07:00
e93afec271 fix(tests): resolve 5 pre-existing test failures on main (closes #56)
- app/models.py: add set_cf_text_models_dir() testability seam
- tests/test_models.py: redirect _CF_TEXT_MODELS_DIR in reset_models_globals
  fixture so list_installed() count tests are not polluted by real NFS models
- app/cforch.py: fix get_results() return type annotation list → dict
- tests/test_cforch.py: give _BENCH_RUNNING=True test a mock proc with
  poll()=None so the stale-flag check correctly returns 409; patch
  _select.select in streaming tests (select requires fileno(), iter() doesn't)
- tests/test_finetune.py: mark GPU integration test @pytest.mark.gpu
- pytest.ini: register gpu and slow markers
2026-05-17 11:21:58 -07:00
2b990a603a feat: log corpus receiver — accept Turnstone push batches and label for logreading fine-tune
Adds corpus.db (corpus_sources, corpus_batches, corpus_entries), a FastAPI router
at /api/corpus with receive/label/skip/stats/export endpoints, and seeds consent
tokens for xanderland + orchard nodes from label_tool.yaml. PII flag excludes
entries from JSONL export. Closes avocet#61.
2026-05-11 17:07:54 -07:00
9fdaeeb3d6 feat: multi-bench dashboard, API path migration, benchmark reliability fixes
- dashboard: eval card now shows last run + score for all bench types
  (classifier, LLM, style, plans) via new _get_recent_bench_runs()
- dashboard: skip cforch LLM-bench list summaries when scanning for
  classifier best_macro_f1 (fixes _find_latest_classifier_bench)
- cforch: stale _BENCH_RUNNING flag now auto-resets if process exited;
  idle timeout (120s via select) kills hung benchmark if node crashes
- api: add /api/finetune/{run,cancel} backward-compat shims while
  ClassifierTab fine-tune section is migrated to TrainJobsView
- ClassifierTab: migrate all /api/benchmark/* paths to /api/cforch/*;
  fix null-safety on results.models access; load fine-tuned models from
  /api/train/results instead of /api/finetune/status
- CompareTab: extend model picker to include vllm + cf-text alongside
  ollama, grouped by service; pre-select all LLM_SERVICES on load
- LlmEvalTab: null-safety on quality_by_task_type lookups
- models: AVOCET_MODELS_DIR env var overrides default models/ path
2026-05-11 09:05:12 -07:00
1ad7ba322a feat: add embed-bench rate and export endpoints 2026-05-11 08:07:17 -07:00
32e3b2a0dd feat: add embed-bench run endpoint with SSE streaming 2026-05-07 09:05:34 -07:00
12117ad0c6 fix: narrow exception types in get_models, fix patch targets in tests, add type annotation 2026-05-07 09:03:37 -07:00
5939c67b9f feat: add embed-bench models endpoint and register router in aggregator 2026-05-07 09:01:25 -07:00
5ea77da97d fix: add _cosine dimension guard, fix return type annotation, add zero-vector test 2026-05-07 08:59:24 -07:00
276bdadb92 feat: add embed_bench module scaffold and _cosine() helper 2026-05-07 08:37:18 -07:00
1cd9c5d455 fix: move json import to module scope in nodes.py 2026-05-05 21:01:32 -07:00
5702a7190b feat: add Ollama list/pull-SSE/delete endpoints 2026-05-05 20:41:29 -07:00
55b017ba3b fix: log coordinator reload failures in update_gpu_services
- Replace bare `except Exception: pass` with `except Exception as exc` and a
  logger.warning call that surfaces node_id and the exception for diagnostics.
- Move `import os as _os` from mid-file (between test functions) to the
  top-level import block to satisfy PEP 8 and linter expectations.
2026-05-05 20:36:08 -07:00
fd8cb622a1 feat: add GET /api/nodes-mgmt/nodes/{node_id}/profile endpoint 2026-05-05 20:31:22 -07:00
47cb9f661f fix: narrow exception handling in list_nodes, move mock imports to top
- Remove redundant httpx.ConnectError from nodes except clause (it's a
  subclass of HTTPError so the tuple catch was redundant)
- Narrow services except clause from bare Exception to httpx.HTTPError,
  add logger.warning with coordinator_url for debuggability
- Move `from unittest.mock import MagicMock, patch` from mid-file to
  the top-of-file import block with the other stdlib/third-party imports
2026-05-05 20:18:50 -07:00
c2de9e53da feat: implement GET /api/nodes-mgmt/nodes with coordinator proxy and profile merge 2026-05-05 20:16:06 -07:00
c039ea4698 fix: remove unused imports and em dash in nodes.py scaffold
- Drop unused StreamingResponse import from app/nodes.py (will be
  re-added in Task 2 when the SSE endpoint is implemented)
- Replace em dash with colon in _get_ollama_url HTTPException detail
- Remove unused os and unittest.mock imports from test_nodes.py
  (mock imports will return in Task 2 tests)
2026-05-05 19:59:32 -07:00
95afddb772 feat: add nodes.py scaffold with set_config_dir and router mount
- Create app/nodes.py with _CONFIG_DIR testability seam, _load_config,
  _profiles_dir, _profile_path, _load_profile, _get_ollama_url helpers,
  and stub list_nodes endpoint returning [] when no coordinator_url is set
- Mount nodes router at /api/nodes-mgmt in app/api.py
- Add profiles_dir comment to config/label_tool.yaml.example cforch section
- Create tests/test_nodes.py with autouse fixture and two passing tests
2026-05-05 19:35:28 -07:00
bce932461a feat: plans benchmark harness — model scoring for CF planning prompts
Adds benchmark_plans.py script, plans_bench API router, PlansBenchTab Vue
component, and registers /api/plans-bench in api.py. Also extends models
registry (cf-text catalog integration), cforch client, LlmEvalTab, and
ModelsView with cf-orch fleet support. Wires Planning mode into BenchmarkView.
2026-05-02 23:36:04 -07:00
e11db5ccd9 fix: align train job/results API envelope, config_json key, progress SSE, dashboard model_key
- GET /api/train/jobs now returns {"jobs":[...]} instead of bare array
- GET /api/train/results now returns {"results":[...]} instead of bare array
- POST /api/train/jobs body key renamed config -> config_json to match Pydantic model
- SSE log handler now handles 'progress' event type (backend never emits 'log')
- Dashboard _get_active_jobs() adds model_key to SELECT and return dict
- corrections.py docstring updated: both /api/corrections and /api/sft prefixes noted
- test_train.py assertions updated to unwrap new envelope shapes
2026-05-02 21:22:18 -07:00
0904967320 feat: slim api.py to factory-only; all domain routes in dedicated modules
Replace 149-line api.py (with inline helpers, JSONL utilities, and ad-hoc
router registrations) with a 57-line pure factory. All business logic was
already extracted to domain modules in B1-B7; this removes the dead code
and adds the /api/corrections/* prefix alongside the /api/sft/* backward-
compat alias. Smoke tests updated to cover the new /api/corrections/ingest
and /api/dashboard routes.
2026-05-02 09:55:58 -07:00
8fda821e15 feat: add POST /ingest endpoint to corrections API with Bearer auth
Adds IngestRequest model and POST /api/sft/ingest route to
app/data/corrections.py. Sibling CF products (Peregrine, Kiwi, etc.)
can push pre-approved corrections via Bearer token auth
(AVOCET_INGESTION_SECRET). Records land as status=approved in both
sft_candidates.jsonl and sft_approved.jsonl immediately.

7 tests in tests/test_data_corrections.py cover 503 (secret unset),
401 (missing/malformed header), 403 (wrong secret), happy-path writes
to both files, and optional label field.
2026-05-02 09:07:10 -07:00
0853ed7d56 fix: add logger.warning to silent except blocks in dashboard._find_latest_eval 2026-05-01 23:36:19 -07:00
aa742bcfc0 feat: add GET /api/dashboard flywheel aggregate endpoint 2026-05-01 23:30:04 -07:00
32d3436bbd fix: path traversal guard, python_bin config, completed_at on Popen failure 2026-05-01 23:24:00 -07:00
766fbafa02 feat: build SQLite-backed train job queue in app/train/train.py
Replaces the ad-hoc _running_procs dict in api.py with a persistent,
inspectable SQLite job queue. Removes old /api/finetune/* routes and
_best_cuda_device from api.py. Adds /api/train/* routes (list, create,
get, cancel, run SSE, results). 16 new tests all passing.
2026-05-01 23:05:11 -07:00
d432026fd7 fix: restore real plans_bench.py (was accidentally stubbed) 2026-05-01 22:25:22 -07:00
bccb385f61 feat: build app/eval/cforch.py aggregating eval benchmark routers 2026-05-01 22:23:06 -07:00
d74ad3f972 feat: move imitate API into app/data/imitate.py 2026-05-01 22:12:19 -07:00
99ea39fe38 feat: move SFT corrections API into app/data/corrections.py 2026-05-01 22:02:22 -07:00
2054866ff1 feat: extract fetch routes and IMAP helpers into app/data/fetch.py 2026-05-01 21:57:31 -07:00
cbec776ef1 fix: restore ensure_ascii=False in utils jsonl helpers; remove dead _last_action from api.py 2026-05-01 20:59:44 -07:00
167d7351e3 feat: extract label queue API into app/data/label.py 2026-05-01 18:48:14 -07:00
0745bc3f70 refactor: import detect_byok from cf-core, remove local copy 2026-04-25 16:45:47 -07:00
2891606765 feat(cloud_session): add session resolution + forward user_id to cf-orch imitate
app/cloud_session.py:
- Thin wrapper around cf_core.cloud_session.CloudSessionFactory
- BYOK detection reads ~/.config/circuitforge/llm.yaml (same path as other products)
- get_session: FastAPI dependency, returns CloudUser (user_id, tier, has_byok)
- require_tier: dependency factory for tier-gated routes

app/imitate.py:
- _run_cftext gains user_id: str | None param; non-None values included in
  the cf-orch ServiceAllocateRequest so premium users get their custom models
- run_imitate injects session via Depends(_get_imitate_session); extracts user_id,
  filters out local/anon sessions (they get the shared catalog), passes real
  cloud user_id to the ThreadPoolExecutor fanout
- _get_imitate_session wraps get_session with a try/except so imitate keeps
  working in envs where cloud_session deps aren't installed
2026-04-24 16:41:45 -07:00
ea3da701c6 feat(models): extended model registry + manage.sh benchmark subcommands
- app/models.py: add StyleModel and VoiceModel entries; expand cf-text and
  benchmark model metadata (vram_mb, description, tags)
- tests/test_models.py: coverage for new model types and registry helpers
- ModelsView.vue: updated model browser with style/voice filter tabs
- manage.sh: add benchmark-style and benchmark-voice subcommands
- config/label_tool.yaml.example: add style + voice benchmark config stubs
- web/.gitignore: add node_modules and dist entries
2026-04-24 14:56:24 -07:00
ddb56efb89 refactor(bench): extract benchmark tabs — classifier, compare, llm-eval, style, voice
- BenchmarkView.vue: convert from monolithic view to tabbed shell; each tab is
  now its own component (ClassifierTab, CompareTab, LlmEvalTab, StyleTab, VoiceTab)
- StyleTab + VoiceTab: new benchmark modes for style and voice model evaluation
- app/style.py: FastAPI router for style imitation benchmarks
- app/voice.py: FastAPI router for voice benchmark endpoints
- scripts/benchmark_style.py + benchmark_voice.py: headless runner scripts
2026-04-24 14:56:17 -07:00
cc24cd0d7d feat(imitate): parallel cf-text fanout workers + signal-based cold-start detection
Backend:
- Run all cf-text model allocations concurrently via ThreadPoolExecutor + as_completed
- Announce model_start events upfront so the UI can show loading states immediately
- Replace timer-based startup polling with coordinator state signals: waits for
  state=="running" (success) or state=="stopped" (fail-fast) on the matching
  node/gpu instance; falls back to health poll after 6 consecutive probe misses
- Add /api/cforch/catalog endpoint: fetches live cf-text model list from cf-orch,
  filtering out proxy entries (ollama://, vllm://, http://) so only loadable models
  are returned

Frontend (ImitateView.vue):
- Show per-model loading spinners as results arrive via SSE stream
- Display cold-start badge when coordinator signals the model was freshly loaded
2026-04-24 14:56:09 -07:00
e6b64d6efe fix: imitate extractor + health_path — support CF cloud API shapes
- _extract_sample: add saved_searches, entries, calls, records as
  recognized list-wrapper keys (snipe/osprey response shapes)
- _is_online: accept health_path param (default /api/health) so
  products using /api/v1/health/ (kiwi) report correctly
- products endpoint: pass health_path from config into _is_online
2026-04-09 20:24:26 -07:00
3299c0e23a feat: Imitate tab — pull CF product samples, compare LLM responses
Backend (app/imitate.py):
- GET /api/imitate/products — reads imitate: config, checks online status
- GET /api/imitate/products/{id}/sample — fetches real item from product API
- GET /api/imitate/run (SSE) — streams ollama responses for selected models
- POST /api/imitate/push-corrections — queues results in SFT corrections JSONL

Frontend (ImitateView.vue):
- Step 1: product picker grid (online/offline status, icon from config)
- Step 2: raw sample preview + editable prompt textarea
- Step 3: ollama model multi-select, temperature slider, SSE run with live log
- Step 4: response cards side by side, push to Corrections button

Wiring:
- app/api.py: include imitate_router at /api/imitate
- web/src/router: /imitate route + lazy import
- AppSidebar: Imitate nav entry (mirror icon)
- config/label_tool.yaml.example: imitate: section with peregrine example
- 16 unit tests (100% passing)

Also: BenchmarkView.vue Compare panel — side-by-side run diff for bench results
2026-04-09 20:12:57 -07:00
891142570b feat(#14): default bench_results_dir + testability seam
- sft.py: _DEFAULT_BENCH_RESULTS_DIR set to circuitforge-orch bench
  results path; set_default_bench_results_dir() seam for test isolation
- test fixture resets default to tmp_path to avoid real-fs interference
- 136 tests passing

Closes #14
2026-04-09 12:28:38 -07:00
a271278dc9 feat(#10): env var LLM config + cf-orch coordinator auth
- _load_cforch_config() falls back to CF_ORCH_URL / CF_LICENSE_KEY /
  OLLAMA_HOST / OLLAMA_MODEL env vars when label_tool.yaml cforch: key
  is absent or empty (yaml wins when both present)
- CF_LICENSE_KEY forwarded to benchmark subprocess env so cf-orch agent
  can authenticate without it appearing in command args
- GET /api/cforch/config endpoint — returns resolved connection state;
  redacts license key (returns license_key_set bool only)
- SettingsView: connection status pill (cf-orch / Ollama / unconfigured)
  loaded from /api/cforch/config on mount; shows env vs yaml source
- .env.example documenting all relevant vars
- config/label_tool.yaml.example: full cforch: section with all keys
- environment.yml: add circuitforge-core>=0.9.0 dependency
- .gitignore: add .env
- 4 new tests (17 total in test_cforch.py); 136 passing overall

Closes #10
2026-04-09 12:26:44 -07:00
dffb1d0d7a feat: cf-orch LLM benchmark integration (Phase 1)
Backend (app/cforch.py — new APIRouter at /api/cforch):
- GET /tasks — reads bench_tasks.yaml, returns tasks + deduplicated types
- GET /models — reads bench_models.yaml, returns model list with service/tags
- GET /run — SSE endpoint; spawns cf-orch benchmark.py subprocess with
  --filter-tasks, --filter-tags, --coordinator, --ollama-url; strips ANSI
  codes; emits progress/result/complete/error events; 409 guard on concurrency
- GET /results — returns latest bench_results/*/summary.json; 404 if none
- POST /cancel — terminates running benchmark subprocess
- All paths configurable via label_tool.yaml cforch: section
- 13 tests; follows sft.py/models.py testability seam pattern

Frontend:
- BenchmarkView: mode toggle (Classifier / LLM Eval); LLM Eval panel with
  task picker (by type, select-all + indeterminate), model picker (by service),
  SSE run log, results table with best-per-column highlighting
- StatsView: LLM Benchmark section showing quality_by_task_type table across
  models; hidden when no results; fetches /api/cforch/results on mount

SFT candidate pipeline: cf-orch runs that produce sft_candidates.jsonl are
auto-discovered by the existing bench_results_dir config in sft.py — no
additional wiring needed.
2026-04-09 10:46:06 -07:00
ce12b29c94 feat: model compatibility warning on HF lookup
- GET /api/models/lookup now returns compatible: bool and warning: str|null
- compatible=false + warning when pipeline_tag is absent (no task tag on HF)
  or present but not in the supported adapter map
- Warning message names the unsupported pipeline_tag and lists supported types
- ModelsView: yellow compat-warning banner below preview description;
  Add button relabels to "Add anyway" with muted styling when incompatible
- test_models: accept 405 for path-traversal DELETE tests (StaticFiles mount
  returns 405 for non-GET methods when web/dist exists)
2026-04-09 09:48:55 -07:00
7c304ebc45 feat: benchmark model picker, category grouping, stats benchmark results
Backend (app/api.py):
- GET /api/benchmark/models — returns installed models grouped by adapter
  type (ZeroShotAdapter, RerankerAdapter, GenerationAdapter, Unknown);
  reads _MODELS_DIR via app.models so test overrides are respected
- GET /api/benchmark/run — add model_names query param (comma-separated);
  when set, passes --models <names...> to benchmark_classifier.py
- GET /api/stats — add benchmark_results field from benchmark_results.json

Frontend:
- BenchmarkView: collapsible Model Selection panel with per-category
  checkboxes, select-all per category (supports indeterminate state),
  collapsed summary badge ("All models (N)" or "N of M selected");
  model_names only sent when a strict subset is selected
- StatsView: Benchmark Results table (accuracy, macro_f1, weighted_f1)
  with best-model highlighting per metric; hidden when no results exist
2026-04-08 23:03:56 -07:00
b6b3d2c390 feat: HuggingFace model management tab
- New /api/models router: HF lookup, approval queue (JSONL persistence),
  SSE download progress via snapshot_download(), installed model listing,
  path-traversal-safe DELETE
- pipeline_tag → adapter type mapping (zero-shot-classification,
  sentence-similarity, text-generation)
- 27 tests covering all endpoints, duplicate detection, path traversal
- ModelsView.vue: HF lookup + add, approval queue, live download progress
  bars via SSE, installed model table with delete
- Sidebar entry (🤗 Models) between Benchmark and Corrections
2026-04-08 22:32:35 -07:00