avocet

Author	SHA1	Message	Date
pyr0ball	3299c0e23a	feat: Imitate tab — pull CF product samples, compare LLM responses Backend (app/imitate.py): - GET /api/imitate/products — reads imitate: config, checks online status - GET /api/imitate/products/{id}/sample — fetches real item from product API - GET /api/imitate/run (SSE) — streams ollama responses for selected models - POST /api/imitate/push-corrections — queues results in SFT corrections JSONL Frontend (ImitateView.vue): - Step 1: product picker grid (online/offline status, icon from config) - Step 2: raw sample preview + editable prompt textarea - Step 3: ollama model multi-select, temperature slider, SSE run with live log - Step 4: response cards side by side, push to Corrections button Wiring: - app/api.py: include imitate_router at /api/imitate - web/src/router: /imitate route + lazy import - AppSidebar: Imitate nav entry (mirror icon) - config/label_tool.yaml.example: imitate: section with peregrine example - 16 unit tests (100% passing) Also: BenchmarkView.vue Compare panel — side-by-side run diff for bench results	2026-04-09 20:12:57 -07:00
pyr0ball	a271278dc9	feat(#10): env var LLM config + cf-orch coordinator auth - _load_cforch_config() falls back to CF_ORCH_URL / CF_LICENSE_KEY / OLLAMA_HOST / OLLAMA_MODEL env vars when label_tool.yaml cforch: key is absent or empty (yaml wins when both present) - CF_LICENSE_KEY forwarded to benchmark subprocess env so cf-orch agent can authenticate without it appearing in command args - GET /api/cforch/config endpoint — returns resolved connection state; redacts license key (returns license_key_set bool only) - SettingsView: connection status pill (cf-orch / Ollama / unconfigured) loaded from /api/cforch/config on mount; shows env vs yaml source - .env.example documenting all relevant vars - config/label_tool.yaml.example: full cforch: section with all keys - environment.yml: add circuitforge-core>=0.9.0 dependency - .gitignore: add .env - 4 new tests (17 total in test_cforch.py); 136 passing overall Closes #10	2026-04-09 12:26:44 -07:00
pyr0ball	dffb1d0d7a	feat: cf-orch LLM benchmark integration (Phase 1) Backend (app/cforch.py — new APIRouter at /api/cforch): - GET /tasks — reads bench_tasks.yaml, returns tasks + deduplicated types - GET /models — reads bench_models.yaml, returns model list with service/tags - GET /run — SSE endpoint; spawns cf-orch benchmark.py subprocess with --filter-tasks, --filter-tags, --coordinator, --ollama-url; strips ANSI codes; emits progress/result/complete/error events; 409 guard on concurrency - GET /results — returns latest bench_results/*/summary.json; 404 if none - POST /cancel — terminates running benchmark subprocess - All paths configurable via label_tool.yaml cforch: section - 13 tests; follows sft.py/models.py testability seam pattern Frontend: - BenchmarkView: mode toggle (Classifier / LLM Eval); LLM Eval panel with task picker (by type, select-all + indeterminate), model picker (by service), SSE run log, results table with best-per-column highlighting - StatsView: LLM Benchmark section showing quality_by_task_type table across models; hidden when no results; fetches /api/cforch/results on mount SFT candidate pipeline: cf-orch runs that produce sft_candidates.jsonl are auto-discovered by the existing bench_results_dir config in sft.py — no additional wiring needed.	2026-04-09 10:46:06 -07:00
pyr0ball	ce12b29c94	feat: model compatibility warning on HF lookup - GET /api/models/lookup now returns compatible: bool and warning: str|null - compatible=false + warning when pipeline_tag is absent (no task tag on HF) or present but not in the supported adapter map - Warning message names the unsupported pipeline_tag and lists supported types - ModelsView: yellow compat-warning banner below preview description; Add button relabels to "Add anyway" with muted styling when incompatible - test_models: accept 405 for path-traversal DELETE tests (StaticFiles mount returns 405 for non-GET methods when web/dist exists)	2026-04-09 09:48:55 -07:00
pyr0ball	7c304ebc45	feat: benchmark model picker, category grouping, stats benchmark results Backend (app/api.py): - GET /api/benchmark/models — returns installed models grouped by adapter type (ZeroShotAdapter, RerankerAdapter, GenerationAdapter, Unknown); reads _MODELS_DIR via app.models so test overrides are respected - GET /api/benchmark/run — add model_names query param (comma-separated); when set, passes --models <names...> to benchmark_classifier.py - GET /api/stats — add benchmark_results field from benchmark_results.json Frontend: - BenchmarkView: collapsible Model Selection panel with per-category checkboxes, select-all per category (supports indeterminate state), collapsed summary badge ("All models (N)" or "N of M selected"); model_names only sent when a strict subset is selected - StatsView: Benchmark Results table (accuracy, macro_f1, weighted_f1) with best-model highlighting per metric; hidden when no results exist	2026-04-08 23:03:56 -07:00
pyr0ball	b6b3d2c390	feat: HuggingFace model management tab - New /api/models router: HF lookup, approval queue (JSONL persistence), SSE download progress via snapshot_download(), installed model listing, path-traversal-safe DELETE - pipeline_tag → adapter type mapping (zero-shot-classification, sentence-similarity, text-generation) - 27 tests covering all endpoints, duplicate detection, path traversal - ModelsView.vue: HF lookup + add, approval queue, live download progress bars via SSE, installed model table with delete - Sidebar entry (🤗 Models) between Benchmark and Corrections	2026-04-08 22:32:35 -07:00
pyr0ball	9633d9a535	feat: add failure_category field to SFT corrections (#16) Adds optional failure_category to SubmitRequest and candidate records so reviewers can classify why a model response was wrong, not just what to do with it. Enables the fine-tune harness to filter training data by failure type (e.g. exclude scoring artifacts, train only on genuine wrong answers). Taxonomy: scoring_artifact | style_violation | partial_answer | wrong_answer | format_error | hallucination - app/sft.py: FailureCategory Literal type; SubmitRequest.failure_category; stored on candidate record in POST /submit correct branch - tests/test_sft.py: 3 new tests (stores value, null round-trip, 422 on invalid) - stores/sft.ts: SftFailureCategory type exported; SftQueueItem + SftLastAction updated; setLastAction accepts optional category param - SftCard.vue: chip-group selector shown during correct/discard/flag flow; two-step confirm for discard/flag reveals chips before emitting; category forwarded in all emit payloads - CorrectionsView.vue: handleCorrect/Discard/Flag accept and forward category to POST /api/sft/submit body and store.setLastAction - SftCard.test.ts: 11 new tests covering chip visibility, selection, single-active enforcement, pending-action flow, emit payloads, cancel	2026-04-08 22:10:26 -07:00
pyr0ball	f17aae3bd2	feat: add dev command for hot-reload (uvicorn --reload + Vite HMR) - manage.sh: dev command starts uvicorn --reload on :8503 and Vite dev server (auto-port from 5173); kills API on EXIT/INT/TERM trap - manage.sh: ENV_UI defaults to 'cf' env (overridable via AVOCET_ENV) - vite.config.ts: add server.proxy to forward /api to :8503 so Vite dev server can reach the backend without CORS issues	2026-04-08 19:43:40 -07:00
pyr0ball	09e334359f	fix: pessimistic submit/undo, config null-safe, load config on mount - sft.py GET /config: use `or {}` guard so `sft: ~` (null YAML) doesn't return None instead of the default empty config - CorrectionsView: convert handleCorrect/Discard/Flag and handleUndo from optimistic to pessimistic — queue mutation only happens after server confirms; failures leave item in queue so user can retry cleanly - SettingsView: call loadSftConfig() on mount so saved bench_results_dir is populated instead of always starting empty	2026-04-08 18:49:38 -07:00
pyr0ball	353d0a47a0	feat: Corrections tab — router, sidebar, settings, SFT config endpoints - Add /corrections route to Vue router (lazy-loaded CorrectionsView) - Add Corrections nav item (✍️) to AppSidebar after Benchmark - Add cf-orch Integration section to SettingsView with bench_results_dir field, run scanner, and per-run import table - Add GET /api/sft/config and POST /api/sft/config endpoints to app/sft.py	2026-04-08 18:29:22 -07:00
pyr0ball	e63d77127b	feat: CorrectionsView and useSftKeyboard composable	2026-04-08 15:26:13 -07:00
pyr0ball	03e5f9f9b4	fix: guard null failure_reason render, fix mid-quality test description - Add v-if guard on failure-reason <p> so null renders no element (not literal "null") - Clarify mid-quality test description: score is 0.4 to <0.7 (exclusive upper bound) - Add test: renders nothing for failure_reason when null (+1 → 14 SftCard tests)	2026-04-08 15:23:19 -07:00
pyr0ball	e16ea95dcc	fix: guard aria-describedby from rendering undefined string	2026-04-08 15:22:12 -07:00
pyr0ball	8873920b83	feat: SftCard — quality chip, prompt collapsible, action buttons, correction area slot	2026-04-08 15:19:37 -07:00
pyr0ball	2d939b77f9	feat: SftCorrectionArea — inline correction text area component	2026-04-08 15:16:45 -07:00
pyr0ball	137a9dbb8e	fix: nullable failure_reason, factory fixture for sft store tests	2026-04-08 15:14:29 -07:00
pyr0ball	9c11916d81	feat: useSftStore — SftQueueItem type and Pinia store	2026-04-08 15:11:17 -07:00
pyr0ball	0d252da2a0	feat(avocet): add cancel buttons for benchmark and fine-tune runs	2026-03-15 18:15:35 -07:00
pyr0ball	5d68b0706f	fix(avocet): use startsWith for error class in ft-log (consistent with benchmark log)	2026-03-15 16:14:47 -07:00
pyr0ball	65548f4ddb	feat(avocet): add fine-tune section and trained models badge row to BenchmarkView	2026-03-15 16:09:51 -07:00
pyr0ball	a53f3a7341	feat(avocet): benchmark UI, label fixes, BenchmarkView with charts and SSE run	2026-03-15 09:39:37 -07:00
pyr0ball	ce1b8c2215	fix(avocet): reset card element state when new item loads to clear previous animation inline styles	2026-03-08 07:44:02 -07:00
pyr0ball	f1933ab51c	feat(avocet): badge pop via Anime.js spring transition hook	2026-03-08 07:35:49 -07:00
pyr0ball	6a898bbdee	fix(avocet): constrain grid-active to 640px on wide viewports using left/right offsets	2026-03-08 07:26:46 -07:00
pyr0ball	efc2d33de2	feat(avocet): animate bucket grid rise with Anime.js spring	2026-03-08 07:17:56 -07:00
pyr0ball	5c6aa02998	fix(avocet): restore drag aura color feedback via updateAura in useCardAnimation	2026-03-08 07:14:24 -07:00
pyr0ball	9302644259	feat(avocet): wire Anime.js card animation into EmailCardStack Replace CSS keyframe dismiss classes and inline cardStyle/deltaX/deltaY with useCardAnimation composable — pickup/setDragPosition/snapBack/animateDismiss are now called from pointer event handlers and a dismissType watcher.	2026-03-08 07:07:58 -07:00
pyr0ball	b68c176278	feat(avocet): add useCardAnimation composable with Anime.js TDD: 8 tests written first (red), then composable implemented (green). Adapts to Anime.js v4 API: 2-arg animate(), object-param spring(), utils.set() for instant drag-position updates without cache desync.	2026-03-08 06:52:27 -07:00
pyr0ball	d02c937ff1	feat(avocet): add animejs v4 dependency	2026-03-08 06:47:50 -07:00
pyr0ball	1a95d4d580	fix(avocet): ball escapes overflow clip, floats above header/footer with z-index + transparency	2026-03-05 15:14:24 -08:00
pyr0ball	351703d9db	fix(avocet): grid pinned to viewport with height 100dvh + card ball floats above finger at scale 0.55	2026-03-05 15:07:58 -08:00
pyr0ball	d7cd01a8da	feat(avocet): add velocity-based fling detection to toss gesture (option B: speed + alignment)	2026-03-05 14:55:10 -08:00
pyr0ball	8947dc5d05	feat(avocet): add toss-zone overlays and grid-rise animation to LabelView	2026-03-05 13:41:52 -08:00
pyr0ball	fc8cb9a8bd	feat(avocet): replace swipe+HTML5-drag with unified pointer-events toss gesture	2026-03-05 10:38:52 -08:00
pyr0ball	cac02b2c5f	feat(avocet): replace HTML5 drag events on LabelBucketGrid with hoveredBucket prop	2026-03-05 10:10:48 -08:00
pyr0ball	8a2df0e2f8	feat: card crumples to small ball on drag pickup so buckets expand fully	2026-03-04 12:38:46 -08:00
pyr0ball	33f5e0d8a1	fix: keyboard shortcuts now work after labels load (lazy keymap evaluation) useLabelKeyboard now accepts labels as Label[] | (() => Label[]). The keymap is rebuilt on every keypress from the getter result instead of being captured once at construction time — so keys 1–9 now fire correctly after the async /api/config/labels fetch completes. LabelView passes () => labels.value so the reactive ref is read lazily. New test: 'evaluates labels getter on each keypress' covers the async-load scenario (empty list → no match; push a label → key fires).	2026-03-04 12:32:25 -08:00
pyr0ball	43ef2ff8d2	fix: pin bucket grid to bottom of viewport with sticky footer; prevents mis-click from layout shift	2026-03-04 12:26:04 -08:00
pyr0ball	c4498a8190	feat: implement FetchView — SSE progress bars, account selection, targeted fetch	2026-03-04 12:23:58 -08:00
pyr0ball	6b6205e4ed	feat: implement StatsView — label distribution bars, file info, download	2026-03-04 12:21:21 -08:00
pyr0ball	9ef2c1251d	feat: implement SettingsView — IMAP account management, test connection, display toggles	2026-03-04 12:20:30 -08:00
pyr0ball	c94d271f4c	feat: add useApiSSE helper for Server-Sent Events connections	2026-03-04 12:17:46 -08:00
pyr0ball	2a48ab0f03	feat: add Vue Router + stow-able AppSidebar; stub Fetch/Stats/Settings views	2026-03-04 12:12:26 -08:00
pyr0ball	dc92ecff5f	fix: bucket grid now renders 3x3+1 numpad layout on all screen sizes	2026-03-04 11:31:36 -08:00
pyr0ball	8d2fdf6299	fix: UndoToast now emits expire after 5s so toast self-dismisses	2026-03-04 11:29:03 -08:00
pyr0ball	3788254abd	fix: prevent blank page on rebuild and queue drain on skip/discard Two bugs fixed: 1. Blank white page after vue SPA rebuild: browsers cached old index.html referencing old asset hashes. Assets are deleted on rebuild, causing 404s for JS/CSS -> blank page. Fix: serve index.html with Cache-Control: no-cache so browsers always fetch fresh HTML. Hashed assets (/assets/chunk-abc123.js) remain cacheable forever. 2. Queue draining to empty on skip/discard: handleSkip and handleDiscard never refilled the local queue buffer. After enough skips, store.current went null and the empty state showed (blank-looking). Fix: both handlers now call fetchBatch() when queue drops below 3, matching handleLabel. Also: sync classifier_adapters LABELS to match current 10-label schema (new_lead + hired, remove unrelated). 48 Python tests pass, 48 frontend tests pass.	2026-03-03 19:26:34 -08:00
pyr0ball	65d9f6089e	feat(avocet): easter eggs — hired confetti, century mark, clean sweep, midnight labeler, cursor trail	2026-03-03 16:24:47 -08:00
pyr0ball	8e3d263847	feat(avocet): LabelView — wires store, API, card stack, keyboard, easter eggs Implements Task 13: LabelView.vue wires together the label store, API fetch, card stack, bucket grid, keyboard shortcuts, haptics, motion preference, and three easter egg badges (on-a-roll, speed round, fifty deep). App.vue updated to mount LabelView and restore hacker-mode theme on load. 3 new LabelView tests; all 48 tests pass, build clean.	2026-03-03 16:21:07 -08:00
pyr0ball	05d12a1417	feat(avocet): LabelBucketGrid bucket-mode CSS — spring expansion, glow on drop	2026-03-03 16:19:29 -08:00
pyr0ball	97437f39c9	feat(avocet): EmailCardStack — swipe gestures, depth shadows, dismissal classes	2026-03-03 16:16:09 -08:00

59 commits
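
Commit a271278dc9 above describes a config lookup in which label_tool.yaml values win over the CF_ORCH_URL / CF_LICENSE_KEY / OLLAMA_HOST / OLLAMA_MODEL env vars, and in which the config endpoint reports only a license_key_set bool. The following is a minimal sketch of that precedence, not the repository's actual _load_cforch_config(). The function names, the config key names (coordinator_url, ollama_url, and so on), and the per-key fallback behaviour are assumptions made for illustration; only the env var names and the license_key_set field come from the commit message.

```python
import os
from typing import Any, Mapping, Optional

# Env var names are taken from the commit message. The config key names on
# the right-hand side are hypothetical.
_ENV_TO_KEY = {
    "CF_ORCH_URL": "coordinator_url",
    "CF_LICENSE_KEY": "license_key",
    "OLLAMA_HOST": "ollama_url",
    "OLLAMA_MODEL": "ollama_model",
}


def resolve_cforch_config(
    yaml_cfg: Optional[Mapping[str, Any]],
    env: Optional[Mapping[str, str]] = None,
) -> tuple[dict[str, Any], dict[str, str]]:
    """Return (resolved values, source per key). A non-empty yaml value wins."""
    env = os.environ if env is None else env
    # `or {}` guards against a missing file, a missing section, and `cforch: ~`.
    section = (yaml_cfg or {}).get("cforch") or {}
    resolved: dict[str, Any] = {}
    sources: dict[str, str] = {}
    for env_name, key in _ENV_TO_KEY.items():
        if section.get(key):
            resolved[key] = section[key]
            sources[key] = "yaml"
        elif env.get(env_name):
            resolved[key] = env[env_name]
            sources[key] = "env"
    return resolved, sources


def public_view(resolved: Mapping[str, Any], sources: Mapping[str, str]) -> dict[str, Any]:
    """Shape a response that never contains the license key itself."""
    view = {k: v for k, v in resolved.items() if k != "license_key"}
    view["license_key_set"] = bool(resolved.get("license_key"))
    view["sources"] = dict(sources)
    return view
```

Passing env explicitly, instead of reading os.environ inside the function, keeps the sketch testable without patching the process environment.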