avocet

Author	SHA1	Message	Date
pyr0ball	118ae2660a	feat: Imitate tab — pull CF product samples, compare LLM responses Backend (app/imitate.py): - GET /api/imitate/products — reads imitate: config, checks online status - GET /api/imitate/products/{id}/sample — fetches real item from product API - GET /api/imitate/run (SSE) — streams ollama responses for selected models - POST /api/imitate/push-corrections — queues results in SFT corrections JSONL Frontend (ImitateView.vue): - Step 1: product picker grid (online/offline status, icon from config) - Step 2: raw sample preview + editable prompt textarea - Step 3: ollama model multi-select, temperature slider, SSE run with live log - Step 4: response cards side by side, push to Corrections button Wiring: - app/api.py: include imitate_router at /api/imitate - web/src/router: /imitate route + lazy import - AppSidebar: Imitate nav entry (mirror icon) - config/label_tool.yaml.example: imitate: section with peregrine example - 16 unit tests (100% passing) Also: BenchmarkView.vue Compare panel — side-by-side run diff for bench results	2026-04-09 20:04:45 -07:00
pyr0ball	dc2dc70ef9	test: fix test_tasks_parses_yaml for TaskEntry schema TaskEntry now includes prompt/system fields (default ""). Switch from exact dict comparison to field-by-field assertions so the test is forward-compatible with optional schema additions.	2026-04-09 20:02:12 -07:00
pyr0ball	dffb1d0d7a	feat: cf-orch LLM benchmark integration (Phase 1) Backend (app/cforch.py — new APIRouter at /api/cforch): - GET /tasks — reads bench_tasks.yaml, returns tasks + deduplicated types - GET /models — reads bench_models.yaml, returns model list with service/tags - GET /run — SSE endpoint; spawns cf-orch benchmark.py subprocess with --filter-tasks, --filter-tags, --coordinator, --ollama-url; strips ANSI codes; emits progress/result/complete/error events; 409 guard on concurrency - GET /results — returns latest bench_results/*/summary.json; 404 if none - POST /cancel — terminates running benchmark subprocess - All paths configurable via label_tool.yaml cforch: section - 13 tests; follows sft.py/models.py testability seam pattern Frontend: - BenchmarkView: mode toggle (Classifier / LLM Eval); LLM Eval panel with task picker (by type, select-all + indeterminate), model picker (by service), SSE run log, results table with best-per-column highlighting - StatsView: LLM Benchmark section showing quality_by_task_type table across models; hidden when no results; fetches /api/cforch/results on mount SFT candidate pipeline: cf-orch runs that produce sft_candidates.jsonl are auto-discovered by the existing bench_results_dir config in sft.py — no additional wiring needed.	2026-04-09 10:46:06 -07:00
pyr0ball	ce12b29c94	feat: model compatibility warning on HF lookup - GET /api/models/lookup now returns compatible: bool and warning: str|null - compatible=false + warning when pipeline_tag is absent (no task tag on HF) or present but not in the supported adapter map - Warning message names the unsupported pipeline_tag and lists supported types - ModelsView: yellow compat-warning banner below preview description; Add button relabels to "Add anyway" with muted styling when incompatible - test_models: accept 405 for path-traversal DELETE tests (StaticFiles mount returns 405 for non-GET methods when web/dist exists)	2026-04-09 09:48:55 -07:00
pyr0ball	b6b3d2c390	feat: HuggingFace model management tab - New /api/models router: HF lookup, approval queue (JSONL persistence), SSE download progress via snapshot_download(), installed model listing, path-traversal-safe DELETE - pipeline_tag → adapter type mapping (zero-shot-classification, sentence-similarity, text-generation) - 27 tests covering all endpoints, duplicate detection, path traversal - ModelsView.vue: HF lookup + add, approval queue, live download progress bars via SSE, installed model table with delete - Sidebar entry (🤗 Models) between Benchmark and Corrections	2026-04-08 22:32:35 -07:00
pyr0ball	9633d9a535	feat: add failure_category field to SFT corrections (#16) Adds optional failure_category to SubmitRequest and candidate records so reviewers can classify why a model response was wrong, not just what to do with it. Enables the fine-tune harness to filter training data by failure type (e.g. exclude scoring artifacts, train only on genuine wrong answers). Taxonomy: scoring_artifact | style_violation | partial_answer | wrong_answer | format_error | hallucination - app/sft.py: FailureCategory Literal type; SubmitRequest.failure_category; stored on candidate record in POST /submit correct branch - tests/test_sft.py: 3 new tests (stores value, null round-trip, 422 on invalid) - stores/sft.ts: SftFailureCategory type exported; SftQueueItem + SftLastAction updated; setLastAction accepts optional category param - SftCard.vue: chip-group selector shown during correct/discard/flag flow; two-step confirm for discard/flag reveals chips before emitting; category forwarded in all emit payloads - CorrectionsView.vue: handleCorrect/Discard/Flag accept and forward category to POST /api/sft/submit body and store.setLastAction - SftCard.test.ts: 11 new tests covering chip visibility, selection, single-active enforcement, pending-action flow, emit payloads, cancel	2026-04-08 22:10:26 -07:00
pyr0ball	07807f0d05	feat: sft router — /export and /stats endpoints	2026-04-08 14:46:08 -07:00
pyr0ball	f19cab60f7	feat: sft router — /queue, /submit, /undo endpoints	2026-04-08 14:22:06 -07:00
pyr0ball	b330e84111	fix: sft router — yaml error handling, none filter, shared jsonl utils, fixture restore	2026-04-08 14:07:09 -07:00
pyr0ball	597ffc7324	feat: sft router skeleton — /api/sft/runs and /api/sft/import	2026-04-08 13:54:58 -07:00
pyr0ball	bbfae1a622	fix: log warning when sft record is missing id field	2026-04-08 07:30:46 -07:00
pyr0ball	03dac57fd9	feat: sft_import.py — run discovery and JSONL deduplication	2026-04-08 07:13:37 -07:00
pyr0ball	25880e377d	refactor: consolidate HTML extraction into app/utils.py Rename _strip_html/_extract_body to strip_html/extract_body (public API). Remove duplicate _TextExtractor, strip_html, and _extract_body from imap_fetch.py; import from app.utils instead. Update test_label_tool.py to use the new public names.	2026-04-08 06:52:15 -07:00
pyr0ball	ae0ac19505	chore: retire Streamlit app, scaffold sft branch - Delete app/label_tool.py (Streamlit UI retired; Vue SPA is sole UI) - Extract _strip_html and _extract_body into app/utils.py (stdlib-only, reusable) - Update tests/test_label_tool.py import to app.utils - Rename start-api/stop-api/restart-api/open-api → start/stop/restart/open in manage.sh - Remove STREAMLIT variable and all Streamlit-specific case blocks from manage.sh - Update manage.sh usage section to reflect Vue+FastAPI-only commands - Add data/sft_candidates.jsonl and data/sft_approved.jsonl to .gitignore - Add sft.bench_results_dir key to config/label_tool.yaml.example	2026-04-08 06:18:12 -07:00
pyr0ball	e38a28dcc3	fix(avocet): narrow cancel except clause, clear stale cancel flags on new run - except clause in cancel_benchmark/cancel_finetune narrowed from Exception to _subprocess.TimeoutExpired (C1) - _cancelled_jobs.discard() called after registering new proc to prevent a stale flag from a prior run masking errors (I2) - local `import subprocess` removed from run_benchmark and run_finetune_endpoint; all Popen calls updated to _subprocess.Popen (I1) - test patch targets updated from subprocess.Popen to app.api._subprocess.Popen; cancelled-event tests updated to set flag in proc.wait() side-effect so the discard-on-new-run logic is exercised correctly	2026-03-15 18:13:01 -07:00
pyr0ball	0ab49609c0	feat(avocet): add cancel endpoints for benchmark and finetune jobs Adds POST /api/benchmark/cancel and POST /api/finetune/cancel endpoints that terminate the running subprocess (kill on 3s timeout), and updates the run generators to emit a cancelled SSE event instead of error when the job was intentionally stopped.	2026-03-15 18:09:20 -07:00
pyr0ball	dd352f07cd	fix(avocet): _MODELS_DIR overridable in tests; sanitize score paths against path traversal	2026-03-15 16:07:27 -07:00
pyr0ball	903624a4b8	feat(avocet): add /api/finetune/status and /api/finetune/run endpoints	2026-03-15 16:04:34 -07:00
pyr0ball	48e02f2ed6	fix(avocet): move TorchDataset import to top; split sample_count into total+train	2026-03-15 16:02:43 -07:00
pyr0ball	939ce06f45	feat(avocet): run_finetune, CLI, multi-score-file merge with last-write-wins dedup - load_and_prepare_data() now accepts Path | list[Path]; single-Path callers unchanged - Dedup by MD5(subject + body[:100]); last file/row wins (lets later runs correct labels) - Prints summary line when duplicates are dropped - Added _EmailDataset (TorchDataset wrapper), run_finetune(), and argparse CLI - run_finetune() saves model + tokenizer + training_info.json with score_files provenance - Stratified split guard: val set size clamped to at least n_classes (handles tiny example data) - 3 new unit tests (merge, last-write-wins dedup, single-Path compat) + 1 integration test - All 16 tests pass (15 unit + 1 integration)	2026-03-15 15:52:41 -07:00
pyr0ball	4e70e79b26	fix(avocet): tighten body truncation test to exact 400-char assertion	2026-03-15 15:44:19 -07:00
pyr0ball	de5794611b	feat(avocet): add finetune data pipeline, class weights, WeightedTrainer Implements load_and_prepare_data (JSONL ingestion with class filtering), compute_class_weights (inverse-frequency, div-by-zero safe), compute_metrics_for_trainer (macro F1 + accuracy), and WeightedTrainer.compute_loss (**kwargs-safe for Transformers 4.38+ num_items_in_batch). All 12 tests pass.	2026-03-15 15:38:45 -07:00
pyr0ball	d1a36bfd63	fix(avocet): guard discover_finetuned_models against malformed/incomplete training_info.json	2026-03-15 15:18:13 -07:00
pyr0ball	df37a8e16d	feat(avocet): auto-discover fine-tuned models in benchmark harness	2026-03-15 11:59:13 -07:00
pyr0ball	179cb67e1c	fix(avocet): FineTunedAdapter GPU device routing + precise body truncation test	2026-03-15 10:56:47 -07:00
pyr0ball	dc321de59f	feat(avocet): add FineTunedAdapter for local checkpoint inference	2026-03-15 10:54:38 -07:00
pyr0ball	07407117a5	feat: add GET /api/fetch/stream SSE endpoint for real-time IMAP progress	2026-03-04 12:05:23 -08:00
pyr0ball	e5e66b09cc	feat: add POST /api/accounts/test endpoint	2026-03-04 12:04:42 -08:00
pyr0ball	47a2178ee4	feat: add GET /api/stats and GET /api/stats/download endpoints	2026-03-04 12:04:11 -08:00
pyr0ball	3f0cd7e837	feat: add GET/POST /api/config endpoints for IMAP account management	2026-03-04 12:03:40 -08:00
pyr0ball	8a0545a6e7	feat: extract IMAP logic to app/imap_fetch.py for reuse by API	2026-03-04 11:42:22 -08:00
pyr0ball	3788254abd	fix: prevent blank page on rebuild and queue drain on skip/discard Two bugs fixed: 1. Blank white page after vue SPA rebuild: browsers cached old index.html referencing old asset hashes. Assets are deleted on rebuild, causing 404s for JS/CSS -> blank page. Fix: serve index.html with Cache-Control: no-cache so browsers always fetch fresh HTML. Hashed assets (/assets/chunk-abc123.js) remain cacheable forever. 2. Queue draining to empty on skip/discard: handleSkip and handleDiscard never refilled the local queue buffer. After enough skips, store.current went null and the empty state showed (blank-looking). Fix: both handlers now call fetchBatch() when queue drops below 3, matching handleLabel. Also: sync classifier_adapters LABELS to match current 10-label schema (new_lead + hired, remove unrelated). 48 Python tests pass, 48 frontend tests pass.	2026-03-03 19:26:34 -08:00
pyr0ball	2fdafc1d10	fix(avocet): strip HTML from email bodies — stdlib HTMLParser, no deps	2026-03-03 16:28:18 -08:00
pyr0ball	01cc908eab	fix(avocet): undo — commit-then-clear order, empty-records guard, skip dedup, stronger test	2026-03-03 15:41:58 -08:00
pyr0ball	f4facc6484	feat(avocet): discard, undo, labels config, static serving — backend complete	2026-03-03 15:35:01 -08:00
pyr0ball	5912b73705	feat(avocet): POST /api/skip endpoint	2026-03-03 15:21:32 -08:00
pyr0ball	ce202d97ea	feat(avocet): POST /api/label endpoint	2026-03-03 15:14:04 -08:00
pyr0ball	6556e3fef0	fix(avocet): queue_with_items fixture uses api._DATA_DIR to avoid implicit tmp_path coupling	2026-03-03 15:03:57 -08:00
pyr0ball	8898258055	feat(avocet): GET /api/queue endpoint	2026-03-03 15:00:59 -08:00
pyr0ball	d36d0be166	fix(avocet): _write_jsonl empty-list writes empty file; add reset_last_action helper	2026-03-03 14:36:18 -08:00
pyr0ball	f06114e648	feat(avocet): FastAPI skeleton + JSONL helpers	2026-03-03 13:30:28 -08:00
pyr0ball	4c346aa328	feat: 9 labels (add event_rescheduled/unrelated/digest), wildcard Other label, InvalidCharacterError fix	2026-02-27 14:34:15 -08:00
pyr0ball	d68754d432	feat: initial avocet repo — email classifier training tool Scrape → Store → Process pipeline for building email classifier benchmark data across the CircuitForge menagerie. - app/label_tool.py — Streamlit card-stack UI, multi-account IMAP fetch, 6-bucket labeling, undo/skip, keyboard shortcuts (1-6/S/U) - scripts/classifier_adapters.py — ZeroShotAdapter (+ two_pass), GLiClassAdapter, RerankerAdapter; ABC with lazy model loading - scripts/benchmark_classifier.py — 13-model registry, --score, --compare, --list-models, --export-db; uses label_tool.yaml for IMAP - tests/ — 20 tests, all passing, zero model downloads required - config/label_tool.yaml.example — multi-account IMAP template - data/email_score.jsonl.example — sample labeled data for CI Labels: interview_scheduled, offer_received, rejected, positive_response, survey_received, neutral	2026-02-27 14:07:38 -08:00
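
Commit de5794611b describes compute_class_weights as "inverse-frequency, div-by-zero safe". A minimal sketch of that idea follows. It assumes the common n_samples / (n_classes * count) formula and clamps empty classes to a count of 1; the repository's actual function may weight differently, and the name inverse_frequency_weights is hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels, classes):
    """Return one weight per class, larger for rarer classes.

    Hypothetical helper, not the repo's compute_class_weights.
    A class with zero samples is treated as having one sample,
    so the division never fails.
    """
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(classes)
    return [total / (n_classes * max(counts[c], 1)) for c in classes]


if __name__ == "__main__":
    labels = ["rejected", "rejected", "rejected", "neutral"]
    classes = ["rejected", "neutral", "offer_received"]
    for name, weight in zip(classes, inverse_frequency_weights(labels, classes)):
        print(f"{name}: {weight:.3f}")
```

Weights of this shape are typically passed to a weighted cross-entropy loss, which is presumably what WeightedTrainer.compute_loss does with them.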

43 commits
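
Commit 939ce06f45 describes deduplication keyed on MD5(subject + body[:100]) where the last file/row wins. The sketch below illustrates that rule under stated assumptions: rows are dicts with "subject" and "body" string fields, and the function name dedup_last_write_wins is hypothetical rather than taken from the repository.

```python
import hashlib


def dedup_last_write_wins(rows):
    """Collapse rows that share MD5(subject + body[:100]).

    Hypothetical helper. A later row replaces an earlier one with the
    same key, so a later labeling run can correct an earlier label.
    The surviving row keeps the position of the key's first appearance.
    """
    by_key = {}
    for row in rows:
        key_src = row.get("subject", "") + row.get("body", "")[:100]
        key = hashlib.md5(key_src.encode("utf-8")).hexdigest()
        by_key[key] = row  # overwrite: last write wins
    return list(by_key.values())


if __name__ == "__main__":
    rows = [
        {"subject": "Interview", "body": "See you Monday", "label": "neutral"},
        {"subject": "Interview", "body": "See you Monday", "label": "interview_scheduled"},
    ]
    merged = dedup_last_write_wins(rows)
    print(f"{len(rows) - len(merged)} duplicate(s) dropped")
```

Because only the first 100 characters of the body enter the key, two emails that differ only after that point are treated as the same record.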