avocet

Author	SHA1	Message	Date
pyr0ball	13ca082a43	chore(models): refresh model registries with current cluster catalog Replace stale llama/mistral/phi model refs with models active on the cluster: deepseek-r1 (1.5b, 7b-4bit, 0528-qwen3-8b-gguf), granite-4.1-8b, qwen2.5 (3b, 7b), capybarahermes-2.5-mistral-7b, darwin-9b-opus. Update benchmark_plans.py doc examples to match.	2026-05-17 11:24:03 -07:00
pyr0ball	5df33b0f41	feat(benchmark): wire EmbeddingKNNAdapter into MODEL_REGISTRY as embed-knn-nomic	2026-05-05 12:43:48 -07:00
pyr0ball	41584de5df	fix(benchmark): guard empty exemplars, warn on malformed JSON in build_exemplars_from_jsonl	2026-05-05 12:41:46 -07:00
pyr0ball	1d4c07e4a0	feat(benchmark): add build_exemplars_from_jsonl() for k-NN seed	2026-05-05 11:43:12 -07:00
pyr0ball	e823b5e76d	fix(classifier): majority-vote key, partial-load guard, sparse label test	2026-05-05 11:39:24 -07:00
pyr0ball	88bc6bed67	feat(classifier): implement EmbeddingKNNAdapter.classify() with k-NN vote	2026-05-05 08:04:54 -07:00
pyr0ball	4a64a6686d	fix(classifier): atomic embed assignment, logging on orch failure, guard double load	2026-05-05 07:53:15 -07:00
pyr0ball	f2f150b4fb	feat(classifier): implement EmbeddingKNNAdapter.load() and unload()	2026-05-05 07:12:53 -07:00
pyr0ball	72449561cf	feat(classifier): add EmbeddingKNNAdapter skeleton and constructor tests	2026-05-05 06:08:21 -07:00
pyr0ball	c177fb1628	fix(classifier): quality fixes for DEFAULT_EXEMPLARS — remove forward __all__ entry, tighten tests, fix survey exemplar	2026-05-04 20:03:18 -07:00
pyr0ball	3be5055e31	feat(classifier): add DEFAULT_EXEMPLARS for embedding k-NN fallback	2026-05-04 17:44:44 -07:00
pyr0ball	78b64d007d	feat(classifier): add _cosine() helper for embedding similarity	2026-05-04 17:41:45 -07:00
pyr0ball	bce932461a	feat: plans benchmark harness — model scoring for CF planning prompts Adds benchmark_plans.py script, plans_bench API router, PlansBenchTab Vue component, and registers /api/plans-bench in api.py. Also extends models registry (cf-text catalog integration), cforch client, LlmEvalTab, and ModelsView with cf-orch fleet support. Wires Planning mode into BenchmarkView.	2026-05-02 23:36:04 -07:00
pyr0ball	5a0ba92fc6	chore: add README + gather_corpus.py script	2026-04-24 15:29:26 -07:00
pyr0ball	ddb56efb89	refactor(bench): extract benchmark tabs — classifier, compare, llm-eval, style, voice - BenchmarkView.vue: convert from monolithic view to tabbed shell; each tab is now its own component (ClassifierTab, CompareTab, LlmEvalTab, StyleTab, VoiceTab) - StyleTab + VoiceTab: new benchmark modes for style and voice model evaluation - app/style.py: FastAPI router for style imitation benchmarks - app/voice.py: FastAPI router for voice benchmark endpoints - scripts/benchmark_style.py + benchmark_voice.py: headless runner scripts	2026-04-24 14:56:17 -07:00
pyr0ball	cfde474454	fix: log on malformed json in _read_jsonl, use streaming id dedup	2026-04-08 07:37:22 -07:00
pyr0ball	bbfae1a622	fix: log warning when sft record is missing id field	2026-04-08 07:30:46 -07:00
pyr0ball	03dac57fd9	feat: sft_import.py — run discovery and JSONL deduplication	2026-04-08 07:13:37 -07:00
pyr0ball	db44c9323e	fix(avocet): use_reentrant=False for gradient checkpointing Reentrant gradient checkpointing (the default) conflicts with Accelerate's gradient accumulation context manager -- causes 'backward through graph a second time' on the first training step. use_reentrant=False uses the non-reentrant autograd hook path which is compatible with Accelerate >= 0.27.	2026-03-15 17:23:40 -07:00
pyr0ball	cbc382cc88	fix(avocet): reduce deberta-small VRAM + auto-select freest GPU for training - deberta-small: batch_size 16→8 + grad_accum 1→2 (same effective batch), gradient_checkpointing=True (fp16 stays off: DeBERTa v3 disentangled attention overflows fp16 at the gather step) - api: _best_cuda_device() picks highest free-VRAM GPU via nvidia-smi; sets CUDA_VISIBLE_DEVICES in subprocess env to prevent DataParallel replication across both GPUs; adds PYTORCH_ALLOC_CONF=expandable_segments - SSE log now reports which GPU was selected	2026-03-15 17:09:06 -07:00
pyr0ball	48e02f2ed6	fix(avocet): move TorchDataset import to top; split sample_count into total+train	2026-03-15 16:02:43 -07:00
pyr0ball	939ce06f45	feat(avocet): run_finetune, CLI, multi-score-file merge with last-write-wins dedup - load_and_prepare_data() now accepts Path | list[Path]; single-Path callers unchanged - Dedup by MD5(subject + body[:100]); last file/row wins (lets later runs correct labels) - Prints summary line when duplicates are dropped - Added _EmailDataset (TorchDataset wrapper), run_finetune(), and argparse CLI - run_finetune() saves model + tokenizer + training_info.json with score_files provenance - Stratified split guard: val set size clamped to at least n_classes (handles tiny example data) - 3 new unit tests (merge, last-write-wins dedup, single-Path compat) + 1 integration test - All 16 tests pass (15 unit + 1 integration)	2026-03-15 15:52:41 -07:00
pyr0ball	de5794611b	feat(avocet): add finetune data pipeline, class weights, WeightedTrainer Implements load_and_prepare_data (JSONL ingestion with class filtering), compute_class_weights (inverse-frequency, div-by-zero safe), compute_metrics_for_trainer (macro F1 + accuracy), and WeightedTrainer.compute_loss (**kwargs-safe for Transformers 4.38+ num_items_in_batch). All 12 tests pass.	2026-03-15 15:38:45 -07:00
pyr0ball	d1a36bfd63	fix(avocet): guard discover_finetuned_models against malformed/incomplete training_info.json	2026-03-15 15:18:13 -07:00
pyr0ball	df37a8e16d	feat(avocet): auto-discover fine-tuned models in benchmark harness	2026-03-15 11:59:13 -07:00
pyr0ball	179cb67e1c	fix(avocet): FineTunedAdapter GPU device routing + precise body truncation test	2026-03-15 10:56:47 -07:00
pyr0ball	dc321de59f	feat(avocet): add FineTunedAdapter for local checkpoint inference	2026-03-15 10:54:38 -07:00
pyr0ball	a53f3a7341	feat(avocet): benchmark UI, label fixes, BenchmarkView with charts and SSE run	2026-03-15 09:39:37 -07:00
pyr0ball	3788254abd	fix: prevent blank page on rebuild and queue drain on skip/discard Two bugs fixed: 1. Blank white page after vue SPA rebuild: browsers cached old index.html referencing old asset hashes. Assets are deleted on rebuild, causing 404s for JS/CSS -> blank page. Fix: serve index.html with Cache-Control: no-cache so browsers always fetch fresh HTML. Hashed assets (/assets/chunk-abc123.js) remain cacheable forever. 2. Queue draining to empty on skip/discard: handleSkip and handleDiscard never refilled the local queue buffer. After enough skips, store.current went null and the empty state showed (blank-looking). Fix: both handlers now call fetchBatch() when queue drops below 3, matching handleLabel. Also: sync classifier_adapters LABELS to match current 10-label schema (new_lead + hired, remove unrelated). 48 Python tests pass, 48 frontend tests pass.	2026-03-03 19:26:34 -08:00
pyr0ball	f97ef32100	fix: RerankerAdapter falls back to label name when no LABEL_DESCRIPTIONS entry	2026-02-27 14:54:31 -08:00
pyr0ball	4c346aa328	feat: 9 labels (add event_rescheduled/unrelated/digest), wildcard Other label, InvalidCharacterError fix	2026-02-27 14:34:15 -08:00
pyr0ball	d68754d432	feat: initial avocet repo — email classifier training tool Scrape → Store → Process pipeline for building email classifier benchmark data across the CircuitForge menagerie. - app/label_tool.py — Streamlit card-stack UI, multi-account IMAP fetch, 6-bucket labeling, undo/skip, keyboard shortcuts (1-6/S/U) - scripts/classifier_adapters.py — ZeroShotAdapter (+ two_pass), GLiClassAdapter, RerankerAdapter; ABC with lazy model loading - scripts/benchmark_classifier.py — 13-model registry, --score, --compare, --list-models, --export-db; uses label_tool.yaml for IMAP - tests/ — 20 tests, all passing, zero model downloads required - config/label_tool.yaml.example — multi-account IMAP template - data/email_score.jsonl.example — sample labeled data for CI Labels: interview_scheduled, offer_received, rejected, positive_response, survey_received, neutral	2026-02-27 14:07:38 -08:00

32 commits
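
Commit 939ce06f45 describes merging several score files with last-write-wins dedup keyed on MD5(subject + body[:100]). The sketch below shows one way that rule could work. It is not the repository's code: the helper names dedup_key and merge_last_write_wins are hypothetical, and the record fields "subject", "body", and "label" are assumed from the commit messages.

```python
import hashlib


def dedup_key(record: dict) -> str:
    """Key a record by MD5 of subject plus the first 100 chars of body."""
    # Field names are assumptions; missing fields fall back to "".
    subject = record.get("subject", "")
    body = record.get("body", "")
    return hashlib.md5((subject + body[:100]).encode("utf-8")).hexdigest()


def merge_last_write_wins(batches):
    """Merge row batches in order; a later row replaces an earlier duplicate.

    Returns (rows, dropped), where dropped counts the replaced rows.
    """
    merged = {}
    dropped = 0
    for rows in batches:
        for row in rows:
            key = dedup_key(row)
            if key in merged:
                dropped += 1
            # Assigning to an existing key keeps its original position
            # in the dict but swaps in the newer row.
            merged[key] = row
    return list(merged.values()), dropped


if __name__ == "__main__":
    first = [{"subject": "Hi", "body": "a", "label": "neutral"}]
    second = [
        {"subject": "Hi", "body": "a", "label": "rejected"},
        {"subject": "Yo", "body": "b", "label": "neutral"},
    ]
    rows, dropped = merge_last_write_wins([first, second])
    print(f"kept {len(rows)} rows, dropped {dropped} duplicate(s)")
```

Passing the batches in file order means a later labeling run can correct a label from an earlier one, which is the behavior the commit message attributes to "last file/row wins".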