Commit graph

13 commits

Author SHA1 Message Date
5dee23f53c fix(avocet): reduce deberta-small VRAM + auto-select freest GPU for training
- deberta-small: batch_size 16→8 + grad_accum 1→2 (same effective batch),
  gradient_checkpointing=True (fp16 stays off: DeBERTa v3 disentangled
  attention overflows fp16 at the gather step)
- api: _best_cuda_device() picks highest free-VRAM GPU via nvidia-smi;
  sets CUDA_VISIBLE_DEVICES in subprocess env to prevent DataParallel
  replication across both GPUs; adds PYTORCH_ALLOC_CONF=expandable_segments
- SSE log now reports which GPU was selected
2026-03-15 17:09:06 -07:00
64fd19a7b6 fix(avocet): move TorchDataset import to top; split sample_count into total+train 2026-03-15 16:02:43 -07:00
8ba34bb2d1 feat(avocet): run_finetune, CLI, multi-score-file merge with last-write-wins dedup
- load_and_prepare_data() now accepts Path | list[Path]; single-Path callers unchanged
- Dedup by MD5(subject + body[:100]); last file/row wins (lets later runs correct labels)
- Prints summary line when duplicates are dropped
- Added _EmailDataset (TorchDataset wrapper), run_finetune(), and argparse CLI
- run_finetune() saves model + tokenizer + training_info.json with score_files provenance
- Stratified split guard: val set size clamped to at least n_classes (handles tiny example data)
- 3 new unit tests (merge, last-write-wins dedup, single-Path compat) + 1 integration test
- All 16 tests pass (15 unit + 1 integration)
2026-03-15 15:52:41 -07:00
5eb593569d feat(avocet): add finetune data pipeline, class weights, WeightedTrainer
Implements load_and_prepare_data (JSONL ingestion with class filtering),
compute_class_weights (inverse-frequency, div-by-zero safe), compute_metrics_for_trainer
(macro F1 + accuracy), and WeightedTrainer.compute_loss (**kwargs-safe for
Transformers 4.38+ num_items_in_batch). All 12 tests pass.
2026-03-15 15:38:45 -07:00
2d795b9573 fix(avocet): guard discover_finetuned_models against malformed/incomplete training_info.json 2026-03-15 15:18:13 -07:00
36117b35c4 feat(avocet): auto-discover fine-tuned models in benchmark harness 2026-03-15 11:59:13 -07:00
da8478082e fix(avocet): FineTunedAdapter GPU device routing + precise body truncation test 2026-03-15 10:56:47 -07:00
7a4ca422ca feat(avocet): add FineTunedAdapter for local checkpoint inference 2026-03-15 10:54:38 -07:00
8c22dd62de feat(avocet): benchmark UI, label fixes, BenchmarkView with charts and SSE run 2026-03-15 09:39:37 -07:00
82eeb4defc fix: prevent blank page on rebuild and queue drain on skip/discard
Two bugs fixed:

1. Blank white page after vue SPA rebuild: browsers cached old index.html
   referencing old asset hashes. Assets are deleted on rebuild, causing
   404s for JS/CSS -> blank page. Fix: serve index.html with
   Cache-Control: no-cache so browsers always fetch fresh HTML.
   Hashed assets (/assets/chunk-abc123.js) remain cacheable forever.

2. Queue draining to empty on skip/discard: handleSkip and handleDiscard
   never refilled the local queue buffer. After enough skips, store.current
   went null and the empty state showed (blank-looking). Fix: both handlers
   now call fetchBatch() when queue drops below 3, matching handleLabel.

Also: sync classifier_adapters LABELS to match current 10-label schema
(new_lead + hired, remove unrelated).

48 Python tests pass, 48 frontend tests pass.
2026-03-03 19:26:34 -08:00
1452889d0f fix: RerankerAdapter falls back to label name when no LABEL_DESCRIPTIONS entry 2026-02-27 14:54:31 -08:00
fd476e4199 feat: 9 labels (add event_rescheduled/unrelated/digest), wildcard Other label, InvalidCharacterError fix 2026-02-27 14:34:15 -08:00
0e238a9e37 feat: initial avocet repo — email classifier training tool
Scrape → Store → Process pipeline for building email classifier
benchmark data across the CircuitForge menagerie.

- app/label_tool.py — Streamlit card-stack UI, multi-account IMAP fetch,
  6-bucket labeling, undo/skip, keyboard shortcuts (1-6/S/U)
- scripts/classifier_adapters.py — ZeroShotAdapter (+ two_pass),
  GLiClassAdapter, RerankerAdapter; ABC with lazy model loading
- scripts/benchmark_classifier.py — 13-model registry, --score,
  --compare, --list-models, --export-db; uses label_tool.yaml for IMAP
- tests/ — 20 tests, all passing, zero model downloads required
- config/label_tool.yaml.example — multi-account IMAP template
- data/email_score.jsonl.example — sample labeled data for CI

Labels: interview_scheduled, offer_received, rejected,
        positive_response, survey_received, neutral
2026-02-27 14:07:38 -08:00