avocet/tests
pyr0ball 8ba34bb2d1 feat(avocet): run_finetune, CLI, multi-score-file merge with last-write-wins dedup
- load_and_prepare_data() now accepts Path | list[Path]; single-Path callers unchanged
- Dedup by MD5(subject + body[:100]); last file/row wins (lets later runs correct labels)
- Prints summary line when duplicates are dropped
- Added _EmailDataset (TorchDataset wrapper), run_finetune(), and argparse CLI
- run_finetune() saves model + tokenizer + training_info.json with score_files provenance
- Stratified split guard: val set size clamped to at least n_classes (handles tiny example data)
- 3 new unit tests (merge, last-write-wins dedup, single-Path compat) + 1 integration test
- All 16 tests pass (15 unit + 1 integration)
2026-03-15 15:52:41 -07:00
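
The merge and dedup behaviour described in the commit message above can be sketched as follows. This is a minimal illustration, not the code from the repository: the helper names `dedup_key` and `merge_last_write_wins`, the dict-based row shape, and the `subject` / `body` / `label` field names are all assumptions, and the real `load_and_prepare_data()` reads from score files rather than in-memory lists.

```python
import hashlib


def dedup_key(subject: str, body: str) -> str:
    """Hypothetical helper: MD5 of the subject plus the first 100 chars of the body."""
    return hashlib.md5((subject + body[:100]).encode("utf-8")).hexdigest()


def merge_last_write_wins(batches):
    """Merge row batches in order; a later row replaces an earlier row with the same key.

    `batches` is a list of lists of dicts, standing in for rows loaded from
    several score files (assumed shape, for illustration only).
    Returns (merged_rows, n_dropped).
    """
    merged = {}
    total = 0
    for rows in batches:
        for row in rows:
            total += 1
            key = dedup_key(row.get("subject", ""), row.get("body", ""))
            merged[key] = row  # overwriting keeps the most recent row for this key
    dropped = total - len(merged)
    if dropped:
        # Summary line, printed only when duplicates were actually dropped
        print(f"Dropped {dropped} duplicate row(s); last occurrence kept")
    return list(merged.values()), dropped


first_run = [{"subject": "Interview invite", "body": "Hello...", "label": "neutral"}]
second_run = [
    {"subject": "Interview invite", "body": "Hello...", "label": "interview"},
    {"subject": "Newsletter", "body": "This week...", "label": "digest"},
]
rows, dropped = merge_last_write_wins([first_run, second_run])
```

Because the later batch overwrites the earlier entry, re-scoring an email in a newer file is enough to correct its label without editing the older file.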
__init__.py feat: initial avocet repo — email classifier training tool 2026-02-27 14:07:38 -08:00
test_api.py feat: add GET /api/fetch/stream SSE endpoint for real-time IMAP progress 2026-03-04 12:05:23 -08:00
test_benchmark_classifier.py fix(avocet): guard discover_finetuned_models against malformed/incomplete training_info.json 2026-03-15 15:18:13 -07:00
test_classifier_adapters.py fix(avocet): FineTunedAdapter GPU device routing + precise body truncation test 2026-03-15 10:56:47 -07:00
test_finetune.py feat(avocet): run_finetune, CLI, multi-score-file merge with last-write-wins dedup 2026-03-15 15:52:41 -07:00
test_imap_fetch.py feat: extract IMAP logic to app/imap_fetch.py for reuse by API 2026-03-04 11:42:22 -08:00
test_label_tool.py fix(avocet): strip HTML from email bodies — stdlib HTMLParser, no deps 2026-03-03 16:28:18 -08:00
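
The stratified split guard mentioned in the latest test_finetune.py commit (validation set size clamped to at least n_classes) might look roughly like the sketch below. The function name `clamped_val_size`, the `val_fraction` parameter, and its default are hypothetical; the commit message only states that the clamp exists, not how the requested size is computed.

```python
import math


def clamped_val_size(n_samples: int, n_classes: int, val_fraction: float = 0.25) -> int:
    """Hypothetical helper: validation set size for a stratified split.

    A stratified split needs at least one validation sample per class, so the
    requested size is raised to n_classes when the dataset is very small.
    """
    requested = math.ceil(n_samples * val_fraction)
    return max(requested, n_classes)


tiny = clamped_val_size(n_samples=10, n_classes=4)     # requested 3, clamped up to 4
normal = clamped_val_size(n_samples=100, n_classes=4)  # requested 25, no clamp needed
```

The clamp only takes effect on tiny datasets such as example data; on a realistically sized dataset the requested fraction already exceeds the class count.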