Avocet by Circuit Forge LLC — email classifier training tool: multi-account IMAP fetch, card-stack labeling UI, benchmark harness
pyr0ball 8ba34bb2d1 feat(avocet): run_finetune, CLI, multi-score-file merge with last-write-wins dedup
- load_and_prepare_data() now accepts Path | list[Path]; single-Path callers unchanged
- Dedup by MD5(subject + body[:100]); last file/row wins (lets later runs correct labels)
- Prints summary line when duplicates are dropped
- Added _EmailDataset (TorchDataset wrapper), run_finetune(), and argparse CLI
- run_finetune() saves model + tokenizer + training_info.json with score_files provenance
- Stratified split guard: val set size clamped to at least n_classes (handles tiny example data)
- 3 new unit tests (merge, last-write-wins dedup, single-Path compat) + 1 integration test
- All 16 tests pass (15 unit + 1 integration)
2026-03-15 15:52:41 -07:00
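The merge and split-guard behavior described in the commit message above can be sketched roughly as follows. This is an illustrative approximation, not the code in the repository: the dict-per-row shape, the helper names (dedup_key, merge_last_write_wins, clamp_val_size), the "label" field, and the 0.2 default validation fraction are all assumptions. Only the MD5(subject + body[:100]) key, the last-write-wins rule, the summary line, and the clamp to at least n_classes come from the commit notes.

```python
import hashlib


def dedup_key(row: dict) -> str:
    """MD5 of the subject plus the first 100 characters of the body."""
    text = row.get("subject", "") + row.get("body", "")[:100]
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def merge_last_write_wins(files: list[list[dict]]) -> list[dict]:
    """Merge rows from several score files (hypothetical in-memory form).

    Files are processed in order, so a row from a later file (or a later
    row in the same file) overwrites an earlier row with the same key.
    """
    merged: dict[str, dict] = {}
    total = 0
    for rows in files:
        for row in rows:
            total += 1
            # Reassigning an existing key replaces the value but keeps
            # the key's original position in the dict.
            merged[dedup_key(row)] = row
    dropped = total - len(merged)
    if dropped:
        print(f"Dropped {dropped} duplicate row(s); kept {len(merged)}")
    return list(merged.values())


def clamp_val_size(n_samples: int, n_classes: int, val_fraction: float = 0.2) -> int:
    """Validation set size, never smaller than the number of classes.

    A stratified split needs at least one validation example per class,
    which a plain fraction cannot guarantee on very small datasets.
    """
    return max(n_classes, round(n_samples * val_fraction))
```

With this sketch, passing an older score file followed by a corrected one leaves a single row per email carrying the corrected label, and clamp_val_size(10, 4) yields 4 rather than the 2 that a bare 20% split would give.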
app feat(avocet): benchmark UI, label fixes, BenchmarkView with charts and SSE run 2026-03-15 09:39:37 -07:00
config feat: initial avocet repo — email classifier training tool 2026-02-27 14:07:38 -08:00
data feat: initial avocet repo — email classifier training tool 2026-02-27 14:07:38 -08:00
docs feat(avocet): benchmark UI, label fixes, BenchmarkView with charts and SSE run 2026-03-15 09:39:37 -07:00
scripts feat(avocet): run_finetune, CLI, multi-score-file merge with last-write-wins dedup 2026-03-15 15:52:41 -07:00
tests feat(avocet): run_finetune, CLI, multi-score-file merge with last-write-wins dedup 2026-03-15 15:52:41 -07:00
web feat(avocet): benchmark UI, label fixes, BenchmarkView with charts and SSE run 2026-03-15 09:39:37 -07:00
.gitignore feat: initial avocet repo — email classifier training tool 2026-02-27 14:07:38 -08:00
CLAUDE.md docs(avocet): document email field schemas and normalization layer 2026-03-03 18:43:41 -08:00
environment.yml chore(avocet): add scikit-learn to classifier env 2026-03-15 09:44:04 -07:00
manage.sh fix(avocet): normalize queue schema + bind to 0.0.0.0 for LAN access 2026-03-03 18:43:00 -08:00
PRIVACY.md docs: add privacy policy reference 2026-03-05 20:59:37 -08:00
pytest.ini feat: initial avocet repo — email classifier training tool 2026-02-27 14:07:38 -08:00
requirements.txt fix(avocet): store original item in _last_action; add requirements.txt 2026-03-03 15:16:54 -08:00