Scrape → Store → Process pipeline for building email classifier benchmark data across the CircuitForge menagerie.

- `app/label_tool.py` — Streamlit card-stack UI, multi-account IMAP fetch, 6-bucket labeling, undo/skip, keyboard shortcuts (1–6/S/U)
- `scripts/classifier_adapters.py` — ZeroShotAdapter (+ two_pass), GLiClassAdapter, RerankerAdapter; ABC with lazy model loading
- `scripts/benchmark_classifier.py` — 13-model registry, `--score`, `--compare`, `--list-models`, `--export-db`; uses `label_tool.yaml` for IMAP
- `tests/` — 20 tests, all passing, zero model downloads required
- `config/label_tool.yaml.example` — multi-account IMAP template
- `data/email_score.jsonl.example` — sample labeled data for CI

Labels: `interview_scheduled`, `offer_received`, `rejected`, `positive_response`, `survey_received`, `neutral`
# Avocet — Email Classifier Training Tool

## What it is

Shared infrastructure for building and benchmarking email classifiers across the CircuitForge menagerie. Named for the avocet's sweeping-bill technique — it sweeps through email streams and filters out categories.
Pipeline:

```
Scrape (IMAP, wide search, multi-account) → data/email_label_queue.jsonl
        ↓
Label (card-stack UI) → data/email_score.jsonl
        ↓
Benchmark (HuggingFace NLI/reranker) → per-model macro-F1 + latency
```
## Environment

- Python env: `conda run -n job-seeker <cmd>` for basic use (streamlit, yaml, stdlib only)
- Classifier env: `conda run -n job-seeker-classifiers <cmd>` for benchmark (transformers, FlagEmbedding, gliclass)
- Run tests: `/devl/miniconda3/envs/job-seeker/bin/pytest tests/ -v` (direct binary — `conda run pytest` can spawn runaway processes)
- Create classifier env: `conda env create -f environment.yml`
## Label Tool (`app/label_tool.py`)

Card-stack Streamlit UI for manually labeling recruitment emails.

```bash
conda run -n job-seeker streamlit run app/label_tool.py --server.port 8503
```

- Config: `config/label_tool.yaml` (gitignored — copy from `.example`)
- Queue: `data/email_label_queue.jsonl` (gitignored)
- Output: `data/email_score.jsonl` (gitignored)
- Three tabs: 🃏 Label, 📥 Fetch, 📊 Stats
- Keyboard shortcuts: 1–6 = label, S = skip, U = undo
- Dedup: MD5 of `(subject + body[:100])` — cross-account safe
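The dedup key above can be sketched in a few lines of stdlib Python (the function name is illustrative — the tool's actual helper may be named differently):

```python
import hashlib


def dedup_key(subject: str, body: str) -> str:
    """MD5 over subject + first 100 chars of body.

    Account-independent, so the same email fetched from two IMAP
    accounts collapses to a single queue entry.
    """
    return hashlib.md5((subject + body[:100]).encode("utf-8")).hexdigest()


# Same message seen from two accounts produces an identical key,
# even if the bodies diverge after the first 100 characters.
a = dedup_key("Interview invite", "Hi, are you free Tuesday?" + "x" * 500)
b = dedup_key("Interview invite", "Hi, are you free Tuesday?" + "x" * 500)
assert a == b
```

Truncating the body to 100 characters keeps the key stable against per-account footers and tracking pixels appended further down the message.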
## Benchmark (`scripts/benchmark_classifier.py`)

```bash
# List available models
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --list-models

# Score against labeled JSONL
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --score

# Visual comparison on live IMAP emails
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --compare --limit 20

# Include slow/large models
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --score --include-slow

# Export DB-labeled emails (⚠️ LLM-generated labels — review first)
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --export-db --db /path/to/staging.db
```
## Labels (peregrine defaults — configurable per product)

| Label | Meaning |
|---|---|
| `interview_scheduled` | Phone screen, video call, or on-site invitation |
| `offer_received` | Formal job offer or offer letter |
| `rejected` | Application declined or not moving forward |
| `positive_response` | Recruiter interest or request to connect |
| `survey_received` | Culture-fit survey or assessment invitation |
| `neutral` | ATS confirmation or unrelated email |
## Model Registry (13 models, 7 defaults)

See `scripts/benchmark_classifier.py:MODEL_REGISTRY`. Default models run without `--include-slow`. Add `--models deberta-small deberta-small-2pass` to test a specific subset.
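The registry's real schema lives in the script; a plausible shape, sketched with illustrative field names and made-up Hugging Face IDs (only `deberta-small` and `deberta-small-2pass` are confirmed names from the README), might look like:

```python
# Hypothetical sketch of MODEL_REGISTRY — the keys "hf_id", "slow", and
# "two_pass" are assumptions; check scripts/benchmark_classifier.py for
# the real structure and the full 13-model list.
MODEL_REGISTRY = {
    "deberta-small": {"hf_id": "example-org/deberta-v3-small-nli", "slow": False},
    "deberta-small-2pass": {
        "hf_id": "example-org/deberta-v3-small-nli",
        "slow": False,
        "two_pass": True,
    },
    "big-reranker": {"hf_id": "example-org/big-reranker", "slow": True},
}


def selected_models(include_slow: bool = False) -> list[str]:
    """Default run skips entries flagged slow; --include-slow adds them."""
    return [
        name
        for name, cfg in MODEL_REGISTRY.items()
        if include_slow or not cfg.get("slow", False)
    ]
```

Gating the large models behind a flag keeps the default `--score` run fast enough for iterative labeling sessions.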
## Config Files

- `config/label_tool.yaml` — gitignored; multi-account IMAP config
- `config/label_tool.yaml.example` — committed template
## Data Files

- `data/email_score.jsonl` — gitignored; manually-labeled ground truth
- `data/email_score.jsonl.example` — committed sample for CI
- `data/email_label_queue.jsonl` — gitignored; IMAP fetch queue
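The scored file is one JSON object per line. The exact field names are not documented here, so this sketch assumes a minimal `subject`/`body`/`label` schema — check `data/email_score.jsonl.example` for the real one:

```python
import json
import pathlib
import tempfile

# Hypothetical record shape for data/email_score.jsonl — field names
# are assumptions, not the tool's documented schema.
records = [
    {"subject": "Interview availability", "body": "Are you free Tuesday?",
     "label": "interview_scheduled"},
    {"subject": "Application update", "body": "We decided not to proceed.",
     "label": "rejected"},
]

path = pathlib.Path(tempfile.mkdtemp()) / "email_score.jsonl"
with path.open("w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")  # one object per line, no trailing commas

# Round-trip: each line parses independently, which is what makes JSONL
# safe to append to mid-labeling-session.
loaded = [json.loads(line) for line in path.read_text(encoding="utf-8").splitlines()]
assert loaded == records
```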
## Key Design Notes

- `ZeroShotAdapter.load()` instantiates the pipeline object; `classify()` calls the object. Tests patch `scripts.classifier_adapters.pipeline` (the module-level factory) with a two-level mock: `mock_factory.return_value = MagicMock(return_value={...})`.
- `two_pass=True` on ZeroShotAdapter: first pass ranks all 6 labels; second pass re-runs with only the top 2, forcing a binary choice. 2× cost, better confidence.
- `--compare` uses the first account in `label_tool.yaml` for live IMAP emails.
- DB export labels are llama3.1:8b-generated — treat as noisy, not gold truth.
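The two-pass trick can be sketched independently of any model. Here `score` is a stand-in for a zero-shot pipeline call (the real adapter wraps a HuggingFace pipeline, and `two_pass_classify` and `toy_score` are illustrative names, not the adapter's API):

```python
from typing import Callable

LABELS = ["interview_scheduled", "offer_received", "rejected",
          "positive_response", "survey_received", "neutral"]


def two_pass_classify(text: str,
                      score: Callable[[str, list[str]], dict[str, float]],
                      labels: list[str] = LABELS) -> str:
    """Pass 1 ranks all labels; pass 2 re-scores only the top 2,
    forcing a binary choice (2x inference cost, sharper confidence)."""
    first = score(text, labels)
    top2 = sorted(first, key=first.get, reverse=True)[:2]
    second = score(text, top2)          # model now sees only two options
    return max(second, key=second.get)


# Toy scorer: counts label-word overlap with the text — a stand-in for
# a real NLI model, just to make the control flow runnable.
def toy_score(text: str, labels: list[str]) -> dict[str, float]:
    return {lab: sum(w in text.lower() for w in lab.split("_")) for lab in labels}


print(two_pass_classify("Your interview is scheduled for Monday", toy_score))
# → interview_scheduled
```

The second pass matters because zero-shot NLI scores tend to flatten as the candidate-label count grows; restricting to the top 2 turns a diffuse 6-way ranking into a decisive head-to-head.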
## Relationship to Peregrine

Avocet started as `peregrine/tools/label_tool.py` + `peregrine/scripts/classifier_adapters.py`. Peregrine retains copies during stabilization; once avocet is proven, peregrine will import from here.