# Avocet — Email Classifier Training Tool

## What it is

Shared infrastructure for building and benchmarking email classifiers across the CircuitForge menagerie. Named for the avocet's sweeping-bill feeding technique — it sweeps through email streams and filters messages into categories.
Pipeline:

```
Scrape (IMAP, wide search, multi-account) → data/email_label_queue.jsonl
        ↓
Label (card-stack UI)                     → data/email_score.jsonl
        ↓
Benchmark (HuggingFace NLI/reranker)      → per-model macro-F1 + latency
```
## Environment

- Python env: `conda run -n job-seeker <cmd>` for basic use (streamlit, yaml, stdlib only)
- Classifier env: `conda run -n job-seeker-classifiers <cmd>` for benchmarks (transformers, FlagEmbedding, gliclass)
- Run tests: `/devl/miniconda3/envs/job-seeker/bin/pytest tests/ -v` (direct binary — `conda run pytest` can spawn runaway processes)
- Create classifier env: `conda env create -f environment.yml`
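The committed `environment.yml` is authoritative; for orientation, a minimal file along these lines would cover the classifier dependencies named above (channels and pinned versions are illustrative assumptions):

```yaml
# Illustrative shape only — see the committed environment.yml for the real spec.
name: job-seeker-classifiers
channels:
  - conda-forge
dependencies:
  - python=3.11
  - pip
  - pip:
      - transformers
      - FlagEmbedding
      - gliclass
```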
## Label Tool (`app/label_tool.py`)

Card-stack Streamlit UI for manually labeling recruitment emails.

```shell
conda run -n job-seeker streamlit run app/label_tool.py --server.port 8503
```
- Config: `config/label_tool.yaml` (gitignored — copy from `.example`, or use the ⚙️ Settings tab)
- Queue: `data/email_label_queue.jsonl` (gitignored)
- Output: `data/email_score.jsonl` (gitignored)
- Four tabs: 🃏 Label, 📥 Fetch, 📊 Stats, ⚙️ Settings
- Keyboard shortcuts: 1–9 = label, 0 = Other (wildcard, prompts free-text input), S = skip, U = undo
- Dedup: MD5 of `(subject + body[:100])` — cross-account safe
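The dedup hash is easy to reproduce — a minimal sketch (the function name is illustrative; the real implementation lives in `app/label_tool.py`):

```python
import hashlib

def dedup_key(subject: str, body: str) -> str:
    """MD5 of subject + first 100 body chars — the label tool's dedup key."""
    return hashlib.md5((subject + body[:100]).encode("utf-8")).hexdigest()
```

Because the key ignores everything past the first 100 body characters and never includes the account, the same message fetched from two accounts hashes identically and dedups to one queue entry.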
### Settings Tab (⚙️)

- Add / edit / remove IMAP accounts via a form UI — no manual YAML editing required
- Per-account fields: display name, host, port, SSL toggle, username, password (masked), folder, days back
- 🔌 Test connection button per account — connects, logs in, selects the folder, reports the message count
- Global: max emails per account per fetch
- 💾 Save writes `config/label_tool.yaml`; ↩ Reload discards unsaved changes
- `_sync_settings_to_state()` collects widget values before any add/remove to avoid index-key drift
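The "sync before mutate" pattern behind `_sync_settings_to_state()` can be shown in isolation. This is a pure-Python sketch (the real function lives in the Streamlit app and reads `st.session_state`; the key format and field names here are assumptions): widget values keyed by positional index are folded back into the canonical account list before any add/remove, so re-rendering with shifted indices cannot mismatch values.

```python
def sync_settings_to_state(widget_state: dict, accounts: list) -> list:
    """Fold index-keyed widget values back into the canonical account list
    BEFORE any add/remove, so positional widget keys can't drift."""
    for i, acct in enumerate(accounts):
        for field in ("name", "host", "port", "username"):
            key = f"acct_{i}_{field}"  # assumed key format, e.g. "acct_0_host"
            if key in widget_state:
                acct[field] = widget_state[key]
    return accounts
```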
## Benchmark (`scripts/benchmark_classifier.py`)

```shell
# List available models
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --list-models

# Score against labeled JSONL
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --score

# Visual comparison on live IMAP emails
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --compare --limit 20

# Include slow/large models
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --score --include-slow

# Export DB-labeled emails (⚠️ LLM-generated labels — review first)
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --export-db --db /path/to/staging.db
```
## Labels (peregrine defaults — configurable per product)

| Label | Key | Meaning |
|---|---|---|
| `interview_scheduled` | 1 | Phone screen, video call, or on-site invitation |
| `offer_received` | 2 | Formal job offer or offer letter |
| `rejected` | 3 | Application declined or not moving forward |
| `positive_response` | 4 | Recruiter interest or request to connect |
| `survey_received` | 5 | Culture-fit survey or assessment invitation |
| `neutral` | 6 | ATS confirmation (application received, etc.) |
| `event_rescheduled` | 7 | Interview or event moved to a new time |
| `digest` | 8 | Job digest or multi-listing email (scrapeable) |
| `new_lead` | 9 | Unsolicited recruiter outreach or cold contact |
| `hired` | h | Offer accepted, onboarding, welcome email, start date |
## Model Registry (13 models, 7 defaults)

See `MODEL_REGISTRY` in `scripts/benchmark_classifier.py`.
Default models run without `--include-slow`.
Add `--models deberta-small deberta-small-2pass` to test a specific subset.
## Config Files

- `config/label_tool.yaml` — gitignored; multi-account IMAP config
- `config/label_tool.yaml.example` — committed template

## Data Files

- `data/email_score.jsonl` — gitignored; manually labeled ground truth
- `data/email_score.jsonl.example` — committed sample for CI
- `data/email_label_queue.jsonl` — gitignored; IMAP fetch queue
## Key Design Notes

- `ZeroShotAdapter.load()` instantiates the pipeline object; `classify()` calls the object. Tests patch `scripts.classifier_adapters.pipeline` (the module-level factory) with a two-level mock: `mock_factory.return_value = MagicMock(return_value={...})`.
- `two_pass=True` on ZeroShotAdapter: the first pass ranks all 6 labels; the second pass re-runs with only the top 2, forcing a binary choice. 2× cost, better confidence.
- `--compare` uses the first account in `label_tool.yaml` for live IMAP emails.
- DB export labels are generated by llama3.1:8b — treat them as noisy, not gold truth.
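The two-level mock is worth seeing in isolation: the factory call (what `load()` does) must return an object that is itself callable (what `classify()` does). A minimal sketch with invented result values:

```python
from unittest.mock import MagicMock

# Level 1: the patched factory (stands in for the module-level `pipeline`).
# Level 2: the object it returns, whose *call* yields the fake result.
mock_factory = MagicMock()
mock_factory.return_value = MagicMock(
    return_value={"labels": ["rejected"], "scores": [0.97]}
)

clf = mock_factory("zero-shot-classification")     # what load() does
result = clf("Unfortunately, we have decided...")  # what classify() does
```

A single-level `MagicMock(return_value={...})` would make the factory call itself return the dict, and `classify()` would then fail trying to call a dict — hence the two levels.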
## Vue Label UI (`app/api.py` + `web/`)

FastAPI on port 8503 serves both the REST API and the built Vue SPA (`web/dist/`).

```shell
./manage.sh start-api   # build Vue SPA + start FastAPI (binds 0.0.0.0:8503 — LAN accessible)
./manage.sh stop-api
./manage.sh open-api    # xdg-open http://localhost:8503
```

Logs: `log/api.log`
## Email Field Schema — IMPORTANT

Two schemas exist. The normalization layer in `app/api.py` bridges them automatically.

### JSONL on-disk schema (written by `label_tool.py`, including its IMAP fetch)

| Field | Type | Notes |
|---|---|---|
| `subject` | str | Email subject line |
| `body` | str | Plain-text body, truncated at 800 chars; HTML stripped by `_strip_html()` |
| `from_addr` | str | Sender address string (`"Name <addr>"`) |
| `date` | str | Raw RFC 2822 date string |
| `account` | str | Display name of the IMAP account that fetched it |
| (no `id`) | — | Dedup key is MD5 of `(subject + body[:100])` — never stored on disk |
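A record conforming to this schema might look like the following (all field values invented for illustration):

```python
import json

# Illustrative on-disk record — one JSON object per line in the .jsonl file.
record = {
    "subject": "Interview availability",
    "body": "Hi, are you free Thursday for a 30-minute phone screen?",
    "from_addr": "Jane Recruiter <jane@example.com>",
    "date": "Tue, 04 Jun 2024 09:15:00 -0700",
    "account": "personal-gmail",
}
line = json.dumps(record)  # note: no "id" field is ever written
```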
### Vue API schema (returned by `GET /api/queue`, required by POST endpoints)

| Field | Type | Notes |
|---|---|---|
| `id` | str | MD5 content hash, or the stored `id` if the item has one |
| `subject` | str | Unchanged |
| `body` | str | Unchanged |
| `from` | str | Mapped from `from_addr` (or `from` if already present) |
| `date` | str | Unchanged |
| `source` | str | Mapped from `account` (or `source` if already present) |
### Normalization layer (`_normalize()` in `app/api.py`)

`_normalize(item)` handles the mapping and ID generation. All `GET /api/queue` responses pass through it. Mutating endpoints (`/api/label`, `/api/skip`, `/api/discard`) look up items via `_normalize(x)["id"]`, so both real data (no `id`, uses the content hash) and test fixtures (explicit `id` field) work transparently.
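A sketch of that mapping, assuming the generated ID is the same MD5 content hash the label tool uses for dedup (the real `_normalize()` in `app/api.py` is authoritative):

```python
import hashlib

def normalize(item: dict) -> dict:
    """On-disk JSONL record → Vue API shape; generates an id when absent."""
    subject = item.get("subject", "")
    body = item.get("body", "")
    return {
        "id": item.get("id")
        or hashlib.md5((subject + body[:100]).encode("utf-8")).hexdigest(),
        "subject": subject,
        "body": body,
        "from": item.get("from") or item.get("from_addr", ""),
        "date": item.get("date", ""),
        "source": item.get("source") or item.get("account", ""),
    }
```

The `or` fallbacks are what let both on-disk records (`from_addr`/`account`, no `id`) and test fixtures (already in API shape) pass through the same function.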
### Peregrine integration

Peregrine's `staging.db` uses different field names again:

| staging.db column | Maps to avocet JSONL field |
|---|---|
| `subject` | `subject` |
| `body` | `body` (may contain HTML — run through `_strip_html()` before queuing) |
| `from_address` | `from_addr` |
| `received_date` | `date` |
| account or source context | `account` |
When exporting from Peregrine's DB for avocet labeling, transform to the JSONL schema above (not the Vue API schema). The `--export-db` flag in `benchmark_classifier.py` does this. Any new export path should also call `_strip_html()` on the body before writing.
## Relationship to Peregrine

Avocet started as `peregrine/tools/label_tool.py` + `peregrine/scripts/classifier_adapters.py`. Peregrine retains copies during stabilization; once avocet is proven, peregrine will import from here.