# Avocet — Email Classifier Training Tool

## What it is

Shared infrastructure for building and benchmarking email classifiers across the CircuitForge menagerie. Named for the avocet's sweeping-bill technique — it sweeps through email streams and filters out categories.

**Pipeline:**

```
Scrape (IMAP, wide search, multi-account) → data/email_label_queue.jsonl
        ↓
Label (card-stack UI) → data/email_score.jsonl
        ↓
Benchmark (HuggingFace NLI/reranker) → per-model macro-F1 + latency
```

## Environment

- Python env: `conda run -n job-seeker` for basic use (streamlit, yaml, stdlib only)
- Classifier env: `conda run -n job-seeker-classifiers` for benchmarks (transformers, FlagEmbedding, gliclass)
- Run tests: `/devl/miniconda3/envs/job-seeker/bin/pytest tests/ -v` (direct binary — `conda run pytest` can spawn runaway processes)
- Create classifier env: `conda env create -f environment.yml`

## Label Tool (app/label_tool.py)

Card-stack Streamlit UI for manually labeling recruitment emails.

```
conda run -n job-seeker streamlit run app/label_tool.py --server.port 8503
```

- Config: `config/label_tool.yaml` (gitignored — copy from `.example`, or use the ⚙️ Settings tab)
- Queue: `data/email_label_queue.jsonl` (gitignored)
- Output: `data/email_score.jsonl` (gitignored)
- Four tabs: 🃏 Label, 📥 Fetch, 📊 Stats, ⚙️ Settings
- Keyboard shortcuts: 1–9 = label, 0 = Other (wildcard, prompts free-text input), S = skip, U = undo
- Dedup: MD5 of `(subject + body[:100])` — cross-account safe

### Settings Tab (⚙️)

- Add / edit / remove IMAP accounts via form UI — no manual YAML editing required
- Per-account fields: display name, host, port, SSL toggle, username, password (masked), folder, days back
- **🔌 Test connection** button per account — connects, logs in, selects the folder, reports the message count
- Global: max emails per account per fetch
- **💾 Save** writes `config/label_tool.yaml`; **↩ Reload** discards unsaved changes
- `_sync_settings_to_state()` collects widget values before any add/remove to avoid index-key drift

## Benchmark (scripts/benchmark_classifier.py)

```
# List available models
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --list-models

# Score against labeled JSONL
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --score

# Visual comparison on live IMAP emails
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --compare --limit 20

# Include slow/large models
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --score --include-slow

# Export DB-labeled emails (⚠️ LLM-generated labels — review first)
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --export-db --db /path/to/staging.db
```

## Labels (peregrine defaults — configurable per product)

| Label | Key | Meaning |
|-------|-----|---------|
| `interview_scheduled` | 1 | Phone screen, video call, or on-site invitation |
| `offer_received` | 2 | Formal job offer or offer letter |
| `rejected` | 3 | Application declined or not moving forward |
| `positive_response` | 4 | Recruiter interest or request to connect |
| `survey_received` | 5 | Culture-fit survey or assessment invitation |
| `neutral` | 6 | ATS confirmation (application received, etc.) |
| `event_rescheduled` | 7 | Interview or event moved to a new time |
| `digest` | 8 | Job digest or multi-listing email (scrapeable) |
| `new_lead` | 9 | Unsolicited recruiter outreach or cold contact |
| `hired` | h | Offer accepted, onboarding, welcome email, start date |

## Model Registry (13 models, 7 defaults)

See `scripts/benchmark_classifier.py:MODEL_REGISTRY`. Default models run without `--include-slow`. Add `--models deberta-small deberta-small-2pass` to test a specific subset.
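The per-model macro-F1 that `--score` reports can be illustrated with a minimal sketch. The helper below is hypothetical — avocet's actual scoring lives in `scripts/benchmark_classifier.py` — but it shows the metric: per-label F1, averaged unweighted across labels, so rare labels count as much as common ones.

```python
from collections import defaultdict

def macro_f1(gold: list[str], pred: list[str]) -> float:
    """Unweighted mean of per-label F1 scores (macro-F1). Hypothetical sketch."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for label g
        else:
            fp[p] += 1          # predicted p, but it wasn't p
            fn[g] += 1          # was g, but we missed it
    f1s = []
    for label in set(gold) | set(pred):
        prec = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        rec = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s) if f1s else 0.0
```

Macro (rather than micro) averaging matters here because classes like `hired` or `offer_received` are far rarer than `neutral` in a real inbox.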
## Config Files

- `config/label_tool.yaml` — gitignored; multi-account IMAP config
- `config/label_tool.yaml.example` — committed template

## Data Files

- `data/email_score.jsonl` — gitignored; manually-labeled ground truth
- `data/email_score.jsonl.example` — committed sample for CI
- `data/email_label_queue.jsonl` — gitignored; IMAP fetch queue

## Key Design Notes

- `ZeroShotAdapter.load()` instantiates the pipeline object; `classify()` calls the object. Tests patch `scripts.classifier_adapters.pipeline` (the module-level factory) with a two-level mock: `mock_factory.return_value = MagicMock(return_value={...})`.
- `two_pass=True` on ZeroShotAdapter: first pass ranks all 6 labels; second pass re-runs with only the top 2, forcing a binary choice. 2× cost, better confidence.
- `--compare` uses the first account in `label_tool.yaml` for live IMAP emails.
- DB export labels are llama3.1:8b-generated — treat them as noisy, not gold truth.

## Vue Label UI (app/api.py + web/)

FastAPI on port 8503 serves both the REST API and the built Vue SPA (`web/dist/`).

```
./manage.sh start-api   # build Vue SPA + start FastAPI (binds 0.0.0.0:8503 — LAN accessible)
./manage.sh stop-api
./manage.sh open-api    # xdg-open http://localhost:8503
```

Logs: `log/api.log`

## Email Field Schema — IMPORTANT

Two schemas exist. The normalization layer in `app/api.py` bridges them automatically.
### JSONL on-disk schema (written by `label_tool.py`'s IMAP fetch)

| Field | Type | Notes |
|-------|------|-------|
| `subject` | str | Email subject line |
| `body` | str | Plain-text body, truncated at 800 chars; HTML stripped by `_strip_html()` |
| `from_addr` | str | Sender address string (`"Name <address>"`) |
| `date` | str | Raw RFC 2822 date string |
| `account` | str | Display name of the IMAP account that fetched it |
| *(no `id`)* | — | Dedup key is MD5 of `(subject + body[:100])` — never stored on disk |

### Vue API schema (returned by `GET /api/queue`, required by POST endpoints)

| Field | Type | Notes |
|-------|------|-------|
| `id` | str | MD5 content hash, or stored `id` if the item has one |
| `subject` | str | Unchanged |
| `body` | str | Unchanged |
| `from` | str | Mapped from `from_addr` (or `from` if already present) |
| `date` | str | Unchanged |
| `source` | str | Mapped from `account` (or `source` if already present) |

### Normalization layer (`_normalize()` in `app/api.py`)

`_normalize(item)` handles the mapping and ID generation. All `GET /api/queue` responses pass through it. Mutating endpoints (`/api/label`, `/api/skip`, `/api/discard`) look up items via `_normalize(x)["id"]`, so both real data (no `id`; uses the content hash) and test fixtures (explicit `id` field) work transparently.

### Peregrine integration

Peregrine's `staging.db` uses different field names again:

| staging.db column | Maps to avocet JSONL field |
|-------------------|----------------------------|
| `subject` | `subject` |
| `body` | `body` (may contain HTML — run through `_strip_html()` before queuing) |
| `from_address` | `from_addr` |
| `received_date` | `date` |
| `account` or source context | `account` |

When exporting from Peregrine's DB for avocet labeling, transform to the JSONL schema above (not the Vue API schema). The `--export-db` flag in `benchmark_classifier.py` does this.
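The mapping between the two schemas can be sketched as below. This is a hypothetical re-implementation, not the code in `app/api.py` — it only shows the contract the tables describe: pass `from`/`source` through if present, otherwise map from `from_addr`/`account`, and derive `id` from the content hash when none is stored.

```python
import hashlib

def normalize(item: dict) -> dict:
    """Sketch of the JSONL -> Vue API field mapping (hypothetical)."""
    subject = item.get("subject", "")
    body = item.get("body", "")
    return {
        # Stored id wins; otherwise derive the MD5 content hash used for dedup.
        "id": item.get("id") or hashlib.md5((subject + body[:100]).encode("utf-8")).hexdigest(),
        "subject": subject,
        "body": body,
        "from": item.get("from") or item.get("from_addr", ""),
        "date": item.get("date", ""),
        "source": item.get("source") or item.get("account", ""),
    }
```

Because the `id` is recomputed from content, looking items up via `normalize(x)["id"]` works for on-disk items (no `id`) and test fixtures (explicit `id`) alike.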
Any new export path should also call `_strip_html()` on the body before writing.

## Relationship to Peregrine

Avocet started as `peregrine/tools/label_tool.py` + `peregrine/scripts/classifier_adapters.py`. Peregrine retains copies during stabilization; once avocet is proven, peregrine will import from here.
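For reference, the "call `_strip_html()` before writing" rule for export paths expects roughly this kind of transformation — a minimal sketch under assumed behavior (tag removal, entity unescaping, whitespace collapse), not avocet's actual implementation:

```python
import re
from html import unescape

def strip_html(text: str) -> str:
    """Rough sketch of HTML stripping for email bodies (hypothetical)."""
    text = re.sub(r"<[^>]+>", " ", text)   # drop tags, leaving a space in their place
    text = unescape(text)                  # &nbsp;, &amp;, etc. -> characters
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
```

Tags are stripped before entities are unescaped so that an escaped `&lt;div&gt;` in the body text survives as literal text rather than being mistaken for a tag.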