avocet/CLAUDE.md

# Avocet — Email Classifier Training Tool
## What it is
Shared infrastructure for building and benchmarking email classifiers across the CircuitForge menagerie.
Named for the avocet's sweeping-bill technique — it sweeps through email streams and filters out categories.
**Pipeline:**
```
Scrape (IMAP, wide search, multi-account) → data/email_label_queue.jsonl
Label (card-stack UI) → data/email_score.jsonl
Benchmark (HuggingFace NLI/reranker) → per-model macro-F1 + latency
```
## Environment
- Python env: `conda run -n job-seeker <cmd>` for basic use (streamlit, yaml, stdlib only)
- Classifier env: `conda run -n job-seeker-classifiers <cmd>` for benchmark (transformers, FlagEmbedding, gliclass)
- Run tests: `/devl/miniconda3/envs/job-seeker/bin/pytest tests/ -v`
(direct binary — `conda run pytest` can spawn runaway processes)
- Create classifier env: `conda env create -f environment.yml`
## Label Tool (app/label_tool.py)
Card-stack Streamlit UI for manually labeling recruitment emails.
```
conda run -n job-seeker streamlit run app/label_tool.py --server.port 8503
```
- Config: `config/label_tool.yaml` (gitignored — copy from `.example`, or use ⚙️ Settings tab)
- Queue: `data/email_label_queue.jsonl` (gitignored)
- Output: `data/email_score.jsonl` (gitignored)
- Four tabs: 🃏 Label, 📥 Fetch, 📊 Stats, ⚙️ Settings
- Keyboard shortcuts: 1–9 = label, 0 = Other (wildcard, prompts free-text input), S = skip, U = undo
- Dedup: MD5 of `(subject + body[:100])` — cross-account safe
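A minimal sketch of the dedup key described above — illustrative only; the exact concatenation in `label_tool.py` may differ:

```python
import hashlib

def dedup_key(subject: str, body: str) -> str:
    # MD5 of subject + first 100 chars of body; works across accounts
    # because it hashes content, not mailbox-specific IDs.
    return hashlib.md5((subject + body[:100]).encode("utf-8")).hexdigest()
```

Because only `body[:100]` is hashed, two emails that differ after the first 100 body characters still collide — acceptable here, since such near-duplicates are almost always the same campaign email.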
### Settings Tab (⚙️)
- Add / edit / remove IMAP accounts via form UI — no manual YAML editing required
- Per-account fields: display name, host, port, SSL toggle, username, password (masked), folder, days back
- **🔌 Test connection** button per account — connects, logs in, selects folder, reports message count
- Global: max emails per account per fetch
- **💾 Save** writes `config/label_tool.yaml`; **↩ Reload** discards unsaved changes
- `_sync_settings_to_state()` collects widget values before any add/remove to avoid index-key drift
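The index-key drift the last bullet guards against can be sketched like this. All names here (`sync_settings_to_state`, the `acct_{i}_{field}` key scheme, the plain dict standing in for `st.session_state`) are hypothetical, not the actual implementation:

```python
def sync_settings_to_state(session_state: dict, accounts: list) -> None:
    # Widget keys are derived from list position ("acct_0_host", ...), so
    # widget values must be copied back into the accounts list BEFORE an
    # add/remove shifts the indices — otherwise account 1's edits land on
    # whatever account now occupies slot 1.
    for i, acct in enumerate(accounts):
        for field in ("host", "port", "username"):
            key = f"acct_{i}_{field}"
            if key in session_state:
                acct[field] = session_state[key]
```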
## Benchmark (scripts/benchmark_classifier.py)
```
# List available models
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --list-models
# Score against labeled JSONL
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --score
# Visual comparison on live IMAP emails
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --compare --limit 20
# Include slow/large models
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --score --include-slow
# Export DB-labeled emails (⚠️ LLM-generated labels — review first)
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --export-db --db /path/to/staging.db
```
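For reference, the macro-F1 the benchmark reports is the unweighted mean of per-label F1 scores. A stdlib-only sketch of the metric (not the script's actual implementation, which may use sklearn):

```python
def macro_f1(y_true, y_pred):
    # Unweighted mean of per-label F1 over every label that appears
    # in either the ground truth or the predictions.
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```

Macro averaging weights rare labels (e.g. `offer_received`) equally with common ones (`digest`), which is why it's preferred over accuracy for this imbalanced label set.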
## Labels (peregrine defaults — configurable per product)
| Label | Key | Meaning |
|-------|-----|---------|
| `interview_scheduled` | 1 | Phone screen, video call, or on-site invitation |
| `offer_received` | 2 | Formal job offer or offer letter |
| `rejected` | 3 | Application declined or not moving forward |
| `positive_response` | 4 | Recruiter interest or request to connect |
| `survey_received` | 5 | Culture-fit survey or assessment invitation |
| `neutral` | 6 | ATS confirmation (application received, etc.) |
| `event_rescheduled` | 7 | Interview or event moved to a new time |
| `digest` | 8 | Job digest or multi-listing email (scrapeable) |
| `new_lead` | 9 | Unsolicited recruiter outreach or cold contact |
| `hired` | h | Offer accepted, onboarding, welcome email, start date |
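The key-to-label mapping from the table, as a dict (the constant name is illustrative; the tool's internal name may differ):

```python
LABEL_KEYS = {
    "1": "interview_scheduled",
    "2": "offer_received",
    "3": "rejected",
    "4": "positive_response",
    "5": "survey_received",
    "6": "neutral",
    "7": "event_rescheduled",
    "8": "digest",
    "9": "new_lead",
    "h": "hired",  # non-numeric key; "0" is reserved for the free-text "Other" wildcard
}
```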
## Model Registry (13 models, 7 defaults)
See `scripts/benchmark_classifier.py:MODEL_REGISTRY`.
Default models run without `--include-slow`.
Add `--models deberta-small deberta-small-2pass` to test a specific subset.
## Config Files
- `config/label_tool.yaml` — gitignored; multi-account IMAP config
- `config/label_tool.yaml.example` — committed template
## Data Files
- `data/email_score.jsonl` — gitignored; manually-labeled ground truth
- `data/email_score.jsonl.example` — committed sample for CI
- `data/email_label_queue.jsonl` — gitignored; IMAP fetch queue
## Key Design Notes
- `ZeroShotAdapter.load()` instantiates the pipeline object; `classify()` calls the object.
Tests patch `scripts.classifier_adapters.pipeline` (the module-level factory) with a
two-level mock: `mock_factory.return_value = MagicMock(return_value={...})`.
- `two_pass=True` on ZeroShotAdapter: first pass ranks all 6 labels; second pass re-runs
with only top-2, forcing a binary choice. 2× cost, better confidence.
- `--compare` uses the first account in `label_tool.yaml` for live IMAP emails.
- DB export labels are llama3.1:8b-generated — treat as noisy, not gold truth.
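The two-level mock from the first bullet, sketched in isolation (in real tests the factory is patched at `scripts.classifier_adapters.pipeline`; the model name and labels below are placeholders):

```python
from unittest.mock import MagicMock

# Level 1: the factory (module-level `pipeline`) returns a pipeline object.
# Level 2: that object is itself callable and returns the classification dict.
mock_factory = MagicMock()
mock_factory.return_value = MagicMock(
    return_value={"labels": ["positive_response"], "scores": [0.91]}
)

# What the adapter does, roughly:
pipe = mock_factory("zero-shot-classification", model="placeholder/model")  # load()
result = pipe("email text", candidate_labels=["positive_response"])         # classify()
```

A single-level mock fails here because `load()` and `classify()` consume different callables; both levels need a `return_value`.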
## Vue Label UI (app/api.py + web/)
FastAPI on port 8503 serves both the REST API and the built Vue SPA (`web/dist/`).
```
./manage.sh start-api # build Vue SPA + start FastAPI (binds 0.0.0.0:8503 — LAN accessible)
./manage.sh stop-api
./manage.sh open-api # xdg-open http://localhost:8503
```
Logs: `log/api.log`
## Email Field Schema — IMPORTANT
Two schemas exist. The normalization layer in `app/api.py` bridges them automatically.
### JSONL on-disk schema (written by `label_tool.py`, including its IMAP fetch)
| Field | Type | Notes |
|-------|------|-------|
| `subject` | str | Email subject line |
| `body` | str | Plain-text body, truncated at 800 chars; HTML stripped by `_strip_html()` |
| `from_addr` | str | Sender address string (`"Name <addr>"`) |
| `date` | str | Raw RFC 2822 date string |
| `account` | str | Display name of the IMAP account that fetched it |
| *(no `id`)* | — | Dedup key is MD5 of `(subject + body[:100])` — never stored on disk |
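A sample on-disk record matching the table above (all values illustrative):

```python
import json

record = {
    "subject": "Interview availability?",
    "body": "Hi, are you free Tuesday for a phone screen?",
    "from_addr": "Jane Recruiter <jane@example.com>",
    "date": "Mon, 03 Jun 2024 09:15:00 -0700",
    "account": "personal-gmail",
}
# One JSON object per line in data/email_label_queue.jsonl — note: no "id" field.
line = json.dumps(record)
```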
### Vue API schema (returned by `GET /api/queue`, required by POST endpoints)
| Field | Type | Notes |
|-------|------|-------|
| `id` | str | MD5 content hash, or stored `id` if item has one |
| `subject` | str | Unchanged |
| `body` | str | Unchanged |
| `from` | str | Mapped from `from_addr` (or `from` if already present) |
| `date` | str | Unchanged |
| `source` | str | Mapped from `account` (or `source` if already present) |
### Normalization layer (`_normalize()` in `app/api.py`)
`_normalize(item)` handles the mapping and ID generation. All `GET /api/queue` responses
pass through it. Mutating endpoints (`/api/label`, `/api/skip`, `/api/discard`) look up
items via `_normalize(x)["id"]`, so both real data (no `id`, uses content hash) and test
fixtures (explicit `id` field) work transparently.
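A minimal sketch of the mapping `_normalize()` performs, per the two tables above. This assumes the content hash matches the dedup key (MD5 of `subject + body[:100]`); the real function in `app/api.py` is authoritative:

```python
import hashlib

def normalize(item: dict) -> dict:
    # JSONL schema in, Vue API schema out. Keeps an explicit "id" if the
    # item already has one (test fixtures); otherwise derives the content hash.
    return {
        "id": item.get("id") or hashlib.md5(
            (item.get("subject", "") + item.get("body", "")[:100]).encode("utf-8")
        ).hexdigest(),
        "subject": item.get("subject", ""),
        "body": item.get("body", ""),
        "from": item.get("from", item.get("from_addr", "")),
        "date": item.get("date", ""),
        "source": item.get("source", item.get("account", "")),
    }
```

Because the hash is deterministic, a mutating endpoint can recompute the same `id` from the raw JSONL item without any `id` ever touching disk.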
### Peregrine integration
Peregrine's `staging.db` uses different field names again:
| staging.db column | Maps to avocet JSONL field |
|-------------------|---------------------------|
| `subject` | `subject` |
| `body` | `body` (may contain HTML — run through `_strip_html()` before queuing) |
| `from_address` | `from_addr` |
| `received_date` | `date` |
| `account` or source context | `account` |
When exporting from Peregrine's DB for avocet labeling, transform to the JSONL schema above
(not the Vue API schema). The `--export-db` flag in `benchmark_classifier.py` does this.
Any new export path should also call `_strip_html()` on the body before writing.
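The column mapping above can be sketched as follows. The regex-based `strip_html` is a crude stand-in for the real `_strip_html()`, and the `"peregrine-db"` fallback account name is an invented placeholder:

```python
import re

def strip_html(text: str) -> str:
    # Stand-in for _strip_html(): drop tags, collapse whitespace.
    return re.sub(r"\s+", " ", re.sub(r"<[^>]+>", " ", text)).strip()

def staging_row_to_jsonl(row: dict) -> dict:
    # staging.db columns -> avocet JSONL schema (NOT the Vue API schema).
    return {
        "subject": row["subject"],
        "body": strip_html(row["body"])[:800],   # JSONL bodies are 800-char, HTML-free
        "from_addr": row["from_address"],
        "date": row["received_date"],
        "account": row.get("account", "peregrine-db"),  # placeholder fallback
    }
```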
## Relationship to Peregrine
Avocet started as `peregrine/tools/label_tool.py` + `peregrine/scripts/classifier_adapters.py`.
Peregrine retains copies during stabilization; once avocet is proven, peregrine will import from here.