Avocet — Email Classifier Training Tool

What it is

Shared infrastructure for building and benchmarking email classifiers across the CircuitForge menagerie. Named for the avocet's sweeping-bill feeding technique: the tool sweeps through email streams and filters messages into categories.

Pipeline:

Scrape (IMAP, wide search, multi-account) → data/email_label_queue.jsonl
                ↓
Label (card-stack UI)                      → data/email_score.jsonl
                ↓
Benchmark (HuggingFace NLI/reranker)       → per-model macro-F1 + latency
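The pipeline's intermediate files are JSONL (one JSON object per line). A minimal sketch of the record shapes at each stage; the field names here are assumptions, not the canonical schema, so check app/label_tool.py for the real keys:

```python
import json

# Hypothetical record shapes (field names are assumptions).
queue_record = {                     # one line of data/email_label_queue.jsonl
    "account": "work",
    "subject": "Interview availability next week",
    "body": "Hi, are you free Tuesday for a phone screen?",
}
# Labeling adds the chosen bucket, producing data/email_score.jsonl:
score_record = {**queue_record, "label": "interview_scheduled"}

line = json.dumps(score_record)      # JSONL: one object serialized per line
parsed = json.loads(line)
```

The benchmark stage then reads the labeled file back the same way, line by line.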

Environment

  • Python env: conda run -n job-seeker <cmd> for basic use (streamlit, yaml, stdlib only)
  • Classifier env: conda run -n job-seeker-classifiers <cmd> for benchmark (transformers, FlagEmbedding, gliclass)
  • Run tests: /devl/miniconda3/envs/job-seeker/bin/pytest tests/ -v (direct binary — conda run pytest can spawn runaway processes)
  • Create classifier env: conda env create -f environment.yml

Label Tool (app/label_tool.py)

Card-stack Streamlit UI for manually labeling recruitment emails.

conda run -n job-seeker streamlit run app/label_tool.py --server.port 8503
  • Config: config/label_tool.yaml (gitignored — copy from .example)
  • Queue: data/email_label_queue.jsonl (gitignored)
  • Output: data/email_score.jsonl (gitignored)
  • Three tabs: 🃏 Label, 📥 Fetch, 📊 Stats
  • Keyboard shortcuts: 1-6 = label, S = skip, U = undo
  • Dedup: MD5 of (subject + body[:100]) — cross-account safe
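The dedup rule above can be sketched in a few lines; the exact concatenation is an assumption, but it shows why the key is stable across accounts and insensitive to body text past the first 100 characters:

```python
import hashlib

def dedup_key(subject: str, body: str) -> str:
    # MD5 over the subject plus the first 100 chars of the body
    # (concatenation order is an assumption; see label_tool.py).
    return hashlib.md5((subject + body[:100]).encode("utf-8")).hexdigest()

# The same message fetched from two accounts collapses to one key,
# and differences beyond the first 100 body chars never change it.
base = "Hi, are you free Tuesday for a phone screen? "
k1 = dedup_key("Phone screen", base + "x" * 200)
k2 = dedup_key("Phone screen", base + "x" * 100 + "different footer")
```

Truncating the body keeps the key cheap to compute and tolerant of per-account footers appended far down the message.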

Benchmark (scripts/benchmark_classifier.py)

# List available models
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --list-models

# Score against labeled JSONL
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --score

# Visual comparison on live IMAP emails
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --compare --limit 20

# Include slow/large models
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --score --include-slow

# Export DB-labeled emails (⚠️ LLM-generated labels — review first)
conda run -n job-seeker-classifiers python scripts/benchmark_classifier.py --export-db --db /path/to/staging.db
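The --score run reports per-model macro-F1. As a reminder of the metric, here is a minimal pure-Python version; it is not the benchmark's actual implementation, which may well delegate to a library:

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 over all labels seen in either list."""
    f1s = []
    for lab in set(y_true) | set(y_pred):
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

score = macro_f1(
    ["rejected", "neutral", "rejected", "offer_received"],
    ["rejected", "rejected", "rejected", "offer_received"],
)  # 0.6: rejected F1=0.8, neutral F1=0.0, offer_received F1=1.0
```

Macro averaging weights every label equally, so rare-but-important buckets like offer_received count as much as neutral.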

Labels (peregrine defaults — configurable per product)

Label                Meaning
interview_scheduled  Phone screen, video call, or on-site invitation
offer_received       Formal job offer or offer letter
rejected             Application declined or not moving forward
positive_response    Recruiter interest or request to connect
survey_received      Culture-fit survey or assessment invitation
neutral              ATS confirmation or unrelated email

Model Registry (13 models, 7 defaults)

See scripts/benchmark_classifier.py:MODEL_REGISTRY. Default models run without --include-slow. Add --models deberta-small deberta-small-2pass to test a specific subset.
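A sketch of how a registry like this typically drives --models and --include-slow filtering. The entry names and field layout here are invented for illustration; the real shape lives in MODEL_REGISTRY in scripts/benchmark_classifier.py:

```python
# Hypothetical registry shape (names and fields are assumptions).
MODEL_REGISTRY = {
    "deberta-small":       {"adapter": "zero_shot", "slow": False},
    "deberta-small-2pass": {"adapter": "zero_shot", "slow": False, "two_pass": True},
    "big-reranker":        {"adapter": "reranker",  "slow": True},
}

def select_models(names=None, include_slow=False):
    """Pick an explicit subset (--models) or all non-slow defaults."""
    picked = names or [k for k, v in MODEL_REGISTRY.items()
                       if include_slow or not v["slow"]]
    return [k for k in picked if k in MODEL_REGISTRY]
```

With this layout, adding a model is a one-line registry entry and the CLI flags never need to change.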

Config Files

  • config/label_tool.yaml — gitignored; multi-account IMAP config
  • config/label_tool.yaml.example — committed template

Data Files

  • data/email_score.jsonl — gitignored; manually-labeled ground truth
  • data/email_score.jsonl.example — committed sample for CI
  • data/email_label_queue.jsonl — gitignored; IMAP fetch queue

Key Design Notes

  • ZeroShotAdapter.load() instantiates the pipeline object; classify() calls the object. Tests patch scripts.classifier_adapters.pipeline (the module-level factory) with a two-level mock: mock_factory.return_value = MagicMock(return_value={...}).
  • two_pass=True on ZeroShotAdapter: first pass ranks all 6 labels; second pass re-runs with only top-2, forcing a binary choice. 2× cost, better confidence.
  • --compare uses the first account in label_tool.yaml for live IMAP emails.
  • DB export labels are llama3.1:8b-generated — treat as noisy, not gold truth.
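The two-level mock from the first note above, shown standalone. The outer mock stands in for the pipeline() factory (what load() calls); its return_value stands in for the pipeline object (what classify() calls). The result-dict shape is an assumption about the zero-shot pipeline output; in the real tests this mock is wired in via patch("scripts.classifier_adapters.pipeline", ...):

```python
from unittest.mock import MagicMock

# Level 1: the factory. Level 2: the callable pipeline it returns.
mock_factory = MagicMock()
mock_factory.return_value = MagicMock(
    return_value={"labels": ["rejected", "neutral"], "scores": [0.9, 0.1]}
)

clf = mock_factory("zero-shot-classification", model="anything")  # the load() step
out = clf("Thanks for applying, but...")                          # the classify() step
```

Patching only the factory means no model weights are ever downloaded, which is how the test suite stays download-free.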
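The two_pass flow can be sketched as below, assuming a classify(text, labels) helper that returns the labels ranked best-first; the real adapter works on HF zero-shot pipeline output dicts instead:

```python
def two_pass_classify(classify, text, labels):
    ranked = classify(text, labels)       # pass 1: rank all labels
    return classify(text, ranked[:2])[0]  # pass 2: binary re-run on the top 2

# Toy scorer standing in for a real model (scores are made up):
scores = {"rejected": 0.55, "neutral": 0.40, "offer_received": 0.05}
fake = lambda text, labels: sorted(labels, key=lambda l: -scores[l])

winner = two_pass_classify(fake, "We decided not to move forward.", list(scores))
```

The second pass pays for itself when the first pass is a near-tie: with only two candidates, the model must commit, which is where the "2x cost, better confidence" trade-off comes from.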

Relationship to Peregrine

Avocet started as peregrine/tools/label_tool.py + peregrine/scripts/classifier_adapters.py. Peregrine retains copies during stabilization; once avocet is proven, peregrine will import from here.