What is Avocet?
Avocet is the internal data pipeline Circuit Forge uses to build, evaluate, and fine-tune email classifiers. It implements a three-stage workflow: human labelers review emails one at a time in a drag-to-bucket UI and produce a ground-truth dataset; the benchmark harness scores any number of HuggingFace zero-shot models against that dataset and produces a ranked comparison; and the fine-tune harness adapts the best-scoring base model to the labeled distribution. The output feeds directly into Peregrine's email classification layer. No LLM API key required for the label tool or benchmark — all inference runs locally via HuggingFace Transformers.
Quick Start
```bash
git clone https://git.opensourcesolarpunk.com/Circuit-Forge/avocet.git
cd avocet

# Copy config template and fill in your IMAP credentials
cp config/label_tool.yaml.example config/label_tool.yaml

# Start the label tool (Vue SPA + FastAPI, port 8503)
./manage.sh start
./manage.sh open
```
Features
- Drag-to-bucket label UI: ASMR-style card interface; drag emails into labeled buckets, or discard them without adding noise to the training set
- Targeted IMAP fetch: pull emails by date range, sender, or subject filter across multiple accounts without flooding the queue
- Email classifier benchmark: score any HuggingFace zero-shot model against your labeled JSONL; compare models side by side on live IMAP emails
- Planning benchmark: evaluate LLMs on structured planning tasks; compare models head-to-head with verbose diff output
- Writing style benchmark: compare Ollama models on writing style coherence; scan local disk for existing outputs
- Fine-tune harness: HuggingFace Transformers fine-tuning from labeled ground truth; classifier adapter interface for swapping backends at runtime
- Local inference first: no API key required; GPU optional; designed to run on developer hardware
- Hot-reload dev mode: uvicorn --reload + Vite HMR (hot module replacement) for fast iteration on both API and UI
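The fine-tune harness exposes a classifier adapter interface so backends can be swapped at runtime. Its real shape is not documented in this README. The sketch below assumes a single `classify(subject, body) -> str` method and a name-keyed registry; `ClassifierAdapter`, `register`, `get_adapter`, and the toy `KeywordAdapter` are hypothetical names, not Avocet's API.

```python
from typing import Callable, Dict, Protocol


class ClassifierAdapter(Protocol):
    """Hypothetical contract: any backend that maps one email to one label."""

    def classify(self, subject: str, body: str) -> str: ...


# Hypothetical registry: backend name -> zero-argument factory.
_REGISTRY: Dict[str, Callable[[], ClassifierAdapter]] = {}


def register(name: str, factory: Callable[[], ClassifierAdapter]) -> None:
    """Make a backend selectable by name, e.g. from a config value."""
    _REGISTRY[name] = factory


def get_adapter(name: str) -> ClassifierAdapter:
    """Build the backend chosen at runtime; reject unknown names loudly."""
    factory = _REGISTRY.get(name)
    if factory is None:
        raise ValueError(f"unknown classifier backend: {name!r}")
    return factory()


class KeywordAdapter:
    """Toy stand-in for a real model: keyword rules, no inference."""

    def classify(self, subject: str, body: str) -> str:
        text = f"{subject} {body}".lower()
        if "interview" in text:
            return "interview_scheduled"
        if "offer" in text:
            return "offer_received"
        return "neutral"


register("keyword", KeywordAdapter)
adapter = get_adapter("keyword")
print(adapter.classify("Interview invite", "Are you free Tuesday?"))  # interview_scheduled
```

Keeping selection behind a name lets a config value pick the backend without touching call sites. A real adapter would wrap a zero-shot pipeline or a fine-tuned checkpoint instead of keyword rules.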
CLI Reference
All operations go through manage.sh.
Label Tool
```bash
./manage.sh start     # Build Vue SPA and start FastAPI on port 8503
./manage.sh stop      # Stop FastAPI server
./manage.sh restart   # Stop, rebuild, and restart
./manage.sh status    # Show running state and port
./manage.sh logs      # Tail the API log
./manage.sh open      # Open http://localhost:8503 in browser
./manage.sh dev       # Hot-reload: uvicorn --reload + Vite HMR
./manage.sh test      # Run pytest suite
```
Email Classifier Benchmark
```bash
./manage.sh benchmark [args]       # Run benchmark_classifier.py
./manage.sh list-models            # List available zero-shot models
./manage.sh score                  # Score models against labeled JSONL
./manage.sh score --include-slow   # Include large/slow models
./manage.sh compare --limit 30     # Side-by-side comparison on live IMAP emails
```
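`score` ranks models against the labeled JSONL. The README does not say which metrics the harness reports; as an illustration only, the sketch below assumes accuracy plus macro-averaged F1, with each model's predictions already collected in the same order as the ground truth. `score_model`, `rank_models`, and the `model-a`/`model-b` names are hypothetical.

```python
from typing import Dict, List, Tuple


def score_model(truth: List[str], pred: List[str]) -> Tuple[float, float]:
    """Return (accuracy, macro-F1) for one model's predictions."""
    if not truth or len(truth) != len(pred):
        raise ValueError("truth and pred must be non-empty and the same length")
    pairs = list(zip(truth, pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    f1_scores = []
    # Macro-F1: unweighted mean of per-label F1 over labels in the ground truth.
    for label in sorted(set(truth)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # tp + fn >= 1 because the label occurs in truth, so no zero division.
        f1_scores.append(2 * tp / (2 * tp + fp + fn))
    return accuracy, sum(f1_scores) / len(f1_scores)


def rank_models(
    truth: List[str], predictions: Dict[str, List[str]]
) -> List[Tuple[str, Tuple[float, float]]]:
    """Sort models by macro-F1, best first; accuracy breaks ties."""
    scores = {name: score_model(truth, pred) for name, pred in predictions.items()}
    return sorted(scores.items(), key=lambda item: (item[1][1], item[1][0]), reverse=True)


truth = ["rejected", "neutral", "rejected", "digest"]
predictions = {
    "model-a": ["rejected", "neutral", "neutral", "digest"],
    "model-b": ["rejected", "rejected", "rejected", "rejected"],
}
for name, (acc, f1) in rank_models(truth, predictions):
    print(f"{name}: accuracy={acc:.2f} macro_f1={f1:.2f}")
```

Macro-F1 weights rare labels such as offer_received as heavily as common ones, which matters when the labeled set is skewed; model-b above scores 0.50 accuracy by always guessing the majority label but ranks last on macro-F1.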
Planning Benchmark
```bash
./manage.sh plans-bench [args]                # Run benchmark_plans.py
./manage.sh plans-list                        # List available models
./manage.sh plans-run <model> [args]          # Run a single model (verbose)
./manage.sh plans-compare <m1> <m2> [...]     # Compare models side-by-side
```
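`plans-compare` compares models head-to-head with verbose diff output. The diff format benchmark_plans.py actually uses is not specified here; this is a minimal sketch of one way to produce such a diff with the standard library's `difflib.unified_diff`, assuming each model's plan is plain text with one step per line. `diff_plans` and the sample plans are hypothetical.

```python
import difflib


def diff_plans(model_a: str, plan_a: str, model_b: str, plan_b: str) -> str:
    """Return a unified diff of two models' plans (empty string if identical)."""
    lines = difflib.unified_diff(
        plan_a.splitlines(keepends=True),
        plan_b.splitlines(keepends=True),
        fromfile=model_a,  # shown on the "---" header line
        tofile=model_b,    # shown on the "+++" header line
    )
    return "".join(lines)


plan_a = "1. Fetch emails\n2. Label them\n3. Run benchmark\n"
plan_b = "1. Fetch emails\n2. Run benchmark\n3. Label them\n"
print(diff_plans("model-a", plan_a, "model-b", plan_b))
```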
Writing Style Benchmark
```bash
./manage.sh style-bench [args]   # Run benchmark_style.py
./manage.sh style-list           # List available Ollama models
./manage.sh style-run [args]     # Run writing style benchmark
./manage.sh style-last           # Print most recent benchmark report
```
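`style-last` prints the most recent benchmark report. Where reports are stored and how they are named is not documented in this README; the sketch assumes reports are files in one directory and that "most recent" means the latest modification time. The `data/style_reports` path, the `*.md` pattern, and `latest_report` are all hypothetical.

```python
from pathlib import Path
from typing import Optional


def latest_report(report_dir: Path, pattern: str = "*.md") -> Optional[Path]:
    """Return the most recently modified report in report_dir, or None."""
    if not report_dir.is_dir():
        return None
    reports = [p for p in report_dir.glob(pattern) if p.is_file()]
    if not reports:
        return None
    return max(reports, key=lambda p: p.stat().st_mtime)


# Hypothetical location and extension; adjust to wherever reports are written.
report = latest_report(Path("data/style_reports"))
print(report.read_text(encoding="utf-8") if report else "no reports found")
```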
Data Flow
```text
IMAP accounts
  → fetch (targeted or wide)
  → email_label_queue.jsonl

email_label_queue.jsonl
  → label tool drag-to-bucket UI
  → email_score.jsonl (ground truth)

email_score.jsonl
  → benchmark harness
  → model rankings

best model
  → fine-tune harness
  → Peregrine classifier adapter
```
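A targeted fetch narrows the IMAP query instead of pulling a whole mailbox. How Avocet builds its query is not shown here; this sketch assumes the standard IMAP SEARCH keys from RFC 3501 (`SINCE`, `BEFORE`, `FROM`, `SUBJECT`), which Python's `imaplib` sends through `IMAP4.search(charset, *criteria)`. `build_search_criteria` and its helpers are hypothetical.

```python
import datetime as dt
from typing import List, Optional

_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def _imap_date(d: dt.date) -> str:
    # RFC 3501 dates look like 01-Mar-2026; month names are spelled out here
    # so the result does not depend on the process locale (unlike strftime %b).
    return f"{d.day:02d}-{_MONTHS[d.month - 1]}-{d.year}"


def _quote(s: str) -> str:
    # IMAP quoted string: escape backslashes and double quotes.
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_search_criteria(
    since: Optional[dt.date] = None,
    before: Optional[dt.date] = None,
    sender: Optional[str] = None,
    subject: Optional[str] = None,
) -> List[str]:
    """Return IMAP SEARCH tokens; the server ANDs multiple keys together."""
    criteria: List[str] = []
    if since is not None:
        criteria += ["SINCE", _imap_date(since)]    # on or after this date
    if before is not None:
        criteria += ["BEFORE", _imap_date(before)]  # strictly before this date
    if sender:
        criteria += ["FROM", _quote(sender)]
    if subject:
        criteria += ["SUBJECT", _quote(subject)]
    return criteria or ["ALL"]  # no filters means a wide fetch


criteria = build_search_criteria(since=dt.date(2026, 3, 1), sender="jobs@example.com")
print(criteria)
# Usage against a live server (not run here):
#   with imaplib.IMAP4_SSL(host) as imap:
#       imap.login(user, password)
#       imap.select("INBOX")
#       status, data = imap.search(None, *criteria)
```

The sketch only handles ASCII filter values; non-ASCII senders or subjects would also need a charset argument to `search`.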
Labels
| Label | Key |
|---|---|
| interview_scheduled | 1 |
| offer_received | 2 |
| rejected | 3 |
| positive_response | 4 |
| survey_received | 5 |
| neutral | 6 |
| event_rescheduled | 7 |
| unrelated | 8 |
| digest | 9 |
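The table maps each label to a number key. The label tool is a Vue SPA, so its keyboard handling lives in the frontend; the Python mapping below is only an illustration, for example for validating label values on the FastAPI side. `LABEL_KEYS`, `KEYS_BY_LABEL`, `label_for_key`, and `is_valid_label` are hypothetical names; the keys and labels come straight from the table.

```python
from typing import Dict, Optional

# Number key -> label, copied from the table above.
LABEL_KEYS: Dict[str, str] = {
    "1": "interview_scheduled",
    "2": "offer_received",
    "3": "rejected",
    "4": "positive_response",
    "5": "survey_received",
    "6": "neutral",
    "7": "event_rescheduled",
    "8": "unrelated",
    "9": "digest",
}

# Reverse lookup: label -> number key.
KEYS_BY_LABEL: Dict[str, str] = {label: key for key, label in LABEL_KEYS.items()}


def label_for_key(key: str) -> Optional[str]:
    """Return the label bound to a key press, or None for unbound keys."""
    return LABEL_KEYS.get(key)


def is_valid_label(label: str) -> bool:
    """True if label is one of the nine ground-truth labels."""
    return label in KEYS_BY_LABEL
```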
Stack
| Layer | Technology |
|---|---|
| Label UI | Vue 3 SPA (Vite) |
| API | FastAPI + uvicorn (port 8503) |
| Benchmark | Python + HuggingFace Transformers |
| Email fetch | IMAP (multi-account, targeted date/sender/subject filter) |
| Data | JSONL (data/email_label_queue.jsonl, data/email_score.jsonl) |
| Runtime | SQLite |
| Config | config/label_tool.yaml (gitignored — .example committed) |
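Both data files are JSONL: one JSON object per line. The record schema is not documented in this README, so the sketch below assumes each queued email carries an `id` field and that labeling an email appends the same record plus a `label` field to the score file. The field names and the helpers `read_jsonl`, `append_jsonl`, and `label_email` are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterator


def read_jsonl(path: Path) -> Iterator[dict]:
    """Yield one dict per non-blank line of a JSONL file."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def append_jsonl(path: Path, record: dict) -> None:
    """Append one record as a single line; ensure_ascii=False keeps text readable."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def label_email(queue: Path, score: Path, email_id: str, label: str) -> None:
    """Copy one queued email into the ground-truth file with its label."""
    for record in read_jsonl(queue):
        if record.get("id") == email_id:
            append_jsonl(score, {**record, "label": label})
            return
    raise KeyError(email_id)
```

Appending instead of rewriting keeps each label write cheap and leaves earlier ground truth untouched; a discarded email is simply never appended.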
Logo
The Avocet logo (avocet_v1_poly.svg) lives in the shared graphics repo. Copy it to docs/avocet-logo.svg so it renders correctly in this README.
About
Avocet is internal CircuitForge infrastructure, open-sourced as a reference implementation. It is not a user-facing product. The primary consumer is Peregrine, CircuitForge's job-search pipeline tool.
Docs: docs.circuitforge.tech/avocet
Forgejo-primary
Avocet is developed and maintained on Forgejo at git.opensourcesolarpunk.com/Circuit-Forge/avocet. GitHub and Codeberg are read-only mirrors.
License
Business Source License 1.1 — classifier training is an AI feature under the CircuitForge licensing model.
Free for personal non-commercial self-hosting. Commercial use or SaaS re-hosting requires a paid license. Converts to MIT after 4 years.
© 2026 Circuit Forge LLC — Privacy · Safety · Accessibility