Cross-product: nutrition label extraction training pipeline (Kiwi → Avocet) #33

Open
opened 2026-04-13 12:51:35 -07:00 by pyr0ball · 0 comments

Context: Kiwi's paid-tier visual label capture produces confirmed nutrition extractions from user photos of product packaging. These are natural labeled training examples for a nutrition extraction model — routing them through Avocet turns user corrections into fine-tune data.

Scope:

  • Ingest endpoint POST /api/v1/ingest/nutrition-label — accepts Kiwi payload (image hash, extracted JSON, correction diff, model_id); deduplicates by hash
  • New nutrition_label_extractions DB table + migration
  • Nutrition domain in card-stack labeling UI — confirm / edit / reject individual field extractions
  • Benchmark harness extension: nutrition task category, per-field extraction F1, confidence calibration metrics
  • Fine-tune export: JSONL dataset (OpenAI-compatible) for SFT pipeline
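A rough sketch of the ingest behavior described in the first scope item, assuming an in-memory dict in place of the `nutrition_label_extractions` table. The field names (`image_hash`, `extracted`, `correction_diff`), the `status` column, and the status codes are illustrative guesses, not the agreed schema; only `model_id` and the dedup-by-hash rule come from this ticket.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical payload keys; the real names belong to the design spec.
REQUIRED_FIELDS = ("image_hash", "extracted", "correction_diff", "model_id")


@dataclass
class IngestStore:
    """In-memory stand-in for the nutrition_label_extractions table."""

    rows: dict[str, dict[str, Any]] = field(default_factory=dict)

    def ingest(self, payload: dict[str, Any]) -> tuple[int, str]:
        """Validate a Kiwi payload and store it once per image hash."""
        missing = [k for k in REQUIRED_FIELDS if k not in payload]
        if missing:
            return 422, "missing fields: " + ", ".join(missing)

        image_hash = payload["image_hash"]
        if image_hash in self.rows:
            # Same photo submitted again: keep the first row, report a duplicate.
            return 200, "duplicate"

        # Copy only the whitelisted keys so unexpected extras are never persisted.
        row = {k: payload[k] for k in REQUIRED_FIELDS}
        row["status"] = "pending"  # assumed state for not-yet-labeled rows
        self.rows[image_hash] = row
        return 201, "created"
```

Copying only whitelisted keys is one way to keep anything Kiwi failed to strip out of storage; the actual endpoint may prefer to reject such payloads outright.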

Privacy constraints (binding):

  • No images stored, ever — hash only for dedup
  • No barcodes or PII in payload (stripped by Kiwi)
  • Opt-in only; user-revocable; opting out purges pending (unlabeled) submissions
  • Data used only to improve Kiwi — never sold, profiled, or shared externally
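The opt-out rule above could look roughly like this. It is a sketch under two assumptions that the ticket does not state: each row carries an opaque `consent_id` linking it to an opt-in (a hypothetical field), and unlabeled rows are marked `status == "pending"`.

```python
from typing import Any


def purge_pending(
    rows: list[dict[str, Any]], consent_id: str
) -> tuple[list[dict[str, Any]], int]:
    """Drop one user's pending (unlabeled) submissions.

    Returns the surviving rows and the number of rows purged.
    Rows from other users, and rows that are no longer pending,
    are left untouched.
    """
    kept = [
        r
        for r in rows
        if not (r["consent_id"] == consent_id and r["status"] == "pending")
    ]
    return kept, len(rows) - len(kept)
```

Whether already-labeled rows should also be removed on opt-out is a policy question for the design spec; the sketch follows the wording here, which purges pending submissions only.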

Out of scope: Auto model deployment (cf-orch handles that), receipt-level multi-item extraction, public API for the trained model.

Acceptance criteria: Ingest accepts Kiwi payloads; labelers can confirm, edit, or reject nutrition extractions in the card-stack UI; benchmark reports per-field F1; export produces SFT-ready JSONL.
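For the per-field F1 criterion, one possible scoring convention is sketched below. It assumes exact-match comparison and counts a wrong value as both a false positive and a false negative; the benchmark harness may define matching differently (for example with numeric tolerance). The function name and the field names in the usage example are hypothetical.

```python
from collections import defaultdict
from typing import Any


def per_field_f1(
    predictions: list[dict[str, Any]], golds: list[dict[str, Any]]
) -> dict[str, float]:
    """Compute exact-match F1 separately for each extracted field."""
    counts: dict[str, dict[str, int]] = defaultdict(
        lambda: {"tp": 0, "fp": 0, "fn": 0}
    )
    for pred, gold in zip(predictions, golds):
        for name in set(pred) | set(gold):
            p, g = pred.get(name), gold.get(name)
            c = counts[name]
            if p is not None and p == g:
                c["tp"] += 1
            else:
                if p is not None:
                    c["fp"] += 1  # predicted a value that is wrong or spurious
                if g is not None:
                    c["fn"] += 1  # gold value was missed or mis-extracted

    scores: dict[str, float] = {}
    for name, c in counts.items():
        denom = 2 * c["tp"] + c["fp"] + c["fn"]
        scores[name] = 2 * c["tp"] / denom if denom else 0.0
    return scores
```

Usage, with made-up field names:

```python
preds = [{"calories": 200, "sodium_mg": 150}, {"calories": 110}]
golds = [{"calories": 200, "sodium_mg": 140}, {"calories": 110, "sodium_mg": 5}]
print(per_field_f1(preds, golds))
```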

Related: circuitforge-plans/avocet/superpowers/specs/2026-04-13-nutrition-label-training-design.md · Circuit-Forge/kiwi (visual capture ticket)

pyr0ball added this to the Public Launch milestone 2026-04-13 12:51:35 -07:00
pyr0ball added the enhancement and backlog labels 2026-04-13 12:51:35 -07:00
Reference: Circuit-Forge/avocet#33