feat: recipe scan labeling task type for Kiwi training pipeline #65
Loading…
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Background
Kiwi is building a recipe scan training dataset using Purple Carrot recipes as ground truth. The dataset will have multiple input modalities:
Each capture is paired with a ground truth structured recipe JSON sourced from the Purple Carrot web corpus.
What Avocet Needs
A new task domain:
recipe_scanlabeling.Input per item
image_path— path to scan in/Library/Assets/modality—scanner | phone | handwrittensource— e.g.purple_carrotextracted— the JSON produced by docuvision + LLM structuring (the model output to review)ground_truth— the canonical structured recipe JSON from the web corpusLabel action
Output format (training pair)
This reuses the existing
messageschat format so the fine-tune harness works without changes.Blocking
Kiwi recipe scan corpus build (Purple Carrot scraper + scan pipeline) can proceed independently. Avocet labeling UI is needed before the fine-tuning phase.
References
kiwi/app/services/recipe/recipe_scanner.py— extraction pipelinekiwi/scripts/pipeline/— corpus build scripts/Library/Assets/kiwi/pipeline/— existing recipe parquetsdata/plan_pairs.jsonl— reference format for training pairs