feat: ingest cf-harvest knowledge base for recipe techniques, pairings, and preservation methods #152

Open
opened 2026-06-08 08:35:54 -07:00 by pyr0ball · 0 comments
Owner

Summary

Kiwi will consume structured knowledge extracted by the cf-harvest pipeline (Circuit-Forge/circuitforge-orch#82). cf-harvest processes specialist cooking and gardening video content using Marlin-2B video captioning + Whisper STT + LLM classification to produce structured, labeled segments. The cooking domain taxonomy routes relevant classes to Kiwi.

What Kiwi receives from cf-harvest

Class label Data shape Kiwi feature it feeds
recipe_technique Technique (name, description, steps, equipment, visual_cue) Recipe step enrichment, technique library
pairing IngredientPairing (ingredient_a, ingredient_b, relationship, rationale) Ingredient pairing suggestions
substitution Substitution (original, substitute, ratio, notes, context) Ingredient substitution engine
storage_preservation StorageMethod (ingredient, method, duration, conditions, notes) Pantry shelf-life and storage guidance
instruction RecipeStep (action, ingredient, quantity, temp, time, visual_cue) Recipe suggestions, step-by-step guidance

Design notes

  • cf-harvest routes are configured in the domain taxonomy YAML: config/domains/cooking.yaml. Kiwi does not need to know about the pipeline internals.
  • Kiwi defines its own KB schema. cf-harvest produces typed JSON; Kiwi maps it to its local SQLite store on ingest.
  • Source provenance (show title, timestamp, domain) is preserved on every record so users can trace a suggestion back to the clip it came from.
  • Techniques extracted from video content are especially valuable because they capture visual cues (colour change, texture, sound) that text recipes omit. The visual_cue field should be surfaced in the Kiwi UI.
  • Specialist techniques not in public recipe datasets (regional methods, traditional preservation, rare fermentation steps) are the primary value of this pipeline for Kiwi.

Scope for this issue

  • Define Kiwi KB schema for each incoming class (SQLite tables or JSON store)
  • Implement ingest endpoint or CLI command that accepts cf-harvest output JSON
  • Surface technique library in recipe suggestions (e.g. "this recipe uses X technique — here is how")
  • Surface ingredient pairings in pantry and recipe context
  • Surface substitution data when an ingredient is missing from pantry
  • Improve shelf-life estimates using storage_preservation records from specialist sources
  • Track source provenance per record
## Summary Kiwi will consume structured knowledge extracted by the cf-harvest pipeline (Circuit-Forge/circuitforge-orch#82). cf-harvest processes specialist cooking and gardening video content using Marlin-2B video captioning + Whisper STT + LLM classification to produce structured, labeled segments. The cooking domain taxonomy routes relevant classes to Kiwi. ## What Kiwi receives from cf-harvest | Class label | Data shape | Kiwi feature it feeds | |---|---|---| | `recipe_technique` | Technique (name, description, steps, equipment, visual_cue) | Recipe step enrichment, technique library | | `pairing` | IngredientPairing (ingredient_a, ingredient_b, relationship, rationale) | Ingredient pairing suggestions | | `substitution` | Substitution (original, substitute, ratio, notes, context) | Ingredient substitution engine | | `storage_preservation` | StorageMethod (ingredient, method, duration, conditions, notes) | Pantry shelf-life and storage guidance | | `instruction` | RecipeStep (action, ingredient, quantity, temp, time, visual_cue) | Recipe suggestions, step-by-step guidance | ## Design notes - cf-harvest routes are configured in the domain taxonomy YAML: `config/domains/cooking.yaml`. Kiwi does not need to know about the pipeline internals. - Kiwi defines its own KB schema. cf-harvest produces typed JSON; Kiwi maps it to its local SQLite store on ingest. - Source provenance (show title, timestamp, domain) is preserved on every record so users can trace a suggestion back to the clip it came from. - Techniques extracted from video content are especially valuable because they capture visual cues (colour change, texture, sound) that text recipes omit. The `visual_cue` field should be surfaced in the Kiwi UI. - Specialist techniques not in public recipe datasets (regional methods, traditional preservation, rare fermentation steps) are the primary value of this pipeline for Kiwi. ## Scope for this issue - Define Kiwi KB schema for each incoming class (SQLite tables or JSON store) - Implement ingest endpoint or CLI command that accepts cf-harvest output JSON - Surface technique library in recipe suggestions (e.g. "this recipe uses X technique — here is how") - Surface ingredient pairings in pantry and recipe context - Surface substitution data when an ingredient is missing from pantry - Improve shelf-life estimates using storage_preservation records from specialist sources - Track source provenance per record ## Related - cf-harvest pipeline: Circuit-Forge/circuitforge-orch#82 - Waxwing consumer issue: Circuit-Forge/waxwing (cross-link TBD)
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: Circuit-Forge/kiwi#152
No description provided.