feat: flavor molecule RAG — ingredient pairing and substitution via FlavorDB compound vectors #146

Open
opened 2026-06-01 11:50:39 -07:00 by pyr0ball (Owner) · 0 comments

Idea

Represent each ingredient as a sparse vector over its flavor compound profile (sourced from FlavorDB, ~35,000 molecules), then use cosine similarity to find its nearest neighbors in compound space. Feed the compound overlap to a local LLM to explain pairings and generate recipes.

Technically, this is sparse-vector nearest-neighbor search with LLM generation on top: retrieval is deterministic, generation is probabilistic. It fits the existing Kiwi architecture (deterministic core, LLM assist).
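
A minimal sketch of the retrieval half, assuming each ingredient's profile is stored as a dict mapping compound name to weight (only compounds that are present get a key, which keeps the vector sparse). The names `cosine`, `nearest_neighbors`, and `INDEX` are hypothetical, and the toy compound lists are illustrative rather than FlavorDB data.

```python
import math
from typing import Dict, List, Tuple

# A sparse vector: only compounds present in the ingredient get a key.
SparseVec = Dict[str, float]

def cosine(a: SparseVec, b: SparseVec) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    # Only compounds present in both vectors contribute to the dot product.
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    dot = sum(w * large[k] for k, w in small.items() if k in large)
    return dot / (norm_a * norm_b)

def nearest_neighbors(query: str, index: Dict[str, SparseVec],
                      n: int = 5) -> List[Tuple[str, float, List[str]]]:
    """Top-n ingredients by cosine similarity to `query`, each with its shared compounds."""
    q = index[query]
    results = []
    for name, vec in index.items():
        if name == query:
            continue
        shared = sorted(q.keys() & vec.keys())
        results.append((name, cosine(q, vec), shared))
    # Best match first; ties broken by name so retrieval stays deterministic.
    results.sort(key=lambda r: (-r[1], r[0]))
    return results[:n]

# Toy index with binary weights; compound lists are illustrative, not FlavorDB data.
INDEX: Dict[str, SparseVec] = {
    "tarragon": {"estragole": 1.0, "ocimene": 1.0},
    "basil": {"estragole": 1.0, "linalool": 1.0, "eugenol": 1.0},
    "fennel": {"anethole": 1.0, "estragole": 1.0},
    "lemon": {"limonene": 1.0, "citral": 1.0},
}

for name, score, shared in nearest_neighbors("tarragon", INDEX, n=2):
    print(f"{name}: {score:.3f} shared={shared}")
# fennel: 0.500 shared=['estragole']
# basil: 0.408 shared=['estragole']
```

A linear scan like this is probably enough for an ingredient list in the hundreds or low thousands; an inverted index from compound to ingredients would be the obvious next step if it is not.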

Origin

Alan and his partner tasted an artificial papaya candy. She perceived basil and balsamic; he perceived fruit. After she named it, he could re-perceive it her way. This is odor object re-labeling: smell perception is language-mediated, so the same receptor signals get reorganized under a new label. The molecular cause: artificial papaya is heavy in linalool, which is also a primary compound in basil, and balsamic aromatic esters overlap with tropical fruit ester profiles. Same molecules in different concentrations show up as distance between vectors, provided the vectors are weighted by concentration; a binary presence vector would score them as identical.
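
A small worked example of why the weighting matters, using made-up relative concentrations for two profiles that contain the same two compounds in opposite proportions (the compound names are stand-ins, not measured papaya or basil data). As far as this sketch assumes, FlavorDB mostly records which molecules occur in a food rather than how much, so concentration weights would likely have to come from another source such as FooDB; treat that as something to verify.

```python
import math

def cosine(a, b):
    """Cosine similarity of two dict-based sparse vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def binarize(v):
    """Drop concentrations: 1.0 for every compound that is present."""
    return {k: 1.0 for k, w in v.items() if w > 0}

# Made-up relative concentrations: same two compounds, opposite proportions.
profile_a = {"linalool": 0.8, "ethyl butanoate": 0.2}
profile_b = {"linalool": 0.2, "ethyl butanoate": 0.8}

print(round(cosine(binarize(profile_a), binarize(profile_b)), 3))  # 1.0
print(round(cosine(profile_a, profile_b), 3))                      # 0.471
```

Cosine similarity ignores overall scale, so only the relative proportions move the score; doubling every concentration in one profile leaves it unchanged.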

Data sources

  • FlavorDB: ~35,000 flavor molecules mapped to foods — https://cosylab.iiitd.edu.in/flavordb/
  • FooDB: nutritional + flavor compound data per food — https://foodb.ca/
  • Flavor network paper: Ahn et al. (2011) "Flavor network and the principles of food pairing" — structured data available; shows Western cuisine pairs foods sharing compounds, East Asian cuisine tends to avoid shared compounds
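
Ahn et al. measure the shared-compound effect per recipe as the mean number of compounds shared by each pair of its ingredients, N_s(R) = 2 / (n_R (n_R - 1)) · Σ_{i<j} |C_i ∩ C_j|, where C_i is the compound set of ingredient i and n_R the number of ingredients. A minimal sketch of that measure, assuming each ingredient maps to a set of compound names; the function name and the toy sets are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set

def mean_shared_compounds(recipe: List[str], compounds: Dict[str, Set[str]]) -> float:
    """Mean |C_i ∩ C_j| over every unordered pair of distinct ingredients in a recipe."""
    pairs = list(combinations(recipe, 2))
    if not pairs:  # fewer than two ingredients: no pairs to average
        return 0.0
    return sum(len(compounds[a] & compounds[b]) for a, b in pairs) / len(pairs)

# Toy compound sets, illustrative only.
COMPOUNDS = {
    "tarragon": {"estragole", "ocimene"},
    "basil": {"estragole", "linalool", "eugenol"},
    "fennel": {"anethole", "estragole"},
    "lemon": {"limonene", "citral"},
}

print(mean_shared_compounds(["tarragon", "basil", "fennel"], COMPOUNDS))  # 1.0
print(mean_shared_compounds(["basil", "lemon"], COMPOUNDS))               # 0.0
```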

Feature surface in Kiwi

  • Substitution engine: "I don't have tarragon" finds pantry ingredients sharing estragole, surfaces "basil or fennel fronds will work" with compound explanation
  • Pairing suggestions: given pantry contents, surface unexpected pairings with molecular justification
  • Expiry-driven discovery: X is about to expire and shares compounds Y and Z with ingredient W already in the pantry; here are recipes that bridge them
  • Flavor profile tagging: tag ingredients by profile family (fruity-ester, green-herbaceous, sulfurous-allium) so recipe search works in flavor-space, not just ingredient name
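
For the substitution engine, one simple reading is: intersect the missing ingredient's compound set with each pantry item's set, rank by overlap size, and turn the shared compounds into the explanation string. A minimal sketch under that assumption; `substitutes`, `explain`, and the toy compound sets are hypothetical.

```python
from typing import Dict, List, Set, Tuple

def substitutes(missing: str, pantry: List[str], compounds: Dict[str, Set[str]],
                n: int = 3) -> List[Tuple[str, List[str]]]:
    """Rank pantry items by how many flavor compounds they share with `missing`."""
    target = compounds.get(missing, set())
    ranked = []
    for item in pantry:
        shared = sorted(target & compounds.get(item, set()))
        if shared:  # items with zero overlap are not offered as substitutes
            ranked.append((item, shared))
    # Most shared compounds first; alphabetical tie-break keeps output stable.
    ranked.sort(key=lambda r: (-len(r[1]), r[0]))
    return ranked[:n]

def explain(missing: str, ranked: List[Tuple[str, List[str]]]) -> str:
    """Build the user-facing explanation from the shared-compound lists."""
    if not ranked:
        return f"Nothing in the pantry shares flavor compounds with {missing}."
    parts = [f"{item} (shares {', '.join(shared)})" for item, shared in ranked]
    return f"No {missing}? Try " + "; ".join(parts) + "."

# Toy compound sets, illustrative only.
COMPOUNDS = {
    "tarragon": {"estragole", "ocimene"},
    "basil": {"estragole", "linalool", "eugenol"},
    "fennel fronds": {"anethole", "estragole"},
    "lemon": {"limonene", "citral"},
}
pantry = ["lemon", "fennel fronds", "basil"]
print(explain("tarragon", substitutes("tarragon", pantry, COMPOUNDS)))
# No tarragon? Try basil (shares estragole); fennel fronds (shares estragole).
```

Raw overlap counts favor ingredients with long compound lists, so a normalized score (cosine or Jaccard) may rank better on real data.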

Architecture sketch

  FlavorDB compound data indexed per ingredient at install/update time
  User queries: "what goes with mango?" or "substitute for tarragon?"
  Cosine similarity lookup in compound vector space
  Retrieve top-N neighbors + shared compound list
  Feed to local LLM: explain pairing, generate recipe ideas
  Filter results by pantry contents (Kiwi already knows what you have)
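
An end-to-end sketch of that pipeline, under several assumptions: the build-time index is serialized as JSON (so FlavorDB is needed only when the index is built), the local LLM is any callable that takes a prompt string and returns text, and the pantry is a set of ingredient names. All function names and the toy vectors are hypothetical. Passing `llm=None` skips generation entirely.

```python
import json
import math
from typing import Callable, Dict, List, Optional, Set, Tuple

Vec = Dict[str, float]
Hit = Tuple[str, float, List[str]]

def build_index(profiles: Dict[str, Vec]) -> str:
    """Build/update time: serialize ingredient vectors; no FlavorDB access needed afterwards."""
    return json.dumps(profiles, sort_keys=True)

def load_index(blob: str) -> Dict[str, Vec]:
    """Runtime: load the prebuilt index."""
    return json.loads(blob)

def cosine(a: Vec, b: Vec) -> float:
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, index: Dict[str, Vec]) -> List[Hit]:
    """Deterministic step: score every other ingredient, best match first."""
    q = index[query]
    hits = [(name, cosine(q, v), sorted(q.keys() & v.keys()))
            for name, v in index.items() if name != query]
    hits.sort(key=lambda h: (-h[1], h[0]))
    return hits

def build_prompt(query: str, hits: List[Hit]) -> str:
    """Turn retrieved neighbors and their shared compounds into an LLM prompt."""
    lines = [f"- {name}: shares {', '.join(shared)}" for name, _, shared in hits]
    return (f"Explain why these ingredients pair with {query}, citing the shared "
            "compounds, and suggest one recipe idea:\n" + "\n".join(lines))

def answer(query: str, index: Dict[str, Vec], pantry: Set[str], n: int = 3,
           llm: Optional[Callable[[str], str]] = None) -> dict:
    """Retrieve, keep the top-n pantry items with nonzero overlap, optionally call the LLM."""
    hits = [h for h in retrieve(query, index) if h[0] in pantry and h[1] > 0][:n]
    text = llm(build_prompt(query, hits)) if llm is not None and hits else None
    return {"hits": hits, "text": text}

# Toy vectors for illustration only; not FlavorDB data.
PROFILES: Dict[str, Vec] = {
    "mango": {"myrcene": 1.0, "ocimene": 1.0},
    "basil": {"ocimene": 1.0, "linalool": 1.0},
    "lime": {"limonene": 1.0, "myrcene": 1.0},
    "cumin": {"cuminaldehyde": 1.0},
}

index = load_index(build_index(PROFILES))
print(answer("mango", index, pantry={"basil", "cumin"})["hits"])
```

This version applies the pantry filter before the LLM call so the prompt only mentions ingredients the user actually has; the sketch above lists filtering last, and either order is workable.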

Notes

  • Compound data can be pre-indexed at build time — no runtime FlavorDB dependency needed
  • LLM step is optional; the similarity search alone is useful for substitution
  • Western vs. East Asian pairing difference from the flavor network paper could be a user preference toggle
  • This also enables flavor profile similarity in recipe search: "find recipes that taste like this dish" without requiring the same ingredients
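
Recipe-level similarity and the pairing toggle can reuse the same vectors. A minimal sketch, assuming a dish's flavor vector is simply the sum of its ingredients' vectors (weighting by quantity would be a natural refinement) and that the toggle just flips the sort order on shared-compound count: "shared" favors overlap, in line with the Western pattern Ahn et al. report, and "contrast" favors little overlap, in line with the East Asian pattern. All names and toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vec = Dict[str, float]

def cosine(a: Vec, b: Vec) -> float:
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recipe_vector(ingredients: List[str], profiles: Dict[str, Vec]) -> Vec:
    """Assumed aggregation: a dish's flavor vector is the sum of its ingredients' vectors."""
    total: Counter = Counter()
    for ing in ingredients:
        total.update(profiles.get(ing, {}))  # adds weights compound by compound
    return dict(total)

def similar_recipes(dish: str, recipes: Dict[str, List[str]],
                    profiles: Dict[str, Vec], n: int = 3) -> List[Tuple[str, float]]:
    """Rank other recipes by flavor-vector similarity, regardless of shared ingredients."""
    target = recipe_vector(recipes[dish], profiles)
    scored = [(name, cosine(target, recipe_vector(ings, profiles)))
              for name, ings in recipes.items() if name != dish]
    scored.sort(key=lambda s: (-s[1], s[0]))
    return scored[:n]

def rank_pairings(base: str, candidates: List[str], profiles: Dict[str, Vec],
                  mode: str = "shared") -> List[str]:
    """Order candidates by shared-compound count: most first ("shared") or fewest first ("contrast")."""
    if mode not in ("shared", "contrast"):
        raise ValueError(f"unknown mode: {mode!r}")

    def overlap(c: str) -> int:
        return len(profiles[base].keys() & profiles[c].keys())

    sign = -1 if mode == "shared" else 1
    return sorted(candidates, key=lambda c: (sign * overlap(c), c))

# Toy data, illustrative only.
PROFILES: Dict[str, Vec] = {
    "tarragon": {"estragole": 1.0, "ocimene": 1.0},
    "fennel": {"estragole": 1.0, "anethole": 1.0},
    "basil": {"estragole": 1.0, "linalool": 1.0},
    "lemon": {"limonene": 1.0, "citral": 1.0},
    "chicken": {"hexanal": 1.0},
    "fish": {"trimethylamine": 1.0},
}
RECIPES = {
    "tarragon chicken": ["tarragon", "chicken"],
    "fennel fish": ["fennel", "fish"],
    "lemon fish": ["lemon", "fish"],
}

print(similar_recipes("tarragon chicken", RECIPES, PROFILES))
print(rank_pairings("tarragon", ["fennel", "lemon", "basil"], PROFILES, mode="contrast"))
# ['lemon', 'basil', 'fennel']
```

In the toy data "fennel fish" ranks closest to "tarragon chicken" even though the two dishes share no ingredient, which is the recipe-search behavior described above.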
pyr0ball self-assigned this 2026-06-01 11:56:37 -07:00
pyr0ball added the needs-design, backlog, and feature-request labels 2026-06-01 12:11:29 -07:00
pyr0ball added this to the Post-Launch milestone 2026-06-01 12:11:29 -07:00
Reference: Circuit-Forge/kiwi#146