recipe_browser_fts: only 1.2K of 3.2M corpus recipes have category/keywords — browser returns sparse results #108
Reference: Circuit-Forge/kiwi#108
Problem
The `recipe_browser_fts` table indexes `category`, `keywords`, and `inferred_tags`. Only ~1,215 of 3.19M corpus recipes have these columns populated (they come from the food.com subset loaded with category metadata). The remaining ~3.19M recipes have empty `category` and `keywords`.

Result: browsing by domain/category finds at most ~40 recipes, even for common categories like Breakfast.
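
The sparsity described above can be measured with a single aggregate query. This is a minimal sketch, assuming `recipe_browser_fts` exposes `category` and `keywords` as plain selectable columns; the function name `browse_coverage` is hypothetical and not part of the codebase.

```python
import sqlite3


def browse_coverage(conn: sqlite3.Connection) -> tuple[int, int]:
    """Return (rows with a non-empty category or keywords, total rows).

    Assumes a table named recipe_browser_fts with category and keywords
    columns, as described in this issue. NULL and whitespace-only values
    count as empty.
    """
    populated, total = conn.execute(
        """
        SELECT
            COALESCE(SUM(
                TRIM(COALESCE(category, '')) <> ''
                OR TRIM(COALESCE(keywords, '')) <> ''
            ), 0),
            COUNT(*)
        FROM recipe_browser_fts
        """
    ).fetchone()
    return populated, total
```

Run against the corpus database, the first number should land near the ~1,215 figure quoted above if the diagnosis is right.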
Root cause
The corpus pipeline only populated `category`/`keywords` for a small slice of the dataset, so the `recipe_browser_fts` FTS index is effectively sparse.

Fix options
1. Populate `category` and `keywords` for all 3.19M recipes from their source metadata
2. Index `title` + `ingredient_names` so the full corpus is browsable by ingredient/title search
3. Run the `tag_inferrer` pipeline on all 3.19M recipes to generate browsable tags from ingredients

Option 3 is the most principled: inferred tags are deterministic from nutrition/ingredient data and don't require the original source metadata.
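
A rough sketch of what option 3 could look like as a batch job follows. Everything here is an assumption made for illustration: the source table `recipes` with `id`, `category`, `keywords`, and `ingredient_names` columns, the function names, and especially `toy_infer_tags`, which is a stand-in with made-up rules and not the real `tag_inferrer`.

```python
import sqlite3
from typing import Callable

# Illustrative rules only; the real tag_inferrer logic is not shown in this issue.
TOY_RULES = {"egg": "breakfast", "oats": "breakfast", "pasta": "dinner"}


def toy_infer_tags(ingredient_names: str) -> str:
    """Hypothetical stand-in: map ingredient words to a space-separated tag string."""
    words = ingredient_names.lower().split()
    return " ".join(sorted({TOY_RULES[w] for w in words if w in TOY_RULES}))


def rebuild_browse_index(
    conn: sqlite3.Connection,
    infer: Callable[[str], str] = toy_infer_tags,
    batch_size: int = 10_000,
) -> int:
    """Rebuild recipe_browser_fts from an assumed `recipes` table.

    Existing category/keywords values are carried over and inferred_tags is
    computed for every row. Assumes positive integer ids. Returns rows written.
    """
    conn.execute("DELETE FROM recipe_browser_fts")
    last_id, written = 0, 0
    while True:
        # Keyset pagination keeps memory flat across millions of rows.
        batch = conn.execute(
            "SELECT id, category, keywords, ingredient_names FROM recipes "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not batch:
            break
        conn.executemany(
            "INSERT INTO recipe_browser_fts "
            "(rowid, category, keywords, inferred_tags) VALUES (?, ?, ?, ?)",
            [(rid, cat or "", kw or "", infer(ing or "")) for rid, cat, kw, ing in batch],
        )
        last_id = batch[-1][0]
        written += len(batch)
    conn.commit()
    return written
```

Because the inferrer is passed in as a callable, the same loop would work unchanged once the real `tag_inferrer` entry point is plugged in, and rerunning it yields the same index as long as the inferrer is deterministic.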
Impact
`recipes_fts` (ingredient-based search) is fully populated (3.19M rows). Core recipe suggestion works. Only the browse-by-category feature is affected.

Notes
- The `RECIPE_DB_PATH` ATTACH fix (#102) is working correctly; this is a data quality issue, not a code bug
- `browser_domains.py` has a comment noting that keyword lists need validation against the corpus before production deploy
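
The keyword validation mentioned in the last note could be approximated by counting corpus hits per keyword and flagging the ones that match nothing. This is only a sketch: the `domains` mapping is a hypothetical shape (the actual structure in `browser_domains.py` is not shown here), and it uses `LIKE` substring matching as a portable stand-in for whatever query the browser really issues.

```python
import sqlite3


def keyword_hit_counts(
    conn: sqlite3.Connection, domains: dict[str, list[str]]
) -> dict[str, dict[str, int]]:
    """Count rows in recipe_browser_fts matching each keyword, per domain.

    Matching is a case-insensitive (ASCII) substring test over category,
    keywords, and inferred_tags. Keywords containing % or _ would act as
    wildcards and are not escaped in this sketch.
    """
    report: dict[str, dict[str, int]] = {}
    for domain, keywords in domains.items():
        report[domain] = {}
        for kw in keywords:
            (count,) = conn.execute(
                "SELECT COUNT(*) FROM recipe_browser_fts "
                "WHERE category LIKE ? OR keywords LIKE ? OR inferred_tags LIKE ?",
                (f"%{kw}%",) * 3,
            ).fetchone()
            report[domain][kw] = count
    return report


def dead_keywords(report: dict[str, dict[str, int]]) -> list[tuple[str, str]]:
    """Return sorted (domain, keyword) pairs that matched no rows."""
    return sorted(
        (domain, kw)
        for domain, counts in report.items()
        for kw, n in counts.items()
        if n == 0
    )
```

Each keyword triggers a full scan, which is slow on 3.19M rows but tolerable for a one-off pre-deploy check.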