Recipe browser: subcategory coverage sparse — category-level fully populated #108

Closed
opened 2026-04-18 14:12:42 -07:00 by pyr0ball · 0 comments
Owner

Current State (updated 2026-04-21)

What is now working

recipe_browser_fts is fully populated — 3,195,798 entries covering the entire corpus. The backfill resolved the original data gap.

Category-level browse results are healthy:

| Category | Recipes |
|---|---:|
| Italian | 206,201 |
| American | 65,321 |
| European | 67,093 |
| Mexican | 97,138 |
| Mediterranean | 34,858 |
| Indian | 33,536 |
| Asian | 52,079 |
| Latin American | 9,805 |
| **Total indexed** | **566,031** |

The browse API matches recipe titles and ingredient names against the keyword lists in browser_domains.py. It does not depend on the recipes.category column, which still has only 1,228 rows populated; that gap is irrelevant for browse.
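To illustrate the matching described above, here is a minimal sketch in plain Python rather than against the FTS table. The keyword lists, recipe dict shape, and function names are hypothetical and are not taken from browser_domains.py; the point is only that a category count can be derived from title and ingredient text without reading a category column.

```python
import re

# Hypothetical keyword lists; the real ones live in browser_domains.py.
CATEGORY_KEYWORDS = {
    "Italian": ["lasagna", "risotto", "parmesan"],
    "Mexican": ["taco", "enchilada", "tortilla"],
}


def matches_category(title, ingredients, keywords):
    """Return True if any keyword starts a word in the title or ingredient text."""
    text = " ".join([title, *ingredients]).lower()
    # \b anchors the keyword at a word start, so "taco" also matches "tacos".
    return any(re.search(r"\b" + re.escape(kw.lower()), text) for kw in keywords)


def count_by_category(recipes, category_keywords=CATEGORY_KEYWORDS):
    """Count recipes per category; a recipe may count toward several categories."""
    counts = {name: 0 for name in category_keywords}
    for recipe in recipes:
        for name, keywords in category_keywords.items():
            if matches_category(recipe["title"], recipe["ingredients"], keywords):
                counts[name] += 1
    return counts


sample_recipes = [
    {"title": "Easy Beef Tacos", "ingredients": ["tortilla", "ground beef"]},
    {"title": "Weeknight Bake", "ingredients": ["parmesan", "pasta"]},
    {"title": "Plain Oatmeal", "ingredients": ["oats"]},
]
category_counts = count_by_category(sample_recipes)
```

Note that none of the sample recipes carries a category field, which mirrors why the sparsely populated recipes.category column does not affect browse results.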

Remaining gap: subcategory counts all 0

Drill-down subcategories (Sicilian, Neapolitan, Tuscan, etc.) return 0 results. The subcategory keywords are specific dish names and regional terms that rarely appear in recipe titles in the food.com corpus. The FTS match works — the terms just aren't in the data.

Root cause: The food.com corpus tags recipes at a high level ("Italian") but not regionally. Closing the gap means either:

  • Running infer_recipe_tags.py against the full corpus to derive regional subcategory membership from ingredient + title signals
  • Or accepting sparse subcategories and surfacing only categories with recipe_count > 0 in the UI
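The first option could look roughly like the sketch below. This is not the logic of infer_recipe_tags.py, which is not shown in this issue; the signal terms, the `min_hits` threshold, and the function name are all assumptions chosen for illustration. The idea is to assign a regional subcategory when enough region-typical terms appear in the title or ingredients, instead of requiring the subcategory name itself to appear.

```python
# Hypothetical signal terms per subcategory; not taken from infer_recipe_tags.py.
SUBCATEGORY_SIGNALS = {
    "Sicilian": {"caponata", "pistachio", "swordfish", "ricotta salata"},
    "Tuscan": {"cannellini", "kale", "ribollita", "pecorino"},
}


def infer_subcategories(title, ingredients, signals=SUBCATEGORY_SIGNALS, min_hits=2):
    """Return the sorted subcategories whose signal terms occur at least min_hits times.

    Matching is a plain substring test on the lowercased title + ingredient text.
    """
    text = " ".join([title, *ingredients]).lower()
    tags = []
    for name, terms in signals.items():
        hits = sum(1 for term in terms if term in text)
        if hits >= min_hits:
            tags.append(name)
    return sorted(tags)


tuscan_tags = infer_subcategories("White bean soup", ["cannellini beans", "kale", "garlic"])
weak_tags = infer_subcategories("Pasta", ["pistachio"])
```

Requiring more than one hit is a deliberate choice in this sketch: a single shared ingredient is weak evidence of regional origin, so the threshold trades coverage for fewer false positives.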

New categories — pending cloud restart

browser_domains.py was updated 2026-04-21 to add:

  • BBQ & Smoke (8 regional subcategories — Texas, Carolina, KC, Memphis, Alabama, Kentucky, St. Louis, Backyard)
  • Central American (Salvadoran, Guatemalan, Costa Rican, Honduran, Nicaraguan)
  • African (West African, Senegalese, Ethiopian/Eritrean, East African, North African, South African, Moroccan)
  • Pacific & Oceania (Māori/NZ, Australian, Fijian, Samoan, Tongan, PNG, Hawaiian)
  • Central Asian & Caucasus (Persian/Iranian, Georgian, Armenian, Azerbaijani, Uzbek, Afghan, Kazakh)

These will not appear in the browse UI until kiwi-cloud-api-1 is restarted with the updated code.

Next steps

  1. docker restart kiwi-cloud-api-1 to expose new categories
  2. Run scripts/pipeline/infer_recipe_tags.py against full 3.2M corpus to populate subcategory coverage
  3. Consider UI change: hide subcategories with 0 count, or show them greyed with "No recipes yet" rather than showing empty results
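Step 3 could be handled where the subcategory list is shaped for the UI. The sketch below assumes a hypothetical mapping of subcategory name to recipe count and a hypothetical output shape; neither reflects the actual browse API response.

```python
def present_subcategories(counts, mode="hide"):
    """Shape subcategory counts for display.

    counts: dict mapping subcategory name -> recipe count (hypothetical shape).
    mode "hide" drops empty subcategories; mode "grey" keeps them, disabled,
    with a "No recipes yet" label.
    """
    if mode not in ("hide", "grey"):
        raise ValueError(f"unknown mode: {mode!r}")
    items = []
    for name, count in counts.items():
        if count > 0:
            items.append({"name": name, "count": count, "disabled": False, "label": None})
        elif mode == "grey":
            items.append({"name": name, "count": 0, "disabled": True, "label": "No recipes yet"})
    return items


example_counts = {"Sicilian": 0, "Tuscan": 12}
hidden_view = present_subcategories(example_counts, mode="hide")
greyed_view = present_subcategories(example_counts, mode="grey")
```

Either mode avoids sending users to an empty results page; the greyed variant keeps the full taxonomy visible, which may matter once subcategory coverage improves.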
pyr0ball changed title from "recipe_browser_fts: only 1.2K of 3.2M corpus recipes have category/keywords — browser returns sparse results" to "Recipe browser: subcategory coverage sparse — category-level fully populated" 2026-04-21 10:11:44 -07:00
Reference: Circuit-Forge/kiwi#108