Recipe browser: subcategory coverage sparse — category-level fully populated #108


Closed

opened 2026-04-18 14:12:42 -07:00 by pyr0ball · 0 comments

pyr0ball commented

2026-04-18 14:12:42 -07:00

Owner

Current State (updated 2026-04-21)

What is now working

recipe_browser_fts is fully populated — 3,195,798 entries covering the entire corpus. The backfill resolved the original data gap.

Category-level browse results are healthy:

| Category | Recipes |
|---|---:|
| Italian | 206,201 |
| American | 65,321 |
| European | 67,093 |
| Mexican | 97,138 |
| Mediterranean | 34,858 |
| Indian | 33,536 |
| Asian | 52,079 |
| Latin American | 9,805 |
| Total indexed | 566,031 |

The browse API matches recipe titles and ingredient names against the keyword lists in browser_domains.py, so it does not depend on the recipes.category column. That column still has only 1,228 rows populated, which does not affect browse.
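As a rough illustration of that matching, here is a minimal sketch. The keyword lists, the `matching_categories` helper, and the sample recipe are all hypothetical; the real lists live in browser_domains.py, and the real lookup goes through recipe_browser_fts rather than a Python substring test.

```python
# Hypothetical keyword lists; the real ones are defined in browser_domains.py.
CATEGORY_KEYWORDS = {
    "Italian": ["lasagna", "risotto", "parmesan"],
    "Mexican": ["taco", "enchilada", "tortilla"],
}


def matching_categories(title, ingredients, keywords=CATEGORY_KEYWORDS):
    """Return the categories whose keywords occur in the title or ingredient names.

    Note that no stored category column is consulted: membership is derived
    purely from the recipe text.
    """
    haystack = " ".join([title, *ingredients]).lower()
    return sorted(
        category
        for category, terms in keywords.items()
        if any(term in haystack for term in terms)
    )


print(matching_categories(
    "Weeknight Chicken Tacos",
    ["corn tortillas", "chicken thigh", "lime"],
))
```

Because membership comes from the text alone, a recipe with an empty recipes.category value can still show up under a category.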

Remaining gap: subcategory counts all 0

Drill-down subcategories (Sicilian, Neapolitan, Tuscan, etc.) return 0 results. The subcategory keywords are specific dish names and regional terms that rarely appear in recipe titles in the food.com corpus. The FTS match works — the terms just aren't in the data.

Root cause: The food.com corpus tags recipes at a high level ("Italian") but not regionally. Subcategory classification requires either:

- Running infer_recipe_tags.py against the full corpus to derive regional subcategory membership from ingredient + title signals
- Or accepting sparse subcategories and surfacing only categories with recipe_count > 0 in the UI
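The first option could look roughly like the sketch below. It is not the logic of infer_recipe_tags.py, which is not shown in this issue: the signal terms, the `infer_subcategories` helper, and the `min_hits` threshold are illustrative assumptions. The idea is to tag a recipe with a regional subcategory only when several signals agree across title and ingredients, instead of requiring a specific dish name in the title.

```python
# Hypothetical signal terms per regional subcategory (illustrative only).
SUBCATEGORY_SIGNALS = {
    "Sicilian": {"caponata", "pistachio", "swordfish", "capers", "eggplant"},
    "Tuscan": {"cannellini", "ribollita", "kale", "pecorino"},
}


def infer_subcategories(title, ingredients, min_hits=2, signals=SUBCATEGORY_SIGNALS):
    """Tag a recipe with every subcategory that has at least `min_hits` signal terms
    present in the combined title and ingredient text."""
    text = " ".join([title, *ingredients]).lower()
    tags = []
    for name, terms in signals.items():
        hits = sum(1 for term in terms if term in text)
        if hits >= min_hits:
            tags.append(name)
    return sorted(tags)


print(infer_subcategories(
    "Pasta with Swordfish",
    ["eggplant", "capers", "olive oil"],
))
```

Requiring more than one hit trades coverage for precision; a threshold of 1 would tag far more recipes but would also pick up incidental ingredients.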

New categories — pending cloud restart

browser_domains.py was updated 2026-04-21 to add:

- BBQ & Smoke (8 regional subcategories: Texas, Carolina, KC, Memphis, Alabama, Kentucky, St. Louis, Backyard)
- Central American (Salvadoran, Guatemalan, Costa Rican, Honduran, Nicaraguan)
- African (West African, Senegalese, Ethiopian/Eritrean, East African, North African, South African, Moroccan)
- Pacific & Oceania (Māori/NZ, Australian, Fijian, Samoan, Tongan, PNG, Hawaiian)
- Central Asian & Caucasus (Persian/Iranian, Georgian, Armenian, Azerbaijani, Uzbek, Afghan, Kazakh)

These will not appear in the browse UI until kiwi-cloud-api-1 is restarted with the updated code.

Next steps

1. docker restart kiwi-cloud-api-1 to expose the new categories
2. Run scripts/pipeline/infer_recipe_tags.py against the full 3.2M corpus to populate subcategory coverage
3. Consider a UI change: hide subcategories with a count of 0, or show them greyed out with "No recipes yet" instead of returning empty results
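The two UI behaviours in step 3 can be sketched as a small filter over subcategory counts. The `subcategories_for_ui` helper, its `mode` parameter, the entry fields, and the example counts are assumptions for illustration, not the existing API shape.

```python
def subcategories_for_ui(counts, mode="hide"):
    """Build the subcategory entries a browse UI could render.

    counts: mapping of subcategory name -> recipe_count.
    mode:   "hide" drops zero-count subcategories;
            "grey" keeps them, disabled, with a "No recipes yet" label.
    """
    if mode not in ("hide", "grey"):
        raise ValueError(f"unknown mode: {mode!r}")

    entries = []
    for name, recipe_count in counts.items():
        if recipe_count > 0:
            entries.append(
                {"name": name, "recipe_count": recipe_count, "disabled": False, "label": None}
            )
        elif mode == "grey":
            entries.append(
                {"name": name, "recipe_count": 0, "disabled": True, "label": "No recipes yet"}
            )
    return entries


# Made-up counts for illustration; today every subcategory count is 0.
example_counts = {"Sicilian": 0, "Neapolitan": 12}
print(subcategories_for_ui(example_counts, mode="hide"))
print(subcategories_for_ui(example_counts, mode="grey"))
```

With today's data, "hide" would leave every drill-down empty, so "grey" may be the more informative choice until subcategory tagging has run.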


pyr0ball referenced this issue from a commit

2026-04-18 15:39:06 -07:00

fix: recipe enrichment backfill, main_ingredient browser domain, bug batch

pyr0ball referenced this issue

2026-04-18 15:39:32 -07:00

fix: recipe enrichment backfill, main_ingredient browser, bug batch #109

pyr0ball closed this issue

2026-04-18 15:50:42 -07:00

pyr0ball changed title from ~~recipe_browser_fts: only 1.2K of 3.2M corpus recipes have category/keywords — browser returns sparse results~~ to Recipe browser: subcategory coverage sparse — category-level fully populated

2026-04-21 10:11:44 -07:00