kiwi/scripts
pyr0ball 0c200f3148
feat(pipeline): ingest_purplecarrot.py — upsert scraped recipes into corpus DB
- Maps Purple Carrot parquet columns to recipes table schema:
  Slug → external_id (pc_<slug>), Name → title,
  RecipeIngredientParts/RecipeInstructions → ingredients/directions
- Sets source='purplecarrot', category='meal-kit', servings=2
- Allergens encoded as allergen:<tag> keywords, stored alongside existing keywords such as HIGH-PROTEIN
- Handles numpy ndarray columns from parquet (not plain Python lists)
- Upserts: insert new, update existing — safe to run repeatedly
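
The mapping and upsert described above can be sketched as follows. This is a minimal illustration, not the actual contents of ingest_purplecarrot.py. It assumes a SQLite `recipes` table with a unique `external_id` column, JSON-encoded list columns, and hypothetical `Keywords` and `Allergens` parquet columns (the commit message does not name them). The `as_list` helper is likewise hypothetical.

```python
import json
import sqlite3


def as_list(value):
    """Return a plain Python list from a list, tuple, numpy ndarray, or None."""
    if value is None:
        return []
    if hasattr(value, "tolist"):  # parquet list columns often load as numpy ndarrays
        return list(value.tolist())
    return list(value)


def to_recipe_row(rec):
    """Map one scraped record (dict keyed by parquet column) to a recipes row."""
    # "Keywords" and "Allergens" are assumed column names, not confirmed by the source.
    keywords = as_list(rec.get("Keywords"))
    keywords += [f"allergen:{tag}" for tag in as_list(rec.get("Allergens"))]
    return {
        "external_id": f"pc_{rec['Slug']}",
        "title": rec["Name"],
        "ingredients": json.dumps(as_list(rec.get("RecipeIngredientParts"))),
        "directions": json.dumps(as_list(rec.get("RecipeInstructions"))),
        "keywords": json.dumps(keywords),
        "source": "purplecarrot",
        "category": "meal-kit",
        "servings": 2,
    }


# Requires SQLite 3.24+ for ON CONFLICT ... DO UPDATE.
UPSERT_SQL = """
INSERT INTO recipes (external_id, title, ingredients, directions,
                     keywords, source, category, servings)
VALUES (:external_id, :title, :ingredients, :directions,
        :keywords, :source, :category, :servings)
ON CONFLICT(external_id) DO UPDATE SET
    title = excluded.title,
    ingredients = excluded.ingredients,
    directions = excluded.directions,
    keywords = excluded.keywords,
    source = excluded.source,
    category = excluded.category,
    servings = excluded.servings
"""


def upsert_recipes(conn, records):
    """Insert new recipes and update existing ones; returns the number processed."""
    rows = [to_recipe_row(rec) for rec in records]
    with conn:  # one transaction: commit on success, roll back on error
        conn.executemany(UPSERT_SQL, rows)
    return len(rows)
```

Because the conflict target is `external_id`, running the same batch twice leaves one row per slug, which is what makes repeated runs safe.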

Wire step 3 (ingest) into weekly_harvest.sh so the full pipeline is:
  1. discover_current_menu.py → parquet of active menu slugs
  2. scrape_live.py --resume  → scrape only new slugs, append to live parquet
  3. ingest_purplecarrot.py   → upsert into /Library/Assets/kiwi/kiwi.db
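
The three-step ordering above can be sketched in Python as a small runner that stops at the first failing step. The real wiring lives in weekly_harvest.sh, whose contents are not shown here. The script directory, the `build_steps` and `run_pipeline` names, and the absence of extra flags (beyond `--resume`, which the commit message names) are all assumptions.

```python
import subprocess
import sys
from pathlib import Path


def build_steps(script_dir, python=sys.executable):
    """Return the pipeline commands in execution order.

    script_dir is a hypothetical location for the three scripts.
    """
    d = Path(script_dir)
    return [
        [python, str(d / "discover_current_menu.py")],   # 1. active menu slugs
        [python, str(d / "scrape_live.py"), "--resume"],  # 2. scrape only new slugs
        [python, str(d / "ingest_purplecarrot.py")],      # 3. upsert into the corpus DB
    ]


def run_pipeline(steps, runner=subprocess.run):
    """Run steps in order; return 0 on success or the 1-based index of the failed step."""
    for index, cmd in enumerate(steps, start=1):
        result = runner(cmd)
        if result.returncode != 0:
            return index  # later steps depend on earlier output, so stop here
    return 0
```

Stopping on the first non-zero exit keeps the ingest step from running against a stale or partial parquet file.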
2026-05-21 16:43:23 -07:00
pipeline feat(pipeline): ingest_purplecarrot.py — upsert scraped recipes into corpus DB 2026-05-21 16:43:23 -07:00
__init__.py feat: data pipeline -- USDA FDC ingredient index builder 2026-03-30 22:44:25 -07:00
backfill_keywords.py chore: commit in-progress work -- tag inferrer, imitate endpoint, hall-of-chaos easter egg, migration files, Dockerfile .env defense 2026-04-14 13:23:15 -07:00
backfill_texture_profiles.py feat: recipe engine — assembly templates, prep notes, FTS fixes, texture backfill 2026-04-02 22:12:35 -07:00
tag_sensory_profiles.py feat: sensory profile filter — texture/smell/noise filtering for Browse and Find (kiwi#51) 2026-04-24 09:47:48 -07:00