- Maps Purple Carrot parquet columns to the recipes table schema:
  Slug → external_id (pc_<slug>), Name → title,
  RecipeIngredientParts/RecipeInstructions → ingredients/directions
- Sets source='purplecarrot', category='meal-kit', servings=2
- Encodes allergens as allergen:<tag> keywords, stored alongside existing keywords such as HIGH-PROTEIN
- Handles numpy ndarray columns from parquet (not plain Python lists)
- Upserts: insert new, update existing — safe to run repeatedly
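
A minimal sketch of the mapping and upsert described above, not the actual ingest_purplecarrot.py. The helper names (to_list, map_row, upsert), the Keywords and Allergens column names, the JSON encoding of list columns, and a UNIQUE constraint on recipes.external_id are all assumptions made for illustration. ON CONFLICT needs SQLite 3.24 or newer.

```python
import json
import sqlite3


def to_list(value):
    """Return a plain Python list from a parquet cell (ndarray, list, or None)."""
    if value is None:
        return []
    if hasattr(value, "tolist"):  # numpy.ndarray exposes tolist()
        return value.tolist()
    return list(value)


def map_row(row):
    """Map one parquet row (dict-like) to a recipes-table record."""
    keywords = to_list(row.get("Keywords"))    # assumed column name
    allergens = to_list(row.get("Allergens"))  # assumed column name
    keywords += [f"allergen:{tag}" for tag in allergens]
    return {
        "external_id": f"pc_{row['Slug']}",
        "title": row["Name"],
        # Storing lists as JSON text is an assumption about the schema.
        "ingredients": json.dumps(to_list(row.get("RecipeIngredientParts"))),
        "directions": json.dumps(to_list(row.get("RecipeInstructions"))),
        "keywords": json.dumps(keywords),
        "source": "purplecarrot",
        "category": "meal-kit",
        "servings": 2,
    }


# Assumes recipes.external_id carries a UNIQUE constraint.
UPSERT_SQL = """
INSERT INTO recipes (external_id, title, ingredients, directions,
                     keywords, source, category, servings)
VALUES (:external_id, :title, :ingredients, :directions,
        :keywords, :source, :category, :servings)
ON CONFLICT(external_id) DO UPDATE SET
    title = excluded.title,
    ingredients = excluded.ingredients,
    directions = excluded.directions,
    keywords = excluded.keywords,
    source = excluded.source,
    category = excluded.category,
    servings = excluded.servings
"""


def upsert(conn, record):
    """Insert a new recipe or update the existing row with the same external_id."""
    conn.execute(UPSERT_SQL, record)
```

Because the conflict target is external_id, running the same record through upsert twice leaves a single row, which is what makes repeated runs safe.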

Wire step 3 (ingest) into weekly_harvest.sh so the full pipeline is:
1. discover_current_menu.py → parquet of active menu slugs
2. scrape_live.py --resume → scrape only new slugs, append to live parquet
3. ingest_purplecarrot.py → upsert into /Library/Assets/kiwi/kiwi.db

Add weekly_harvest.sh wrapper that:
- Runs discover_current_menu.py to fetch this week's 23 active menu slugs
- Runs scrape_live.py with --resume to scrape only new slugs
- Appends timestamped output to /Library/Assets/kiwi/pipeline/logs/
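
The wrapper itself is a shell script. The sketch below only illustrates its control flow in Python: run the steps in order, append timestamped output to a log, and stop at the first failure. run_pipeline and STEPS are hypothetical names. Stopping on failure, and the assumption that the scripts need no arguments beyond --resume, are not confirmed by the actual wrapper.

```python
import subprocess
import sys
from datetime import datetime
from pathlib import Path

# Assumed invocations; the real scripts may take additional arguments.
STEPS = [
    [sys.executable, "discover_current_menu.py"],
    [sys.executable, "scrape_live.py", "--resume"],
    [sys.executable, "ingest_purplecarrot.py"],
]


def run_pipeline(steps, log_path):
    """Run each command in order, appending timestamped output to log_path.

    Returns 0 if every step succeeds, otherwise the exit code of the
    first failing step (later steps are skipped).
    """
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as log:
        for cmd in steps:
            stamp = datetime.now().isoformat(timespec="seconds")
            log.write(f"[{stamp}] running: {' '.join(cmd)}\n")
            log.flush()  # keep our header ahead of the child's output
            result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
            if result.returncode != 0:
                log.write(f"[{stamp}] failed with exit code {result.returncode}\n")
                return result.returncode
    return 0
```

Calling run_pipeline(STEPS, "/Library/Assets/kiwi/pipeline/logs/purple_carrot_harvest.log") would mirror the weekly run.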

Cron entry added to system crontab:
0 23 * * 0 (every Sunday 23:00)
Logs: /Library/Assets/kiwi/pipeline/logs/purple_carrot_harvest.log