Commit graph

2 commits

0c200f3148 feat(pipeline): ingest_purplecarrot.py — upsert scraped recipes into corpus DB
- Maps Purple Carrot parquet columns to recipes table schema:
  Slug → external_id (pc_<slug>), Name → title,
  RecipeIngredientParts/RecipeInstructions → ingredients/directions
- Sets source='purplecarrot', category='meal-kit', servings=2
- Allergens encoded as allergen:<tag> keywords alongside HIGH-PROTEIN etc.
- Handles numpy ndarray columns from parquet (not plain Python lists)
- Upserts: insert new, update existing — safe to run repeatedly
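The column mapping and upsert described above can be sketched as follows. This is a minimal illustration, not the actual ingest_purplecarrot.py. It assumes that kiwi.db is SQLite, that `recipes` has a UNIQUE constraint on `external_id`, and that list fields are stored as JSON text. The `Keywords` and `Allergens` column names, the `keywords` table column, and the helper names `as_list`, `map_row` and `upsert_recipes` are hypothetical. `Slug`, `Name`, `RecipeIngredientParts`, `RecipeInstructions` and the fixed source/category/servings values come from the commit message.

```python
import json
import sqlite3
from typing import Any, Iterable, Mapping


def as_list(value: Any) -> list:
    """Normalize a parquet list cell to a plain Python list.

    pandas/pyarrow hand back list columns as numpy ndarrays. bool() on an
    ndarray with more than one element raises ValueError, so such a cell must
    not be tested with `if value:`. Anything exposing .tolist() (ndarray
    included) is converted, so numpy does not need to be imported here.
    """
    if value is None:
        return []
    if hasattr(value, "tolist"):
        value = value.tolist()
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def map_row(row: Mapping[str, Any]) -> dict:
    """Map one scraped row to a hypothetical `recipes` row."""
    keywords = [str(k) for k in as_list(row.get("Keywords"))]  # e.g. "HIGH-PROTEIN"
    keywords += [f"allergen:{tag}" for tag in as_list(row.get("Allergens"))]
    return {
        "external_id": f"pc_{row['Slug']}",
        "title": row["Name"],
        "ingredients": json.dumps(as_list(row.get("RecipeIngredientParts"))),
        "directions": json.dumps(as_list(row.get("RecipeInstructions"))),
        "keywords": json.dumps(keywords),
        "source": "purplecarrot",
        "category": "meal-kit",
        "servings": 2,
    }


# Requires SQLite >= 3.24 for ON CONFLICT ... DO UPDATE.
UPSERT_SQL = """
INSERT INTO recipes (external_id, title, ingredients, directions, keywords,
                     source, category, servings)
VALUES (:external_id, :title, :ingredients, :directions, :keywords,
        :source, :category, :servings)
ON CONFLICT(external_id) DO UPDATE SET
    title = excluded.title,
    ingredients = excluded.ingredients,
    directions = excluded.directions,
    keywords = excluded.keywords,
    source = excluded.source,
    category = excluded.category,
    servings = excluded.servings
"""


def upsert_recipes(conn: sqlite3.Connection, rows: Iterable[Mapping[str, Any]]) -> None:
    """Insert new recipes and update existing ones, keyed on external_id."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.executemany(UPSERT_SQL, [map_row(r) for r in rows])
```

Keying the upsert on `external_id` is what makes repeated runs safe: a second run with the same slug rewrites the existing row instead of adding a duplicate.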

Wire step 3 (ingest) into weekly_harvest.sh so the full pipeline is:
  1. discover_current_menu.py → parquet of active menu slugs
  2. scrape_live.py --resume  → scrape only new slugs, append to live parquet
  3. ingest_purplecarrot.py   → upsert into /Library/Assets/kiwi/kiwi.db
2026-05-21 16:43:23 -07:00
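The three-step ordering in this commit can be sketched as a fail-fast runner. This is a hypothetical Python restatement of the sequencing only. The commit does not say whether weekly_harvest.sh stops on a failed step (for example via `set -e`), so stopping early is an assumption. The `PIPELINE` table, the `python` invocation and `run_pipeline` are illustrative names.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical step table mirroring the three stages in the commit message.
# Invoking each script as "python <script>" with no other arguments is an assumption.
PIPELINE = [
    ("discover", ["python", "discover_current_menu.py"]),
    ("scrape", ["python", "scrape_live.py", "--resume"]),
    ("ingest", ["python", "ingest_purplecarrot.py"]),
]


def run_pipeline(
    steps: Sequence[tuple[str, list[str]]],
    run: Callable[[list[str]], subprocess.CompletedProcess] = subprocess.run,
) -> list[str]:
    """Run steps in order and stop at the first non-zero exit status.

    Returns the names of the steps that succeeded. `run` is injectable so the
    sequencing can be exercised without executing the real scripts.
    """
    completed = []
    for name, cmd in steps:
        if run(cmd).returncode != 0:
            break  # skip later steps so an upstream failure is not masked
        completed.append(name)
    return completed
```

With this design, a failed scrape leaves the database untouched for that week instead of re-ingesting an unchanged parquet.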
21a0664961 feat(pipeline): weekly Purple Carrot harvest script + cron
Add weekly_harvest.sh wrapper that:
- Runs discover_current_menu.py to fetch this week's 23 active menu slugs
- Runs scrape_live.py with --resume to scrape only new slugs
- Appends timestamped output to /Library/Assets/kiwi/pipeline/logs/
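The "timestamped output" bullet can be illustrated with a small helper that runs one step and appends its output to a log file. The real wrapper is a shell script. This Python helper, its name `run_and_log`, and the `[YYYY-MM-DD HH:MM:SS]` header format are assumptions made for illustration. The log filename purple_carrot_harvest.log is taken from the cron note in this commit.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def run_and_log(cmd: list[str], log_path: Path, now=datetime.now) -> int:
    """Run one harvest step and append its output to log_path with a timestamp.

    The log is opened in append mode, so each weekly run adds to the same file
    rather than overwriting it. stdout and stderr are both captured so that
    scraper tracebacks also end up in the log.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    stamp = now().strftime("%Y-%m-%d %H:%M:%S")
    with log_path.open("a", encoding="utf-8") as log:
        log.write(f"[{stamp}] $ {' '.join(cmd)} (exit {result.returncode})\n")
        log.write(result.stdout)
        log.write(result.stderr)
    return result.returncode
```

Returning the exit status lets a caller decide whether to continue with the next step.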

Cron entry added to system crontab:
  0 23 * * 0 (every Sunday 23:00)
Logs: /Library/Assets/kiwi/pipeline/logs/purple_carrot_harvest.log
2026-05-21 16:22:26 -07:00
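As a worked check of the schedule `0 23 * * 0`, the sketch below computes the next fire time after a given moment. Cron numbers Sunday as 0, while Python's `datetime.weekday()` numbers Monday as 0 and Sunday as 6, so the day has to be shifted. The function name is hypothetical. It assumes naive local time, since cron normally evaluates entries in the system time zone, and it ignores DST edge cases.

```python
from datetime import datetime, timedelta


def next_weekly_run(after: datetime, hour: int = 23, cron_dow: int = 0) -> datetime:
    """Next fire time of a `0 <hour> * * <cron_dow>` entry, strictly after `after`.

    Cron: Sunday=0 (or 7), Monday=1 ... Saturday=6.
    Python weekday(): Monday=0 ... Sunday=6, hence (cron_dow - 1) % 7.
    """
    py_dow = (cron_dow - 1) % 7
    days_ahead = (py_dow - after.weekday()) % 7
    candidate = (after + timedelta(days=days_ahead)).replace(
        hour=hour, minute=0, second=0, microsecond=0
    )
    if candidate <= after:  # this week's slot is now or already past: use next week
        candidate += timedelta(days=7)
    return candidate
```

For the commit timestamp 2026-05-21 16:22:26 (a Thursday), this gives Sunday 2026-05-24 23:00 as the first scheduled harvest.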