kiwi/scripts at a9ab996bcc786e19c131d2e421fe434b7ff38d61 - Circuit-Forge/kiwi

pyr0ball a9ab996bcc 2026-05-21 16:16:32 -07:00
feat(pipeline): purple carrot weekly menu scraper with CF bypass

Add three new scripts for the Purple Carrot recipe pipeline:
- discover_current_menu.py: fetches this week's active menu slugs from /plant-based-recipes using requests (the HTML is server-rendered, so no JS is needed). Accumulates slugs across weekly runs to build a recipe corpus over time.
- discover_slugs_categories.py: crawls recipe-category listing pages with ?page=N pagination to discover the historical slug inventory. Note: category archive slugs (past menu items) return 404 when scraped live, so use this only to identify currently featured recipes per category.
- scrape_live.py: updated with a --slugs-from flag (loads the slug inventory from any parquet file, not just the default Wayback one) and a fresh-context-per-slug pattern to bypass Cloudflare session-level bot detection (which fires on the 2nd+ request in a shared browser context).

Discovery: the live site only renders full ingredient/instruction content for recipes currently on the active weekly menu. 23/23 current menu recipes were scraped successfully (100% hit rate vs ~1% for archived slugs).

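The slug discovery and weekly accumulation described in the commit message can be sketched in plain Python. This is not the repository's code; it is a minimal sketch under several stated assumptions. Recipe links are assumed to take the hypothetical form `/recipes/<slug>`. The inventory is kept as a dict mapping each slug to its first-seen date rather than in a parquet file. Category pagination is assumed to end at the first `?page=N` that adds no new slugs. The names `SlugCollector`, `extract_slugs`, `accumulate`, and `crawl_category` are all hypothetical. The page fetcher is passed in as a callable, so the sketch itself makes no network requests.

```python
from datetime import date
from html.parser import HTMLParser
from typing import Callable

# Hypothetical URL shape: assumes recipe links look like /recipes/<slug>.
RECIPE_PREFIX = "/recipes/"


class SlugCollector(HTMLParser):
    """Collects unique recipe slugs from <a href="..."> tags, in page order."""

    def __init__(self) -> None:
        super().__init__()
        self.slugs: list[str] = []
        self._seen: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""  # attribute values may be None
        if not href.startswith(RECIPE_PREFIX):
            return
        # Drop any query string or fragment, then trailing/leading slashes.
        slug = href[len(RECIPE_PREFIX):].split("?")[0].split("#")[0].strip("/")
        if slug and slug not in self._seen:
            self._seen.add(slug)
            self.slugs.append(slug)


def extract_slugs(html: str) -> list[str]:
    """Return the unique recipe slugs linked from one server-rendered page."""
    parser = SlugCollector()
    parser.feed(html)
    parser.close()
    return parser.slugs


def accumulate(inventory: dict[str, str], slugs: list[str], run_date: date) -> dict[str, str]:
    """Merge one run's slugs into the inventory, keeping each slug's first-seen date."""
    merged = dict(inventory)
    for slug in slugs:
        merged.setdefault(slug, run_date.isoformat())
    return merged


def crawl_category(fetch: Callable[[str], str], base_url: str, max_pages: int = 50) -> list[str]:
    """Walk base_url?page=1, 2, ... and stop at the first page with no new slugs.

    Assumes base_url carries no query string of its own.
    """
    found: list[str] = []
    seen: set[str] = set()
    for page in range(1, max_pages + 1):
        new = [s for s in extract_slugs(fetch(f"{base_url}?page={page}")) if s not in seen]
        if not new:
            break
        seen.update(new)
        found.extend(new)
    return found
```

The inventory uses `setdefault`, so a recipe that reappears on a later weekly menu keeps its original first-seen date instead of being overwritten.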
pipeline	feat(pipeline): purple carrot weekly menu scraper with CF bypass	2026-05-21 16:16:32 -07:00
__init__.py	feat: data pipeline -- USDA FDC ingredient index builder	2026-03-30 22:44:25 -07:00
backfill_keywords.py	chore: commit in-progress work -- tag inferrer, imitate endpoint, hall-of-chaos easter egg, migration files, Dockerfile .env defense	2026-04-14 13:23:15 -07:00
backfill_texture_profiles.py	feat: recipe engine — assembly templates, prep notes, FTS fixes, texture backfill	2026-04-02 22:12:35 -07:00
tag_sensory_profiles.py	feat: sensory profile filter — texture/smell/noise filtering for Browse and Find (kiwi#51)	2026-04-24 09:47:48 -07:00