Compare commits

52 commits
v0.6.0 ... main

Author SHA1 Message Date
cf807179f5 docs: add LLM development disclosure to README
Humans own design, architecture, code review, testing, and
verification. LLMs are part of our development workflow.
Links to circuitforge.tech/positions for our full position.
2026-05-28 08:20:17 -07:00
0c200f3148 feat(pipeline): ingest_purplecarrot.py — upsert scraped recipes into corpus DB
- Maps Purple Carrot parquet columns to recipes table schema:
  Slug → external_id (pc_<slug>), Name → title,
  RecipeIngredientParts/RecipeInstructions → ingredients/directions
- Sets source='purplecarrot', category='meal-kit', servings=2
- Allergens encoded as allergen:<tag> keywords alongside HIGH-PROTEIN etc.
- Handles numpy ndarray columns from parquet (not plain Python lists)
- Upserts: insert new, update existing — safe to run repeatedly

Wire step 3 (ingest) into weekly_harvest.sh so the full pipeline is:
  1. discover_current_menu.py → parquet of active menu slugs
  2. scrape_live.py --resume  → scrape only new slugs, append to live parquet
  3. ingest_purplecarrot.py   → upsert into /Library/Assets/kiwi/kiwi.db
2026-05-21 16:43:23 -07:00
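Sketch for 0c200f3148 above: one way the repeat-safe upsert could look in SQLite. The table layout, the `make_db` and `upsert_recipe` helpers, and the JSON encoding of list columns are assumptions for illustration, not the actual ingest_purplecarrot.py code.

```python
import json
import sqlite3


def as_list(value):
    """Convert ndarray-like parquet cells (anything with .tolist()) to a plain list."""
    return value.tolist() if hasattr(value, "tolist") else list(value)


def make_db():
    # Hypothetical minimal schema; the real recipes table has more columns.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE recipes ("
        "external_id TEXT PRIMARY KEY, title TEXT, ingredients TEXT, "
        "directions TEXT, source TEXT, category TEXT, servings INTEGER)"
    )
    return conn


def upsert_recipe(conn, slug, title, ingredients, directions):
    """Insert a recipe, or update it in place when external_id already exists."""
    conn.execute(
        """
        INSERT INTO recipes
            (external_id, title, ingredients, directions, source, category, servings)
        VALUES (?, ?, ?, ?, 'purplecarrot', 'meal-kit', 2)
        ON CONFLICT(external_id) DO UPDATE SET
            title = excluded.title,
            ingredients = excluded.ingredients,
            directions = excluded.directions
        """,
        (
            f"pc_{slug}",
            title,
            json.dumps(as_list(ingredients)),
            json.dumps(as_list(directions)),
        ),
    )
    conn.commit()
```

Because the conflict target is `external_id`, running the ingest twice on the same slug updates the row instead of duplicating it.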
21a0664961 feat(pipeline): weekly Purple Carrot harvest script + cron
Add weekly_harvest.sh wrapper that:
- Runs discover_current_menu.py to fetch this week's 23 active menu slugs
- Runs scrape_live.py with --resume to scrape only new slugs
- Appends timestamped output to /Library/Assets/kiwi/pipeline/logs/

Cron entry added to system crontab:
  0 23 * * 0 (every Sunday 23:00)
Logs: /Library/Assets/kiwi/pipeline/logs/purple_carrot_harvest.log
2026-05-21 16:22:26 -07:00
a9ab996bcc feat(pipeline): purple carrot weekly menu scraper with CF bypass
Add three new scripts for Purple Carrot recipe pipeline:

- discover_current_menu.py: fetches this week's active menu slugs from
  /plant-based-recipes using requests (server-rendered HTML, no JS needed).
  Accumulates slugs across weekly runs for building a recipe corpus over time.

- discover_slugs_categories.py: crawls recipe-category listing pages with
  ?page=N pagination to discover historical slug inventory. Note: category
  archive slugs (past menu items) 404 when scraped live; only use for
  identifying currently-featured recipes per category.

- scrape_live.py: updated with --slugs-from flag (load slug inventory from
  any parquet, not just the default Wayback one) and fresh-context-per-slug
  pattern to bypass Cloudflare session-level bot detection (which fires on
  the 2nd+ request in a shared browser context).

Discovery: the live site only renders full ingredient/instruction content for
recipes currently on the active weekly menu. 23/23 current menu recipes
scraped successfully (100% hit rate vs ~1% for archived slugs).
2026-05-21 16:16:32 -07:00
56f942b3fd feat(pipeline): Purple Carrot scraper hardening + shared pipeline logging
scrape_recipes.py:
- Switch CDX to HTTPS (avoids HTTP 503 rate-limit bucket)
- Restrict product API CDX to 2019–2021 window (pre-HelloFresh instruction stripping)
- Replace inline CDX requests with _cdx_get() helper: retries on 429/503 with
  exponential backoff (15s, 30s, 60s, 120s)
- Increase HTML fallback CDX limit from 5 to 10 timestamps
- Bump CDX_DELAY 0.5s → 3.0s and REPLAY_DELAY 1.2s → 2.0s (polite scraping)
- Fix KeyError: 0 on hero_images dict (normalise dict to list before indexing)

discover_wayback.py:
- Switch CDX to HTTPS

scripts/pipeline/log_utils.py (new):
- attach_pipeline_log(script_name): adds a JSON FileHandler to the root logger
  writing to /Library/Assets/logs/pipeline/<script>_<ts>.jsonl for Avocet
  Turnstone training data ingestion (kiwi#141 / avocet#67)
2026-05-17 13:35:35 -07:00
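Sketch for 56f942b3fd above: the retry-with-backoff behaviour described for _cdx_get(), under the assumption that the helper retries once after each listed delay. The `get_with_backoff` name and the injected `fetch` and `sleep` callables are hypothetical; they only keep the example free of network access.

```python
import time

RETRY_STATUSES = {429, 503}
BACKOFF_SECONDS = (15, 30, 60, 120)  # delays quoted in the commit message


def get_with_backoff(fetch, url, sleep=time.sleep):
    """Call fetch(url) -> (status, body); retry on 429/503 with growing delays."""
    for delay in BACKOFF_SECONDS:
        status, body = fetch(url)
        if status not in RETRY_STATUSES:
            return status, body
        sleep(delay)  # back off before the next attempt
    # Final attempt after the last delay; its result is returned as-is.
    return fetch(url)
```

Passing `sleep` as a parameter lets tests record the delays instead of waiting for them.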
84636bcdaf docs: bump version badge to match latest Forgejo release
2026-05-17 11:19:12 -07:00
51a48a430b feat(config): add GPU_SERVER_URL alias for CF_ORCH_URL
Self-hoster-friendly env var name. Priority: GPU_SERVER_URL →
CF_ORCH_URL (compat) → https://orch.circuitforge.tech when
CF_LICENSE_KEY is present (Paid+ auto-default). Resolved value
written back to os.environ["CF_ORCH_URL"] at startup so all
service callers remain unchanged.

Bump version to 0.10.0.
2026-05-17 09:42:48 -07:00
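Sketch for 51a48a430b above: the stated priority order expressed as a function over an environment mapping. `resolve_orch_url` is a hypothetical name, and treating empty strings as unset is an assumption.

```python
DEFAULT_ORCH_URL = "https://orch.circuitforge.tech"


def resolve_orch_url(env):
    """Resolve the orchestrator URL: GPU_SERVER_URL, then CF_ORCH_URL, then licensed default."""
    url = env.get("GPU_SERVER_URL") or env.get("CF_ORCH_URL")
    if not url and env.get("CF_LICENSE_KEY"):
        url = DEFAULT_ORCH_URL
    if url:
        # Write back so callers that only read CF_ORCH_URL keep working.
        env["CF_ORCH_URL"] = url
    return url
```

At startup this would be called with `os.environ`; a plain dict works the same way in tests.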
b326d4aa6e fix(config): add CF_ORCH_URL to local env for recipe scan + LLM features
Without CF_ORCH_URL set, _call_vision_backend() skips cf-orch entirely
and falls through to local VLM (no GPU in container) then fails.
.env gets CF_ORCH_URL=http://10.1.10.71:7700 for the local rack.
.env.example updated with documentation for self-hosters.

Local scan confirmed: cf-docuvision (Sif, GGUF) → ollama llama3.1:8b → 200 OK.
2026-05-17 09:21:33 -07:00
7cad503b35 feat(pipeline): Purple Carrot recipe corpus scraper via Wayback Machine
discover_wayback.py — enumerates recipe slugs from archived menu API
  (/api/v2/menus/<id>) and product API (/api/v1/products/*) plus
  recipe-category HTML pages. Writes incremental JSONL manifest to
  /Library/Assets/kiwi/pipeline/pc_slugs.jsonl.

scrape_recipes.py — fetches full recipe data per slug using three-tier
  fallback: product API JSON (oldest captures first), HTML inline state
  (__NEXT_DATA__ / __INITIAL_STATE__), and JSON-LD structured data.
  Outputs recipes_purplecarrot.parquet in food.com columnar format so
  build_recipe_index.py imports it unchanged. Includes SourceURL column
  for recipe attribution UI (kiwi#139). Checkpoints every 50 recipes.

Initial discovery: 158 slugs from menu 1536 + product_api pass.
Re-run discover_wayback.py after archive.org stabilizes to pick up
older slugs from recipe-category pages.

Backlog: live Playwright scraper for post-Wayback recipes (kiwi#137).
2026-05-17 09:16:35 -07:00
430600c1af fix(recipe_scan): harden JSON parser for real-world LLM output quirks
- Strip <think>/<thinking> blocks before parsing (Qwen3/DeepSeek-R1 emit
  these before the actual JSON answer)
- Replace greedy regex with brace-balanced _extract_json_object() so
  trailing prose after } doesn't corrupt the extract
- Use non-greedy fence regex to pull JSON from inside ```json blocks
- Pass system= to LLMRouter.complete() with a terse JSON-only instruction
  so Ollama models receive it as a system message, not buried in the user turn
- Add logger.warning() on parse failure so raw output is diagnosable
2026-05-17 08:30:55 -07:00
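Sketch for 430600c1af above: stripping reasoning blocks, preferring fenced content, then taking the first brace-balanced object. The function names and the exact regexes are assumptions; the real _extract_json_object() may differ.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so this example contains no literal fence
THINK_RE = re.compile(r"<(think|thinking)>.*?</\1>", re.DOTALL | re.IGNORECASE)
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json_object(text):
    """Return the first brace-balanced {...} substring, or None."""
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            # Braces inside JSON strings must not affect the depth count.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return None


def parse_llm_json(raw):
    """Parse model output into a dict, or return None when no object is found."""
    text = THINK_RE.sub("", raw)
    fenced = FENCE_RE.search(text)  # non-greedy: stops at the first closing fence
    if fenced:
        text = fenced.group(1)
    candidate = extract_json_object(text)
    return json.loads(candidate) if candidate else None
```

The brace counter stops at the matching close, so trailing prose that happens to contain `}` is never included in the extract.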
21a9b85067 fix(recipe_scan): revert to cf-docuvision path (GGUF backend now works)
Route recipe_scan back through task_allocate -> cf-docuvision -> DocuvisionClient
now that docuvision supports GGUF models via Qwen25VLChatHandler.

Two-step pipeline: docuvision OCRs image(s), LLMRouter structures OCR text to JSON.
Removes the non-functional cf-text image_url path (cf-text rejects content arrays).
2026-05-16 19:25:01 -07:00
c72b4415db feat(recipe_scan): use Qwen2-VL GGUF via cf-text OpenAI-compat API
Replace two-step docuvision OCR + LLM structuring pipeline with a
single multimodal VLM call. The bartowski Qwen2-VL-7B-Instruct Q5_K_M
GGUF is served by cf-text (llama.cpp) and accepts image_url content
blocks identical to the OpenAI vision API format.

Removes docuvision dependency for recipe scanning; the addict-missing /
DeepseekVLV2Processor-missing cf-docuvision error no longer blocks scans.
Receipt OCR (kiwi.ocr task) still routes to cf-docuvision separately.
2026-05-16 18:38:21 -07:00
2df17ec719 feat(recipe-scan): add SSE streaming endpoint for cold-start progress feedback
POST /recipes/scan/stream emits live status events while cf-docuvision
allocates and processes, replacing the static spinner with phase-aware labels:
  allocating -> scanning -> structuring -> done|error

Uses asyncio.Queue bridge to route progress callbacks from the sync scanner
thread to the async SSE generator. Frontend updated to consume the stream via
fetch + ReadableStream (EventSource does not support POST multipart).

Closes kiwi#136 (companion to the docuvision routing fix).
2026-05-16 16:24:32 -07:00
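Sketch for 2df17ec719 above: bridging progress callbacks from a sync worker thread into an async generator through an asyncio.Queue. `scan_sync` is a stand-in for the blocking scanner and the event format is simplified; both are assumptions.

```python
import asyncio
import threading


def scan_sync(on_progress):
    """Stand-in for the blocking scanner; reports each phase via a callback."""
    for phase in ("allocating", "scanning", "structuring", "done"):
        on_progress(phase)


async def progress_events():
    """Yield SSE-style lines as the worker thread reports progress."""
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()

    def on_progress(phase):
        # Runs in the worker thread: asyncio.Queue is not thread-safe,
        # so the put is scheduled onto the event loop instead.
        loop.call_soon_threadsafe(queue.put_nowait, phase)

    worker = threading.Thread(target=scan_sync, args=(on_progress,), daemon=True)
    worker.start()
    while True:
        phase = await queue.get()
        yield f"data: {phase}\n\n"
        if phase in ("done", "error"):
            break


async def collect():
    return [event async for event in progress_events()]
```

In a web framework the generator would be handed to a streaming response; `collect()` is only there to make the sketch easy to run with `asyncio.run(collect())`.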
4ac24e7920 fix(recipe-scan): wire cf-docuvision OCR + LLMRouter for cloud recipe scanning (kiwi#136)
Two-step pipeline: task_allocate("kiwi", "recipe_scan", service_hint="cf-docuvision")
acquires a docuvision allocation, calls /extract per image to get OCR text, then
LLMRouter structures the combined OCR output into recipe JSON via the text
extraction prompt.

Also fixes DocuvisionClient bugs:
- POST field was "image" (ignored by Pydantic) — should be "image_b64"
- Response read "text" key — docuvision returns "raw_text"
- Add hint parameter (use "text" for recipe cards, dense prose)
- Configurable timeout (default 120s; docuvision lazy-loads model on first request)
2026-05-16 14:21:15 -07:00
cdbc24240a feat(orch): migrate OCR vision routing to task-based allocation with direct-allocate fallback
2026-05-13 10:46:07 -07:00
dd39418bc8 fix(orch): release Tier 2 allocation ctx when alloc is None; add fallback tests
2026-05-13 10:41:55 -07:00
02abc8e734 feat(orch): migrate meal plan LLM routing to task-based allocation with direct-allocate fallback
Replaces single-path cf-orch allocation with a three-tier strategy:
tier 1 task_allocate() (coordinator-driven), tier 2 direct CFOrchClient.allocate()
(TaskNotRegistered fallback), tier 3 local LLMRouter. Module-level imports for
CFOrchClient and LLMRouter make all three paths patchable in tests without
import caching issues.
2026-05-13 10:32:58 -07:00
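Sketch for 02abc8e734 above: the three-tier order as plain control flow. The callables are injected stand-ins for task_allocate(), CFOrchClient.allocate() and LLMRouter; the local `TaskNotRegistered` class and the rule that tier 3 is used when the direct allocation returns None are assumptions.

```python
class TaskNotRegistered(Exception):
    """Local stand-in for the coordinator's 'unknown task' error."""


def allocate_with_fallback(task_allocate, direct_allocate, local_router):
    """Return (tier_name, handle) from the first tier that yields one."""
    try:
        return "task", task_allocate()        # tier 1: coordinator-driven
    except TaskNotRegistered:
        pass
    alloc = direct_allocate()                 # tier 2: direct allocation
    if alloc is not None:
        return "direct", alloc
    return "local", local_router()            # tier 3: local LLM router
```

Injecting the three paths as arguments mirrors the commit's point about keeping every tier patchable in tests.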
61c428baf0 feat(orch): add task_inference helper for POST /api/inference/task routing
2026-05-13 10:27:47 -07:00
6e954c5c6e feat(ap): issue #113 — ActivityPub federation + Mastodon OAuth
Full ActivityPub implementation wired to cf-core.activitypub module:

Endpoints (root-level, not under /api/v1):
  GET  /.well-known/webfinger  — WebFinger JRD (AP_ENABLED only)
  GET  /ap/actor               — Instance actor document
  POST /ap/actor/inbox         — Incoming Follow/Undo (dedup + Accept dispatch)
  GET  /ap/outbox              — OrderedCollection of community posts
  GET  /ap/posts/{slug}        — Individual AP Note
  GET  /ap/followers           — Follower count collection
  GET  /ap/following           — Empty following collection

Mastodon OAuth (under /api/v1/social/mastodon/):
  POST   /connect      — Dynamic app registration + OAuth flow start
  GET    /callback     — Code exchange + token storage (Fernet-encrypted)
  DELETE /disconnect   — Token revocation
  GET    /status       — Connection status

Config: AP_ENABLED, AP_HOST, AP_KEY_PATH, AP_TOKEN_ENCRYPTION_KEY
Migration 042: ap_followers, ap_deliveries, ap_received, mastodon_tokens tables
Key manager: auto-generates RSA-2048 keypair on first boot if AP_ENABLED
Delivery service: deliver_to_followers() with 3-retry exponential backoff + DB log
Post publish: background fan-out to AP followers + Mastodon when opted-in

All AP endpoints gracefully degrade (404) when AP_ENABLED=false.
2026-05-11 17:55:51 -07:00
ef04064728 feat(community): issue #119 — recipe dedup + variation clustering on submit
Three-layer dedup check before community post submission:
- L1: title ILIKE search against existing posts in community DB
- L2: Jaccard ingredient overlap using local corpus (≥0.70 very_similar, ≥0.35 somewhat_similar)
- L3: similar_to_ref FK — user can explicitly mark post as variation of existing

New endpoint: POST /api/v1/community/check-similar (gracefully no-ops if community DB absent)
New service: app/services/community/dedup.py — jaccard(), similarity_tier(), build_similar_post_result()
Both publish modals (plan + outcome) now check similarity before submit; user can proceed as-is,
mark as variation, or cancel. similar_to_ref passed in final publish payload.
2026-05-11 17:25:06 -07:00
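Sketch for ef04064728 above: the L2 ingredient-overlap check with the two thresholds from the message. The function names match the commit, but the bodies, the empty-set handling and the `None` result below 0.35 are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| over two ingredient collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # assumption: two empty lists are treated as not similar
    return len(a & b) / len(a | b)


def similarity_tier(score):
    """Map a Jaccard score onto the tiers named in the commit message."""
    if score >= 0.70:
        return "very_similar"
    if score >= 0.35:
        return "somewhat_similar"
    return None
```

For example, {flour, egg, milk} against {flour, egg, sugar} shares 2 of 4 distinct ingredients, giving 0.5 and the somewhat_similar tier.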
59b183a898 feat(ask): Add Ask tab — natural-language recipe search with session history
- New Ask tab in recipe browser tab bar (alongside Find/Browse/Saved)
- Text input + Search button; Enter to submit
- 4 example question chips shown in empty state
- Results as clickable recipe cards (opens RecipeDetailPanel)
- Pantry match_pct badge on each card when pantry items are available
- LLM-synthesized answer shown above results (paid tier)
- Session history: last 3 questions shown as re-runnable chips
- Keyboard navigable (tab key, Enter on card, Arrow keys on tab bar)
- ARIA: role=tabpanel, aria-labelledby, aria-live for error/answer regions

Also fixes pre-existing build issues now caught by vue-tsc:
- Move pantryItems/secondaryPantryItems declarations before auto-suggest
  watcher that uses them (TS2448 block-scoped variable before declaration)
- Fix nullable regex capture group access in parsedStream computed (TS2532)
  using optional chaining (titleMatch?.[1], ingMatch?.[1], etc.)

Closes #134
2026-05-11 13:08:06 -07:00
b4624fba84 feat(ask): add POST /recipes/ask endpoint for natural-language recipe search
Free tier: keyword extraction + FTS ingredient search + title probe search.
Paid tier / BYOK: same search, then LLM synthesis of a conversational answer
(8s timeout so an unresponsive model degrades gracefully to recipe list only).

- AskRequest / AskRecipeHit / AskResponse schemas in recipe.py
- _extract_ask_keywords(): tokenize question, strip stopwords
- _ask_in_thread(): two-pronged search (ingredient FTS + title LIKE)
  merges by ID, computes pantry match_pct when pantry_items provided
- Endpoint registered before /{recipe_id} to avoid integer coercion on /ask
- LLM synthesis gated to paid/premium/ultra only (not "local" dev tier)

Closes #134 (backend)
2026-05-11 13:07:53 -07:00
667daf939e feat(streaming): replace raw <pre> with skeleton + progressive reveal (closes #133)
Parses the streamed LLM output (Title / Ingredients / Directions / Notes
plain-text format) on the fly as tokens arrive. Shows a shimmer skeleton
for each section while that section has not yet arrived, then swaps in
real content as the parse succeeds — title first, then ingredients, then
numbered steps, then notes on completion.

parsedStream computed: matches Title, Ingredients (comma-split), numbered
step lines, and Notes sections from the accumulating streamChunks string.

Skeleton shimmer is CSS-only (no JS); respects prefers-reduced-motion by
falling back to a static placeholder color. The stream-output <pre> block
is removed from the template entirely — raw tokens never reach the user.
2026-05-11 12:46:27 -07:00
4e50661483 feat(find): invert flow — auto-suggest on tab open, collapsible Refine panel (closes #132)
Auto-suggest (L1/L2 only):
  When the Find tab is activated with a non-empty pantry and no existing
  results, suggestion fires immediately without user action. L3/L4 are
  excluded to avoid unintended VRAM allocation and AI quota charges.
  After the first auto-suggest completes, the Refine panel collapses so
  the results are the first thing the user sees.

Live re-suggest (L1/L2 only):
  A single filterKey computed wraps all filter state as JSON. Any filter
  change while on the Find tab with existing results triggers a debounced
  (1.2s) re-suggest, keeping the result list live without button clicks.

Refine collapsible:
  Time budget, Dietary preferences, and Nutrition/Advanced filters are
  wrapped in a v-show panel controlled by filtersOpen (persisted to
  localStorage under kiwi:find_filters_open, default open). Level
  selector, Hard Day Mode, and the Suggest button remain always visible.
  Toggle button shows active filter count badge when any filter is set.
2026-05-11 12:41:58 -07:00
ac4eda2047 fix(build): remove unused settingsStore import after time-budget change
2026-05-11 12:37:24 -07:00
3f4b756fc6 feat(find): surface time budget inline, always visible (closes #131)
The time budget selector (hands-on and total time chips) was previously
gated behind the time_first_layout Settings preference. Removed the v-if
guard so both rows are always visible in the Find tab without requiring
a Settings change.

Added "No limit" clear buttons that appear next to the chip row when a
time limit is active, so users can reset a time filter in one tap without
needing to find the active chip and re-tap it.

The time_first_layout setting in Settings remains for users who want
control over the layout.
2026-05-11 12:11:06 -07:00
973c76a4c8 feat(browse): add breadcrumb nav above recipe grid (closes #130)
Renders domain › category › subcategory above the recipe grid whenever
a domain and category are active. Each ancestor crumb is a button that
navigates back up the hierarchy (selectDomain / selectCategory). The
leaf node is a plain span with aria-current="page". The nav has
aria-label="Browse location" for screen reader context.
2026-05-11 11:58:49 -07:00
92fab94ae0 feat(find): active-filter bar with clear-all (closes #129)
Adds a summary bar that appears at the top of the Find Recipes panel
whenever any filter is active. Shows a count ("3 filters active") and
a Clear all button that resets all Find-tab filters in one tap:
  constraints, allergies, excluded ingredients, shopping mode,
  pantry-match-only, hard day mode, time budgets (active + total),
  max missing, style, category, and all four nutrition limits.

Local input refs (constraintInput, allergyInput, etc.) are also cleared
so the text fields don't show stale uncommitted values after a clear.
2026-05-11 11:57:10 -07:00
30f5620fd5 feat(settings): autosave on change, remove Save buttons (closes #128)
Each setting now saves via a debounced (600ms) individual API call when
its value changes. A hydration guard (_hydrated flag + nextTick) prevents
watchers from firing during the initial load() fetch, ensuring the first
API round-trip does not generate spurious write calls.

Removed: five explicit Save buttons across Equipment, Sensory, Units,
Shopping Region, and Recipe Search Layout sections.
Added: "Changes save automatically." subtitle + fixed bottom-right toast
  that appears for 2s after any successful save, with enter/leave
  transitions that respect prefers-reduced-motion via the theme.

The full save() and saveSensory() actions are kept as internal fallbacks.
2026-05-11 11:55:09 -07:00
0ef57618bf fix(a11y): add aria-pressed and aria-label to Browse panel buttons (WCAG 2.1)
Screen readers had no way to determine which domain, category, subcategory,
or sort button was selected — the active CSS class is invisible to assistive
technology.

  - aria-pressed on all toggle buttons (domain, category, subcategory, sort)
  - aria-label="Previous page" / "Next page" on pagination buttons
  - aria-live="polite" on results count span — announces filter result changes
  - Equipment chip-remove: "Remove" → "Remove equipment: {item}"

Addresses WCAG 2.1 AA criteria 4.1.2 (Name, Role, Value) and 1.3.1
(Info and Relationships). Part of kiwi UX audit (2026-05-11).
2026-05-11 11:33:10 -07:00
8c765b7da2 fix(barcode): look up product info before checking auto_add_to_inventory
Previously, get_or_create_product was only called when auto_add was true,
so scan responses with auto_add=false returned no product details. Now the
DB lookup always runs when product_info is available; inventory insertion
is still conditional on auto_add_to_inventory. Fixes preview-only barcode
scans returning empty product fields.
2026-05-11 11:33:02 -07:00
e57f46f4b6 feat(streaming): add native SSE fallback for L3/L4 recipe generation (closes #126)
Two-phase streaming architecture:
  Phase 1 (sync thread): IngredientClassifier builds element profiles +
    gap list from SQLite — thread-safe, no async context needed
  Phase 2 (async): LLMRecipeGenerator.stream_generate() yields tokens via
    cf-orch warm vllm (existing /stream-token path) or AsyncOpenAI against
    Ollama if the coordinator is unavailable

Backend (app/services/recipe/llm_recipe.py):
  - stream_generate() async generator; _try_alloc_for_stream() sync helper
  - _stream_openai_compat() static method handles __auto__ model resolution
  - LLMRecipeGenerator(None) is safe for streaming (store not used)

Endpoint (app/api/endpoints/recipes.py):
  - ?stream=true on POST /recipes/suggest returns StreamingResponse
  - X-Accel-Buffering: no prevents nginx buffering without nginx.conf edits

Frontend (api.ts, recipes.ts, RecipesView.vue):
  - suggestRecipeStream() uses fetch + ReadableStream (POST; EventSource
    only supports GET)
  - streamSuggest() action in recipes store builds request internally
  - RecipesView.streamRecipe() silently falls back to native SSE when
    cf-orch token fetch fails rather than surfacing an error
2026-05-11 11:32:54 -07:00
04dbdddbad feat(mcp): add Kiwi MCP server for corpus DB access (closes #124)
Exposes four read-only tools to Claude Code:
  kiwi_query_corpus   — parameterised SELECT against kiwi.db (200-row cap)
  kiwi_count_fts      — FTS5 MATCH hit count for keyword coverage audits
  kiwi_sample_tags    — tag frequency distribution by prefix
  kiwi_browse_preview — first-page results from the live browse API

DB opened in SQLite URI read-only mode (mode=ro); any write statement is
rejected at the driver level. Configure via KIWI_DB_PATH and KIWI_API_URL
env vars (see module docstring for settings.json snippet).
2026-05-11 11:32:40 -07:00
e83bb0415a feat(manage): add update and cloud-update commands (closes #127)
Adds `update` (local stack) and `cloud-update` (menagerie) subcommands
to manage.sh. Both pull HEAD and rebuild/restart the Docker stack in one
step — required for post-merge deployment without manual compose commands.
2026-05-11 11:32:30 -07:00
e62d69d099 docs(readme): landing page rewrite — feature table, quick start, tier table, Forgejo-primary, split license
2026-05-06 08:51:38 -07:00
7498995092 feat(filters): split time filter into hands-on and total time (kiwi#52)
Adds max_active_min request field and backend filter. Active time uses
parse_time_effort().active_min (passive waits excluded). Recipes with
no parsed active time signal are not excluded (avoid hiding unlabelled
results). Total and active limits are AND'd when both set.

UI: two pill rows — "Hands-on time" (15/30/45/1hr) and "Total time"
(30m/1hr/90m/2hr/3hr/4+hr). Replaces single row capped at 90 min.
2026-04-27 16:03:27 -07:00
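Sketch for 7498995092 above: the AND'd limits with the "unknown active time is not excluded" rule. `passes_time_filters` is a hypothetical name, and letting an unknown total time pass as well is an assumption the commit does not state.

```python
def passes_time_filters(total_min, active_min, max_total_min=None, max_active_min=None):
    """Return True when a recipe satisfies every time limit that is set."""
    if max_total_min is not None and total_min is not None:
        if total_min > max_total_min:
            return False
    if max_active_min is not None and active_min is not None:
        # active_min is None when no hands-on time could be parsed;
        # such recipes are kept rather than hidden.
        if active_min > max_active_min:
            return False
    return True
```

With both limits set, a recipe must pass both checks, which is the AND behaviour described in the message.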
640fcefa9e fix(ui): compact recipe cards, batch ingredient classifier queries
Recipe cards were rendering full directions, all nutrition chips,
prep notes, swap candidates, and grocery links inline in the grid —
making each card tall enough to push the second row below the fold at
3-column widths. Cards now show title, match/complexity/time badges,
up to 4 pantry ingredient chips, missing count, and calorie hint.
Full detail remains in RecipeDetailPanel on "Make this".

ElementClassifier.classify_batch() was issuing N separate DB queries
(one per pantry item). Replaced with a single WHERE name IN (...)
query + heuristic fallback for misses — same result, one round-trip.
2026-04-27 14:56:00 -07:00
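Sketch for 640fcefa9e above: replacing N per-item lookups with one `WHERE name IN (...)` query plus a fallback for misses. The `ingredients` table, its columns and the `heuristic` stub are assumptions, not the ElementClassifier internals.

```python
import sqlite3


def heuristic(name):
    """Hypothetical fallback for names missing from the lookup table."""
    return "unknown"


def classify_batch(conn, names):
    """Classify all names with a single round-trip to the database."""
    names = list(names)
    if not names:
        return {}
    placeholders = ",".join("?" for _ in names)  # one bound parameter per name
    rows = conn.execute(
        f"SELECT name, element FROM ingredients WHERE name IN ({placeholders})",
        names,
    ).fetchall()
    found = dict(rows)
    return {name: found.get(name, heuristic(name)) for name in names}
```

Only the placeholder list is interpolated into the SQL; the values stay bound parameters. Very large pantries would need chunking to stay under SQLite's bound-variable limit.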
d5a4b14400 chore(pipeline): add fast targeted meal-tag backfill script
backfill_meal_tags.py merges meal: tags from title-only matching
into existing inferred_tags without re-deriving all other signals.

~10x faster than infer_recipe_tags.py --force for meal-tag-only
updates: 3.19M recipes in ~5-10min vs ~2.5h for full re-derivation.
2026-04-27 13:00:58 -07:00
7fd92d5179 feat(tags): add meal type inference from recipe titles (#125)
Adds _MEAL_SIGNALS table to tag_inferrer with title-only matching for:
  meal:Breakfast — pancakes, waffles, frittata, oatmeal, granola, etc.
  meal:Dessert   — cake, cookie, brownie, pudding, ice cream, tart, etc.
  meal:Snack     — dip, chips, popcorn, nachos, energy balls, etc.
  meal:Beverage  — smoothie, cocktail, juice, lemonade, etc.
  meal:Lunch     — sandwich, wrap, panini, grilled cheese, etc.
  meal:Bread     — bread, sourdough, focaccia, dinner roll, etc.

Uses word-boundary + optional-plural regex (\bWORD(?:s|es)?\b) so:
- "pancakes" matches the "pancake" pattern but "pancake" != "cake"
- "tartare" does not match "tart" (no word boundary after tart in tartare)
- "dipping" does not match "dip" (extra chars prevent boundary)

Title-only matching (not ingredient text) avoids false positives from
ingredient names like "cake flour" or "sandwich bread".

Estimated browse impact after backfill (--force on 3.19M recipes):
  Breakfast: 43 → ~70k
  Dessert:   372 → ~350k  (real desserts, not flavor:Sweet)
  Snack:     57  → ~60k
  Beverage:  43  → ~36k
  Lunch:     69  → ~26k
2026-04-27 12:24:31 -07:00
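Sketch for 7fd92d5179 above: the word-boundary plus optional-plural pattern checked against the three cases the message lists. `meal_pattern` is a hypothetical helper, and case-insensitive matching is an assumption.

```python
import re


def meal_pattern(word):
    """Compile \\bWORD(?:s|es)?\\b for a single title keyword."""
    return re.compile(rf"\b{re.escape(word)}(?:s|es)?\b", re.IGNORECASE)


def title_matches(word, title):
    return meal_pattern(word).search(title) is not None
```

The leading `\b` is what keeps "cake" from firing inside "pancake", and the trailing `\b` is what rejects "tartare" and "dipping".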
6f097cd43d fix: wire browse domains to inferred_tag vocabulary, fix can_be leak in dietary
- Dinner: replace non-matching text keywords with main:X protein inferred tags (0 -> 815k results)
- All meal_type categories: add meal:X structured tag phrases
- Dietary: switch to dietary:X-only phrases; bare text keywords matched can_be:X
  tags (nearly all recipes), inflating counts to 1.3M+ falsely
- Cuisine: add cuisine:X structured tag phrases to Italian, Mexican, Asian,
  Indian, Mediterranean, American, BBQ, European, Latin American
- Side Dish: use main:Vegetables + main:Grains as proxy (no meal:Side Dish tag exists)
- Dessert: remove 'sweet' keyword (matched flavor:Sweet on all recipes)
- New dietary categories: Low-Sodium, Paleo

Closes #122. Partial progress on #123.
Follow-up: #125 (expand meal: tag inferrer coverage)
2026-04-27 11:38:37 -07:00
46778d62e3 fix: tab bar horizontal scroll on mobile, shorten Build Your Own label
2026-04-27 10:58:23 -07:00
896b4e048c feat: recipe scanner — photo to structured recipe (kiwi#9)
New feature: photograph a recipe card, cookbook page, or handwritten
note and have it extracted into a structured, editable recipe.

Backend:
- POST /recipes/scan: accept 1-4 photos, run VLM extraction, return
  structured JSON for review (not auto-saved)
- POST /recipes/scan/save: persist a reviewed/edited recipe
- GET/DELETE /recipes/user: user-created recipe CRUD
- Vision backend priority: cf-orch -> local Qwen2.5-VL -> Anthropic BYOK
- 503 with clear config hint when no vision backend available
- Multi-photo support: facing pages (ingredients/directions) sent together
- Pantry cross-reference: marks which ingredients are already on hand
- migration 041: user_recipes table (title, servings, cook_time, steps,
  ingredients JSON, source, pantry_match_pct)
- Tier gate: recipe_scan -> paid, BYOK-unlockable

Frontend:
- "Scan" button in the Recipes tab bar (camera icon)
- RecipeScanModal: upload step (drag-drop + file picker, up to 4 photos,
  live previews), processing step (spinner), review/edit step (all
  fields inline-editable before save), pantry match badge, warning banner
  for low-confidence or incomplete scans

Tests: 35 new tests (23 unit + 12 API), 404 total passing
2026-04-27 08:23:01 -07:00
c9fcfde694 feat(browse): active time estimation, prep scaling, required-ingredient filter
Time effort (time_effort.py):
- Passive defaults per cooking technique (bake 30 min, slow cook 300 min, etc.)
- Prep action detection with n^0.75 quantity scaling for prep-needing ingredients
- Cross-reference ingredients/ingredient_names arrays to distribute quantity across steps
- Effort label now time-based (quick ≤20 min, moderate ≤45 min, involved >45 min)
- prep_min field added to StepAnalysis schema and Pydantic model
- All parse_time_effort call sites updated to pass ingredients + ingredient_names

Browse required-ingredient filter:
- New required_ingredient query param on GET /recipes/browse/{domain}/{category}
- Enter-to-commit input in RecipeBrowserPanel with auto-clear-on-empty watch
- Substring match via FTS5 ingredient_names column prefix filter
- FTS5 replaces LIKE '%X%' throughout browse_recipes and _browse_by_match
- _all + required_ingredient: 8.4s → 74ms; category + required_ingredient: 2s → 35ms
- _ingredient_fts_term() helper builds 'ingredient_names : "X"*' prefix queries
- Combined keywords + ingredient into single FTS MATCH to avoid secondary scans

Tests: 369/369 passing
2026-04-27 07:13:12 -07:00
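Sketch for c9fcfde694 above: sublinear n^0.75 prep scaling and the time-based effort label. The per-unit base minutes and both function names are assumptions; only the exponent and the 20/45 minute thresholds come from the message.

```python
def scaled_prep_min(base_min_per_unit, quantity):
    """Prep time grows with quantity**0.75, so doubling the amount less than doubles the time."""
    if quantity <= 0:
        return 0.0
    return base_min_per_unit * quantity ** 0.75


def effort_label(total_active_min):
    """Time-based effort label: quick <= 20 min, moderate <= 45 min, else involved."""
    if total_active_min <= 20:
        return "quick"
    if total_active_min <= 45:
        return "moderate"
    return "involved"
```

With a hypothetical 2 minutes per onion, four onions cost 2 * 4^0.75, roughly 5.7 minutes, instead of the 8 minutes a linear model would give.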
e05bfe86f5 feat(recipes): orbital cadence — last-cooked chip and sort on saved recipes (#120)
2026-04-26 09:09:27 -07:00
95e76edaea feat(community): complete Layer A subcategory tagging (#118)
- RecipeBrowserPanel: fix onTagSearchInput using '_all' domain slug
  (backend validates domain — was silently returning empty results)
- RecipeDetailPanel: fetch and display accepted community category tags
  on recipe open; accepted tags shown with accent chip + checkmark,
  pending tags shown in muted style
- browserAPI.listRecipeTags() was already in api.ts but not consumed —
  now wired into RecipeDetailPanel onMounted as a background fetch
2026-04-25 23:31:30 -07:00
12ab63e2fb feat: corrections router (#73) + Magpie flywheel hook (#28)
Corrections router (kiwi#73):
- Wire make_corrections_router() from cf-core at /api/v1/corrections
- Add get_db() dependency in session.py yielding store.conn (raw
  sqlite3.Connection as cf-core expects); cloud-aware via get_session
- Migration 040: corrections table + indexes (copied from cf-core DDL)
- Feeds Avocet SFT training pipeline via GET /corrections/export JSONL

Magpie flywheel hook (kiwi#28):
- app/services/magpie_hook.py: async fire_recipe_signal() that reads
  magpie_opt_in setting, checks external_id, POSTs anonymized payload
  to MAGPIE_INGEST_URL; stubs gracefully when URL unset or Magpie
  unreachable (DEBUG log, never raises)
- Hooks into save_recipe and update_saved_recipe as background tasks
- MAGPIE_INGEST_URL config key added to Settings
- SettingsView: "Data Sharing" toggle for magpie_opt_in, cloud-only
  (v-if VITE_CLOUD_MODE), plain-language consent label
2026-04-25 23:31:20 -07:00
9350719516 feat(recipes): LLM style classifier (#27) + cooked leftovers shelf-life (#112)
Style classifier (kiwi#27):
- app/services/recipe/style_classifier.py: LLM prompt with curated vocab,
  cf-orch/LLMRouter fallback, JSON + regex tag extraction
- POST /recipes/saved/{recipe_id}/classify-style: Paid/BYOK tier gate,
  fetches recipe from corpus, returns {suggested_tags:[...]}
- SaveRecipeModal.vue: "Suggest tags" button with loading state; merges
  LLM suggestions into existing tags without overwriting user's choices
- 403/empty list silently ignored — button is a no-op when tier not met
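
The "JSON + regex tag extraction" step might look something like the sketch below. It is an assumption-laden illustration: `VOCAB` holds only the three example tags the README mentions (the real curated vocabulary is not shown), and `extract_tags()` is a hypothetical name.

```python
import json
import re

# Assumed vocabulary: only the example tags named in the README.
VOCAB = {"comforting", "hands-off", "quick"}


def extract_tags(raw: str) -> list:
    """Pull known style tags out of an LLM reply.

    Tries strict JSON first; if the reply is not valid JSON, falls back to
    scanning the text for vocabulary words with a regex.
    """
    try:
        data = json.loads(raw)
        candidates = data.get("suggested_tags", []) if isinstance(data, dict) else data
    except json.JSONDecodeError:
        candidates = re.findall(r"[a-z][a-z-]*", raw.lower())
    if not isinstance(candidates, list):
        candidates = []

    tags = []
    for item in candidates:
        if isinstance(item, str):
            tag = item.strip().lower()
            # Keep only curated tags, in first-seen order, without duplicates.
            if tag in VOCAB and tag not in tags:
                tags.append(tag)
    return tags


print(extract_tags('{"suggested_tags": ["Quick", "spicy"]}'))
print(extract_tags("Sure! I would say hands-off and comforting."))
```

Filtering against the vocabulary in both branches keeps free-form model output from introducing tags the UI does not know about.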

Cooked leftovers shelf-life (kiwi#112):
- app/services/leftovers_predictor.py: deterministic FDA/USDA lookup table
  with shortest-component-wins for proteins and dish-type override for
  assembled dishes; special entries for ceviche (2d, acid != heat),
  fermented/cured (kimchi 14d, confit/lardo 7d), soups, rice, pasta, etc.
- POST /recipes/{recipe_id}/leftovers: free tier, no gate
- RecipeDetailPanel.vue: shelf-life section appears after "I cooked this"
  with fridge/freeze days, freeze-by advice, per-instance dismiss; calm
  framing per no-panic UX policy
- LeftoversResponse Pydantic schema added to recipe.py
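
A rough sketch of the "shortest-component-wins" lookup with a dish-type override. The four override values are the ones quoted in the commit message; the per-component numbers, the default, and the `fridge_days()` name are placeholders invented for illustration and are not food-safety guidance.

```python
from typing import Iterable, Optional

# Dish-type overrides in fridge days (values quoted in the commit message).
DISH_OVERRIDES = {"ceviche": 2, "kimchi": 14, "confit": 7, "lardo": 7}

# PLACEHOLDER per-component values, for illustration only.
COMPONENT_DAYS = {"poultry": 4, "fish": 3, "rice": 5}

# Hypothetical fallback when nothing in the dish is recognised.
DEFAULT_DAYS = 3


def fridge_days(dish_type: Optional[str], components: Iterable[str]) -> int:
    """Return fridge shelf life in days for a cooked dish."""
    # An assembled dish with a known type overrides its components.
    if dish_type in DISH_OVERRIDES:
        return DISH_OVERRIDES[dish_type]
    # Otherwise the shortest-lived known component decides.
    known = [COMPONENT_DAYS[c] for c in components if c in COMPONENT_DAYS]
    return min(known) if known else DEFAULT_DAYS


print(fridge_days("ceviche", ["fish"]))
print(fridge_days(None, ["poultry", "rice"]))
```

Because the lookup is a plain table with `min()`, the result is deterministic and needs no LLM, which fits the free-tier endpoint described above.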
2026-04-25 23:18:16 -07:00
9c4d8b7883 feat(recipe-engine): time-effort profile, product-label tokenisation, L1 tuning
- Add TimeEffortProfile + StepAnalysis Pydantic schemas; serialised into
  RecipeSuggestion so the frontend receives active/passive/total minutes,
  effort label, and detected equipment per suggestion.
- parse_time_effort() now drives max_total_min filter (falls back to step-count
  estimate when directions contain no explicit time mentions).
- _PRODUCT_TOKEN_STOPWORDS: strips marketing/packaging words from multi-word
  product labels before adding individual ingredient tokens to pantry_set.
  "Organic Extra Firm Tofu" → adds "tofu"; improves packaged-food pantry match.
- L1 candidate pool raised to 60 (was 20); min_match_ratio lowered to 0.35
  (was 0.60) to keep enough results for plant-based / packaged-food pantries.
- household.py: tighten import to pull HEIMDALL_URL/ADMIN_TOKEN from
  services.heimdall_orch (matches refactor in cloud_session.py).
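
The product-label tokenisation could be sketched like this. The stopword set below is a hypothetical subset (the project's `_PRODUCT_TOKEN_STOPWORDS` is not shown in the commit), and `product_tokens()` is an illustrative name.

```python
import re

# Hypothetical subset of marketing/packaging words.
PRODUCT_TOKEN_STOPWORDS = frozenset({"organic", "extra", "firm", "premium", "natural"})


def product_tokens(label: str) -> set:
    """Return ingredient tokens from a multi-word product label.

    Single-word labels yield nothing extra, since the label itself is
    already the ingredient name.
    """
    words = re.findall(r"[a-z]+", label.lower())
    if len(words) < 2:
        return set()
    return {w for w in words if w not in PRODUCT_TOKEN_STOPWORDS}


# The full label stays in the pantry set; the stripped tokens are added to it.
pantry_set = {"organic extra firm tofu"}
pantry_set |= product_tokens("Organic Extra Firm Tofu")
print(sorted(pantry_set))
```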
2026-04-25 21:44:26 -07:00
ed04b655be fix(saved-recipes): resolve FK constraint, null title, and load reliability
- Migration 039: drop saved_recipes.recipe_id FK (SQLite table rebuild).
  The FK referenced main.recipes but corpus lives in an ATTACH'd DB — caused
  500 on every POST /recipes/saved in cloud mode.
- _to_summary: row.get("title") or "" to handle corpus JOIN returning NULL
  title (e.g. placeholder recipe_id 99999).
- list_collections: return [] for Free tier instead of 403 — prevents
  Promise.all in savedStore.load() from aborting the saved-recipes fetch.
- savedStore.load(): switched to Promise.allSettled so a collections failure
  never blocks the saved list from populating.
- RecipesView: star indicator now reflects savedStore.isSaved() (server-side
  saved state) rather than localStorage bookmarks; changed to <span> since
  the star is now read-only visual feedback.
- Removed { immediate: true } from saved-tab watcher — premature bounce to
  Build Your Own before onMounted load() completes.
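
For readers unfamiliar with dropping a foreign key in SQLite, the table rebuild mentioned for Migration 039 generally follows the pattern below. This is a sketch with an assumed three-column schema, not the actual migration; `make_demo_db()` and `drop_recipe_fk()` are hypothetical helpers.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a toy schema where saved_recipes.recipe_id has a foreign key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE recipes (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE saved_recipes (
            id INTEGER PRIMARY KEY,
            recipe_id INTEGER NOT NULL REFERENCES recipes(id),
            notes TEXT
        );
        INSERT INTO recipes (id, title) VALUES (1, 'Local recipe');
        INSERT INTO saved_recipes (recipe_id, notes) VALUES (1, 'kept across rebuild');
        """
    )
    return conn


def drop_recipe_fk(conn: sqlite3.Connection) -> None:
    """Rebuild saved_recipes without the foreign key on recipe_id."""
    conn.commit()  # PRAGMA foreign_keys is a no-op inside an open transaction
    conn.execute("PRAGMA foreign_keys = OFF")
    conn.executescript(
        """
        BEGIN;
        CREATE TABLE saved_recipes_new (
            id INTEGER PRIMARY KEY,
            recipe_id INTEGER NOT NULL,
            notes TEXT
        );
        INSERT INTO saved_recipes_new (id, recipe_id, notes)
            SELECT id, recipe_id, notes FROM saved_recipes;
        DROP TABLE saved_recipes;
        ALTER TABLE saved_recipes_new RENAME TO saved_recipes;
        COMMIT;
        """
    )
    conn.execute("PRAGMA foreign_keys = ON")


conn = make_demo_db()
drop_recipe_fk(conn)
# A recipe_id that only exists in another (attached) database is now accepted.
conn.execute("INSERT INTO saved_recipes (recipe_id, notes) VALUES (99999, 'corpus recipe')")
conn.commit()
```

SQLite has no `ALTER TABLE ... DROP CONSTRAINT`, which is why the constraint is removed by copying rows into a new table and renaming it.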
2026-04-25 21:44:10 -07:00
f6b29693c8 refactor: replace hand-rolled JWT+Heimdall with cf-core CloudSessionFactory
Delegates JWT validation, Heimdall provision/tier-resolve, bypass-IP
handling, and guest session management to circuitforge_core. Kiwi keeps
its own CloudUser (db path, household fields, BYOK flag) and DB helpers.
detect_byok() is now imported from cf-core instead of a local copy.
household_id/is_household_owner/license_key flow through core_user.meta
(cf-core already forwards all Heimdall response extras into meta).
Removes ~217 lines of duplicated auth code.

Note: guest cookie name changes from kiwi_guest_id to cf_guest_id (cf-core
managed). Existing guest sessions get a new UUID on first visit — acceptable
for alpha.
2026-04-25 16:35:56 -07:00
b86b7732dc fix(pwa): set start_url/scope from VITE_BASE_URL so install launches /kiwi/ not site root
2026-04-25 12:59:59 -07:00
7e0722cc23 feat(pwa): add Progressive Web App support — installable to homescreen
- vite-plugin-pwa with generateSW strategy (Workbox)
- manifest.webmanifest: name, short_name, display standalone, theme_color #e8a820
- Service worker: precaches JS/CSS/HTML shell; API routes network-first (60s);
  Google Fonts cache-first (1 year)
- Icons: 192 + 512px regular + maskable variants generated from App.vue bird SVG
- index.html: theme-color meta, apple-touch-icon, apple-mobile-web-app-* tags
  for iOS Safari homescreen support (iOS ignores the manifest icons array)
- autoUpdate mode: new versions install silently and activate on next navigation
2026-04-25 12:33:22 -07:00
96 changed files with 15142 additions and 924 deletions


@@ -21,10 +21,12 @@ DATA_DIR=./data
 # IP this machine advertises to the coordinator (must be reachable from coordinator host)
 # CF_ORCH_ADVERTISE_HOST=10.1.10.71
-# CF-core hosted coordinator (managed cloud GPU inference — Paid+ tier)
-# Set CF_ORCH_URL to use a hosted cf-orch coordinator instead of self-hosting.
-# CF_LICENSE_KEY is read automatically by CFOrchClient for bearer auth.
-# CF_ORCH_URL=https://orch.circuitforge.tech
+# GPU inference server (cf-orch coordinator for recipe scan, LLM generation, etc.)
+# GPU_SERVER_URL: set to your local cf-orch coordinator (self-hosted rack).
+# CF_ORCH_URL is the backward-compat alias — both are honoured.
+# Paid+ default: when CF_LICENSE_KEY is present and neither URL is set,
+# the app automatically points to https://orch.circuitforge.tech.
+# GPU_SERVER_URL=http://10.1.10.71:7700
 # CF_LICENSE_KEY=CFG-KIWI-xxxx-xxxx-xxxx
 # LLM backend — env-var auto-config (no llm.yaml needed for bare-metal users)
@@ -57,6 +59,9 @@ CF_APP_NAME=kiwi
 # Unset = auto-detect: true if CLOUD_MODE or circuitforge_orch is installed (paid+ local).
 # Set false to force LocalScheduler even when cf-orch is present.
 # USE_ORCH_SCHEDULER=false
+# GPU_SERVER_URL: cf-orch coordinator endpoint. Required for recipe scan (cf-docuvision)
+# and LLM features on a self-hosted rack. CF_ORCH_URL is the backward-compat alias.
+# GPU_SERVER_URL=http://10.1.10.71:7700
 # Cloud mode (set in compose.cloud.yml; also set here for reference)
 # CLOUD_DATA_ROOT=/devl/kiwi-cloud-data

.gitignore

@@ -23,6 +23,9 @@ dist/
 # Data directories
 data/
+# Local dev database
+*.db
 # Test artifacts (MagicMock sqlite files from pytest)
 <MagicMock*

README.md

@@ -1,80 +1,118 @@
-# 🥝 Kiwi
-> *Part of the CircuitForge LLC "AI for the tasks the system made hard on purpose" suite.*
-**Pantry tracking and leftover recipe suggestions.**
-Scan barcodes, photograph receipts, and get recipe ideas based on what you already have — before it expires.
-**LLM support is optional.** Inventory tracking, barcode scanning, expiry alerts, CSV export, and receipt upload all work without any LLM configured. AI features (receipt OCR, recipe suggestions, meal planning) activate when a backend is available and are BYOK-unlockable at any tier.
-**Status:** Beta · CircuitForge LLC
-**[Documentation](https://docs.circuitforge.tech/kiwi/)** · [circuitforge.tech](https://circuitforge.tech)
+<!-- Logo coming soon — replace docs/kiwi-logo.svg when final icon ships -->
+<div align="center">
+<img src="docs/kiwi-logo.svg" alt="Kiwi logo" width="96" height="96" />
+# Kiwi
+**Pantry tracking and recipe suggestions — with or without an LLM.**
+[![License: MIT/BSL](https://img.shields.io/badge/license-MIT%20%2F%20BSL%201.1-blue)](#license)
+[![CI](https://git.opensourcesolarpunk.com/Circuit-Forge/kiwi/badges/workflows/ci.yml/badge.svg)](https://git.opensourcesolarpunk.com/Circuit-Forge/kiwi/actions)
+[![Version](https://img.shields.io/badge/version-0.6.0-green)](https://git.opensourcesolarpunk.com/Circuit-Forge/kiwi/releases)
+[Documentation](https://docs.circuitforge.tech/kiwi) · [Live demo](https://menagerie.circuitforge.tech/kiwi) · [circuitforge.tech](https://circuitforge.tech)
+*Part of the CircuitForge LLC suite — "AI for the tasks the system made hard on purpose."*
+</div>
 ---
-## What it does
-- **Inventory tracking** — add items by barcode scan, receipt upload, or manually
-- **Expiry alerts** — know what's about to go bad
-- **Recipe browser** — browse the full recipe corpus by cuisine, meal type, dietary preference, or main ingredient; pantry match percentage shown inline (Free)
-- **Saved recipes** — bookmark any recipe with notes, a 0–5 star rating, and free-text style tags (Free); organize into named collections (Paid)
-- **Receipt OCR** — extract line items from receipt photos automatically (Paid tier, BYOK-unlockable)
-- **Recipe suggestions** — four levels from pantry-match to full LLM generation (Paid tier, BYOK-unlockable)
-- **Style auto-classifier** — LLM suggests style tags (comforting, hands-off, quick, etc.) for saved recipes (Paid tier, BYOK-unlockable)
-- **Leftover mode** — prioritize nearly-expired items in recipe ranking (Free, 5/day; unlimited at Paid+)
-- **LLM backend config** — configure inference via `circuitforge-core` env-var system; BYOK unlocks Paid AI features at any tier
-- **Feedback FAB** — in-app feedback button; status probed on load, hidden if CF feedback endpoint unreachable
-## Stack
-- **Frontend:** Vue 3 SPA (Vite + TypeScript)
-- **Backend:** FastAPI + SQLite (via `circuitforge-core`)
-- **Auth:** CF session cookie → Directus JWT (cloud mode)
-- **Licensing:** Heimdall (free tier auto-provisioned at signup)
-## Running locally
+> **The LLM is optional.** Barcode scanning, receipt upload, expiry alerts, the full 200k+ recipe browser, and CSV export all work with zero LLM configured. Recipe suggestions and receipt OCR activate when a backend is available, and are BYOK-unlockable at any tier. You are never forced to send your data anywhere.
+---
+## What Kiwi does
+| Feature | Notes |
+|---|---|
+| **Inventory tracking** | Add items by barcode scan, receipt upload, or manually |
+| **Expiry alerts** | Know what is about to go bad before it does |
+| **Recipe browser** | 200k+ recipes — filter by cuisine, meal type, dietary preference, or main ingredient; pantry match percentage shown inline |
+| **Leftover mode** | Prioritizes nearly-expired items in recipe ranking (5/day free, unlimited at Paid+) |
+| **Recipe suggestions** | Four levels: direct corpus match, substitution/swap, cuisine-style adapter, full LLM generation |
+| **Meal planning** | Plan meals for the week; pull from saved recipes or suggestions |
+| **Saved recipes** | Bookmark any recipe with notes, 0-5 star rating, and free-text style tags; organize into named collections (Paid) |
+| **Receipt OCR** | Extract line items from receipt photos automatically |
+| **Dietary profiles** | Vegan, gluten-free, diabetic, and other constraints respected throughout |
+| **Style auto-classifier** | LLM suggests style tags (comforting, hands-off, quick, etc.) for saved recipes |
+| **Community feed** | Browse and share recipes with other Kiwi users |
+| **CSV export** | Full pantry export, always available, no tier gate |
+---
+## Quick start
+**One-line install (self-hosted, Docker required):**
 ```bash
+bash <(curl -fsSL https://git.opensourcesolarpunk.com/Circuit-Forge/kiwi/raw/branch/main/install.sh)
+```
+**Or clone and run manually:**
+```bash
+git clone https://git.opensourcesolarpunk.com/Circuit-Forge/kiwi.git
+cd kiwi
 cp .env.example .env
 ./manage.sh build
 ./manage.sh start
 # Web: http://localhost:8511
 # API: http://localhost:8512
 ```
-## Cloud instance
-```bash
-./manage.sh cloud-build
-./manage.sh cloud-start
-# Served at menagerie.circuitforge.tech/kiwi (JWT-gated)
-```
+**Live cloud instance** (free account required):
+[menagerie.circuitforge.tech/kiwi](https://menagerie.circuitforge.tech/kiwi)
+Full setup and configuration guide: [docs.circuitforge.tech/kiwi](https://docs.circuitforge.tech/kiwi)
+---
 ## Tiers
 | Feature | Free | Paid | Premium |
-|---------|------|------|---------|
-| Inventory CRUD | ✓ | ✓ | ✓ |
-| Barcode scan | ✓ | ✓ | ✓ |
-| Receipt upload | ✓ | ✓ | ✓ |
-| Expiry alerts | ✓ | ✓ | ✓ |
-| CSV export | ✓ | ✓ | ✓ |
-| Recipe browser (domain/category) | ✓ | ✓ | ✓ |
-| Save recipes + notes + star rating | ✓ | ✓ | ✓ |
-| Style tags (manual, free-text) | ✓ | ✓ | ✓ |
-| Receipt OCR | BYOK | ✓ | ✓ |
-| Recipe suggestions (L1–L4) | BYOK | ✓ | ✓ |
-| Named recipe collections | — | ✓ | ✓ |
-| LLM style auto-classifier | — | BYOK | ✓ |
-| Meal planning | — | ✓ | ✓ |
-| Multi-household | — | — | ✓ |
-| Leftover mode (5/day) | ✓ | ✓ | ✓ |
-BYOK = bring your own LLM backend (configure `~/.config/circuitforge/llm.yaml`)
+|---|:---:|:---:|:---:|
+| Inventory CRUD | Yes | Yes | Yes |
+| Barcode scan | Yes | Yes | Yes |
+| Receipt upload | Yes | Yes | Yes |
+| Expiry alerts | Yes | Yes | Yes |
+| CSV export | Yes | Yes | Yes |
+| Recipe browser (200k+ recipes) | Yes | Yes | Yes |
+| Save recipes + notes + star rating | Yes | Yes | Yes |
+| Style tags (manual, free-text) | Yes | Yes | Yes |
+| Leftover mode (5/day) | Yes | Yes | Yes |
+| Receipt OCR | BYOK | Yes | Yes |
+| Recipe suggestions (L1–L4) | BYOK | Yes | Yes |
+| Named recipe collections | — | Yes | Yes |
+| LLM style auto-classifier | — | BYOK | Yes |
+| Meal planning | — | Yes | Yes |
+| Multi-household | — | — | Yes |
+**BYOK** = bring your own LLM backend. Configure `~/.config/circuitforge/llm.yaml` to unlock AI features at any tier without a paid subscription.
+---
+## Stack
+- **Frontend:** Vue 3 SPA (Vite + TypeScript), served on port 8511
+- **Backend:** FastAPI + SQLite via `circuitforge-core`, API on port 8512
+- **Auth:** CircuitForge session cookie (cloud mode); local mode requires no account
+- **Licensing:** Heimdall — free tier auto-provisioned at signup
+---
+## Forgejo-primary
+Kiwi is developed and maintained on Forgejo at [git.opensourcesolarpunk.com/Circuit-Forge/kiwi](https://git.opensourcesolarpunk.com/Circuit-Forge/kiwi). GitHub and Codeberg are read-only mirrors. File issues and submit pull requests on Forgejo.
+---
 ## License
-Discovery/pipeline layer: MIT
-AI features: BSL 1.1 (free for personal non-commercial self-hosting)
+Kiwi uses a split license:
+- **Discovery and inventory pipeline** (barcode scan, expiry tracking, pantry CRUD, CSV export, recipe browser): [MIT](LICENSE-MIT)
+- **AI features** (receipt OCR, LLM recipe suggestions, style auto-classifier): [BSL 1.1](LICENSE-BSL) — free for personal non-commercial self-hosting; commercial use or SaaS re-hosting requires a paid license. Converts to MIT after 4 years.
+Humans own design, architecture, code review, testing, and verification. LLMs are part of our development workflow. [Our positions on LLM use →](https://circuitforge.tech/positions)
+Privacy · Safety · Accessibility — co-equal, non-negotiable across all CircuitForge products.


@ -0,0 +1,332 @@
# app/api/endpoints/activitypub.py
# MIT License
#
# ActivityPub endpoints for Kiwi instances:
# GET /.well-known/webfinger — WebFinger JRD
# GET /ap/actor — Instance actor document
# POST /ap/actor/inbox — Incoming activities
# GET /ap/outbox — Outgoing activities (OrderedCollection)
# GET /ap/posts/{slug} — Individual AP Note
# GET /ap/followers — Followers collection (count only)
# GET /ap/following — Following collection (empty stub)
#
# All endpoints are no-ops / 404 when AP_ENABLED=false or actor not loaded.
# The WebFinger and well-known routes are mounted at the root app level (not
# under /api/v1) — see main.py.
from __future__ import annotations
import asyncio
import json
import logging
from datetime import datetime, timezone
from fastapi import APIRouter, HTTPException, Request, Response
from fastapi.responses import JSONResponse
from app.core.config import settings
from app.services.ap.keys import get_actor
logger = logging.getLogger(__name__)
# ── Two routers: one for well-known (root mount), one for /ap prefix ─────────
webfinger_router = APIRouter(tags=["activitypub"])
ap_router = APIRouter(prefix="/ap", tags=["activitypub"])
_AP_CONTENT_TYPE = "application/activity+json"
_JRD_CONTENT_TYPE = "application/jrd+json"
def _actor_required():
    actor = get_actor()
    if actor is None:
        raise HTTPException(status_code=404, detail="ActivityPub not enabled on this instance.")
    return actor

# ── WebFinger ─────────────────────────────────────────────────────────────────
@webfinger_router.get("/.well-known/webfinger")
async def webfinger(resource: str | None = None):
    actor = get_actor()
    if actor is None:
        raise HTTPException(status_code=404, detail="ActivityPub not enabled.")
    expected = f"acct:kiwi@{settings.AP_HOST}"
    if resource and resource != expected:
        raise HTTPException(status_code=404, detail=f"Resource {resource!r} not found.")
    jrd = {
        "subject": expected,
        "links": [
            {
                "rel": "self",
                "type": _AP_CONTENT_TYPE,
                "href": actor.actor_id,
            }
        ],
    }
    return Response(
        content=json.dumps(jrd),
        media_type=_JRD_CONTENT_TYPE,
    )

# ── Actor ─────────────────────────────────────────────────────────────────────
@ap_router.get("/actor")
async def get_actor_doc():
    actor = _actor_required()
    return Response(
        content=json.dumps(actor.to_ap_dict()),
        media_type=_AP_CONTENT_TYPE,
    )
# ── Inbox (mounted via make_inbox_router below) ───────────────────────────────
async def _on_follow(activity: dict, headers: dict) -> None:
    """Accept Follow: add to ap_followers, send Accept(Follow) back."""
    actor_url = activity.get("actor", "")
    if not actor_url:
        return
    from app.db.store import Store
    from app.core.config import settings as _settings
    db_path = _settings.DB_PATH
    inbox_url, shared_inbox = await asyncio.to_thread(_resolve_inbox, actor_url)
    if inbox_url is None:
        return
    import sqlite3
    conn = sqlite3.connect(str(db_path))
    try:
        conn.execute(
            """INSERT OR REPLACE INTO ap_followers
            (actor_id, inbox_url, shared_inbox, followed_at, active)
            VALUES (?, ?, ?, ?, 1)""",
            (actor_url, inbox_url, shared_inbox, datetime.now(timezone.utc).isoformat()),
        )
        conn.commit()
    finally:
        conn.close()
    actor = get_actor()
    if actor is None:
        return
    accept = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": f"{actor.actor_id}/accepts/{activity.get('id', 'unknown')}",
        "type": "Accept",
        "actor": actor.actor_id,
        "object": activity,
    }
    from circuitforge_core.activitypub import deliver_activity
    await asyncio.to_thread(deliver_activity, accept, inbox_url, actor, 10.0)

async def _on_undo(activity: dict, headers: dict) -> None:
    """Handle Undo(Follow): deactivate the follower row."""
    inner = activity.get("object", {})
    if isinstance(inner, dict) and inner.get("type") == "Follow":
        actor_url = activity.get("actor", "")
        if actor_url:
            import sqlite3
            conn = sqlite3.connect(str(settings.DB_PATH))
            try:
                conn.execute(
                    "UPDATE ap_followers SET active = 0 WHERE actor_id = ?", (actor_url,)
                )
                conn.commit()
            finally:
                conn.close()

async def _dedup_activity(activity_id: str | None) -> bool:
    """Return True (already seen) if activity_id is in ap_received; otherwise insert it."""
    if not activity_id:
        return False
    import sqlite3
    conn = sqlite3.connect(str(settings.DB_PATH))
    try:
        try:
            conn.execute(
                "INSERT INTO ap_received (activity_id) VALUES (?)", (activity_id,)
            )
            conn.commit()
            return False
        except sqlite3.IntegrityError:
            return True
    finally:
        conn.close()
def _build_inbox_router():
    from circuitforge_core.activitypub.inbox import make_inbox_router

    async def on_follow(activity: dict, headers: dict) -> None:
        if await _dedup_activity(activity.get("id")):
            return
        await _on_follow(activity, headers)

    async def on_undo(activity: dict, headers: dict) -> None:
        if await _dedup_activity(activity.get("id")):
            return
        await _on_undo(activity, headers)

    return make_inbox_router(
        handlers={"Follow": on_follow, "Undo": on_undo},
        verify_key_fetcher=None,  # Signature verification enabled in prod when actor is loaded
        path="/inbox",
    )

# Mount inbox at /ap/actor/inbox (AP spec: inbox is a sub-resource of the actor)
try:
    _inbox_sub = _build_inbox_router()
    ap_router.include_router(_inbox_sub, prefix="/actor")
except Exception as _e:
    logger.warning("AP inbox router not available: %s", _e)
# ── Outbox ────────────────────────────────────────────────────────────────────
@ap_router.get("/outbox")
async def get_outbox(page: int | None = None, request: Request = None):
    actor = _actor_required()
    from app.api.endpoints.community import _get_community_store
    store = _get_community_store()
    base = f"https://{settings.AP_HOST}"
    if store is None:
        collection = {
            "@context": "https://www.w3.org/ns/activitystreams",
            "id": f"{actor.outbox_url}",
            "type": "OrderedCollection",
            "totalItems": 0,
            "orderedItems": [],
        }
        return Response(content=json.dumps(collection), media_type=_AP_CONTENT_TYPE)
    PAGE_SIZE = 20
    offset = ((page or 1) - 1) * PAGE_SIZE
    posts = await asyncio.to_thread(store.list_posts, limit=PAGE_SIZE, offset=offset)
    items = [_post_to_ap_note(p, actor, base) for p in posts]
    collection = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": actor.outbox_url + (f"?page={page}" if page else ""),
        "type": "OrderedCollectionPage" if page else "OrderedCollection",
        "orderedItems": items,
    }
    return Response(content=json.dumps(collection), media_type=_AP_CONTENT_TYPE)

# ── Individual post ───────────────────────────────────────────────────────────
@ap_router.get("/posts/{slug}")
async def get_ap_post(slug: str):
    actor = _actor_required()
    from app.api.endpoints.community import _get_community_store
    store = _get_community_store()
    if store is None:
        raise HTTPException(status_code=404, detail="Community DB not available.")
    post = await asyncio.to_thread(store.get_post_by_slug, slug)
    if post is None:
        raise HTTPException(status_code=404, detail="Post not found.")
    base = f"https://{settings.AP_HOST}"
    note = _post_to_ap_note(post, actor, base)
    return Response(content=json.dumps(note), media_type=_AP_CONTENT_TYPE)
# ── Followers / Following ─────────────────────────────────────────────────────
@ap_router.get("/followers")
async def get_followers():
    actor = _actor_required()
    import sqlite3
    count = 0
    try:
        conn = sqlite3.connect(str(settings.DB_PATH))
        row = conn.execute("SELECT COUNT(*) FROM ap_followers WHERE active = 1").fetchone()
        conn.close()
        count = row[0] if row else 0
    except Exception:
        pass
    collection = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": f"{actor.actor_id}/followers",
        "type": "OrderedCollection",
        "totalItems": count,
    }
    return Response(content=json.dumps(collection), media_type=_AP_CONTENT_TYPE)

@ap_router.get("/following")
async def get_following():
    actor = _actor_required()
    collection = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": f"{actor.actor_id}/following",
        "type": "OrderedCollection",
        "totalItems": 0,
        "orderedItems": [],
    }
    return Response(content=json.dumps(collection), media_type=_AP_CONTENT_TYPE)
# ── Helpers ───────────────────────────────────────────────────────────────────
def _post_to_ap_note(post, actor, base_url: str) -> dict:
    from circuitforge_core.activitypub import make_note
    from app.services.community.ap_compat import _build_content
    diet_tags: list[str] = list(getattr(post, "dietary_tags", []) or [])
    hashtags = [{"type": "Hashtag", "name": "#Kiwi", "href": f"{base_url}/ap/tags/kiwi"}]
    for tag in diet_tags[:4]:
        ht = "".join(w.capitalize() for w in tag.replace("-", " ").split())
        hashtags.append({"type": "Hashtag", "name": f"#{ht}"})
    content = _build_content(
        {
            "title": post.title,
            "description": getattr(post, "description", None),
            "outcome_notes": getattr(post, "outcome_notes", None),
            "dietary_tags": diet_tags,
        }
    )
    published = post.published
    note = make_note(
        actor_id=actor.actor_id,
        content=content,
        tag=hashtags,
        published=published if isinstance(published, datetime) else None,
    )
    note["id"] = f"{base_url}/ap/posts/{post.slug}"
    return note

def _resolve_inbox(actor_url: str) -> tuple[str | None, str | None]:
    """Fetch an AP actor document and extract inbox + sharedInbox URLs."""
    try:
        import httpx
        resp = httpx.get(
            actor_url,
            headers={"Accept": "application/activity+json"},
            timeout=8.0,
            follow_redirects=True,
        )
        resp.raise_for_status()
        doc = resp.json()
        inbox = doc.get("inbox")
        shared = doc.get("endpoints", {}).get("sharedInbox")
        return inbox, shared
    except Exception as exc:
        logger.debug("Could not resolve actor %s: %s", actor_url, exc)
        return None, None


@@ -167,6 +167,54 @@ def _validate_publish_body(body: dict) -> None:
         raise HTTPException(status_code=422, detail="photo_url must be an https:// URL.")
+@router.post("/check-similar")
+async def check_similar(body: dict, session: CloudUser = Depends(get_session)):
+    """Pre-submission dedup check: return similar existing posts for the given title/recipe_id.
+    Safe to call with no community store configured; returns empty list rather than 503.
+    """
+    store = _get_community_store()
+    if store is None:
+        return {"similar_posts": []}
+    title = (body.get("title") or "").strip()
+    recipe_id = body.get("recipe_id")
+    post_type = body.get("post_type")
+    if not title:
+        return {"similar_posts": []}
+    candidates = await asyncio.to_thread(
+        store.search_similar_posts,
+        title,
+        recipe_id,
+        post_type,
+        8,
+    )
+    if not candidates:
+        return {"similar_posts": []}
+    from app.services.community.dedup import build_similar_post_result, fetch_recipe_ingredients
+    incoming_ingredients = await asyncio.to_thread(
+        fetch_recipe_ingredients, session.db, recipe_id
+    )
+    results = []
+    for post in candidates:
+        result = await asyncio.to_thread(
+            build_similar_post_result,
+            post,
+            recipe_id,
+            incoming_ingredients,
+            session.db,
+        )
+        if result["similarity_tier"] != "different":
+            results.append(result)
+    return {"similar_posts": results[:5]}
 @router.post("/posts", status_code=201)
 async def publish_post(body: dict, session: CloudUser = Depends(get_session)):
     from app.tiers import can_use
@@ -214,6 +262,8 @@ async def publish_post(body: dict, session: CloudUser = Depends(get_session)):
     today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
     slug = f"kiwi-{_post_type_prefix(post_type)}-{pseudonym.lower().replace(' ', '')}-{today}-{slug_title}"[:120]
+    similar_to_ref = body.get("similar_to_ref") or None
     from circuitforge_core.community.models import CommunityPost
     post = CommunityPost(
         slug=slug,
@@ -241,6 +291,7 @@ async def publish_post(body: dict, session: CloudUser = Depends(get_session)):
         fat_pct=snapshot.fat_pct,
         protein_pct=snapshot.protein_pct,
         moisture_pct=snapshot.moisture_pct,
+        similar_to_ref=similar_to_ref,
     )
     try:
@@ -250,7 +301,41 @@ async def publish_post(body: dict, session: CloudUser = Depends(get_session)):
             status_code=409,
             detail="A post with this title already exists today. Try a different title.",
         ) from exc
-    return _post_to_dict(inserted)
+    post_dict = _post_to_dict(inserted)
+    # AP delivery + Mastodon post (Paid tier, AP_ENABLED, opted-in)
+    from app.core.config import settings as _settings
+    if _settings.AP_ENABLED and session.tier in ("paid", "premium", "ultra"):
+        from circuitforge_core.activitypub import make_create, make_note, PUBLIC
+        from app.services.ap.keys import get_actor
+        from app.services.ap.delivery import deliver_to_followers
+        _ap_actor = get_actor()
+        if _ap_actor is not None:
+            base = f"https://{_settings.AP_HOST}"
+            from app.api.endpoints.activitypub import _post_to_ap_note
+            _note = _post_to_ap_note(inserted, _ap_actor, base)
+            _activity = make_create(_ap_actor, _note)
+            asyncio.create_task(
+                asyncio.to_thread(
+                    deliver_to_followers, inserted.slug, _activity, session.db
+                )
+            )
+        # Mastodon post if user has connected account and opted in
+        if body.get("post_to_mastodon"):
+            from app.services.ap.mastodon import build_post_content, get_token, post_status
+            _masto = await asyncio.to_thread(
+                get_token, session.db, session.user_id, _settings.AP_TOKEN_ENCRYPTION_KEY
+            )
+            if _masto:
+                _masto_url, _masto_token = _masto
+                _content = build_post_content(post_dict)
+                asyncio.create_task(
+                    asyncio.to_thread(post_status, _masto_url, _masto_token, _content)
+                )
+    return post_dict
 @router.delete("/posts/{slug}", status_code=204)
@@ -351,6 +436,7 @@ def _post_to_dict(post) -> dict:
         "fat_pct": post.fat_pct,
         "protein_pct": post.protein_pct,
         "moisture_pct": post.moisture_pct,
+        "similar_to_ref": getattr(post, "similar_to_ref", None),
     }


@ -0,0 +1,5 @@
# app/api/endpoints/corrections.py — user corrections to LLM output for SFT training
from circuitforge_core.api import make_corrections_router
from app.db.session import get_db
router = make_corrections_router(get_db=get_db, product="kiwi")


@@ -11,7 +11,8 @@ import sqlite3
 import requests
 from fastapi import APIRouter, Depends, HTTPException
-from app.cloud_session import CloudUser, CLOUD_DATA_ROOT, HEIMDALL_URL, HEIMDALL_ADMIN_TOKEN, get_session
+from app.cloud_session import CloudUser, CLOUD_DATA_ROOT, get_session
+from app.services.heimdall_orch import HEIMDALL_URL, HEIMDALL_ADMIN_TOKEN
 from app.db.store import Store
 from app.models.schemas.household import (
     HouseholdAcceptRequest,


@ -478,7 +478,8 @@ async def scan_barcode_image(
from app.services.openfoodfacts import OpenFoodFactsService from app.services.openfoodfacts import OpenFoodFactsService
from app.services.expiration_predictor import ExpirationPredictor from app.services.expiration_predictor import ExpirationPredictor
barcodes = await asyncio.to_thread(BarcodeScanner().scan_image, temp_file) image_bytes = temp_file.read_bytes()
barcodes = await asyncio.to_thread(BarcodeScanner().scan_from_bytes, image_bytes)
if not barcodes: if not barcodes:
return BarcodeScanResponse( return BarcodeScanResponse(
success=False, barcodes_found=0, results=[], success=False, barcodes_found=0, results=[],
@ -500,9 +501,10 @@ async def scan_barcode_image(
product_info = await off.lookup_product(code) product_info = await off.lookup_product(code)
product_source = "openfoodfacts" product_source = "openfoodfacts"
db_product = None
inventory_item = None inventory_item = None
if product_info and auto_add_to_inventory: if product_info:
product, _ = await asyncio.to_thread( db_product, _ = await asyncio.to_thread(
store.get_or_create_product, store.get_or_create_product,
product_info.get("name", code), product_info.get("name", code),
code, code,
@ -512,29 +514,30 @@ async def scan_barcode_image(
source=product_source, source=product_source,
source_data=product_info, source_data=product_info,
) )
exp = predictor.predict_expiration( if auto_add_to_inventory:
product_info.get("category", ""), exp = predictor.predict_expiration(
location, product_info.get("category", ""),
product_name=product_info.get("name", code), location,
tier=session.tier, product_name=product_info.get("name", code),
has_byok=session.has_byok, tier=session.tier,
) has_byok=session.has_byok,
resolved_qty = product_info.get("pack_quantity") or quantity )
resolved_unit = product_info.get("pack_unit") or "count" resolved_qty = product_info.get("pack_quantity") or quantity
inventory_item = await asyncio.to_thread( resolved_unit = product_info.get("pack_unit") or "count"
store.add_inventory_item, inventory_item = await asyncio.to_thread(
product["id"], location, store.add_inventory_item,
quantity=resolved_qty, db_product["id"], location,
unit=resolved_unit, quantity=resolved_qty,
expiration_date=str(exp) if exp else None, unit=resolved_unit,
source="barcode_scan", expiration_date=str(exp) if exp else None,
) source="barcode_scan",
product_found = product_info is not None )
product_found = db_product is not None
needs_capture = not product_found and has_visual_capture needs_capture = not product_found and has_visual_capture
results.append({ results.append({
"barcode": code, "barcode": code,
"barcode_type": bc.get("type", "unknown"), "barcode_type": bc.get("type", "unknown"),
"product": ProductResponse.model_validate(product_info) if product_info else None, "product": ProductResponse.model_validate(db_product) if db_product else None,
"inventory_item": InventoryItemResponse.model_validate(inventory_item) if inventory_item else None, "inventory_item": InventoryItemResponse.model_validate(inventory_item) if inventory_item else None,
"added_to_inventory": inventory_item is not None, "added_to_inventory": inventory_item is not None,
"needs_manual_entry": not product_found and not needs_capture, "needs_manual_entry": not product_found and not needs_capture,


@ -0,0 +1,133 @@
# app/api/endpoints/mastodon_oauth.py
# MIT License
#
# Mastodon OAuth flow endpoints:
# POST /social/mastodon/connect — Start OAuth (dynamic app registration)
# GET /social/mastodon/callback — OAuth callback, exchange code for token
# DELETE /social/mastodon/disconnect — Revoke and remove stored token
# GET /social/mastodon/status — Check connection status
from __future__ import annotations
import asyncio
import logging
from urllib.parse import urlencode
from fastapi import APIRouter, Depends, HTTPException
from fastapi.responses import RedirectResponse
from app.cloud_session import CloudUser, get_session
from app.core.config import settings
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/social/mastodon", tags=["mastodon"])
def _redirect_uri() -> str:
host = settings.AP_HOST or "localhost:8512"
return f"https://{host}/api/v1/social/mastodon/callback"
# In-memory pending state: maps state_token → {instance_url, client_id, client_secret, user_id}
# A real deployment would persist this in a short-TTL cache or DB.
_pending: dict[str, dict] = {}
@router.post("/connect")
async def connect_mastodon(body: dict, session: CloudUser = Depends(get_session)):
"""Start the Mastodon OAuth flow.
Body: {"instance_url": "https://mastodon.social"}
Returns: {"authorize_url": "..."}
"""
import secrets
from app.services.ap.mastodon import build_authorize_url, register_app
instance_url = (body.get("instance_url") or "").strip().rstrip("/")
if not instance_url.startswith("https://"):
raise HTTPException(status_code=422, detail="instance_url must be an https:// URL.")
redirect_uri = _redirect_uri()
try:
app_creds = await asyncio.to_thread(register_app, instance_url, redirect_uri)
except Exception as exc:
raise HTTPException(
status_code=502, detail=f"Could not register with Mastodon instance: {exc}"
) from exc
state = secrets.token_urlsafe(24)
_pending[state] = {
"instance_url": instance_url,
"client_id": app_creds["client_id"],
"client_secret": app_creds["client_secret"],
"user_id": session.user_id,
}
authorize_url = build_authorize_url(
instance_url=instance_url,
client_id=app_creds["client_id"],
redirect_uri=redirect_uri + f"?state={state}",
)
return {"authorize_url": authorize_url, "state": state}
@router.get("/callback")
async def mastodon_callback(code: str | None = None, state: str | None = None):
"""OAuth callback. Exchanges auth code for access token and stores it."""
if not code or not state:
raise HTTPException(status_code=400, detail="Missing code or state parameter.")
pending = _pending.pop(state, None)
if pending is None:
raise HTTPException(status_code=400, detail="Unknown or expired OAuth state.")
from app.services.ap.mastodon import exchange_code, store_token
redirect_uri = _redirect_uri() + f"?state={state}"
try:
access_token = await asyncio.to_thread(
exchange_code,
pending["instance_url"],
pending["client_id"],
pending["client_secret"],
code,
redirect_uri,
)
except Exception as exc:
raise HTTPException(status_code=502, detail=f"Token exchange failed: {exc}") from exc
await asyncio.to_thread(
store_token,
settings.DB_PATH,
pending["user_id"],
pending["instance_url"],
access_token,
settings.AP_TOKEN_ENCRYPTION_KEY,
)
# Redirect to frontend settings page after successful connect
return RedirectResponse(url="/#/settings?mastodon=connected", status_code=302)
@router.delete("/disconnect", status_code=204)
async def disconnect_mastodon(session: CloudUser = Depends(get_session)):
"""Remove the stored Mastodon token."""
from app.services.ap.mastodon import delete_token
await asyncio.to_thread(delete_token, settings.DB_PATH, session.user_id)
@router.get("/status")
async def mastodon_status(session: CloudUser = Depends(get_session)):
"""Return connection status and instance URL (no token value)."""
from app.services.ap.mastodon import get_token
result = await asyncio.to_thread(
get_token,
settings.DB_PATH,
session.user_id,
settings.AP_TOKEN_ENCRYPTION_KEY,
)
if result is None:
return {"connected": False, "instance_url": None}
instance_url, _ = result
return {"connected": True, "instance_url": instance_url}
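The comment on `_pending` above notes that a real deployment would keep pending OAuth state in a short-TTL cache or DB. A minimal sketch of that idea follows, still in memory but with expiry. The class name `PendingStates`, the 600-second default, and the injectable `clock` are illustrative assumptions, not part of this commit.

```python
from __future__ import annotations

import secrets
import time


class PendingStates:
    """Hypothetical in-memory OAuth state store with expiry (illustration only)."""

    def __init__(self, ttl_seconds: float = 600.0, clock=time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._items: dict[str, tuple[float, dict]] = {}

    def put(self, payload: dict) -> str:
        """Store payload under a fresh random state token and return the token."""
        self._purge()
        state = secrets.token_urlsafe(24)
        self._items[state] = (self._clock() + self._ttl, payload)
        return state

    def pop(self, state: str) -> dict | None:
        """Return and remove the payload, or None if unknown or expired."""
        entry = self._items.pop(state, None)
        if entry is None:
            return None
        expires_at, payload = entry
        return payload if self._clock() < expires_at else None

    def _purge(self) -> None:
        """Drop every entry whose deadline has passed."""
        now = self._clock()
        for key in [k for k, (exp, _) in self._items.items() if exp <= now]:
            del self._items[key]
```

Like the module-level dict, this only works within a single process; with several workers the callback could land on a process that never saw the state, which is presumably why the comment points to a shared cache or DB.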


@ -0,0 +1,371 @@
"""Recipe scanner endpoints (kiwi#9).
POST /recipes/scan -- scan photo(s) -> structured recipe JSON (not saved)
POST /recipes/scan/save -- save a confirmed scanned recipe to user_recipes
GET /recipes/user -- list user-created recipes
GET /recipes/user/{id} -- get a single user recipe
DELETE /recipes/user/{id} -- delete a user recipe
BSL 1.1 -- recipe_scan requires Paid tier or BYOK.
"""
from __future__ import annotations
import asyncio
import json as _json
import logging
import uuid
from pathlib import Path
from typing import Annotated
import aiofiles
from fastapi import APIRouter, Depends, File, HTTPException, UploadFile
from fastapi.responses import JSONResponse, StreamingResponse
from app.cloud_session import CloudUser, get_session
from app.core.config import settings
from app.db.session import get_store
from app.db.store import Store
from app.models.schemas.recipe_scan import (
ScannedIngredientSchema,
ScannedRecipeResponse,
ScannedRecipeSaveRequest,
UserRecipeResponse,
)
from app.tiers import can_use
logger = logging.getLogger(__name__)
router = APIRouter()
_ALLOWED_MIME_TYPES = {
"image/jpeg", "image/jpg", "image/png", "image/webp", "image/heic", "image/heif"
}
_MAX_FILE_SIZE_MB = 20
async def _save_upload_temp(file: UploadFile) -> Path:
"""Write upload to a temp path under UPLOAD_DIR. Caller is responsible for cleanup."""
settings.ensure_dirs()
dest = settings.UPLOAD_DIR / f"scan_{uuid.uuid4()}_{file.filename}"
async with aiofiles.open(dest, "wb") as f:
await f.write(await file.read())
return dest
def _result_to_response(result) -> ScannedRecipeResponse:
"""Convert ScannedRecipeResult (dataclass) to Pydantic response schema."""
return ScannedRecipeResponse(
title=result.title,
subtitle=result.subtitle,
servings=result.servings,
cook_time=result.cook_time,
source_note=result.source_note,
ingredients=[
ScannedIngredientSchema(
name=i.name,
qty=i.qty,
unit=i.unit,
raw=i.raw,
in_pantry=i.in_pantry,
)
for i in result.ingredients
],
steps=result.steps,
notes=result.notes,
tags=result.tags,
pantry_match_pct=result.pantry_match_pct,
confidence=result.confidence,
warnings=result.warnings,
)
def _row_to_user_recipe(row: dict) -> UserRecipeResponse:
"""Convert a store row dict to UserRecipeResponse."""
return UserRecipeResponse(
id=row["id"],
title=row["title"],
subtitle=row.get("subtitle"),
servings=row.get("servings"),
cook_time=row.get("cook_time"),
source_note=row.get("source_note"),
ingredients=[
ScannedIngredientSchema(**i) if isinstance(i, dict) else i
for i in (row.get("ingredients") or [])
],
steps=row.get("steps") or [],
notes=row.get("notes"),
tags=row.get("tags") or [],
source=row.get("source", "manual"),
pantry_match_pct=row.get("pantry_match_pct"),
created_at=row["created_at"],
)
# ── Scan endpoint ──────────────────────────────────────────────────────────────
@router.post("/scan", response_model=ScannedRecipeResponse)
async def scan_recipe(
files: Annotated[list[UploadFile], File(...)],
store: Store = Depends(get_store),
session: CloudUser = Depends(get_session),
):
"""Scan one or more recipe photos and return a structured recipe for review.
Accepts 1-4 images. Multi-page recipes (e.g. ingredients on page 1,
directions on page 2) work best when all pages are submitted together.
The response is NOT saved automatically -- the user reviews and edits it,
then calls POST /recipes/scan/save to persist.
Tier: Paid (or BYOK).
"""
if not can_use("recipe_scan", session.tier, session.has_byok):
raise HTTPException(
status_code=403,
detail=(
"Recipe scanning requires Paid tier or a configured vision backend (BYOK). "
"Set ANTHROPIC_API_KEY or connect to a cf-orch vision service."
),
)
if not files:
raise HTTPException(status_code=422, detail="At least one image file is required.")
if len(files) > 4:
raise HTTPException(status_code=422, detail="Maximum 4 images per scan request.")
for f in files:
ct = (f.content_type or "").lower()
if ct and ct not in _ALLOWED_MIME_TYPES:
raise HTTPException(
status_code=422,
detail=f"Unsupported file type: {ct}. Supported: JPEG, PNG, WebP, HEIC.",
)
# Save uploads to temp files
saved_paths: list[Path] = []
try:
for f in files:
saved_paths.append(await _save_upload_temp(f))
# Get pantry item names for cross-reference
inventory = await asyncio.to_thread(store.list_inventory)
pantry_names = [item["product_name"] for item in inventory if item.get("product_name")]
# Run scanner (blocks on VLM -- use to_thread)
from app.services.recipe.recipe_scanner import RecipeScanner
def _run_scan():
scanner = RecipeScanner()
return scanner.scan(saved_paths, pantry_names=pantry_names)
try:
result = await asyncio.to_thread(_run_scan)
except ValueError as exc:
msg = str(exc)
if "not_a_recipe" in msg:
raise HTTPException(
status_code=422,
detail="The image does not appear to contain a recipe. "
"Please photograph a recipe card, cookbook page, or handwritten note.",
)
raise HTTPException(status_code=422, detail=msg)
except RuntimeError as exc:
msg = str(exc)
logger.warning("Recipe scanner unavailable: %s", msg)
raise HTTPException(
status_code=503,
detail=(
"The recipe scanner is temporarily unavailable — "
"no vision backend could be reached. "
"Try again in a few minutes, or contact support if this persists."
),
)
return _result_to_response(result)
finally:
# Clean up temp files
for p in saved_paths:
try:
p.unlink(missing_ok=True)
except Exception:
pass
# ── SSE scan endpoint ─────────────────────────────────────────────────────────
async def _scan_recipe_sse(saved_paths: list[Path], pantry_names: list[str]):
"""Async generator yielding SSE events for a recipe scan.
Emits progress events while the vision service allocates and runs, then a
final "done" event containing the full recipe payload (same shape as the
ScannedRecipeResponse from POST /scan).
Events:
{"status": "allocating", "message": "..."}
{"status": "scanning", "message": "..."}
{"status": "structuring","message": "..."}
{"status": "done", "recipe": {...}}
{"status": "error", "message": "..."}
"""
queue: asyncio.Queue = asyncio.Queue()
loop = asyncio.get_running_loop()
def _run() -> None:
def cb(status: str, message: str) -> None:
loop.call_soon_threadsafe(queue.put_nowait, {"status": status, "message": message})
try:
from app.services.recipe.recipe_scanner import RecipeScanner
result = RecipeScanner().scan(saved_paths, pantry_names=pantry_names, progress_cb=cb)
recipe_dict = _result_to_response(result).model_dump()
loop.call_soon_threadsafe(queue.put_nowait, {"status": "done", "recipe": recipe_dict})
except ValueError as exc:
loop.call_soon_threadsafe(queue.put_nowait, {"status": "error", "message": str(exc)})
except RuntimeError as exc:
loop.call_soon_threadsafe(queue.put_nowait, {"status": "error", "message": str(exc)})
except Exception as exc:
logger.exception("Unexpected error in recipe scan thread")
loop.call_soon_threadsafe(queue.put_nowait, {"status": "error", "message": "Scan failed unexpectedly."})
scan_task = asyncio.ensure_future(asyncio.to_thread(_run))
try:
while True:
try:
event = await asyncio.wait_for(queue.get(), timeout=180.0)
except asyncio.TimeoutError:
yield f"data: {_json.dumps({'status': 'error', 'message': 'Scan timed out after 3 minutes.'})}\n\n"
break
yield f"data: {_json.dumps(event)}\n\n"
if event["status"] in ("done", "error"):
break
finally:
if not scan_task.done():
scan_task.cancel()
@router.post("/scan/stream")
async def scan_recipe_stream(
files: Annotated[list[UploadFile], File(...)],
store: Store = Depends(get_store),
session: CloudUser = Depends(get_session),
):
"""Scan recipe photos and stream SSE progress events during model load.
Use this endpoint instead of POST /scan when you need live feedback during
cold-start model loading (first request after a GPU-idle period can take
30-60 seconds for cf-docuvision to warm up).
Tier: Paid (or BYOK), same gate as POST /scan.
"""
if not can_use("recipe_scan", session.tier, session.has_byok):
raise HTTPException(
status_code=403,
detail=(
"Recipe scanning requires Paid tier or a configured vision backend (BYOK). "
"Set ANTHROPIC_API_KEY or connect to a cf-orch vision service."
),
)
if not files:
raise HTTPException(status_code=422, detail="At least one image file is required.")
if len(files) > 4:
raise HTTPException(status_code=422, detail="Maximum 4 images per scan request.")
for f in files:
ct = (f.content_type or "").lower()
if ct and ct not in _ALLOWED_MIME_TYPES:
raise HTTPException(
status_code=422,
detail=f"Unsupported file type: {ct}. Supported: JPEG, PNG, WebP, HEIC.",
)
saved_paths: list[Path] = []
for f in files:
saved_paths.append(await _save_upload_temp(f))
inventory = await asyncio.to_thread(store.list_inventory)
pantry_names = [item["product_name"] for item in inventory if item.get("product_name")]
async def generate():
try:
async for chunk in _scan_recipe_sse(saved_paths, pantry_names):
yield chunk
finally:
for p in saved_paths:
try:
p.unlink(missing_ok=True)
except Exception:
pass
return StreamingResponse(generate(), media_type="text/event-stream")
# ── Save endpoint ──────────────────────────────────────────────────────────────
@router.post("/scan/save", response_model=UserRecipeResponse, status_code=201)
async def save_scanned_recipe(
body: ScannedRecipeSaveRequest,
store: Store = Depends(get_store),
session: CloudUser = Depends(get_session),
):
"""Save a user-reviewed (possibly edited) scanned recipe.
The body is the ScannedRecipeResponse (or a user-edited version of it).
Returns the persisted UserRecipe with an assigned ID.
Tier: Free (saving your own recipe doesn't require vision access).
"""
def _save():
return store.create_user_recipe(
title=body.title,
subtitle=body.subtitle,
servings=body.servings,
cook_time=body.cook_time,
source_note=body.source_note,
ingredients=[i.model_dump() for i in body.ingredients],
steps=body.steps,
notes=body.notes,
tags=body.tags,
source=body.source,
pantry_match_pct=None,
)
row = await asyncio.to_thread(_save)
return _row_to_user_recipe(row)
# ── User recipe list / get / delete ───────────────────────────────────────────
@router.get("/user", response_model=list[UserRecipeResponse])
async def list_user_recipes(
store: Store = Depends(get_store),
session: CloudUser = Depends(get_session),
):
"""List all user-created recipes (scanned + manually entered), newest first."""
rows = await asyncio.to_thread(store.list_user_recipes)
return [_row_to_user_recipe(r) for r in rows]
@router.get("/user/{recipe_id}", response_model=UserRecipeResponse)
async def get_user_recipe(
recipe_id: int,
store: Store = Depends(get_store),
session: CloudUser = Depends(get_session),
):
"""Get a single user recipe by ID."""
row = await asyncio.to_thread(store.get_user_recipe, recipe_id)
if not row:
raise HTTPException(status_code=404, detail="User recipe not found.")
return _row_to_user_recipe(row)
@router.delete("/user/{recipe_id}", status_code=204)
async def delete_user_recipe(
recipe_id: int,
store: Store = Depends(get_store),
session: CloudUser = Depends(get_session),
):
"""Delete a user recipe by ID."""
deleted = await asyncio.to_thread(store.delete_user_recipe, recipe_id)
if not deleted:
raise HTTPException(status_code=404, detail="User recipe not found.")
return JSONResponse(status_code=204, content=None)
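The `_scan_recipe_sse` generator above bridges a blocking scanner thread to an async SSE stream through an `asyncio.Queue` and `loop.call_soon_threadsafe`. The following is a stripped-down sketch of that pattern with the scanner replaced by an arbitrary callable. The name `sse_from_worker` and the `work(progress)` calling convention are assumptions made for illustration.

```python
import asyncio
import json


async def sse_from_worker(work, timeout: float = 5.0):
    """Yield SSE 'data:' frames for events reported by a blocking worker."""
    queue: asyncio.Queue = asyncio.Queue()
    loop = asyncio.get_running_loop()

    def emit(event: dict) -> None:
        # asyncio.Queue is not thread-safe, so schedule the put on the loop.
        loop.call_soon_threadsafe(queue.put_nowait, event)

    def run() -> None:
        # Runs in a worker thread; every outcome becomes a queued event.
        try:
            result = work(lambda status, message: emit({"status": status, "message": message}))
            emit({"status": "done", "recipe": result})
        except Exception as exc:
            emit({"status": "error", "message": str(exc)})

    task = asyncio.ensure_future(asyncio.to_thread(run))
    try:
        while True:
            try:
                event = await asyncio.wait_for(queue.get(), timeout=timeout)
            except asyncio.TimeoutError:
                event = {"status": "error", "message": "timed out"}
            yield f"data: {json.dumps(event)}\n\n"
            if event["status"] in ("done", "error"):
                break
    finally:
        if not task.done():
            task.cancel()
```

Cancelling the task only stops the coroutine from waiting on it; a thread started by `asyncio.to_thread` keeps running until its function returns.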


@ -6,7 +6,9 @@ import logging
from pathlib import Path from pathlib import Path
from typing import Annotated from typing import Annotated
import json as _json_mod
from fastapi import APIRouter, Depends, HTTPException, Query from fastapi import APIRouter, Depends, HTTPException, Query
from fastapi.responses import StreamingResponse
from app.cloud_session import CloudUser, _auth_label, get_session from app.cloud_session import CloudUser, _auth_label, get_session
@ -14,8 +16,12 @@ log = logging.getLogger(__name__)
from app.db.session import get_store from app.db.session import get_store
from app.db.store import Store from app.db.store import Store
from app.models.schemas.recipe import ( from app.models.schemas.recipe import (
AskRequest,
AskResponse,
AskRecipeHit,
AssemblyTemplateOut, AssemblyTemplateOut,
BuildRequest, BuildRequest,
LeftoversResponse,
RecipeJobStatus, RecipeJobStatus,
RecipeRequest, RecipeRequest,
RecipeResult, RecipeResult,
@ -102,6 +108,39 @@ def _build_stream_prompt(db_path: Path, level: int) -> str:
store.close() store.close()
async def _stream_recipe_sse(db_path: Path, req: RecipeRequest):
"""Async generator that yields SSE events for a streaming recipe request.
Phase 1 (thread): classify pantry items using a temporary Store.
Phase 2 (async): stream tokens from LLM via LLMRecipeGenerator.stream_generate().
"""
def _prep(db_path: Path) -> tuple[list, list[str]]:
from app.services.recipe.element_classifier import IngredientClassifier
store = Store(db_path)
try:
classifier = IngredientClassifier(store)
profiles = classifier.classify_batch(req.pantry_items)
gaps = classifier.identify_gaps(profiles)
return profiles, gaps
finally:
store.close()
try:
profiles, gaps = await asyncio.to_thread(_prep, db_path)
except Exception as exc:
yield f"data: {_json_mod.dumps({'error': str(exc)})}\n\n"
return
from app.services.recipe.llm_recipe import LLMRecipeGenerator
gen = LLMRecipeGenerator(None)
try:
async for token in gen.stream_generate(req, profiles, gaps):
yield f"data: {_json_mod.dumps({'chunk': token})}\n\n"
yield f"data: {_json_mod.dumps({'done': True})}\n\n"
except Exception as exc:
yield f"data: {_json_mod.dumps({'error': str(exc)})}\n\n"
async def _enqueue_recipe_job(session: CloudUser, req: RecipeRequest): async def _enqueue_recipe_job(session: CloudUser, req: RecipeRequest):
"""Queue an async recipe_llm job and return 202 with job_id. """Queue an async recipe_llm job and return 202 with job_id.
@ -143,6 +182,7 @@ async def _enqueue_recipe_job(session: CloudUser, req: RecipeRequest):
async def suggest_recipes( async def suggest_recipes(
req: RecipeRequest, req: RecipeRequest,
async_mode: bool = Query(default=False, alias="async"), async_mode: bool = Query(default=False, alias="async"),
stream: bool = Query(default=False),
session: CloudUser = Depends(get_session), session: CloudUser = Depends(get_session),
store: Store = Depends(get_store), store: Store = Depends(get_store),
): ):
@ -178,6 +218,13 @@ async def suggest_recipes(
req = req.model_copy(update={"level": 2}) req = req.model_copy(update={"level": 2})
orch_fallback = True orch_fallback = True
if stream and req.level in (3, 4):
return StreamingResponse(
_stream_recipe_sse(session.db, req),
media_type="text/event-stream",
headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
)
if req.level in (3, 4) and async_mode: if req.level in (3, 4) and async_mode:
return await _enqueue_recipe_job(session, req) return await _enqueue_recipe_job(session, req)
@ -326,6 +373,7 @@ async def browse_recipes(
subcategory: Annotated[str | None, Query()] = None, subcategory: Annotated[str | None, Query()] = None,
q: Annotated[str | None, Query(max_length=200)] = None, q: Annotated[str | None, Query(max_length=200)] = None,
sort: Annotated[str, Query(pattern="^(default|alpha|alpha_desc|match)$")] = "default", sort: Annotated[str, Query(pattern="^(default|alpha|alpha_desc|match)$")] = "default",
required_ingredient: Annotated[str | None, Query(max_length=100)] = None,
session: CloudUser = Depends(get_session), session: CloudUser = Depends(get_session),
) -> dict: ) -> dict:
"""Return a paginated list of recipes for a domain/category. """Return a paginated list of recipes for a domain/category.
@ -334,6 +382,7 @@ async def browse_recipes(
Pass subcategory to narrow within a category that has subcategories. Pass subcategory to narrow within a category that has subcategories.
Pass q to filter by title substring. Pass sort for ordering (default/alpha/alpha_desc/match). Pass q to filter by title substring. Pass sort for ordering (default/alpha/alpha_desc/match).
sort=match orders by pantry coverage DESC; falls back to default when no pantry_items. sort=match orders by pantry coverage DESC; falls back to default when no pantry_items.
Pass required_ingredient to restrict results to recipes that must include that ingredient.
""" """
if domain not in DOMAINS: if domain not in DOMAINS:
raise HTTPException(status_code=404, detail=f"Unknown domain '{domain}'.") raise HTTPException(status_code=404, detail=f"Unknown domain '{domain}'.")
@ -376,6 +425,7 @@ async def browse_recipes(
q=q or None, q=q or None,
sort=sort, sort=sort,
sensory_exclude=sensory_exclude, sensory_exclude=sensory_exclude,
required_ingredient=required_ingredient or None,
) )
# ── Attach time/effort signals to each browse result ──────────────── # ── Attach time/effort signals to each browse result ────────────────
@ -388,7 +438,11 @@ async def browse_recipes(
except Exception: except Exception:
directions_raw = [] directions_raw = []
if directions_raw: if directions_raw:
_profile = parse_time_effort(directions_raw) _profile = parse_time_effort(
directions_raw,
ingredients=recipe_row.get("ingredients") or [],
ingredient_names=recipe_row.get("ingredient_names") or [],
)
recipe_row["active_min"] = _profile.active_min recipe_row["active_min"] = _profile.active_min
recipe_row["passive_min"] = _profile.passive_min recipe_row["passive_min"] = _profile.passive_min
else: else:
@ -423,7 +477,11 @@ async def browse_recipes(
except Exception: except Exception:
directions_raw = [] directions_raw = []
if directions_raw: if directions_raw:
_profile = parse_time_effort(directions_raw) _profile = parse_time_effort(
directions_raw,
ingredients=recipe_row.get("ingredients") or [],
ingredient_names=recipe_row.get("ingredient_names") or [],
)
recipe_row["active_min"] = _profile.active_min recipe_row["active_min"] = _profile.active_min
recipe_row["passive_min"] = _profile.passive_min recipe_row["passive_min"] = _profile.passive_min
else: else:
@ -542,6 +600,137 @@ async def build_recipe(
return result return result
_ASK_STOPWORDS: frozenset[str] = frozenset({
"what", "can", "make", "with", "have", "some", "the", "and", "for",
"that", "this", "these", "those", "how", "about", "are", "there",
"give", "show", "find", "want", "need", "like", "any", "good",
"quick", "easy", "simple", "fast", "using", "use", "from", "into",
"more", "much", "just", "only", "my", "please", "could", "would",
"should", "something", "anything", "everything", "ideas", "idea",
"suggest", "meal", "food", "dish", "dishes", "today", "tonight",
"tomorrow", "now", "here", "there", "recipes", "recipe", "dinner",
"lunch", "breakfast", "snack", "under", "minutes", "hours", "time",
"left", "over", "also", "some", "make", "cook", "made", "cooked",
})
import re as _re
def _extract_ask_keywords(question: str) -> list[str]:
"""Extract food-relevant keywords from a natural language question."""
tokens = _re.findall(r"[a-zA-Z]+", question.lower())
return [t for t in tokens if len(t) > 3 and t not in _ASK_STOPWORDS]
def _ask_in_thread(db_path: Path, question: str, pantry_items: list[str]) -> AskResponse:
"""Run Ask logic in a worker thread.
Free tier: keyword extraction + FTS ingredient search.
Paid tier path: same search, then LLM synthesis over results.
The caller handles tier gating and LLM synthesis outside this thread
to avoid importing LLMRouter in a sync context.
"""
import json as _json
store = Store(db_path)
try:
keywords = _extract_ask_keywords(question)
ingredient_hits: list[dict] = []
if keywords:
ingredient_hits = store.search_recipes_by_ingredients(keywords, limit=15)
# Also search by title. browse_recipes q= does title LIKE %q%, so probe with
# the longest keyword from the question (most likely to appear in a title)
# rather than the full question text.
title_hits: list[dict] = []
title_probe = max(keywords, key=len) if keywords else None
if title_probe:
browse_result = store.browse_recipes(
keywords=None,
page=1,
page_size=12,
pantry_items=pantry_items or None,
q=title_probe,
sort="match" if pantry_items else "default",
)
title_hits = browse_result.get("recipes", [])
# Merge by ID; ingredient hits come first (more semantically relevant).
seen: set[int] = set()
merged: list[dict] = []
for row in ingredient_hits + title_hits:
rid = row.get("id")
if rid is not None and rid not in seen:
seen.add(rid)
merged.append(row)
# Compute pantry match_pct if caller sent pantry items.
pantry_set = {p.lower() for p in pantry_items} if pantry_items else set()
hits: list[AskRecipeHit] = []
for row in merged[:12]:
match_pct: float | None = None
if pantry_set:
raw_names = row.get("ingredient_names") or []
if isinstance(raw_names, str):
try:
raw_names = _json.loads(raw_names)
except Exception:
raw_names = []
if raw_names:
covered = sum(
1 for n in raw_names
if any(p in n.lower() for p in pantry_set)
)
match_pct = round(covered / len(raw_names), 2)
hits.append(AskRecipeHit(
id=row["id"],
title=row.get("title", ""),
category=row.get("category"),
match_pct=match_pct,
))
return AskResponse(answer=None, recipes=hits, tier="free")
finally:
store.close()
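The keyword filter and the pantry coverage score used by `_ask_in_thread` above can be tried in isolation. This is a sketch under stated assumptions: `STOPWORDS` is a tiny stand-in for `_ASK_STOPWORDS`, and `pantry_match_pct` is a hypothetical helper that pulls the inline computation out into a function.

```python
from __future__ import annotations

import re

# Tiny stand-in list; the diff's _ASK_STOPWORDS is much longer.
STOPWORDS = frozenset({"what", "make", "with", "have", "some", "recipe", "recipes"})


def extract_keywords(question: str) -> list[str]:
    """Keep alphabetic tokens longer than three letters that are not stopwords."""
    tokens = re.findall(r"[a-zA-Z]+", question.lower())
    return [t for t in tokens if len(t) > 3 and t not in STOPWORDS]


def pantry_match_pct(ingredient_names: list[str], pantry_items: list[str]) -> float | None:
    """Fraction of recipe ingredients whose name contains any pantry item."""
    pantry = {p.lower() for p in pantry_items}
    if not pantry or not ingredient_names:
        return None
    covered = sum(1 for n in ingredient_names if any(p in n.lower() for p in pantry))
    return round(covered / len(ingredient_names), 2)


print(extract_keywords("What can I make with chicken and rice?"))
# → ['chicken', 'rice']
print(pantry_match_pct(["chicken breast", "white rice", "soy sauce"], ["Chicken", "rice"]))
# → 0.67
```

Two properties follow directly from the logic: the `len(t) > 3` filter discards short ingredient words such as "egg" or "ham", and the substring test counts a pantry entry of "rice" as covering "rice vinegar".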
@router.post("/ask", response_model=AskResponse)
async def ask_recipes(
req: AskRequest,
session: CloudUser = Depends(get_session),
) -> AskResponse:
"""Natural-language recipe search with optional LLM synthesis.
Free tier: keyword extraction from question, then FTS ingredient + title search.
Paid tier / BYOK: same search, then LLM synthesizes a short conversational answer.
"""
result = await asyncio.to_thread(_ask_in_thread, session.db, req.question, req.pantry_items)
# LLM synthesis: only for paid/premium/ultra tiers, not "local" dev tier.
# Wrapped in wait_for so an unresponsive model degrades gracefully to recipe list only.
paid_tier = session.tier in ("paid", "premium", "ultra")
if (paid_tier or session.has_byok) and result.recipes:
recipe_titles = ", ".join(r.title for r in result.recipes[:6])
prompt = (
f'You are a helpful kitchen assistant. The user asked: "{req.question}"\n\n'
f"Matching recipes: {recipe_titles}\n\n"
f"Write a brief, friendly 1-2 sentence response suggesting which of these "
f"recipes might best fit the question. Be specific and natural."
)
try:
from circuitforge_core.llm.router import LLMRouter
answer = await asyncio.wait_for(
asyncio.to_thread(LLMRouter().complete, prompt),
timeout=8.0,
)
result = result.model_copy(update={"answer": answer.strip() or None, "tier": "paid"})
except (Exception, asyncio.TimeoutError) as exc:
log.warning("Ask LLM synthesis skipped: %s", exc)
return result
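The synthesis step in `ask_recipes` above wraps a blocking LLM call in `asyncio.to_thread` and `asyncio.wait_for`, so a slow or failing model degrades to the plain recipe list. A minimal sketch of that fallback follows, with a generic callable standing in for `LLMRouter().complete`. The helper name `synthesize_or_none` is an assumption for illustration.

```python
from __future__ import annotations

import asyncio


async def synthesize_or_none(blocking_fn, prompt: str, timeout: float = 8.0) -> str | None:
    """Run a blocking completion call in a thread; return None on timeout or failure."""
    try:
        answer = await asyncio.wait_for(asyncio.to_thread(blocking_fn, prompt), timeout=timeout)
    except Exception:
        # asyncio.TimeoutError is an Exception subclass, so timeouts land here too.
        return None
    # Treat a whitespace-only completion the same as no answer.
    return answer.strip() or None
```

A timeout here only abandons the wait. The worker thread keeps running the blocking call until it returns on its own, so the timeout bounds response latency rather than backend load.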
@router.get("/{recipe_id}") @router.get("/{recipe_id}")
async def get_recipe(recipe_id: int, session: CloudUser = Depends(get_session)) -> dict: async def get_recipe(recipe_id: int, session: CloudUser = Depends(get_session)) -> dict:
def _get(db_path: Path, rid: int) -> dict | None: def _get(db_path: Path, rid: int) -> dict | None:
@ -573,8 +762,28 @@ async def get_recipe(recipe_id: int, session: CloudUser = Depends(get_session))
except Exception: except Exception:
_directions_for_te = [] _directions_for_te = []
_ingredients_for_te = recipe.get("ingredients") or []
if isinstance(_ingredients_for_te, str):
import json as _json3
try:
_ingredients_for_te = _json3.loads(_ingredients_for_te)
except Exception:
_ingredients_for_te = []
_ingredient_names_for_te = recipe.get("ingredient_names") or []
if isinstance(_ingredient_names_for_te, str):
import json as _json4
try:
_ingredient_names_for_te = _json4.loads(_ingredient_names_for_te)
except Exception:
_ingredient_names_for_te = []
if _directions_for_te: if _directions_for_te:
_te = parse_time_effort(_directions_for_te) _te = parse_time_effort(
_directions_for_te,
ingredients=_ingredients_for_te,
ingredient_names=_ingredient_names_for_te,
)
_time_effort_out: dict | None = { _time_effort_out: dict | None = {
"active_min": _te.active_min, "active_min": _te.active_min,
"passive_min": _te.passive_min, "passive_min": _te.passive_min,
@ -582,7 +791,11 @@ async def get_recipe(recipe_id: int, session: CloudUser = Depends(get_session))
"effort_label": _te.effort_label, "effort_label": _te.effort_label,
"equipment": _te.equipment, "equipment": _te.equipment,
"step_analyses": [ "step_analyses": [
{"is_passive": sa.is_passive, "detected_minutes": sa.detected_minutes} {
"is_passive": sa.is_passive,
"detected_minutes": sa.detected_minutes,
"prep_min": sa.prep_min,
}
for sa in _te.step_analyses for sa in _te.step_analyses
], ],
} }
@ -608,3 +821,33 @@ async def get_recipe(recipe_id: int, session: CloudUser = Depends(get_session))
"estimated_time_min": None, "estimated_time_min": None,
"time_effort": _time_effort_out, "time_effort": _time_effort_out,
} }
@router.post("/{recipe_id}/leftovers", response_model=LeftoversResponse)
async def get_leftovers_shelf_life(
recipe_id: int,
session: CloudUser = Depends(get_session),
) -> LeftoversResponse:
"""Return cooked-leftover shelf-life estimate for a recipe.
Free tier: deterministic lookup (FDA/USDA table).
Deterministic path always runs; no tier gate needed.
"""
def _get(db_path: Path, rid: int) -> LeftoversResponse:
from app.services.leftovers_predictor import predict_leftovers_from_row
store = Store(db_path)
try:
recipe = store.get_recipe(rid)
finally:
store.close()
if recipe is None:
raise HTTPException(status_code=404, detail="Recipe not found.")
result = predict_leftovers_from_row(recipe)
return LeftoversResponse(
fridge_days=result.fridge_days,
freeze_days=result.freeze_days,
freeze_by_day=result.freeze_by_day,
storage_advice=result.storage_advice,
)
return await asyncio.to_thread(_get, session.db, recipe_id)
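Review note: the `get_recipe` hunk above decodes `ingredients` and `ingredient_names` with two copies of the same "JSON string or list, else empty" fallback. A minimal sketch of that pattern as a single helper follows. `coerce_json_list` is a hypothetical name that does not appear in this changeset, and the final list check is an added assumption rather than the committed behaviour.

```python
import json
from typing import Any


def coerce_json_list(value: Any) -> list:
    """Return value as a list, decoding a JSON string when needed.

    Hypothetical helper mirroring the inline fallback in get_recipe:
    a missing value or an undecodable string becomes an empty list.
    """
    if not value:
        return []
    if isinstance(value, str):
        try:
            value = json.loads(value)
        except Exception:
            return []
    # Assumption (not in the diff): non-list JSON such as an object is
    # treated as "no data" instead of being passed through.
    return value if isinstance(value, list) else []
```

With such a helper, both fields could be prepared as `coerce_json_list(recipe.get("ingredients"))` before the call to `parse_time_effort`.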


@ -5,6 +5,7 @@ import asyncio
from pathlib import Path from pathlib import Path
from fastapi import APIRouter, Depends, HTTPException from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel
from app.cloud_session import CloudUser, get_session from app.cloud_session import CloudUser, get_session
from app.db.store import Store from app.db.store import Store
@ -16,8 +17,13 @@ from app.models.schemas.saved_recipe import (
SaveRecipeRequest, SaveRecipeRequest,
UpdateSavedRecipeRequest, UpdateSavedRecipeRequest,
) )
from app.services.magpie_hook import fire_recipe_signal
from app.tiers import can_use from app.tiers import can_use
class StyleClassifyResponse(BaseModel):
suggested_tags: list[str]
router = APIRouter() router = APIRouter()
@ -35,7 +41,7 @@ def _to_summary(row: dict, store: Store) -> SavedRecipeSummary:
return SavedRecipeSummary( return SavedRecipeSummary(
id=row["id"], id=row["id"],
recipe_id=row["recipe_id"], recipe_id=row["recipe_id"],
title=row.get("title", ""), title=row.get("title") or "",
saved_at=row["saved_at"], saved_at=row["saved_at"],
notes=row.get("notes"), notes=row.get("notes"),
rating=row.get("rating"), rating=row.get("rating"),
@ -55,7 +61,9 @@ async def save_recipe(
row = store.save_recipe(req.recipe_id, req.notes, req.rating) row = store.save_recipe(req.recipe_id, req.notes, req.rating)
return _to_summary(row, store) return _to_summary(row, store)
return await asyncio.to_thread(_in_thread, session.db, _run) result = await asyncio.to_thread(_in_thread, session.db, _run)
asyncio.create_task(fire_recipe_signal(session.db, req.recipe_id, req.rating, []))
return result
@router.delete("/{recipe_id}", status_code=204) @router.delete("/{recipe_id}", status_code=204)
@ -82,7 +90,11 @@ async def update_saved_recipe(
) )
return _to_summary(row, store) return _to_summary(row, store)
return await asyncio.to_thread(_in_thread, session.db, _run) result = await asyncio.to_thread(_in_thread, session.db, _run)
asyncio.create_task(
fire_recipe_signal(session.db, recipe_id, req.rating, req.style_tags or [])
)
return result
@router.get("", response_model=list[SavedRecipeSummary]) @router.get("", response_model=list[SavedRecipeSummary])
@ -98,14 +110,37 @@ async def list_saved_recipes(
return await asyncio.to_thread(_in_thread, session.db, _run) return await asyncio.to_thread(_in_thread, session.db, _run)
# ── style classifier (Paid / BYOK) ───────────────────────────────────────────
@router.post("/{recipe_id}/classify-style", response_model=StyleClassifyResponse)
async def classify_style(
recipe_id: int,
session: CloudUser = Depends(get_session),
) -> StyleClassifyResponse:
if not can_use("style_classifier", session.tier, getattr(session, "has_byok", False)):
raise HTTPException(status_code=403, detail="Style classifier requires Paid tier or BYOK.")
def _run(store: Store) -> StyleClassifyResponse:
recipe = store.get_recipe(recipe_id)
if recipe is None:
raise HTTPException(status_code=404, detail="Recipe not found.")
from app.services.recipe.style_classifier import classify_style as _classify
tags = _classify(recipe)
return StyleClassifyResponse(suggested_tags=tags)
return await asyncio.to_thread(_in_thread, session.db, _run)
# ── collections (Paid) ──────────────────────────────────────────────────────── # ── collections (Paid) ────────────────────────────────────────────────────────
@router.get("/collections", response_model=list[CollectionSummary]) @router.get("/collections", response_model=list[CollectionSummary])
async def list_collections( async def list_collections(
session: CloudUser = Depends(get_session), session: CloudUser = Depends(get_session),
) -> list[CollectionSummary]: ) -> list[CollectionSummary]:
# Free users can list (they'll always have zero — creating requires Paid).
# Returning 403 here breaks savedStore.load() via Promise.all for non-Paid users.
if not can_use("recipe_collections", session.tier): if not can_use("recipe_collections", session.tier):
raise HTTPException(status_code=403, detail="Collections require Paid tier.") return []
rows = await asyncio.to_thread( rows = await asyncio.to_thread(
_in_thread, session.db, lambda s: s.get_collections() _in_thread, session.db, lambda s: s.get_collections()
) )


@ -1,6 +1,9 @@
from fastapi import APIRouter from fastapi import APIRouter
from app.api.endpoints import health, receipts, export, inventory, ocr, recipes, settings, staples, feedback, feedback_attach, household, saved_recipes, imitate, meal_plans, orch_usage, session, shopping from app.api.endpoints import health, receipts, export, inventory, ocr, recipes, settings, staples, feedback, feedback_attach, household, saved_recipes, imitate, meal_plans, orch_usage, session, shopping
from app.api.endpoints.community import router as community_router from app.api.endpoints.community import router as community_router
from app.api.endpoints.corrections import router as corrections_router
from app.api.endpoints.mastodon_oauth import router as mastodon_router
from app.api.endpoints.recipe_scan import router as recipe_scan_router
from app.api.endpoints.recipe_tags import router as recipe_tags_router from app.api.endpoints.recipe_tags import router as recipe_tags_router
api_router = APIRouter() api_router = APIRouter()
@ -12,6 +15,9 @@ api_router.include_router(ocr.router, prefix="/receipts", tags=
api_router.include_router(export.router, tags=["export"]) api_router.include_router(export.router, tags=["export"])
api_router.include_router(inventory.router, prefix="/inventory", tags=["inventory"]) api_router.include_router(inventory.router, prefix="/inventory", tags=["inventory"])
api_router.include_router(saved_recipes.router, prefix="/recipes/saved", tags=["saved-recipes"]) api_router.include_router(saved_recipes.router, prefix="/recipes/saved", tags=["saved-recipes"])
# recipe_scan_router registered BEFORE recipes.router so /recipes/scan and /recipes/user
# take priority over /recipes/{recipe_id} (which would otherwise match them as int IDs).
api_router.include_router(recipe_scan_router, prefix="/recipes", tags=["recipe-scan"])
api_router.include_router(recipes.router, prefix="/recipes", tags=["recipes"]) api_router.include_router(recipes.router, prefix="/recipes", tags=["recipes"])
api_router.include_router(settings.router, prefix="/settings", tags=["settings"]) api_router.include_router(settings.router, prefix="/settings", tags=["settings"])
api_router.include_router(staples.router, prefix="/staples", tags=["staples"]) api_router.include_router(staples.router, prefix="/staples", tags=["staples"])
@ -24,3 +30,5 @@ api_router.include_router(orch_usage.router, prefix="/orch-usage", tags=
api_router.include_router(shopping.router, prefix="/shopping", tags=["shopping"]) api_router.include_router(shopping.router, prefix="/shopping", tags=["shopping"])
api_router.include_router(community_router) api_router.include_router(community_router)
api_router.include_router(recipe_tags_router) api_router.include_router(recipe_tags_router)
api_router.include_router(corrections_router, prefix="/corrections", tags=["corrections"])
api_router.include_router(mastodon_router)


@ -1,11 +1,9 @@
"""Cloud session resolution for Kiwi FastAPI. """Cloud session resolution for Kiwi FastAPI.
Local mode (CLOUD_MODE unset/false): returns a local CloudUser with no auth Delegates JWT validation, Heimdall provisioning, tier resolution, and guest
checks, full tier access, and DB path pointing to settings.DB_PATH. session management to circuitforge_core.CloudSessionFactory. Kiwi-specific
CloudUser (per-user DB path, household data, BYOK flag) and DB helpers are
Cloud mode (CLOUD_MODE=true): validates the cf_session JWT injected by Caddy kept here.
as X-CF-Session, resolves user_id, auto-provisions a free Heimdall license on
first visit, fetches the tier, and returns a per-user DB path.
FastAPI usage: FastAPI usage:
@app.get("/api/v1/inventory/items") @app.get("/api/v1/inventory/items")
@ -17,16 +15,10 @@ from __future__ import annotations
import logging import logging
import os import os
import re
import time
from dataclasses import dataclass from dataclasses import dataclass
from pathlib import Path from pathlib import Path
import uuid from circuitforge_core.cloud_session import CloudSessionFactory as _CoreFactory, detect_byok
import jwt as pyjwt
import requests
import yaml
from fastapi import Depends, HTTPException, Request, Response from fastapi import Depends, HTTPException, Request, Response
log = logging.getLogger(__name__) log = logging.getLogger(__name__)
@ -35,54 +27,13 @@ log = logging.getLogger(__name__)
CLOUD_MODE: bool = os.environ.get("CLOUD_MODE", "").lower() in ("1", "true", "yes") CLOUD_MODE: bool = os.environ.get("CLOUD_MODE", "").lower() in ("1", "true", "yes")
CLOUD_DATA_ROOT: Path = Path(os.environ.get("CLOUD_DATA_ROOT", "/devl/kiwi-cloud-data")) CLOUD_DATA_ROOT: Path = Path(os.environ.get("CLOUD_DATA_ROOT", "/devl/kiwi-cloud-data"))
DIRECTUS_JWT_SECRET: str = os.environ.get("DIRECTUS_JWT_SECRET", "")
HEIMDALL_URL: str = os.environ.get("HEIMDALL_URL", "https://license.circuitforge.tech")
HEIMDALL_ADMIN_TOKEN: str = os.environ.get("HEIMDALL_ADMIN_TOKEN", "")
# Dev bypass: comma-separated IPs or CIDR ranges that skip JWT auth.
# NEVER set this in production. Intended only for LAN developer testing when
# the request doesn't pass through Caddy (which normally injects X-CF-Session).
# Example: CLOUD_AUTH_BYPASS_IPS=10.1.10.0/24,127.0.0.1
import ipaddress as _ipaddress
_BYPASS_RAW: list[str] = [
e.strip()
for e in os.environ.get("CLOUD_AUTH_BYPASS_IPS", "").split(",")
if e.strip()
]
_BYPASS_NETS: list[_ipaddress.IPv4Network | _ipaddress.IPv6Network] = []
_BYPASS_IPS: frozenset[str] = frozenset()
if _BYPASS_RAW:
_nets, _ips = [], set()
for entry in _BYPASS_RAW:
try:
_nets.append(_ipaddress.ip_network(entry, strict=False))
except ValueError:
_ips.add(entry) # treat non-parseable entries as bare IPs
_BYPASS_NETS = _nets
_BYPASS_IPS = frozenset(_ips)
def _is_bypass_ip(ip: str) -> bool:
if not ip:
return False
if ip in _BYPASS_IPS:
return True
try:
addr = _ipaddress.ip_address(ip)
return any(addr in net for net in _BYPASS_NETS)
except ValueError:
return False
_LOCAL_KIWI_DB: Path = Path(os.environ.get("KIWI_DB", "data/kiwi.db")) _LOCAL_KIWI_DB: Path = Path(os.environ.get("KIWI_DB", "data/kiwi.db"))
_TIER_CACHE: dict[str, tuple[dict, float]] = {}
_TIER_CACHE_TTL = 300 # 5 minutes
TIERS = ["free", "paid", "premium", "ultra"] TIERS = ["free", "paid", "premium", "ultra"]
_core = _CoreFactory(product="kiwi", byok_detector=detect_byok)
def _auth_label(user_id: str) -> str: def _auth_label(user_id: str) -> str:
"""Classify a user_id into a short tag for structured log lines. No PII emitted.""" """Classify a user_id into a short tag for structured log lines. No PII emitted."""
@ -106,73 +57,7 @@ class CloudUser:
license_key: str | None = None # key_display for lifetime/founders keys; None for subscription/free license_key: str | None = None # key_display for lifetime/founders keys; None for subscription/free
# ── JWT validation ───────────────────────────────────────────────────────────── # ── DB path helpers ───────────────────────────────────────────────────────────
def _extract_session_token(header_value: str) -> str:
m = re.search(r'(?:^|;)\s*cf_session=([^;]+)', header_value)
return m.group(1).strip() if m else header_value.strip()
def validate_session_jwt(token: str) -> str:
"""Validate cf_session JWT and return the Directus user_id."""
try:
payload = pyjwt.decode(
token,
DIRECTUS_JWT_SECRET,
algorithms=["HS256"],
options={"require": ["id", "exp"]},
)
return payload["id"]
except Exception as exc:
log.debug("JWT validation failed: %s", exc)
raise HTTPException(status_code=401, detail="Session invalid or expired")
# ── Heimdall integration ──────────────────────────────────────────────────────
def _ensure_provisioned(user_id: str) -> None:
if not HEIMDALL_ADMIN_TOKEN:
return
try:
requests.post(
f"{HEIMDALL_URL}/admin/provision",
json={"directus_user_id": user_id, "product": "kiwi", "tier": "free"},
headers={"Authorization": f"Bearer {HEIMDALL_ADMIN_TOKEN}"},
timeout=5,
)
except Exception as exc:
log.warning("Heimdall provision failed for user %s: %s", user_id, exc)
def _fetch_cloud_tier(user_id: str) -> tuple[str, str | None, bool, str | None]:
"""Returns (tier, household_id | None, is_household_owner, license_key | None)."""
now = time.monotonic()
cached = _TIER_CACHE.get(user_id)
if cached and (now - cached[1]) < _TIER_CACHE_TTL:
entry = cached[0]
return entry["tier"], entry.get("household_id"), entry.get("is_household_owner", False), entry.get("license_key")
if not HEIMDALL_ADMIN_TOKEN:
return "free", None, False, None
try:
resp = requests.post(
f"{HEIMDALL_URL}/admin/cloud/resolve",
json={"directus_user_id": user_id, "product": "kiwi"},
headers={"Authorization": f"Bearer {HEIMDALL_ADMIN_TOKEN}"},
timeout=5,
)
data = resp.json() if resp.ok else {}
tier = data.get("tier", "free")
household_id = data.get("household_id")
is_owner = data.get("is_household_owner", False)
license_key = data.get("key_display")
except Exception as exc:
log.warning("Heimdall tier resolve failed for user %s: %s", user_id, exc)
tier, household_id, is_owner, license_key = "free", None, False, None
_TIER_CACHE[user_id] = ({"tier": tier, "household_id": household_id, "is_household_owner": is_owner, "license_key": license_key}, now)
return tier, household_id, is_owner, license_key
def _user_db_path(user_id: str, household_id: str | None = None) -> Path: def _user_db_path(user_id: str, household_id: str | None = None) -> Path:
if household_id: if household_id:
@ -194,112 +79,45 @@ def _anon_guest_db_path(guest_id: str) -> Path:
return path return path
# ── BYOK detection ────────────────────────────────────────────────────────────
_LLM_CONFIG_PATH = Path.home() / ".config" / "circuitforge" / "llm.yaml"
def _detect_byok(config_path: Path = _LLM_CONFIG_PATH) -> bool:
"""Return True if at least one enabled non-vision LLM backend is configured.
Reads the same llm.yaml that LLMRouter uses. Local (Ollama, vLLM) and
API-key backends both count: the policy is "user is supplying compute",
regardless of where that compute lives.
"""
try:
with open(config_path) as f:
cfg = yaml.safe_load(f) or {}
return any(
b.get("enabled", True) and b.get("type") != "vision_service"
for b in cfg.get("backends", {}).values()
)
except Exception:
return False
# ── FastAPI dependency ──────────────────────────────────────────────────────── # ── FastAPI dependency ────────────────────────────────────────────────────────
_GUEST_COOKIE = "kiwi_guest_id"
_GUEST_COOKIE_MAX_AGE = 60 * 60 * 24 * 90 # 90 days
def _resolve_guest_session(request: Request, response: Response, has_byok: bool) -> CloudUser:
"""Return a per-session anonymous CloudUser, creating a guest UUID cookie if needed."""
guest_id = request.cookies.get(_GUEST_COOKIE, "").strip()
is_new = not guest_id
if is_new:
guest_id = str(uuid.uuid4())
log.debug("New guest session assigned: anon-%s", guest_id[:8])
# Secure flag only when the request actually arrived over HTTPS
# (Caddy sets X-Forwarded-Proto=https in cloud; absent on direct port access).
# Avoids losing the session cookie on HTTP direct-port testing of the cloud stack.
is_https = request.headers.get("x-forwarded-proto", "http").lower() == "https"
response.set_cookie(
key=_GUEST_COOKIE,
value=guest_id,
max_age=_GUEST_COOKIE_MAX_AGE,
httponly=True,
samesite="lax",
secure=is_https,
)
return CloudUser(
user_id=f"anon-{guest_id}",
tier="free",
db=_anon_guest_db_path(guest_id),
has_byok=has_byok,
)
def get_session(request: Request, response: Response) -> CloudUser: def get_session(request: Request, response: Response) -> CloudUser:
"""FastAPI dependency — resolves the current user from the request. """FastAPI dependency — resolves the current user from the request.
Delegates auth/tier resolution to cf-core CloudSessionFactory, then maps
the result to Kiwi's CloudUser with per-user DB path and household data.
Local mode: fully-privileged "local" user pointing at local DB. Local mode: fully-privileged "local" user pointing at local DB.
Cloud mode: validates X-CF-Session JWT, provisions license, resolves tier. Cloud mode: validates X-CF-Session JWT, provisions license, resolves tier.
Dev bypass: if CLOUD_AUTH_BYPASS_IPS is set and the client IP matches, Dev bypass: CLOUD_AUTH_BYPASS_IPS match returns a "local-dev" session.
returns a "local" session without JWT validation (dev/LAN use only). Anonymous: per-session UUID cookie (cf_guest_id) isolates each guest's data.
Anonymous: per-session UUID cookie isolates each guest visitor's data.
""" """
has_byok = _detect_byok() core_user = _core.resolve(request, response)
uid, tier, has_byok = core_user.user_id, core_user.tier, core_user.has_byok
if not CLOUD_MODE: if not CLOUD_MODE or uid in ("local", "local-dev"):
return CloudUser(user_id="local", tier="local", db=_LOCAL_KIWI_DB, has_byok=has_byok) # local-dev gets a writable path under CLOUD_DATA_ROOT; local uses KIWI_DB
db = _user_db_path(uid) if uid == "local-dev" else _LOCAL_KIWI_DB
return CloudUser(user_id=uid, tier=tier, db=db, has_byok=has_byok)
# Prefer X-Real-IP (set by Caddy from the actual client address) over the if uid.startswith("anon-"):
# TCP peer address (which is nginx's container IP when behind the proxy). guest_id = uid[len("anon-"):]
client_ip = ( return CloudUser(
request.headers.get("x-real-ip", "") user_id=uid, tier=tier,
or (request.client.host if request.client else "") db=_anon_guest_db_path(guest_id),
) has_byok=has_byok,
if (_BYPASS_IPS or _BYPASS_NETS) and _is_bypass_ip(client_ip): )
log.debug("CLOUD_AUTH_BYPASS_IPS match for %s — returning local session", client_ip)
# Use a dev DB under CLOUD_DATA_ROOT so the container has a writable path.
dev_db = _user_db_path("local-dev")
return CloudUser(user_id="local-dev", tier="local", db=dev_db, has_byok=has_byok)
# Resolve cf_session JWT: prefer the explicit header injected by Caddy, then household_id = core_user.meta.get("household_id")
# fall back to the cf_session cookie value. Other cookies (e.g. kiwi_guest_id) is_owner = core_user.meta.get("is_household_owner", False)
# must never be treated as auth tokens. license_key = core_user.meta.get("license_key")
raw_session = request.headers.get("x-cf-session", "").strip() log.debug("Resolved %s session uid=%s tier=%s household=%s", _auth_label(uid), uid[:8], tier, household_id)
if not raw_session:
raw_session = request.cookies.get("cf_session", "").strip()
if not raw_session:
return _resolve_guest_session(request, response, has_byok)
token = _extract_session_token(raw_session) # gitleaks:allow — function name, not a secret
if not token:
return _resolve_guest_session(request, response, has_byok)
user_id = validate_session_jwt(token)
_ensure_provisioned(user_id)
tier, household_id, is_household_owner, license_key = _fetch_cloud_tier(user_id)
return CloudUser( return CloudUser(
user_id=user_id, user_id=uid, tier=tier,
tier=tier, db=_user_db_path(uid, household_id=household_id),
db=_user_db_path(user_id, household_id=household_id),
has_byok=has_byok, has_byok=has_byok,
household_id=household_id, household_id=household_id,
is_household_owner=is_household_owner, is_household_owner=is_owner,
license_key=license_key, license_key=license_key,
) )
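Review note: after the refactor, `get_session` only decides which kind of database path a resolved user gets. The branch order can be summarised as a small pure function. `session_kind` is a hypothetical name used for illustration, and the returned labels stand for the DB helpers (`_LOCAL_KIWI_DB`, `_user_db_path`, `_anon_guest_db_path`) whose path layouts are not shown in this diff.

```python
def session_kind(uid: str, cloud_mode: bool) -> str:
    """Classify a resolved user id the way the new get_session branches do."""
    if not cloud_mode or uid in ("local", "local-dev"):
        # local-dev gets a writable per-user path; everything else on this
        # branch uses the single local database.
        return "dev" if uid == "local-dev" else "local"
    if uid.startswith("anon-"):
        return "guest"  # per-guest database keyed by the cookie UUID
    return "user"  # authenticated user, optionally scoped to a household
```

One consequence worth noting: with `CLOUD_MODE` unset, every uid that is not `local-dev` falls through to the local database, including an `anon-` id.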


@ -43,6 +43,10 @@ class Settings:
os.environ.get("BROWSE_COUNTS_PATH", str(DATA_DIR / "browse_counts.db")) os.environ.get("BROWSE_COUNTS_PATH", str(DATA_DIR / "browse_counts.db"))
) )
# Magpie data flywheel — ingest endpoint for anonymized recipe signals
# Set MAGPIE_INGEST_URL to enable; leave unset (or None) to disable silently.
MAGPIE_INGEST_URL: str | None = os.environ.get("MAGPIE_INGEST_URL") or None
# Community feature settings # Community feature settings
COMMUNITY_DB_URL: str | None = os.environ.get("COMMUNITY_DB_URL") or None COMMUNITY_DB_URL: str | None = os.environ.get("COMMUNITY_DB_URL") or None
COMMUNITY_PSEUDONYM_SALT: str = os.environ.get( COMMUNITY_PSEUDONYM_SALT: str = os.environ.get(
@ -61,9 +65,24 @@ class Settings:
# Quality # Quality
MIN_QUALITY_SCORE: float = float(os.environ.get("MIN_QUALITY_SCORE", "50.0")) MIN_QUALITY_SCORE: float = float(os.environ.get("MIN_QUALITY_SCORE", "50.0"))
# CF-core resource coordinator (VRAM lease management) # CF-core resource coordinator (VRAM lease management — lease broker, not inference)
COORDINATOR_URL: str = os.environ.get("COORDINATOR_URL", "http://localhost:7700") COORDINATOR_URL: str = os.environ.get("COORDINATOR_URL", "http://localhost:7700")
# GPU inference server URL
# Priority: GPU_SERVER_URL env var → CF_ORCH_URL env var (backward compat)
# → https://orch.circuitforge.tech when CF_LICENSE_KEY is present (Paid+)
# Resolved value is written back to os.environ["CF_ORCH_URL"] at startup so
# all service-layer callers that read CF_ORCH_URL directly see the right URL.
GPU_SERVER_URL: str | None = (
os.environ.get("GPU_SERVER_URL")
or os.environ.get("CF_ORCH_URL")
or (
"https://orch.circuitforge.tech"
if os.environ.get("CF_LICENSE_KEY")
else None
)
)
# Hosted cf-orch coordinator — bearer token for managed cloud GPU inference (Paid+) # Hosted cf-orch coordinator — bearer token for managed cloud GPU inference (Paid+)
# CFOrchClient reads CF_LICENSE_KEY automatically; exposed here for startup validation. # CFOrchClient reads CF_LICENSE_KEY automatically; exposed here for startup validation.
CF_LICENSE_KEY: str | None = os.environ.get("CF_LICENSE_KEY") CF_LICENSE_KEY: str | None = os.environ.get("CF_LICENSE_KEY")
@ -72,6 +91,17 @@ class Settings:
# runs don't pollute session counts. Set to the Directus UUID of the test user. # runs don't pollute session counts. Set to the Directus UUID of the test user.
E2E_TEST_USER_ID: str | None = os.environ.get("E2E_TEST_USER_ID") or None E2E_TEST_USER_ID: str | None = os.environ.get("E2E_TEST_USER_ID") or None
# ActivityPub federation (optional; disabled by default)
AP_ENABLED: bool = os.environ.get("AP_ENABLED", "false").lower() in ("1", "true", "yes")
AP_HOST: str = os.environ.get("AP_HOST", "") # e.g. kiwi.circuitforge.tech
CLOUD_DATA_ROOT: Path = Path(os.environ.get("CLOUD_DATA_ROOT", "/devl/kiwi-cloud-data"))
AP_KEY_PATH: Path = Path(
os.environ.get("AP_KEY_PATH", str(CLOUD_DATA_ROOT / "ap_keys" / "instance.pem"))
)
# Fernet key for Mastodon access token encryption (base64-urlsafe, 32 bytes)
# Leave unset to skip encryption (dev only)
AP_TOKEN_ENCRYPTION_KEY: str | None = os.environ.get("AP_TOKEN_ENCRYPTION_KEY") or None
# Feature flags # Feature flags
ENABLE_OCR: bool = os.environ.get("ENABLE_OCR", "false").lower() in ("1", "true", "yes") ENABLE_OCR: bool = os.environ.get("ENABLE_OCR", "false").lower() in ("1", "true", "yes")
# Use OrchestratedScheduler (coordinator-aware, multi-GPU fan-out) instead of # Use OrchestratedScheduler (coordinator-aware, multi-GPU fan-out) instead of
@ -93,3 +123,9 @@ class Settings:
settings = Settings() settings = Settings()
# Normalise GPU_SERVER_URL into CF_ORCH_URL so every service-layer caller that
# reads os.environ.get("CF_ORCH_URL") sees the resolved value, including the
# Paid+ cloud default injected above.
if settings.GPU_SERVER_URL:
os.environ["CF_ORCH_URL"] = settings.GPU_SERVER_URL
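Review note: the `GPU_SERVER_URL` fallback chain is easier to test as a pure function over an environment mapping. The sketch below restates the priority order from the `Settings` hunk. `resolve_gpu_server_url` and `HOSTED_ORCH_URL` are hypothetical names introduced for illustration.

```python
from typing import Mapping, Optional

HOSTED_ORCH_URL = "https://orch.circuitforge.tech"


def resolve_gpu_server_url(env: Mapping[str, str]) -> Optional[str]:
    """Apply the priority order: GPU_SERVER_URL, then CF_ORCH_URL, then
    the hosted default when a license key is present, else None."""
    return (
        env.get("GPU_SERVER_URL")
        or env.get("CF_ORCH_URL")
        or (HOSTED_ORCH_URL if env.get("CF_LICENSE_KEY") else None)
    )
```

Because the chain uses `or`, an empty-string `GPU_SERVER_URL` is treated the same as an unset one and falls through to the next source.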


@ -0,0 +1,31 @@
-- Migration 039: Drop FK constraint on saved_recipes.recipe_id.
--
-- In cloud mode the recipe corpus is ATTACHed as a separate database.
-- SQLite FK constraints only resolve against the `main` schema, so
-- `REFERENCES recipes(id)` was always failing for cloud saves (the
-- main.recipes table is empty; all data lives in corpus.recipes).
-- The corpus is read-only and never modified by the app, so cascade-on-delete
-- is meaningless anyway. Remove the constraint without changing any data.
PRAGMA foreign_keys = OFF;
CREATE TABLE saved_recipes_new (
id INTEGER PRIMARY KEY AUTOINCREMENT,
recipe_id INTEGER NOT NULL,
saved_at TEXT NOT NULL DEFAULT (datetime('now')),
notes TEXT,
rating INTEGER CHECK (rating IS NULL OR (rating >= 0 AND rating <= 5)),
style_tags TEXT NOT NULL DEFAULT '[]',
UNIQUE (recipe_id)
);
INSERT INTO saved_recipes_new SELECT * FROM saved_recipes;
DROP TABLE saved_recipes;
ALTER TABLE saved_recipes_new RENAME TO saved_recipes;
CREATE INDEX IF NOT EXISTS idx_saved_recipes_saved_at ON saved_recipes (saved_at DESC);
CREATE INDEX IF NOT EXISTS idx_saved_recipes_rating ON saved_recipes (rating);
PRAGMA foreign_keys = ON;
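Review note: two details of migration 039 are worth checking. `PRAGMA foreign_keys` is a no-op inside an open transaction, so the migration runner must not wrap the pragma in one, and `INSERT ... SELECT *` depends on the old table's column order matching the new one. The sketch below replays the same table rebuild from Python with an explicit column list. The pre-migration schema in `make_demo_db` is an assumption (the original `saved_recipes` definition is not in this diff), and both function names are hypothetical.

```python
import sqlite3

REBUILD_SQL = """
BEGIN;
CREATE TABLE saved_recipes_new (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    recipe_id  INTEGER NOT NULL,
    saved_at   TEXT    NOT NULL DEFAULT (datetime('now')),
    notes      TEXT,
    rating     INTEGER CHECK (rating IS NULL OR (rating >= 0 AND rating <= 5)),
    style_tags TEXT    NOT NULL DEFAULT '[]',
    UNIQUE (recipe_id)
);
-- Explicit columns: does not depend on the old table's column order.
INSERT INTO saved_recipes_new (id, recipe_id, saved_at, notes, rating, style_tags)
    SELECT id, recipe_id, saved_at, notes, rating, style_tags FROM saved_recipes;
DROP TABLE saved_recipes;
ALTER TABLE saved_recipes_new RENAME TO saved_recipes;
COMMIT;
"""


def rebuild_saved_recipes(conn: sqlite3.Connection) -> None:
    conn.commit()  # the pragma below is ignored while a transaction is open
    conn.execute("PRAGMA foreign_keys = OFF")
    conn.executescript(REBUILD_SQL)
    conn.execute("PRAGMA foreign_keys = ON")


def make_demo_db() -> sqlite3.Connection:
    """Assumed pre-migration shape: saved_recipes with an FK to recipes(id)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE recipes (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE saved_recipes (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            recipe_id INTEGER NOT NULL REFERENCES recipes(id) ON DELETE CASCADE,
            saved_at TEXT NOT NULL DEFAULT (datetime('now')),
            notes TEXT,
            rating INTEGER,
            style_tags TEXT NOT NULL DEFAULT '[]',
            UNIQUE (recipe_id)
        );
        INSERT INTO recipes (id, title) VALUES (1, 'Soup');
        INSERT INTO saved_recipes (recipe_id, notes) VALUES (1, 'good');
    """)
    return conn
```

After `rebuild_saved_recipes(make_demo_db())`, a save for a `recipe_id` that has no row in `main.recipes` is accepted even with foreign keys enabled, which is the behaviour the migration is after for cloud saves.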


@ -0,0 +1,21 @@
-- 040_corrections.sql — corrections table for SFT training data
-- Schema from circuitforge_core.api.corrections.CORRECTIONS_MIGRATION_SQL
CREATE TABLE IF NOT EXISTS corrections (
id INTEGER PRIMARY KEY AUTOINCREMENT,
item_id TEXT NOT NULL DEFAULT '',
product TEXT NOT NULL,
correction_type TEXT NOT NULL,
input_text TEXT NOT NULL,
original_output TEXT NOT NULL,
corrected_output TEXT NOT NULL DEFAULT '',
rating TEXT NOT NULL DEFAULT 'down',
context TEXT NOT NULL DEFAULT '{}',
opted_in INTEGER NOT NULL DEFAULT 0,
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_corrections_product
ON corrections (product);
CREATE INDEX IF NOT EXISTS idx_corrections_opted_in
ON corrections (opted_in);


@ -0,0 +1,23 @@
-- Migration 041: user_recipes table for user-scanned and manually-entered recipes.
--
-- Separate from the food.com corpus (recipes table) -- user recipes are personal,
-- not curated, and need different fields (servings as string, cook_time as string).
CREATE TABLE IF NOT EXISTS user_recipes (
id INTEGER PRIMARY KEY AUTOINCREMENT,
title TEXT NOT NULL,
subtitle TEXT,
servings TEXT, -- kept as string: "2", "4-6", "serves 8"
cook_time TEXT, -- kept as string: "25 min", "1 hour"
source_note TEXT, -- e.g. "Purple Carrot", "Betty Crocker"
ingredients TEXT NOT NULL DEFAULT '[]', -- JSON: [{name, qty, unit, raw}]
steps TEXT NOT NULL DEFAULT '[]', -- JSON: ["step 1", "step 2", ...]
notes TEXT,
tags TEXT DEFAULT '[]', -- JSON: ["vegan", "quick"]
source TEXT NOT NULL DEFAULT 'manual', -- 'scan' | 'manual'
pantry_match_pct INTEGER, -- 0-100, computed at scan time; null for manual
created_at TEXT NOT NULL DEFAULT (datetime('now')),
updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_user_recipes_created ON user_recipes (created_at DESC);


@ -0,0 +1,47 @@
-- 042_activitypub.sql
-- ActivityPub federation tables: follower registry, delivery log, dedup, Mastodon tokens.
-- Follower registry: AP actors that Follow this Kiwi instance
CREATE TABLE IF NOT EXISTS ap_followers (
id INTEGER PRIMARY KEY,
actor_id TEXT NOT NULL UNIQUE, -- AP actor URL
inbox_url TEXT NOT NULL,
shared_inbox TEXT,
followed_at TEXT NOT NULL DEFAULT (datetime('now')),
active INTEGER NOT NULL DEFAULT 1
);
CREATE INDEX IF NOT EXISTS idx_ap_followers_active
ON ap_followers (active) WHERE active = 1;
-- Outgoing delivery log: one row per (post_slug, target_inbox) attempt
CREATE TABLE IF NOT EXISTS ap_deliveries (
id INTEGER PRIMARY KEY,
post_slug TEXT NOT NULL,
target_inbox TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'pending', -- pending | delivered | failed
attempts INTEGER NOT NULL DEFAULT 0,
last_error TEXT,
created_at TEXT NOT NULL DEFAULT (datetime('now')),
delivered_at TEXT
);
CREATE INDEX IF NOT EXISTS idx_ap_deliveries_status
ON ap_deliveries (status) WHERE status != 'delivered';
-- Incoming activity dedup: prevents replay attacks and double-processing
CREATE TABLE IF NOT EXISTS ap_received (
activity_id TEXT PRIMARY KEY,
received_at TEXT NOT NULL DEFAULT (datetime('now'))
);
-- Mastodon OAuth tokens: per-user, encrypted at rest
-- Stored in the user's local kiwi.db (CLOUD_MODE: per-user DB tree)
CREATE TABLE IF NOT EXISTS mastodon_tokens (
id INTEGER PRIMARY KEY,
directus_user_id TEXT NOT NULL UNIQUE,
instance_url TEXT NOT NULL,
access_token TEXT NOT NULL, -- Fernet-encrypted when AP_TOKEN_ENCRYPTION_KEY set
created_at TEXT NOT NULL DEFAULT (datetime('now')),
updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
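Review note: the `ap_received` table makes replay detection a single statement, because `activity_id` is the primary key. The sketch below shows one way an inbox handler could use it. `record_activity` is a hypothetical function, and the actual inbox code is not part of this diff.

```python
import sqlite3

AP_RECEIVED_DDL = """
CREATE TABLE IF NOT EXISTS ap_received (
    activity_id TEXT PRIMARY KEY,
    received_at TEXT NOT NULL DEFAULT (datetime('now'))
);
"""


def record_activity(conn: sqlite3.Connection, activity_id: str) -> bool:
    """Return True the first time an activity id is seen, False on a replay."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO ap_received (activity_id) VALUES (?)",
        (activity_id,),
    )
    conn.commit()
    # rowcount is 1 when the row was inserted and 0 when the primary-key
    # conflict caused the insert to be ignored.
    return cur.rowcount == 1


def make_inbox_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(AP_RECEIVED_DDL)
    return conn
```

A handler would process the activity only when `record_activity` returns `True`, so a redelivered or replayed activity is dropped without a separate lookup.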


@ -6,6 +6,8 @@ Cloud mode: opens a Store at the per-user DB path from the CloudUser session.
""" """
from __future__ import annotations from __future__ import annotations
import sqlite3
from collections.abc import Iterator
from typing import Generator from typing import Generator
from fastapi import Depends from fastapi import Depends
@ -21,3 +23,16 @@ def get_store(session: CloudUser = Depends(get_session)) -> Generator[Store, Non
yield store yield store
finally: finally:
store.close() store.close()
def get_db(session: CloudUser = Depends(get_session)) -> Iterator[sqlite3.Connection]:
"""FastAPI dependency — yields the raw sqlite3.Connection for the current user.
Used by make_corrections_router() from circuitforge-core, which expects a
dependency that yields a sqlite3.Connection directly.
"""
store = Store(session.db)
try:
yield store.conn
finally:
store.close()


@ -61,6 +61,8 @@ class Store:
"style_tags", "style_tags",
# meal plan columns # meal plan columns
"meal_types", "meal_types",
# user_recipes columns
"steps", "tags",
# captured_products columns # captured_products columns
"allergens"): "allergens"):
if key in d and isinstance(d[key], str): if key in d and isinstance(d[key], str):
@ -1129,6 +1131,19 @@ class Store:
phrases = ['"' + kw.replace('"', '""') + '"' for kw in keywords] phrases = ['"' + kw.replace('"', '""') + '"' for kw in keywords]
return " OR ".join(phrases) return " OR ".join(phrases)
@staticmethod
def _ingredient_fts_term(ingredient: str) -> str:
"""Build an FTS5 ingredient_names column prefix-filter.
Returns e.g. 'ingredient_names : "potato"*' which matches any recipe whose
ingredient_names column contains a token starting with that word. Prefix
matching (*) means "potato" also matches "potatoes", "sweet potato", etc.
Apostrophes are stripped because the FTS5 tokenizer drops them.
"""
cleaned = ingredient.replace("'", "").strip()
escaped = cleaned.replace('"', '""')
return f'ingredient_names : "{escaped}"*'
def _count_recipes_for_keywords(self, keywords: list[str]) -> int: def _count_recipes_for_keywords(self, keywords: list[str]) -> int:
if not keywords: if not keywords:
return 0 return 0
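Review note: the `_ingredient_fts_term` helper added in this hunk builds an FTS5 column filter with a prefix query. The sketch below runs the same term against a toy index to show what it does and does not match. The two-column table definition is an assumption (the real `recipe_browser_fts` schema is not in this diff), `demo_matches` is hypothetical, and the example needs an SQLite build with FTS5 enabled.

```python
import sqlite3


def ingredient_fts_term(ingredient: str) -> str:
    # Same steps as the helper in the diff: drop apostrophes, escape
    # double quotes, then wrap as a column-scoped prefix query.
    cleaned = ingredient.replace("'", "").strip()
    escaped = cleaned.replace('"', '""')
    return f'ingredient_names : "{escaped}"*'


def demo_matches(term: str) -> list[int]:
    conn = sqlite3.connect(":memory:")
    # Assumed shape of the index; only the column name comes from the diff.
    conn.execute(
        "CREATE VIRTUAL TABLE recipe_browser_fts USING fts5(title, ingredient_names)"
    )
    conn.executemany(
        "INSERT INTO recipe_browser_fts (rowid, title, ingredient_names) VALUES (?, ?, ?)",
        [
            (1, "Mash", "potatoes butter milk"),
            (2, "Pie", "sweet potato sugar"),
            (3, "Potato-free salad", "lettuce tomato"),
        ],
    )
    rows = conn.execute(
        "SELECT rowid FROM recipe_browser_fts WHERE recipe_browser_fts MATCH ?",
        (ingredient_fts_term(term),),
    ).fetchall()
    return sorted(row[0] for row in rows)
```

In this toy index `potato` matches rows 1 and 2 but not row 3, where the word appears only in the title. A fragment such as `tato` matches nothing, because the query is a token-prefix match and not a substring search.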
@ -1157,6 +1172,7 @@ class Store:
q: str | None = None, q: str | None = None,
sort: str = "default", sort: str = "default",
sensory_exclude: SensoryExclude | None = None, sensory_exclude: SensoryExclude | None = None,
required_ingredient: str | None = None,
) -> dict: ) -> dict:
"""Return a page of recipes matching the keyword set. """Return a page of recipes matching the keyword set.
@ -1165,9 +1181,11 @@ class Store:
is provided. match_pct is the fraction of ingredient_names covered by is provided. match_pct is the fraction of ingredient_names covered by
the pantry set, computed deterministically, no LLM needed. the pantry set, computed deterministically, no LLM needed.
q: optional title substring filter (case-insensitive LIKE). q: optional title substring filter (case-insensitive LIKE).
sort: "default" (corpus order) | "alpha" (A-Z) | "alpha_desc" (Z-A) sort: "default" (corpus order) | "alpha" (A-Z) | "alpha_desc" (Z-A)
| "match" (pantry coverage DESC; falls back to default when no pantry). | "match" (pantry coverage DESC; falls back to default when no pantry).
required_ingredient: when set, only return recipes whose ingredient_names contain
a token starting with this word (FTS5 prefix match, case-insensitive). "must include" filter.
""" """
if keywords is not None and not keywords: if keywords is not None and not keywords:
return {"recipes": [], "total": 0, "page": page} return {"recipes": [], "total": 0, "page": page}
@ -1186,20 +1204,48 @@ class Store:
q_param = f"%{q.strip()}%" if q and q.strip() else None q_param = f"%{q.strip()}%" if q and q.strip() else None
# ── required-ingredient FTS filter (must-include) ─────────────────────
# FTS5 column prefix-filter avoids the full table scan that LIKE '%X%' would do.
req_fts_term = (
self._ingredient_fts_term(required_ingredient) if required_ingredient else ""
)
# ── match sort: push match_pct computation into SQL so ORDER BY works ── # ── match sort: push match_pct computation into SQL so ORDER BY works ──
if effective_sort == "match" and pantry_set: if effective_sort == "match" and pantry_set:
return self._browse_by_match( return self._browse_by_match(
keywords, page, page_size, offset, pantry_set, q_param, c, keywords, page, page_size, offset, pantry_set, q_param, c,
sensory_exclude=sensory_exclude, sensory_exclude=sensory_exclude,
required_ingredient=required_ingredient,
) )
cols = ( cols = (
f"SELECT id, title, category, keywords, ingredient_names," f"SELECT id, title, category, keywords, ingredient_names,"
f" calories, fat_g, protein_g, sodium_mg, directions, sensory_tags FROM {c}recipes" f" calories, fat_g, protein_g, sodium_mg, directions, sensory_tags FROM {c}recipes"
) )
fts_sub = f"id IN (SELECT rowid FROM {c}recipe_browser_fts WHERE recipe_browser_fts MATCH ?)"
if keywords is None: if keywords is None:
if q_param: if req_fts_term:
# Ingredient filter: use FTS index — much faster than LIKE on full table
if q_param:
total = self.conn.execute(
f"SELECT COUNT(*) FROM {c}recipes WHERE {fts_sub} AND LOWER(title) LIKE LOWER(?)",
(req_fts_term, q_param),
).fetchone()[0]
rows = self._fetch_all(
f"{cols} WHERE {fts_sub} AND LOWER(title) LIKE LOWER(?) {order_clause} LIMIT ? OFFSET ?",
(req_fts_term, q_param, page_size, offset),
)
else:
total = self.conn.execute(
f"SELECT COUNT(*) FROM {c}recipes WHERE {fts_sub}",
(req_fts_term,),
).fetchone()[0]
rows = self._fetch_all(
f"{cols} WHERE {fts_sub} {order_clause} LIMIT ? OFFSET ?",
(req_fts_term, page_size, offset),
)
elif q_param:
total = self.conn.execute( total = self.conn.execute(
f"SELECT COUNT(*) FROM {c}recipes WHERE LOWER(title) LIKE LOWER(?)", f"SELECT COUNT(*) FROM {c}recipes WHERE LOWER(title) LIKE LOWER(?)",
(q_param,), (q_param,),
@ -1215,23 +1261,32 @@ class Store:
(page_size, offset), (page_size, offset),
) )
else: else:
match_expr = self._browser_fts_query(keywords) keywords_expr = self._browser_fts_query(keywords)
fts_sub = f"id IN (SELECT rowid FROM {c}recipe_browser_fts WHERE recipe_browser_fts MATCH ?)" # Combine keywords + ingredient into one FTS MATCH to use a single index pass
combined_match = (
f"({keywords_expr}) AND {req_fts_term}" if req_fts_term else keywords_expr
)
if q_param: if q_param:
total = self.conn.execute( total = self.conn.execute(
f"SELECT COUNT(*) FROM {c}recipes WHERE {fts_sub} AND LOWER(title) LIKE LOWER(?)", f"SELECT COUNT(*) FROM {c}recipes WHERE {fts_sub} AND LOWER(title) LIKE LOWER(?)",
(match_expr, q_param), (combined_match, q_param),
).fetchone()[0] ).fetchone()[0]
rows = self._fetch_all( rows = self._fetch_all(
f"{cols} WHERE {fts_sub} AND LOWER(title) LIKE LOWER(?) {order_clause} LIMIT ? OFFSET ?", f"{cols} WHERE {fts_sub} AND LOWER(title) LIKE LOWER(?) {order_clause} LIMIT ? OFFSET ?",
(match_expr, q_param, page_size, offset), (combined_match, q_param, page_size, offset),
) )
else: else:
# Reuse cached count — avoids a second index scan on every page turn. if required_ingredient:
total = self._count_recipes_for_keywords(keywords) total = self.conn.execute(
f"SELECT COUNT(*) FROM {c}recipes WHERE {fts_sub}",
(combined_match,),
).fetchone()[0]
else:
# Reuse cached count — avoids a second index scan on every page turn.
total = self._count_recipes_for_keywords(keywords)
rows = self._fetch_all( rows = self._fetch_all(
f"{cols} WHERE {fts_sub} {order_clause} LIMIT ? OFFSET ?", f"{cols} WHERE {fts_sub} {order_clause} LIMIT ? OFFSET ?",
(match_expr, page_size, offset), (combined_match, page_size, offset),
) )
# Community tag fallback: if FTS found nothing, check whether # Community tag fallback: if FTS found nothing, check whether
# community-tagged recipe IDs exist for this keyword context. # community-tagged recipe IDs exist for this keyword context.
@ -1313,6 +1368,7 @@ class Store:
q_param: str | None, q_param: str | None,
c: str, c: str,
sensory_exclude: SensoryExclude | None = None, sensory_exclude: SensoryExclude | None = None,
required_ingredient: str | None = None,
) -> dict: ) -> dict:
"""Browse recipes sorted by pantry match percentage. """Browse recipes sorted by pantry match percentage.
@ -1327,16 +1383,48 @@ class Store:
pantry_lower = {p.lower() for p in pantry_set} pantry_lower = {p.lower() for p in pantry_set}
# ── required-ingredient FTS filter (must-include) ─────────────────────
req_fts_term = (
self._ingredient_fts_term(required_ingredient) if required_ingredient else ""
)
# ── Fetch candidate pool from FTS ──────────────────────────────────── # ── Fetch candidate pool from FTS ────────────────────────────────────
base_cols = ( base_cols = (
f"SELECT r.id, r.title, r.category, r.ingredient_names, r.directions, r.sensory_tags" f"SELECT r.id, r.title, r.category, r.ingredient_names, r.directions, r.sensory_tags"
f" FROM {c}recipes r" f" FROM {c}recipes r"
) )
fts_sub = (
f"r.id IN (SELECT rowid FROM {c}recipe_browser_fts"
f" WHERE recipe_browser_fts MATCH ?)"
)
self.conn.row_factory = sqlite3.Row self.conn.row_factory = sqlite3.Row
if keywords is None: if keywords is None:
if q_param: if req_fts_term:
if q_param:
total = self.conn.execute(
f"SELECT COUNT(*) FROM {c}recipes WHERE id IN"
f" (SELECT rowid FROM {c}recipe_browser_fts WHERE recipe_browser_fts MATCH ?)"
f" AND LOWER(title) LIKE LOWER(?)",
(req_fts_term, q_param),
).fetchone()[0]
rows = self.conn.execute(
f"{base_cols} WHERE {fts_sub} AND LOWER(r.title) LIKE LOWER(?)"
f" ORDER BY r.id ASC LIMIT ?",
(req_fts_term, q_param, self._MATCH_POOL_SIZE),
).fetchall()
else:
total = self.conn.execute(
f"SELECT COUNT(*) FROM {c}recipes WHERE id IN"
f" (SELECT rowid FROM {c}recipe_browser_fts WHERE recipe_browser_fts MATCH ?)",
(req_fts_term,),
).fetchone()[0]
rows = self.conn.execute(
f"{base_cols} WHERE {fts_sub} ORDER BY r.id ASC LIMIT ?",
(req_fts_term, self._MATCH_POOL_SIZE),
).fetchall()
elif q_param:
total = self.conn.execute( total = self.conn.execute(
f"SELECT COUNT(*) FROM {c}recipes WHERE LOWER(title) LIKE LOWER(?)", f"SELECT COUNT(*) FROM {c}recipes WHERE LOWER(title) LIKE LOWER(?)",
(q_param,), (q_param,),
@ -1355,27 +1443,32 @@ class Store:
(self._MATCH_POOL_SIZE,), (self._MATCH_POOL_SIZE,),
).fetchall() ).fetchall()
else: else:
match_expr = self._browser_fts_query(keywords) keywords_expr = self._browser_fts_query(keywords)
fts_sub = ( combined_match = (
f"r.id IN (SELECT rowid FROM {c}recipe_browser_fts" f"({keywords_expr}) AND {req_fts_term}" if req_fts_term else keywords_expr
f" WHERE recipe_browser_fts MATCH ?)"
) )
if q_param: if q_param:
total = self.conn.execute( total = self.conn.execute(
f"SELECT COUNT(*) FROM {c}recipes r" f"SELECT COUNT(*) FROM {c}recipes r"
f" WHERE {fts_sub} AND LOWER(r.title) LIKE LOWER(?)", f" WHERE {fts_sub} AND LOWER(r.title) LIKE LOWER(?)",
(match_expr, q_param), (combined_match, q_param),
).fetchone()[0] ).fetchone()[0]
rows = self.conn.execute( rows = self.conn.execute(
f"{base_cols} WHERE {fts_sub} AND LOWER(r.title) LIKE LOWER(?)" f"{base_cols} WHERE {fts_sub} AND LOWER(r.title) LIKE LOWER(?)"
f" ORDER BY r.id ASC LIMIT ?", f" ORDER BY r.id ASC LIMIT ?",
(match_expr, q_param, self._MATCH_POOL_SIZE), (combined_match, q_param, self._MATCH_POOL_SIZE),
).fetchall() ).fetchall()
else: else:
total = self._count_recipes_for_keywords(keywords) if required_ingredient:
total = self.conn.execute(
f"SELECT COUNT(*) FROM {c}recipes r WHERE {fts_sub}",
(combined_match,),
).fetchone()[0]
else:
total = self._count_recipes_for_keywords(keywords)
rows = self.conn.execute( rows = self.conn.execute(
f"{base_cols} WHERE {fts_sub} ORDER BY r.id ASC LIMIT ?", f"{base_cols} WHERE {fts_sub} ORDER BY r.id ASC LIMIT ?",
(match_expr, self._MATCH_POOL_SIZE), (combined_match, self._MATCH_POOL_SIZE),
).fetchall() ).fetchall()
# ── Score in Python, sort, paginate ────────────────────────────────── # ── Score in Python, sort, paginate ──────────────────────────────────
@ -1711,3 +1804,54 @@ class Store:
confidence, 1 if confirmed_by_user else 0, source, confidence, 1 if confirmed_by_user else 0, source,
), ),
) )
# ── User Recipes (kiwi#9) ──────────────────────────────────────────────────
def create_user_recipe(
self,
title: str,
ingredients: list[dict],
steps: list[str],
subtitle: str | None = None,
servings: str | None = None,
cook_time: str | None = None,
source_note: str | None = None,
notes: str | None = None,
tags: list[str] | None = None,
source: str = "manual",
pantry_match_pct: int | None = None,
) -> dict[str, Any]:
return self._insert_returning(
"""INSERT INTO user_recipes
(title, subtitle, servings, cook_time, source_note,
ingredients, steps, notes, tags, source, pantry_match_pct)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
RETURNING *""",
(
title, subtitle, servings, cook_time, source_note,
self._dump(ingredients),
self._dump(steps),
notes,
self._dump(tags or []),
source,
pantry_match_pct,
),
)
def get_user_recipe(self, recipe_id: int) -> dict[str, Any] | None:
return self._fetch_one(
"SELECT * FROM user_recipes WHERE id = ?",
(recipe_id,),
)
def list_user_recipes(self) -> list[dict[str, Any]]:
return self._fetch_all(
"SELECT * FROM user_recipes ORDER BY created_at DESC",
)
def delete_user_recipe(self, recipe_id: int) -> bool:
cur = self.conn.execute(
"DELETE FROM user_recipes WHERE id = ?", (recipe_id,)
)
self.conn.commit()
return cur.rowcount > 0
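The column prefix-filter built by `_ingredient_fts_term` can be tried in isolation. The following is a minimal sketch, assuming an SQLite build with FTS5 enabled; the `demo_fts` table, its two columns, and the sample rows are hypothetical stand-ins for `recipe_browser_fts`, whose real schema is not shown in this diff.

```python
import sqlite3


def ingredient_fts_term(ingredient: str) -> str:
    """Standalone copy of the helper from the diff, for illustration only."""
    cleaned = ingredient.replace("'", "").strip()
    escaped = cleaned.replace('"', '""')
    return f'ingredient_names : "{escaped}"*'


def matching_rowids(conn: sqlite3.Connection, match_expr: str) -> list:
    """Return the rowids that satisfy an FTS5 MATCH expression, in rowid order."""
    rows = conn.execute(
        "SELECT rowid FROM demo_fts WHERE demo_fts MATCH ? ORDER BY rowid",
        (match_expr,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
# Hypothetical two-column index; the real table likely has more columns.
conn.execute("CREATE VIRTUAL TABLE demo_fts USING fts5(title, ingredient_names)")
conn.executemany(
    "INSERT INTO demo_fts (rowid, title, ingredient_names) VALUES (?, ?, ?)",
    [
        (1, "Mash", "potatoes butter milk"),
        (2, "Tacos", "sweet potato lime"),
        (3, "Potato-free gnocchi", "flour egg"),  # "potato" only in the title
        (4, "Salad", "tomato basil"),
    ],
)

term = ingredient_fts_term("potato")

# Column filter alone: only rows whose ingredient_names hold a "potato..." token.
ingredient_only = matching_rowids(conn, term)

# Keywords and ingredient combined into one MATCH, as browse does.
combined = matching_rowids(conn, f'("tacos" OR "salad") AND {term}')
```

Row 3 is excluded from `ingredient_only` because the column filter restricts the phrase to `ingredient_names`, even though the title contains the word.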


@ -43,6 +43,11 @@ async def _browse_counts_refresh_loop(corpus_path: str) -> None:
async def lifespan(app: FastAPI): async def lifespan(app: FastAPI):
logger.info("Starting Kiwi API...") logger.info("Starting Kiwi API...")
settings.ensure_dirs() settings.ensure_dirs()
# Run DB migrations at startup (ensures all tables exist before any request)
from app.db.store import Store
_s = Store(settings.DB_PATH)
_s.close()
register_kiwi_programs() register_kiwi_programs()
# Start LLM background task scheduler # Start LLM background task scheduler
@ -54,6 +59,14 @@ async def lifespan(app: FastAPI):
from app.api.endpoints.community import init_community_store from app.api.endpoints.community import init_community_store
init_community_store(settings.COMMUNITY_DB_URL) init_community_store(settings.COMMUNITY_DB_URL)
# Initialize ActivityPub instance actor (no-op when AP_ENABLED=false)
if settings.AP_ENABLED and settings.AP_HOST:
try:
from app.services.ap.keys import init_actor
init_actor(host=settings.AP_HOST, key_path=settings.AP_KEY_PATH)
except Exception as _ap_exc:
logger.warning("AP init failed (AP features disabled): %s", _ap_exc)
# Browse counts cache — warm in-memory cache from disk, refresh if stale. # Browse counts cache — warm in-memory cache from disk, refresh if stale.
# Uses the corpus path the store will attach to at request time. # Uses the corpus path the store will attach to at request time.
corpus_path = os.environ.get("RECIPE_DB_PATH", str(settings.DB_PATH)) corpus_path = os.environ.get("RECIPE_DB_PATH", str(settings.DB_PATH))
@ -101,6 +114,11 @@ app.add_middleware(
app.include_router(api_router, prefix=settings.API_PREFIX) app.include_router(api_router, prefix=settings.API_PREFIX)
# AP endpoints: WebFinger at root (not under /api/v1), AP objects under /ap
from app.api.endpoints.activitypub import ap_router, webfinger_router
app.include_router(webfinger_router)
app.include_router(ap_router)
@app.get("/") @app.get("/")
async def root(): async def root():

0
app/mcp/__init__.py Normal file

306
app/mcp/server.py Normal file

@ -0,0 +1,306 @@
"""Kiwi MCP Server — read-only corpus DB access for tag/keyword audits.
Exposes four tools to Claude:
kiwi_query_corpus run a read-only SQL query against the corpus DB
kiwi_count_fts run an FTS5 MATCH expression and return row count
kiwi_sample_tags return tag frequency distribution by prefix
kiwi_browse_preview call the browse endpoint and return first-page results
Run with:
python -m app.mcp.server
(from /Library/Development/CircuitForge/kiwi with cf conda env active)
Configure in Claude Code ~/.claude/settings.json mcpServers:
"kiwi": {
"command": "/devl/miniconda3/envs/cf/bin/python",
"args": ["-m", "app.mcp.server"],
"cwd": "/Library/Development/CircuitForge/kiwi",
"env": {
"KIWI_DB_PATH": "/Library/Development/CircuitForge/kiwi/data/kiwi.db",
"KIWI_API_URL": "http://localhost:8512"
}
}
"""
from __future__ import annotations
import asyncio
import json
import os
import sqlite3
from pathlib import Path
import httpx
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import TextContent, Tool
# Default to <repo root>/data/kiwi.db; parents[2] is the repo root for app/mcp/server.py.
_DB_PATH = os.environ.get(
"KIWI_DB_PATH",
str(Path(__file__).parents[2] / "data" / "kiwi.db"),
)
_API_URL = os.environ.get("KIWI_API_URL", "http://localhost:8512")
_TIMEOUT = 30.0
_QUERY_ROW_LIMIT = 200
server = Server("kiwi")
def _open_ro() -> sqlite3.Connection:
"""Open the corpus DB in read-only mode."""
uri = f"{Path(_DB_PATH).resolve().as_uri()}?mode=ro"
conn = sqlite3.connect(uri, uri=True, check_same_thread=False)
conn.row_factory = sqlite3.Row
return conn
@server.list_tools()
async def list_tools() -> list[Tool]:
return [
Tool(
name="kiwi_query_corpus",
description=(
"Run a read-only SQL SELECT query against the Kiwi corpus DB (kiwi.db). "
"Returns up to 200 rows as a JSON array. "
"Key tables: recipes (id, title, ingredient_names, inferred_tags, source_url), "
"recipes_fts (FTS5 virtual table for full-text search), "
"ingredient_profiles (name, elements, texture_profile). "
"Use for schema exploration, spot-checking tag coverage, and counting results. "
"Read-only — any write statement will be rejected by SQLite."
),
inputSchema={
"type": "object",
"required": ["sql"],
"properties": {
"sql": {
"type": "string",
"description": (
"A SELECT statement. E.g.: "
"SELECT title, inferred_tags FROM recipes WHERE inferred_tags LIKE '%vegan%' LIMIT 10"
),
},
},
},
),
Tool(
name="kiwi_count_fts",
description=(
"Run an FTS5 MATCH expression against the recipes_fts table and return the hit count. "
"Useful for quickly auditing keyword coverage without a full query. "
"Always double-quote all terms in MATCH expressions. "
"E.g. match_expr='\"tofu\" OR \"tempeh\"' returns how many recipes include either."
),
inputSchema={
"type": "object",
"required": ["match_expr"],
"properties": {
"match_expr": {
"type": "string",
"description": (
"FTS5 MATCH expression string (without the MATCH keyword). "
'E.g. \'"lentil" OR "chickpea"\' or \'"pasta" AND "vegetarian"\''
),
},
},
},
),
Tool(
name="kiwi_sample_tags",
description=(
"Return tag frequency distribution from the corpus. "
"Queries inferred_tags column for tags matching the given prefix pattern. "
"Useful for auditing how well a category keyword set covers the corpus, "
"or discovering what tags exist under a domain (cuisine:, meal:, dietary:, texture:)."
),
inputSchema={
"type": "object",
"properties": {
"prefix": {
"type": "string",
"default": "",
"description": (
"Tag prefix to filter by. E.g. 'cuisine:' returns all cuisine tags, "
"'meal:' returns all meal type tags, '' returns all tags. "
"Returns top 50 by frequency."
),
},
"limit": {
"type": "integer",
"default": 50,
"description": "Max number of tag entries to return (default 50, max 200).",
},
},
},
),
Tool(
name="kiwi_browse_preview",
description=(
"Call the Kiwi browse endpoint and return first-page results. "
"Use to verify that a domain/category returns the expected recipes "
"after a keyword or tag change, without opening the browser. "
"Returns recipe titles and the total result count."
),
inputSchema={
"type": "object",
"required": ["domain", "category"],
"properties": {
"domain": {
"type": "string",
"description": (
"Browse domain slug. "
"Known domains: cuisine, meal_type, dietary, ingredient, occasion, texture."
),
},
"category": {
"type": "string",
"description": "Category slug within the domain, e.g. 'italian', 'breakfast', 'vegan'.",
},
"subcategory": {
"type": "string",
"default": "",
"description": "Optional subcategory slug to narrow further.",
},
"page_size": {
"type": "integer",
"default": 10,
"description": "Results per page (default 10, max 50).",
},
},
},
),
]
@server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
if name == "kiwi_query_corpus":
return await _query_corpus(arguments)
if name == "kiwi_count_fts":
return await _count_fts(arguments)
if name == "kiwi_sample_tags":
return await _sample_tags(arguments)
if name == "kiwi_browse_preview":
return await _browse_preview(arguments)
return [TextContent(type="text", text=f"Unknown tool: {name}")]
async def _query_corpus(args: dict) -> list[TextContent]:
sql = args.get("sql", "").strip()
if not sql.upper().startswith("SELECT"):
return [TextContent(type="text", text="Error: only SELECT statements are allowed.")]
def _run() -> list[dict]:
conn = _open_ro()
try:
cur = conn.execute(sql)
rows = cur.fetchmany(_QUERY_ROW_LIMIT)
return [dict(r) for r in rows]
finally:
conn.close()
try:
rows = await asyncio.get_running_loop().run_in_executor(None, _run)
return [TextContent(type="text", text=json.dumps(rows, indent=2, default=str))]
except Exception as exc:
return [TextContent(type="text", text=f"Query error: {exc}")]
async def _count_fts(args: dict) -> list[TextContent]:
match_expr = args.get("match_expr", "").strip()
if not match_expr:
return [TextContent(type="text", text="Error: match_expr is required.")]
def _run() -> int:
conn = _open_ro()
try:
cur = conn.execute(
"SELECT COUNT(*) FROM recipes_fts WHERE recipes_fts MATCH ?",
(match_expr,),
)
return cur.fetchone()[0]
finally:
conn.close()
try:
count = await asyncio.get_running_loop().run_in_executor(None, _run)
return [TextContent(type="text", text=json.dumps({"match_expr": match_expr, "count": count}))]
except Exception as exc:
return [TextContent(type="text", text=f"FTS error: {exc}")]
async def _sample_tags(args: dict) -> list[TextContent]:
prefix = args.get("prefix", "")
limit = min(int(args.get("limit", 50)), _QUERY_ROW_LIMIT)
def _run() -> list[dict]:
conn = _open_ro()
try:
# Split inferred_tags (separated by "," or ", ") and count each tag.
# Collapse ", " to "," first so the quote separators inserted next are not split again.
sql = """
WITH tag_rows AS (
SELECT trim(value) AS tag
FROM recipes, json_each('["' || replace(replace(inferred_tags, ', ', ','), ',', '","') || '"]')
WHERE inferred_tags IS NOT NULL AND inferred_tags != ''
)
SELECT tag, COUNT(*) AS frequency
FROM tag_rows
WHERE tag LIKE ? AND tag != ''
GROUP BY tag
ORDER BY frequency DESC
LIMIT ?
"""
pattern = f"{prefix}%" if prefix else "%"
cur = conn.execute(sql, (pattern, limit))
return [{"tag": r["tag"], "frequency": r["frequency"]} for r in cur.fetchall()]
finally:
conn.close()
try:
tags = await asyncio.get_running_loop().run_in_executor(None, _run)
return [TextContent(type="text", text=json.dumps({"prefix": prefix, "tags": tags}, indent=2))]
except Exception as exc:
return [TextContent(type="text", text=f"Tag query error: {exc}")]
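The tag-splitting trick used by `kiwi_sample_tags` (turning a delimited string into a JSON array and expanding it with `json_each`) can be checked against a throwaway table. This is a sketch under assumptions: it presumes `inferred_tags` is a plain string separated by `,` or `, `, that tags contain no double quotes, and that the SQLite build includes the JSON functions. The in-memory `recipes` table and its rows are made up for the example. This variant collapses `, ` to `,` before inserting the `","` separators, so the separators themselves are never rewritten.

```python
import sqlite3

TAG_FREQUENCY_SQL = """
WITH tag_rows AS (
    SELECT trim(value) AS tag
    FROM recipes,
         json_each('["' || replace(replace(inferred_tags, ', ', ','), ',', '","') || '"]')
    WHERE inferred_tags IS NOT NULL AND inferred_tags != ''
)
SELECT tag, COUNT(*) AS frequency
FROM tag_rows
WHERE tag LIKE ? AND tag != ''
GROUP BY tag
ORDER BY frequency DESC, tag ASC
LIMIT ?
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recipes (id INTEGER PRIMARY KEY, inferred_tags TEXT)")
conn.executemany(
    "INSERT INTO recipes (inferred_tags) VALUES (?)",
    [
        ("cuisine:italian, meal:dinner",),   # comma + space separator
        ("cuisine:italian,dietary:vegan",),  # bare comma separator
        ("cuisine:thai",),
        ("",),                               # skipped by the WHERE clause
        (None,),                             # skipped by the WHERE clause
    ],
)

# Same prefix handling as the tool: "cuisine:" becomes the LIKE pattern "cuisine:%".
cuisine_counts = conn.execute(TAG_FREQUENCY_SQL, ("cuisine:%", 50)).fetchall()
```

The secondary `ORDER BY tag ASC` is an addition for this example so that ties come back in a stable order.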
async def _browse_preview(args: dict) -> list[TextContent]:
domain = args.get("domain", "")
category = args.get("category", "")
subcategory = args.get("subcategory", "")
page_size = min(int(args.get("page_size", 10)), 50)
params: dict = {"page": 1, "page_size": page_size}
if subcategory:
params["subcategory"] = subcategory
async with httpx.AsyncClient(timeout=_TIMEOUT) as client:
try:
resp = await client.get(
f"{_API_URL}/api/v1/recipes/browse/{domain}/{category}",
params=params,
)
resp.raise_for_status()
except Exception as exc:
return [TextContent(type="text", text=f"Browse error: {exc}")]
data = resp.json()
summary = {
"domain": domain,
"category": category,
"subcategory": subcategory or None,
"total": data.get("total", 0),
"page_size": page_size,
"titles": [r.get("title", "") for r in data.get("recipes", [])],
}
return [TextContent(type="text", text=json.dumps(summary, indent=2))]
async def _main() -> None:
async with stdio_server() as (read_stream, write_stream):
await server.run(
read_stream,
write_stream,
server.create_initialization_options(),
)
if __name__ == "__main__":
asyncio.run(_main())
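The read-only guarantee of the MCP server rests on SQLite's `mode=ro` URI parameter rather than on the `SELECT` prefix check alone. A minimal sketch of that behaviour follows; the temporary database, table, and the `open_read_only` helper name are hypothetical, and the URI is built with `Path.as_uri()` as one possible way to handle absolute paths and special characters.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open an SQLite database so that any write attempt fails."""
    uri = f"{db_path.resolve().as_uri()}?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    conn.row_factory = sqlite3.Row
    return conn


with tempfile.TemporaryDirectory() as tmp:
    db_file = Path(tmp) / "demo.db"

    # Seed a small database with a normal read-write connection.
    writer = sqlite3.connect(str(db_file))
    writer.execute("CREATE TABLE recipes (title TEXT)")
    writer.execute("INSERT INTO recipes VALUES ('tofu scramble')")
    writer.commit()
    writer.close()

    reader = open_read_only(db_file)
    try:
        titles = [row["title"] for row in reader.execute("SELECT title FROM recipes")]
        try:
            reader.execute("INSERT INTO recipes VALUES ('should fail')")
            write_rejected = False
        except sqlite3.OperationalError:
            # SQLite refuses the write because the connection is read-only.
            write_rejected = True
    finally:
        reader.close()
```

Because the rejection happens inside SQLite, it also covers statements that begin with `SELECT` but would otherwise modify data.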


@ -4,6 +4,36 @@ from __future__ import annotations
from pydantic import BaseModel, Field from pydantic import BaseModel, Field
class LeftoversResponse(BaseModel):
"""Cooked-leftover shelf-life estimate returned by POST /recipes/{id}/leftovers."""
fridge_days: int
freeze_days: int | None = None # None = not recommended
freeze_by_day: int | None = None # day number from cook date to freeze by
storage_advice: str
class StepAnalysis(BaseModel):
"""Active/passive classification for one direction step."""
is_passive: bool
detected_minutes: int | None = None
prep_min: int | None = None # estimated physical prep time (action detection)
class TimeEffortProfile(BaseModel):
"""Parsed time and effort profile for a recipe.
Mirrors app.services.recipe.time_effort.TimeEffortProfile (dataclass).
Serialised into RecipeSuggestion so the frontend can render the effort
summary without a second round-trip.
"""
active_min: int = 0
passive_min: int = 0
total_min: int = 0
effort_label: str = "moderate" # "quick" | "moderate" | "involved"
equipment: list[str] = Field(default_factory=list)
step_analyses: list[StepAnalysis] = Field(default_factory=list)
class SwapCandidate(BaseModel): class SwapCandidate(BaseModel):
original_name: str original_name: str
substitute_name: str substitute_name: str
@ -43,6 +73,7 @@ class RecipeSuggestion(BaseModel):
source_url: str | None = None source_url: str | None = None
complexity: str | None = None # 'easy' | 'moderate' | 'involved' complexity: str | None = None # 'easy' | 'moderate' | 'involved'
estimated_time_min: int | None = None # derived from step count + method signals estimated_time_min: int | None = None # derived from step count + method signals
time_effort: TimeEffortProfile | None = None # full time/effort profile from parse_time_effort
rerank_score: float | None = None # cross-encoder relevance score (paid+ only, None for free tier) rerank_score: float | None = None # cross-encoder relevance score (paid+ only, None for free tier)
@ -106,7 +137,8 @@ class RecipeRequest(BaseModel):
pantry_match_only: bool = False # when True, only return recipes with zero missing ingredients pantry_match_only: bool = False # when True, only return recipes with zero missing ingredients
complexity_filter: str | None = None # 'easy' | 'moderate' | 'involved' — None = any complexity_filter: str | None = None # 'easy' | 'moderate' | 'involved' — None = any
max_time_min: int | None = None # filter by estimated cooking time ceiling max_time_min: int | None = None # filter by estimated cooking time ceiling
max_total_min: int | None = None # filter by parsed total time from recipe directions max_total_min: int | None = None # filter by parsed total time (active + passive)
max_active_min: int | None = None # filter by hands-on active time only
unit_system: str = "metric" # "metric" | "imperial" unit_system: str = "metric" # "metric" | "imperial"
@ -174,3 +206,24 @@ class StreamTokenResponse(BaseModel):
stream_url: str stream_url: str
token: str token: str
expires_in_s: int expires_in_s: int
class AskRequest(BaseModel):
"""Request body for POST /recipes/ask."""
question: str = Field(min_length=1, max_length=500)
pantry_items: list[str] = Field(default_factory=list)
class AskRecipeHit(BaseModel):
"""A single recipe result from the Ask endpoint."""
id: int
title: str
match_pct: float | None = None
category: str | None = None
class AskResponse(BaseModel):
"""Response from POST /recipes/ask."""
answer: str | None = None # LLM-synthesized response (Paid tier only)
recipes: list[AskRecipeHit]
tier: str
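The two ceilings on `RecipeRequest`, `max_total_min` (active plus passive) and `max_active_min` (hands-on only), filter on different quantities. The sketch below illustrates the distinction with plain dataclasses; `EffortSummary`, `within_time_limits`, and the sample recipes are hypothetical, and the service's actual filtering code is not part of this diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EffortSummary:
    """Hypothetical stand-in for the time fields of TimeEffortProfile."""
    title: str
    active_min: int
    passive_min: int

    @property
    def total_min(self) -> int:
        return self.active_min + self.passive_min


def within_time_limits(
    item: EffortSummary,
    max_total_min: Optional[int] = None,
    max_active_min: Optional[int] = None,
) -> bool:
    """Apply each ceiling only when it is set (None means no limit)."""
    if max_total_min is not None and item.total_min > max_total_min:
        return False
    if max_active_min is not None and item.active_min > max_active_min:
        return False
    return True


candidates = [
    EffortSummary("Sheet-pan tofu", active_min=10, passive_min=30),
    EffortSummary("Risotto", active_min=35, passive_min=5),
    EffortSummary("Overnight oats", active_min=5, passive_min=480),
]

quick_hands_on = [c.title for c in candidates if within_time_limits(c, max_active_min=15)]
under_45_total = [c.title for c in candidates if within_time_limits(c, max_total_min=45)]
```

A long passive step (the oats) passes the active-time ceiling but not the total-time one, which is the case the separate `max_active_min` field appears designed for.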


@ -0,0 +1,74 @@
"""Pydantic schemas for the recipe scanner (kiwi#9).
Scan input: photo(s).
Scan output: ScannedRecipeResponse (for review + editing before save).
Save input: ScannedRecipeSaveRequest.
User recipe output: UserRecipeResponse (after save).
"""
from __future__ import annotations
from pydantic import BaseModel, Field
# ── Ingredient in a scanned recipe ────────────────────────────────────────────
class ScannedIngredientSchema(BaseModel):
"""One ingredient line extracted from a recipe photo."""
name: str # normalized generic name ("ranch dressing")
qty: str | None = None # quantity as string, preserving fractions ("1/2", "¼")
unit: str | None = None # unit of measure; null for countable items
raw: str | None = None # verbatim original line from the image
in_pantry: bool = False # True if this ingredient matches something in the pantry
# ── Scan response (returned immediately, not persisted) ───────────────────────
class ScannedRecipeResponse(BaseModel):
"""Structured recipe extracted from photo(s). Returned for user review before save."""
title: str | None = None
subtitle: str | None = None # e.g. "with Broccoli & Ranch Dressing"
servings: str | None = None # kept as string: "2", "4-6", "serves 8"
cook_time: str | None = None # kept as string: "25 min", "1 hour"
source_note: str | None = None # e.g. "Purple Carrot", "Betty Crocker"
ingredients: list[ScannedIngredientSchema] = Field(default_factory=list)
steps: list[str] = Field(default_factory=list)
notes: str | None = None
tags: list[str] = Field(default_factory=list)
pantry_match_pct: int = 0 # 0-100: percentage of ingredients found in pantry
confidence: str = "medium" # "high" | "medium" | "low"
warnings: list[str] = Field(default_factory=list)
# ── Save request ──────────────────────────────────────────────────────────────
class ScannedRecipeSaveRequest(BaseModel):
"""User-reviewed (possibly edited) recipe data to persist as a user recipe."""
title: str
subtitle: str | None = None
servings: str | None = None
cook_time: str | None = None
source_note: str | None = None
ingredients: list[ScannedIngredientSchema]
steps: list[str]
notes: str | None = None
tags: list[str] = Field(default_factory=list)
source: str = "scan" # "scan" | "manual"
# ── User recipe (persisted) ───────────────────────────────────────────────────
class UserRecipeResponse(BaseModel):
"""A user-created or user-scanned recipe stored in user_recipes table."""
id: int
title: str
subtitle: str | None = None
servings: str | None = None
cook_time: str | None = None
source_note: str | None = None
ingredients: list[ScannedIngredientSchema]
steps: list[str]
notes: str | None = None
tags: list[str] = Field(default_factory=list)
source: str
pantry_match_pct: int | None = None
created_at: str
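`pantry_match_pct` is documented as a 0-100 percentage of ingredients found in the pantry. One way such a value could be derived is sketched below. The matching rule (exact, case-insensitive name comparison) and the `pantry_match_pct` helper are assumptions for illustration; the scanner's real matching logic is not shown in this diff.

```python
def pantry_match_pct(ingredient_names: list, pantry: set) -> int:
    """Percentage (0-100) of ingredient names present in the pantry set."""
    if not ingredient_names:
        return 0  # avoid dividing by zero for an empty scan
    pantry_lower = {item.strip().lower() for item in pantry}
    hits = sum(1 for name in ingredient_names if name.strip().lower() in pantry_lower)
    return round(100 * hits / len(ingredient_names))


scanned = ["Tofu", "broccoli", "ranch dressing"]
pct = pantry_match_pct(scanned, {"tofu", "Broccoli"})
empty_pct = pantry_match_pct([], {"tofu"})
```

Two of the three names match, so the result rounds to 67; an empty ingredient list yields 0, which matches the default on `ScannedRecipeResponse`.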


115
app/services/ap/delivery.py Normal file

@ -0,0 +1,115 @@
# app/services/ap/delivery.py
# MIT License
from __future__ import annotations
import logging
import time
from datetime import datetime, timezone
from pathlib import Path
from circuitforge_core.activitypub import deliver_activity
from app.services.ap.keys import get_actor
logger = logging.getLogger(__name__)
_RETRIES = 3
_BACKOFF = [1.0, 4.0, 16.0]
def deliver_to_followers(post_slug: str, activity: dict, db_path: Path) -> None:
"""Deliver an AP activity to all active followers. Called as a background task.
Attempts each inbox up to 3 times, with exponential backoff between attempts.
Logs each attempt to ap_deliveries in the local kiwi.db.
"""
actor = get_actor()
if actor is None:
return
import sqlite3
conn = sqlite3.connect(str(db_path))
conn.row_factory = sqlite3.Row
try:
followers = conn.execute(
"SELECT inbox_url, shared_inbox FROM ap_followers WHERE active = 1"
).fetchall()
finally:
conn.close()
# Deduplicate by shared_inbox where available
inboxes: set[str] = set()
for row in followers:
inbox = row["shared_inbox"] or row["inbox_url"]
inboxes.add(inbox)
for inbox_url in inboxes:
_deliver_with_retry(post_slug=post_slug, activity=activity, inbox_url=inbox_url, db_path=db_path)
def _deliver_with_retry(
post_slug: str,
activity: dict,
inbox_url: str,
db_path: Path,
) -> None:
actor = get_actor()
if actor is None:
return
import sqlite3
conn = sqlite3.connect(str(db_path))
try:
conn.execute(
"INSERT OR IGNORE INTO ap_deliveries (post_slug, target_inbox, status) VALUES (?,?,?)",
(post_slug, inbox_url, "pending"),
)
conn.commit()
finally:
conn.close()
last_error: str | None = None
for attempt, delay in enumerate(_BACKOFF[:_RETRIES]):
try:
resp = deliver_activity(activity=activity, inbox_url=inbox_url, actor=actor, timeout=10.0)
if resp.status_code < 300:
_update_delivery(db_path, post_slug, inbox_url, "delivered", None)
return
last_error = f"HTTP {resp.status_code}"
except Exception as exc:
last_error = str(exc)[:200]
if attempt < _RETRIES - 1:
time.sleep(delay)
_update_delivery(db_path, post_slug, inbox_url, "failed", last_error)
logger.warning("AP delivery failed after %d attempts to %s: %s", _RETRIES, inbox_url, last_error)
def _update_delivery(
db_path: Path,
post_slug: str,
inbox_url: str,
status: str,
error: str | None,
) -> None:
import sqlite3
now = datetime.now(timezone.utc).isoformat()
conn = sqlite3.connect(str(db_path))
try:
if status == "delivered":
conn.execute(
"""UPDATE ap_deliveries SET status=?, attempts=attempts+1, delivered_at=?
WHERE post_slug=? AND target_inbox=?""",
(status, now, post_slug, inbox_url),
)
else:
conn.execute(
"""UPDATE ap_deliveries SET status=?, attempts=attempts+1, last_error=?
WHERE post_slug=? AND target_inbox=?""",
(status, error, post_slug, inbox_url),
)
conn.commit()
finally:
conn.close()
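The retry loop in `_deliver_with_retry` can be reasoned about without any network or database by injecting the send and sleep functions. The sketch below is a simplified illustration, not the module's code: `attempt_with_backoff` and its return tuple are hypothetical, and the sender is a stub that returns HTTP status codes.

```python
import time
from typing import Callable, Optional, Sequence, Tuple


def attempt_with_backoff(
    send: Callable[[], int],
    delays: Sequence[float] = (1.0, 4.0, 16.0),
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bool, int, Optional[str]]:
    """Call send() up to len(delays) times; return (delivered, attempts, last_error)."""
    last_error: Optional[str] = None
    for attempt, delay in enumerate(delays, start=1):
        try:
            status = send()
            if status < 300:
                return True, attempt, None
            last_error = f"HTTP {status}"
        except Exception as exc:  # network errors count as a failed attempt
            last_error = str(exc)[:200]
        if attempt < len(delays):
            sleep(delay)  # no sleep after the final attempt
    return False, len(delays), last_error


# Stub sender: two server errors, then success.
responses = iter([503, 502, 202])
slept = []
result = attempt_with_backoff(lambda: next(responses), sleep=slept.append)

# Stub sender that always fails.
always_down = attempt_with_backoff(lambda: 500, sleep=lambda _delay: None)
```

As in the original loop, the last delay value is never slept on, since there is no further attempt to wait for.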

48
app/services/ap/keys.py Normal file

@ -0,0 +1,48 @@
# app/services/ap/keys.py
# MIT License
from __future__ import annotations
import logging
from pathlib import Path
from circuitforge_core.activitypub import CFActor, generate_rsa_keypair, load_actor_from_key_file
logger = logging.getLogger(__name__)
_actor: CFActor | None = None
def get_actor() -> CFActor | None:
"""Return the loaded instance actor, or None if AP is not enabled."""
return _actor
def init_actor(host: str, key_path: Path) -> CFActor:
"""Load or generate the instance RSA keypair and build the CFActor singleton.
Called once at startup when AP_ENABLED=true. Generates a new 2048-bit keypair
if the key file does not yet exist (first boot).
"""
global _actor
key_path.parent.mkdir(parents=True, exist_ok=True)
if not key_path.exists():
logger.info("AP: no key file found at %s — generating new RSA-2048 keypair", key_path)
private_pem, _pub = generate_rsa_keypair(bits=2048)
key_path.write_text(private_pem, encoding="utf-8")
key_path.chmod(0o600)
base = f"https://{host}"
actor_id = f"{base}/ap/actor"
_actor = load_actor_from_key_file(
actor_id=actor_id,
username="kiwi",
display_name="Kiwi Pantry",
private_key_path=str(key_path),
summary="Community pantry and recipe feed from a Kiwi instance.",
)
logger.info("AP: instance actor loaded — %s", actor_id)
return _actor

194
app/services/ap/mastodon.py Normal file

@ -0,0 +1,194 @@
# app/services/ap/mastodon.py
# MIT License
from __future__ import annotations
import logging
from pathlib import Path
import httpx
logger = logging.getLogger(__name__)
_APP_SCOPES = "write:statuses"
_APP_NAME = "Kiwi Pantry"
_APP_WEBSITE = "https://circuitforge.tech/kiwi"
def register_app(instance_url: str, redirect_uri: str) -> dict:
"""Dynamically register Kiwi as an OAuth app on the user's Mastodon instance.
Returns the app credentials dict (client_id, client_secret, etc.).
Raises httpx.HTTPError on failure.
"""
url = instance_url.rstrip("/") + "/api/v1/apps"
resp = httpx.post(
url,
data={
"client_name": _APP_NAME,
"redirect_uris": redirect_uri,
"scopes": _APP_SCOPES,
"website": _APP_WEBSITE,
},
timeout=10.0,
)
resp.raise_for_status()
return resp.json()
def build_authorize_url(instance_url: str, client_id: str, redirect_uri: str) -> str:
    """Return the OAuth authorize URL to redirect the user to."""
    from urllib.parse import urlencode
    # Percent-encode every value: redirect_uri contains "://" and the scope contains ":".
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": _APP_SCOPES,
    })
    return f"{instance_url.rstrip('/')}/oauth/authorize?{query}"
def exchange_code(
instance_url: str,
client_id: str,
client_secret: str,
code: str,
redirect_uri: str,
) -> str:
"""Exchange an authorization code for an access token. Returns the token string."""
url = instance_url.rstrip("/") + "/oauth/token"
resp = httpx.post(
url,
data={
"grant_type": "authorization_code",
"client_id": client_id,
"client_secret": client_secret,
"redirect_uri": redirect_uri,
"code": code,
"scope": _APP_SCOPES,
},
timeout=10.0,
)
resp.raise_for_status()
return resp.json()["access_token"]
def post_status(instance_url: str, access_token: str, content: str) -> dict:
"""Post a status to the user's Mastodon account. Returns the status response dict."""
url = instance_url.rstrip("/") + "/api/v1/statuses"
resp = httpx.post(
url,
headers={"Authorization": f"Bearer {access_token}"},
json={"status": content, "visibility": "public"},
timeout=15.0,
)
resp.raise_for_status()
return resp.json()
def build_post_content(post: dict) -> str:
"""Format a community post dict as Mastodon-ready plain text."""
title = post.get("title") or "Untitled"
recipe = post.get("recipe_name")
notes = post.get("outcome_notes") or post.get("description")
tags_raw: list[str] = post.get("dietary_tags") or []
lines = []
if recipe and recipe != title:
lines.append(f"🍽 {title}: {recipe}")
else:
lines.append(f"🍽 {title}")
if notes:
snippet = notes[:200].strip()
if len(notes) > 200:
snippet += "…"
lines.append(f"\n{snippet}")
hashtags = ["#Kiwi", "#Cooking"]
for tag in tags_raw[:3]:
ht = "#" + "".join(w.capitalize() for w in tag.replace("-", " ").split())
hashtags.append(ht)
lines.append("\n" + " ".join(hashtags))
return "\n".join(lines)
def store_token(
db_path: Path,
directus_user_id: str,
instance_url: str,
access_token: str,
encryption_key: str | None,
) -> None:
"""Persist a Mastodon access token in the user's local kiwi.db."""
token_to_store = _encrypt(access_token, encryption_key)
import sqlite3
conn = sqlite3.connect(str(db_path))
try:
conn.execute(
"""INSERT INTO mastodon_tokens (directus_user_id, instance_url, access_token)
VALUES (?, ?, ?)
ON CONFLICT(directus_user_id) DO UPDATE SET
instance_url=excluded.instance_url,
access_token=excluded.access_token,
updated_at=datetime('now')""",
(directus_user_id, instance_url.rstrip("/"), token_to_store),
)
conn.commit()
finally:
conn.close()
def get_token(
db_path: Path,
directus_user_id: str,
encryption_key: str | None,
) -> tuple[str, str] | None:
"""Return (instance_url, plaintext_access_token) or None if not connected."""
import sqlite3
conn = sqlite3.connect(str(db_path))
try:
row = conn.execute(
"SELECT instance_url, access_token FROM mastodon_tokens WHERE directus_user_id = ?",
(directus_user_id,),
).fetchone()
finally:
conn.close()
if row is None:
return None
return row[0], _decrypt(row[1], encryption_key)
def delete_token(db_path: Path, directus_user_id: str) -> None:
"""Remove the user's stored Mastodon token."""
import sqlite3
conn = sqlite3.connect(str(db_path))
try:
conn.execute(
"DELETE FROM mastodon_tokens WHERE directus_user_id = ?", (directus_user_id,)
)
conn.commit()
finally:
conn.close()
def _encrypt(plaintext: str, key: str | None) -> str:
if key is None:
return plaintext
try:
from cryptography.fernet import Fernet
return Fernet(key.encode()).encrypt(plaintext.encode()).decode()
except Exception:
logger.warning("Mastodon token encryption failed — storing plaintext")
return plaintext
def _decrypt(ciphertext: str, key: str | None) -> str:
if key is None:
return ciphertext
try:
from cryptography.fernet import Fernet
return Fernet(key.encode()).decrypt(ciphertext.encode()).decode()
except Exception:
logger.warning("Mastodon token decryption failed — returning as-is")
return ciphertext
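
The hashtag step in `build_post_content` can be tried on its own. The sketch below is a minimal standalone version, assuming the same rule as the diff (split on hyphens and spaces, capitalise each word, keep at most three tags); the helper names `to_hashtag` and `build_hashtags` are hypothetical and do not appear in the repository.

```python
def to_hashtag(tag: str) -> str:
    """Turn a dietary tag such as "gluten-free" into a CamelCase hashtag."""
    words = tag.replace("-", " ").split()
    return "#" + "".join(w.capitalize() for w in words)


def build_hashtags(dietary_tags: list[str], limit: int = 3) -> str:
    """Join the two fixed hashtags with at most `limit` tag-derived ones."""
    hashtags = ["#Kiwi", "#Cooking"]
    hashtags.extend(to_hashtag(t) for t in dietary_tags[:limit])
    return " ".join(hashtags)


print(build_hashtags(["gluten-free", "high protein", "vegan", "nut-free"]))
# -> #Kiwi #Cooking #GlutenFree #HighProtein #Vegan
```

One consequence worth knowing: `str.capitalize()` lowercases everything after the first letter, so a tag like `BBQ` comes out as `#Bbq`.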


@ -0,0 +1,111 @@
# app/services/community/dedup.py
# MIT License
from __future__ import annotations
import json
import logging
from pathlib import Path
logger = logging.getLogger(__name__)
_SIMILARITY_TIERS = {
"exact_recipe": "This exact recipe is already in the community feed.",
"very_similar": "Very similar recipes already exist (70%+ ingredient overlap).",
"somewhat_similar": "Somewhat similar recipes exist (35-70% ingredient overlap).",
"different": "No close matches found.",
}
def _parse_ingredient_names(raw) -> set[str]:
"""Return a normalised set of ingredient name tokens from various stored formats."""
if raw is None:
return set()
if isinstance(raw, str):
try:
raw = json.loads(raw)
except (ValueError, TypeError):
return set()
names: set[str] = set()
for item in raw:
if isinstance(item, str):
names.add(item.lower().strip())
elif isinstance(item, dict):
name = item.get("name") or item.get("ingredient") or ""
if name:
names.add(name.lower().strip())
return names
def jaccard(a: set[str], b: set[str]) -> float:
if not a and not b:
return 1.0
if not a or not b:
return 0.0
return len(a & b) / len(a | b)
def similarity_tier(jaccard_score: float, exact_recipe: bool) -> str:
if exact_recipe:
return "exact_recipe"
if jaccard_score >= 0.70:
return "very_similar"
if jaccard_score >= 0.35:
return "somewhat_similar"
return "different"
def fetch_recipe_ingredients(db_path: Path, recipe_id: int | None) -> set[str]:
"""Look up ingredient names for a recipe from the local corpus. Returns empty set on miss."""
if recipe_id is None:
return set()
try:
from app.db.store import Store
store = Store(db_path)
try:
row = store.get_recipe(recipe_id)
if row is None:
return set()
return _parse_ingredient_names(row.get("ingredient_names"))
finally:
store.close()
except Exception:
logger.debug("ingredient lookup failed for recipe_id=%s", recipe_id)
return set()
def build_similar_post_result(
post,
incoming_recipe_id: int | None,
incoming_ingredients: set[str],
db_path: Path,
) -> dict:
"""Build a similarity result dict for one existing community post."""
exact = (
incoming_recipe_id is not None
and post.recipe_id is not None
and post.recipe_id == incoming_recipe_id
)
j_score = 0.0
if not exact and incoming_ingredients:
existing_ingredients = fetch_recipe_ingredients(db_path, post.recipe_id)
if existing_ingredients:
j_score = jaccard(incoming_ingredients, existing_ingredients)
tier = similarity_tier(j_score, exact)
return {
"slug": post.slug,
"title": post.title,
"recipe_name": post.recipe_name,
"pseudonym": post.pseudonym,
"published": (
post.published.isoformat()
if hasattr(post.published, "isoformat")
else str(post.published)
),
"similarity_tier": tier,
"jaccard_score": round(j_score, 3) if not exact else None,
"tier_description": _SIMILARITY_TIERS.get(tier, ""),
}
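
A worked example may help when reading the tier thresholds. The sketch below repeats `jaccard` and `similarity_tier` from the diff so it runs standalone; the two ingredient sets are made-up illustration data.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Intersection over union; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_tier(score: float, exact_recipe: bool = False) -> str:
    """Map a Jaccard score to the tier names used in the community feed."""
    if exact_recipe:
        return "exact_recipe"
    if score >= 0.70:
        return "very_similar"
    if score >= 0.35:
        return "somewhat_similar"
    return "different"


incoming = {"tofu", "rice", "soy sauce"}
existing = {"tofu", "rice", "broccoli", "garlic"}
score = jaccard(incoming, existing)  # 2 shared names / 5 distinct names
print(round(score, 3), similarity_tier(score))
# -> 0.4 somewhat_similar
```

Because the comparison is on exact lower-cased names, "soy sauce" and "tamari" count as different ingredients; the score is only as good as the normalisation in `_parse_ingredient_names`.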


@ -0,0 +1,233 @@
# app/services/leftovers_predictor.py
"""Cooked-leftovers shelf-life predictor.
Fast path: deterministic lookup anchored to FDA/USDA safe food handling.
Fallback: LLM for unclassifiable edge cases (same gate as expiry_llm_matching).
Design notes:
- shortest-component-wins for proteins: a fish taco is bounded by the fish.
- category/keyword signals override ingredient signals for assembled dishes
(soup, stew, casserole) where the cooking method matters more than the
dominant protein.
- no urgency/panic framing; see feedback_kiwi_no_panic.md.
"""
from __future__ import annotations
import logging
import re
from dataclasses import dataclass, field
from typing import Any
logger = logging.getLogger(__name__)
@dataclass
class LeftoversResult:
fridge_days: int
freeze_days: int | None # None = "not recommended"
freeze_by_day: int | None # day number from cook date to freeze by; None = no need
storage_advice: str
# ---------------------------------------------------------------------------
# Protein priority table — shorter shelf life wins when multiple match.
# Values: (fridge_days, freeze_days). All fridge values are conservative.
# Sources: USDA FoodKeeper, FDA Safe Food Handling.
# ---------------------------------------------------------------------------
_PROTEIN_SIGNALS: list[tuple[list[str], int, int | None]] = [
# (keyword_list, fridge_days, freeze_days)
(["fish", "salmon", "tuna", "cod", "tilapia", "halibut", "trout", "bass",
"mahi", "snapper", "flounder", "catfish", "swordfish", "sardine", "anchovy"],
2, 90),
(["shrimp", "prawn", "scallop", "crab", "lobster", "clam", "mussel",
"oyster", "squid", "octopus", "seafood"],
2, 90),
(["ground beef", "ground turkey", "ground pork", "ground chicken",
"ground meat", "hamburger", "mince"],
3, 90),
(["chicken", "turkey", "poultry", "duck", "hen"],
3, 90),
(["pork", "ham", "bacon", "sausage", "chorizo", "bratwurst", "kielbasa",
"salami", "pepperoni"],
4, 120),
(["beef", "steak", "brisket", "roast", "lamb", "veal", "venison"],
4, 180),
(["egg", "eggs", "frittata", "quiche", "omelette"],
3, None),
(["tofu", "tempeh", "seitan"],
4, 90),
]
# ---------------------------------------------------------------------------
# Dish-type signals — override protein signal when a structural match fires.
# Ordered from most-perishable to least.
# ---------------------------------------------------------------------------
_DISH_SIGNALS: list[tuple[list[str], int, int | None, str]] = [
# (keywords, fridge_days, freeze_days, storage_advice_fragment)
# Ceviche: acid denatures proteins but does not kill pathogens.
# FDA/USDA classify it as raw seafood — 2-day fridge max, do not freeze.
(["ceviche", "tiradito", "leche de tigre"],
2, None,
"Acid marination is not the same as heat cooking — treat as raw seafood. "
"Best eaten the day it's made; 2 days maximum in the fridge."),
# Fermented / salt-cured dishes — preservation extends shelf life significantly.
# This matches dish names, not just presence of the ingredient (lardo in a pasta
# follows normal pasta rules, not this entry).
(["kimchi", "sauerkraut", "preserved lemon"],
14, None,
"Fermented and salt-preserved dishes keep well. Store submerged in their brine."),
(["confit", "gravlax", "gravad lax", "lardo"],
7, 60,
"Store covered in its fat or cure. Keep cold and away from strong-smelling foods."),
(["soup", "stew", "broth", "chowder", "bisque", "gumbo", "chili"],
4, 120,
"Soups and stews keep well in the fridge. Cool to room temperature before covering."),
(["curry"],
4, 90,
"Store curry in an airtight container. The flavours deepen overnight."),
(["casserole", "bake", "gratin", "lasagna", "lasagne", "moussaka",
"shepherd's pie", "pot pie"],
5, 90,
"Cover tightly. Reheat individual portions rather than the whole dish."),
(["pasta", "noodle", "spaghetti", "penne", "linguine", "fettuccine",
"macaroni", "risotto"],
4, 60,
"Store pasta and sauce separately if possible to prevent sogginess."),
(["rice", "fried rice", "pilaf", "biryani"],
3, 90,
"Cool rice quickly — spread on a tray if needed. Don't leave at room temperature for more than 1 hour."),
(["salad"],
2, None,
"Keep dressing separate. Once dressed, best eaten the same day."),
(["stir fry", "stir-fry"],
3, 60,
"Reheat in a hot pan or wok rather than a microwave to keep texture."),
(["sandwich", "wrap", "taco", "burrito"],
2, None,
"Assemble fresh when possible. Fillings keep better stored separately."),
(["pizza"],
4, 60,
"Reheat in a dry skillet for a crisp base rather than a microwave."),
(["muffin", "bread", "biscuit", "scone", "roll"],
3, 90,
"Wrap tightly or seal in a bag to prevent drying out."),
(["cake", "pie", "cookie", "brownie", "dessert", "pudding"],
5, 90,
"Store covered at room temperature or in the fridge depending on fillings."),
(["smoothie", "juice", "shake"],
1, 7,
"Best consumed fresh. Stir or shake well before drinking."),
]
# Default when no signals match.
_DEFAULT_FRIDGE = 4
_DEFAULT_FREEZE = 90
_DEFAULT_ADVICE = "Store in an airtight container in the fridge. Reheat until piping hot before eating."
def _contains_any(text: str, keywords: list[str]) -> bool:
for kw in keywords:
if re.search(rf"\b{re.escape(kw)}\b", text, re.IGNORECASE):
return True
return False
def _scan_ingredients(ingredients: list[str]) -> tuple[int, int | None] | None:
"""Return (fridge_days, freeze_days) for the most-perishable protein found."""
joined = " ".join(str(i) for i in ingredients).lower()
best: tuple[int, int | None] | None = None
for keywords, fridge, freeze in _PROTEIN_SIGNALS:
if _contains_any(joined, keywords):
if best is None or fridge < best[0]:
best = (fridge, freeze)
return best
def _scan_dish_type(text: str) -> tuple[int, int | None, str] | None:
"""Return (fridge_days, freeze_days, advice) for the first matching dish type."""
for keywords, fridge, freeze, advice in _DISH_SIGNALS:
if _contains_any(text, keywords):
return fridge, freeze, advice
return None
def predict_leftovers(
title: str,
ingredients: list[str],
category: str | None = None,
keywords: list[str] | None = None,
) -> LeftoversResult:
"""Predict cooked-leftover shelf life deterministically.
Falls back gracefully: always returns a result, even for unknown recipes.
"""
# Build a combined text blob for dish-type scanning.
search_text = " ".join(filter(None, [
title,
category or "",
" ".join(keywords or []),
]))
# Dish-type match takes structural priority over raw ingredient protein signal.
dish = _scan_dish_type(search_text)
protein = _scan_ingredients(ingredients)
if dish:
fridge_days, freeze_days, base_advice = dish
# Still apply shortest-protein-wins if protein is more perishable than dish default.
if protein and protein[0] < fridge_days:
fridge_days = protein[0]
if protein[1] is not None and (freeze_days is None or protein[1] < freeze_days):
freeze_days = protein[1]
advice = base_advice
elif protein:
fridge_days, freeze_days = protein
advice = _DEFAULT_ADVICE
else:
fridge_days = _DEFAULT_FRIDGE
freeze_days = _DEFAULT_FREEZE
advice = _DEFAULT_ADVICE
# freeze_by_day: recommend freezing on day 2 if fridge window is tight (≤3 days).
freeze_by_day: int | None = None
if freeze_days is not None and fridge_days <= 3:
freeze_by_day = 2
return LeftoversResult(
fridge_days=fridge_days,
freeze_days=freeze_days,
freeze_by_day=freeze_by_day,
storage_advice=advice,
)
def predict_leftovers_from_row(recipe: dict[str, Any]) -> LeftoversResult:
"""Convenience wrapper that accepts a Store row dict directly."""
import json as _json
title = recipe.get("title") or ""
raw_ingredients = recipe.get("ingredient_names") or []
if isinstance(raw_ingredients, str):
try:
raw_ingredients = _json.loads(raw_ingredients)
except Exception:
raw_ingredients = [raw_ingredients]
raw_keywords = recipe.get("keywords") or []
if isinstance(raw_keywords, str):
try:
raw_keywords = _json.loads(raw_keywords)
except Exception:
raw_keywords = [raw_keywords]
return predict_leftovers(
title=title,
ingredients=[str(i) for i in raw_ingredients],
category=recipe.get("category"),
keywords=[str(k) for k in raw_keywords],
)
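
The shortest-component-wins rule and the `freeze_by_day` cutoff can be illustrated with a much smaller table. This is a sketch under stated assumptions: `PROTEIN_SIGNALS` here is a trimmed, illustrative copy of `_PROTEIN_SIGNALS`, and the helper names are hypothetical rather than the module's own.

```python
import re

# Trimmed, illustrative table: (keywords, fridge_days, freeze_days).
PROTEIN_SIGNALS = [
    (["fish", "salmon", "shrimp"], 2, 90),
    (["chicken", "turkey"], 3, 90),
    (["beef", "steak"], 4, 180),
]


def contains_any(text: str, keywords: list[str]) -> bool:
    """Whole-word, case-insensitive match, so "beef" does not match "beefsteak"."""
    return any(
        re.search(rf"\b{re.escape(kw)}\b", text, re.IGNORECASE) for kw in keywords
    )


def most_perishable(ingredients: list[str]):
    """Return (fridge_days, freeze_days) of the shortest-lived protein, or None."""
    joined = " ".join(ingredients)
    matches = [
        (fridge, freeze)
        for keywords, fridge, freeze in PROTEIN_SIGNALS
        if contains_any(joined, keywords)
    ]
    return min(matches, key=lambda m: m[0]) if matches else None


def freeze_by_day(fridge_days: int, freeze_days):
    """Suggest freezing by day 2 when the fridge window is 3 days or less."""
    return 2 if freeze_days is not None and fridge_days <= 3 else None


fridge, freeze = most_perishable(["1 lb beef chuck", "200 g shrimp", "2 cloves garlic"])
print(fridge, freeze, freeze_by_day(fridge, freeze))
# -> 2 90 2
```

The word-boundary match is why the real table lists both "egg" and "eggs": a plural is a different word as far as `\b` is concerned.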


@ -0,0 +1,97 @@
"""Magpie data-flywheel hook.
Fires anonymized recipe-signal events to the Magpie ingest endpoint when a
user saves or rates a recipe. This is the Kiwi side of the flywheel. Magpie
does not have a receiver endpoint yet, so the hook stubs out gracefully: if
``MAGPIE_INGEST_URL`` is unset, or the request fails for any reason, it logs
at DEBUG level and returns without raising.
"""
from __future__ import annotations
import logging
from pathlib import Path
logger = logging.getLogger(__name__)
_INGEST_PATH = "/api/v1/ingest/recipe-signal"
async def fire_recipe_signal(
db_path: Path,
recipe_id: int,
rating: int | None,
style_tags: list[str],
) -> None:
"""Post an anonymized recipe signal to Magpie if the user has opted in.
Args:
db_path: Path to the user's SQLite database.
recipe_id: Internal Kiwi recipe ID being rated/saved.
rating: Star rating (0-5) or None if not yet rated.
style_tags: Style tags applied to the saved recipe.
"""
from app.core.config import settings
if not settings.MAGPIE_INGEST_URL:
return
# Check per-user opt-in via a short-lived Store (own connection, own thread
# context is fine — this runs in the async event loop as a background task
# so we open and close the connection immediately).
from app.db.store import Store
try:
store = Store(db_path)
try:
opt_in = store.get_setting("magpie_opt_in")
finally:
store.close()
except Exception as exc: # noqa: BLE001
logger.debug("magpie_hook: could not read magpie_opt_in setting: %s", exc)
return
if opt_in != "true":
return
# Fetch the recipe to get its external_id (source URL slug / corpus key).
try:
store = Store(db_path)
try:
recipe = store.get_recipe(recipe_id)
finally:
store.close()
except Exception as exc: # noqa: BLE001
logger.debug("magpie_hook: could not fetch recipe %d: %s", recipe_id, exc)
return
if recipe is None:
logger.debug("magpie_hook: recipe %d not found, skipping", recipe_id)
return
external_id: str | None = recipe.get("external_id") if isinstance(recipe, dict) else getattr(recipe, "external_id", None)
if not external_id:
# Corpus recipe not yet enriched with a source identifier — skip quietly.
logger.debug("magpie_hook: recipe %d has no external_id, skipping", recipe_id)
return
payload = {
"product": "kiwi",
"signal": "recipe_rating",
"external_id": external_id,
"rating": rating,
"style_tags": style_tags,
}
url = settings.MAGPIE_INGEST_URL.rstrip("/") + _INGEST_PATH
try:
import httpx
async with httpx.AsyncClient(timeout=3.0) as client:
response = await client.post(url, json=payload)
logger.debug(
"magpie_hook: POST %s -> %d", url, response.status_code
)
except Exception as exc: # noqa: BLE001
# Magpie may not have a receiver yet — log and swallow.
logger.debug("magpie_hook: ingest request failed (stub): %s", exc)


@ -2,17 +2,20 @@
 # BSL 1.1 — LLM feature
 """Provide a router-compatible LLM client for meal plan generation tasks.
-Cloud (CF_ORCH_URL set):
-    Allocates a cf-text service via cf-orch (3B-7B GGUF, ~2GB VRAM).
-    Returns an _OrchTextRouter that wraps the cf-text HTTP endpoint
-    with a .complete(system, user, **kwargs) interface.
+Cloud (CF_ORCH_URL set), tier 1 task-based routing (preferred):
+    Calls /api/inference/task with product=kiwi, task=meal_plan.
+    The coordinator resolves the model from assignments.yaml.
+Cloud (CF_ORCH_URL set), tier 2 direct allocation (fallback):
+    Allocates cf-text directly via client.allocate(). Used when the task
+    is not yet registered in the coordinator (cf-orch#61 not deployed).
 Local / self-hosted (no CF_ORCH_URL):
     Returns an LLMRouter instance which tries ollama, vllm, or any
     backend configured in ~/.config/circuitforge/llm.yaml.
-Both paths expose the same interface so llm_timing.py and llm_planner.py
-need no knowledge of the backend.
+All paths expose the same (router, ctx) interface so llm_planner.py
+needs no knowledge of the backend.
 """
 from __future__ import annotations
@ -22,8 +25,7 @@ from contextlib import nullcontext
 logger = logging.getLogger(__name__)
-# cf-orch service name and VRAM budget for meal plan LLM tasks.
-# These are lighter than recipe_llm (4.0 GB) — cf-text handles them.
+# cf-orch service name and TTL for direct-allocate fallback path.
 _SERVICE_TYPE = "cf-text"
 _TTL_S = 120.0
 _CALLER = "kiwi-meal-plan"
@ -62,35 +64,79 @@ class _OrchTextRouter:
         return resp.choices[0].message.content or ""
+# Imported at module level so tests can patch the names in this module's namespace.
+# app.services.task_inference.task_allocate — patch target for task routing tests.
+try:
+    from app.services.task_inference import TaskNotRegistered, task_allocate
+    _HAS_TASK_INFERENCE = True
+except ImportError:
+    _HAS_TASK_INFERENCE = False
+# circuitforge_orch.client.CFOrchClient — patch target for direct-allocate fallback tests.
+try:
+    from circuitforge_orch.client import CFOrchClient
+except ImportError:
+    CFOrchClient = None  # type: ignore[assignment,misc]
+# circuitforge_core.llm.router.LLMRouter — patch target for local-inference tests.
+try:
+    from circuitforge_core.llm.router import LLMRouter
+except (ImportError, FileNotFoundError):
+    LLMRouter = None  # type: ignore[assignment,misc]
 def get_meal_plan_router():
     """Return an LLM client for meal plan tasks.
-    Tries cf-orch cf-text allocation first (cloud); falls back to LLMRouter
-    (local ollama/vllm). Returns None if no backend is available.
+    Returns (router, ctx) where ctx is a context manager the caller holds
+    open for the duration of the LLM call. Returns (None, nullcontext(None))
+    if no backend is available.
     """
     cf_orch_url = os.environ.get("CF_ORCH_URL")
-    if cf_orch_url:
-        try:
-            from circuitforge_orch.client import CFOrchClient
-            client = CFOrchClient(cf_orch_url)
-            ctx = client.allocate(
-                service=_SERVICE_TYPE,
-                ttl_s=_TTL_S,
-                caller=_CALLER,
-            )
-            alloc = ctx.__enter__()
-            if alloc is not None:
-                return _OrchTextRouter(alloc.url), ctx
-        except Exception as exc:
-            logger.debug("cf-orch cf-text allocation failed, falling back to LLMRouter: %s", exc)
-    # Local fallback: LLMRouter (ollama / vllm / openai-compat)
-    try:
-        from circuitforge_core.llm.router import LLMRouter
-        return LLMRouter(), nullcontext(None)
-    except FileNotFoundError:
-        logger.debug("LLMRouter: no llm.yaml and no LLM env vars — meal plan LLM disabled")
-        return None, nullcontext(None)
-    except Exception as exc:
-        logger.debug("LLMRouter init failed: %s", exc)
-        return None, nullcontext(None)
+    if cf_orch_url:
+        # Tier 1: task-based routing — coordinator owns model selection.
+        if _HAS_TASK_INFERENCE:
+            try:
+                ctx = task_allocate(
+                    "kiwi", "meal_plan",
+                    service_hint=_SERVICE_TYPE,
+                    ttl_s=_TTL_S,
+                )
+                alloc = ctx.__enter__()
+                return _OrchTextRouter(alloc.url), ctx
+            except TaskNotRegistered:
+                logger.debug(
+                    "kiwi.meal_plan not in coordinator assignments — "
+                    "falling back to direct cf-text allocation"
+                )
+            except Exception as exc:
+                logger.debug("task allocation failed, trying direct allocate: %s", exc)
+        # Tier 2: direct allocation — hardcoded service type.
+        if CFOrchClient is not None:
+            try:
+                client = CFOrchClient(cf_orch_url)
+                ctx = client.allocate(
+                    service=_SERVICE_TYPE,
+                    ttl_s=_TTL_S,
+                    caller=_CALLER,
+                )
+                alloc = ctx.__enter__()
+                if alloc is not None:
+                    return _OrchTextRouter(alloc.url), ctx
+                ctx.__exit__(None, None, None)  # release allocation before falling through
+            except Exception as exc:
+                logger.debug("cf-orch cf-text allocation failed, falling back to LLMRouter: %s", exc)
+    # Tier 3: local inference — ollama / vllm / openai-compat.
+    if LLMRouter is not None:
+        try:
+            return LLMRouter(), nullcontext(None)
+        except FileNotFoundError:
+            logger.debug("LLMRouter: no llm.yaml and no LLM env vars — meal plan LLM disabled")
+            return None, nullcontext(None)
+        except Exception as exc:
+            logger.debug("LLMRouter init failed: %s", exc)
+            return None, nullcontext(None)
+    return None, nullcontext(None)
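
The tiered fallback in `get_meal_plan_router` follows a pattern that can be sketched without any of the CircuitForge packages: try each tier in order, hand back the first usable backend together with its still-open context manager, and release anything that was entered but turned out unusable. Everything below is a hypothetical stand-in (`fake_task_allocate`, `fake_direct_allocate`, `pick_backend`, and the example URL are invented for illustration), not the real `task_allocate` or `CFOrchClient` API.

```python
from contextlib import contextmanager, nullcontext


class TaskNotRegistered(Exception):
    """Stand-in for the coordinator's "unknown task" error (hypothetical)."""


@contextmanager
def fake_task_allocate():
    # Pretend the coordinator does not know this task yet.
    raise TaskNotRegistered("kiwi.meal_plan")
    yield  # unreachable; present only so this function is a generator


@contextmanager
def fake_direct_allocate():
    # Pretend a direct allocation succeeded and returned a service URL.
    yield "http://cf-text.example:8000"


def pick_backend(tiers):
    """Return (choice, ctx) from the first tier that yields a backend.

    The caller keeps ctx open for the duration of the call and exits it afterwards.
    """
    for name, open_ctx in tiers:
        try:
            ctx = open_ctx()
            backend = ctx.__enter__()
        except Exception:
            continue  # this tier is unavailable; try the next one
        if backend is not None:
            return (name, backend), ctx
        ctx.__exit__(None, None, None)  # entered but unusable: release it
    return None, nullcontext(None)


choice, ctx = pick_backend([("task", fake_task_allocate), ("direct", fake_direct_allocate)])
print(choice)
ctx.__exit__(None, None, None)
# -> ('direct', 'http://cf-text.example:8000')
```

Returning `nullcontext(None)` in the no-backend case keeps the call site uniform: the caller can always exit `ctx` without checking which tier, if any, produced it.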


@ -18,43 +18,51 @@ class DocuvisionResult:
 class DocuvisionClient:
     """Thin client for the cf-docuvision service."""
-    def __init__(self, base_url: str) -> None:
+    def __init__(self, base_url: str, timeout: float = 120.0) -> None:
         self._base_url = base_url.rstrip("/")
+        self._timeout = timeout
-    def extract_text(self, image_path: str | Path) -> DocuvisionResult:
-        """Send an image to docuvision and return extracted text."""
+    def extract_text(self, image_path: str | Path, hint: str = "text") -> DocuvisionResult:
+        """Send an image to docuvision and return extracted text.
+        Args:
+            image_path: Path to the image file.
+            hint: Docuvision extraction hint: "text" for dense prose (recipes),
+                "table" for tabular data, "form" for form fields, "auto" for
+                automatic detection.
+        """
         image_bytes = Path(image_path).read_bytes()
         b64 = base64.b64encode(image_bytes).decode()
-        with httpx.Client(timeout=30.0) as client:
+        with httpx.Client(timeout=self._timeout) as client:
             resp = client.post(
                 f"{self._base_url}/extract",
-                json={"image": b64},
+                json={"image_b64": b64, "hint": hint},
             )
             resp.raise_for_status()
             data = resp.json()
         return DocuvisionResult(
-            text=data.get("text", ""),
-            confidence=data.get("confidence"),
+            text=data.get("raw_text", ""),
+            confidence=data.get("metadata", {}).get("confidence"),
             raw=data,
         )
-    async def extract_text_async(self, image_path: str | Path) -> DocuvisionResult:
+    async def extract_text_async(self, image_path: str | Path, hint: str = "text") -> DocuvisionResult:
         """Async version."""
         image_bytes = Path(image_path).read_bytes()
         b64 = base64.b64encode(image_bytes).decode()
-        async with httpx.AsyncClient(timeout=30.0) as client:
+        async with httpx.AsyncClient(timeout=self._timeout) as client:
             resp = await client.post(
                 f"{self._base_url}/extract",
-                json={"image": b64},
+                json={"image_b64": b64, "hint": hint},
             )
             resp.raise_for_status()
             data = resp.json()
         return DocuvisionResult(
-            text=data.get("text", ""),
-            confidence=data.get("metadata", {}).get("confidence"),
+            text=data.get("raw_text", ""),
+            confidence=data.get("metadata", {}).get("confidence"),
             raw=data,
         )


@ -32,6 +32,29 @@ def _try_docuvision(image_path: str | Path) -> str | None:
     cf_orch_url = os.environ.get("CF_ORCH_URL")
     if not cf_orch_url:
         return None
+    # Tier 1: task-based routing — coordinator owns model selection.
+    try:
+        from app.services.task_inference import task_allocate, TaskNotRegistered
+        from app.services.ocr.docuvision_client import DocuvisionClient
+        try:
+            with task_allocate(
+                "kiwi", "ocr",
+                service_hint="cf-docuvision",
+                ttl_s=60.0,
+            ) as alloc:
+                doc_client = DocuvisionClient(alloc.url)
+                result = doc_client.extract_text(image_path)
+                return result.text if result.text else None
+        except TaskNotRegistered:
+            logger.debug(
+                "kiwi.ocr not in coordinator assignments — "
+                "falling back to direct cf-docuvision allocation"
+            )
+    except Exception as exc:
+        logger.debug("task allocation path failed, trying direct allocate: %s", exc)
+    # Tier 2: direct allocation — hardcoded service type.
     try:
         from circuitforge_orch.client import CFOrchClient
         from app.services.ocr.docuvision_client import DocuvisionClient
@ -49,7 +72,7 @@ def _try_docuvision(image_path: str | Path) -> str | None:
             result = doc_client.extract_text(image_path)
             return result.text if result.text else None
     except Exception as exc:
-        logger.debug("cf-docuvision fast-path failed, falling back: %s", exc)
+        logger.debug("cf-docuvision fast-path failed, falling back to local VLM: %s", exc)
     return None


@ -26,7 +26,7 @@ DOMAINS: dict[str, dict] = {
         "label": "Cuisine",
         "categories": {
             "Italian": {
-                "keywords": ["italian", "pasta", "pizza", "risotto", "lasagna", "carbonara"],
+                "keywords": ["cuisine:Italian", "italian", "pasta", "pizza", "risotto", "lasagna", "carbonara"],
                 "subcategories": {
                     "Sicilian": ["sicilian", "sicily", "arancini", "caponata",
                                  "involtini", "cannoli"],
@ -43,8 +43,8 @@ DOMAINS: dict[str, dict] = {
                 },
             },
             "Mexican": {
-                "keywords": ["mexican", "taco", "enchilada", "burrito", "salsa",
-                             "guacamole", "mole", "tamale"],
+                "keywords": ["cuisine:Mexican", "mexican", "taco", "enchilada", "burrito",
+                             "salsa", "guacamole", "mole", "tamale"],
                 "subcategories": {
                     "Oaxacan": ["oaxacan", "oaxaca", "mole negro", "tlayuda",
                                 "chapulines", "mezcal", "tasajo", "memelas"],
@ -67,7 +67,9 @@ DOMAINS: dict[str, dict] = {
                 },
             },
             "Asian": {
-                "keywords": ["asian", "chinese", "japanese", "thai", "korean", "vietnamese",
+                "keywords": ["cuisine:Chinese", "cuisine:Japanese", "cuisine:Korean",
+                             "cuisine:Thai", "cuisine:Vietnamese",
+                             "asian", "chinese", "japanese", "thai", "korean", "vietnamese",
                              "stir fry", "stir-fry", "ramen", "sushi", "malaysian",
                              "taiwanese", "singaporean", "burmese", "cambodian",
                              "laotian", "mongolian", "hong kong"],
@ -128,7 +130,7 @@ DOMAINS: dict[str, dict] = {
                 },
             },
             "Indian": {
-                "keywords": ["indian", "curry", "lentil", "dal", "tikka", "masala",
+                "keywords": ["cuisine:Indian", "indian", "curry", "lentil", "dal", "tikka", "masala",
                              "biryani", "naan", "chutney", "pakistani", "sri lankan",
                              "bangladeshi", "nepali"],
                 "subcategories": {
@ -156,7 +158,8 @@ DOMAINS: dict[str, dict] = {
                 },
             },
             "Mediterranean": {
-                "keywords": ["mediterranean", "greek", "middle eastern", "turkish",
+                "keywords": ["cuisine:Mediterranean", "cuisine:Greek", "cuisine:Middle Eastern",
+                             "mediterranean", "greek", "middle eastern", "turkish",
                              "lebanese", "jewish", "palestinian", "yemeni", "egyptian",
                              "syrian", "iraqi", "jordanian"],
                 "subcategories": {
@ -190,7 +193,8 @@ DOMAINS: dict[str, dict] = {
                 },
             },
             "American": {
-                "keywords": ["american", "southern", "comfort food", "cajun", "creole",
+                "keywords": ["cuisine:American", "cuisine:Southern", "cuisine:Cajun",
+                             "american", "southern", "comfort food", "cajun", "creole",
                              "hawaiian", "tex-mex", "soul food"],
                 "subcategories": {
                     "Southern": ["southern", "soul food", "fried chicken",
@ -214,10 +218,8 @@ DOMAINS: dict[str, dict] = {
                 },
             },
             "BBQ & Smoke": {
-                # Top-level keywords use broad corpus-friendly terms that appear in
-                # food.com keyword/category fields (e.g. "BBQ", "Oven BBQ", "Smoker").
-                # Subcategory keywords remain specific for drill-down filtering.
-                "keywords": ["bbq", "barbecue", "barbeque", "smoked", "smoky",
+                # Top-level keywords: cuisine:BBQ inferred tag + broad corpus terms.
+                "keywords": ["cuisine:BBQ", "bbq", "barbecue", "barbeque", "smoked", "smoky",
                              "smoke", "pit", "smoke ring", "low and slow",
                              "brisket", "pulled pork", "ribs", "spare ribs",
                              "baby back", "baby back ribs", "dry rub", "wet rub",
@ -251,7 +253,8 @@ DOMAINS: dict[str, dict] = {
                 },
             },
             "European": {
-                "keywords": ["french", "german", "spanish", "british", "irish", "scottish",
+                "keywords": ["cuisine:French", "cuisine:German", "cuisine:Spanish",
+                             "french", "german", "spanish", "british", "irish", "scottish",
                              "welsh", "scandinavian", "nordic", "eastern european"],
                 "subcategories": {
                     "French": ["french", "provencal", "beurre", "crepe",
@ -281,7 +284,8 @@ DOMAINS: dict[str, dict] = {
                 },
             },
             "Latin American": {
-                "keywords": ["latin american", "peruvian", "argentinian", "colombian",
+                "keywords": ["cuisine:Latin American", "cuisine:Caribbean",
+                             "latin american", "peruvian", "argentinian", "colombian",
                              "cuban", "caribbean", "brazilian", "venezuelan", "chilean"],
                 "subcategories": {
                     "Peruvian": ["peruvian", "ceviche", "lomo saltado", "anticucho",
@ -425,12 +429,18 @@ DOMAINS: dict[str, dict] = {
     "meal_type": {
         "label": "Meal Type",
         "categories": {
+            # Keywords use two complementary sources:
+            # 1. inferred_tag phrases ("meal:X", "main:X") — indexed in recipe_browser_fts.inferred_tags.
+            #    FTS5 tokenises "meal:Breakfast" → ["meal","breakfast"], so the quoted phrase
+            #    "meal:Breakfast" matches exactly that consecutive token pair.
+            # 2. Corpus keyword/category text — only covers the ~1,200 keyword-tagged recipes.
+            #    Kept as a fallback; not the primary signal.
             "Breakfast": {
-                "keywords": ["breakfast", "brunch", "eggs", "pancakes", "waffles",
-                             "oatmeal", "muffin"],
+                "keywords": ["meal:Breakfast", "breakfast", "brunch", "pancakes",
+                             "waffles", "oatmeal", "muffin"],
                 "subcategories": {
-                    "Eggs": ["egg", "omelette", "frittata", "quiche",
-                             "scrambled", "benedict", "shakshuka"],
+                    "Eggs": ["meal:Breakfast", "egg", "omelette", "frittata",
+                             "quiche", "scrambled", "benedict", "shakshuka"],
                     "Pancakes & Waffles": ["pancake", "waffle", "crepe", "french toast"],
                     "Baked Goods": ["muffin", "scone", "biscuit", "quick bread",
                                     "coffee cake", "danish"],
@ -439,12 +449,15 @@ DOMAINS: dict[str, dict] = {
}, },
}, },
"Lunch": { "Lunch": {
"keywords": ["lunch", "sandwich", "wrap", "salad", "soup", "light meal"], # meal:Lunch tag covers explicitly-tagged recipes.
# Coverage is limited — most lunch-style recipes have no distinct meal-type tag.
"keywords": ["meal:Lunch", "lunch", "sandwich", "wrap", "salad",
"soup", "light meal"],
"subcategories": { "subcategories": {
"Sandwiches": ["sandwich", "sub", "hoagie", "panini", "club", "Sandwiches": ["sandwich", "sub", "hoagie", "panini", "club",
"grilled cheese", "blt"], "grilled cheese", "blt"],
"Salads": ["salad", "grain bowl", "chopped", "caesar", "Salads": ["salad", "grain bowl", "chopped", "caesar",
"niçoise", "cobb"], "cobb"],
"Soups": ["soup", "bisque", "chowder", "gazpacho", "Soups": ["soup", "bisque", "chowder", "gazpacho",
"minestrone", "lentil soup"], "minestrone", "lentil soup"],
"Wraps": ["wrap", "burrito bowl", "pita", "lettuce wrap", "Wraps": ["wrap", "burrito bowl", "pita", "lettuce wrap",
@ -452,23 +465,27 @@ DOMAINS: dict[str, dict] = {
}, },
}, },
"Dinner": { "Dinner": {
"keywords": ["dinner", "main dish", "entree", "main course", "supper"], # Primary: main:X inferred tags (800k+ recipes).
# "meal:Dinner" does not exist in the inferred-tag vocabulary — main-protein
# tags are the best available proxy for main-course dinner recipes.
"keywords": ["main:Chicken", "main:Beef", "main:Pork", "main:Fish",
"main:Pasta", "dinner", "main dish", "entree",
"main course", "supper"],
"subcategories": { "subcategories": {
"Casseroles": ["casserole", "bake", "gratin", "lasagna", "Chicken": ["main:Chicken"],
"sheperd's pie", "pot pie"], "Beef": ["main:Beef"],
"Pork": ["main:Pork"],
"Fish & Seafood": ["main:Fish"],
"Pasta": ["main:Pasta"],
"Casseroles": ["casserole", "bake", "gratin", "pot pie"],
"Stews": ["stew", "braise", "slow cooker", "pot roast", "Stews": ["stew", "braise", "slow cooker", "pot roast",
"daube", "ragù"], "daube"],
"Grilled": ["grilled", "grill", "barbecue", "charred", "Grilled": ["grilled", "grill", "barbecue", "kebab", "skewer"],
"kebab", "skewer"],
"Stir-Fries": ["stir fry", "stir-fry", "wok", "sauté",
"sauteed"],
"Roasts": ["roast", "roasted", "oven", "baked chicken",
"pot roast"],
}, },
}, },
"Snack": { "Snack": {
"keywords": ["snack", "appetizer", "finger food", "dip", "bite", "keywords": ["meal:Snack", "snack", "appetizer", "finger food",
"starter"], "dip", "bite", "starter"],
"subcategories": { "subcategories": {
"Dips & Spreads": ["dip", "spread", "hummus", "guacamole", "Dips & Spreads": ["dip", "spread", "hummus", "guacamole",
"salsa", "pate"], "salsa", "pate"],
@ -479,8 +496,9 @@ DOMAINS: dict[str, dict] = {
}, },
}, },
"Dessert": { "Dessert": {
"keywords": ["dessert", "cake", "cookie", "pie", "sweet", "pudding", # "sweet" removed — it matches flavor:Sweet inferred tags, causing false positives.
"ice cream", "brownie"], "keywords": ["meal:Dessert", "dessert", "cake", "cookie", "pie",
"pudding", "ice cream", "brownie"],
"subcategories": { "subcategories": {
"Cakes": ["cake", "cupcake", "layer cake", "bundt", "Cakes": ["cake", "cupcake", "layer cake", "bundt",
"cheesecake", "torte"], "cheesecake", "torte"],
@ -496,20 +514,41 @@ DOMAINS: dict[str, dict] = {
"caramel", "toffee"], "caramel", "toffee"],
}, },
}, },
"Beverage": ["drink", "smoothie", "cocktail", "beverage", "juice", "shake"], "Beverage": ["meal:Beverage", "drink", "smoothie", "cocktail", "beverage",
"Side Dish": ["side dish", "side", "accompaniment", "garnish"], "juice", "shake", "lemonade"],
"Side Dish": {
# meal:Side Dish not in inferred-tag vocabulary.
# main:Vegetables and main:Grains are the best proxies — will overlap
# with some vegetarian mains, which is acceptable.
"keywords": ["main:Vegetables", "main:Grains", "side dish", "side",
"pilaf", "accompaniment"],
"subcategories": {
"Vegetables": ["main:Vegetables"],
"Grains & Rice": ["main:Grains", "rice", "pilaf", "quinoa"],
"Bread": ["meal:Bread", "bread", "roll", "biscuit"],
},
},
}, },
}, },
"dietary": { "dietary": {
"label": "Dietary", "label": "Dietary",
# Primary: dietary:X inferred tags (indexed in recipe_browser_fts.inferred_tags).
# Secondary: text tokens kept as fallback for keyword-tagged recipes.
# IMPORTANT: Use ONLY structured dietary:X phrases here.
# Bare text keywords like "vegan", "low-carb" also match can_be:Vegan,
# can_be:Low-Carb etc. — those are "achievable with substitutions", not
# "recipe already is". The structured phrase "dietary:Vegan" (consecutive
# FTS tokens "dietary"+"vegan") does NOT match can_be:Vegan.
"categories": { "categories": {
"Vegetarian": ["vegetarian"], "Vegetarian": ["dietary:Vegetarian"],
"Vegan": ["vegan", "plant-based", "plant based"], "Vegan": ["dietary:Vegan"],
"Gluten-Free": ["gluten-free", "gluten free", "celiac"], "Gluten-Free": ["dietary:Gluten-Free"],
"Low-Carb": ["low-carb", "low carb", "keto", "ketogenic"], "Low-Carb": ["dietary:Low-Carb"],
"High-Protein": ["high protein", "high-protein"], "High-Protein": ["dietary:High-Protein"],
"Low-Fat": ["low-fat", "low fat", "light"], "Low-Fat": ["dietary:Low-Fat"],
"Dairy-Free": ["dairy-free", "dairy free", "lactose"], "Dairy-Free": ["dietary:Dairy-Free"],
"Low-Sodium": ["dietary:Low-Sodium"],
"Paleo": ["dietary:Paleo"],
}, },
}, },
"main_ingredient": { "main_ingredient": {
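
The comments in this hunk rely on how FTS5 tokenises structured tags: a quoted phrase such as "dietary:Vegan" should match only the consecutive token pair, while a bare keyword also hits can_be:Vegan. The sketch below illustrates that behaviour, assuming the default unicode61 tokenizer and a Python build whose sqlite3 module includes FTS5. The table name `demo_fts` and the sample rows are hypothetical stand-ins for recipe_browser_fts, not taken from the repository.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical one-column table standing in for recipe_browser_fts.inferred_tags.
conn.execute("CREATE VIRTUAL TABLE demo_fts USING fts5(inferred_tags)")
conn.executemany(
    "INSERT INTO demo_fts(rowid, inferred_tags) VALUES (?, ?)",
    [
        (1, "dietary:Vegan meal:Dessert"),   # recipe already is vegan
        (2, "can_be:Vegan main:Chicken"),    # vegan only with substitutions
    ],
)


def match(query: str) -> list[int]:
    """Return rowids matching an FTS5 query string, in rowid order."""
    rows = conn.execute(
        "SELECT rowid FROM demo_fts WHERE demo_fts MATCH ? ORDER BY rowid",
        (query,),
    )
    return [r[0] for r in rows]


# Bare token: matches any row containing the token "vegan".
bare_hits = match("vegan")
# Quoted phrase: tokenised to "dietary" followed by "vegan", matched as a phrase.
phrase_hits = match('"dietary:Vegan"')
```

Under these assumptions the bare query returns both rows and the phrase query returns only the first, which is the false-positive distinction the dietary comment describes.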


@ -93,7 +93,18 @@ class ElementClassifier:
return self._heuristic_profile(name) return self._heuristic_profile(name)
def classify_batch(self, names: list[str]) -> list[IngredientProfile]: def classify_batch(self, names: list[str]) -> list[IngredientProfile]:
return [self.classify(n) for n in names] """Classify multiple names in one DB round-trip, falling back to heuristics."""
if not names:
return []
normalised = [n.lower().strip() for n in names]
c = self._store._cp
placeholders = ",".join("?" * len(normalised))
rows = self._store._fetch_all(
f"SELECT * FROM {c}ingredient_profiles WHERE name IN ({placeholders})",
tuple(normalised),
)
by_name = {r["name"]: self._row_to_profile(r) for r in rows}
return [by_name.get(n) or self._heuristic_profile(n) for n in normalised]
def identify_gaps(self, profiles: list[IngredientProfile]) -> list[str]: def identify_gaps(self, profiles: list[IngredientProfile]) -> list[str]:
"""Return element names that have no coverage in the given profile list.""" """Return element names that have no coverage in the given profile list."""
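
The batched lookup in classify_batch can be sketched in isolation as follows. This is a minimal illustration using plain sqlite3 rather than the project's Store wrapper; the two-column `ingredient_profiles` schema, the dict-shaped profiles, and `heuristic_profile` are simplified assumptions, not the real types.

```python
import sqlite3


def heuristic_profile(name: str) -> dict:
    """Hypothetical stand-in for the classifier's heuristic fallback."""
    return {"name": name, "source": "heuristic"}


def classify_batch(conn: sqlite3.Connection, names: list[str]) -> list[dict]:
    """Look up all names with one IN (...) query, then fill gaps heuristically."""
    if not names:
        return []
    normalised = [n.lower().strip() for n in names]
    # One "?" per name keeps the query parameterised (no string interpolation of values).
    placeholders = ",".join("?" * len(normalised))
    rows = conn.execute(
        f"SELECT name, elements FROM ingredient_profiles WHERE name IN ({placeholders})",
        tuple(normalised),
    ).fetchall()
    by_name = {
        name: {"name": name, "elements": elements, "source": "db"}
        for name, elements in rows
    }
    # Preserve input order; names missing from the table fall back to heuristics.
    return [by_name.get(n) or heuristic_profile(n) for n in normalised]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ingredient_profiles (name TEXT PRIMARY KEY, elements TEXT)")
conn.execute("INSERT INTO ingredient_profiles VALUES ('tofu', 'Richness')")
result = classify_batch(conn, [" Tofu ", "dragonfruit"])
```

One design consideration for this pattern: SQLite caps the number of bound parameters per statement (999 by default before version 3.32.0, 32766 afterwards), so very large name lists would need to be chunked.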


@ -1,13 +1,14 @@
"""LLM-driven recipe generator for Levels 3 and 4.""" """LLM-driven recipe generator for Levels 3 and 4."""
from __future__ import annotations from __future__ import annotations
import asyncio
import logging import logging
import os import os
import re import re
from contextlib import nullcontext from contextlib import nullcontext
from typing import TYPE_CHECKING from typing import TYPE_CHECKING, AsyncGenerator
from openai import OpenAI from openai import AsyncOpenAI, OpenAI
if TYPE_CHECKING: if TYPE_CHECKING:
from app.db.store import Store from app.db.store import Store
@ -149,8 +150,8 @@ class LLMRecipeGenerator:
return "\n".join(lines) return "\n".join(lines)
_SERVICE_TYPE = "vllm" _SERVICE_TYPE = "cf-text"
_MODEL_CANDIDATES = ["Qwen2.5-3B-Instruct", "Phi-4-mini-instruct"] _MODEL_CANDIDATES = ["granite-4.1-8b", "deepseek-r1-1.5b"]
_TTL_S = 300.0 _TTL_S = 300.0
_CALLER = "kiwi-recipe" _CALLER = "kiwi-recipe"
@ -182,7 +183,12 @@ class LLMRecipeGenerator:
With CF_ORCH_URL set: acquires a vLLM allocation via CFOrchClient and With CF_ORCH_URL set: acquires a vLLM allocation via CFOrchClient and
calls the OpenAI-compatible API directly against the allocated service URL. calls the OpenAI-compatible API directly against the allocated service URL.
Allocation failure falls through to LLMRouter rather than silently returning "". Falls back to LLMRouter when:
- Allocation succeeded but the service is cold (warm=False): avoids
making the user wait for model load; LLMRouter uses Ollama which is
already running.
- Allocation succeeded but the connection to the service URL fails: the
agent may have registered the service but failed to start it.
Without CF_ORCH_URL: uses LLMRouter directly. Without CF_ORCH_URL: uses LLMRouter directly.
""" """
ctx = self._get_llm_context() ctx = self._get_llm_context()
@ -208,6 +214,15 @@ class LLMRecipeGenerator:
try: try:
if alloc is not None: if alloc is not None:
# Skip cold services — model not yet loaded means the user would
# wait 60-120 s for model load before any response. Use LLMRouter
# (Ollama) instead, which is already warm on the host.
if not alloc.warm:
logger.info(
"cf-orch vllm allocated but cold (warm=False) — releasing and falling back to LLMRouter"
)
raise RuntimeError("vllm cold")
base_url = alloc.url.rstrip("/") + "/v1" base_url = alloc.url.rstrip("/") + "/v1"
client = OpenAI(base_url=base_url, api_key="any") client = OpenAI(base_url=base_url, api_key="any")
model = alloc.model or "__auto__" model = alloc.model or "__auto__"
@ -223,6 +238,20 @@ class LLMRecipeGenerator:
return LLMRouter().complete(prompt) return LLMRouter().complete(prompt)
except Exception as exc: except Exception as exc:
logger.error("LLM call failed: %s", exc) logger.error("LLM call failed: %s", exc)
# When cf-orch gave us an allocation but the service is unreachable
# (cold skip, connection refused, or other error), fall back to
# LLMRouter rather than silently returning empty.
# Skip "vllm" in the fallback order — that backend also routes through
# cf-orch, which would trigger a second (wasted) cold allocation.
if alloc is not None:
logger.info("Falling back to LLMRouter after vllm failure")
try:
from circuitforge_core.llm.router import LLMRouter
router = LLMRouter()
_order = [b for b in (router.config.get("fallback_order") or []) if b != "vllm"]
return router.complete(prompt, fallback_order=_order or None)
except Exception as fallback_exc:
logger.error("LLMRouter fallback also failed: %s", fallback_exc)
return "" return ""
finally: finally:
if ctx is not None: if ctx is not None:
@ -359,3 +388,91 @@ class LLMRecipeGenerator:
suggestions=[suggestion], suggestions=[suggestion],
element_gaps=gaps, element_gaps=gaps,
) )
async def stream_generate(
self,
req: RecipeRequest,
profiles: list,
gaps: list[str],
) -> AsyncGenerator[str, None]:
"""Stream LLM tokens for L3/L4. Yields raw text chunks as they arrive.
Tries cf-orch warm vllm first; falls back to Ollama via AsyncOpenAI.
When neither is reachable, falls back to blocking _call_llm and yields
the complete response as a single chunk so the caller always gets output.
"""
if req.level == 4:
prompt = self.build_level4_prompt(req)
else:
prompt = self.build_level3_prompt(req, profiles, gaps)
# Phase 1: try cf-orch warm vllm (sync allocation, wrapped in thread)
alloc_info = await asyncio.to_thread(self._try_alloc_for_stream)
if alloc_info is not None:
alloc, ctx = alloc_info
try:
async for token in self._stream_openai_compat(
alloc.url.rstrip("/") + "/v1", "any", alloc.model or "__auto__", prompt
):
yield token
return
except Exception as exc:
logger.debug("cf-orch stream failed, falling back to Ollama: %s", exc)
finally:
await asyncio.to_thread(lambda: _safe_exit(ctx))
# Phase 2: Ollama streaming via OpenAI-compat API
from circuitforge_core.llm.router import LLMRouter
router = LLMRouter()
ollama = router.config.get("backends", {}).get("ollama")
if ollama and ollama.get("enabled", True):
base_url = ollama["base_url"]
model = ollama.get("model", "llama3")
try:
async for token in self._stream_openai_compat(base_url, "any", model, prompt):
yield token
return
except Exception as exc:
logger.warning("Ollama streaming failed, falling back to blocking: %s", exc)
# Phase 3: blocking fallback — yields full response at once
result = await asyncio.to_thread(self._call_llm, prompt)
if result:
yield result
def _try_alloc_for_stream(self):
"""Attempt cf-orch allocation synchronously; return (alloc, ctx) or None."""
ctx = self._get_llm_context()
try:
alloc = ctx.__enter__()
if alloc is not None and alloc.warm:
return alloc, ctx
# Not warm — release and signal fallback
_safe_exit(ctx)
except Exception as exc:
logger.debug("cf-orch alloc for stream failed: %s", exc)
return None
@staticmethod
async def _stream_openai_compat(
base_url: str, api_key: str, model: str, prompt: str
) -> AsyncGenerator[str, None]:
client = AsyncOpenAI(base_url=base_url, api_key=api_key)
if model == "__auto__":
models = await client.models.list()
model = models.data[0].id
stream = await client.chat.completions.create(
model=model,
messages=[{"role": "user", "content": prompt}],
stream=True,
)
async for chunk in stream:
if chunk.choices and chunk.choices[0].delta.content:
yield chunk.choices[0].delta.content
def _safe_exit(ctx) -> None:
try:
ctx.__exit__(None, None, None)
except Exception:
pass
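
The three-phase fallback in stream_generate (streaming backend, second streaming backend, then a blocking call emitted as one chunk) can be sketched generically. The names `stream_with_fallback`, `_broken`, and `_working` are hypothetical, and the backends here are toy async generators rather than AsyncOpenAI clients.

```python
import asyncio
from typing import AsyncGenerator, Callable, Iterable


async def stream_with_fallback(
    streamers: Iterable[Callable[[], AsyncGenerator[str, None]]],
    blocking_call: Callable[[], str],
) -> AsyncGenerator[str, None]:
    """Yield chunks from the first streamer that completes; else one blocking chunk."""
    for make_stream in streamers:
        try:
            async for chunk in make_stream():
                yield chunk
            return  # stream finished cleanly, so skip the remaining phases
        except Exception:
            continue  # try the next backend
    # Last resort: run the blocking call off the event loop, emit it whole.
    result = await asyncio.to_thread(blocking_call)
    if result:
        yield result


async def _broken() -> AsyncGenerator[str, None]:
    raise ConnectionError("service unreachable")
    yield ""  # unreachable; present only to make this an async generator


async def _working() -> AsyncGenerator[str, None]:
    for tok in ("Hello", ", ", "world"):
        yield tok


async def _collect(gen: AsyncGenerator[str, None]) -> list[str]:
    return [c async for c in gen]


chunks = asyncio.run(_collect(stream_with_fallback([_broken, _working], lambda: "unused")))
blocking_only = asyncio.run(_collect(stream_with_fallback([_broken], lambda: "full response")))
```

In this sketch a backend that fails after already emitting some chunks would be followed by a fresh response from the next phase, so a consumer may see partial text followed by a restart; whether that matters depends on how the caller renders the stream.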


@ -20,7 +20,7 @@ from typing import TYPE_CHECKING
if TYPE_CHECKING: if TYPE_CHECKING:
from app.db.store import Store from app.db.store import Store
from app.models.schemas.recipe import GroceryLink, NutritionPanel, RecipeRequest, RecipeResult, RecipeSuggestion, SwapCandidate from app.models.schemas.recipe import GroceryLink, NutritionPanel, RecipeRequest, RecipeResult, RecipeSuggestion, StepAnalysis, TimeEffortProfile, SwapCandidate
from app.services.recipe.element_classifier import ElementClassifier from app.services.recipe.element_classifier import ElementClassifier
from app.services.recipe.grocery_links import GroceryLinkBuilder from app.services.recipe.grocery_links import GroceryLinkBuilder
from app.services.recipe.substitution_engine import SubstitutionEngine from app.services.recipe.substitution_engine import SubstitutionEngine
@ -36,6 +36,38 @@ _SWAP_STOPWORDS = frozenset({
"to", "from", "at", "by", "as", "on", "to", "from", "at", "by", "as", "on",
}) })
# Marketing / prep / packaging words stripped when tokenising product-label names
# into individual ingredient tokens. Parallel to Store._FTS_TOKEN_STOPWORDS —
# both lists should agree. Kept here to avoid a circular import at runtime.
_PRODUCT_TOKEN_STOPWORDS = frozenset({
# Basic English stopwords
"a", "an", "the", "of", "in", "for", "with", "and", "or", "to",
"from", "at", "by", "as", "on", "into",
# Brand / marketing words that appear in product names
"lean", "cuisine", "healthy", "choice", "stouffer", "original",
"classic", "deluxe", "homestyle", "family", "style", "grade",
"premium", "select", "natural", "organic", "fresh", "lite",
"ready", "quick", "easy", "instant", "microwave", "frozen",
"brand", "size", "large", "small", "medium", "extra",
# Plant-based / alt-meat brand names
"daring", "gardein", "morningstar", "lightlife", "tofurky",
"quorn", "omni", "nuggs", "simulate",
# Preparation states
"cut", "diced", "sliced", "chopped", "minced", "shredded",
"cooked", "raw", "whole", "boneless", "skinless", "trimmed",
"pre", "prepared", "marinated", "seasoned", "breaded", "battered",
"grilled", "roasted", "smoked", "canned", "dried", "dehydrated",
"pieces", "piece", "strips", "strip", "chunks", "chunk",
"fillets", "fillet", "cutlets", "cutlet", "tenders", "nuggets",
# Units / packaging
"oz", "lb", "lbs", "pkg", "pack", "box", "can", "bag", "jar",
# Adjectives that aren't ingredients
"firm", "soft", "silken", "hard", "crispy", "crunchy", "smooth",
"mild", "spicy", "hot", "sweet", "savory", "unsalted", "salted",
"low", "high", "reduced", "free", "fat", "sodium", "sugar", "calorie",
"dairy", "gluten", "vegan", "plant", "based", "free",
})
# Maps product-label substrings to recipe-corpus canonical terms. # Maps product-label substrings to recipe-corpus canonical terms.
# Kept in sync with Store._FTS_SYNONYMS — both must agree on canonical names. # Kept in sync with Store._FTS_SYNONYMS — both must agree on canonical names.
# Used to expand pantry_set so single-word recipe ingredients can match # Used to expand pantry_set so single-word recipe ingredients can match
@ -363,6 +395,13 @@ def _expand_pantry_set(
if pattern in lower: if pattern in lower:
expanded.add(canonical) expanded.add(canonical)
# Extract individual ingredient tokens from multi-word product names.
# "Organic Extra Firm Tofu" → adds "tofu"; "Brown Basmati Rice" → adds "rice".
# This catches plain ingredients that _PANTRY_LABEL_SYNONYMS doesn't translate.
for token in lower.split():
if len(token) >= 4 and token not in _PRODUCT_TOKEN_STOPWORDS:
expanded.add(token)
# Secondary state expansion — adds terms like "stale bread", "day-old rice" # Secondary state expansion — adds terms like "stale bread", "day-old rice"
if secondary_pantry_items and item in secondary_pantry_items: if secondary_pantry_items and item in secondary_pantry_items:
state_label = secondary_pantry_items[item] state_label = secondary_pantry_items[item]
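
The token-extraction step added to _expand_pantry_set can be tried on its own. The sketch below uses a small subset of the stopword set from this diff; the helper name `ingredient_tokens` is hypothetical.

```python
# Subset of _PRODUCT_TOKEN_STOPWORDS, enough for the examples below.
STOPWORDS = frozenset({"organic", "extra", "firm", "fresh", "frozen", "sliced"})


def ingredient_tokens(
    product_name: str,
    stopwords: frozenset = STOPWORDS,
    min_len: int = 4,
) -> set[str]:
    """Split a product label on whitespace and keep plausible ingredient words."""
    return {
        token
        for token in product_name.lower().split()
        if len(token) >= min_len and token not in stopwords
    }


tofu = ingredient_tokens("Organic Extra Firm Tofu")
rice = ingredient_tokens("Brown Basmati Rice")
short = ingredient_tokens("Fresh Egg")
```

Two properties of this approach are worth noting. Words that are not stopwords are all kept, so "Brown Basmati Rice" contributes "brown" and "basmati" as well as "rice". The four-character minimum also drops short ingredient names such as "egg", which then depend on the synonym map instead.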
@ -736,9 +775,13 @@ class RecipeEngine:
# - match ratio: require ≥60% ingredient coverage to avoid low-signal results # - match ratio: require ≥60% ingredient coverage to avoid low-signal results
_l1 = req.level == 1 and not req.shopping_mode _l1 = req.level == 1 and not req.shopping_mode
nf = req.nutrition_filters nf = req.nutrition_filters
# L1 uses a larger candidate pool — the ratio gate below will prune
# aggressively anyway, so we need more raw candidates to end up with
# enough results for a packaged-food / plant-based pantry.
_fts_limit = 60 if _l1 else 20
rows = self._store.search_recipes_by_ingredients( rows = self._store.search_recipes_by_ingredients(
req.pantry_items, req.pantry_items,
limit=20, limit=_fts_limit,
category=req.category or None, category=req.category or None,
max_calories=nf.max_calories, max_calories=nf.max_calories,
max_sugar_g=nf.max_sugar_g, max_sugar_g=nf.max_sugar_g,
@ -749,8 +792,11 @@ class RecipeEngine:
) )
# L1 strict defaults: cap missing ingredients and require a minimum ratio. # L1 strict defaults: cap missing ingredients and require a minimum ratio.
# 0.35 allows ~1/3 ingredient coverage — low enough for packaged/plant-based
# pantries that rarely match raw-ingredient corpus recipes 1:1, but still
# filters out recipes where only one common staple matched.
_L1_MAX_MISSING_DEFAULT = 2 _L1_MAX_MISSING_DEFAULT = 2
_L1_MIN_MATCH_RATIO = 0.6 _L1_MIN_MATCH_RATIO = 0.35
effective_max_missing = req.max_missing effective_max_missing = req.max_missing
if _l1 and effective_max_missing is None: if _l1 and effective_max_missing is None:
effective_max_missing = _L1_MAX_MISSING_DEFAULT effective_max_missing = _L1_MAX_MISSING_DEFAULT
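
The effect of the two L1 thresholds can be illustrated with a small gate function. This is a sketch only: the diff does not show how the match ratio is computed, so the formula matched / (matched + missing) and the function name `passes_l1_gate` are assumptions.

```python
from typing import Optional

L1_MAX_MISSING_DEFAULT = 2
L1_MIN_MATCH_RATIO = 0.35


def passes_l1_gate(matched: int, missing: int, max_missing: Optional[int] = None) -> bool:
    """Apply the missing-ingredient cap, then the minimum coverage ratio."""
    total = matched + missing
    if total == 0:
        return False  # nothing to score
    effective_max = L1_MAX_MISSING_DEFAULT if max_missing is None else max_missing
    if missing > effective_max:
        return False
    # Assumed ratio definition: share of recipe ingredients found in the pantry.
    return matched / total >= L1_MIN_MATCH_RATIO


two_of_four = passes_l1_gate(matched=2, missing=2)      # ratio 0.50
one_of_three = passes_l1_gate(matched=1, missing=2)     # ratio ~0.33
too_many_missing = passes_l1_gate(matched=5, missing=3)
explicit_cap = passes_l1_gate(matched=5, missing=3, max_missing=3)
```

Under this assumed formula, a recipe with one matched and two missing ingredients sits just below 0.35 and is rejected, which is consistent with the comment about filtering out recipes where only one common staple matched.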
@ -834,9 +880,14 @@ class RecipeEngine:
except Exception: except Exception:
directions = [directions] directions = [directions]
# Compute complexity for every suggestion (used for badge + filter). # Compute complexity + parse time effort once — reused for filters and response.
row_complexity = _classify_method_complexity(directions, available_equipment) row_complexity = _classify_method_complexity(directions, available_equipment)
row_time_min = _estimate_time_min(directions, row_complexity) row_time_min = _estimate_time_min(directions, row_complexity)
row_time_effort = parse_time_effort(
directions,
ingredients=row.get("ingredients") or [],
ingredient_names=row.get("ingredient_names") or [],
)
# Filter and tier-rank by hard_day_mode # Filter and tier-rank by hard_day_mode
if req.hard_day_mode: if req.hard_day_mode:
@ -856,9 +907,24 @@ class RecipeEngine:
if req.max_time_min is not None and row_time_min > req.max_time_min: if req.max_time_min is not None and row_time_min > req.max_time_min:
continue continue
# Total time filter (kiwi#52) — uses parsed time from directions # Total time filter (kiwi#52).
if req.max_total_min is not None and not _within_time(directions, req.max_total_min): # Prefer parsed time extracted from direction text (explicit "15 minutes" mentions).
continue # When directions contain no parseable time signals, fall back to the
# step-count estimate so the filter still has teeth on the corpus majority.
if req.max_total_min is not None:
if row_time_effort.total_min > 0:
if row_time_effort.total_min > req.max_total_min:
continue
elif row_time_min > req.max_total_min:
continue
# Active (hands-on) time filter — independent of total time.
# Lets users request "≤30 min hands-on, any total" to include slow braises.
# Skips recipes where active_min == 0 (no time signals parsed) to avoid
# hiding valid results when the parser couldn't extract timing.
if req.max_active_min is not None and row_time_effort.active_min > 0:
if row_time_effort.active_min > req.max_active_min:
continue
# Level 2: also add dietary constraint swaps from substitution_pairs # Level 2: also add dietary constraint swaps from substitution_pairs
if req.level == 2 and req.constraints: if req.level == 2 and req.constraints:
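
The total-time and active-time filters above can be condensed into one predicate. The sketch mirrors the branch logic in this hunk; `TimeEffort` is a reduced, hypothetical stand-in for the parsed profile, and `passes_time_filters` is not a function in the repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimeEffort:
    active_min: int  # hands-on minutes parsed from directions (0 = none found)
    total_min: int   # total minutes parsed from directions (0 = none found)


def passes_time_filters(
    effort: TimeEffort,
    estimated_min: int,
    max_total_min: Optional[int] = None,
    max_active_min: Optional[int] = None,
) -> bool:
    """Return True when a recipe survives both optional time filters."""
    if max_total_min is not None:
        # Prefer parsed total time; fall back to the step-count estimate.
        total = effort.total_min if effort.total_min > 0 else estimated_min
        if total > max_total_min:
            return False
    # Active filter applies only when the parser found timing signals.
    if max_active_min is not None and effort.active_min > 0:
        if effort.active_min > max_active_min:
            return False
    return True


braise = TimeEffort(active_min=20, total_min=180)
no_signal = TimeEffort(active_min=0, total_min=0)

braise_active_ok = passes_time_filters(braise, 45, max_active_min=30)
braise_total_ok = passes_time_filters(braise, 45, max_total_min=60)
fallback_total_ok = passes_time_filters(no_signal, 45, max_total_min=30)
unparsed_active_ok = passes_time_filters(no_signal, 45, max_active_min=10)
```

The asymmetry is deliberate in the original: the total filter always has a value to compare against, while the active filter lets unparsed recipes through rather than hiding them.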
@ -897,6 +963,21 @@ class RecipeEngine:
v is not None v is not None
for v in (nutrition.calories, nutrition.sugar_g, nutrition.carbs_g) for v in (nutrition.calories, nutrition.sugar_g, nutrition.carbs_g)
) )
te = TimeEffortProfile(
active_min=row_time_effort.active_min,
passive_min=row_time_effort.passive_min,
total_min=row_time_effort.total_min,
effort_label=row_time_effort.effort_label,
equipment=list(row_time_effort.equipment),
step_analyses=[
StepAnalysis(
is_passive=sa.is_passive,
detected_minutes=sa.detected_minutes,
prep_min=sa.prep_min,
)
for sa in row_time_effort.step_analyses
],
)
suggestions.append(RecipeSuggestion( suggestions.append(RecipeSuggestion(
id=row["id"], id=row["id"],
title=row["title"], title=row["title"],
@ -905,12 +986,14 @@ class RecipeEngine:
swap_candidates=swap_candidates, swap_candidates=swap_candidates,
matched_ingredients=matched, matched_ingredients=matched,
missing_ingredients=missing, missing_ingredients=missing,
directions=directions,
prep_notes=sorted(prep_note_set), prep_notes=sorted(prep_note_set),
level=req.level, level=req.level,
nutrition=nutrition if has_nutrition else None, nutrition=nutrition if has_nutrition else None,
source_url=_build_source_url(row), source_url=_build_source_url(row),
complexity=row_complexity, complexity=row_complexity,
estimated_time_min=row_time_min, estimated_time_min=row_time_min,
time_effort=te,
)) ))
# Sort corpus results. # Sort corpus results.

View file

@ -0,0 +1,524 @@
"""Recipe scanner service (kiwi#9).
Extracts structured recipe data from one or more photos of recipe cards,
cookbook pages, or handwritten notes.
Pipeline:
photo(s) -> EXIF correction -> VLM extraction -> JSON parse -> pantry cross-ref
Vision backend priority (mirrors receipt OCR pattern):
1. cf-orch vision service (if CF_ORCH_URL set)
2. Local Qwen2.5-VL (if GPU available)
3. Anthropic API (BYOK -- if ANTHROPIC_API_KEY set)
BSL 1.1 -- requires Paid tier or BYOK.
"""
from __future__ import annotations
import base64
import io
import json
import logging
import os
import re
from collections.abc import Callable
from dataclasses import dataclass
from pathlib import Path
logger = logging.getLogger(__name__)
# Maximum number of photos per scan call (to limit VLM context / VRAM)
MAX_IMAGES = 4
# VLM prompt -- adapted from tests/fixtures/recipe_scan/extract_test.py
_EXTRACTION_PROMPT = """
You are extracting a recipe from a photograph of a recipe card, cookbook page, or handwritten note.
If two or more images are provided, treat them as a single recipe across multiple pages
(e.g. ingredients on page 1, directions on page 2).
Return a single JSON object with these fields:
- title: recipe name (string)
- subtitle: any secondary title or serving suggestion e.g. "with Broccoli & Ranch Dressing" (string or null)
- servings: serving size if shown, as a string e.g. "2", "4-6" (string or null)
- cook_time: total cook time if shown, e.g. "15 min", "1 hour" (string or null)
- source_note: any attribution text like "From Betty Crocker" or "Purple Carrot" (string or null)
- ingredients: array of ingredient objects, each with:
- name: normalized generic ingredient name, lowercase, no quantities, no brand names
(e.g. "Follow Your Heart Vegan Ranch" becomes "ranch dressing")
- qty: quantity as a string, preserving fractions e.g. "1/2", a quarter symbol (string or null)
- unit: unit of measure, null for countable items (e.g. "3 eggs" has unit: null)
- raw: the original ingredient line verbatim, exactly as it appears
- steps: ordered array of instruction strings, one distinct step per element
- notes: any tips, substitutions, storage instructions, or variations (string or null)
- confidence: "high" if text is clear and complete, "medium" if some parts are uncertain,
"low" if mostly handwritten or significantly degraded
- warnings: array of strings describing anything the user should double-check
(e.g. "Directions appear to continue on another page not shown")
Return only valid JSON. No markdown fences. No explanation outside the JSON.
If the image does not appear to be a recipe at all, return: {"error": "not_a_recipe"}
""".strip()
# ── Data types ─────────────────────────────────────────────────────────────────
@dataclass
class ScannedIngredient:
name: str
qty: str | None = None
unit: str | None = None
raw: str | None = None
in_pantry: bool = False
@dataclass
class ScannedRecipeResult:
title: str | None
subtitle: str | None
servings: str | None
cook_time: str | None
source_note: str | None
ingredients: list[ScannedIngredient]
steps: list[str]
notes: str | None
tags: list[str]
pantry_match_pct: int
confidence: str
warnings: list[str]
# ── Image helpers ──────────────────────────────────────────────────────────────
def _load_image_b64(path: Path) -> str:
"""Load image, apply EXIF rotation, return base64-encoded JPEG bytes."""
from PIL import Image, ImageOps
with open(path, "rb") as f:
raw = f.read()
img = Image.open(io.BytesIO(raw))
img = ImageOps.exif_transpose(img).convert("RGB")
buf = io.BytesIO()
img.save(buf, format="JPEG", quality=90)
return base64.b64encode(buf.getvalue()).decode()
# ── Vision backend ─────────────────────────────────────────────────────────────
def _call_via_anthropic(image_paths: list[Path], prompt: str) -> str:
"""Send image(s) + prompt to Anthropic API. Raises RuntimeError if unavailable."""
try:
import anthropic
except ImportError as exc:
raise RuntimeError("anthropic package not installed") from exc
api_key = os.environ.get("ANTHROPIC_API_KEY")
if not api_key:
raise RuntimeError("ANTHROPIC_API_KEY not set")
client = anthropic.Anthropic(api_key=api_key)
content: list[dict] = []
for i, path in enumerate(image_paths):
if i > 0:
content.append({"type": "text", "text": f"(Page {i + 1} of the same recipe:)"})
content.append({
"type": "image",
"source": {
"type": "base64",
"media_type": "image/jpeg",
"data": _load_image_b64(path),
},
})
content.append({"type": "text", "text": prompt})
msg = client.messages.create(
# Haiku is cost-efficient for well-structured extraction prompts
model="claude-haiku-4-5-20251001",
max_tokens=2048,
messages=[{"role": "user", "content": content}],
)
return msg.content[0].text.strip()
def _call_via_local_vlm(image_paths: list[Path], prompt: str) -> str:
"""Send image(s) + prompt to local Qwen2.5-VL. Raises RuntimeError if unavailable."""
try:
import torch
except ImportError as exc:
raise RuntimeError("torch not installed") from exc
if not torch.cuda.is_available():
raise RuntimeError("No CUDA device -- local VLM unavailable")
# Lazy import so the module loads fast when GPU is absent
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from PIL import Image, ImageOps
model_name = "Qwen/Qwen2.5-VL-7B-Instruct"
logger.info("Loading local VLM for recipe scan: %s", model_name)
model = Qwen2VLForConditionalGeneration.from_pretrained(
model_name,
torch_dtype=torch.float16,
device_map="auto",
low_cpu_mem_usage=True,
)
processor = AutoProcessor.from_pretrained(model_name)
model.train(False) # inference mode
images = []
for path in image_paths:
with open(path, "rb") as f:
raw = f.read()
img = Image.open(io.BytesIO(raw))
img = ImageOps.exif_transpose(img).convert("RGB")
images.append(img)
inputs = processor(images=images, text=prompt, return_tensors="pt")
inputs = {k: v.to("cuda", torch.float16) if isinstance(v, torch.Tensor) else v
for k, v in inputs.items()}
with torch.no_grad():
output_ids = model.generate(
**inputs,
max_new_tokens=2048,
do_sample=False,
temperature=0.0,
)
output = processor.decode(output_ids[0], skip_special_tokens=True)
output = output.replace(prompt, "").strip()
# Free VRAM
del model
torch.cuda.empty_cache()
return output
def _build_ocr_extraction_prompt(ocr_text: str) -> str:
"""Build a text-LLM prompt for structuring OCR output into recipe JSON.
Swaps the image-centric preamble of _EXTRACTION_PROMPT for an OCR-centric
one, then appends the combined OCR text as input. The JSON schema section
is shared verbatim to keep the two paths in sync.
"""
schema_idx = _EXTRACTION_PROMPT.find("Return a single JSON object")
schema_part = _EXTRACTION_PROMPT[schema_idx:] if schema_idx != -1 else _EXTRACTION_PROMPT
return (
"You are extracting a recipe from OCR text taken from a recipe card, "
"cookbook page, or handwritten note.\n\n"
"The text below was obtained via optical character recognition and may "
"contain minor scanning artifacts or formatting irregularities.\n\n"
f"{schema_part}\n\nOCR Text:\n{ocr_text}"
)
def _call_via_cf_text_vlm(alloc_url: str, image_paths: list[Path], prompt: str) -> str:
"""Call the cf-text OpenAI-compat API with images via the llama.cpp multimodal backend."""
import httpx
content: list[dict] = []
for i, path in enumerate(image_paths):
if i > 0:
content.append({"type": "text", "text": f"(Page {i + 1} of the same recipe:)"})
b64 = _load_image_b64(path)
content.append({
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{b64}"},
})
content.append({"type": "text", "text": prompt})
resp = httpx.post(
f"{alloc_url.rstrip('/')}/v1/chat/completions",
json={
"model": "local",
"messages": [{"role": "user", "content": content}],
"max_tokens": 2048,
"temperature": 0.0,
},
timeout=180.0,
)
resp.raise_for_status()
return resp.json()["choices"][0]["message"]["content"].strip()
def _call_vision_backend(
image_paths: list[Path],
prompt: str,
progress_cb: "Callable[[str, str], None] | None" = None,
) -> str:
"""Dispatch to the best available vision backend.
Priority: cf-orch (Qwen2-VL GGUF via cf-text) -> local Qwen2.5-VL -> Anthropic API.
Raises RuntimeError with a clear message when no backend is available.
Args:
image_paths: Images to process.
prompt: Extraction prompt (used by local VLM / Anthropic paths).
progress_cb: Optional callback(status, message) for SSE progress events.
Called synchronously from the worker thread; the caller bridges to async.
"""
def _progress(status: str, message: str) -> None:
if progress_cb:
progress_cb(status, message)
errors: list[str] = []
# 1. Try cf-orch task allocation → cf-docuvision (Qwen2-VL GGUF via llama.cpp).
# Two-step: docuvision OCRs the image(s), then LLMRouter structures the text into JSON.
cf_orch_url = os.environ.get("CF_ORCH_URL")
if cf_orch_url:
try:
from app.services.task_inference import TaskNotRegistered, task_allocate
from app.services.ocr.docuvision_client import DocuvisionClient
from circuitforge_core.llm.router import LLMRouter
try:
_progress("allocating", "Starting vision service...")
with task_allocate("kiwi", "recipe_scan", service_hint="cf-docuvision", ttl_s=120.0) as alloc:
_progress("scanning", "Extracting recipe text from photo...")
doc_client = DocuvisionClient(alloc.url)
ocr_parts: list[str] = []
for i, path in enumerate(image_paths):
result = doc_client.extract_text(path, hint="text")
prefix = f"(Page {i + 1} of the same recipe)\n" if len(image_paths) > 1 else ""
ocr_parts.append(f"{prefix}{result.text}")
combined_ocr = "\n\n".join(ocr_parts)
if not combined_ocr.strip():
raise ValueError("Docuvision returned no text — image may not be a recipe")
_progress("structuring", "Parsing recipe structure...")
text = LLMRouter().complete(
_build_ocr_extraction_prompt(combined_ocr),
system="You are a recipe data extractor. Return ONLY valid JSON. No markdown, no explanation, no code fences.",
)
if text:
return text
except TaskNotRegistered:
logger.debug("kiwi.recipe_scan not yet registered in cf-orch assignments")
except Exception as exc:
logger.debug("cf-orch vision failed for recipe scan: %s", exc)
errors.append(f"cf-orch: {exc}")
# 2. Try local Qwen2.5-VL
try:
return _call_via_local_vlm(image_paths, prompt)
except Exception as exc:
logger.debug("Local VLM unavailable for recipe scan: %s", exc)
errors.append(f"local VLM: {exc}")
# 3. Try Anthropic API (BYOK)
try:
return _call_via_anthropic(image_paths, prompt)
except Exception as exc:
logger.debug("Anthropic API failed for recipe scan: %s", exc)
errors.append(f"Anthropic: {exc}")
raise RuntimeError(
"No vision backend configured for recipe scanning. "
"Options: cf-orch (CF_ORCH_URL), local GPU, or ANTHROPIC_API_KEY (BYOK). "
f"Errors: {'; '.join(errors)}"
)
# ── Parsing helpers ────────────────────────────────────────────────────────────
def _normalize_ingredient_name(name: str) -> str:
"""Lowercase + strip whitespace. Preserves multi-word names as-is."""
return name.lower().strip()
def _extract_json_object(text: str) -> str | None:
    """Return the first balanced JSON object from text, or None if not found.

    Uses brace-counting rather than a greedy regex so trailing prose and
    nested objects are handled correctly.
    """
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    in_string = False
    escape_next = False
    for i, ch in enumerate(text[start:], start):
        if escape_next:
            escape_next = False
            continue
        if ch == "\\" and in_string:
            escape_next = True
            continue
        if ch == '"':
            in_string = not in_string
            continue
        if in_string:
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start : i + 1]
    return None
def _parse_scanner_json(raw_text: str) -> dict:
"""Extract and return the JSON dict from VLM output.
Handles:
- Pure JSON
- JSON in ```json ... ``` markdown fences
- Qwen3-style <think>...</think> or <thinking>...</thinking> preambles
- JSON preceded or followed by prose
Raises ValueError on not_a_recipe or unparseable output.
"""
text = raw_text.strip()
# Strip thinking-token blocks emitted by reasoning models (Qwen3, DeepSeek-R1, etc.)
text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL | re.IGNORECASE).strip()
text = re.sub(r"<thinking>.*?</thinking>", "", text, flags=re.DOTALL | re.IGNORECASE).strip()
# Strip markdown fences if present
if "```" in text:
# Find the content between the first ``` pair
fence_match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
if fence_match:
text = fence_match.group(1).strip()
# Try direct parse
try:
data = json.loads(text)
except json.JSONDecodeError:
# Fall back to brace-balanced extraction from anywhere in the output
candidate = _extract_json_object(text)
if not candidate:
logger.warning("Could not parse JSON from LLM output (first 400 chars): %r", text[:400])
raise ValueError(f"Could not parse JSON from VLM output: {text[:200]!r}")
try:
data = json.loads(candidate)
except json.JSONDecodeError as exc:
logger.warning("Brace-extracted JSON still invalid: %r", candidate[:400])
raise ValueError(f"Could not parse JSON from VLM output: {exc}") from exc
if isinstance(data, dict) and data.get("error") == "not_a_recipe":
raise ValueError("not_a_recipe: image does not appear to contain a recipe")
return data
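
To illustrate why the parsing helpers count braces instead of using a greedy regex, here is a small self-contained sketch. `extract_first_json_object` is a hypothetical standalone copy of the brace-counting idea, and the `noisy` string is invented sample output, not real model output.

```python
import json
import re
from typing import Optional


def extract_first_json_object(text: str) -> Optional[str]:
    """Return the first brace-balanced object in text, ignoring braces inside strings."""
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    in_string = False
    escape_next = False
    for i, ch in enumerate(text[start:], start):
        if escape_next:
            escape_next = False
            continue
        if ch == "\\" and in_string:
            escape_next = True
            continue
        if ch == '"':
            in_string = not in_string
            continue
        if in_string:
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start : i + 1]
    return None


noisy = 'Here is the recipe: {"title": "Chili {hot}", "meta": {"serves": 4}} Enjoy! {not json}'

candidate = extract_first_json_object(noisy)
# A greedy regex runs to the LAST closing brace and swallows the trailing prose.
greedy = re.search(r"\{.*\}", noisy, re.DOTALL).group(0)
```

Here `candidate` parses with `json.loads`, while `greedy` also captures `Enjoy! {not json}` and would fail to parse.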
# ── Pantry cross-reference ─────────────────────────────────────────────────────
def _cross_reference_pantry(
ingredients: list[ScannedIngredient],
pantry_names: list[str],
) -> tuple[list[ScannedIngredient], int]:
"""Mark ingredients found in the pantry and return updated list + match percent.
Matching is bidirectional by token:
- "broccoli florets" matches pantry item "broccoli" (pantry token in ingredient)
- "pumpkin seeds" matches pantry "pumpkin seeds" (exact)
Returns (updated_ingredients, pantry_match_pct).
"""
if not ingredients:
return ingredients, 0
normalized_pantry = [_normalize_ingredient_name(p) for p in pantry_names]
updated: list[ScannedIngredient] = []
matched = 0
for ingr in ingredients:
norm_ingr = _normalize_ingredient_name(ingr.name)
in_pantry = any(
(p_tok in norm_ingr or norm_ingr in p_tok)
for p in normalized_pantry
for p_tok in p.split()
if len(p_tok) >= 4 # skip short stop-words like "of", "and", "the"
)
updated.append(ScannedIngredient(
name=ingr.name,
qty=ingr.qty,
unit=ingr.unit,
raw=ingr.raw,
in_pantry=in_pantry,
))
if in_pantry:
matched += 1
pct = round(matched / len(ingredients) * 100)
return updated, pct
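
The token rule used by the pantry cross-reference can be tried in isolation. The sketch below is a simplified stand-in: `in_pantry` and `match_percent` are hypothetical helper names that work on plain strings rather than `ScannedIngredient` objects. It also shows two consequences of substring matching with a four-character minimum token length.

```python
def in_pantry(ingredient: str, pantry: list[str], min_token_len: int = 4) -> bool:
    """True if any sufficiently long pantry token overlaps the ingredient name."""
    norm = ingredient.lower().strip()
    return any(
        tok in norm or norm in tok
        for item in pantry
        for tok in item.lower().strip().split()
        if len(tok) >= min_token_len  # short tokens are skipped entirely
    )


def match_percent(ingredients: list[str], pantry: list[str]) -> int:
    """Rounded percentage of ingredients that have a pantry match."""
    if not ingredients:
        return 0
    matched = sum(in_pantry(name, pantry) for name in ingredients)
    return round(matched / len(ingredients) * 100)


pct = match_percent(
    ["broccoli florets", "pumpkin seeds", "tahini"],
    ["broccoli", "pumpkin seeds"],
)

# Consequences of the rule worth knowing about:
short_token_skipped = in_pantry("olive oil", ["oil"])        # "oil" has only 3 characters
substring_hit = in_pantry("peppercorns", ["corn"])           # "corn" occurs inside "peppercorns"
```

Two of the three sample ingredients match, so `pct` rounds to 67. The last two lines show that a three-letter pantry item never matches, and that a pantry token can match inside a longer unrelated word.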
# ── Main scanner class ─────────────────────────────────────────────────────────
class RecipeScanner:
"""Stateless recipe scanner. One instance can be reused across requests."""
def scan(
self,
image_paths: list[Path],
pantry_names: list[str] | None = None,
progress_cb: Callable[[str, str], None] | None = None,
) -> ScannedRecipeResult:
"""Extract a structured recipe from one or more photos.
Args:
image_paths: 1-4 image files (phone photos, scans).
pantry_names: Flat list of product names from user's inventory.
Pass [] or None to skip pantry cross-reference.
Returns:
ScannedRecipeResult with all fields populated.
Raises:
ValueError: Image is not a recipe, or JSON could not be parsed.
RuntimeError: No vision backend is configured.
"""
if not image_paths:
raise ValueError("At least one image is required")
if len(image_paths) > MAX_IMAGES:
raise ValueError(f"Maximum {MAX_IMAGES} images per scan (got {len(image_paths)})")
# Call vision backend
raw_text = _call_vision_backend(image_paths, _EXTRACTION_PROMPT, progress_cb=progress_cb)
# Parse JSON from VLM output
data = _parse_scanner_json(raw_text)
# Build ingredient list
raw_ingredients = data.get("ingredients") or []
ingredients: list[ScannedIngredient] = [
ScannedIngredient(
name=str(item.get("name") or "").strip() or "unknown",
qty=str(item["qty"]) if item.get("qty") is not None else None,
unit=str(item["unit"]) if item.get("unit") is not None else None,
raw=str(item["raw"]) if item.get("raw") is not None else None,
)
for item in raw_ingredients
if isinstance(item, dict)
]
# Pantry cross-reference
ingredients, pct = _cross_reference_pantry(
ingredients,
pantry_names or [],
)
return ScannedRecipeResult(
title=data.get("title") or None,
subtitle=data.get("subtitle") or None,
servings=str(data["servings"]) if data.get("servings") is not None else None,
cook_time=str(data["cook_time"]) if data.get("cook_time") is not None else None,
source_note=data.get("source_note") or None,
ingredients=ingredients,
steps=[str(s) for s in (data.get("steps") or []) if s],
notes=data.get("notes") or None,
tags=list(data.get("tags") or []),
pantry_match_pct=pct,
confidence=data.get("confidence") or "medium",
warnings=list(data.get("warnings") or []),
)


@ -0,0 +1,139 @@
# app/services/recipe/style_classifier.py
# BSL 1.1 — LLM feature
"""LLM style-tag classifier for saved recipes.
Reads recipe title, ingredients, and directions and suggests 3-5 style tags
from the curated vocabulary shared with SaveRecipeModal.vue.
Cloud (CF_ORCH_URL set): allocates a cf-text service via cf-orch (2 GB VRAM).
Local: falls back to LLMRouter (ollama / vllm / openai-compat).
"""
from __future__ import annotations
import json
import logging
import os
import re
from contextlib import nullcontext
from typing import Any
logger = logging.getLogger(__name__)
_SERVICE_TYPE = "cf-text"
_TTL_S = 60.0
_CALLER = "kiwi-style-classify"
# Canonical vocabulary — must stay in sync with SUGGESTED_TAGS in SaveRecipeModal.vue.
STYLE_TAG_VOCAB: frozenset[str] = frozenset({
"comforting", "light", "spicy", "umami", "sweet", "savory", "rich",
"crispy", "creamy", "hearty", "quick", "hands-off", "meal-prep-friendly",
"fancy", "one-pot",
})
_SYSTEM_PROMPT = """\
You are a culinary tagger. Given a recipe, suggest 3 to 5 style tags that best \
describe its character. You MUST only use tags from this list:
comforting, light, spicy, umami, sweet, savory, rich, crispy, creamy, hearty, \
quick, hands-off, meal-prep-friendly, fancy, one-pot
Return ONLY a JSON array of strings, no explanation. Example:
["comforting", "hearty", "one-pot"]
"""
def _build_router():
"""Return (router, context_manager) for style classify tasks.
Tries cf-orch cf-text allocation first; falls back to LLMRouter.
Returns (None, nullcontext) if no backend is available.
"""
cf_orch_url = os.environ.get("CF_ORCH_URL")
if cf_orch_url:
try:
from app.services.meal_plan.llm_router import _OrchTextRouter # reuse adapter
from circuitforge_orch.client import CFOrchClient
client = CFOrchClient(cf_orch_url)
ctx = client.allocate(service=_SERVICE_TYPE, ttl_s=_TTL_S, caller=_CALLER)
alloc = ctx.__enter__()
if alloc is not None:
return _OrchTextRouter(alloc.url), ctx
except Exception as exc:
logger.debug("cf-orch allocation failed for style classify, falling back: %s", exc)
try:
from circuitforge_core.llm.router import LLMRouter
return LLMRouter(), nullcontext(None)
except FileNotFoundError:
logger.debug("LLMRouter: no llm.yaml — style classifier LLM disabled")
return None, nullcontext(None)
except Exception as exc:
logger.debug("LLMRouter init failed: %s", exc)
return None, nullcontext(None)
def _parse_tags(raw: str) -> list[str]:
    """Extract valid vocab tags from raw LLM output.

    Tries JSON parse first; falls back to extracting any vocab word present
    in the response text so minor formatting deviations still work.
    """
    # Strip markdown fences
    raw = re.sub(r"```[a-z]*", "", raw).strip()
    try:
        parsed = json.loads(raw)
        if isinstance(parsed, list):
            return [t for t in parsed if isinstance(t, str) and t in STYLE_TAG_VOCAB][:5]
    except (json.JSONDecodeError, ValueError):
        pass
    # Fallback: scan for vocab words
    found = [t for t in STYLE_TAG_VOCAB if re.search(rf"\b{re.escape(t)}\b", raw, re.IGNORECASE)]
    return sorted(found, key=lambda t: raw.lower().index(t.lower()))[:5]
def classify_style(recipe: dict[str, Any]) -> list[str]:
"""Return 3-5 suggested style tags for *recipe*.
*recipe* is a Store row dict with at least ``title``, ``ingredient_names``
(list[str]), and ``directions`` (list[str] or str).
Returns an empty list if no LLM backend is available.
"""
router, ctx = _build_router()
if router is None:
return []
title = recipe.get("title") or "Unknown"
ingredients = recipe.get("ingredient_names") or []
if isinstance(ingredients, str):
try:
ingredients = json.loads(ingredients)
except Exception:
ingredients = [ingredients]
directions = recipe.get("directions") or []
if isinstance(directions, str):
try:
directions = json.loads(directions)
except Exception:
directions = [directions]
user_prompt = (
f"Recipe: {title}\n"
f"Ingredients: {', '.join(str(i) for i in ingredients[:20])}\n"
f"Steps: {' '.join(str(d) for d in directions[:8])[:600]}"
)
try:
with ctx:
raw = router.complete(
system=_SYSTEM_PROMPT,
user=user_prompt,
max_tokens=64,
temperature=0.3,
)
return _parse_tags(raw)
except Exception as exc:
logger.warning("Style classifier LLM call failed: %s", exc)
return []
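
The two-stage tag parsing (strict JSON first, then a vocabulary scan) can be exercised without an LLM. This sketch uses a deliberately tiny `VOCAB` and a hypothetical `parse_tags` helper; the two sample responses are invented for illustration.

```python
import json
import re

# Reduced vocabulary for the example; the real list is larger.
VOCAB = frozenset({"comforting", "hearty", "one-pot", "spicy", "quick"})


def parse_tags(raw: str, limit: int = 5) -> list[str]:
    """Keep only vocabulary tags, from JSON if possible, else by scanning the text."""
    cleaned = re.sub(r"```[a-z]*", "", raw).strip()  # drop markdown fence markers
    try:
        parsed = json.loads(cleaned)
        if isinstance(parsed, list):
            return [t for t in parsed if isinstance(t, str) and t in VOCAB][:limit]
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        pass
    # Fallback: whole-word scan, ordered by first appearance in the response.
    found = [t for t in VOCAB if re.search(rf"\b{re.escape(t)}\b", cleaned, re.IGNORECASE)]
    return sorted(found, key=lambda t: cleaned.lower().index(t.lower()))[:limit]


fenced = parse_tags('```json\n["hearty", "smoky", "one-pot"]\n```')
prose = parse_tags("Tags: Spicy and quick!")
```

In the fenced case the out-of-vocabulary tag `smoky` is dropped. In the prose case JSON parsing fails and the scan recovers the two vocabulary words in reading order.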


@ -22,6 +22,8 @@ queries find recipes the food.com corpus tags alone would miss.
""" """
from __future__ import annotations from __future__ import annotations
import re
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
# Text-signal tables # Text-signal tables
@ -121,6 +123,50 @@ _TIME_SIGNALS: list[tuple[str, list[str]]] = [
("time:Slow Cook", ["slow cooker", "crockpot", "< 4 hours", "braise"]), ("time:Slow Cook", ["slow cooker", "crockpot", "< 4 hours", "braise"]),
] ]
# ---------------------------------------------------------------------------
# Meal type signals — matched against TITLE ONLY (not ingredient text).
# Ingredient names frequently contain words like "cake flour" or "sandwich
# bread" which would produce false meal-type tags if matched against the full
# title+ingredient string.
# ---------------------------------------------------------------------------
_MEAL_SIGNALS: list[tuple[str, list[str]]] = [
("meal:Breakfast", [
"breakfast", "pancake", "waffle", "french toast", "scrambled egg",
"frittata", "hash brown", "hash browns", "breakfast burrito",
"breakfast sandwich", "breakfast casserole", "overnight oat",
"granola", "oatmeal", "muffin", "morning glory", "eggs benedict",
"shakshuka", "crepe", "scone",
]),
("meal:Dessert", [
"dessert", "cake", "cookie", "brownie", "cheesecake", "pudding",
"fudge", "ice cream", "sorbet", "cupcake", "mousse", "candy",
"truffle", "gelato", "donut", "doughnut", "cobbler", "crisp",
"crumble", "tiramisu", "eclair", "sundae", "milkshake", "parfait",
"biscotti", "macaron", "panna cotta", "baklava", "churro", "tart",
"torte", "strudel", "compote", "semifreddo",
]),
("meal:Snack", [
"snack", "appetizer", "dip", "chips", "popcorn", "trail mix",
"energy ball", "deviled egg", "cheese ball", "nachos",
"pretzel bites", "protein ball", "granola bar",
]),
("meal:Beverage", [
"smoothie", "cocktail", "mocktail", "lemonade", "limeade",
"margarita", "sangria", "punch", "milkshake", "milk shake",
"juice", "spritzer", "iced tea", "hot chocolate", "chai latte",
"mulled wine", "eggnog", "slushie", "frappe", "horchata",
"agua fresca", "shrub", "switchel",
]),
("meal:Lunch", [
"lunch", "sandwich", "panini", "grilled cheese", "wrap",
"lunchbox", "lunch box",
]),
("meal:Bread", [
"bread", "sourdough", "focaccia", "flatbread", "dinner roll",
"loaf", "baguette", "ciabatta", "brioche", "challah", "pita",
]),
]
_MAIN_INGREDIENT_SIGNALS: list[tuple[str, list[str]]] = [ _MAIN_INGREDIENT_SIGNALS: list[tuple[str, list[str]]] = [
("main:Chicken", ["chicken", "poultry", "turkey"]), ("main:Chicken", ["chicken", "poultry", "turkey"]),
("main:Beef", ["beef", "ground beef", "steak", "brisket", "pot roast"]), ("main:Beef", ["beef", "ground beef", "steak", "brisket", "pot roast"]),
@ -196,6 +242,29 @@ def _match_signals(text: str, table: list[tuple[str, list[str]]]) -> list[str]:
return [tag for tag, pats in table if any(p in text for p in pats)] return [tag for tag, pats in table if any(p in text for p in pats)]
def _match_title_signals(title: str, table: list[tuple[str, list[str]]]) -> list[str]:
"""Match signals against title text only, using word-boundary + optional plural.
Pattern: `\\bWORD(?:s|es)?\\b`
This handles:
- Plurals: "cookie" matches "cookies", "sandwich" matches "sandwiches"
- Substring rejection: "cake" does NOT match "pancake" (no word boundary
before 'c' in pan|cake), "tart" does NOT match "tartare" (after "tart"
the 'a' is a word char, not a boundary)
- Avoids false positives from ingredient text ("cake flour", "sandwich bread")
by only matching the recipe title, not the full title+ingredient string.
"""
t = title.lower()
return [
tag for tag, pats in table
if any(
re.search(r"\b" + re.escape(p.strip()) + r"(?:s|es)?\b", t)
for p in pats
)
]
def infer_tags( def infer_tags(
title: str, title: str,
ingredient_names: list[str], ingredient_names: list[str],
@ -258,6 +327,9 @@ def infer_tags(
tags.update(_match_signals(text, _FLAVOR_SIGNALS)) tags.update(_match_signals(text, _FLAVOR_SIGNALS))
tags.update(_match_signals(text, _MAIN_INGREDIENT_SIGNALS)) tags.update(_match_signals(text, _MAIN_INGREDIENT_SIGNALS))
# Meal type: title-only to avoid "cake flour" → meal:Dessert false positives
tags.update(_match_title_signals(title, _MEAL_SIGNALS))
# 3. Time signals from corpus keywords + text # 3. Time signals from corpus keywords + text
corpus_text = " ".join(kw.lower() for kw in corpus_keywords) corpus_text = " ".join(kw.lower() for kw in corpus_keywords)
tags.update(_match_signals(corpus_text, _TIME_SIGNALS)) tags.update(_match_signals(corpus_text, _TIME_SIGNALS))
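
The word-boundary-plus-plural rule described for `_match_title_signals` can be checked against a few titles. The sketch below uses a cut-down signal table and a hypothetical `match_title_signals` helper; the sample titles are made up.

```python
import re

# Cut-down table for illustration; the real _MEAL_SIGNALS table is much longer.
MEAL_SIGNALS: list[tuple[str, list[str]]] = [
    ("meal:Breakfast", ["pancake", "waffle"]),
    ("meal:Dessert", ["cake", "cookie", "tart"]),
    ("meal:Lunch", ["sandwich"]),
]


def match_title_signals(title: str, table: list[tuple[str, list[str]]]) -> list[str]:
    """Return tags whose patterns appear in the title as whole words (optionally plural)."""
    t = title.lower()
    return [
        tag
        for tag, pats in table
        if any(re.search(r"\b" + re.escape(p) + r"(?:s|es)?\b", t) for p in pats)
    ]


pancakes = match_title_signals("Blueberry Pancakes", MEAL_SIGNALS)
cookies = match_title_signals("Chocolate Chip Cookies", MEAL_SIGNALS)
tartare = match_title_signals("Steak Tartare", MEAL_SIGNALS)
sandwiches = match_title_signals("Club Sandwiches", MEAL_SIGNALS)
```

"Pancakes" is tagged as breakfast but not dessert, because `cake` has no word boundary before it inside `pancakes`. "Tartare" produces no tag at all, since `tart` is followed by another word character.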

View file

@ -1,17 +1,27 @@
""" """
Runtime parser for active/passive time split and equipment detection. Runtime parser for active/passive time split, prep effort, and equipment detection.
Operates over a list of direction strings. No I/O; pure Python functions. Operates over a list of direction strings plus an optional ingredient list.
Sub-millisecond for up to 20 recipes (20 × ~10 steps each = 200 regex calls). No I/O; pure Python functions. Sub-millisecond for up to 20 recipes.
Time estimation strategy (in priority order):
1. Explicit time mention in step text ("simmer for 20 minutes")
2. Passive keyword + per-technique default ("bake until golden" → 30 min)
3. Prep action + ingredient quantity scaling ("dice 2 lbs potatoes" → ~5 min)
4. Fallback active default (assembly/misc steps → 2 min each)
Quantity scaling uses n^0.75 (sub-linear, matching human batch-work curves).
Pass `ingredients` + `ingredient_names` to enable cross-referenced scaling.
Without them, prep actions use base times only (no scaling).
""" """
from __future__ import annotations from __future__ import annotations
import math import math
import re import re
from dataclasses import dataclass from dataclasses import dataclass, field
from typing import Final from typing import Final
# ── Passive step keywords (whole-word, case-insensitive) ────────────────── # ── Passive step keywords ─────────────────────────────────────────────────
_PASSIVE_PATTERNS: Final[list[str]] = [ _PASSIVE_PATTERNS: Final[list[str]] = [
"simmer", "bake", "roast", "broil", "refrigerate", "marinate", "simmer", "bake", "roast", "broil", "refrigerate", "marinate",
@ -20,19 +30,39 @@ _PASSIVE_PATTERNS: Final[list[str]] = [
r"slow\s+cook", r"pressure\s+cook", r"slow\s+cook", r"pressure\s+cook",
] ]
# Pre-compiled as a single alternation — avoids re-compiling on every call.
_PASSIVE_RE: re.Pattern[str] = re.compile( _PASSIVE_RE: re.Pattern[str] = re.compile(
r"\b(?:" + "|".join(_PASSIVE_PATTERNS) + r")\b", r"\b(?:" + "|".join(_PASSIVE_PATTERNS) + r")\b",
re.IGNORECASE, re.IGNORECASE,
) )
# ── Time extraction regex ───────────────────────────────────────────────── # Per-technique passive defaults (minutes) — used when no explicit time found.
# Calibrated to conservative midpoints from USDA FoodKeeper + culinary practice.
_PASSIVE_DEFAULTS: Final[list[tuple[re.Pattern[str], int]]] = [
# Multi-word first (longer match wins)
(re.compile(r"\bslow\s+cook\b", re.IGNORECASE), 300), # 5 hr crockpot default
(re.compile(r"\bpressure\s+cook\b", re.IGNORECASE), 15),
(re.compile(r"\bovernight\b", re.IGNORECASE), 480), # 8 hr
# Single-word
(re.compile(r"\bbraise\b", re.IGNORECASE), 90),
(re.compile(r"\bmarinate\b", re.IGNORECASE), 60),
(re.compile(r"\brefrigerate\b", re.IGNORECASE), 120),
(re.compile(r"\bproof\b|\brise\b", re.IGNORECASE), 60),
(re.compile(r"\bsoak\b", re.IGNORECASE), 30),
(re.compile(r"\bfreeze\b", re.IGNORECASE), 120),
(re.compile(r"\bchill\b", re.IGNORECASE), 60),
(re.compile(r"\broast\b", re.IGNORECASE), 40),
(re.compile(r"\bbake\b", re.IGNORECASE), 30),
(re.compile(r"\bbroil\b", re.IGNORECASE), 8),
(re.compile(r"\bsimmer\b", re.IGNORECASE), 20),
(re.compile(r"\bset\b", re.IGNORECASE), 30), # gelatin / custard set
(re.compile(r"\bsteep\b", re.IGNORECASE), 5),
(re.compile(r"\brest\b|\bstand\b", re.IGNORECASE), 10),
(re.compile(r"\bcool\b", re.IGNORECASE), 15),
(re.compile(r"\bwait\b|\blet\b", re.IGNORECASE), 5),
]
# ── Explicit time extraction ──────────────────────────────────────────────
# Two-branch pattern:
# Branch A (groups 1-3): range "15-20 minutes", "15–20 min"
# Branch B (groups 4-5): single "10 minutes", "2 hours", "30 sec"
#
# Separator characters: plain hyphen (-), en-dash (–), or literal "-to-"
_TIME_RE: re.Pattern[str] = re.compile( _TIME_RE: re.Pattern[str] = re.compile(
r"(\d+)\s*(?:[-\u2013]|-to-)\s*(\d+)\s*(hour|hr|minute|min|second|sec)s?" r"(\d+)\s*(?:[-\u2013]|-to-)\s*(\d+)\s*(hour|hr|minute|min|second|sec)s?"
r"|" r"|"
@ -40,9 +70,242 @@ _TIME_RE: re.Pattern[str] = re.compile(
re.IGNORECASE, re.IGNORECASE,
) )
_MAX_MINUTES_PER_STEP: Final[int] = 480 # 8 hours sanity cap _MAX_MINUTES_PER_STEP: Final[int] = 480 # 8-hour sanity cap
# ── Equipment detection (keyword → label, in detection priority order) ──── # ── Prep action detection ─────────────────────────────────────────────────
# Base times (minutes) per prep action, calibrated to ~3 items / 0.5 lb reference.
# These are starting points — flagged for calibration against real recipe timing data.
_PREP_ACTION_BASES: Final[dict[str, float]] = {
# Peeling / stripping
"peel": 1.5,
"pare": 1.5,
"hull": 1.5,
"pit": 2.0, # cherries, avocados
"core": 1.0,
"stem": 1.0,
"trim": 1.0,
# Cutting
"chop": 2.0,
"cut": 1.5,
"dice": 2.5, # more precise than chop
"mince": 2.0,
"slice": 1.5,
"julienne": 4.0,
"cube": 2.0,
"quarter": 1.0,
"halve": 0.5,
"shred": 2.0,
# Grating / zesting
"grate": 3.0,
"zest": 2.0,
# Crushing
"crush": 0.5,
"smash": 0.5,
"crack": 0.5,
# Mixing / assembly (lower base — less physical effort)
"knead": 8.0, # bread dough: consistent regardless of quantity
"whisk": 1.5,
"beat": 2.0,
"cream": 3.0, # butter + sugar until fluffy
"fold": 1.5,
"stir": 0.5,
"combine": 0.5,
"mix": 1.0,
"season": 0.5,
}
# Compiled regex — longer patterns first to avoid partial matches.
_PREP_RE: re.Pattern[str] = re.compile(
r"\b(?:" + "|".join(
re.escape(k) for k in sorted(_PREP_ACTION_BASES, key=len, reverse=True)
) + r")\b",
re.IGNORECASE,
)
# Default active time per step when no explicit time and no prep action detected.
_ACTIVE_STEP_DEFAULT_MIN: Final[float] = 2.0
# ── Prep-needing ingredient classification ────────────────────────────────
#
# Only ingredients in this set get quantity-scaled prep time.
# Liquids, spices, canned goods, and dry staples are excluded — they require
# no physical prep beyond measuring.
_PREP_NEEDING: Final[frozenset[str]] = frozenset({
# Alliums
"onion", "shallot", "leek", "scallion", "green onion", "chive", "garlic",
# Root / stem vegetables
"ginger", "carrot", "celery", "potato", "sweet potato", "yam",
"beet", "turnip", "parsnip", "radish", "fennel", "celeriac",
# Squash / gourd family
"zucchini", "squash", "pumpkin", "cucumber",
# Peppers
"pepper", "bell pepper", "jalapeño", "jalapeno", "chili", "chile",
# Brassicas
"broccoli", "cauliflower", "cabbage", "kale", "chard", "spinach",
"brussels sprout",
# Other vegetables
"tomato", "eggplant", "aubergine", "corn", "artichoke", "asparagus",
"green bean", "snow pea", "snap pea", "mushroom", "lettuce",
# Fruits
"apple", "pear", "peach", "nectarine", "plum", "apricot",
"mango", "papaya", "pineapple", "melon", "watermelon", "cantaloupe",
"avocado", "banana",
"strawberry", "raspberry", "blackberry", "blueberry", "cherry",
"citrus", "lemon", "lime", "orange", "grapefruit",
# Protein (trimming / portioning)
"chicken", "turkey", "duck",
"beef", "pork", "lamb", "veal",
"fish", "salmon", "tuna", "cod", "tilapia", "halibut", "shrimp",
"scallop", "crab", "lobster",
# Dairy requiring active prep
"cheese",
# Nuts / seeds (chopping)
"almond", "walnut", "pecan", "cashew", "peanut", "hazelnut",
"pistachio", "macadamia", "nut",
# Fresh herbs (chopping / tearing)
"basil", "parsley", "cilantro", "thyme", "rosemary", "sage",
"dill", "mint", "tarragon",
# Other
"bread",
})
def _is_prep_needing(name: str) -> bool:
"""True if the normalized ingredient name contains any prep-needing keyword."""
nl = name.lower()
return any(kw in nl for kw in _PREP_NEEDING)
# ── Quantity extraction ───────────────────────────────────────────────────
_FRAC_RE: re.Pattern[str] = re.compile(r"(\d+)\s*/\s*(\d+)")
# Weight units → converted to pounds internally
_WEIGHT_RE: re.Pattern[str] = re.compile(
r"(\d+(?:\.\d+)?|\d+\s*/\s*\d+)\s*"
r"(pound|lb|ounce|oz|gram|g(?![a-z])|kilogram|kg)\s*s?\b",
re.IGNORECASE,
)
# Volume (cups only — the common recipe unit for quantity scaling)
_VOLUME_CUP_RE: re.Pattern[str] = re.compile(
r"(\d+(?:\.\d+)?|\d+\s*/\s*\d+)\s*cups?\b",
re.IGNORECASE,
)
# Count — bare integer or decimal followed by optional size/unit word
_COUNT_RE: re.Pattern[str] = re.compile(
r"(?<!\d)(\d+(?:\.\d+)?)\s*"
r"(?:large|medium|small|whole|clove|cloves|head|heads|ear|ears|"
r"stalk|stalks|sprig|sprigs|bunch|bunches|fillet|fillets|"
r"breast|breasts|piece|pieces|slice|slices)?\s*\b",
re.IGNORECASE,
)
# Reference quantities: the "1× base" for each unit type.
# Calibrated so that a typical single-ingredient amount = 1× prep time.
_QTY_REFS: Final[dict[str, float]] = {
"lb": 0.5, # 0.5 lb is the base → 1 lb ≈ 1.68×, 2 lb ≈ 2.83×
"cup": 1.0, # 1 cup = base
"count": 3.0, # 3 items = base → 1 ≈ 0.44×, 6 ≈ 1.68×
}
_SCALE_POWER: Final[float] = 0.75 # sub-linear; revisit with empirical data
_MAX_SCALE: Final[float] = 4.0 # cap at 4× regardless of quantity
_MIN_SCALE: Final[float] = 0.33 # floor at 1/3× for tiny amounts
def _parse_fraction(s: str) -> float:
m = _FRAC_RE.search(s)
if m:
try:
return float(m.group(1)) / float(m.group(2))
except (ValueError, ZeroDivisionError):
return 1.0
try:
return float(s.replace(" ", ""))
except ValueError:
return 1.0
def _extract_qty(text: str) -> tuple[float, str] | None:
"""Return (quantity_in_canonical_units, unit_type) or None.
Unit types: "lb" (weight in pounds), "cup", "count".
All weights are normalised to pounds.
"""
# Weight (most specific — check first)
m = _WEIGHT_RE.search(text)
if m:
qty = _parse_fraction(m.group(1))
u = m.group(2).lower().rstrip("s")
if u in ("pound", "lb"):
return (qty, "lb")
if u in ("ounce", "oz"):
return (qty / 16.0, "lb")
if u in ("gram", "g"):
return (qty / 453.6, "lb")
if u in ("kilogram", "kg"):
return (qty * 2.205, "lb")
# Volume (cups)
m = _VOLUME_CUP_RE.search(text)
if m:
return (_parse_fraction(m.group(1)), "cup")
# Count — only accept values in a sane range to avoid false positives
m = _COUNT_RE.search(text)
if m:
qty = float(m.group(1))
if 0 < qty <= 24:
return (qty, "count")
return None
def _extract_inline_qty_for(text: str, ing_name: str) -> tuple[float, str] | None:
"""Extract the quantity specifically associated with `ing_name` in a direction step.
Looks for a number immediately before the ingredient name (plus optional size/unit
words). Falls back to None if the pattern does not match.
Example: "Dice 2 large onions and 3 carrots" for "onion" returns (2.0, "count").
"""
pattern = re.compile(
r"(\d+(?:\.\d+)?|\d+\s*/\s*\d+)\s*"
r"(?:large|medium|small|whole|"
r"(?:pound|lb|ounce|oz|gram|g|kilogram|kg|cup|clove|cloves|"
r"head|heads|fillet|fillets|breast|breasts|piece|pieces)s?)??\s*"
+ re.escape(ing_name) + r"(?:es|s)?\b",
re.IGNORECASE,
)
m = pattern.search(text)
if m:
# Re-extract with _extract_qty on the full matched span to get unit too
span = text[m.start(): m.end()]
result = _extract_qty(span)
if result:
return result
# Fallback: bare count
try:
return (_parse_fraction(m.group(1)), "count")
except Exception:
pass
return None
def _quantity_scale(qty: float, unit: str) -> float:
"""Apply n^0.75 scaling relative to unit reference, clamped to [MIN, MAX]."""
ref = _QTY_REFS.get(unit, 1.0)
if ref <= 0 or qty <= 0:
return 1.0
raw = (qty / ref) ** _SCALE_POWER
return max(_MIN_SCALE, min(_MAX_SCALE, raw))
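
As a worked example of the clamped power-law scaling, the sketch below reuses the reference quantities, exponent, and clamp bounds shown above in a standalone form. The un-prefixed names (`QTY_REFS`, `quantity_scale`, and so on) are hypothetical copies for illustration, and the sample quantities are arbitrary.

```python
# Constants copied from the values shown above.
QTY_REFS = {"lb": 0.5, "cup": 1.0, "count": 3.0}
SCALE_POWER = 0.75
MIN_SCALE = 0.33
MAX_SCALE = 4.0


def quantity_scale(qty: float, unit: str) -> float:
    """Scale factor (qty / reference) ** 0.75, clamped to [MIN_SCALE, MAX_SCALE]."""
    ref = QTY_REFS.get(unit, 1.0)
    if ref <= 0 or qty <= 0:
        return 1.0  # degenerate input: fall back to the unscaled base time
    raw = (qty / ref) ** SCALE_POWER
    return max(MIN_SCALE, min(MAX_SCALE, raw))


at_reference = quantity_scale(0.5, "lb")        # exactly the reference amount
one_pound = round(quantity_scale(1, "lb"), 2)   # 2x the reference
two_pounds = round(quantity_scale(2, "lb"), 2)  # 4x the reference
one_item = round(quantity_scale(1, "count"), 2)
capped = quantity_scale(8, "lb")                # raw value 8.0 hits the upper clamp
floored = quantity_scale(0.01, "cup")           # tiny amount hits the lower clamp
```

With these constants, doubling the quantity multiplies the factor by 2^0.75, about 1.68, so prep time grows more slowly than the amount being prepped.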
# ── Equipment detection ───────────────────────────────────────────────────
_EQUIPMENT_RULES: Final[list[tuple[re.Pattern[str], str]]] = [ _EQUIPMENT_RULES: Final[list[tuple[re.Pattern[str], str]]] = [
(re.compile(r"\b(?:chop|dice|mince|slice|julienne)\b", re.IGNORECASE), "Knife"), (re.compile(r"\b(?:chop|dice|mince|slice|julienne)\b", re.IGNORECASE), "Knife"),
@ -58,74 +321,8 @@ _EQUIPMENT_RULES: Final[list[tuple[re.Pattern[str], str]]] = [
(re.compile(r"\b(?:drain|strain|colander|rinse pasta)\b", re.IGNORECASE), "Colander"), (re.compile(r"\b(?:drain|strain|colander|rinse pasta)\b", re.IGNORECASE), "Colander"),
] ]
# ── Dataclasses ───────────────────────────────────────────────────────────
@dataclass(frozen=True)
class StepAnalysis:
"""Analysis result for a single direction step."""
is_passive: bool
detected_minutes: int | None # None when no time mention found in text
@dataclass(frozen=True)
class TimeEffortProfile:
"""Aggregated time and effort profile for a full recipe."""
active_min: int # total minutes requiring active attention
passive_min: int # total minutes the cook can step away
total_min: int # active_min + passive_min
step_analyses: list[StepAnalysis] # one entry per direction step
equipment: list[str] # ordered, deduplicated equipment labels
effort_label: str # "quick" | "moderate" | "involved"
# ── Core parsing logic ────────────────────────────────────────────────────
def _extract_minutes(text: str) -> int | None:
"""Return the number of minutes mentioned in text, or None.
Range values (e.g. "15-20 minutes") return the integer midpoint.
Hours are converted to minutes. Seconds are rounded up to 1 minute minimum.
Result is capped at _MAX_MINUTES_PER_STEP.
"""
m = _TIME_RE.search(text)
if m is None:
return None
if m.group(1) is not None:
# Branch A: range match (e.g. "15-20 minutes")
low = int(m.group(1))
high = int(m.group(2))
unit = m.group(3).lower()
raw_value: float = (low + high) / 2
else:
# Branch B: single value match (e.g. "10 minutes")
low = int(m.group(4))
unit = m.group(5).lower()
raw_value = float(low)
if unit in ("hour", "hr"):
minutes: float = raw_value * 60
elif unit in ("second", "sec"):
minutes = max(1.0, math.ceil(raw_value / 60))
else:
minutes = raw_value
return min(int(minutes), _MAX_MINUTES_PER_STEP)
def _classify_passive(text: str) -> bool:
"""Return True if the step text matches any passive keyword (whole-word)."""
return _PASSIVE_RE.search(text) is not None
def _detect_equipment(all_text: str, has_passive: bool) -> list[str]: def _detect_equipment(all_text: str, has_passive: bool) -> list[str]:
"""Return ordered, deduplicated list of equipment labels detected in text.
all_text should be all direction steps joined with spaces.
has_passive controls whether 'Timer' is appended at the end.
"""
seen: set[str] = set() seen: set[str] = set()
result: list[str] = [] result: list[str] = []
for pattern, label in _EQUIPMENT_RULES: for pattern, label in _EQUIPMENT_RULES:
@ -137,8 +334,172 @@ def _detect_equipment(all_text: str, has_passive: bool) -> list[str]:
return result return result
def _effort_label(step_count: int) -> str: # ── Ingredient-step cross-reference ──────────────────────────────────────
"""Derive effort label from step count."""
def _ingredient_mentioned(text: str, name: str) -> bool:
"""True if `name` appears in `text` as a whole word.
Handles both regular plurals (onion → onions) and -es plurals
(potato → potatoes, tomato → tomatoes).
"""
pattern = re.compile(r"\b" + re.escape(name.lower()) + r"(?:es|s)?\b", re.IGNORECASE)
return bool(pattern.search(text))
def _build_step_ingredient_qtys(
ingredients: list[str],
ingredient_names: list[str],
directions: list[str],
) -> list[dict[str, tuple[float, str]]]:
"""Return, for each direction step, {ing_name: (qty_for_this_step, unit)}.
Strategy:
- Filter ingredient pairs to prep-needing items only.
- Parse total quantities from the raw ingredient strings.
- For each step, try to find an inline quantity tied to that ingredient name.
- If no inline quantity, distribute the total evenly across all steps that
mention the ingredient (handles "3 onions" split across 2 steps).
"""
# Build total qty map for prep-needing ingredients
total_qtys: dict[str, tuple[float, str]] = {}
for raw, name in zip(ingredients, ingredient_names):
base = name.lower().strip()
if not _is_prep_needing(base):
continue
result = _extract_qty(raw)
if result is not None:
total_qtys[base] = result
if not total_qtys:
return [{} for _ in directions]
# Count how many steps mention each ingredient
step_counts: dict[str, int] = {n: 0 for n in total_qtys}
for step in directions:
for name in total_qtys:
if _ingredient_mentioned(step, name):
step_counts[name] += 1
# Build per-step qty maps
per_step: list[dict[str, tuple[float, str]]] = []
for step in directions:
step_map: dict[str, tuple[float, str]] = {}
for name, (total, unit) in total_qtys.items():
if not _ingredient_mentioned(step, name):
continue
# Try ingredient-specific inline quantity first
inline = _extract_inline_qty_for(step, name)
if inline is not None:
step_map[name] = inline
else:
# Distribute total across steps that reference this ingredient
n = max(step_counts.get(name, 1), 1)
step_map[name] = (total / n, unit)
per_step.append(step_map)
return per_step
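
The even-split fallback described in the docstring above can be sketched in isolation. This is a minimal illustration, not the module's code: `split_evenly` is a hypothetical helper, the inline-quantity lookup is left out, and only the whole-word plural matching and the division across mentioning steps are shown.

```python
import re


def ingredient_mentioned(text: str, name: str) -> bool:
    """Whole-word match that also accepts a trailing -s or -es plural."""
    pattern = re.compile(r"\b" + re.escape(name.lower()) + r"(?:es|s)?\b", re.IGNORECASE)
    return bool(pattern.search(text))


def split_evenly(total: float, name: str, steps: list[str]) -> list[float]:
    """Share `total` equally among the steps that mention `name` (hypothetical helper)."""
    hits = [ingredient_mentioned(step, name) for step in steps]
    n = max(sum(hits), 1)  # guard against division by zero when nothing matches
    return [total / n if hit else 0.0 for hit in hits]


steps = [
    "Dice two of the onions and saute until soft.",
    "Stir in the tomatoes and simmer.",
    "Slice the remaining onion for garnish.",
]
print(split_evenly(3.0, "onion", steps))  # [1.5, 0.0, 1.5]
```

An even split is a coarse guess: a recipe that uses two onions in one step and one in another still gets 1.5 per step unless an inline quantity is found.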
# ── Dataclasses ───────────────────────────────────────────────────────────
@dataclass(frozen=True)
class StepAnalysis:
"""Analysis result for a single direction step."""
is_passive: bool
detected_minutes: int | None # explicit or estimated time (None = no signal)
prep_min: int | None = None # estimated physical prep time from action detection
@dataclass(frozen=True)
class TimeEffortProfile:
"""Aggregated time and effort profile for a full recipe."""
active_min: int
passive_min: int
total_min: int
step_analyses: list[StepAnalysis] = field(default_factory=list)
equipment: list[str] = field(default_factory=list)
effort_label: str = "moderate" # "quick" | "moderate" | "involved"
# ── Core parsing helpers ──────────────────────────────────────────────────
def _extract_minutes(text: str) -> int | None:
"""Return explicit minutes from text, or None."""
m = _TIME_RE.search(text)
if m is None:
return None
if m.group(1) is not None:
low, high = int(m.group(1)), int(m.group(2))
unit = m.group(3).lower()
raw: float = (low + high) / 2
else:
low = int(m.group(4))
unit = m.group(5).lower()
raw = float(low)
if unit in ("hour", "hr"):
minutes: float = raw * 60
elif unit in ("second", "sec"):
minutes = max(1.0, math.ceil(raw / 60))
else:
minutes = raw
return min(int(minutes), _MAX_MINUTES_PER_STEP)
def _classify_passive(text: str) -> bool:
return _PASSIVE_RE.search(text) is not None
def _passive_default(text: str) -> int | None:
"""Return estimated passive minutes from per-keyword defaults."""
for pattern, minutes in _PASSIVE_DEFAULTS:
if pattern.search(text):
return minutes
return None
def _prep_estimate(
text: str,
step_ing_qtys: dict[str, tuple[float, str]],
) -> int:
"""Estimate active prep time from the first detected prep action + ingredient qtys.
If no prep-needing ingredient is identified in the step, uses the action's
base time at 1× (no scaling).
"""
m = _PREP_RE.search(text)
if m is None:
return 0
action = m.group(0).lower()
base = _PREP_ACTION_BASES.get(action, _ACTIVE_STEP_DEFAULT_MIN)
# Find which prep-needing ingredients this step mentions
matches: list[tuple[float, str]] = [
qty_unit
for name, qty_unit in step_ing_qtys.items()
if _ingredient_mentioned(text, name)
]
if not matches:
return round(base) # no ingredient context — use base unscaled
total = sum(base * _quantity_scale(qty, unit) for qty, unit in matches)
return round(total)
def _effort_label(total_min: int, step_count: int) -> str:
"""Effort label based on total estimated time; falls back to step count."""
if total_min > 0:
if total_min <= 20:
return "quick"
if total_min <= 45:
return "moderate"
return "involved"
# No time signals at all — fall back to step count heuristic
if step_count <= 3: if step_count <= 3:
return "quick" return "quick"
if step_count <= 7: if step_count <= 7:
@ -146,52 +507,96 @@ def _effort_label(step_count: int) -> str:
return "involved" return "involved"
def parse_time_effort(directions: list[str]) -> TimeEffortProfile: # ── Public API ────────────────────────────────────────────────────────────
"""Parse a list of direction strings into a TimeEffortProfile.
def parse_time_effort(
directions: list[str],
ingredients: list[str] | None = None,
ingredient_names: list[str] | None = None,
) -> TimeEffortProfile:
"""Parse direction strings into a TimeEffortProfile.
Args:
directions: List of step strings from the recipe corpus.
ingredients: Raw ingredient strings ("2 large onions", "1.5 lbs potatoes").
Parallel to ingredient_names.
ingredient_names: Normalised ingredient names ("onion", "potato").
Required alongside ingredients to enable quantity scaling.
Returns a zero-value profile with empty lists when directions is empty. Returns a zero-value profile with empty lists when directions is empty.
Never raises; all failures silently produce sensible defaults. Never raises; all failures produce sensible defaults.
""" """
if not directions: if not directions:
return TimeEffortProfile( return TimeEffortProfile(
active_min=0, active_min=0, passive_min=0, total_min=0,
passive_min=0, step_analyses=[], equipment=[], effort_label="quick",
total_min=0,
step_analyses=[],
equipment=[],
effort_label="quick",
) )
# Build per-step ingredient quantity maps (empty dicts if no ingredient data)
use_ingredients = (
bool(ingredients)
and bool(ingredient_names)
and len(ingredients) == len(ingredient_names)
)
step_ing_qtys: list[dict[str, tuple[float, str]]]
if use_ingredients:
step_ing_qtys = _build_step_ingredient_qtys(
list(ingredients), # type: ignore[arg-type]
list(ingredient_names), # type: ignore[arg-type]
directions,
)
else:
step_ing_qtys = [{} for _ in directions]
step_analyses: list[StepAnalysis] = [] step_analyses: list[StepAnalysis] = []
active_min = 0 active_min = 0
passive_min = 0 passive_min = 0
has_any_passive = False has_any_passive = False
for step in directions: for i, step in enumerate(directions):
is_passive = _classify_passive(step) is_passive = _classify_passive(step)
detected = _extract_minutes(step) detected = _extract_minutes(step)
prep_estimate: int | None = None
if is_passive: if is_passive:
has_any_passive = True has_any_passive = True
if detected is not None: if detected is not None:
passive_min += detected passive_min += detected
else:
# Fall back to per-technique default
default = _passive_default(step)
if default is not None:
passive_min += default
detected = default # surface in UI as the hint time
else: else:
if detected is not None: if detected is not None:
active_min += detected active_min += detected
# Estimate prep time from action detection + quantity scaling
prep_est = _prep_estimate(step, step_ing_qtys[i])
if prep_est > 0:
prep_estimate = prep_est
active_min += prep_est
elif detected is None:
# General active step with no time signal — apply a small default
active_min += round(_ACTIVE_STEP_DEFAULT_MIN)
step_analyses.append(StepAnalysis( step_analyses.append(StepAnalysis(
is_passive=is_passive, is_passive=is_passive,
detected_minutes=detected, detected_minutes=detected,
prep_min=prep_estimate,
)) ))
combined_text = " ".join(directions) combined_text = " ".join(directions)
equipment = _detect_equipment(combined_text, has_any_passive) equipment = _detect_equipment(combined_text, has_any_passive)
total = active_min + passive_min
return TimeEffortProfile( return TimeEffortProfile(
active_min=active_min, active_min=active_min,
passive_min=passive_min, passive_min=passive_min,
total_min=active_min + passive_min, total_min=total,
step_analyses=step_analyses, step_analyses=step_analyses,
equipment=equipment, equipment=equipment,
effort_label=_effort_label(len(directions)), effort_label=_effort_label(total, len(directions)),
) )


@ -0,0 +1,124 @@
# app/services/task_inference.py
# BSL 1.1 — LLM feature
"""Task-based service allocation via the cf-orch coordinator.
Calls POST /api/inference/task instead of a hardcoded service type.
The coordinator resolves model_id and service_type from assignments.yaml.
Fallback contract (for callers):
- 404 -> TaskNotRegistered (fall back to direct client.allocate())
- other error -> RuntimeError
- CF_ORCH_URL unset -> RuntimeError (guard with os.environ.get first)
"""
from __future__ import annotations
import logging
import os
from collections.abc import Generator
from contextlib import contextmanager
from dataclasses import dataclass
import httpx
logger = logging.getLogger(__name__)
class TaskNotRegistered(Exception):
"""Coordinator returned 404 for a product/task pair.
Means the task is not yet in assignments.yaml. Callers should fall
back to direct service allocation (client.allocate()).
"""
@dataclass(frozen=True)
class Allocation:
url: str
allocation_id: str
service: str
def _orch_url() -> str:
return os.environ.get("CF_ORCH_URL", "").rstrip("/")
@contextmanager
def task_allocate(
product: str,
task: str,
*,
service_hint: str,
ttl_s: float = 120.0,
) -> Generator[Allocation, None, None]:
"""Context manager: allocate a service via task-based routing.
Calls POST /api/inference/task, yields Allocation, releases on exit.
Supports both `with task_allocate(...) as alloc:` and manual
`ctx = task_allocate(...); alloc = ctx.__enter__()` patterns.
**Sync-only**: uses the synchronous httpx API. Do not call from an
``async def`` handler without wrapping in ``asyncio.to_thread``. Current
call sites (``llm_router.py``, ``vl_model.py``) are synchronous.
Args:
product: CF product name (e.g. "kiwi")
task: Task identifier (e.g. "meal_plan", "ocr")
service_hint: Service type for the release DELETE call. The
coordinator response does not include service_type, so the
caller provides it. When the coordinator is updated to return
service in the response (cf-orch#63), this becomes unused.
ttl_s: Allocation TTL in seconds.
Raises:
TaskNotRegistered: Coordinator returned 404.
RuntimeError: Coordinator unreachable, returned non-404 error, or
returned a malformed (non-JSON / missing fields) response.
RuntimeError: CF_ORCH_URL is not set.
"""
base = _orch_url()
if not base:
raise RuntimeError("CF_ORCH_URL is not set")
try:
resp = httpx.post(
f"{base}/api/inference/task",
json={"product": product, "task": task, "payload": {}},
timeout=30.0,
)
except httpx.RequestError as exc:
raise RuntimeError(f"cf-orch unreachable: {exc}") from exc
if resp.status_code == 404:
raise TaskNotRegistered(
f"No assignment for product={product!r} task={task!r}; "
"ensure cf-orch#61/62 are deployed and coordinator reloaded"
)
if not resp.is_success:
raise RuntimeError(
f"cf-orch /api/inference/task failed: "
f"HTTP {resp.status_code}: {resp.text[:200]}"
)
try:
data = resp.json()
alloc = Allocation(
url=data["url"],
allocation_id=data["allocation_id"],
service=data.get("service") or service_hint,
)
except (KeyError, ValueError) as exc:
raise RuntimeError(
f"cf-orch /api/inference/task returned malformed response: {exc}; "
f"body: {resp.text[:200]}"
) from exc
try:
yield alloc
finally:
try:
httpx.delete(
f"{base}/api/services/{alloc.service}/allocations/{alloc.allocation_id}",
timeout=10.0,
)
except Exception as exc:
logger.debug("cf-orch task allocation release failed (non-fatal): %s", exc)
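
The fallback contract in the module docstring can be illustrated without a running coordinator. The sketch below is an assumption-laden stand-in: `fake_task_allocate` and `run_task` are hypothetical names, no HTTP calls are made, and the "direct allocation" branch is represented by a plain string rather than a real `client.allocate()` call.

```python
from collections.abc import Iterator
from contextlib import contextmanager


class TaskNotRegistered(Exception):
    """Stand-in for the exception defined in task_inference.py."""


@contextmanager
def fake_task_allocate(
    product: str, task: str, registered: set[str], log: list[str]
) -> Iterator[str]:
    """Hypothetical stub that mimics task_allocate's allocate/release shape."""
    if task not in registered:
        # Mirrors the 404 branch: raised before anything is allocated.
        raise TaskNotRegistered(f"No assignment for product={product!r} task={task!r}")
    log.append("allocated")
    try:
        yield f"http://service.invalid/{task}"
    finally:
        log.append("released")  # runs even if the caller's block raises


def run_task(product: str, task: str, registered: set[str], log: list[str]) -> str:
    """Try task-based routing first; fall back when the task is unknown."""
    try:
        with fake_task_allocate(product, task, registered, log) as url:
            return f"task-routed via {url}"
    except TaskNotRegistered:
        return "fell back to direct allocation"


log: list[str] = []
print(run_task("kiwi", "ocr", {"ocr"}, log))        # task-routed via http://service.invalid/ocr
print(log)                                          # ['allocated', 'released']
print(run_task("kiwi", "meal_plan", {"ocr"}, []))   # fell back to direct allocation
```

Because the 404 is raised before the `yield`, nothing has been allocated when the fallback runs, so there is nothing to release on that path.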


@ -15,6 +15,7 @@ KIWI_BYOK_UNLOCKABLE: frozenset[str] = frozenset({
"recipe_suggestions", "recipe_suggestions",
"expiry_llm_matching", "expiry_llm_matching",
"receipt_ocr", "receipt_ocr",
"recipe_scan",
"style_classifier", "style_classifier",
"meal_plan_llm", "meal_plan_llm",
"meal_plan_llm_timing", "meal_plan_llm_timing",
@ -58,6 +59,9 @@ KIWI_FEATURES: dict[str, str] = {
"community_publish": "paid", # Publish plans/outcomes to community feed "community_publish": "paid", # Publish plans/outcomes to community feed
"community_fork_adapt": "paid", # Fork with LLM pantry adaptation (BYOK-unlockable) "community_fork_adapt": "paid", # Fork with LLM pantry adaptation (BYOK-unlockable)
# Paid tier (continued)
"recipe_scan": "paid", # BYOK-unlockable: photo -> structured recipe
# Premium tier # Premium tier
"multi_household": "premium", "multi_household": "premium",
"background_monitoring": "premium", "background_monitoring": "premium",


@ -18,6 +18,10 @@ server {
proxy_set_header X-CF-Session $http_x_cf_session; proxy_set_header X-CF-Session $http_x_cf_session;
# Allow image uploads (barcode/receipt photos from phone cameras). # Allow image uploads (barcode/receipt photos from phone cameras).
client_max_body_size 20m; client_max_body_size 20m;
# LLM inference (recipe suggestions, expiry fallback) can take 60-120s.
# Default proxy_read_timeout is 60s which causes 504s on full recipe generation.
proxy_read_timeout 180s;
proxy_send_timeout 180s;
} }
# Direct-port LAN access (localhost:8515): when VITE_API_BASE='/kiwi', the frontend # Direct-port LAN access (localhost:8515): when VITE_API_BASE='/kiwi', the frontend
@ -34,6 +38,8 @@ server {
proxy_set_header X-Forwarded-Proto $http_x_forwarded_proto; proxy_set_header X-Forwarded-Proto $http_x_forwarded_proto;
proxy_set_header X-CF-Session $http_x_cf_session; proxy_set_header X-CF-Session $http_x_cf_session;
client_max_body_size 20m; client_max_body_size 20m;
proxy_read_timeout 180s;
proxy_send_timeout 180s;
} }
# When accessed directly (localhost:8515) instead of via Caddy (/kiwi path-strip), # When accessed directly (localhost:8515) instead of via Caddy (/kiwi path-strip),


@ -2,8 +2,13 @@
<html lang="en"> <html lang="en">
<head> <head>
<meta charset="UTF-8" /> <meta charset="UTF-8" />
<link rel="icon" type="image/svg+xml" href="/vite.svg" /> <link rel="icon" type="image/png" sizes="192x192" href="/icons/icon-192.png" />
<link rel="apple-touch-icon" href="/icons/icon-192.png" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, viewport-fit=cover" /> <meta name="viewport" content="width=device-width, initial-scale=1.0, viewport-fit=cover" />
<meta name="theme-color" content="#e8a820" />
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
<meta name="apple-mobile-web-app-title" content="Kiwi" />
<title>Kiwi — Pantry Tracker</title> <title>Kiwi — Pantry Tracker</title>
<link rel="preconnect" href="https://fonts.googleapis.com" /> <link rel="preconnect" href="https://fonts.googleapis.com" />
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin /> <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />

File diff suppressed because it is too large.


@ -20,6 +20,7 @@
"@vue/tsconfig": "^0.8.1", "@vue/tsconfig": "^0.8.1",
"typescript": "~5.9.3", "typescript": "~5.9.3",
"vite": "^7.1.7", "vite": "^7.1.7",
"vite-plugin-pwa": "^1.2.0",
"vue-tsc": "^3.1.0" "vue-tsc": "^3.1.0"
} }
} }

Binary files not shown: four added images (1.6 KiB, 4.3 KiB, 1.2 KiB, 3.5 KiB).


@ -106,6 +106,39 @@
<span class="form-hint">How you appear on posts -- not your real name or email.</span> <span class="form-hint">How you appear on posts -- not your real name or email.</span>
</div> </div>
<!-- Similarity check results -->
<div
v-if="similarPosts.length > 0"
class="similar-panel"
role="region"
aria-label="Similar stories found"
>
<p class="similar-heading text-sm">
<strong>Similar stories already exist.</strong>
You can publish as-is, mark yours as a variation, or cancel.
</p>
<ul class="similar-list" aria-label="Existing similar posts">
<li
v-for="hit in similarPosts"
:key="hit.slug"
class="similar-item"
>
<span class="similar-tier-badge" :class="`tier-${hit.similarity_tier}`">
{{ tierLabel(hit.similarity_tier) }}
</span>
<span class="similar-title">{{ hit.title }}</span>
<span class="similar-by text-muted text-xs">by {{ hit.pseudonym }}</span>
<button
class="btn-link text-xs"
:class="{ 'selected-ref': selectedRef === hit.slug }"
@click="toggleRef(hit.slug)"
>
{{ selectedRef === hit.slug ? 'Unmark variation' : 'Mark as variation' }}
</button>
</li>
</ul>
</div>
<!-- Submission feedback (aria-live region, always rendered) --> <!-- Submission feedback (aria-live region, always rendered) -->
<div <div
class="feedback-region" class="feedback-region"
@ -119,13 +152,24 @@
<!-- Footer actions --> <!-- Footer actions -->
<div class="modal-footer flex gap-sm"> <div class="modal-footer flex gap-sm">
<button <button
v-if="!similarPosts.length || similarChecked"
class="btn btn-primary" class="btn btn-primary"
:disabled="submitting || !title.trim()" :disabled="submitting || !title.trim()"
:aria-busy="submitting" :aria-busy="submitting"
@click="onSubmit" @click="onSubmit"
> >
<span v-if="submitting" class="spinner spinner-sm" aria-hidden="true"></span> <span v-if="submitting" class="spinner spinner-sm" aria-hidden="true"></span>
{{ submitting ? 'Publishing...' : 'Publish' }} {{ submitting ? 'Publishing...' : (selectedRef ? 'Publish as variation' : 'Publish') }}
</button>
<button
v-else
class="btn btn-primary"
:disabled="checking || !title.trim()"
:aria-busy="checking"
@click="onCheckThenSubmit"
>
<span v-if="checking" class="spinner spinner-sm" aria-hidden="true"></span>
{{ checking ? 'Checking...' : 'Publish' }}
</button> </button>
<button class="btn btn-secondary" @click="$emit('close')"> <button class="btn btn-secondary" @click="$emit('close')">
Cancel Cancel
@ -139,7 +183,7 @@
<script setup lang="ts"> <script setup lang="ts">
import { ref, onMounted, onUnmounted, nextTick } from 'vue' import { ref, onMounted, onUnmounted, nextTick } from 'vue'
import { useCommunityStore } from '../stores/community' import { useCommunityStore } from '../stores/community'
import type { PublishPayload } from '../stores/community' import type { PublishPayload, SimilarPost, SimilarityTier } from '../stores/community'
const props = defineProps<{ const props = defineProps<{
recipeId: number | null recipeId: number | null
@ -162,6 +206,21 @@ const submitting = ref(false)
const submitError = ref<string | null>(null) const submitError = ref<string | null>(null)
const submitSuccess = ref<string | null>(null) const submitSuccess = ref<string | null>(null)
const checking = ref(false)
const similarChecked = ref(false)
const similarPosts = ref<SimilarPost[]>([])
const selectedRef = ref<string | null>(null)
function tierLabel(tier: SimilarityTier): string {
if (tier === 'exact_recipe') return 'Same recipe'
if (tier === 'very_similar') return 'Very similar'
return 'Similar'
}
function toggleRef(slug: string) {
selectedRef.value = selectedRef.value === slug ? null : slug
}
const dialogRef = ref<HTMLElement | null>(null) const dialogRef = ref<HTMLElement | null>(null)
const firstFocusRef = ref<HTMLButtonElement | null>(null) const firstFocusRef = ref<HTMLButtonElement | null>(null)
let previousFocus: HTMLElement | null = null let previousFocus: HTMLElement | null = null
@ -215,6 +274,17 @@ onUnmounted(() => {
previousFocus?.focus() previousFocus?.focus()
}) })
async function onCheckThenSubmit() {
if (!title.value.trim()) return
checking.value = true
similarPosts.value = await store.checkSimilar(title.value.trim(), props.recipeId, postType.value)
similarChecked.value = true
checking.value = false
if (!similarPosts.value.length) {
await onSubmit()
}
}
async function onSubmit() { async function onSubmit() {
submitError.value = null submitError.value = null
submitSuccess.value = null submitSuccess.value = null
@ -228,6 +298,7 @@ async function onSubmit() {
if (outcomeNotes.value.trim()) payload.outcome_notes = outcomeNotes.value.trim() if (outcomeNotes.value.trim()) payload.outcome_notes = outcomeNotes.value.trim()
if (pseudonymName.value.trim()) payload.pseudonym_name = pseudonymName.value.trim() if (pseudonymName.value.trim()) payload.pseudonym_name = pseudonymName.value.trim()
if (props.recipeId != null) payload.recipe_id = props.recipeId if (props.recipeId != null) payload.recipe_id = props.recipeId
if (selectedRef.value) payload.similar_to_ref = selectedRef.value
submitting.value = true submitting.value = true
try { try {
@ -349,6 +420,82 @@ async function onSubmit() {
flex-wrap: wrap; flex-wrap: wrap;
} }
.similar-panel {
background: var(--color-surface-alt, var(--color-surface));
border: 1px solid var(--color-warning, #f59e0b);
border-radius: var(--radius-md);
padding: var(--spacing-sm) var(--spacing-md);
margin-bottom: var(--spacing-md);
}
.similar-heading {
margin: 0 0 var(--spacing-sm);
}
.similar-list {
list-style: none;
margin: 0;
padding: 0;
display: flex;
flex-direction: column;
gap: var(--spacing-xs);
}
.similar-item {
display: flex;
align-items: baseline;
gap: var(--spacing-xs);
flex-wrap: wrap;
}
.similar-tier-badge {
font-size: var(--font-size-xs);
font-weight: 700;
padding: 1px 6px;
border-radius: var(--radius-sm);
flex-shrink: 0;
}
.tier-exact_recipe {
background: var(--color-error-bg, #fee2e2);
color: var(--color-error, #dc2626);
}
.tier-very_similar {
background: var(--color-warning-bg, #fef3c7);
color: var(--color-warning-text, #92400e);
}
.tier-somewhat_similar {
background: var(--color-surface-alt, #f3f4f6);
color: var(--color-text-secondary);
}
.similar-title {
font-weight: 600;
font-size: var(--font-size-sm);
}
.similar-by {
flex-shrink: 0;
}
.btn-link {
background: none;
border: none;
color: var(--color-primary);
cursor: pointer;
padding: 0;
text-decoration: underline;
font-size: var(--font-size-xs);
margin-left: auto;
}
.btn-link.selected-ref {
color: var(--color-success);
font-weight: 700;
}
@media (max-width: 480px) { @media (max-width: 480px) {
.modal-panel { .modal-panel {
max-height: 95vh; max-height: 95vh;


@ -78,6 +78,39 @@
<span class="form-hint">How you appear on posts -- not your real name or email.</span> <span class="form-hint">How you appear on posts -- not your real name or email.</span>
</div> </div>
<!-- Similarity check results (shown before final confirm) -->
<div
v-if="similarPosts.length > 0"
class="similar-panel"
role="region"
aria-label="Similar posts found"
>
<p class="similar-heading text-sm">
<strong>Similar plans already exist.</strong>
You can publish as-is, mark yours as a variation, or cancel.
</p>
<ul class="similar-list" aria-label="Existing similar posts">
<li
v-for="hit in similarPosts"
:key="hit.slug"
class="similar-item"
>
<span class="similar-tier-badge" :class="`tier-${hit.similarity_tier}`">
{{ tierLabel(hit.similarity_tier) }}
</span>
<span class="similar-title">{{ hit.title }}</span>
<span class="similar-by text-muted text-xs">by {{ hit.pseudonym }}</span>
<button
class="btn-link text-xs"
:class="{ 'selected-ref': selectedRef === hit.slug }"
@click="toggleRef(hit.slug)"
>
{{ selectedRef === hit.slug ? 'Unmark variation' : 'Mark as variation' }}
</button>
</li>
</ul>
</div>
<!-- Submission feedback (aria-live region, always rendered) --> <!-- Submission feedback (aria-live region, always rendered) -->
<div <div
class="feedback-region" class="feedback-region"
@ -91,13 +124,24 @@
<!-- Footer actions --> <!-- Footer actions -->
<div class="modal-footer flex gap-sm"> <div class="modal-footer flex gap-sm">
<button <button
v-if="!similarPosts.length || similarChecked"
class="btn btn-primary" class="btn btn-primary"
:disabled="submitting || !title.trim()" :disabled="submitting || !title.trim()"
:aria-busy="submitting" :aria-busy="submitting"
@click="onSubmit" @click="onSubmit"
> >
<span v-if="submitting" class="spinner spinner-sm" aria-hidden="true"></span> <span v-if="submitting" class="spinner spinner-sm" aria-hidden="true"></span>
{{ submitting ? 'Publishing...' : 'Publish' }} {{ submitting ? 'Publishing...' : (selectedRef ? 'Publish as variation' : 'Publish') }}
</button>
<button
v-else
class="btn btn-primary"
:disabled="checking || !title.trim()"
:aria-busy="checking"
@click="onCheckThenSubmit"
>
<span v-if="checking" class="spinner spinner-sm" aria-hidden="true"></span>
{{ checking ? 'Checking...' : 'Publish' }}
</button> </button>
<button class="btn btn-secondary" @click="$emit('close')"> <button class="btn btn-secondary" @click="$emit('close')">
Cancel Cancel
@ -111,7 +155,7 @@
<script setup lang="ts"> <script setup lang="ts">
import { ref, onMounted, onUnmounted, nextTick } from 'vue' import { ref, onMounted, onUnmounted, nextTick } from 'vue'
import { useCommunityStore } from '../stores/community' import { useCommunityStore } from '../stores/community'
import type { PublishPayload } from '../stores/community' import type { PublishPayload, SimilarPost, SimilarityTier } from '../stores/community'
const props = defineProps<{ const props = defineProps<{
plan?: { plan?: {
@ -136,6 +180,21 @@ const submitting = ref(false)
const submitError = ref<string | null>(null) const submitError = ref<string | null>(null)
const submitSuccess = ref<string | null>(null) const submitSuccess = ref<string | null>(null)
const checking = ref(false)
const similarChecked = ref(false)
const similarPosts = ref<SimilarPost[]>([])
const selectedRef = ref<string | null>(null)
function tierLabel(tier: SimilarityTier): string {
if (tier === 'exact_recipe') return 'Same recipe'
if (tier === 'very_similar') return 'Very similar'
return 'Similar'
}
function toggleRef(slug: string) {
selectedRef.value = selectedRef.value === slug ? null : slug
}
const dialogRef = ref<HTMLElement | null>(null) const dialogRef = ref<HTMLElement | null>(null)
const firstFocusRef = ref<HTMLInputElement | null>(null) const firstFocusRef = ref<HTMLInputElement | null>(null)
let previousFocus: HTMLElement | null = null let previousFocus: HTMLElement | null = null
@ -189,6 +248,19 @@ onUnmounted(() => {
previousFocus?.focus() previousFocus?.focus()
}) })
async function onCheckThenSubmit() {
if (!title.value.trim()) return
checking.value = true
const planRecipeIds = props.plan?.slots?.map((s) => s.recipe_id) ?? []
const firstRecipeId = planRecipeIds[0] ?? null
similarPosts.value = await store.checkSimilar(title.value.trim(), firstRecipeId, 'plan')
similarChecked.value = true
checking.value = false
if (!similarPosts.value.length) {
await onSubmit()
}
}
async function onSubmit() { async function onSubmit() {
submitError.value = null submitError.value = null
submitSuccess.value = null submitSuccess.value = null
@ -205,6 +277,7 @@ async function onSubmit() {
if (props.plan?.slots?.length) { if (props.plan?.slots?.length) {
payload.slots = props.plan.slots.map(({ day, meal_type, recipe_id }) => ({ day, meal_type, recipe_id })) payload.slots = props.plan.slots.map(({ day, meal_type, recipe_id }) => ({ day, meal_type, recipe_id }))
} }
if (selectedRef.value) payload.similar_to_ref = selectedRef.value
submitting.value = true submitting.value = true
try { try {
@ -295,6 +368,82 @@ async function onSubmit() {
flex-wrap: wrap; flex-wrap: wrap;
} }
.similar-panel {
background: var(--color-surface-alt, var(--color-surface));
border: 1px solid var(--color-warning, #f59e0b);
border-radius: var(--radius-md);
padding: var(--spacing-sm) var(--spacing-md);
margin-bottom: var(--spacing-md);
}
.similar-heading {
margin: 0 0 var(--spacing-sm);
}
.similar-list {
list-style: none;
margin: 0;
padding: 0;
display: flex;
flex-direction: column;
gap: var(--spacing-xs);
}
.similar-item {
display: flex;
align-items: baseline;
gap: var(--spacing-xs);
flex-wrap: wrap;
}
.similar-tier-badge {
font-size: var(--font-size-xs);
font-weight: 700;
padding: 1px 6px;
border-radius: var(--radius-sm);
flex-shrink: 0;
}
.tier-exact_recipe {
background: var(--color-error-bg, #fee2e2);
color: var(--color-error, #dc2626);
}
.tier-very_similar {
background: var(--color-warning-bg, #fef3c7);
color: var(--color-warning-text, #92400e);
}
.tier-somewhat_similar {
background: var(--color-surface-alt, #f3f4f6);
color: var(--color-text-secondary);
}
.similar-title {
font-weight: 600;
font-size: var(--font-size-sm);
}
.similar-by {
flex-shrink: 0;
}
.btn-link {
background: none;
border: none;
color: var(--color-primary);
cursor: pointer;
padding: 0;
text-decoration: underline;
font-size: var(--font-size-xs);
margin-left: auto;
}
.btn-link.selected-ref {
color: var(--color-success);
font-weight: 700;
}
@media (max-width: 480px) { @media (max-width: 480px) {
.modal-panel { .modal-panel {
max-height: 95vh; max-height: 95vh;


@ -6,6 +6,7 @@
v-for="domain in domains" v-for="domain in domains"
:key="domain.id" :key="domain.id"
:class="['btn', activeDomain === domain.id ? 'btn-primary' : 'btn-secondary']" :class="['btn', activeDomain === domain.id ? 'btn-primary' : 'btn-secondary']"
:aria-pressed="activeDomain === domain.id"
@click="selectDomain(domain.id)" @click="selectDomain(domain.id)"
> >
{{ domain.label }} {{ domain.label }}
@ -24,6 +25,7 @@
<div v-else class="category-list mb-sm flex flex-wrap gap-xs"> <div v-else class="category-list mb-sm flex flex-wrap gap-xs">
<button <button
:class="['btn', 'btn-secondary', 'cat-btn', { active: activeCategory === '_all' }]" :class="['btn', 'btn-secondary', 'cat-btn', { active: activeCategory === '_all' }]"
:aria-pressed="activeCategory === '_all'"
@click="selectCategory('_all')" @click="selectCategory('_all')"
> >
All All
@ -32,6 +34,7 @@
v-for="cat in categories" v-for="cat in categories"
:key="cat.category" :key="cat.category"
:class="['btn', 'btn-secondary', 'cat-btn', { active: activeCategory === cat.category }]" :class="['btn', 'btn-secondary', 'cat-btn', { active: activeCategory === cat.category }]"
:aria-pressed="activeCategory === cat.category"
@click="selectCategory(cat.category)" @click="selectCategory(cat.category)"
> >
{{ cat.category }} {{ cat.category }}
@ -57,6 +60,7 @@
<template v-else> <template v-else>
<button <button
:class="['btn', 'btn-secondary', 'subcat-btn', { active: activeSubcategory === null }]" :class="['btn', 'btn-secondary', 'subcat-btn', { active: activeSubcategory === null }]"
:aria-pressed="activeSubcategory === null"
@click="selectSubcategory(null)" @click="selectSubcategory(null)"
> >
All {{ activeCategory }} All {{ activeCategory }}
@ -65,6 +69,7 @@
v-for="sub in subcategories" v-for="sub in subcategories"
:key="sub.subcategory" :key="sub.subcategory"
:class="['btn', 'btn-secondary', 'subcat-btn', { active: activeSubcategory === sub.subcategory }]" :class="['btn', 'btn-secondary', 'subcat-btn', { active: activeSubcategory === sub.subcategory }]"
:aria-pressed="activeSubcategory === sub.subcategory"
@click="selectSubcategory(sub.subcategory)" @click="selectSubcategory(sub.subcategory)"
> >
{{ sub.subcategory }} {{ sub.subcategory }}
@ -79,6 +84,25 @@
</template> </template>
</div> </div>
<!-- Browse breadcrumb shows current position in domain > category > subcategory hierarchy -->
<nav v-if="activeDomain && activeCategory" class="browse-breadcrumb" aria-label="Browse location">
<button
class="crumb-btn"
@click="selectDomain(activeDomain)"
:aria-current="!activeCategory ? 'page' : undefined"
>{{ domains.find(d => d.id === activeDomain)?.label ?? activeDomain }}</button>
<span class="crumb-sep" aria-hidden="true"></span>
<button
class="crumb-btn"
@click="selectCategory(activeCategory)"
:aria-current="!activeSubcategory ? 'page' : undefined"
>{{ activeCategory === '_all' ? 'All' : activeCategory }}</button>
<template v-if="activeSubcategory">
<span class="crumb-sep" aria-hidden="true"></span>
<span class="crumb-current" aria-current="page">{{ activeSubcategory }}</span>
</template>
</nav>
<!-- Recipe grid --> <!-- Recipe grid -->
<template v-if="activeCategory"> <template v-if="activeCategory">
<div v-if="loadingRecipes" class="text-secondary text-sm">Loading recipes</div> <div v-if="loadingRecipes" class="text-secondary text-sm">Loading recipes</div>
@ -93,24 +117,37 @@
placeholder="Filter by title…" placeholder="Filter by title…"
class="browser-search" class="browser-search"
/> />
<input
v-model="requiredIngredient"
@keyup.enter="onRequiredIngredientCommit"
@search="onRequiredIngredientCommit"
type="search"
placeholder="Must include ingredient… (Enter)"
class="browser-search"
title="Type an ingredient and press Enter to filter"
/>
<div class="sort-btns flex gap-xs"> <div class="sort-btns flex gap-xs">
<button <button
:class="['btn', 'btn-secondary', 'sort-btn', { active: sortOrder === 'default' }]" :class="['btn', 'btn-secondary', 'sort-btn', { active: sortOrder === 'default' }]"
:aria-pressed="sortOrder === 'default'"
@click="setSort('default')" @click="setSort('default')"
title="Corpus order" title="Corpus order"
>Default</button> >Default</button>
<button <button
:class="['btn', 'btn-secondary', 'sort-btn', { active: sortOrder === 'alpha' }]" :class="['btn', 'btn-secondary', 'sort-btn', { active: sortOrder === 'alpha' }]"
:aria-pressed="sortOrder === 'alpha'"
@click="setSort('alpha')" @click="setSort('alpha')"
title="Alphabetical A→Z" title="Alphabetical A→Z"
>AZ</button> >AZ</button>
<button <button
:class="['btn', 'btn-secondary', 'sort-btn', { active: sortOrder === 'alpha_desc' }]" :class="['btn', 'btn-secondary', 'sort-btn', { active: sortOrder === 'alpha_desc' }]"
:aria-pressed="sortOrder === 'alpha_desc'"
@click="setSort('alpha_desc')" @click="setSort('alpha_desc')"
title="Alphabetical Z→A" title="Alphabetical Z→A"
>ZA</button> >ZA</button>
<button <button
:class="['btn', 'btn-secondary', 'sort-btn', { active: sortOrder === 'match' }]" :class="['btn', 'btn-secondary', 'sort-btn', { active: sortOrder === 'match' }]"
:aria-pressed="sortOrder === 'match'"
:disabled="pantryCount === 0" :disabled="pantryCount === 0"
@click="setSort('match')" @click="setSort('match')"
:title="pantryCount > 0 ? 'Sort by pantry match %' : 'Add items to pantry to sort by match'" :title="pantryCount > 0 ? 'Sort by pantry match %' : 'Add items to pantry to sort by match'"
@ -119,20 +156,27 @@
</div> </div>
<div class="results-header flex-between mb-sm"> <div class="results-header flex-between mb-sm">
<span class="text-sm text-secondary"> <span
class="text-sm text-secondary"
aria-live="polite"
aria-atomic="true"
>
{{ total }} recipes {{ total }} recipes
<span v-if="pantryCount > 0"> pantry match shown</span> <span v-if="pantryCount > 0"> pantry match shown</span>
<span v-if="requiredIngredient.trim()"> must include "{{ requiredIngredient.trim() }}"</span>
</span> </span>
<div class="pagination flex gap-xs"> <div class="pagination flex gap-xs">
<button <button
class="btn btn-secondary btn-xs" class="btn btn-secondary btn-xs"
:disabled="page <= 1" :disabled="page <= 1"
aria-label="Previous page"
@click="changePage(page - 1)" @click="changePage(page - 1)"
> Prev</button> > Prev</button>
<span class="text-sm text-secondary page-indicator">{{ page }} / {{ totalPages }}</span> <span class="text-sm text-secondary page-indicator" aria-live="polite">{{ page }} / {{ totalPages }}</span>
<button <button
class="btn btn-secondary btn-xs" class="btn btn-secondary btn-xs"
:disabled="page >= totalPages" :disabled="page >= totalPages"
aria-label="Next page"
@click="changePage(page + 1)" @click="changePage(page + 1)"
>Next </button> >Next </button>
</div> </div>
@ -310,6 +354,7 @@ const loadingDomains = ref(false)
const loadingRecipes = ref(false) const loadingRecipes = ref(false)
const savingRecipe = ref<BrowserRecipe | null>(null) const savingRecipe = ref<BrowserRecipe | null>(null)
const searchQuery = ref('') const searchQuery = ref('')
const requiredIngredient = ref('')
const sortOrder = ref<'default' | 'alpha' | 'alpha_desc' | 'match'>('default') const sortOrder = ref<'default' | 'alpha' | 'alpha_desc' | 'match'>('default')
let searchDebounce: ReturnType<typeof setTimeout> | null = null let searchDebounce: ReturnType<typeof setTimeout> | null = null
let tagSearchDebounce: ReturnType<typeof setTimeout> | null = null let tagSearchDebounce: ReturnType<typeof setTimeout> | null = null
@ -386,6 +431,19 @@ function onSearchInput() {
}, 350) }, 350)
} }
function onRequiredIngredientCommit() {
page.value = 1
loadRecipes()
}
// Reload without the ingredient filter when the field is emptied via backspace/select-delete
watch(requiredIngredient, (val, prev) => {
if (val === '' && prev !== '') {
page.value = 1
loadRecipes()
}
})
function setSort(s: 'default' | 'alpha' | 'alpha_desc' | 'match') { function setSort(s: 'default' | 'alpha' | 'alpha_desc' | 'match') {
if (sortOrder.value === s) return if (sortOrder.value === s) return
sortOrder.value = s sortOrder.value = s
@ -410,6 +468,7 @@ async function selectDomain(domainId: string) {
total.value = 0 total.value = 0
page.value = 1 page.value = 1
searchQuery.value = '' searchQuery.value = ''
requiredIngredient.value = ''
sortOrder.value = 'default' sortOrder.value = 'default'
categories.value = await browserAPI.listCategories(domainId) categories.value = await browserAPI.listCategories(domainId)
// Auto-select the most-populated category so content appears immediately. // Auto-select the most-populated category so content appears immediately.
@ -476,6 +535,7 @@ async function loadRecipes() {
subcategory: activeSubcategory.value ?? undefined, subcategory: activeSubcategory.value ?? undefined,
q: searchQuery.value.trim() || undefined, q: searchQuery.value.trim() || undefined,
sort: sortOrder.value !== 'default' ? sortOrder.value : undefined, sort: sortOrder.value !== 'default' ? sortOrder.value : undefined,
required_ingredient: requiredIngredient.value.trim() || undefined,
} }
) )
recipes.value = result.recipes recipes.value = result.recipes
@ -527,8 +587,10 @@ function onTagSearchInput() {
tagSearchDebounce = setTimeout(async () => { tagSearchDebounce = setTimeout(async () => {
tagModal.value.searching = true tagModal.value.searching = true
try { try {
// Re-use the browser API: browse all recipes filtered by title substring // Use the first available domain with category=_all to search all recipes by title.
const res = await browserAPI.browse('_all', '_all', { page: 1, q }) // Domain must be a real domain slug; '_all' is not valid at the browse endpoint.
const searchDomain = domains.value[0]?.id ?? 'cuisine'
const res = await browserAPI.browse(searchDomain, '_all', { page: 1, q })
tagModal.value.results = (res.recipes ?? []).slice(0, 8).map( tagModal.value.results = (res.recipes ?? []).slice(0, 8).map(
(r: { id: number; title: string }) => ({ id: r.id, title: r.title }) (r: { id: number; title: string }) => ({ id: r.id, title: r.title })
) )
@ -826,4 +888,40 @@ async function submitTag() {
font-size: 0.875rem; font-size: 0.875rem;
margin-left: 0.5rem; margin-left: 0.5rem;
} }
/* ── Browse breadcrumb ───────────────────────────────────────────────────── */
.browse-breadcrumb {
display: flex;
align-items: center;
flex-wrap: wrap;
gap: 2px;
margin-bottom: var(--spacing-sm);
font-size: var(--font-size-xs, 0.78rem);
color: var(--color-text-secondary);
}
.crumb-btn {
background: none;
border: none;
padding: 2px 4px;
cursor: pointer;
color: var(--color-primary);
font-size: inherit;
border-radius: var(--radius-sm);
}
.crumb-btn:hover {
text-decoration: underline;
}
.crumb-sep {
opacity: 0.5;
padding: 0 2px;
}
.crumb-current {
padding: 2px 4px;
color: var(--color-text);
font-weight: 500;
}
</style> </style>
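
The `loadRecipes` hunk above sends `required_ingredient` only when the trimmed field is non-empty, following the same rule already used for `q` and `sort`. A minimal sketch of that rule as a pure function follows. `BrowseOptions` and `buildBrowseOptions` are hypothetical names for illustration; the real option type accepted by `browserAPI.browse` is not shown in this diff.

```typescript
// Hypothetical option shape, inferred from the fields passed in loadRecipes().
interface BrowseOptions {
  page: number
  q?: string
  sort?: 'alpha' | 'alpha_desc' | 'match'
  required_ingredient?: string
}

type SortOrder = 'default' | 'alpha' | 'alpha_desc' | 'match'

// Empty or whitespace-only inputs become undefined so they are left out of the request.
function buildBrowseOptions(
  page: number,
  searchQuery: string,
  sortOrder: SortOrder,
  requiredIngredient: string,
): BrowseOptions {
  return {
    page,
    q: searchQuery.trim() || undefined,
    sort: sortOrder !== 'default' ? sortOrder : undefined,
    required_ingredient: requiredIngredient.trim() || undefined,
  }
}
```

Keeping the rule in one place like this would make it easy to unit-test that clearing the field really drops the filter, which is the case the new `watch` on `requiredIngredient` relies on.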


@ -225,6 +225,23 @@
</ol> </ol>
</details> </details>
<!-- Community tags: accepted location tags from other users -->
<div v-if="communityTags.length > 0" class="detail-section community-tags-section">
<h3 class="section-label">Community categories</h3>
<div class="community-tags-list">
<span
v-for="tag in communityTags"
:key="tag.id"
class="community-tag-chip"
:class="{ 'community-tag-chip--accepted': tag.accepted }"
:title="tag.accepted ? 'Confirmed by the community' : 'Pending confirmation'"
>
{{ tag.domain }} {{ tag.category }}<template v-if="tag.subcategory"> {{ tag.subcategory }}</template>
<span v-if="tag.accepted" class="community-tag-check" aria-label="Confirmed"></span>
</span>
</div>
</div>
<!-- Bottom padding so last step isn't hidden behind sticky footer --> <!-- Bottom padding so last step isn't hidden behind sticky footer -->
<div style="height: var(--spacing-xl)" /> <div style="height: var(--spacing-xl)" />
</div> </div>
@ -276,6 +293,31 @@
<span class="cook-success-icon"></span> <span class="cook-success-icon"></span>
Enjoy your meal! Recipe dismissed from suggestions. Enjoy your meal! Recipe dismissed from suggestions.
<button class="btn btn-secondary btn-sm mt-xs" @click="$emit('close')">Close</button> <button class="btn btn-secondary btn-sm mt-xs" @click="$emit('close')">Close</button>
<!-- Leftover shelf-life section -->
<div v-if="leftoversLoading" class="leftovers-panel text-sm text-secondary mt-sm">
Working out storage info
</div>
<div v-else-if="leftovers && !leftoversDismissed" class="leftovers-panel mt-sm">
<div class="leftovers-header flex-between">
<span class="text-sm font-semibold">Leftovers</span>
<button class="btn-icon btn-xs" @click="leftoversDismissed = true" aria-label="Dismiss storage info"></button>
</div>
<div class="leftovers-grid mt-xs">
<div class="leftovers-cell">
<span class="leftovers-icon"></span>
<span class="text-sm">Fridge: <strong>{{ leftovers.fridge_days }} day{{ leftovers.fridge_days !== 1 ? 's' : '' }}</strong></span>
</div>
<div v-if="leftovers.freeze_days !== null" class="leftovers-cell">
<span class="leftovers-icon">🧊</span>
<span class="text-sm">Freezer: <strong>{{ leftovers.freeze_days }} day{{ leftovers.freeze_days !== 1 ? 's' : '' }}</strong></span>
</div>
</div>
<p v-if="leftovers.freeze_by_day" class="text-xs text-secondary mt-xs">
Freeze by day {{ leftovers.freeze_by_day }} for best results.
</p>
<p class="text-xs text-secondary mt-xs">{{ leftovers.storage_advice }}</p>
</div>
</div> </div>
<template v-else> <template v-else>
<button class="btn btn-secondary" @click="$emit('close')">Back</button> <button class="btn btn-secondary" @click="$emit('close')">Back</button>
@ -329,7 +371,7 @@
import { ref, computed, onMounted, onUnmounted, nextTick } from 'vue' import { ref, computed, onMounted, onUnmounted, nextTick } from 'vue'
import { useRecipesStore } from '../stores/recipes' import { useRecipesStore } from '../stores/recipes'
import { useSavedRecipesStore } from '../stores/savedRecipes' import { useSavedRecipesStore } from '../stores/savedRecipes'
import { inventoryAPI } from '../services/api' import { inventoryAPI, recipesAPI, browserAPI } from '../services/api'
import type { RecipeSuggestion, GroceryLink, StepAnalysis } from '../services/api' import type { RecipeSuggestion, GroceryLink, StepAnalysis } from '../services/api'
import SaveRecipeModal from './SaveRecipeModal.vue' import SaveRecipeModal from './SaveRecipeModal.vue'
@ -361,6 +403,12 @@ onMounted(() => {
) )
;(focusable ?? dialogRef.value)?.focus() ;(focusable ?? dialogRef.value)?.focus()
}) })
// Load community tags in the background (non-critical)
browserAPI.listRecipeTags(props.recipe.id).then((tags) => {
communityTags.value = tags
}).catch(() => {
// Community tags are supplemental; silently skip on error
})
}) })
onUnmounted(() => { onUnmounted(() => {
@ -386,6 +434,16 @@ const isSaved = computed(() => savedStore.isSaved(props.recipe.id))
const cookDone = ref(false) const cookDone = ref(false)
// Community tags
type CommunityTag = { id: number; domain: string; category: string; subcategory: string | null; pseudonym: string; upvotes: number; accepted: boolean }
const communityTags = ref<CommunityTag[]>([])
// Leftover shelf-life
type LeftoversData = { fridge_days: number; freeze_days: number | null; freeze_by_day: number | null; storage_advice: string }
const leftovers = ref<LeftoversData | null>(null)
const leftoversLoading = ref(false)
const leftoversDismissed = ref(false)
// Cook mode // Cook mode
const cookModeActive = ref(false) const cookModeActive = ref(false)
const cookStep = ref(0) // 0-indexed const cookStep = ref(0) // 0-indexed
@ -622,10 +680,20 @@ function groceryLinkFor(ingredient: string): GroceryLink | undefined {
return props.groceryLinks.find((l) => l.ingredient.toLowerCase() === needle) return props.groceryLinks.find((l) => l.ingredient.toLowerCase() === needle)
} }
function handleCook() { async function handleCook() {
recipesStore.logCook(props.recipe.id, props.recipe.title) recipesStore.logCook(props.recipe.id, props.recipe.title)
cookDone.value = true cookDone.value = true
emit('cooked', props.recipe) emit('cooked', props.recipe)
if (props.recipe.id) {
leftoversLoading.value = true
try {
leftovers.value = await recipesAPI.getLeftovers(props.recipe.id)
} catch {
// Silently skip: shelf life is supplemental info, not critical
} finally {
leftoversLoading.value = false
}
}
} }
</script> </script>
@ -1558,4 +1626,68 @@ details[open].steps-collapsible .steps-collapsible-summary::before {
padding: var(--spacing-xs) var(--spacing-sm); padding: var(--spacing-xs) var(--spacing-sm);
font-size: var(--font-size-sm); font-size: var(--font-size-sm);
} }
.leftovers-panel {
background: var(--color-surface-alt, var(--color-surface));
border: 1px solid var(--color-border);
border-radius: var(--radius-md);
padding: var(--spacing-sm);
text-align: left;
}
.leftovers-header {
align-items: center;
}
.leftovers-grid {
display: flex;
gap: var(--spacing-md);
flex-wrap: wrap;
}
.leftovers-cell {
display: flex;
align-items: center;
gap: var(--spacing-xs);
}
.leftovers-icon {
font-size: 1rem;
line-height: 1;
}
/* ── Community tags section ──────────────────────────────── */
.community-tags-section {
padding-top: var(--spacing-sm);
}
.community-tags-list {
display: flex;
flex-wrap: wrap;
gap: var(--spacing-xs);
}
.community-tag-chip {
display: inline-flex;
align-items: center;
gap: 0.25rem;
padding: 2px var(--spacing-sm);
border-radius: var(--radius-pill, 999px);
font-size: var(--font-size-xs, 0.72rem);
background: var(--color-bg-secondary);
color: var(--color-text-secondary);
border: 1px solid var(--color-border);
white-space: nowrap;
}
.community-tag-chip--accepted {
background: rgba(124, 111, 205, 0.12);
color: var(--color-accent, #7c6fcd);
border-color: rgba(124, 111, 205, 0.3);
}
.community-tag-check {
font-size: 0.65rem;
opacity: 0.8;
}
</style> </style>
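
The leftovers panel above repeats the same pluralization expression for fridge and freezer days, and hides the freezer and freeze-by lines when their values are absent. A rough sketch of that display logic as plain functions follows. The `LeftoversData` type is copied from the diff; `dayLabel` and `leftoverLines` are hypothetical helpers that do not exist in the component.

```typescript
// Copied from the component's script block in this diff.
type LeftoversData = {
  fridge_days: number
  freeze_days: number | null
  freeze_by_day: number | null
  storage_advice: string
}

// Hypothetical helper: "1 day", "3 days".
function dayLabel(n: number): string {
  return `${n} day${n !== 1 ? 's' : ''}`
}

// Hypothetical helper mirroring the template's v-if conditions, one string per visible line.
function leftoverLines(l: LeftoversData): string[] {
  const lines = [`Fridge: ${dayLabel(l.fridge_days)}`]
  if (l.freeze_days !== null) lines.push(`Freezer: ${dayLabel(l.freeze_days)}`)
  // Truthiness check, as in the template: both null and 0 hide this line.
  if (l.freeze_by_day) lines.push(`Freeze by day ${l.freeze_by_day} for best results.`)
  lines.push(l.storage_advice)
  return lines
}
```

Note that the template checks `freeze_days !== null` but uses plain truthiness for `freeze_by_day`, so a value of `0` is shown for the former and hidden for the latter; the sketch keeps that difference.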


@ -0,0 +1,849 @@
<template>
<div class="modal-overlay" @click.self="close" role="dialog" aria-modal="true" :aria-labelledby="titleId">
<div class="modal-panel scan-modal">
<!-- Header -->
<div class="modal-header">
<h2 :id="titleId" class="modal-title">
<span v-if="phase === 'upload'">Scan a Recipe</span>
<span v-else-if="phase === 'processing'">Scanning...</span>
<span v-else>Review Recipe</span>
</h2>
<button class="btn-icon close-btn" @click="close" aria-label="Close">
<svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
<line x1="18" y1="6" x2="6" y2="18"/><line x1="6" y1="6" x2="18" y2="18"/>
</svg>
</button>
</div>
<!-- Upload phase -->
<div v-if="phase === 'upload'" class="modal-body">
<p class="hint-text">
Photograph a recipe card, cookbook page, or handwritten note.
For multi-page recipes (ingredients on one page, directions on another),
select the photos together. Up to 4 images per scan.
</p>
<!-- Drop zone -->
<div
class="drop-zone"
:class="{ 'drop-zone-active': isDragging, 'has-files': selectedFiles.length > 0 }"
@dragover.prevent="isDragging = true"
@dragleave="isDragging = false"
@drop.prevent="onDrop"
@click="fileInput?.click()"
role="button"
tabindex="0"
@keydown.enter.space="fileInput?.click()"
aria-label="Click or drop photos here"
>
<input
ref="fileInput"
type="file"
accept="image/jpeg,image/jpg,image/png,image/webp,image/heic,image/heif"
multiple
class="hidden-input"
@change="onFileChange"
/>
<div v-if="selectedFiles.length === 0" class="drop-zone-empty">
<svg width="40" height="40" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="1.5" class="camera-icon">
<path d="M23 19a2 2 0 0 1-2 2H3a2 2 0 0 1-2-2V8a2 2 0 0 1 2-2h4l2-3h6l2 3h4a2 2 0 0 1 2 2z"/>
<circle cx="12" cy="13" r="4"/>
</svg>
<p class="drop-zone-label">Tap or drop photos here</p>
<p class="drop-zone-sub">JPEG, PNG, WebP, HEIC (up to 4 photos)</p>
</div>
<div v-else class="file-preview-grid">
<div
v-for="(_file, i) in selectedFiles"
:key="i"
class="file-preview-item"
>
<img :src="previewUrls[i]" :alt="`Photo ${i + 1}`" class="preview-img" />
<button
class="remove-file-btn"
@click.stop="removeFile(i)"
:aria-label="`Remove photo ${i + 1}`"
>
<svg width="12" height="12" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="3">
<line x1="18" y1="6" x2="6" y2="18"/><line x1="6" y1="6" x2="18" y2="18"/>
</svg>
</button>
<p class="preview-label">Page {{ i + 1 }}</p>
</div>
<div
v-if="selectedFiles.length < 4"
class="file-preview-add"
@click.stop="fileInput?.click()"
role="button"
tabindex="0"
@keydown.enter.space.stop="fileInput?.click()"
aria-label="Add another photo"
>
<svg width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
<line x1="12" y1="5" x2="12" y2="19"/><line x1="5" y1="12" x2="19" y2="12"/>
</svg>
</div>
</div>
</div>
<div v-if="uploadError" class="status-badge status-error mt-sm" role="alert">
{{ uploadError }}
</div>
<div class="modal-footer">
<button class="btn btn-secondary" @click="close">Cancel</button>
<button
class="btn btn-primary"
:disabled="selectedFiles.length === 0"
@click="startScan"
>
Scan Recipe
</button>
</div>
</div>
<!-- Processing phase -->
<div v-else-if="phase === 'processing'" class="modal-body processing-body">
<div class="scan-spinner" aria-live="polite" aria-label="Scanning recipe">
<svg class="spin-icon" width="48" height="48" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="1.5">
<path d="M23 19a2 2 0 0 1-2 2H3a2 2 0 0 1-2-2V8a2 2 0 0 1 2-2h4l2-3h6l2 3h4a2 2 0 0 1 2 2z"/>
<circle cx="12" cy="13" r="4"/>
</svg>
<p class="processing-label">{{ scanStatusMessage }}</p>
<p class="processing-sub">This can take up to a minute on first use.</p>
</div>
</div>
<!-- Review phase -->
<div v-else-if="phase === 'review' && extracted" class="modal-body review-body">
<!-- Confidence banner -->
<div
v-if="extracted.confidence !== 'high' || extracted.warnings.length > 0"
:class="['status-badge', extracted.confidence === 'low' ? 'status-warning' : 'status-info', 'mb-sm']"
role="status"
>
<span v-if="extracted.confidence === 'low'">Low-confidence scan: handwritten or degraded text. Please review carefully.</span>
<span v-else-if="extracted.confidence !== 'high'">Medium confidence. Check the fields below.</span>
<ul v-if="extracted.warnings.length > 0" class="warning-list">
<li v-for="w in extracted.warnings" :key="w">{{ w }}</li>
</ul>
</div>
<!-- Pantry match badge -->
<div v-if="extracted.ingredients.length > 0" class="pantry-match-row mb-sm">
<span class="pantry-badge" :class="pantryMatchClass">
{{ extracted.pantry_match_pct }}% pantry match
({{ pantryCount }} of {{ extracted.ingredients.length }} ingredients on hand)
</span>
</div>
<!-- Editable fields -->
<div class="review-form">
<div class="form-group">
<label class="form-label" for="scan-title">Recipe name</label>
<input
id="scan-title"
v-model="editTitle"
class="form-input"
type="text"
placeholder="Recipe name"
required
/>
</div>
<div class="form-row-2">
<div class="form-group">
<label class="form-label" for="scan-servings">Servings</label>
<input id="scan-servings" v-model="editServings" class="form-input" type="text" placeholder="e.g. 2" />
</div>
<div class="form-group">
<label class="form-label" for="scan-cooktime">Cook time</label>
<input id="scan-cooktime" v-model="editCookTime" class="form-input" type="text" placeholder="e.g. 25 min" />
</div>
</div>
<!-- Ingredients -->
<div class="form-group">
<label class="form-label">Ingredients</label>
<div class="ingredient-list">
<div
v-for="(ingr, i) in editIngredients"
:key="i"
:class="['ingredient-row', ingr.in_pantry ? 'in-pantry' : '']"
>
<span v-if="ingr.in_pantry" class="pantry-dot" title="In your pantry" aria-label="In pantry"></span>
<input
v-model="ingr.qty"
class="form-input ingr-qty"
type="text"
placeholder="qty"
:aria-label="`Ingredient ${i + 1} quantity`"
/>
<input
v-model="ingr.unit"
class="form-input ingr-unit"
type="text"
placeholder="unit"
:aria-label="`Ingredient ${i + 1} unit`"
/>
<input
v-model="ingr.name"
class="form-input ingr-name"
type="text"
placeholder="ingredient"
:aria-label="`Ingredient ${i + 1} name`"
/>
<button
class="btn-icon remove-ingr-btn"
@click="removeIngredient(i)"
:aria-label="`Remove ingredient ${i + 1}`"
>
<svg width="14" height="14" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2.5">
<line x1="18" y1="6" x2="6" y2="18"/><line x1="6" y1="6" x2="18" y2="18"/>
</svg>
</button>
</div>
</div>
<button class="btn btn-ghost btn-sm mt-xs" @click="addIngredient">+ Add ingredient</button>
</div>
<!-- Steps -->
<div class="form-group">
<label class="form-label">Steps</label>
<div class="step-list">
<div v-for="(_step, i) in editSteps" :key="i" class="step-row">
<span class="step-num">{{ i + 1 }}</span>
<textarea
v-model="editSteps[i]"
class="form-input step-textarea"
rows="2"
:aria-label="`Step ${i + 1}`"
></textarea>
<button
class="btn-icon remove-step-btn"
@click="removeStep(i)"
:aria-label="`Remove step ${i + 1}`"
>
<svg width="14" height="14" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2.5">
<line x1="18" y1="6" x2="6" y2="18"/><line x1="6" y1="6" x2="18" y2="18"/>
</svg>
</button>
</div>
</div>
<button class="btn btn-ghost btn-sm mt-xs" @click="addStep">+ Add step</button>
</div>
<!-- Notes (optional) -->
<div class="form-group">
<label class="form-label" for="scan-notes">Notes <span class="optional-label">(optional)</span></label>
<textarea id="scan-notes" v-model="editNotes" class="form-input" rows="2" placeholder="Tips, variations, storage..."></textarea>
</div>
<!-- Source attribution -->
<div v-if="extracted.source_note" class="source-note">
Source: {{ extracted.source_note }}
</div>
</div>
<div v-if="saveError" class="status-badge status-error mt-sm" role="alert">
{{ saveError }}
</div>
<div class="modal-footer">
<button class="btn btn-secondary" @click="phase = 'upload'">Re-scan</button>
<button
class="btn btn-primary"
:disabled="!editTitle.trim() || saving"
@click="save"
>
{{ saving ? 'Saving...' : 'Save Recipe' }}
</button>
</div>
</div>
</div>
</div>
</template>
<script setup lang="ts">
import { ref, computed, onBeforeUnmount } from 'vue'
import { type ScannedRecipe, type ScannedIngredient, recipeScanAPI } from '@/services/api'
type Phase = 'upload' | 'processing' | 'review'
const emit = defineEmits<{
(e: 'close'): void
(e: 'saved', recipe: { id: number; title: string }): void
}>()
const titleId = 'scan-modal-title'
// Upload state
const phase = ref<Phase>('upload')
const fileInput = ref<HTMLInputElement | null>(null)
const selectedFiles = ref<File[]>([])
const previewUrls = ref<string[]>([])
const isDragging = ref(false)
const uploadError = ref('')
function onDrop(e: DragEvent) {
isDragging.value = false
const dt = e.dataTransfer
if (!dt) return
addFiles(Array.from(dt.files))
}
function onFileChange(e: Event) {
const input = e.target as HTMLInputElement
if (!input.files) return
addFiles(Array.from(input.files))
// Reset so the same file can be re-selected after removal
input.value = ''
}
function addFiles(incoming: File[]) {
uploadError.value = ''
const combined = [...selectedFiles.value, ...incoming]
if (combined.length > 4) {
uploadError.value = 'Maximum 4 photos per scan.'
return
}
// Revoke old preview URLs before replacing
previewUrls.value.forEach((url) => URL.revokeObjectURL(url))
selectedFiles.value = combined
previewUrls.value = combined.map((f) => URL.createObjectURL(f))
}
function removeFile(index: number) {
URL.revokeObjectURL(previewUrls.value[index] ?? '')
selectedFiles.value = selectedFiles.value.filter((_, i) => i !== index)
previewUrls.value = previewUrls.value.filter((_, i) => i !== index)
}
// Scan
const extracted = ref<ScannedRecipe | null>(null)
const scanStatusMessage = ref('Uploading photos...')
async function startScan() {
if (selectedFiles.value.length === 0) return
uploadError.value = ''
scanStatusMessage.value = 'Uploading photos...'
phase.value = 'processing'
try {
const result = await recipeScanAPI.scanStream(
selectedFiles.value,
(_status: string, message: string) => { scanStatusMessage.value = message },
)
extracted.value = result
initEditState(result)
phase.value = 'review'
} catch (err: unknown) {
const msg = err instanceof Error ? err.message : String(err)
uploadError.value = msg.includes('not appear to contain a recipe')
? 'This photo does not look like a recipe. Please try a different photo.'
: msg.includes('No vision backend')
? 'Recipe scanning is not available right now. Check your BYOK settings.'
: `Scan failed: ${msg}`
phase.value = 'upload'
}
}
// Review/edit state
const editTitle = ref('')
const editServings = ref('')
const editCookTime = ref('')
const editIngredients = ref<ScannedIngredient[]>([])
const editSteps = ref<string[]>([])
const editNotes = ref('')
function initEditState(r: ScannedRecipe) {
editTitle.value = r.title ?? ''
editServings.value = r.servings ?? ''
editCookTime.value = r.cook_time ?? ''
editIngredients.value = r.ingredients.map((i) => ({ ...i }))
editSteps.value = [...r.steps]
editNotes.value = r.notes ?? ''
}
function removeIngredient(i: number) {
editIngredients.value = editIngredients.value.filter((_, idx) => idx !== i)
}
function addIngredient() {
editIngredients.value = [...editIngredients.value, { name: '', qty: null, unit: null, raw: null, in_pantry: false }]
}
function removeStep(i: number) {
editSteps.value = editSteps.value.filter((_, idx) => idx !== i)
}
function addStep() {
editSteps.value = [...editSteps.value, '']
}
// Pantry match display
const pantryCount = computed(() =>
editIngredients.value.filter((i) => i.in_pantry).length
)
const pantryMatchClass = computed(() => {
const pct = extracted.value?.pantry_match_pct ?? 0
if (pct >= 80) return 'pantry-high'
if (pct >= 50) return 'pantry-mid'
return 'pantry-low'
})
// Save
const saving = ref(false)
const saveError = ref('')
async function save() {
if (!editTitle.value.trim()) return
saving.value = true
saveError.value = ''
try {
const payload = {
title: editTitle.value.trim(),
subtitle: extracted.value?.subtitle ?? null,
servings: editServings.value || null,
cook_time: editCookTime.value || null,
source_note: extracted.value?.source_note ?? null,
ingredients: editIngredients.value.filter((i) => i.name.trim()),
steps: editSteps.value.filter((s) => s.trim()),
notes: editNotes.value.trim() || null,
tags: extracted.value?.tags ?? [],
source: 'scan' as const,
}
const saved = await recipeScanAPI.saveScanned(payload)
emit('saved', { id: saved.id, title: saved.title })
close()
} catch (err: unknown) {
saveError.value = err instanceof Error ? err.message : 'Failed to save recipe.'
} finally {
saving.value = false
}
}
// Cleanup
function close() {
previewUrls.value.forEach((url) => URL.revokeObjectURL(url))
emit('close')
}
onBeforeUnmount(() => {
previewUrls.value.forEach((url) => URL.revokeObjectURL(url))
})
</script>
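
The `addFiles` function above rejects the whole incoming batch when the combined selection would exceed four photos, rather than truncating it. A minimal sketch of that all-or-nothing rule as a pure function follows, leaving out the preview-URL handling. `MAX_PHOTOS`, `MergeResult`, and `mergeSelection` are hypothetical names, not part of the component.

```typescript
// Hypothetical constant; the component hard-codes 4 in addFiles() and the template.
const MAX_PHOTOS = 4

type MergeResult<T> =
  | { ok: true; files: T[] }
  | { ok: false; error: string }

// All-or-nothing merge: if the combined list is too long, nothing is added
// and the current selection stays as it was.
function mergeSelection<T>(current: T[], incoming: T[], max: number = MAX_PHOTOS): MergeResult<T> {
  const combined = [...current, ...incoming]
  if (combined.length > max) {
    return { ok: false, error: `Maximum ${max} photos per scan.` }
  }
  return { ok: true, files: combined }
}
```

With a helper like this, `addFiles` would only need to set `uploadError` on failure, or swap in the new list and rebuild the object URLs on success. Whether to reject or truncate an oversized drop is a design choice; the diff chooses rejection, so the sketch does too.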
<style scoped>
.modal-overlay {
position: fixed;
inset: 0;
background: rgba(0, 0, 0, 0.5);
display: flex;
align-items: center;
justify-content: center;
z-index: var(--z-modal, 1000);
padding: var(--spacing-md);
}
.modal-panel {
background: var(--bg-card, #fff);
border-radius: var(--radius-lg, 12px);
box-shadow: var(--shadow-xl, 0 20px 60px rgba(0,0,0,0.2));
width: 100%;
max-width: 560px;
max-height: 90vh;
display: flex;
flex-direction: column;
overflow: hidden;
}
.modal-header {
display: flex;
align-items: center;
justify-content: space-between;
padding: var(--spacing-md) var(--spacing-lg);
border-bottom: 1px solid var(--border-color, #e5e7eb);
flex-shrink: 0;
}
.modal-title {
font-size: var(--font-lg, 1.125rem);
font-weight: 600;
color: var(--text-primary, #111);
margin: 0;
}
.close-btn {
background: none;
border: none;
cursor: pointer;
padding: 4px;
color: var(--text-secondary, #6b7280);
border-radius: var(--radius-sm, 4px);
display: flex;
align-items: center;
justify-content: center;
}
.close-btn:hover {
background: var(--bg-hover, #f3f4f6);
color: var(--text-primary, #111);
}
.modal-body {
padding: var(--spacing-lg);
overflow-y: auto;
flex: 1;
}
.modal-footer {
display: flex;
justify-content: flex-end;
gap: var(--spacing-sm);
padding-top: var(--spacing-md);
border-top: 1px solid var(--border-color, #e5e7eb);
margin-top: var(--spacing-md);
}
/* ── Upload ── */
.hint-text {
color: var(--text-secondary, #6b7280);
font-size: var(--font-sm, 0.875rem);
margin-bottom: var(--spacing-md);
line-height: 1.5;
}
.drop-zone {
border: 2px dashed var(--border-color, #d1d5db);
border-radius: var(--radius-md, 8px);
padding: var(--spacing-xl);
text-align: center;
cursor: pointer;
transition: border-color 0.15s, background 0.15s;
min-height: 160px;
display: flex;
align-items: center;
justify-content: center;
}
.drop-zone:hover,
.drop-zone-active {
border-color: var(--color-primary, #4f46e5);
background: var(--bg-hover, #f5f3ff);
}
.drop-zone.has-files {
border-style: solid;
border-color: var(--color-primary, #4f46e5);
padding: var(--spacing-md);
}
.hidden-input {
display: none;
}
.drop-zone-empty {
display: flex;
flex-direction: column;
align-items: center;
gap: var(--spacing-xs);
}
.camera-icon {
color: var(--text-secondary, #9ca3af);
margin-bottom: var(--spacing-xs);
}
.drop-zone-label {
font-weight: 600;
color: var(--text-primary, #111);
margin: 0;
}
.drop-zone-sub {
color: var(--text-secondary, #6b7280);
font-size: var(--font-sm, 0.875rem);
margin: 0;
}
.file-preview-grid {
display: flex;
gap: var(--spacing-sm);
flex-wrap: wrap;
align-items: center;
width: 100%;
}
.file-preview-item {
position: relative;
width: 100px;
}
.preview-img {
width: 100px;
height: 100px;
object-fit: cover;
border-radius: var(--radius-sm, 6px);
border: 1px solid var(--border-color, #e5e7eb);
}
.remove-file-btn {
position: absolute;
top: -6px;
right: -6px;
background: var(--color-danger, #ef4444);
color: white;
border: none;
border-radius: 50%;
width: 20px;
height: 20px;
display: flex;
align-items: center;
justify-content: center;
cursor: pointer;
padding: 0;
}
.preview-label {
text-align: center;
font-size: var(--font-xs, 0.75rem);
color: var(--text-secondary, #6b7280);
margin: 4px 0 0;
}
.file-preview-add {
width: 100px;
height: 100px;
border: 2px dashed var(--border-color, #d1d5db);
border-radius: var(--radius-sm, 6px);
display: flex;
align-items: center;
justify-content: center;
cursor: pointer;
color: var(--text-secondary, #9ca3af);
transition: border-color 0.15s;
}
.file-preview-add:hover {
border-color: var(--color-primary, #4f46e5);
color: var(--color-primary, #4f46e5);
}
/* ── Processing ── */
.processing-body {
display: flex;
align-items: center;
justify-content: center;
min-height: 200px;
}
.scan-spinner {
display: flex;
flex-direction: column;
align-items: center;
gap: var(--spacing-sm);
}
.spin-icon {
color: var(--color-primary, #4f46e5);
animation: spin 1.5s linear infinite;
}
@keyframes spin {
from { transform: rotate(0deg); }
to { transform: rotate(360deg); }
}
.processing-label {
font-weight: 600;
color: var(--text-primary, #111);
margin: 0;
}
.processing-sub {
color: var(--text-secondary, #6b7280);
font-size: var(--font-sm, 0.875rem);
margin: 0;
}
/* ── Review ── */
.review-body {
padding-bottom: var(--spacing-sm);
}
.pantry-match-row {
display: flex;
align-items: center;
}
.pantry-badge {
display: inline-block;
font-size: var(--font-sm, 0.875rem);
font-weight: 600;
padding: 3px 10px;
border-radius: 999px;
}
.pantry-high { background: var(--color-success-bg, #d1fae5); color: var(--color-success, #065f46); }
.pantry-mid { background: var(--color-info-bg, #dbeafe); color: var(--color-info, #1e40af); }
.pantry-low { background: var(--bg-secondary, #f3f4f6); color: var(--text-secondary, #374151); }
.review-form {
display: flex;
flex-direction: column;
gap: var(--spacing-md);
}
.form-row-2 {
display: grid;
grid-template-columns: 1fr 1fr;
gap: var(--spacing-sm);
}
/* Ingredients */
.ingredient-list {
display: flex;
flex-direction: column;
gap: 6px;
}
.ingredient-row {
display: flex;
align-items: center;
gap: 6px;
}
.pantry-dot {
width: 8px;
height: 8px;
border-radius: 50%;
background: var(--color-success, #10b981);
flex-shrink: 0;
}
.in-pantry {
background: var(--color-success-bg-faint, #f0fdf4);
border-radius: var(--radius-sm, 4px);
padding: 2px 4px;
}
.ingr-qty { width: 60px; flex-shrink: 0; }
.ingr-unit { width: 70px; flex-shrink: 0; }
.ingr-name { flex: 1; }
.remove-ingr-btn,
.remove-step-btn {
background: none;
border: none;
cursor: pointer;
padding: 4px;
color: var(--text-secondary, #9ca3af);
border-radius: var(--radius-sm, 4px);
display: flex;
align-items: center;
justify-content: center;
flex-shrink: 0;
}
.remove-ingr-btn:hover,
.remove-step-btn:hover {
background: var(--color-danger-bg, #fee2e2);
color: var(--color-danger, #ef4444);
}
/* Steps */
.step-list {
display: flex;
flex-direction: column;
gap: 8px;
}
.step-row {
display: flex;
align-items: flex-start;
gap: 8px;
}
.step-num {
width: 24px;
height: 24px;
border-radius: 50%;
background: var(--bg-secondary, #f3f4f6);
color: var(--text-secondary, #374151);
font-size: var(--font-xs, 0.75rem);
font-weight: 700;
display: flex;
align-items: center;
justify-content: center;
flex-shrink: 0;
margin-top: 8px;
}
.step-textarea {
flex: 1;
resize: vertical;
min-height: 60px;
}
/* Source */
.source-note {
font-size: var(--font-xs, 0.75rem);
color: var(--text-secondary, #9ca3af);
text-align: right;
font-style: italic;
}
.optional-label {
color: var(--text-secondary, #9ca3af);
font-weight: normal;
font-size: var(--font-xs, 0.75rem);
}
.warning-list {
margin: 4px 0 0;
padding-left: 16px;
font-size: var(--font-sm, 0.875rem);
}
.btn-ghost {
background: none;
border: none;
cursor: pointer;
color: var(--color-primary, #4f46e5);
padding: 4px 8px;
font-size: var(--font-sm, 0.875rem);
border-radius: var(--radius-sm, 4px);
}
.btn-ghost:hover {
background: var(--bg-hover, #f5f3ff);
}
.btn-sm {
padding: 4px 10px;
font-size: var(--font-sm, 0.875rem);
}
.mt-xs { margin-top: var(--spacing-xs, 4px); }
.mt-sm { margin-top: var(--spacing-sm, 8px); }
.mb-sm { margin-bottom: var(--spacing-sm, 8px); }
@media (max-width: 480px) {
.form-row-2 {
grid-template-columns: 1fr;
}
.modal-panel {
border-radius: var(--radius-md, 8px);
max-height: 95vh;
}
}
</style>

File diff suppressed because it is too large

View file

@ -46,7 +46,14 @@
<!-- Style tags --> <!-- Style tags -->
<div class="form-group"> <div class="form-group">
<label class="form-label">Style tags</label> <div class="flex-between mb-xs">
<label class="form-label" style="margin-bottom: 0;">Style tags</label>
<button
class="btn btn-secondary btn-xs"
:disabled="classifying"
@click="suggestTags"
>{{ classifying ? 'Suggesting…' : 'Suggest tags' }}</button>
</div>
<div class="tags-wrap flex flex-wrap gap-xs mb-xs"> <div class="tags-wrap flex flex-wrap gap-xs mb-xs">
<span <span
v-for="tag in localTags" v-for="tag in localTags"
@ -89,6 +96,7 @@
<script setup lang="ts"> <script setup lang="ts">
import { ref, computed, onMounted, onUnmounted, nextTick } from 'vue' import { ref, computed, onMounted, onUnmounted, nextTick } from 'vue'
import { useSavedRecipesStore } from '../stores/savedRecipes' import { useSavedRecipesStore } from '../stores/savedRecipes'
import { savedRecipesAPI } from '../services/api'
const SUGGESTED_TAGS = [ const SUGGESTED_TAGS = [
'comforting', 'light', 'spicy', 'umami', 'sweet', 'savory', 'rich', 'comforting', 'light', 'spicy', 'umami', 'sweet', 'savory', 'rich',
@ -140,6 +148,7 @@ const localTags = ref<string[]>([...(existing.value?.style_tags ?? [])])
const hoverRating = ref<number | null>(null) const hoverRating = ref<number | null>(null)
const tagInput = ref('') const tagInput = ref('')
const saving = ref(false) const saving = ref(false)
const classifying = ref(false)
const unusedSuggestions = computed(() => const unusedSuggestions = computed(() =>
SUGGESTED_TAGS.filter((s) => !localTags.value.includes(s)) SUGGESTED_TAGS.filter((s) => !localTags.value.includes(s))
@ -174,6 +183,23 @@ function onTagKey(e: KeyboardEvent) {
} }
} }
async function suggestTags() {
classifying.value = true
try {
const suggestions = await savedRecipesAPI.classifyStyle(props.recipeId)
// Merge suggestions into localTags (new ones only), preserving the user's existing tags
for (const tag of suggestions) {
if (!localTags.value.includes(tag)) {
localTags.value = [...localTags.value, tag]
}
}
} catch {
// Silently ignore: tier gate returns 403, no LLM returns empty list
} finally {
classifying.value = false
}
}
async function submit() { async function submit() {
saving.value = true saving.value = true
try { try {
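
The merge inside `suggestTags` amounts to an order-preserving union: existing tags keep their position and only unseen suggestions are appended. A minimal sketch of that step in isolation, assuming a standalone helper named `mergeTags` (hypothetical, not part of the component):

```typescript
// Hypothetical helper illustrating the merge performed in suggestTags().
// Returns a new array so the caller can assign it to a ref in one step.
function mergeTags(existing: readonly string[], suggestions: readonly string[]): string[] {
  const merged = [...existing]
  for (const tag of suggestions) {
    // Skip tags the user already has, and duplicates within the suggestions.
    if (!merged.includes(tag)) merged.push(tag)
  }
  return merged
}

const mergedTags = mergeTags(['spicy', 'rich'], ['rich', 'umami', 'umami'])
console.log(mergedTags)
```

Building the result once and assigning it a single time would trigger one reactive update instead of one per new tag. Whether that matters here depends on how many tags the endpoint returns.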

View file

@ -32,6 +32,7 @@
<option value="saved_at">Recently saved</option> <option value="saved_at">Recently saved</option>
<option value="rating">Highest rated</option> <option value="rating">Highest rated</option>
<option value="title">A-Z</option> <option value="title">A-Z</option>
<option value="last_cooked">Last cooked</option>
</select> </select>
</div> </div>
@ -46,7 +47,7 @@
<!-- Recipe cards --> <!-- Recipe cards -->
<div class="saved-list flex-col gap-sm"> <div class="saved-list flex-col gap-sm">
<div <div
v-for="recipe in store.saved" v-for="recipe in sortedSaved"
:key="recipe.id" :key="recipe.id"
class="card-sm saved-card" class="card-sm saved-card"
:class="{ 'card-success': recipe.rating !== null && recipe.rating >= 4 }" :class="{ 'card-success': recipe.rating !== null && recipe.rating >= 4 }"
@ -79,8 +80,8 @@
>{{ tag }}</span> >{{ tag }}</span>
</div> </div>
<!-- Last cooked hint --> <!-- Last cooked chip (orbital cadence: neutral, no urgency) -->
<div v-if="lastCookedLabel(recipe.recipe_id)" class="last-cooked-hint text-xs text-muted mt-xs"> <div v-if="lastCookedLabel(recipe.recipe_id)" class="last-cooked-chip text-xs mt-xs">
{{ lastCookedLabel(recipe.recipe_id) }} {{ lastCookedLabel(recipe.recipe_id) }}
</div> </div>
@ -165,20 +166,32 @@ const recipesStore = useRecipesStore()
const editingRecipe = ref<SavedRecipe | null>(null) const editingRecipe = ref<SavedRecipe | null>(null)
function lastCookedLabel(recipeId: number): string | null { function lastCookedLabel(recipeId: number): string | null {
const entries = recipesStore.cookLog.filter((e) => e.id === recipeId) const days = recipesStore.lastCookedDaysAgo(recipeId)
if (entries.length === 0) return null if (days === null) return null
const latestMs = Math.max(...entries.map((e) => e.cookedAt)) if (days === 0) return 'made today'
const diffMs = Date.now() - latestMs if (days === 1) return 'made yesterday'
const diffDays = Math.floor(diffMs / (1000 * 60 * 60 * 24)) if (days < 7) return `made ${days} days ago`
if (diffDays === 0) return 'Last made: today' if (days < 14) return 'made 1 week ago'
if (diffDays === 1) return 'Last made: yesterday' const weeks = Math.floor(days / 7)
if (diffDays < 7) return `Last made: ${diffDays} days ago` if (days < 60) return `made ${weeks} weeks ago`
if (diffDays < 14) return 'Last made: 1 week ago' const months = Math.floor(days / 30)
const diffWeeks = Math.floor(diffDays / 7) return `made ${months} month${months !== 1 ? 's' : ''} ago`
if (diffDays < 60) return `Last made: ${diffWeeks} weeks ago`
const diffMonths = Math.floor(diffDays / 30)
return `Last made: ${diffMonths} month${diffMonths !== 1 ? 's' : ''} ago`
} }
// Client-side last_cooked sort: resolves from the localStorage cook log, so no API change is needed.
// Recipes with a cook date surface oldest-first (natural "due for a revisit" order without
// framing it that way). Recipes never cooked sort to the end.
const sortedSaved = computed(() => {
if (store.sortBy !== 'last_cooked') return store.saved
return [...store.saved].sort((a, b) => {
const daysA = recipesStore.lastCookedDaysAgo(a.recipe_id)
const daysB = recipesStore.lastCookedDaysAgo(b.recipe_id)
if (daysA === null && daysB === null) return 0
if (daysA === null) return 1 // never cooked: sort to end
if (daysB === null) return -1 // never cooked: sort to end
return daysB - daysA // oldest cooked first (largest days value first)
})
})
const showNewCollection = ref(false) const showNewCollection = ref(false)
// #44: two-step remove confirmation // #44: two-step remove confirmation
@ -363,9 +376,14 @@ async function createCollection() {
padding: var(--spacing-xl); padding: var(--spacing-xl);
} }
.last-cooked-hint { .last-cooked-chip {
font-style: italic; display: inline-block;
opacity: 0.75; color: var(--color-text-muted, var(--color-secondary, #888));
background: var(--color-surface-subtle, transparent);
border-radius: var(--radius-sm, 4px);
padding: 0 var(--spacing-xs, 4px);
font-style: normal;
opacity: 0.8;
} }
.modal-overlay { .modal-overlay {
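
The `last_cooked` ordering in `sortedSaved` is a comparator with two rules: never-cooked recipes (`null`) go last, and among cooked recipes the largest days-ago value comes first. A minimal sketch of the comparator on plain data, assuming a hypothetical `compareByLastCooked` function and made-up sample rows:

```typescript
type DaysAgo = number | null

// Hypothetical standalone version of the comparator used in sortedSaved.
function compareByLastCooked(daysA: DaysAgo, daysB: DaysAgo): number {
  if (daysA === null && daysB === null) return 0
  if (daysA === null) return 1   // a was never cooked: sort after b
  if (daysB === null) return -1  // b was never cooked: sort after a
  return daysB - daysA           // larger days-ago value (older cook) first
}

// Sample rows are illustrative only.
const sampleRows: Array<{ title: string; days: DaysAgo }> = [
  { title: 'soup', days: 3 },
  { title: 'curry', days: null },
  { title: 'stew', days: 40 },
  { title: 'salad', days: 0 },
]

const orderedTitles = [...sampleRows]
  .sort((a, b) => compareByLastCooked(a.days, b.days))
  .map((r) => r.title)
console.log(orderedTitles)
```

Copying the array before sorting mirrors the `[...store.saved]` in the diff, which keeps the store's own array untouched.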

View file

@ -2,6 +2,7 @@
<div class="settings-view"> <div class="settings-view">
<div class="card"> <div class="card">
<h2 class="section-title text-xl mb-md">Settings</h2> <h2 class="section-title text-xl mb-md">Settings</h2>
<p class="text-xs text-muted mb-md">Changes save automatically.</p>
<!-- Cooking Equipment --> <!-- Cooking Equipment -->
<section> <section>
@ -19,7 +20,7 @@
class="tag-chip status-badge status-info" class="tag-chip status-badge status-info"
> >
{{ item }} {{ item }}
<button class="chip-remove" @click="removeEquipment(item)" aria-label="Remove">×</button> <button class="chip-remove" @click="removeEquipment(item)" :aria-label="'Remove equipment: ' + item">×</button>
</span> </span>
</div> </div>
@ -50,18 +51,6 @@
</div> </div>
</div> </div>
<!-- Save button -->
<div class="flex-start gap-sm">
<button
class="btn btn-primary"
:disabled="settingsStore.loading"
@click="settingsStore.save()"
>
<span v-if="settingsStore.loading">Saving</span>
<span v-else-if="settingsStore.saved"> Saved!</span>
<span v-else>Save Settings</span>
</button>
</div>
</section> </section>
<!-- Sensory Preferences --> <!-- Sensory Preferences -->
@ -134,17 +123,6 @@
</p> </p>
</div> </div>
<div class="flex-start gap-sm mt-sm">
<button
class="btn btn-primary btn-sm"
:disabled="settingsStore.loading"
@click="settingsStore.saveSensory()"
>
<span v-if="settingsStore.loading">Saving</span>
<span v-else-if="settingsStore.saved">Saved!</span>
<span v-else>Save sensory preferences</span>
</button>
</div>
</section> </section>
<!-- Units --> <!-- Units -->
@ -169,17 +147,6 @@
Imperial (oz, cups, °F) Imperial (oz, cups, °F)
</button> </button>
</div> </div>
<div class="flex-start gap-sm">
<button
class="btn btn-primary btn-sm"
:disabled="settingsStore.loading"
@click="settingsStore.save()"
>
<span v-if="settingsStore.loading">Saving</span>
<span v-else-if="settingsStore.saved"> Saved!</span>
<span v-else>Save</span>
</button>
</div>
</section> </section>
<!-- Shopping Locale --> <!-- Shopping Locale -->
@ -220,17 +187,6 @@
<option value="br">Brazil (BRL R$)</option> <option value="br">Brazil (BRL R$)</option>
</optgroup> </optgroup>
</select> </select>
<div class="flex-start gap-sm mt-sm">
<button
class="btn btn-primary btn-sm"
:disabled="settingsStore.loading"
@click="settingsStore.save()"
>
<span v-if="settingsStore.loading">Saving</span>
<span v-else-if="settingsStore.saved"> Saved!</span>
<span v-else>Save</span>
</button>
</div>
</section> </section>
<!-- Time-First Layout --> <!-- Time-First Layout -->
@ -258,17 +214,24 @@
</span> </span>
</label> </label>
</div> </div>
<div class="flex-start gap-sm mt-sm"> </section>
<button
class="btn btn-primary btn-sm" <!-- Data Sharing (cloud only) -->
:disabled="settingsStore.loading" <section v-if="isCloudMode" class="mt-md">
@click="settingsStore.save()" <h3 class="text-lg font-semibold mb-xs">Data Sharing</h3>
> <label class="data-sharing-toggle flex-start gap-sm text-sm">
<span v-if="settingsStore.loading">Saving</span> <input
<span v-else-if="settingsStore.saved"> Saved!</span> type="checkbox"
<span v-else>Save</span> :checked="magpieOptIn"
</button> @change="setMagpieOptIn(($event.target as HTMLInputElement).checked)"
</div> />
Share anonymized recipe ratings to help improve suggestions
</label>
<p class="text-xs text-muted mt-xs">
When enabled, Kiwi sends the recipe source ID, your star rating, and
style tags to CircuitForge. No personal information or pantry contents
are included.
</p>
</section> </section>
<!-- Display Preferences --> <!-- Display Preferences -->
@ -375,13 +338,19 @@
</template> </template>
</div> </div>
</div> </div>
<Transition name="autosave-fade">
<div v-if="settingsStore.saved" class="autosave-toast" role="status" aria-live="polite">
Saved
</div>
</Transition>
</template> </template>
<script setup lang="ts"> <script setup lang="ts">
import { ref, computed, onMounted } from 'vue' import { ref, computed, onMounted } from 'vue'
import { useSettingsStore } from '../stores/settings' import { useSettingsStore } from '../stores/settings'
import { useRecipesStore } from '../stores/recipes' import { useRecipesStore } from '../stores/recipes'
import { householdAPI, type HouseholdStatus } from '../services/api' import { householdAPI, settingsAPI, type HouseholdStatus } from '../services/api'
import type { TextureTag, SmellLevel, NoiseLevel } from '../services/api' import type { TextureTag, SmellLevel, NoiseLevel } from '../services/api'
import type { TimeFirstLayout } from '../stores/settings' import type { TimeFirstLayout } from '../stores/settings'
import { useOrchUsage } from '../composables/useOrchUsage' import { useOrchUsage } from '../composables/useOrchUsage'
@ -390,6 +359,23 @@ const settingsStore = useSettingsStore()
const recipesStore = useRecipesStore() const recipesStore = useRecipesStore()
const { enabled: orchPillEnabled, setEnabled: setOrchPillEnabled } = useOrchUsage() const { enabled: orchPillEnabled, setEnabled: setOrchPillEnabled } = useOrchUsage()
// Cloud mode baked in at build time via VITE_CLOUD_MODE=true in cloud builds
const isCloudMode = import.meta.env.VITE_CLOUD_MODE === 'true'
// Data sharing: magpie opt-in (cloud mode only)
const magpieOptIn = ref(false)
async function loadMagpieOptIn(): Promise<void> {
if (!isCloudMode) return
const value = await settingsAPI.getSetting('magpie_opt_in')
magpieOptIn.value = value === 'true'
}
async function setMagpieOptIn(enabled: boolean): Promise<void> {
magpieOptIn.value = enabled
await settingsAPI.setSetting('magpie_opt_in', enabled ? 'true' : 'false')
}
const timeFirstLayoutOptions: Array<{ value: TimeFirstLayout; label: string; description: string }> = [ const timeFirstLayoutOptions: Array<{ value: TimeFirstLayout; label: string; description: string }> = [
{ value: 'auto', label: 'Auto', description: 'Shows a time selector when recipes are available.' }, { value: 'auto', label: 'Auto', description: 'Shows a time selector when recipes are available.' },
{ value: 'time_first', label: 'Time First', description: 'Always show the time bucket selector at the top.' }, { value: 'time_first', label: 'Time First', description: 'Always show the time bucket selector at the top.' },
@ -539,6 +525,7 @@ async function handleRemoveMember(userId: string) {
onMounted(async () => { onMounted(async () => {
await settingsStore.load() await settingsStore.load()
await loadHouseholdStatus() await loadHouseholdStatus()
await loadMagpieOptIn()
}) })
// Sensory taxonomy // Sensory taxonomy
@ -762,13 +749,15 @@ function getNoiseClass(_value: NoiseLevel, idx: number): string {
color: var(--color-text-muted); color: var(--color-text-muted);
} }
.orch-pill-toggle { .orch-pill-toggle,
.data-sharing-toggle {
cursor: pointer; cursor: pointer;
align-items: center; align-items: center;
color: var(--color-text); color: var(--color-text);
} }
.orch-pill-toggle input[type="checkbox"] { .orch-pill-toggle input[type="checkbox"],
.data-sharing-toggle input[type="checkbox"] {
accent-color: var(--color-primary); accent-color: var(--color-primary);
width: 1rem; width: 1rem;
height: 1rem; height: 1rem;
@ -833,4 +822,32 @@ function getNoiseClass(_value: NoiseLevel, idx: number): string {
border-color: var(--color-border, #e0e0e0); border-color: var(--color-border, #e0e0e0);
color: var(--color-text-secondary, #888); color: var(--color-text-secondary, #888);
} }
/* ── Autosave toast ──────────────────────────────────────────────────────── */
.autosave-toast {
position: fixed;
bottom: 1.5rem;
right: 1.5rem;
background: var(--color-surface, #fff);
border: 1px solid var(--color-border, #e0e0e0);
border-radius: var(--radius-md, 0.5rem);
padding: 0.4rem 0.9rem;
font-size: var(--font-size-sm);
color: var(--color-success, #4a8c40);
box-shadow: 0 2px 8px rgba(0, 0, 0, 0.12);
z-index: 500;
pointer-events: none;
}
.autosave-fade-enter-active,
.autosave-fade-leave-active {
transition: opacity 0.25s ease, transform 0.25s ease;
}
.autosave-fade-enter-from,
.autosave-fade-leave-to {
opacity: 0;
transform: translateY(0.5rem);
}
</style> </style>

View file

@ -627,6 +627,7 @@ export interface RecipeRequest {
complexity_filter: string | null complexity_filter: string | null
max_time_min: number | null max_time_min: number | null
max_total_min: number | null max_total_min: number | null
max_active_min: number | null
} }
export interface Staple { export interface Staple {
@ -670,6 +671,21 @@ export interface BuildRequest {
role_overrides: Record<string, string> role_overrides: Record<string, string>
} }
// ── Ask/RAG types ──────────────────────────────────────────────────────────
export interface AskRecipeHit {
id: number
title: string
match_pct: number | null
category: string | null
}
export interface AskResponse {
answer: string | null
recipes: AskRecipeHit[]
tier: string
}
// ========== Recipes API ========== // ========== Recipes API ==========
export const recipesAPI = { export const recipesAPI = {
@ -694,6 +710,10 @@ export const recipesAPI = {
const response = await api.get(`/recipes/${id}`) const response = await api.get(`/recipes/${id}`)
return response.data return response.data
}, },
async getLeftovers(id: number): Promise<{ fridge_days: number; freeze_days: number | null; freeze_by_day: number | null; storage_advice: string }> {
const response = await api.post(`/recipes/${id}/leftovers`)
return response.data
},
async listStaples(dietary?: string): Promise<Staple[]> { async listStaples(dietary?: string): Promise<Staple[]> {
const response = await api.get('/staples/', { params: dietary ? { dietary } : undefined }) const response = await api.get('/staples/', { params: dietary ? { dietary } : undefined })
return response.data return response.data
@ -732,6 +752,60 @@ export const recipesAPI = {
}) })
return response.data return response.data
}, },
/** Natural-language recipe search with optional LLM synthesis (Paid tier). */
async ask(question: string, pantryItems: string[] = []): Promise<AskResponse> {
const response = await api.post('/recipes/ask', { question, pantry_items: pantryItems }, { timeout: 30000 })
return response.data
},
/** Stream a recipe via native SSE (Ollama fallback). Calls callbacks as tokens arrive. */
async suggestRecipeStream(
req: RecipeRequest,
onChunk: (chunk: string) => void,
onDone: () => void,
onError: (err: string) => void,
): Promise<void> {
const baseUrl = (api.defaults.baseURL ?? '') as string
let response: Response
try {
response = await fetch(`${baseUrl}/recipes/suggest?stream=true`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(req),
})
} catch (err: unknown) {
onError(err instanceof Error ? err.message : 'Network error')
return
}
if (!response.ok) {
onError(`HTTP ${response.status}`)
return
}
const reader = response.body?.getReader()
if (!reader) { onError('No response body'); return }
const decoder = new TextDecoder()
let buffer = ''
while (true) {
const { done, value } = await reader.read()
if (done) { onDone(); break }
buffer += decoder.decode(value, { stream: true })
const parts = buffer.split('\n\n')
buffer = parts.pop() ?? ''
for (const part of parts) {
if (!part.startsWith('data: ')) continue
try {
const data = JSON.parse(part.slice(6))
if (data.done) { onDone(); return }
else if (data.error) { onError(data.error); return }
else if (data.chunk) { onChunk(data.chunk) }
} catch { /* ignore malformed events */ }
}
}
},
} }
// ========== Settings API ========== // ========== Settings API ==========
@ -857,6 +931,10 @@ export const savedRecipesAPI = {
async removeFromCollection(collection_id: number, saved_recipe_id: number): Promise<void> { async removeFromCollection(collection_id: number, saved_recipe_id: number): Promise<void> {
await api.delete(`/recipes/saved/collections/${collection_id}/members/${saved_recipe_id}`) await api.delete(`/recipes/saved/collections/${collection_id}/members/${saved_recipe_id}`)
}, },
async classifyStyle(recipe_id: number): Promise<string[]> {
const response = await api.post(`/recipes/saved/${recipe_id}/classify-style`)
return response.data.suggested_tags
},
} }
// --- Meal Plan types --- // --- Meal Plan types ---
@ -1053,6 +1131,7 @@ export const browserAPI = {
subcategory?: string subcategory?: string
q?: string q?: string
sort?: string sort?: string
required_ingredient?: string
}): Promise<BrowserResult> { }): Promise<BrowserResult> {
const response = await api.get(`/recipes/browse/${domain}/${encodeURIComponent(category)}`, { params }) const response = await api.get(`/recipes/browse/${domain}/${encodeURIComponent(category)}`, { params })
return response.data return response.data
@ -1195,4 +1274,127 @@ export const DEFAULT_SENSORY_PREFERENCES: SensoryPreferences = {
max_noise: null, max_noise: null,
} }
// ── Recipe Scanner (kiwi#9) ───────────────────────────────────────────────────
export interface ScannedIngredient {
name: string
qty: string | null
unit: string | null
raw: string | null
in_pantry: boolean
}
export interface ScannedRecipe {
title: string | null
subtitle: string | null
servings: string | null
cook_time: string | null
source_note: string | null
ingredients: ScannedIngredient[]
steps: string[]
notes: string | null
tags: string[]
pantry_match_pct: number
confidence: 'high' | 'medium' | 'low'
warnings: string[]
}
export interface UserRecipe {
id: number
title: string
subtitle: string | null
servings: string | null
cook_time: string | null
source_note: string | null
ingredients: ScannedIngredient[]
steps: string[]
notes: string | null
tags: string[]
source: string
pantry_match_pct: number | null
created_at: string
}
export const recipeScanAPI = {
/** Scan 1-4 recipe photos. Returns structured recipe for review (not saved). */
scan(files: File[]): Promise<ScannedRecipe> {
const form = new FormData()
files.forEach((f) => form.append('files', f))
return api.post('/recipes/scan', form, {
headers: { 'Content-Type': 'multipart/form-data' },
timeout: 120_000, // VLM can be slow on first call
}).then((r) => r.data)
},
/** Scan recipe photos with live SSE progress events.
*
* Calls onProgress(status, message) for each intermediate event
* ("allocating", "scanning", "structuring"), then resolves with the final
* ScannedRecipe on success. Rejects on error or timeout.
*/
async scanStream(
files: File[],
onProgress: (status: string, message: string) => void,
): Promise<ScannedRecipe> {
const form = new FormData()
files.forEach((f) => form.append('files', f))
const response = await fetch(`${API_BASE_URL}/recipes/scan/stream`, {
method: 'POST',
body: form,
})
if (!response.ok || !response.body) {
let detail = ''
try { detail = await response.text() } catch (_) { /* ignore */ }
throw new Error(detail || `Scan failed (${response.status})`)
}
const reader = response.body.getReader()
const decoder = new TextDecoder()
let buffer = ''
while (true) {
const { done, value } = await reader.read()
if (done) break
buffer += decoder.decode(value, { stream: true })
const lines = buffer.split('\n')
buffer = lines.pop() ?? ''
for (const line of lines) {
if (!line.startsWith('data: ')) continue
let data: Record<string, unknown>
try { data = JSON.parse(line.slice(6)) } catch { continue }
if (data.status === 'done') return data.recipe as ScannedRecipe
if (data.status === 'error') throw new Error((data.message as string) || 'Scan failed')
onProgress(data.status as string, data.message as string)
}
}
throw new Error('Stream ended without a result')
},
/** Save a reviewed/edited scanned recipe to user_recipes. */
saveScanned(recipe: Omit<ScannedRecipe, 'pantry_match_pct' | 'confidence' | 'warnings'> & { source?: string }): Promise<UserRecipe> {
return api.post('/recipes/scan/save', recipe).then((r) => r.data)
},
/** List all user-created recipes (scan + manual). */
listUserRecipes(): Promise<UserRecipe[]> {
return api.get('/recipes/user').then((r) => r.data)
},
/** Get a single user recipe by ID. */
getUserRecipe(id: number): Promise<UserRecipe> {
return api.get(`/recipes/user/${id}`).then((r) => r.data)
},
/** Delete a user recipe. */
deleteUserRecipe(id: number): Promise<void> {
return api.delete(`/recipes/user/${id}`).then(() => undefined)
},
}
export default api export default api
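
`suggestRecipeStream` accumulates decoded text in a buffer, splits it on blank lines, and carries the trailing partial frame over to the next read. A minimal sketch of that framing step as a pure function, assuming a hypothetical `parseSseBuffer` helper and hand-written chunks in place of a real response body:

```typescript
interface SseParseResult {
  events: unknown[]  // parsed JSON payloads of complete "data: " frames
  rest: string       // trailing partial frame to prepend to the next chunk
}

// Hypothetical helper: extracts complete frames and returns the remainder.
function parseSseBuffer(buffer: string): SseParseResult {
  const parts = buffer.split('\n\n')
  const rest = parts.pop() ?? ''
  const events: unknown[] = []
  for (const part of parts) {
    if (!part.startsWith('data: ')) continue
    try {
      events.push(JSON.parse(part.slice(6)))
    } catch {
      // Malformed frame: skip it, as the diff does.
    }
  }
  return { events, rest }
}

// Simulated network chunks that split frames at arbitrary points.
const chunks = ['data: {"chunk":"Hel', 'lo"}\n\ndata: {"do', 'ne":true}\n\n']
let pending = ''
const received: unknown[] = []
for (const chunk of chunks) {
  const parsed = parseSseBuffer(pending + chunk)
  received.push(...parsed.events)
  pending = parsed.rest
}
console.log(JSON.stringify(received))
```

`scanStream` in the same diff splits on single newlines rather than blank lines. Both approaches work for single-line `data:` frames. A payload spanning several `data:` lines would need the blank-line form plus joining of the lines.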

View file

@ -64,6 +64,20 @@ export interface PublishPayload {
recipe_id?: number recipe_id?: number
outcome_notes?: string outcome_notes?: string
slots?: CommunityPostSlot[] slots?: CommunityPostSlot[]
similar_to_ref?: string
}
export type SimilarityTier = 'exact_recipe' | 'very_similar' | 'somewhat_similar'
export interface SimilarPost {
slug: string
title: string
recipe_name: string | null
pseudonym: string
published: string
similarity_tier: SimilarityTier
jaccard_score: number | null
tier_description: string
} }
export interface PublishResult { export interface PublishResult {
@ -107,6 +121,25 @@ export const useCommunityStore = defineStore('community', () => {
return response.data return response.data
} }
async function checkSimilar(
title: string,
recipeId?: number | null,
postType?: string,
): Promise<SimilarPost[]> {
try {
const body: Record<string, unknown> = { title }
if (recipeId != null) body.recipe_id = recipeId
if (postType) body.post_type = postType
const response = await api.post<{ similar_posts: SimilarPost[] }>(
'/community/check-similar',
body,
)
return response.data.similar_posts
} catch {
return []
}
}
return { return {
posts, posts,
loading, loading,
@ -115,5 +148,6 @@ export const useCommunityStore = defineStore('community', () => {
fetchPosts, fetchPosts,
forkPost, forkPost,
publishPost, publishPost,
checkSimilar,
} }
}) })
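
`SimilarPost` carries a `jaccard_score`, but the diff does not show how the backend computes it. For orientation only, here is a sketch of the standard Jaccard index on two token sets. The whitespace tokenization and the function name `jaccard` are assumptions, and the server may compare ingredients or something else entirely:

```typescript
// Assumed tokenization: lowercase, split on whitespace. Illustrative only.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\s+/).filter(Boolean))
}

// Standard Jaccard index: |A ∩ B| / |A ∪ B|, defined here as 0 for two empty sets.
function jaccard(a: Set<string>, b: Set<string>): number {
  const intersection = Array.from(a).filter((t) => b.has(t)).length
  const union = a.size + b.size - intersection
  return union === 0 ? 0 : intersection / union
}

const jaccardScore = jaccard(tokenize('Spicy Lentil Soup'), tokenize('lentil soup with lime'))
console.log(jaccardScore)
```

A score in the 0 to 1 range is consistent with the `number | null` type, where `null` presumably marks tiers that do not use a score, such as `exact_recipe`.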

View file

@ -152,6 +152,7 @@ export const useRecipesStore = defineStore('recipes', () => {
const complexityFilter = ref<string | null>(null) const complexityFilter = ref<string | null>(null)
const maxTimeMin = ref<number | null>(null) const maxTimeMin = ref<number | null>(null)
const maxTotalMin = ref<number | null>(null) const maxTotalMin = ref<number | null>(null)
const maxActiveMin = ref<number | null>(null)
const nutritionFilters = ref<NutritionFilters>({ const nutritionFilters = ref<NutritionFilters>({
max_calories: null, max_calories: null,
max_sugar_g: null, max_sugar_g: null,
@ -207,6 +208,7 @@ export const useRecipesStore = defineStore('recipes', () => {
complexity_filter: complexityFilter.value, complexity_filter: complexityFilter.value,
max_time_min: maxTimeMin.value, max_time_min: maxTimeMin.value,
max_total_min: maxTotalMin.value, max_total_min: maxTotalMin.value,
max_active_min: maxActiveMin.value,
} }
} }
@ -318,6 +320,8 @@ export const useRecipesStore = defineStore('recipes', () => {
localStorage.removeItem(DISMISSED_KEY) localStorage.removeItem(DISMISSED_KEY)
} }
// Orbital cadence: cookedAt anchors to completion, not to a schedule.
// Days-since display measures from this timestamp — no debt accumulates.
function logCook(id: number, title: string) { function logCook(id: number, title: string) {
const entry: CookLogEntry = { id, title, cookedAt: Date.now() } const entry: CookLogEntry = { id, title, cookedAt: Date.now() }
cookLog.value = [...cookLog.value, entry] cookLog.value = [...cookLog.value, entry]
@ -329,6 +333,13 @@ export const useRecipesStore = defineStore('recipes', () => {
localStorage.removeItem(COOK_LOG_KEY) localStorage.removeItem(COOK_LOG_KEY)
} }
function lastCookedDaysAgo(recipeId: number): number | null {
const entries = cookLog.value.filter((e) => e.id === recipeId)
if (entries.length === 0) return null
const latestMs = Math.max(...entries.map((e) => e.cookedAt))
return Math.floor((Date.now() - latestMs) / 86_400_000)
}
function isBookmarked(id: number): boolean { function isBookmarked(id: number): boolean {
return bookmarks.value.some((b) => b.id === id) return bookmarks.value.some((b) => b.id === id)
} }
@ -368,6 +379,17 @@ export const useRecipesStore = defineStore('recipes', () => {
wildcardConfirmed.value = false wildcardConfirmed.value = false
} }
async function streamSuggest(
pantryItems: string[],
secondaryPantryItems: Record<string, string>,
onChunk: (chunk: string) => void,
onDone: () => void,
onError: (err: string) => void,
): Promise<void> {
const req = _buildRequest(pantryItems, secondaryPantryItems)
await recipesAPI.suggestRecipeStream(req, onChunk, onDone, onError)
}
return { return {
result, result,
loading, loading,
@ -387,12 +409,14 @@ export const useRecipesStore = defineStore('recipes', () => {
complexityFilter, complexityFilter,
maxTimeMin, maxTimeMin,
maxTotalMin, maxTotalMin,
maxActiveMin,
nutritionFilters, nutritionFilters,
dismissedIds, dismissedIds,
dismissedCount, dismissedCount,
cookLog, cookLog,
logCook, logCook,
clearCookLog, clearCookLog,
lastCookedDaysAgo,
bookmarks, bookmarks,
isBookmarked, isBookmarked,
toggleBookmark, toggleBookmark,
@ -403,6 +427,7 @@ export const useRecipesStore = defineStore('recipes', () => {
missingIngredientMode, missingIngredientMode,
builderFilterMode, builderFilterMode,
suggest, suggest,
streamSuggest,
loadMore, loadMore,
dismiss, dismiss,
undismiss, undismiss,
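
`lastCookedDaysAgo` reduces the cook log to whole days since the most recent entry, and the saved-recipes view maps that number to a label. A minimal sketch of both steps with the clock passed in as a parameter so the result is deterministic. The names `daysSinceLastCook` and `cookedLabel` and the sample log are hypothetical:

```typescript
const MS_PER_DAY = 86_400_000

interface CookLogEntry { id: number; title: string; cookedAt: number }

// Same arithmetic as lastCookedDaysAgo, with `now` injected instead of Date.now().
function daysSinceLastCook(log: readonly CookLogEntry[], recipeId: number, now: number): number | null {
  const times = log.filter((e) => e.id === recipeId).map((e) => e.cookedAt)
  if (times.length === 0) return null
  return Math.floor((now - Math.max(...times)) / MS_PER_DAY)
}

// Label thresholds copied from lastCookedLabel in the view diff.
function cookedLabel(days: number | null): string | null {
  if (days === null) return null
  if (days === 0) return 'made today'
  if (days === 1) return 'made yesterday'
  if (days < 7) return `made ${days} days ago`
  if (days < 14) return 'made 1 week ago'
  if (days < 60) return `made ${Math.floor(days / 7)} weeks ago`
  const months = Math.floor(days / 30)
  return `made ${months} month${months !== 1 ? 's' : ''} ago`
}

// Illustrative log: recipe 1 cooked twice, recipe 2 cooked at the start of "today".
const fakeNow = 10 * MS_PER_DAY + 5
const sampleLog: CookLogEntry[] = [
  { id: 1, title: 'stew', cookedAt: 2 * MS_PER_DAY },
  { id: 1, title: 'stew', cookedAt: 7 * MS_PER_DAY },
  { id: 2, title: 'soup', cookedAt: 10 * MS_PER_DAY },
]
console.log(cookedLabel(daysSinceLastCook(sampleLog, 1, fakeNow)))
```

Because the count uses elapsed 24-hour periods rather than calendar dates, a recipe cooked late last night still reads "made today" the next morning.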

View file

@ -11,7 +11,7 @@ export const useSavedRecipesStore = defineStore('savedRecipes', () => {
const saved = ref<SavedRecipe[]>([]) const saved = ref<SavedRecipe[]>([])
const collections = ref<RecipeCollection[]>([]) const collections = ref<RecipeCollection[]>([])
const loading = ref(false) const loading = ref(false)
const sortBy = ref<'saved_at' | 'rating' | 'title'>('saved_at') const sortBy = ref<'saved_at' | 'rating' | 'title' | 'last_cooked'>('saved_at')
const activeCollectionId = ref<number | null>(null) const activeCollectionId = ref<number | null>(null)
const savedIds = computed(() => new Set(saved.value.map((s) => s.recipe_id))) const savedIds = computed(() => new Set(saved.value.map((s) => s.recipe_id)))
@ -27,12 +27,15 @@ export const useSavedRecipesStore = defineStore('savedRecipes', () => {
async function load() { async function load() {
loading.value = true loading.value = true
try { try {
const [items, cols] = await Promise.all([ // Fetch independently — a collections 403 (Free tier) must not prevent
savedRecipesAPI.list({ sort_by: sortBy.value, collection_id: activeCollectionId.value ?? undefined }), // saved recipes from loading. Backend now returns [] for Free, but guard
// here too in case an older API version is deployed.
const [itemsResult, colsResult] = await Promise.allSettled([
savedRecipesAPI.list({ sort_by: sortBy.value === 'last_cooked' ? 'saved_at' : sortBy.value, collection_id: activeCollectionId.value ?? undefined }),
savedRecipesAPI.listCollections(), savedRecipesAPI.listCollections(),
]) ])
saved.value = items if (itemsResult.status === 'fulfilled') saved.value = itemsResult.value
collections.value = cols if (colsResult.status === 'fulfilled') collections.value = colsResult.value
} finally { } finally {
loading.value = false loading.value = false
} }
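
The switch from `Promise.all` to `Promise.allSettled` in `load()` means one rejected request no longer discards the other's result. A minimal sketch of that pattern, assuming hypothetical loader callbacks in place of `savedRecipesAPI.list` and `savedRecipesAPI.listCollections`, with string arrays standing in for the real record types:

```typescript
interface LoadedState { saved: string[]; collections: string[] }

// Hypothetical stand-in for the store's load(): each result is applied only
// if its own request fulfilled; otherwise the previous value is kept.
async function loadIndependently(
  listSaved: () => Promise<string[]>,
  listCollections: () => Promise<string[]>,
  previous: LoadedState,
): Promise<LoadedState> {
  const [itemsResult, colsResult] = await Promise.allSettled([listSaved(), listCollections()])
  return {
    saved: itemsResult.status === 'fulfilled' ? itemsResult.value : previous.saved,
    collections: colsResult.status === 'fulfilled' ? colsResult.value : previous.collections,
  }
}

// Simulated Free-tier case: saved recipes load, collections are rejected with a 403.
const loadedState = loadIndependently(
  () => Promise.resolve(['lentil soup']),
  () => Promise.reject(new Error('403 Forbidden')),
  { saved: [], collections: [] },
)
loadedState.then((state) => console.log(state))
```

With `Promise.all`, the rejection in the second loader would have propagated and left `saved` unassigned as well. The trade-off is that failures are now silent unless the rejected `reason` is logged.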

View file

@ -1,11 +1,5 @@
/**
* Settings Store
*
* Manages user settings (cooking equipment, preferences) using Pinia.
*/
import { defineStore } from 'pinia' import { defineStore } from 'pinia'
import { ref } from 'vue' import { ref, watch, nextTick } from 'vue'
import { settingsAPI } from '../services/api' import { settingsAPI } from '../services/api'
import type { UnitSystem } from '../utils/units' import type { UnitSystem } from '../utils/units'
import type { SensoryPreferences } from '../services/api' import type { SensoryPreferences } from '../services/api'
@ -13,8 +7,12 @@ import { DEFAULT_SENSORY_PREFERENCES } from '../services/api'
export type TimeFirstLayout = 'auto' | 'time_first' | 'normal' export type TimeFirstLayout = 'auto' | 'time_first' | 'normal'
function debounce(fn: () => void, ms: number): () => void {
let t: ReturnType<typeof setTimeout>
return () => { clearTimeout(t); t = setTimeout(fn, ms) }
}
export const useSettingsStore = defineStore('settings', () => { export const useSettingsStore = defineStore('settings', () => {
// State
const cookingEquipment = ref<string[]>([]) const cookingEquipment = ref<string[]>([])
const unitSystem = ref<UnitSystem>('metric') const unitSystem = ref<UnitSystem>('metric')
const shoppingLocale = ref<string>('us') const shoppingLocale = ref<string>('us')
@ -23,7 +21,40 @@ export const useSettingsStore = defineStore('settings', () => {
const loading = ref(false) const loading = ref(false)
const saved = ref(false) const saved = ref(false)
// Actions // Prevents autosave watchers from firing during initial load hydration.
// Set to true after nextTick() at the end of load() — by that point all
// watcher jobs queued by the hydration assignments have already flushed.
let _hydrated = false
function _flash() {
saved.value = true
setTimeout(() => { saved.value = false }, 2000)
}
async function _saveKey(key: string, value: string): Promise<void> {
if (!_hydrated) return
try {
await settingsAPI.setSetting(key, value)
_flash()
} catch (err: unknown) {
console.error('Autosave failed for key:', key, err)
}
}
const _autosave = {
equipment: debounce(() => _saveKey('cooking_equipment', JSON.stringify(cookingEquipment.value)), 600),
unit: debounce(() => _saveKey('unit_system', unitSystem.value), 600),
locale: debounce(() => _saveKey('shopping_locale', shoppingLocale.value), 600),
sensory: debounce(() => _saveKey('sensory_preferences', JSON.stringify(sensoryPreferences.value)), 600),
layout: debounce(() => _saveKey('time_first_layout', timeFirstLayout.value), 600),
}
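// Check _hydrated when the watcher fires, not 600 ms later inside _saveKey:
// the debounce timer outlives load(), so a guard that lives only in
// _saveKey lets hydration-triggered saves through.
const _guarded = (fn: () => void) => () => { if (_hydrated) fn() }
watch(cookingEquipment, _guarded(_autosave.equipment), { deep: true })
watch(unitSystem, _guarded(_autosave.unit))
watch(shoppingLocale, _guarded(_autosave.locale))
watch(sensoryPreferences, _guarded(_autosave.sensory), { deep: true })
watch(timeFirstLayout, _guarded(_autosave.layout))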
async function load() { async function load() {
loading.value = true loading.value = true
try { try {
@ -58,8 +89,15 @@ export const useSettingsStore = defineStore('settings', () => {
} finally { } finally {
loading.value = false loading.value = false
} }
// Yield past the watcher flush triggered by hydration assignments above.
// After nextTick, any pending watcher jobs from this load() have already
// run (and been ignored by _hydrated guard), so user-driven changes from
// here forward will correctly trigger autosave.
await nextTick()
_hydrated = true
} }
// Kept for explicit full-save scenarios (e.g. fallback, tests).
async function save() { async function save() {
loading.value = true loading.value = true
try { try {
@ -70,10 +108,7 @@ export const useSettingsStore = defineStore('settings', () => {
settingsAPI.setSetting('sensory_preferences', JSON.stringify(sensoryPreferences.value)), settingsAPI.setSetting('sensory_preferences', JSON.stringify(sensoryPreferences.value)),
settingsAPI.setSetting('time_first_layout', timeFirstLayout.value), settingsAPI.setSetting('time_first_layout', timeFirstLayout.value),
]) ])
saved.value = true _flash()
setTimeout(() => {
saved.value = false
}, 2000)
} catch (err: unknown) { } catch (err: unknown) {
console.error('Failed to save settings:', err) console.error('Failed to save settings:', err)
} finally { } finally {
@ -81,24 +116,17 @@ export const useSettingsStore = defineStore('settings', () => {
} }
} }
// Kept for backward compat; autosave handles sensory changes now.
async function saveSensory() { async function saveSensory() {
loading.value = true
try { try {
await settingsAPI.setSetting( await settingsAPI.setSetting('sensory_preferences', JSON.stringify(sensoryPreferences.value))
'sensory_preferences', _flash()
JSON.stringify(sensoryPreferences.value),
)
saved.value = true
setTimeout(() => { saved.value = false }, 2000)
} catch (err: unknown) { } catch (err: unknown) {
console.error('Failed to save sensory preferences:', err) console.error('Failed to save sensory preferences:', err)
} finally {
loading.value = false
} }
} }
return { return {
// State
cookingEquipment, cookingEquipment,
unitSystem, unitSystem,
shoppingLocale, shoppingLocale,
@ -106,8 +134,6 @@ export const useSettingsStore = defineStore('settings', () => {
timeFirstLayout, timeFirstLayout,
loading, loading,
saved, saved,
// Actions
load, load,
save, save,
saveSensory, saveSensory,

View file

@ -436,6 +436,24 @@
display: none; display: none;
} }
/* Horizontally scrollable tab bar — for tab rows with many items */
.tab-bar-scroll {
display: flex;
gap: var(--spacing-xs);
overflow-x: auto;
overflow-y: visible;
scrollbar-width: none;
-webkit-overflow-scrolling: touch;
min-width: 0;
width: 100%;
flex-wrap: nowrap;
padding-bottom: 2px; /* prevent focus ring clipping */
}
.tab-bar-scroll::-webkit-scrollbar {
display: none;
}
/* ============================================ /* ============================================
TEXT UTILITIES TEXT UTILITIES
============================================ */ ============================================ */

View file

@ -1,9 +1,85 @@
import { defineConfig } from 'vite' import { defineConfig } from 'vite'
import vue from '@vitejs/plugin-vue' import vue from '@vitejs/plugin-vue'
import { VitePWA } from 'vite-plugin-pwa'
import { fileURLToPath, URL } from 'node:url' import { fileURLToPath, URL } from 'node:url'
// Ensure start_url/scope match the deployment base path so the PWA launches
// at the correct URL (e.g. /kiwi/ in cloud, / in local dev) rather than the
// site root (which on menagerie.circuitforge.tech is the account page).
const rawBase = process.env.VITE_BASE_URL ?? '/'
const appBase = rawBase.endsWith('/') ? rawBase : rawBase + '/'
export default defineConfig({ export default defineConfig({
plugins: [vue()], plugins: [
vue(),
VitePWA({
registerType: 'autoUpdate',
// generateSW strategy: Workbox builds the service worker at build time.
// autoUpdate means new versions install in the background and activate
// on next navigation — no "click to reload" prompt needed.
strategies: 'generateSW',
includeAssets: ['icons/icon-192.png', 'icons/icon-512.png', 'icons/maskable-192.png', 'icons/maskable-512.png'],
manifest: {
name: 'Kiwi — Pantry Tracker',
short_name: 'Kiwi',
description: 'Track your pantry, cut food waste, get recipe ideas from what you have.',
theme_color: '#e8a820',
background_color: '#1e1c1a',
display: 'standalone',
orientation: 'portrait',
scope: appBase,
start_url: appBase,
icons: [
{
src: '/icons/icon-192.png',
sizes: '192x192',
type: 'image/png',
},
{
src: '/icons/icon-512.png',
sizes: '512x512',
type: 'image/png',
},
{
src: '/icons/maskable-192.png',
sizes: '192x192',
type: 'image/png',
purpose: 'maskable',
},
{
src: '/icons/maskable-512.png',
sizes: '512x512',
type: 'image/png',
purpose: 'maskable',
},
],
},
workbox: {
// Precache the built JS/CSS/HTML shell. API calls are always network-first.
globPatterns: ['**/*.{js,css,html,ico,png,svg,woff2}'],
runtimeCaching: [
{
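// API: network-first, fall back to cache for 1 minute.
// Workbox tests RegExp patterns against the full request URL, so a
// pattern anchored at ^/api/ never matches; match on the pathname instead.
urlPattern: ({ url }) => url.pathname.includes('/api/'),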
handler: 'NetworkFirst',
options: {
cacheName: 'kiwi-api-cache',
expiration: { maxEntries: 50, maxAgeSeconds: 60 },
},
},
{
// Google Fonts: cache-first (fonts rarely change)
urlPattern: /^https:\/\/fonts\.(googleapis|gstatic)\.com\//,
handler: 'CacheFirst',
options: {
cacheName: 'google-fonts',
expiration: { maxEntries: 10, maxAgeSeconds: 60 * 60 * 24 * 365 },
},
},
],
},
}),
],
base: process.env.VITE_BASE_URL ?? '/', base: process.env.VITE_BASE_URL ?? '/',
resolve: { resolve: {
alias: { alias: {

View file

@ -14,8 +14,8 @@ OVERRIDE_FLAG=""
[[ -f "compose.override.yml" ]] && OVERRIDE_FLAG="-f compose.override.yml" [[ -f "compose.override.yml" ]] && OVERRIDE_FLAG="-f compose.override.yml"
usage() { usage() {
echo "Usage: $0 {start|stop|restart|status|logs|open|build|test" echo "Usage: $0 {start|stop|restart|status|logs|open|build|test|update"
echo " |cloud-start|cloud-stop|cloud-restart|cloud-status|cloud-logs|cloud-build}" echo " |cloud-start|cloud-stop|cloud-restart|cloud-status|cloud-logs|cloud-build|cloud-update}"
echo "" echo ""
echo "Dev:" echo "Dev:"
echo " start Build (if needed) and start all services" echo " start Build (if needed) and start all services"
@ -26,6 +26,7 @@ usage() {
echo " open Open web UI in browser" echo " open Open web UI in browser"
echo " build Rebuild Docker images without cache" echo " build Rebuild Docker images without cache"
echo " test Run pytest test suite" echo " test Run pytest test suite"
echo " update git pull + rebuild + restart dev stack"
echo "" echo ""
echo "Cloud (menagerie.circuitforge.tech/kiwi):" echo "Cloud (menagerie.circuitforge.tech/kiwi):"
echo " cloud-start Build cloud images and start kiwi-cloud project" echo " cloud-start Build cloud images and start kiwi-cloud project"
@ -34,6 +35,7 @@ usage() {
echo " cloud-status Show cloud containers" echo " cloud-status Show cloud containers"
echo " cloud-logs Follow cloud logs [api|web — defaults to all]" echo " cloud-logs Follow cloud logs [api|web — defaults to all]"
echo " cloud-build Rebuild cloud images without cache" echo " cloud-build Rebuild cloud images without cache"
echo " cloud-update git pull + rebuild + restart cloud stack"
exit 1 exit 1
} }
@ -68,6 +70,11 @@ case "$cmd" in
build) build)
docker compose -f "$COMPOSE_FILE" $OVERRIDE_FLAG build --no-cache docker compose -f "$COMPOSE_FILE" $OVERRIDE_FLAG build --no-cache
;; ;;
update)
git pull
docker compose -f "$COMPOSE_FILE" $OVERRIDE_FLAG up -d --build
echo "Kiwi updated and restarted → http://localhost:${WEB_PORT}"
;;
test) test)
docker compose -f "$COMPOSE_FILE" $OVERRIDE_FLAG run --rm api \ docker compose -f "$COMPOSE_FILE" $OVERRIDE_FLAG run --rm api \
conda run -n job-seeker pytest tests/ -v conda run -n job-seeker pytest tests/ -v
@ -95,6 +102,11 @@ case "$cmd" in
cloud-build) cloud-build)
docker compose -f "$CLOUD_COMPOSE_FILE" -p "$CLOUD_PROJECT" build --no-cache docker compose -f "$CLOUD_COMPOSE_FILE" -p "$CLOUD_PROJECT" build --no-cache
;; ;;
cloud-update)
git pull
docker compose -f "$CLOUD_COMPOSE_FILE" -p "$CLOUD_PROJECT" up -d --build
echo "Kiwi cloud updated and restarted → https://menagerie.circuitforge.tech/kiwi"
;;
*) *)
usage usage

View file

@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project] [project]
name = "kiwi" name = "kiwi"
version = "0.6.0" version = "0.10.0"
description = "Pantry tracking + leftover recipe suggestions" description = "Pantry tracking + leftover recipe suggestions"
readme = "README.md" readme = "README.md"
requires-python = ">=3.11" requires-python = ">=3.11"

View file

@ -0,0 +1,117 @@
"""
Fast targeted backfill for meal: tags only.
Rather than re-deriving ALL inferred_tags via the full infer_tags() pipeline
(which takes ~2.5h for 3.19M recipes), this script:
1. Reads only id + title + inferred_tags (no ingredient profiles needed;
meal signals are title-only).
2. Runs _match_title_signals() against the title to get meal tags.
3. For rows that already have inferred_tags: merges in the new meal tags
(no-op if already present).
4. For rows with no inferred_tags: writes the matched meal tags as the
row's tag set. The full infer_tags() pipeline is not run here, so those
rows get meal tags only.
5. Rebuilds the FTS5 index once at the end.
Estimated runtime on 3.19M recipes: 35 minutes.
Usage:
python scripts/pipeline/backfill_meal_tags.py [path/to/kiwi.db]
"""
from __future__ import annotations
import argparse
import json
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parents[2]))
from app.services.recipe.tag_inferrer import _MEAL_SIGNALS, _match_title_signals
def run(db_path: Path, batch_size: int = 10_000) -> None:
import sqlite3
conn = sqlite3.connect(db_path)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA synchronous=NORMAL")
total = conn.execute("SELECT count(*) FROM recipes").fetchone()[0]
print(f"Total recipes: {total:,}")
updated = 0
skipped = 0
offset = 0
while True:
rows = conn.execute(
"""
SELECT id, title, inferred_tags
FROM recipes
ORDER BY id
LIMIT ? OFFSET ?
""",
(batch_size, offset),
).fetchall()
if not rows:
break
updates: list[tuple[str, int]] = []
for row_id, title, tags_json in rows:
title = title or ""
meal_tags = _match_title_signals(title, _MEAL_SIGNALS)
if not meal_tags:
skipped += 1
continue
try:
existing: list[str] = json.loads(tags_json) if tags_json else []
except Exception:
existing = []
# Merge: union of existing + new meal tags, sorted
merged = sorted(set(existing) | set(meal_tags))
if merged == existing:
skipped += 1
continue
updates.append((json.dumps(merged), row_id))
if updates:
conn.executemany(
"UPDATE recipes SET inferred_tags = ? WHERE id = ?", updates
)
conn.commit()
updated += len(updates)
offset += len(rows)
pct = min(100, int(offset * 100 / total))
print(f" {pct:>3}% offset {offset:,} merged {updated:,} skipped {skipped:,}",
end="\r")
print(f"\nDone. Merged meal tags into {updated:,} recipes ({skipped:,} unchanged).")
if updated > 0:
print("Rebuilding FTS5 browser index...")
try:
conn.execute(
"INSERT INTO recipe_browser_fts(recipe_browser_fts) VALUES('rebuild')"
)
conn.commit()
print("FTS rebuild complete.")
except Exception as e:
print(f"FTS rebuild skipped: {e}")
conn.close()
if __name__ == "__main__":
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument("db", nargs="?", default="data/kiwi.db", type=Path)
parser.add_argument("--batch-size", type=int, default=10_000)
args = parser.parse_args()
if not args.db.exists():
print(f"DB not found: {args.db}")
sys.exit(1)
run(args.db, args.batch_size)
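
Review note on the merge step in backfill_meal_tags.py: comparing the sorted union against `existing` only detects a no-op when the stored list happens to be sorted and duplicate-free already, so unsorted rows get rewritten even when no meal tag is new. A minimal sketch of an order-independent check follows. `merge_meal_tags` is a hypothetical helper, and the tag strings are illustrative only, not taken from the corpus.

```python
from __future__ import annotations


def merge_meal_tags(existing: list[str], meal_tags: list[str]) -> list[str] | None:
    """Return the merged, sorted tag list, or None when nothing new would be added."""
    current = set(existing)
    if set(meal_tags) <= current:
        # Every meal tag is already stored: skip the UPDATE regardless of order.
        return None
    return sorted(current | set(meal_tags))


# Unsorted stored list that already contains the meal tag: no write needed.
print(merge_meal_tags(["vegan", "meal:dinner"], ["meal:dinner"]))
# Stored list missing the meal tag: a merged, sorted list is returned.
print(merge_meal_tags(["vegan"], ["meal:dinner"]))
```

Returning None for the no-op case keeps the caller's "skipped" counter accurate and avoids touching rows whose tag set is unchanged.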

View file

@ -0,0 +1,218 @@
"""Ingest Purple Carrot scraped recipes into the Kiwi corpus database.
Reads recipes_purplecarrot_live.parquet (output of scrape_live.py) and
upserts into the shared recipes table, setting source='purplecarrot' and
using the recipe slug as the external_id (prefixed pc_).
Run after each weekly_harvest.sh scrape:
conda run -n cf python3 scripts/pipeline/ingest_purplecarrot.py \
[--db /Library/Assets/kiwi/kiwi.db] \
[--parquet /Library/Assets/kiwi/pipeline/recipes_purplecarrot_live.parquet]
"""
from __future__ import annotations
import argparse
import json
import sqlite3
from pathlib import Path
import math
import re
import pandas as pd
# ── Helpers (inlined from build_recipe_index to avoid cross-module import) ─────
_MEASURE_PATTERN = re.compile(
r"^\d[\d\s/¼½¾⅓⅔]*\s*(cup|tbsp|tsp|oz|lb|g|kg|ml|l|clove|slice|piece|can|pkg|package|bunch|head|stalk|sprig|pinch|dash|to taste|as needed)s?\b",
re.IGNORECASE,
)
_LEAD_NUMBER = re.compile(r"^\d[\d\s/¼½¾⅓⅔]*\s*")
_TRAILING_QUALIFIER = re.compile(
r"\s*(to taste|as needed|or more|or less|optional|if desired|if needed)\s*$",
re.IGNORECASE,
)
def _float_or_none(val: object) -> float | None:
try:
v = float(val) # type: ignore[arg-type]
return v if v > 0 else None
except (TypeError, ValueError):
return None
def _safe_list(val: object) -> list:
if val is None:
return []
if isinstance(val, float) and math.isnan(val):
return []
if isinstance(val, list):
return val
# Parquet often deserializes list columns as numpy arrays
try:
import numpy as np
if isinstance(val, np.ndarray):
return val.tolist()
except ImportError:
pass
return []
def _extract_ingredient_names(raw_list: list[str]) -> list[str]:
names = []
for raw in raw_list:
s = raw.lower().strip()
s = _MEASURE_PATTERN.sub("", s)
s = _LEAD_NUMBER.sub("", s)
s = re.sub(r"\(.*?\)", "", s)
s = re.sub(r",.*$", "", s)
s = _TRAILING_QUALIFIER.sub("", s)
s = s.strip(" -.,")
if s and len(s) > 1:
names.append(s)
return names
def _compute_element_coverage(profiles: list[dict]) -> dict[str, float]:
counts: dict[str, int] = {}
for p in profiles:
for elem in p.get("elements", []):
counts[elem] = counts.get(elem, 0) + 1
if not profiles:
return {}
return {e: round(c / len(profiles), 3) for e, c in counts.items()}
# ── Config ─────────────────────────────────────────────────────────────────────
DEFAULT_DB = Path("/Library/Assets/kiwi/kiwi.db")
DEFAULT_PARQUET = Path("/Library/Assets/kiwi/pipeline/recipes_purplecarrot_live.parquet")
# ── Ingest ─────────────────────────────────────────────────────────────────────
def ingest(db_path: Path, parquet_path: Path) -> None:
df = pd.read_parquet(parquet_path)
# Filter to rows with full recipe data
if "HasFullRecipe" in df.columns:
df = df[df["HasFullRecipe"] == True].copy()
if df.empty:
print("No full recipes found in parquet — nothing to ingest.")
return
print(f"Ingesting {len(df)} Purple Carrot recipes into {db_path}")
conn = sqlite3.connect(db_path)
try:
conn.execute("PRAGMA journal_mode=WAL")
# Pre-load ingredient element profiles for coverage calculation
profile_index: dict[str, list[str]] = {}
for row in conn.execute("SELECT name, elements FROM ingredient_profiles"):
try:
profile_index[row[0]] = json.loads(row[1])
except Exception:
pass
inserted = updated = 0
for _, row in df.iterrows():
slug = str(row.get("Slug", "")).strip()
if not slug:
continue
external_id = f"pc_{slug}"
title = str(row.get("Name", "")).strip()[:500]
if not title:
continue
raw_ingredients = [str(i) for i in _safe_list(row.get("RecipeIngredientParts", []))]
directions = [str(d) for d in _safe_list(row.get("RecipeInstructions", []))]
ingredient_names = _extract_ingredient_names(raw_ingredients)
profiles = [
{"elements": profile_index[n]}
for n in ingredient_names if n in profile_index
]
coverage = _compute_element_coverage(profiles)
# Keywords: merge scraped tags with allergen info
kw_raw = _safe_list(row.get("Keywords", []))
allergens = str(row.get("Allergens", "") or "")
if allergens:
kw_raw = list(kw_raw) + [f"allergen:{a.strip()}" for a in allergens.split(",") if a.strip()]
keywords_json = json.dumps(kw_raw)
# Check if already present (same external_id)
existing = conn.execute(
"SELECT id FROM recipes WHERE external_id = ?", (external_id,)
).fetchone()
params = (
title,
json.dumps(raw_ingredients),
json.dumps(ingredient_names),
json.dumps(directions),
"meal-kit", # category
keywords_json,
_float_or_none(row.get("Calories")),
_float_or_none(row.get("FatContent")),
_float_or_none(row.get("ProteinContent")),
None, # sodium_mg — not scraped
json.dumps(coverage),
None, # sugar_g — not scraped
_float_or_none(row.get("CarbohydrateContent")),
_float_or_none(row.get("FiberContent")),
2.0, # servings — PC meal kits are 2-serving by default
0, # nutrition_estimated — PC provides real data
)
if existing:
conn.execute("""
UPDATE recipes
SET title=?, ingredients=?, ingredient_names=?, directions=?,
category=?, keywords=?, calories=?, fat_g=?, protein_g=?,
sodium_mg=?, element_coverage=?,
sugar_g=?, carbs_g=?, fiber_g=?, servings=?, nutrition_estimated=?
WHERE external_id=?
""", params + (external_id,))
updated += 1
else:
conn.execute("""
INSERT INTO recipes
(external_id, source, title, ingredients, ingredient_names,
directions, category, keywords, calories, fat_g, protein_g,
sodium_mg, element_coverage,
sugar_g, carbs_g, fiber_g, servings, nutrition_estimated)
VALUES (?, 'purplecarrot', ?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
""", (external_id,) + params)
inserted += 1
conn.commit()
finally:
conn.close()
print(f"Done — {inserted} inserted, {updated} updated")
# ── Main ───────────────────────────────────────────────────────────────────────
def main() -> None:
parser = argparse.ArgumentParser()
parser.add_argument("--db", type=Path, default=DEFAULT_DB)
parser.add_argument("--parquet", type=Path, default=DEFAULT_PARQUET)
args = parser.parse_args()
if not args.parquet.exists():
print(f"ERROR: parquet not found at {args.parquet}")
raise SystemExit(1)
ingest(args.db, args.parquet)
if __name__ == "__main__":
main()
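
Review note on `_compute_element_coverage` in ingest_purplecarrot.py: the denominator is the number of ingredients that matched a profile, not the number of ingredients in the recipe, because unmatched names are filtered out before the call. A small worked sketch, with the function body copied from the diff and made-up element names ("umami", "fat") used purely for illustration:

```python
from __future__ import annotations


def compute_element_coverage(profiles: list[dict]) -> dict[str, float]:
    """Fraction of matched ingredient profiles that carry each element."""
    counts: dict[str, int] = {}
    for p in profiles:
        for elem in p.get("elements", []):
            counts[elem] = counts.get(elem, 0) + 1
    if not profiles:
        return {}
    return {e: round(c / len(profiles), 3) for e, c in counts.items()}


# Three matched ingredients; the third has a profile but no elements.
profiles = [
    {"elements": ["umami", "fat"]},
    {"elements": ["umami"]},
    {"elements": []},
]
coverage = compute_element_coverage(profiles)
print(coverage)  # umami appears in 2 of 3 profiles, fat in 1 of 3
```

A recipe with ten ingredients of which only three match a profile would report the same coverage as this example, which is worth keeping in mind when comparing coverage across sources.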

View file

@ -0,0 +1,68 @@
"""
Pipeline logging utility.
Adds a structured JSON FileHandler to the root logger so every pipeline
script automatically writes machine-readable logs to the shared datastore
at /Library/Assets/logs/pipeline/. Avocet ingests these for Turnstone
logreading training (kiwi#141 / avocet#67).
Usage (add near the top of main() after logging.basicConfig):
from scripts.pipeline.log_utils import attach_pipeline_log
attach_pipeline_log("scrape_recipes")
"""
from __future__ import annotations
import json
import logging
import os
from datetime import datetime, timezone
from pathlib import Path
PIPELINE_LOG_DIR = Path(
os.environ.get("PIPELINE_LOG_DIR", "/Library/Assets/logs/pipeline")
)
class _JsonFormatter(logging.Formatter):
def format(self, record: logging.LogRecord) -> str:
payload: dict = {
"ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
"level": record.levelname,
"logger": record.name,
"msg": record.getMessage(),
}
if record.exc_info:
payload["exc"] = self.formatException(record.exc_info)
# Any extra kwargs passed via logger.info("...", extra={...})
standard = {
"name", "msg", "args", "levelname", "levelno", "pathname",
"filename", "module", "exc_info", "exc_text", "stack_info",
"lineno", "funcName", "created", "msecs", "relativeCreated",
"thread", "threadName", "processName", "process", "message",
"taskName",
}
extra = {k: v for k, v in record.__dict__.items() if k not in standard}
if extra:
payload["extra"] = extra
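# default=str keeps logging from raising when an extra value
# (e.g. a Path or datetime) is not JSON-serializable.
return json.dumps(payload, default=str)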
def attach_pipeline_log(script_name: str) -> Path:
"""Attach a JSON file handler to the root logger for pipeline logging.
Returns the path of the log file created.
"""
PIPELINE_LOG_DIR.mkdir(parents=True, exist_ok=True)
ts = datetime.now(tz=timezone.utc).strftime("%Y%m%dT%H%M%S")
log_path = PIPELINE_LOG_DIR / f"{script_name}_{ts}.jsonl"
handler = logging.FileHandler(log_path, encoding="utf-8")
handler.setLevel(logging.DEBUG)
handler.setFormatter(_JsonFormatter())
logging.getLogger().addHandler(handler)
logging.getLogger(__name__).info(
"Pipeline log: %s", log_path, extra={"script": script_name}
)
return log_path
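
Review note on log_utils.py: the handler writes one JSON object per line, so a consumer can parse the file line by line. The sketch below shows a round trip under simplified assumptions: `JsonLineFormatter` is a cut-down stand-in for `_JsonFormatter`, and the logger name and messages are invented for the demo.

```python
import json
import logging
import tempfile
from pathlib import Path


class JsonLineFormatter(logging.Formatter):
    """Cut-down stand-in for the pipeline formatter: one JSON object per record."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {"level": record.levelname, "logger": record.name, "msg": record.getMessage()},
            default=str,
        )


def write_and_read(tmp_dir: Path) -> list[dict]:
    log_path = tmp_dir / "demo.jsonl"
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(JsonLineFormatter())
    logger = logging.getLogger("pipeline_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo output out of the root logger
    logger.addHandler(handler)
    try:
        logger.info("scraped %d recipes", 23)
        logger.warning("retrying %s", "slug-a")
    finally:
        # Detach and close so the file is flushed before it is read back.
        logger.removeHandler(handler)
        handler.close()
    lines = log_path.read_text(encoding="utf-8").splitlines()
    return [json.loads(line) for line in lines]


with tempfile.TemporaryDirectory() as tmp:
    records = write_and_read(Path(tmp))

for rec in records:
    print(rec["level"], rec["msg"])
```

Because each line is a complete JSON document, a partially written file (for example after a crash) loses at most its last line.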

View file

@ -0,0 +1,120 @@
"""Discover Purple Carrot's current weekly menu recipe slugs.
The main /plant-based-recipes listing page always renders the current week's
menu as server-side HTML. This script pulls those slugs and writes them to a
parquet that can be passed directly to scrape_live.py via --slugs-from.
Run weekly (e.g. via cron) to accumulate new recipes as the menu rotates.
Usage:
conda run -n cf python3 scripts/pipeline/purple_carrot/discover_current_menu.py \
[--out /Library/Assets/kiwi/pipeline/recipes_purplecarrot_menu.parquet]
Then scrape:
conda run -n cf python3 scripts/pipeline/purple_carrot/scrape_live.py \
--slugs-from /Library/Assets/kiwi/pipeline/recipes_purplecarrot_menu.parquet \
--out /Library/Assets/kiwi/pipeline/recipes_purplecarrot_live.parquet \
--resume
"""
from __future__ import annotations
import re
import sys
from datetime import date
from pathlib import Path
import pandas as pd
import requests
from bs4 import BeautifulSoup
# ── Config ─────────────────────────────────────────────────────────────────────
LISTING_URL = "https://www.purplecarrot.com/plant-based-recipes"
BASE_URL = "https://www.purplecarrot.com/recipe/{slug}"
DEFAULT_OUT = Path("/Library/Assets/kiwi/pipeline/recipes_purplecarrot_menu.parquet")
HEADERS = {
"User-Agent": (
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
"(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
),
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.5",
}
RECIPE_HREF_RE = re.compile(r"/recipe/([^?#]+)")
# ── Main ───────────────────────────────────────────────────────────────────────
def discover_current_slugs() -> list[str]:
"""Fetch the listing page and return unique recipe slugs from the current menu."""
resp = requests.get(LISTING_URL, headers=HEADERS, timeout=15)
if resp.status_code != 200:
print(f"ERROR: listing page returned HTTP {resp.status_code}", file=sys.stderr)
return []
soup = BeautifulSoup(resp.text, "html.parser")
slugs: list[str] = []
seen: set[str] = set()
for a in soup.find_all("a", href=RECIPE_HREF_RE):
m = RECIPE_HREF_RE.search(a["href"])
if m:
slug = m.group(1)
if slug not in seen:
seen.add(slug)
slugs.append(slug)
return slugs
def main() -> None:
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("--out", type=Path, default=DEFAULT_OUT)
args = parser.parse_args()
print(f"Fetching current menu from {LISTING_URL}")
slugs = discover_current_slugs()
if not slugs:
print("No slugs found — the listing page may have changed structure or blocked the request.")
sys.exit(1)
today = date.today().isoformat()
records = [
{
"Slug": slug,
"SourceURL": BASE_URL.format(slug=slug),
"Source": "purplecarrot_menu",
"DiscoveredDate": today,
}
for slug in slugs
]
# Merge with any existing menu parquet (accumulate weeks)
df_new = pd.DataFrame(records)
args.out.parent.mkdir(parents=True, exist_ok=True)
if args.out.exists():
df_prev = pd.read_parquet(args.out)
combined = pd.concat([df_prev, df_new], ignore_index=True)
combined = combined.drop_duplicates(subset=["Slug"], keep="first")
df_new = combined
df_new.to_parquet(args.out, index=False)
print(f"Found {len(slugs)} current-menu slugs this week:")
for s in slugs:
print(f" {s}")
print(f"\nSaved {len(df_new)} total slugs (accumulated) to {args.out}")
print(f"\nTo scrape full recipes:")
print(f" conda run -n cf python3 scripts/pipeline/purple_carrot/scrape_live.py \\")
print(f" --slugs-from {args.out} \\")
print(f" --out /Library/Assets/kiwi/pipeline/recipes_purplecarrot_live.parquet \\")
print(f" --resume")
if __name__ == "__main__":
main()
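
Review note on `RECIPE_HREF_RE` in discover_current_menu.py: the capture group stops at `?` or `#` but keeps a trailing slash, so `/recipe/foo` and `/recipe/foo/` would be stored as two different slugs. The sketch below shows the regex on a few invented hrefs and one possible normalization. The `strip("/")` step and the `unique_slugs` helper are assumptions for illustration, not part of the script.

```python
import re

RECIPE_HREF_RE = re.compile(r"/recipe/([^?#]+)")


def unique_slugs(hrefs: list[str]) -> list[str]:
    """Extract recipe slugs in first-seen order, ignoring query strings and fragments."""
    seen: set[str] = set()
    slugs: list[str] = []
    for href in hrefs:
        m = RECIPE_HREF_RE.search(href)
        if not m:
            continue
        slug = m.group(1).strip("/")  # assumption: trailing slashes are not significant
        if slug and slug not in seen:
            seen.add(slug)
            slugs.append(slug)
    return slugs


hrefs = [
    "/recipe/miso-ramen?utm=1",
    "/recipe/miso-ramen",
    "/recipe/tofu-tacos/#reviews",
    "/recipe-categories/soups",  # not a recipe link: no "/recipe/" segment
    "/plant-based-recipes",
]
print(unique_slugs(hrefs))
```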

View file

@ -0,0 +1,218 @@
"""Discover Purple Carrot recipe slugs by crawling all recipe-category listing pages.
The site serves full server-rendered HTML for category pages, paginated via
?page=N. Each page loads 18 recipe cards. This script crawls every category
across all pages and writes a deduplicated slug inventory.
Usage:
conda run -n cf python3 scripts/pipeline/purple_carrot/discover_slugs_categories.py \
[--out /Library/Assets/kiwi/pipeline/recipes_purplecarrot_slugs.parquet] \
[--delay 2.0] \
[--max-pages 50] # safety cap per category (comfort-foods has ~18)
"""
from __future__ import annotations
import argparse
import re
import time
from pathlib import Path
from typing import Any
import pandas as pd
import requests
from bs4 import BeautifulSoup
# ── Config ─────────────────────────────────────────────────────────────────────
BASE = "https://www.purplecarrot.com"
# All known category slugs (from /plant-based-recipes nav)
CATEGORIES: list[str] = [
"comfort-foods",
"family-friendly",
"healthy-desserts",
"holiday-recipes",
"quick-and-easy",
"party-foods",
"seasonal-menu",
"spring-recipes",
"summer-recipes",
"fall-recipes",
"winter-recipes",
"african",
"american",
"asian",
"comfort",
"french",
"indian",
"italian",
"mediterranean",
"mexican",
"middle-eastern",
"soups",
"salads",
"bowls",
"pasta",
"sandwiches-wraps",
"tacos",
"breakfast",
"snacks-sides",
]
DEFAULT_OUT = Path("/Library/Assets/kiwi/pipeline/recipes_purplecarrot_slugs.parquet")
EXISTING_PARQUET = Path("/Library/Assets/kiwi/pipeline/recipes_purplecarrot.parquet")
RECIPE_LINK_SELECTOR = "a.c-recipe__title"
SLUG_RE = re.compile(r"/recipe/([^?#]+)")
HEADERS = {
"User-Agent": (
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
"(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
),
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.5",
}
# ── Helpers ────────────────────────────────────────────────────────────────────
def _fetch_html(url: str, session: requests.Session) -> str | None:
"""Fetch URL and return HTML string, or None on failure."""
try:
resp = session.get(url, headers=HEADERS, timeout=15)
if resp.status_code == 200:
return resp.text
if resp.status_code == 404:
return None # expected end of pagination
print(f" HTTP {resp.status_code}: {url}")
return None
except Exception as exc:
print(f" ERROR fetching {url}: {exc}")
return None
def _extract_slugs(html: str) -> list[str]:
"""Pull recipe slugs from one listing-page HTML response."""
soup = BeautifulSoup(html, "html.parser")
slugs: list[str] = []
for a in soup.select(RECIPE_LINK_SELECTOR):
href = a.get("href", "")
m = SLUG_RE.search(href)
if m:
slugs.append(m.group(1))
return slugs
def _get_category_total(html: str) -> int | None:
"""Try to parse the recipe count shown on the category page (e.g. '319 Recipes')."""
m = re.search(r"(\d+)\s+Recipes?\b", html)
return int(m.group(1)) if m else None
def _discover_category(
category: str,
session: requests.Session,
delay: float,
max_pages: int,
) -> tuple[list[str], int]:
"""Crawl all pages of a category, return (slugs, pages_fetched)."""
slugs: list[str] = []
for page_num in range(1, max_pages + 1):
if page_num == 1:
url = f"{BASE}/recipe-categories/{category}"
else:
url = f"{BASE}/recipe-categories/{category}?page={page_num}"
html = _fetch_html(url, session)
if html is None:
break # 404 or error = past the end
page_slugs = _extract_slugs(html)
if not page_slugs:
# Show total if we got a page but no links (category slug may be wrong)
if page_num == 1:
total = _get_category_total(html)
if total is not None:
print(f" page 1 loaded (total={total}) but 0 recipe links — selector may need updating")
break
slugs.extend(page_slugs)
# Print progress
total_hint = _get_category_total(html) if page_num == 1 else None
total_str = f" / {total_hint}" if total_hint else ""
print(f" page {page_num}: +{len(page_slugs)} slugs ({len(slugs)}{total_str} cumulative)")
if len(page_slugs) < 18:
# Short page = last page
break
time.sleep(delay)
return slugs, (len(slugs) + 17) // 18 # approximate pages
# ── Main ───────────────────────────────────────────────────────────────────────
def main() -> None:
parser = argparse.ArgumentParser()
parser.add_argument("--out", type=Path, default=DEFAULT_OUT)
parser.add_argument("--delay", type=float, default=2.0,
help="Seconds between page requests")
parser.add_argument("--max-pages", type=int, default=50,
help="Safety cap on pages per category")
parser.add_argument("--categories", nargs="*",
help="Crawl only these category slugs (default: all)")
args = parser.parse_args()
categories = args.categories or CATEGORIES
# Seed with any slugs from the Wayback parquet
known_slugs: set[str] = set()
if EXISTING_PARQUET.exists():
df_wb = pd.read_parquet(EXISTING_PARQUET)
known_slugs = set(df_wb["Slug"].dropna().tolist())
print(f"Seeded with {len(known_slugs)} slugs from Wayback parquet")
all_records: list[dict[str, Any]] = []
session = requests.Session()
for category in categories:
print(f"\n[{category}]")
cat_slugs, pages = _discover_category(category, session, args.delay, args.max_pages)
for slug in cat_slugs:
all_records.append({"Slug": slug, "Category": category, "Source": "purplecarrot_category"})
print(f"{len(cat_slugs)} slugs across ~{pages} pages")
time.sleep(args.delay)
if not all_records:
print("\nNo records found — check that categories are correct and the site is accessible")
return
# Deduplicate keeping first category encountered
df_new = pd.DataFrame(all_records)
df_new = df_new.drop_duplicates(subset=["Slug"], keep="first")
# Also include Wayback slugs not already in the new set
if known_slugs:
wb_only = known_slugs - set(df_new["Slug"].tolist())
if wb_only:
df_wb_extra = pd.DataFrame([
{"Slug": s, "Category": "wayback", "Source": "purplecarrot_wayback"}
for s in wb_only
])
df_new = pd.concat([df_new, df_wb_extra], ignore_index=True)
args.out.parent.mkdir(parents=True, exist_ok=True)
df_new.to_parquet(args.out, index=False)
new_count = len(df_new)
cat_count = len(df_new[df_new["Source"] == "purplecarrot_category"])
print(f"\nDone — {new_count} total slugs saved to {args.out}")
print(f" {cat_count} from category pages, {new_count - cat_count} from Wayback only")
if __name__ == "__main__":
main()
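
Review note on `_discover_category` in discover_slugs_categories.py: the loop stops on a failed fetch, an empty page, or a short page, and then reports an approximate page count derived from the slug total. A minimal sketch of the same stop rule that counts the requests actually made is shown below. `crawl`, `make_fetcher` and the fake slugs are hypothetical and stand in for the HTTP layer.

```python
from __future__ import annotations

from typing import Callable

PAGE_SIZE = 18  # cards per listing page, per the script's docstring


def crawl(
    fetch_page: Callable[[int], list[str] | None], max_pages: int
) -> tuple[list[str], int]:
    """Collect slugs page by page; return (slugs, number of pages requested successfully)."""
    slugs: list[str] = []
    pages_fetched = 0
    for page_num in range(1, max_pages + 1):
        page = fetch_page(page_num)
        if page is None:
            break  # 404 or error: past the end
        pages_fetched += 1
        if not page:
            break  # page loaded but held no recipe links
        slugs.extend(page)
        if len(page) < PAGE_SIZE:
            break  # short page: last page
    return slugs, pages_fetched


def make_fetcher(all_slugs: list[str]) -> Callable[[int], list[str] | None]:
    """Fake fetcher that serves slices of an in-memory list."""
    def fetch(page_num: int) -> list[str] | None:
        start = (page_num - 1) * PAGE_SIZE
        return all_slugs[start:start + PAGE_SIZE]
    return fetch


forty = [f"recipe-{i}" for i in range(40)]
slugs_40, pages_40 = crawl(make_fetcher(forty), max_pages=50)
print(len(slugs_40), pages_40)

thirty_six = [f"recipe-{i}" for i in range(36)]
slugs_36, pages_36 = crawl(make_fetcher(thirty_six), max_pages=50)
print(len(slugs_36), pages_36)
```

When the total is an exact multiple of the page size, one extra (empty) page is requested before the crawl stops, which the slug-based approximation in the script does not count.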

View file

@ -0,0 +1,301 @@
"""
discover_wayback.py: enumerate Purple Carrot recipe slugs via the Wayback Machine.
Strategy:
1. CDX API all archived /api/v2/menus/* URLs (multiple timestamps)
2. Replay fetch each menu's menuItems, extract productPath slugs
3. CDX API all archived /api/v1/products/* URLs (direct slug capture)
4. CDX API /recipe-categories/* HTML pages for older slugs
5. Deduplicate and write manifest to OUT_FILE
Output (JSONL, one record per recipe):
{"slug": "...", "title": "...", "subtitle": "...", "cook_time": "...",
"tags": [...], "serving_size": 2, "image_url": "...",
"wayback_ts": "20260412150557", "source": "menu|product_api|category_page"}
Usage:
conda run -n cf python -m scripts.pipeline.purple_carrot.discover_wayback
conda run -n cf python -m scripts.pipeline.purple_carrot.discover_wayback --out /Library/Assets/kiwi/pipeline/pc_slugs.jsonl
"""
from __future__ import annotations
import argparse
import json
import logging
import time
from pathlib import Path
from typing import Any
from urllib.parse import urlencode
import requests
logger = logging.getLogger(__name__)
CDX_BASE = "https://web.archive.org/cdx/search/cdx"
WB_BASE = "https://web.archive.org/web"
PC_HOST = "www.purplecarrot.com"
# Polite delay between Wayback replay fetches (seconds)
REPLAY_DELAY = 1.0
CDX_DELAY = 0.5
DEFAULT_OUT = Path("/Library/Assets/kiwi/pipeline/pc_slugs.jsonl")
# ── CDX helpers ───────────────────────────────────────────────────────────────
def cdx_query(url_pattern: str, **kwargs) -> list[dict]:
"""Run a CDX search and return a list of result dicts."""
params = {
"url": url_pattern,
"output": "json",
"fl": "original,timestamp,statuscode",
"collapse": "urlkey",
"filter": "statuscode:200",
**kwargs,
}
for attempt in range(3):
try:
resp = requests.get(CDX_BASE, params=params, timeout=30)
resp.raise_for_status()
rows = resp.json()
if not rows or len(rows) < 2:
return []
headers = rows[0]
return [dict(zip(headers, row)) for row in rows[1:]]
except Exception as exc:
logger.warning("CDX attempt %d failed: %s", attempt + 1, exc)
time.sleep(2 ** attempt)
return []
def wayback_get(url: str, timestamp: str) -> Any | None:
"""Fetch a Wayback replay of a URL and return parsed JSON (or None)."""
replay_url = f"{WB_BASE}/{timestamp}/{url}"
for attempt in range(3):
try:
resp = requests.get(replay_url, timeout=30)
if resp.status_code == 200:
return resp.json()
if resp.status_code == 404:
return None
except Exception as exc:
logger.warning("Wayback GET attempt %d failed for %s: %s", attempt + 1, url, exc)
time.sleep(2 ** attempt)
return None
# ── Slug extraction ───────────────────────────────────────────────────────────
def slug_from_product_path(path: str) -> str | None:
"""'/recipe/foo-bar-baz' -> 'foo-bar-baz'."""
if not path:
return None
return path.strip("/").split("/")[-1] or None
def _menu_item_to_record(item: dict, wayback_ts: str) -> dict | None:
slug = slug_from_product_path(item.get("productPath", ""))
if not slug:
return None
return {
"slug": slug,
"title": item.get("title", ""),
"subtitle": item.get("subtitle", ""),
"cook_time": item.get("cookTime", ""),
"tags": item.get("filterTags") or [],
"serving_size": item.get("servingSize"),
"image_url": item.get("imageURL", ""),
"description": item.get("description", ""),
"wayback_ts": wayback_ts,
"source": "menu",
}
# ── Discovery passes ──────────────────────────────────────────────────────────
def pass_menus(seen_slugs: set[str]) -> list[dict]:
"""Walk all archived /api/v2/menus/* captures to extract slugs."""
records: list[dict] = []
# Find all distinct archived menu URLs
menu_cdx = cdx_query(f"{PC_HOST}/api/v2/menus/*", limit="500")
logger.info("CDX: %d archived menu URLs found", len(menu_cdx))
time.sleep(CDX_DELAY)
processed_menu_ids: set[str] = set()
for entry in menu_cdx:
url = entry["original"]
ts = entry["timestamp"]
# Skip the listing endpoint, only process individual menus
if not url.split("?")[0].rstrip("/").split("/")[-1].isdigit():
continue
menu_id = url.split("?")[0].rstrip("/").split("/")[-1]
if menu_id in processed_menu_ids:
continue
processed_menu_ids.add(menu_id)
logger.info("Fetching menu %s (ts=%s) ...", menu_id, ts)
data = wayback_get(url.split("?")[0] + "?logged_out=true", ts)
time.sleep(REPLAY_DELAY)
if not data or "menuItems" not in data:
continue
for item in data["menuItems"]:
rec = _menu_item_to_record(item, ts)
if rec and rec["slug"] not in seen_slugs:
seen_slugs.add(rec["slug"])
records.append(rec)
logger.debug(" + %s", rec["slug"])
logger.info(" %d new slugs (total so far: %d)", len(records), len(seen_slugs))
return records
def pass_product_api(seen_slugs: set[str]) -> list[dict]:
"""Pick up any directly archived /api/v1/products/* URLs the menu pass missed."""
records: list[dict] = []
product_cdx = cdx_query(f"{PC_HOST}/api/v1/products/*", limit="5000")
logger.info("CDX: %d archived product API URLs found", len(product_cdx))
time.sleep(CDX_DELAY)
for entry in product_cdx:
slug = entry["original"].rstrip("/").split("/")[-1]
if not slug or slug in seen_slugs:
continue
seen_slugs.add(slug)
records.append({
"slug": slug,
"title": "",
"subtitle": "",
"cook_time": "",
"tags": [],
"serving_size": None,
"image_url": "",
"description": "",
"wayback_ts": entry["timestamp"],
"source": "product_api",
})
logger.info("product_api pass: %d new slugs", len(records))
return records
def pass_category_pages(seen_slugs: set[str]) -> list[dict]:
"""Parse archived recipe-categories HTML pages for slugs not in the API.
Older captures are server-side rendered or carry inline JSON state,
so a simple regex scan for /recipe/<slug> patterns is sufficient.
"""
import re
records: list[dict] = []
SLUG_RE = re.compile(r'["\s]/recipe/([a-z0-9][a-z0-9\-]{3,})["\s/?]')
cat_cdx = cdx_query(f"{PC_HOST}/recipe-categories/*", limit="200")
logger.info("CDX: %d archived category pages found", len(cat_cdx))
time.sleep(CDX_DELAY)
seen_category_urls: set[str] = set()
for entry in cat_cdx:
url = entry["original"].split("?")[0]
if url in seen_category_urls:
continue
seen_category_urls.add(url)
replay_url = f"{WB_BASE}/{entry['timestamp']}/{url}"
try:
resp = requests.get(replay_url, timeout=30)
time.sleep(REPLAY_DELAY)
if resp.status_code != 200:
continue
except Exception as exc:
logger.warning("Category page fetch failed: %s", exc)
continue
for slug in SLUG_RE.findall(resp.text):
if slug in seen_slugs:
continue
seen_slugs.add(slug)
records.append({
"slug": slug,
"title": "",
"subtitle": "",
"cook_time": "",
"tags": [],
"serving_size": None,
"image_url": "",
"description": "",
"wayback_ts": entry["timestamp"],
"source": "category_page",
})
logger.info("category_pages pass: %d new slugs", len(records))
return records
# ── Main ──────────────────────────────────────────────────────────────────────
def discover(out_file: Path) -> None:
seen: set[str] = set()
# Load previously discovered slugs so reruns are incremental
existing: list[dict] = []
if out_file.exists():
with open(out_file) as f:
for line in f:
line = line.strip()
if line:
rec = json.loads(line)
seen.add(rec["slug"])
existing.append(rec)
logger.info("Loaded %d existing slugs from %s", len(seen), out_file)
new_records: list[dict] = []
new_records += pass_menus(seen)
new_records += pass_product_api(seen)
new_records += pass_category_pages(seen)
out_file.parent.mkdir(parents=True, exist_ok=True)
with open(out_file, "a") as f:
for rec in new_records:
f.write(json.dumps(rec) + "\n")
total = len(existing) + len(new_records)
logger.info(
"Done. %d new slugs written to %s (%d total).",
len(new_records), out_file, total,
)
def main() -> None:
parser = argparse.ArgumentParser(description="Discover Purple Carrot recipe slugs via Wayback")
parser.add_argument(
"--out",
type=Path,
default=DEFAULT_OUT,
help=f"Output JSONL manifest (default: {DEFAULT_OUT})",
)
parser.add_argument("--debug", action="store_true")
args = parser.parse_args()
logging.basicConfig(
level=logging.DEBUG if args.debug else logging.INFO,
format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
from scripts.pipeline.log_utils import attach_pipeline_log
attach_pipeline_log("discover_wayback")
discover(args.out)
if __name__ == "__main__":
main()
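
Two small pieces of the discovery logic above are easy to check offline: turning a CDX `output=json` response (header row followed by data rows) into dicts, and deciding whether a menu URL points at an individual numeric menu rather than the listing endpoint. The sketch below mirrors that logic under stated assumptions; `rows_to_dicts` and `menu_id_from_url` are hypothetical names that do not appear in the script.

```python
from __future__ import annotations


def rows_to_dicts(rows: list[list[str]]) -> list[dict[str, str]]:
    """Convert CDX JSON rows (first row is the header) into a list of dicts."""
    if not rows or len(rows) < 2:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def menu_id_from_url(url: str) -> str | None:
    """Return the trailing numeric menu id, or None for non-menu URLs."""
    # Drop the query string and any trailing slash before taking the last segment
    last_segment = url.split("?")[0].rstrip("/").split("/")[-1]
    return last_segment if last_segment.isdigit() else None
```

Computing the id once and reusing it avoids repeating the split expression for both the filter and the lookup.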


@ -0,0 +1,250 @@
"""Playwright scraper for live purplecarrot.com recipe pages.
Uses the slug inventory already in recipes_purplecarrot.parquet and fills in
the missing ingredients/instructions by hitting the live site directly.
Usage:
conda run -n cf python3 scripts/pipeline/purple_carrot/scrape_live.py \
[--out /Library/Assets/kiwi/pipeline/recipes_purplecarrot_live.parquet] \
[--delay 2.5] \
[--limit 20]
"""
from __future__ import annotations
import argparse
import json
import re
import time
from pathlib import Path
from typing import Any
import pandas as pd
from playwright.sync_api import sync_playwright, Page, TimeoutError as PWTimeout
# ── Config ─────────────────────────────────────────────────────────────────────
BASE_URL = "https://www.purplecarrot.com/recipe/{slug}"
DEFAULT_OUT = Path("/Library/Assets/kiwi/pipeline/recipes_purplecarrot_live.parquet")
EXISTING_PARQUET = Path("/Library/Assets/kiwi/pipeline/recipes_purplecarrot.parquet")
RENDER_WAIT_MS = 2500 # JS render settle time
NAV_TIMEOUT_MS = 20_000
# ── Page parser ────────────────────────────────────────────────────────────────
def _text(page: Page, selector: str) -> str:
el = page.query_selector(selector)
return el.inner_text().strip() if el else ""
def _texts(page: Page, selector: str) -> list[str]:
return [el.inner_text().strip() for el in page.query_selector_all(selector)]
def _parse_recipe(page: Page, slug: str, source_url: str) -> dict[str, Any] | None:
"""Extract structured recipe data from the rendered page."""
body = page.inner_text("body")
# Abort if we've been bounced to a generic listing / 404
if "Page Not Found" in body or slug not in page.url:
return None
# ── Title ──────────────────────────────────────────────────────────────────
# The <h1> on product pages tends to be the recipe name
title = (_text(page, "h1") or _text(page, "[class*='recipe-title']")).strip()
if not title:
# Fallback: first heading-like text before "Ingredients"
idx = body.find("Ingredients\n")
title = body[:idx].strip().splitlines()[-1] if idx > 0 else ""
# ── Ingredients / Instructions via body text ───────────────────────────────
ing_start = body.find("\nIngredients\n")
inst_start = body.find("\nInstructions\n")
footer_start = body.find("\nShop\n") # footer sentinel
if ing_start == -1:
return None # page didn't render recipe content
raw_ingredients: list[str] = []
raw_instructions: list[str] = []
if ing_start != -1 and inst_start != -1:
ing_block = body[ing_start + len("\nIngredients\n"):inst_start].strip()
raw_ingredients = [l.strip() for l in ing_block.splitlines() if l.strip()]
if inst_start != -1:
end = footer_start if footer_start > inst_start else len(body)
inst_block = body[inst_start + len("\nInstructions\n"):end].strip()
# Each step is introduced by a line that contains only its step number
steps: list[str] = []
current: list[str] = []
for line in inst_block.splitlines():
line = line.strip()
if not line:
continue
if re.match(r"^\d+$", line):
if current:
steps.append(" ".join(current))
current = []
elif line.startswith("CULINARY NOTES"):
break
else:
current.append(line)
if current:
steps.append(" ".join(current))
raw_instructions = steps
# ── Nutrition ──────────────────────────────────────────────────────────────
def _extract_num(pattern: str) -> float | None:
m = re.search(pattern, body)
try:
return float(m.group(1)) if m else None
except ValueError:
return None
cal = _extract_num(r"(\d+)\s*CAL")
fat = _extract_num(r"(\d+(?:\.\d+)?)g\s*FAT")
carbs = _extract_num(r"(\d+(?:\.\d+)?)g\s*CARBS")
prot = _extract_num(r"(\d+(?:\.\d+)?)g\s*PROTEIN")
fiber = _extract_num(r"(\d+(?:\.\d+)?)g\s*FIBER")
# ── Allergens / tags ───────────────────────────────────────────────────────
allergen_m = re.search(r"Allergens?:\s*([^\n]+)", body)
allergens = allergen_m.group(1).strip() if allergen_m else ""
# Feature tags like HIGH-PROTEIN, QUICK, etc. appear before Ingredients
pre_ing = body[:ing_start]
tags = re.findall(r"\b(HIGH-PROTEIN|QUICK|SPICY|LOW[\-\s]CALORIE|VEGAN|FAMILY\s+FRIENDLY)\b", pre_ing)
return {
"Slug": slug,
"Name": title,
"SourceURL": source_url,
"Source": "purplecarrot_live",
"RecipeIngredientParts": raw_ingredients,
"RecipeInstructions": raw_instructions,
"Calories": cal,
"FatContent": fat,
"CarbohydrateContent": carbs,
"ProteinContent": prot,
"FiberContent": fiber,
"Allergens": allergens,
"Keywords": tags,
"HasFullRecipe": bool(raw_ingredients and raw_instructions),
}
# ── Main ───────────────────────────────────────────────────────────────────────
def main() -> None:
parser = argparse.ArgumentParser()
parser.add_argument("--out", type=Path, default=DEFAULT_OUT)
parser.add_argument("--delay", type=float, default=2.5,
help="Seconds between requests (be polite)")
parser.add_argument("--limit", type=int, default=0,
help="Stop after N slugs (0 = all)")
parser.add_argument("--resume", action="store_true",
help="Skip slugs already present in --out")
parser.add_argument("--slugs-from", type=Path, default=None,
help="Read slug inventory from this parquet instead of the default Wayback one")
args = parser.parse_args()
# Load slug inventory — either from a custom parquet or the default Wayback run
slugs_parquet = args.slugs_from if args.slugs_from else EXISTING_PARQUET
df_existing = pd.read_parquet(slugs_parquet)
slugs = df_existing["Slug"].dropna().unique().tolist()
# SourceURL may not be present in custom parquets; fall back to constructing it from the slug
if "SourceURL" in df_existing.columns:
source_urls = dict(zip(df_existing["Slug"], df_existing["SourceURL"]))
else:
source_urls = {s: BASE_URL.format(slug=s) for s in slugs}
# Resume support
done_slugs: set[str] = set()
if args.resume and args.out.exists():
df_done = pd.read_parquet(args.out)
done_slugs = set(df_done["Slug"].dropna().tolist())
print(f"Resuming — {len(done_slugs)} slugs already scraped")
if args.limit:
slugs = slugs[: args.limit]
results: list[dict[str, Any]] = []
skipped = 0
failed = 0
_UA = (
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
"(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)
with sync_playwright() as p:
browser = p.chromium.launch(headless=True)
for i, slug in enumerate(slugs):
if slug in done_slugs:
skipped += 1
continue
url = BASE_URL.format(slug=slug)
print(f"[{i+1}/{len(slugs)}] {slug}", end="", flush=True)
# Use a fresh browser context per slug to avoid Cloudflare session-level
# bot detection, which fires on the 2nd+ request in the same context.
context = browser.new_context(
user_agent=_UA,
viewport={"width": 1280, "height": 900},
)
page = context.new_page()
try:
page.goto(url, timeout=NAV_TIMEOUT_MS, wait_until="domcontentloaded")
page.wait_for_timeout(RENDER_WAIT_MS)
recipe = _parse_recipe(page, slug, source_urls.get(slug, url))
except PWTimeout:
print("TIMEOUT")
failed += 1
except Exception as exc:
print(f"ERROR: {exc}")
failed += 1
else:
if recipe is None:
print("no content (404 or redirect)")
failed += 1
elif recipe["HasFullRecipe"]:
n = len(recipe["RecipeIngredientParts"])
s = len(recipe["RecipeInstructions"])
print(f"OK ({n} ingredients, {s} steps)")
results.append(recipe)
else:
print(f"partial (ings={len(recipe['RecipeIngredientParts'])}, steps={len(recipe['RecipeInstructions'])})")
results.append(recipe)
finally:
context.close()
time.sleep(args.delay)
browser.close()
print(f"\nDone — {len(results)} scraped, {skipped} skipped, {failed} failed")
if results:
df_out = pd.DataFrame(results)
# When resuming, append to the previously saved rows and keep the newest
# row per slug
args.out.parent.mkdir(parents=True, exist_ok=True)
if args.resume and args.out.exists():
df_prev = pd.read_parquet(args.out)
df_out = pd.concat([df_prev, df_out], ignore_index=True)
df_out = df_out.drop_duplicates(subset=["Slug"], keep="last")
df_out.to_parquet(args.out, index=False)
full_count = df_out["HasFullRecipe"].sum() if "HasFullRecipe" in df_out.columns else "?"
print(f"Saved {len(df_out)} rows to {args.out} ({full_count} with full recipes)")
else:
print("No results — output not written")
if __name__ == "__main__":
main()
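
The step-splitting rule used in `_parse_recipe` (a line holding only a number starts a new step, and a "CULINARY NOTES" line ends the instructions) can be exercised on plain text without a browser. The following is a minimal sketch of that rule; `split_steps` is a hypothetical standalone name, and the sample text in the usage note is invented for illustration.

```python
from __future__ import annotations

import re


def split_steps(inst_block: str) -> list[str]:
    """Group instruction lines into steps delimited by number-only lines."""
    steps: list[str] = []
    current: list[str] = []
    for raw in inst_block.splitlines():
        line = raw.strip()
        if not line:
            continue
        if re.fullmatch(r"\d+", line):
            # A bare step number closes the previous step, if any
            if current:
                steps.append(" ".join(current))
                current = []
        elif line.startswith("CULINARY NOTES"):
            # Everything after the notes header is not part of the steps
            break
        else:
            current.append(line)
    if current:
        steps.append(" ".join(current))
    return steps
```

For example, `split_steps("1\nPrep\nChop the onion.\n2\nCook\nStir until soft.")` groups the heading and body of each step into one string per step.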


@ -0,0 +1,538 @@
"""
scrape_recipes.py: fetch full recipe data for slugs in pc_slugs.jsonl.
For each slug:
1. Try Wayback /api/v1/products/<slug>, oldest capture first (pre-HelloFresh
acquisition data is more complete).
2. If instructions are empty, try the recipe HTML page via Wayback and parse
inline JSON state or structured markup.
3. Merge with metadata already in the manifest (title, tags, cook_time, etc.)
4. Emit one row per recipe to recipes_purplecarrot.parquet in food.com columnar
format so build_recipe_index.py can import it unchanged.
Output columns (food.com schema + PC extras ignored by the indexer):
RecipeId, Name, Subtitle, RecipeIngredientParts, RecipeInstructions,
RecipeCategory, Keywords, Calories, FatContent, ProteinContent,
SodiumContent, SugarContent, CarbohydrateContent, FiberContent,
RecipeServings, Description, ImageURL, CookTime, Slug, Source
Usage:
conda run -n cf python -m scripts.pipeline.purple_carrot.scrape_recipes
conda run -n cf python -m scripts.pipeline.purple_carrot.scrape_recipes \\
--slugs /Library/Assets/kiwi/pipeline/pc_slugs.jsonl \\
--out /Library/Assets/kiwi/pipeline/recipes_purplecarrot.parquet
(Resuming is the default; pass --no-resume to start fresh.)
"""
from __future__ import annotations
import argparse
import json
import logging
import re
import time
from pathlib import Path
from typing import Any
import requests
logger = logging.getLogger(__name__)
CDX_BASE = "https://web.archive.org/cdx/search/cdx"
WB_BASE = "https://web.archive.org/web"
PC_HOST = "www.purplecarrot.com"
REPLAY_DELAY = 2.0
CDX_DELAY = 3.0 # archive.org CDX rate-limits aggressively; be polite
DEFAULT_SLUGS = Path("/Library/Assets/kiwi/pipeline/pc_slugs.jsonl")
DEFAULT_OUT = Path("/Library/Assets/kiwi/pipeline/recipes_purplecarrot.parquet")
# Inline JSON state embedded by the SSR renderer — used as fallback HTML parser
_NEXT_DATA_RE = re.compile(r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', re.DOTALL)
_REDUX_STATE_RE = re.compile(r'window\.__INITIAL_STATE__\s*=\s*(\{.*?\});\s*\n', re.DOTALL)
# ── Wayback helpers ───────────────────────────────────────────────────────────
def _cdx_get(params: dict) -> list:
"""CDX request with retry on 429/503 (archive.org rate-limits aggressively)."""
for attempt in range(4):
try:
resp = requests.get(CDX_BASE, params=params, timeout=25)
if resp.status_code in (429, 503):
wait = 15 * (2 ** attempt)
logger.debug("CDX %s — backing off %ds", resp.status_code, wait)
time.sleep(wait)
continue
resp.raise_for_status()
rows = resp.json()
return rows if rows else []
except Exception as exc:
logger.debug("CDX attempt %d failed: %s", attempt + 1, exc)
time.sleep(5 * (attempt + 1))
return []
def _cdx_timestamps(slug: str) -> list[str]:
"""Return captured timestamps for a product slug, oldest first (pre-2022 window)."""
rows = _cdx_get({
"url": f"{PC_HOST}/api/v1/products/{slug}",
"output": "json",
"fl": "timestamp,statuscode",
"filter": "statuscode:200",
"limit": "20",
# Pre-HelloFresh-acquisition captures (2019-2021) are most likely
# to have full instructions — API stripped them post-acquisition.
"from": "20190101",
"to": "20211231",
})
if len(rows) < 2:
return []
return [row[0] for row in rows[1:]] # timestamps only, oldest first
def _wayback_json(url: str, timestamp: str) -> Any | None:
replay = f"{WB_BASE}/{timestamp}/{url}"
for attempt in range(3):
try:
resp = requests.get(replay, timeout=30)
if resp.status_code == 200:
return resp.json()
if resp.status_code in (404, 410):
return None
except Exception as exc:
logger.debug("Wayback JSON attempt %d failed (%s): %s", attempt + 1, url, exc)
time.sleep(2 ** attempt)
return None
def _wayback_html(url: str, timestamp: str) -> str | None:
replay = f"{WB_BASE}/{timestamp}/{url}"
for attempt in range(3):
try:
resp = requests.get(replay, timeout=30)
if resp.status_code == 200:
return resp.text
if resp.status_code in (404, 410):
return None
except Exception as exc:
logger.debug("Wayback HTML attempt %d failed (%s): %s", attempt + 1, url, exc)
time.sleep(2 ** attempt)
return None
# ── Recipe extraction from API JSON ──────────────────────────────────────────
def _extract_from_api(data: dict) -> dict | None:
"""Parse a /api/v1/products/<slug> response into our recipe dict.
Returns None if the response has no usable content (empty title, etc.).
Returns a partial dict if only some fields are populated; the caller merges
it with manifest metadata.
"""
if not data or not isinstance(data, dict):
return None
title = data.get("title", "").strip()
subtitle = data.get("subtitle", "").strip()
slug = data.get("slug", "")
skus = data.get("skus") or []
sku = skus[0] if skus else {}
# Instructions: list of {step_number, title, description}
raw_instructions = sku.get("instructions") or []
steps: list[str] = []
for step in sorted(raw_instructions, key=lambda s: s.get("step_number", 0)):
parts = []
if step.get("title"):
parts.append(step["title"])
if step.get("description"):
parts.append(step["description"])
if parts:
steps.append(". ".join(parts))
# Ingredients: may be in ingredients_quantity or ingredients
raw_ingr = sku.get("ingredients_quantity") or sku.get("ingredients") or []
ingredients: list[str] = []
for item in raw_ingr:
if isinstance(item, dict):
qty = item.get("quantity") or item.get("qty") or ""
unit = item.get("unit") or ""
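# Parenthesise the conditional: without it, `a or b if cond else c` parses as
# `(a or b) if cond else c` and a plain "name" is dropped when "ingredient" is not a dict
ingredient = item.get("ingredient")
name = item.get("name") or (ingredient.get("name", "") if isinstance(ingredient, dict) else item.get("ingredient_name", ""))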
raw = item.get("raw") or item.get("display_name") or ""
line = raw or " ".join(filter(None, [str(qty), str(unit), str(name)])).strip()
if line:
ingredients.append(line)
elif isinstance(item, str) and item.strip():
ingredients.append(item.strip())
nutrition = sku.get("nutrition_label") or {}
calories = _num(nutrition.get("calories") or sku.get("calories"))
fat = _num(nutrition.get("total_fat") or sku.get("fat"))
protein = _num(nutrition.get("protein") or sku.get("protein"))
sodium = _num(nutrition.get("sodium") or sku.get("sodium"))
sugar = _num(nutrition.get("sugar") or nutrition.get("total_sugars"))
carbs = _num(nutrition.get("total_carbohydrate") or sku.get("carbs"))
fiber = _num(nutrition.get("dietary_fiber") or sku.get("fiber"))
tags = sku.get("tags") or data.get("tags") or []
category = sku.get("meal_type") or sku.get("product_type") or ""
servings = _num(sku.get("servings"))
cook_time = sku.get("prep_and_cook_time") or ""
description = sku.get("description") or ""
images = sku.get("hero_images") or sku.get("image_versions") or []
# hero_images can be a list OR a dict keyed by size string — normalise to list
if isinstance(images, dict):
images = list(images.values())
image_url = ""
if images and isinstance(images[0], dict):
image_url = images[0].get("image_url") or images[0].get("url") or ""
if not image_url and data.get("square_image"):
sq = data["square_image"]
image_url = sq.get("url") if isinstance(sq, dict) else ""
return {
"slug": slug,
"title": title,
"subtitle": subtitle,
"steps": steps,
"ingredients": ingredients,
"category": category,
"tags": tags,
"calories": calories,
"fat": fat,
"protein": protein,
"sodium": sodium,
"sugar": sugar,
"carbs": carbs,
"fiber": fiber,
"servings": servings,
"cook_time": cook_time,
"description": description,
"image_url": image_url,
"has_full_recipe": bool(steps and ingredients),
}
def _num(val: Any) -> float | None:
if val is None:
return None
try:
# Strip "mg" before "g"; the reverse order turns "300mg" into "300m" and the parse fails
v = float(str(val).replace("mg", "").replace("g", "").split()[0])
return v if v > 0 else None
except Exception:
return None
# ── Fallback: HTML inline state parsing ──────────────────────────────────────
def _extract_from_html(html: str, slug: str) -> dict | None:
"""Try to pull recipe data from inline JS state in older SSR pages."""
# Attempt 1: Next.js __NEXT_DATA__
m = _NEXT_DATA_RE.search(html)
if m:
try:
state = json.loads(m.group(1))
# Walk the Next.js page props tree looking for recipe data
props = state.get("props", {}).get("pageProps", {})
recipe = props.get("recipe") or props.get("product")
if recipe and isinstance(recipe, dict) and recipe.get("title"):
return _extract_from_api(recipe)
except Exception:
pass
# Attempt 2: Redux __INITIAL_STATE__
m = _REDUX_STATE_RE.search(html)
if m:
try:
state = json.loads(m.group(1))
# Try common Redux state shapes
for key in ("recipe", "product", "currentRecipe", "currentProduct"):
recipe = state.get(key)
if recipe and isinstance(recipe, dict) and recipe.get("title"):
return _extract_from_api(recipe)
except Exception:
pass
# Attempt 3: JSON-LD structured data
ld_matches = re.findall(
r'<script[^>]+type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
html, re.DOTALL
)
for raw in ld_matches:
try:
ld = json.loads(raw)
if isinstance(ld, list):
ld = next((x for x in ld if x.get("@type") == "Recipe"), None)
if not ld or ld.get("@type") != "Recipe":
continue
steps = []
for inst in (ld.get("recipeInstructions") or []):
if isinstance(inst, dict):
steps.append(inst.get("text", ""))
elif isinstance(inst, str):
steps.append(inst)
ingredients = ld.get("recipeIngredient") or []
return {
"slug": slug,
"title": ld.get("name", ""),
"subtitle": "",
"steps": [s for s in steps if s],
"ingredients": [i for i in ingredients if i],
"category": ld.get("recipeCategory", ""),
"tags": [t.strip() for t in ld["keywords"].split(",") if t.strip()] if isinstance(ld.get("keywords"), str) else [],
"calories": _num((ld.get("nutrition") or {}).get("calories")),
"fat": None, "protein": None, "sodium": None,
"sugar": None, "carbs": None, "fiber": None,
"servings": _num(ld.get("recipeYield")),
"cook_time": str(ld.get("totalTime") or ld.get("cookTime") or ""),
"description": ld.get("description", ""),
"image_url": (ld["image"][0] if isinstance(ld.get("image"), list) else ld.get("image", "")) or "",
# Only claim a full recipe when both steps and ingredients were actually found
"has_full_recipe": any(steps) and any(ingredients),
}
except Exception:
pass
return None
# ── Per-slug fetch ─────────────────────────────────────────────────────────────
def fetch_recipe(slug: str, manifest_meta: dict) -> dict | None:
"""Fetch the fullest available recipe data for a slug from Wayback.
Returns a merged dict of manifest metadata + API/HTML-extracted content.
"""
api_url = f"https://{PC_HOST}/api/v1/products/{slug}"
html_url = f"https://{PC_HOST}/recipe/{slug}"
recipe: dict | None = None
# Try product API — oldest captures are most likely to have full data
timestamps = _cdx_timestamps(slug)
time.sleep(CDX_DELAY)
if not timestamps and manifest_meta.get("wayback_ts"):
timestamps = [manifest_meta["wayback_ts"]]
for ts in timestamps:
data = _wayback_json(api_url, ts)
time.sleep(REPLAY_DELAY)
if not data:
continue
candidate = _extract_from_api(data)
if not candidate:
continue
recipe = candidate
if recipe.get("has_full_recipe"):
logger.debug("[%s] Full recipe from API (ts=%s)", slug, ts)
break
logger.debug("[%s] Partial API data (ts=%s) — trying HTML fallback", slug, ts)
# HTML fallback when API has no steps/ingredients
if not recipe or not recipe.get("has_full_recipe"):
html_ts_rows = _cdx_get({
"url": f"{PC_HOST}/recipe/{slug}",
"output": "json",
"fl": "timestamp,statuscode",
"filter": "statuscode:200",
"limit": "10",
})
html_timestamps = [row[0] for row in html_ts_rows[1:]] if len(html_ts_rows) > 1 else []
time.sleep(CDX_DELAY)
for ts in html_timestamps:
html = _wayback_html(html_url, ts)
time.sleep(REPLAY_DELAY)
if not html:
continue
html_recipe = _extract_from_html(html, slug)
if html_recipe and html_recipe.get("has_full_recipe"):
logger.debug("[%s] Full recipe from HTML (ts=%s)", slug, ts)
recipe = html_recipe
break
# Build merged record: manifest metadata fills any gaps from API/HTML
merged: dict = {
"slug": slug,
"title": manifest_meta.get("title", ""),
"subtitle": manifest_meta.get("subtitle", ""),
"steps": [],
"ingredients": [],
"category": "",
"tags": manifest_meta.get("tags") or [],
"calories": None,
"fat": None,
"protein": None,
"sodium": None,
"sugar": None,
"carbs": None,
"fiber": None,
"servings": manifest_meta.get("serving_size"),
"cook_time": manifest_meta.get("cook_time", ""),
"description": manifest_meta.get("description", ""),
"image_url": manifest_meta.get("image_url", ""),
"source": "purple_carrot",
"wayback_ts": manifest_meta.get("wayback_ts", ""),
"has_full_recipe": False,
}
if recipe:
for key in recipe:
# Prefer API/HTML data; keep manifest value only when API field is empty
val = recipe[key]
if val or key not in merged or not merged[key]:
merged[key] = val
if not merged["title"]:
logger.warning("[%s] No title — skipping", slug)
return None
return merged
# ── Output formatting ─────────────────────────────────────────────────────────
def _to_dataframe_row(r: dict) -> dict:
"""Convert merged recipe dict to food.com-compatible parquet row."""
# Build plain-text input for allrecipes-style corpus compatibility
lines = [r["title"]]
if r.get("subtitle"):
lines.append(r["subtitle"])
if r.get("description"):
lines.append("")
lines.append(r["description"])
if r.get("ingredients"):
lines += ["", "Ingredients:"] + [f"- {i}" for i in r["ingredients"]]
if r.get("steps"):
lines += ["", "Directions:"] + [f"- {s}" for s in r["steps"]]
plain_text = "\n".join(lines)
source_url = f"https://www.purplecarrot.com/recipe/{r['slug']}"
return {
# food.com schema columns (used by build_recipe_index.py)
"RecipeId": f"pc_{r['slug']}",
"Name": r["title"],
"RecipeIngredientParts": r.get("ingredients") or [],
"RecipeInstructions": r.get("steps") or [],
"RecipeCategory": r.get("category", ""),
"Keywords": r.get("tags") or [],
"Calories": r.get("calories"),
"FatContent": r.get("fat"),
"ProteinContent": r.get("protein"),
"SodiumContent": r.get("sodium"),
"SugarContent": r.get("sugar"),
"CarbohydrateContent": r.get("carbs"),
"FiberContent": r.get("fiber"),
"RecipeServings": r.get("servings"),
# PC-specific extras (ignored by indexer, used by training pipeline)
"Subtitle": r.get("subtitle", ""),
"Description": r.get("description", ""),
"ImageURL": r.get("image_url", ""),
"CookTime": r.get("cook_time", ""),
"Slug": r["slug"],
"Source": "purple_carrot",
"SourceURL": source_url, # canonical attribution link shown in recipe UI
"HasFullRecipe": r.get("has_full_recipe", False),
"WaybackTs": r.get("wayback_ts", ""),
# Also emit plain-text input for allrecipes-compatible corpus search
"input": plain_text,
}
# ── Main ──────────────────────────────────────────────────────────────────────

def scrape(slugs_file: Path, out_file: Path, resume: bool = True) -> None:
    import pandas as pd

    # Load manifest
    if not slugs_file.exists():
        logger.error("Slugs manifest not found: %s", slugs_file)
        return

    manifest: dict[str, dict] = {}
    with open(slugs_file) as f:
        for line in f:
            line = line.strip()
            if line:
                rec = json.loads(line)
                slug = rec["slug"]
                # Keep the richest metadata if slug appears from multiple sources
                if slug not in manifest or rec.get("source") == "menu":
                    manifest[slug] = rec
    logger.info("Manifest: %d unique slugs", len(manifest))

    # Load already-scraped slugs for resume
    done_slugs: set[str] = set()
    existing_rows: list[dict] = []
    if resume and out_file.exists():
        try:
            existing_df = pd.read_parquet(out_file)
            done_slugs = set(existing_df["Slug"].tolist())
            existing_rows = existing_df.to_dict("records")
            logger.info("Resume: %d already scraped", len(done_slugs))
        except Exception as exc:
            logger.warning("Could not load existing parquet for resume: %s", exc)

    todo = [s for s in manifest if s not in done_slugs]
    logger.info("%d slugs to fetch", len(todo))

    rows = list(existing_rows)
    for i, slug in enumerate(todo, 1):
        logger.info("[%d/%d] %s", i, len(todo), slug)
        recipe = fetch_recipe(slug, manifest[slug])
        if recipe:
            rows.append(_to_dataframe_row(recipe))
            status = "full" if recipe.get("has_full_recipe") else "partial"
            logger.info(" -> %s (%s)", recipe.get("title", "?"), status)
        else:
            logger.warning(" -> skipped (no title)")

        # Write checkpoint every 50 recipes
        if i % 50 == 0:
            _write_parquet(rows, out_file)
            logger.info("Checkpoint: %d recipes written", len(rows))

    _write_parquet(rows, out_file)
    full = sum(1 for r in rows if r.get("HasFullRecipe"))
    logger.info(
        "Done. %d recipes written to %s (%d full, %d partial).",
        len(rows), out_file, full, len(rows) - full,
    )


def _write_parquet(rows: list[dict], out_file: Path) -> None:
    import pandas as pd

    out_file.parent.mkdir(parents=True, exist_ok=True)
    pd.DataFrame(rows).to_parquet(out_file, index=False)


def main() -> None:
    parser = argparse.ArgumentParser(description="Scrape Purple Carrot recipes from Wayback")
    parser.add_argument("--slugs", type=Path, default=DEFAULT_SLUGS)
    parser.add_argument("--out", type=Path, default=DEFAULT_OUT)
    parser.add_argument(
        "--no-resume", dest="resume", action="store_false",
        help="Start fresh (ignore existing parquet)",
    )
    parser.add_argument("--debug", action="store_true")
    args = parser.parse_args()

    logging.basicConfig(
        level=logging.DEBUG if args.debug else logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    from scripts.pipeline.log_utils import attach_pipeline_log
    attach_pipeline_log("scrape_recipes")

    scrape(args.slugs, args.out, resume=args.resume)


if __name__ == "__main__":
    main()

@ -0,0 +1,41 @@
#!/usr/bin/env bash
# Weekly Purple Carrot recipe harvest
# Runs every Sunday night via cron.
# Discovers this week's menu, scrapes full recipe data, and ingests it into the corpus DB.
# Logs to /Library/Assets/kiwi/pipeline/logs/purple_carrot_harvest.log
set -euo pipefail
REPO="/Library/Development/CircuitForge/kiwi"
MENU_OUT="/Library/Assets/kiwi/pipeline/recipes_purplecarrot_menu.parquet"
LIVE_OUT="/Library/Assets/kiwi/pipeline/recipes_purplecarrot_live.parquet"
LOG_DIR="/Library/Assets/kiwi/pipeline/logs"
LOG="$LOG_DIR/purple_carrot_harvest.log"
mkdir -p "$LOG_DIR"
echo "=== Purple Carrot harvest $(date -u '+%Y-%m-%d %H:%M UTC') ===" >> "$LOG"
cd "$REPO"
# Step 1: discover this week's menu slugs
echo "[1/3] Discovering current menu slugs..." | tee -a "$LOG"
conda run -n cf python3 scripts/pipeline/purple_carrot/discover_current_menu.py \
    --out "$MENU_OUT" 2>&1 | tee -a "$LOG"
# Step 2: scrape full recipe data for new slugs only (--resume skips already-scraped)
echo "[2/3] Scraping live recipe pages..." | tee -a "$LOG"
conda run -n cf python3 scripts/pipeline/purple_carrot/scrape_live.py \
--slugs-from "$MENU_OUT" \
--out "$LIVE_OUT" \
--resume \
--delay 3.0 2>&1 | tee -a "$LOG"
# Step 3: ingest new recipes into the shared corpus DB
echo "[3/3] Ingesting into corpus DB..." | tee -a "$LOG"
conda run -n cf python3 scripts/pipeline/ingest_purplecarrot.py \
--parquet "$LIVE_OUT" \
--db /Library/Assets/kiwi/kiwi.db 2>&1 | tee -a "$LOG"
echo "=== Done $(date -u '+%Y-%m-%d %H:%M UTC') ===" >> "$LOG"
echo "" >> "$LOG"

@ -38,7 +38,8 @@ class TestBrowseTimeEffortFields:
         row["active_min"] = None
         row["passive_min"] = None
-        assert row["active_min"] == 0 # no active time found
+        # "Chop onion." triggers the chop prep action (base 2.0 min) → active_min >= 1
+        assert row["active_min"] > 0
         assert row["passive_min"] == 20
     def test_null_when_directions_empty(self):
@ -115,10 +116,12 @@ class TestDetailTimeEffortField:
             ],
         }
-        assert time_effort_dict["active_min"] == 5
+        # "Gather all ingredients." → active default (2 min); "Sear for 5 min" → 5 min
+        assert time_effort_dict["active_min"] == 7
         assert time_effort_dict["passive_min"] == 20
-        assert time_effort_dict["total_min"] == 25
+        assert time_effort_dict["total_min"] == 27
-        assert time_effort_dict["effort_label"] == "quick" # 3 steps
+        # 27 min total → moderate (21-45 min range)
+        assert time_effort_dict["effort_label"] == "moderate"
         assert isinstance(time_effort_dict["equipment"], list)
         assert len(time_effort_dict["step_analyses"]) == 3
         assert time_effort_dict["step_analyses"][2]["is_passive"] is True

@ -0,0 +1,304 @@
"""API tests for recipe scan endpoints (kiwi#9).
VLM calls are mocked at the service level -- no GPU or API key needed.
"""
from __future__ import annotations

import io
import json
from unittest.mock import AsyncMock, MagicMock, patch

import pytest
from fastapi.testclient import TestClient

from app.main import app
from app.cloud_session import get_session
from app.db.session import get_store

client = TestClient(app)

_GOOD_SCAN_JSON = {
    "title": "Green Goddess Bowls",
    "subtitle": "with Broccoli & Ranch Dressing",
    "servings": "2",
    "cook_time": "15 min",
    "source_note": "Purple Carrot",
    "ingredients": [
        {"name": "brown rice", "qty": "1/2", "unit": "cup", "raw": "1/2 cup brown rice"},
        {"name": "broccoli florets", "qty": "8", "unit": "oz", "raw": "8 oz broccoli florets"},
        {"name": "avocado", "qty": "1", "unit": None, "raw": "1 avocado"},
    ],
    "steps": ["Cook rice.", "Steam broccoli.", "Assemble bowls."],
    "notes": None,
    "confidence": "high",
    "warnings": [],
}


def _make_session(tier: str = "paid", has_byok: bool = False) -> MagicMock:
    mock = MagicMock()
    mock.tier = tier
    mock.has_byok = has_byok
    mock.db = ":memory:"
    return mock


def _make_store() -> MagicMock:
    mock = MagicMock()
    mock.list_inventory.return_value = [
        {"product_name": "brown rice"},
        {"product_name": "avocado"},
    ]
    mock.create_user_recipe.return_value = {
        "id": 1,
        "title": "Green Goddess Bowls",
        "subtitle": "with Broccoli & Ranch Dressing",
        "servings": "2",
        "cook_time": "15 min",
        "source_note": "Purple Carrot",
        "ingredients": _GOOD_SCAN_JSON["ingredients"],
        "steps": _GOOD_SCAN_JSON["steps"],
        "notes": None,
        "tags": [],
        "source": "scan",
        "pantry_match_pct": None,
        "created_at": "2026-04-27T00:00:00",
    }
    mock.list_user_recipes.return_value = []
    mock.get_user_recipe.return_value = None
    mock.delete_user_recipe.return_value = False
    return mock


def _fake_image() -> bytes:
    return b"\xff\xd8\xff\xe0" + b"\x00" * 100  # minimal JPEG magic


@pytest.fixture(autouse=True)
def override_deps():
    session_mock = _make_session()
    store_mock = _make_store()
    app.dependency_overrides[get_session] = lambda: session_mock
    app.dependency_overrides[get_store] = lambda: store_mock
    yield session_mock, store_mock
    app.dependency_overrides.clear()


# ── POST /recipes/scan ─────────────────────────────────────────────────────────

def _make_scan_result(title: str = "Green Goddess Bowls"):
    """Create a fake ScannedRecipeResult for tests."""
    from app.services.recipe.recipe_scanner import ScannedIngredient, ScannedRecipeResult

    return ScannedRecipeResult(
        title=title,
        subtitle="with Broccoli & Ranch Dressing",
        servings="2",
        cook_time="15 min",
        source_note="Purple Carrot",
        ingredients=[
            ScannedIngredient("brown rice", "1/2", "cup", in_pantry=True),
            ScannedIngredient("broccoli florets", "8", "oz"),
            ScannedIngredient("avocado", "1", None, in_pantry=True),
        ],
        steps=["Cook rice.", "Steam broccoli.", "Assemble bowls."],
        notes=None,
        tags=[],
        pantry_match_pct=67,
        confidence="high",
        warnings=[],
    )


@pytest.fixture
def mock_scan_infra(tmp_path):
    """Patch file-saving and VLM calls so scan endpoint tests don't need disk or GPU."""
    fake_path = tmp_path / "recipe.jpg"
    fake_path.write_bytes(_fake_image())

    async def _fake_save(upload_file):
        return fake_path

    with patch("app.api.endpoints.recipe_scan._save_upload_temp", side_effect=_fake_save):
        with patch("app.api.endpoints.recipe_scan.asyncio.to_thread") as mock_thread:
            yield mock_thread, fake_path


class TestScanEndpoint:
    def test_scan_returns_200(self, override_deps, mock_scan_infra):
        """Happy path: paid tier, valid JPEG, VLM returns good JSON."""
        _, store_mock = override_deps
        mock_thread, _ = mock_scan_infra
        scan_result = _make_scan_result()
        call_count = 0

        def side_effect(fn, *args, **kwargs):
            nonlocal call_count
            call_count += 1
            return store_mock.list_inventory() if call_count == 1 else scan_result

        mock_thread.side_effect = side_effect
        resp = client.post(
            "/api/v1/recipes/scan",
            files=[("files", ("recipe.jpg", _fake_image(), "image/jpeg"))],
        )
        assert resp.status_code == 200
        data = resp.json()
        assert data["title"] == "Green Goddess Bowls"
        assert data["confidence"] == "high"
        assert data["pantry_match_pct"] == 67
        assert len(data["ingredients"]) == 3

    def test_scan_requires_paid_tier(self, override_deps):
        """Free tier without BYOK should get 403."""
        session_mock, _ = override_deps
        session_mock.tier = "free"
        session_mock.has_byok = False
        resp = client.post(
            "/api/v1/recipes/scan",
            files=[("files", ("recipe.jpg", _fake_image(), "image/jpeg"))],
        )
        assert resp.status_code == 403

    def test_scan_byok_free_tier_allowed(self, override_deps, mock_scan_infra):
        """Free tier WITH BYOK should be allowed through the tier gate."""
        session_mock, store_mock = override_deps
        session_mock.tier = "free"
        session_mock.has_byok = True
        mock_thread, _ = mock_scan_infra
        scan_result = _make_scan_result("Simple Bowl")
        call_count = 0

        def _side(fn, *a, **kw):
            nonlocal call_count
            call_count += 1
            return store_mock.list_inventory() if call_count == 1 else scan_result

        mock_thread.side_effect = _side
        resp = client.post(
            "/api/v1/recipes/scan",
            files=[("files", ("recipe.jpg", _fake_image(), "image/jpeg"))],
        )
        assert resp.status_code == 200

    def test_scan_no_files_rejected(self, override_deps):
        """Missing files field is rejected with 422 (or 400)."""
        resp = client.post("/api/v1/recipes/scan", files=[])
        assert resp.status_code in (422, 400)

    def test_scan_too_many_files(self, override_deps, mock_scan_infra):
        """More than 4 files should return 422."""
        mock_thread, _ = mock_scan_infra
        mock_thread.return_value = []
        files = [("files", (f"p{i}.jpg", _fake_image(), "image/jpeg")) for i in range(5)]
        resp = client.post("/api/v1/recipes/scan", files=files)
        assert resp.status_code == 422

    def test_scan_not_a_recipe_returns_422(self, override_deps, mock_scan_infra):
        _, store_mock = override_deps
        mock_thread, _ = mock_scan_infra
        call_count = 0

        def _side(fn, *a, **kw):
            nonlocal call_count
            call_count += 1
            if call_count == 1:
                return store_mock.list_inventory()
            raise ValueError("not_a_recipe: image does not appear to contain a recipe")

        mock_thread.side_effect = _side
        resp = client.post(
            "/api/v1/recipes/scan",
            files=[("files", ("photo.jpg", _fake_image(), "image/jpeg"))],
        )
        assert resp.status_code == 422
        assert "recipe" in resp.json()["detail"].lower()

    def test_scan_backend_unavailable_returns_503(self, override_deps, mock_scan_infra):
        _, store_mock = override_deps
        mock_thread, _ = mock_scan_infra
        call_count = 0

        def _side(fn, *a, **kw):
            nonlocal call_count
            call_count += 1
            if call_count == 1:
                return store_mock.list_inventory()
            raise RuntimeError("No vision backend configured")

        mock_thread.side_effect = _side
        resp = client.post(
            "/api/v1/recipes/scan",
            files=[("files", ("photo.jpg", _fake_image(), "image/jpeg"))],
        )
        assert resp.status_code == 503


# ── POST /recipes/scan/save ────────────────────────────────────────────────────

class TestSaveEndpoint:
    def test_save_returns_201(self, override_deps):
        _, store_mock = override_deps
        store_mock.create_user_recipe.return_value = {
            "id": 42,
            "title": "Green Goddess Bowls",
            "subtitle": None,
            "servings": "2",
            "cook_time": "15 min",
            "source_note": None,
            "ingredients": [{"name": "brown rice", "qty": "1", "unit": "cup", "raw": None, "in_pantry": False}],
            "steps": ["Cook it."],
            "notes": None,
            "tags": [],
            "source": "scan",
            "pantry_match_pct": None,
            "created_at": "2026-04-27T00:00:00",
        }
        payload = {
            "title": "Green Goddess Bowls",
            "servings": "2",
            "cook_time": "15 min",
            "ingredients": [{"name": "brown rice", "qty": "1", "unit": "cup"}],
            "steps": ["Cook it."],
            "source": "scan",
        }
        resp = client.post("/api/v1/recipes/scan/save", json=payload)
        assert resp.status_code == 201
        data = resp.json()
        assert data["id"] == 42
        assert data["title"] == "Green Goddess Bowls"

    def test_save_missing_title_rejected(self, override_deps):
        payload = {
            "ingredients": [{"name": "eggs", "qty": "2"}],
            "steps": ["Scramble."],
        }
        resp = client.post("/api/v1/recipes/scan/save", json=payload)
        assert resp.status_code == 422


# ── GET /recipes/user ──────────────────────────────────────────────────────────

class TestUserRecipeEndpoints:
    def test_list_empty(self, override_deps):
        _, store_mock = override_deps
        store_mock.list_user_recipes.return_value = []
        resp = client.get("/api/v1/recipes/user")
        assert resp.status_code == 200
        assert resp.json() == []

    def test_get_not_found(self, override_deps):
        _, store_mock = override_deps
        store_mock.get_user_recipe.return_value = None
        resp = client.get("/api/v1/recipes/user/999")
        assert resp.status_code == 404

    def test_delete_not_found(self, override_deps):
        _, store_mock = override_deps
        store_mock.delete_user_recipe.return_value = False
        resp = client.delete("/api/v1/recipes/user/999")
        assert resp.status_code == 404

Binary file not shown (2.2 MiB).

Binary file not shown (2.4 MiB).

Binary file not shown (2.8 MiB).

Binary file not shown (2.9 MiB).

Binary file not shown (2.7 MiB).

@ -0,0 +1,107 @@
#!/usr/bin/env python3
"""
Prompt validation harness for recipe scanner (kiwi#9).

Runs the draft extraction prompt against fixture images using the Anthropic API
directly (bypasses llm.yaml for prompt dev only, not production path).

Usage:
    python extract_test.py <image1.jpg> [image2.jpg]
"""
import base64
import io
import json
import os
import sys
from pathlib import Path

from PIL import Image, ImageOps
import anthropic

PROMPT = """
You are extracting a recipe from a photograph of a recipe card, cookbook page, or handwritten note.
If two images are provided, treat them as a single recipe across two pages (e.g. ingredients on page 1, directions on page 2).
Return a single JSON object with these fields:
- title: recipe name (string)
- subtitle: any secondary title or serving suggestion e.g. "with Broccoli & Ranch Dressing" (string or null)
- servings: serving size if shown, as a string e.g. "2", "4-6" (string or null)
- cook_time: total cook time if shown, e.g. "15 min", "1 hour" (string or null)
- source_note: any attribution text like "From Betty Crocker" or "Purple Carrot" (string or null)
- ingredients: array of ingredient objects, each with:
  - name: normalized generic ingredient name, lowercase, no quantities, no brand names
    (e.g. "Follow Your Heart® Vegan Ranch" → "ranch dressing")
  - qty: quantity as a string, preserving fractions e.g. "1/2", "¼" (string or null)
  - unit: unit of measure, null for countable items (e.g. "3 eggs" → unit: null)
  - raw: the original ingredient line verbatim, exactly as it appears
- steps: ordered array of instruction strings, one distinct step per element
- notes: any tips, substitutions, storage instructions, or variations (string or null)
- confidence: "high" if text is clear and complete, "medium" if some parts are uncertain,
  "low" if mostly handwritten or significantly degraded
- warnings: array of strings describing anything the user should double-check
  (e.g. "Directions appear to continue on another page not shown")
Return only valid JSON. No markdown fences. No explanation outside the JSON.
If the image does not appear to be a recipe at all, return: {"error": "not_a_recipe"}
""".strip()


def load_image_b64(path: Path) -> str:
    """Load image, apply EXIF rotation, return base64-encoded JPEG."""
    with open(path, "rb") as f:
        img = Image.open(io.BytesIO(f.read()))
    img = ImageOps.exif_transpose(img)  # fix phone rotation
    img = img.convert("RGB")
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=90)
    return base64.b64encode(buf.getvalue()).decode()


def extract(image_paths: list[Path]) -> dict:
    client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    content = []
    for i, path in enumerate(image_paths):
        if i > 0:
            content.append({"type": "text", "text": f"(Page {i + 1} of the same recipe:)"})
        content.append({
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/jpeg",
                "data": load_image_b64(path),
            },
        })
    content.append({"type": "text", "text": PROMPT})

    msg = client.messages.create(
        model="claude-opus-4-6",  # best vision for prompt dev; production uses VisionRouter
        max_tokens=2048,
        messages=[{"role": "user", "content": content}],
    )
    raw = msg.content[0].text.strip()
    # Strip markdown fences if the model adds them anyway
    if raw.startswith("```"):
        raw = raw.split("```")[1]
        if raw.startswith("json"):
            raw = raw[4:]
    return json.loads(raw)


if __name__ == "__main__":
    paths = [Path(p) for p in sys.argv[1:]]
    if not paths:
        print("Usage: python extract_test.py <image1.jpg> [image2.jpg]")
        sys.exit(1)
    for p in paths:
        if not p.exists():
            print(f"File not found: {p}")
            sys.exit(1)
    print(f"Extracting from: {[p.name for p in paths]}")
    print("Applying EXIF rotation + sending to claude-opus-4-6...\n")
    result = extract(paths)
    print(json.dumps(result, indent=2, ensure_ascii=False))
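
The fence stripping in `extract()` above only handles a reply that begins with a fence. The sketch below shows one way a more tolerant parser might look. It is not the project's `_parse_scanner_json`; the helper name `parse_model_json` is hypothetical, and the behaviours it covers (fences stripped, leading prose skipped, `not_a_recipe` surfaced as a `ValueError`) are only inferred from the scanner unit tests in this comparison.

```python
import json
import re

# A Markdown code fence, built indirectly so this listing stays fence-safe.
FENCE = "`" * 3


def parse_model_json(text: str) -> dict:
    """Parse a JSON object out of a model reply (hypothetical helper)."""
    cleaned = text.strip()
    # Drop a leading fence (optionally tagged "json") and a trailing fence.
    cleaned = re.sub(r"^" + FENCE + r"(?:json)?\s*", "", cleaned)
    cleaned = re.sub(r"\s*" + FENCE + r"$", "", cleaned)
    # Ignore any prose before the first "{" or after the last "}".
    start, end = cleaned.find("{"), cleaned.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("could not parse JSON from model output")
    try:
        data = json.loads(cleaned[start:end + 1])
    except json.JSONDecodeError as exc:
        raise ValueError(f"could not parse JSON from model output: {exc}") from exc
    # The prompt asks the model to signal non-recipes with this sentinel.
    if isinstance(data, dict) and data.get("error") == "not_a_recipe":
        raise ValueError("not_a_recipe: image does not appear to contain a recipe")
    return data
```

Slicing from the first `{` to the last `}` is a deliberately simple heuristic; it would misfire if the surrounding prose itself contained braces.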

@ -0,0 +1,233 @@
"""Unit tests for the recipe scanner service.
VLM calls are mocked; these tests cover JSON parsing, pantry cross-reference,
error handling, and result normalization. No GPU required.
"""
from __future__ import annotations

import json
from pathlib import Path
from unittest.mock import MagicMock, patch

import pytest

from app.services.recipe.recipe_scanner import (
    RecipeScanner,
    ScannedIngredient,
    ScannedRecipeResult,
    _cross_reference_pantry,
    _parse_scanner_json,
    _normalize_ingredient_name,
)

# ── Fixtures ──────────────────────────────────────────────────────────────────

GOOD_JSON = {
    "title": "Green Goddess Bowls",
    "subtitle": "with Broccoli & Ranch Dressing",
    "servings": "2",
    "cook_time": "15 min",
    "source_note": "Purple Carrot",
    "ingredients": [
        {"name": "brown rice", "qty": "1/2", "unit": "cup", "raw": "1/2 cup brown rice"},
        {"name": "broccoli florets", "qty": "8", "unit": "oz", "raw": "8 oz broccoli florets"},
        {"name": "avocado", "qty": "1", "unit": None, "raw": "1 avocado"},
        {"name": "ranch dressing", "qty": "2", "unit": "tbsp", "raw": "2 tbsp Follow Your Heart Ranch"},
        {"name": "pumpkin seeds", "qty": "1", "unit": "tbsp", "raw": "1 tbsp pumpkin seeds"},
    ],
    "steps": [
        "Cook rice according to package directions.",
        "Steam broccoli for 5 minutes until tender.",
        "Slice avocado. Assemble bowls and top with ranch.",
    ],
    "notes": "Great leftover — keeps 3 days in the fridge.",
    "confidence": "high",
    "warnings": [],
}


def _fake_image_path(tmp_path: Path, name: str = "recipe.jpg") -> Path:
    """Create a tiny placeholder file so path-existence checks pass."""
    p = tmp_path / name
    p.write_bytes(b"\xff\xd8\xff")  # minimal JPEG magic bytes
    return p


# ── _normalize_ingredient_name ─────────────────────────────────────────────────

class TestNormalizeIngredientName:
    def test_lowercases(self):
        assert _normalize_ingredient_name("Brown Rice") == "brown rice"

    def test_strips_whitespace(self):
        assert _normalize_ingredient_name(" avocado ") == "avocado"

    def test_removes_plural_s(self):
        # For matching purposes only — "pumpkin seeds" stays as-is (stop at spaces)
        assert _normalize_ingredient_name("avocados") == "avocados"

    def test_passthrough(self):
        assert _normalize_ingredient_name("ranch dressing") == "ranch dressing"


# ── _parse_scanner_json ───────────────────────────────────────────────────────

class TestParseScannerJson:
    def test_parses_good_json(self):
        result = _parse_scanner_json(json.dumps(GOOD_JSON))
        assert result["title"] == "Green Goddess Bowls"
        assert len(result["ingredients"]) == 5

    def test_strips_markdown_fences(self):
        wrapped = f"```json\n{json.dumps(GOOD_JSON)}\n```"
        result = _parse_scanner_json(wrapped)
        assert result["title"] == "Green Goddess Bowls"

    def test_not_a_recipe_error(self):
        with pytest.raises(ValueError, match="not_a_recipe"):
            _parse_scanner_json(json.dumps({"error": "not_a_recipe"}))

    def test_missing_title_returns_none_title(self):
        data = dict(GOOD_JSON)
        data.pop("title")
        result = _parse_scanner_json(json.dumps(data))
        assert result.get("title") is None

    def test_malformed_json_raises(self):
        with pytest.raises(ValueError, match="parse"):
            _parse_scanner_json("this is not json at all")

    def test_json_inside_prose(self):
        """Model sometimes adds leading text before the JSON object."""
        text = f"Here is the extracted recipe:\n{json.dumps(GOOD_JSON)}"
        result = _parse_scanner_json(text)
        assert result["title"] == "Green Goddess Bowls"


# ── _cross_reference_pantry ───────────────────────────────────────────────────

class TestCrossReferencePantry:
    PANTRY = ["brown rice", "pumpkin seeds", "olive oil", "broccoli"]

    def test_marks_exact_match(self):
        ingr = [
            ScannedIngredient(name="brown rice", qty="1/2", unit="cup"),
            ScannedIngredient(name="avocado", qty="1", unit=None),
        ]
        result, pct = _cross_reference_pantry(ingr, self.PANTRY)
        assert result[0].in_pantry is True
        assert result[1].in_pantry is False
        assert pct == 50

    def test_partial_word_match(self):
        """'broccoli florets' should match pantry item 'broccoli'."""
        ingr = [ScannedIngredient(name="broccoli florets", qty="8", unit="oz")]
        result, pct = _cross_reference_pantry(ingr, self.PANTRY)
        assert result[0].in_pantry is True
        assert pct == 100

    def test_empty_pantry_all_false(self):
        ingr = [ScannedIngredient(name="broccoli", qty="1", unit=None)]
        result, pct = _cross_reference_pantry(ingr, [])
        assert result[0].in_pantry is False
        assert pct == 0

    def test_empty_ingredients_zero_pct(self):
        _, pct = _cross_reference_pantry([], self.PANTRY)
        assert pct == 0

    def test_case_insensitive_match(self):
        ingr = [ScannedIngredient(name="Brown Rice", qty="1", unit="cup")]
        result, pct = _cross_reference_pantry(ingr, self.PANTRY)
        assert result[0].in_pantry is True


# ── RecipeScanner ─────────────────────────────────────────────────────────────

class TestRecipeScanner:
    def _make_scanner(self) -> RecipeScanner:
        return RecipeScanner()

    @patch("app.services.recipe.recipe_scanner._call_vision_backend")
    def test_scan_single_image_success(self, mock_call, tmp_path):
        mock_call.return_value = json.dumps(GOOD_JSON)
        img = _fake_image_path(tmp_path)
        scanner = self._make_scanner()
        result = scanner.scan([img], pantry_names=["brown rice", "avocado"])
        assert isinstance(result, ScannedRecipeResult)
        assert result.title == "Green Goddess Bowls"
        assert result.servings == "2"
        assert result.cook_time == "15 min"
        assert len(result.ingredients) == 5
        assert result.confidence == "high"
        assert result.pantry_match_pct == 40  # 2 of 5 in pantry

    @patch("app.services.recipe.recipe_scanner._call_vision_backend")
    def test_scan_multi_image(self, mock_call, tmp_path):
        """Two photos treated as one recipe — both passed to VLM."""
        mock_call.return_value = json.dumps(GOOD_JSON)
        img1 = _fake_image_path(tmp_path, "p1.jpg")
        img2 = _fake_image_path(tmp_path, "p2.jpg")
        scanner = self._make_scanner()
        result = scanner.scan([img1, img2])
        # Both images passed through
        call_args = mock_call.call_args
        assert len(call_args[0][0]) == 2  # image_paths list has 2 items
        assert result.title == "Green Goddess Bowls"

    @patch("app.services.recipe.recipe_scanner._call_vision_backend")
    def test_scan_not_a_recipe_raises(self, mock_call, tmp_path):
        mock_call.return_value = json.dumps({"error": "not_a_recipe"})
        img = _fake_image_path(tmp_path)
        scanner = self._make_scanner()
        with pytest.raises(ValueError, match="not_a_recipe"):
            scanner.scan([img])

    @patch("app.services.recipe.recipe_scanner._call_vision_backend")
    def test_warnings_propagated(self, mock_call, tmp_path):
        data = dict(GOOD_JSON)
        data["warnings"] = ["Directions appear to continue on another page not shown"]
        mock_call.return_value = json.dumps(data)
        img = _fake_image_path(tmp_path)
        scanner = self._make_scanner()
        result = scanner.scan([img])
        assert len(result.warnings) == 1
        assert "another page" in result.warnings[0]

    @patch("app.services.recipe.recipe_scanner._call_vision_backend")
    def test_scan_no_pantry_names(self, mock_call, tmp_path):
        mock_call.return_value = json.dumps(GOOD_JSON)
        img = _fake_image_path(tmp_path)
        scanner = self._make_scanner()
        result = scanner.scan([img])
        # No pantry passed — all in_pantry=False, pct=0
        assert result.pantry_match_pct == 0
        assert all(not i.in_pantry for i in result.ingredients)

    def test_scan_too_many_images_raises(self, tmp_path):
        imgs = [_fake_image_path(tmp_path, f"p{i}.jpg") for i in range(5)]
        scanner = self._make_scanner()
        with pytest.raises(ValueError, match="4 images"):
            scanner.scan(imgs)

    def test_scan_no_images_raises(self):
        scanner = self._make_scanner()
        with pytest.raises(ValueError, match="least one"):
            scanner.scan([])

    @patch("app.services.recipe.recipe_scanner._call_vision_backend")
    def test_backend_unavailable_raises(self, mock_call, tmp_path):
        mock_call.side_effect = RuntimeError("No vision backend configured")
        img = _fake_image_path(tmp_path)
        scanner = self._make_scanner()
        with pytest.raises(RuntimeError, match="No vision backend"):
            scanner.scan([img])
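
The pantry tests above pin down the observable behaviour of `_cross_reference_pantry` without showing it. As a rough illustration only, the following sketch satisfies the same expectations. The `Ingredient` dataclass is a stand-in for `ScannedIngredient`, and the word-subset matching rule and the use of `round()` for the percentage are assumptions chosen to reproduce the tested values (50, 100, 40, 67), not the project's confirmed logic.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Ingredient:
    """Minimal stand-in for ScannedIngredient (field set is an assumption)."""
    name: str
    qty: str | None = None
    unit: str | None = None
    in_pantry: bool = False


def cross_reference(ingredients: list[Ingredient], pantry_names: list[str]):
    """Return (marked ingredients, integer percentage found in the pantry)."""
    # Each pantry item becomes a set of lowercase words; blank names are skipped
    # because an empty set would be a subset of every ingredient.
    pantry_words = [set(p.lower().split()) for p in pantry_names if p.strip()]
    marked = []
    for ing in ingredients:
        words = set(ing.name.lower().split())
        # Assumed rule: a pantry item matches when all of its words appear in
        # the ingredient name, so "broccoli" matches "broccoli florets".
        hit = any(p <= words for p in pantry_words)
        marked.append(replace(ing, in_pantry=hit))
    if not marked:
        return marked, 0
    matched = sum(1 for m in marked if m.in_pantry)
    return marked, round(100 * matched / len(marked))
```

Returning new objects via `dataclasses.replace` keeps the input list untouched, which makes the helper easy to test in isolation.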

@ -0,0 +1,127 @@
"""Tests for task-based routing added to get_meal_plan_router()."""
from __future__ import annotations

from unittest.mock import MagicMock

import pytest


def _make_task_ctx(url: str = "http://node:8080") -> MagicMock:
    """Mock context manager returned by task_allocate()."""
    alloc = MagicMock()
    alloc.url = url
    alloc.allocation_id = "alloc-task-1"
    alloc.service = "cf-text"
    ctx = MagicMock()
    ctx.__enter__ = MagicMock(return_value=alloc)
    ctx.__exit__ = MagicMock(return_value=False)
    return ctx


def _make_task_ctx_not_registered() -> MagicMock:
    """Mock context manager that raises TaskNotRegistered on enter."""
    from app.services.task_inference import TaskNotRegistered

    ctx = MagicMock()
    ctx.__enter__ = MagicMock(side_effect=TaskNotRegistered("not registered"))
    ctx.__exit__ = MagicMock(return_value=False)
    return ctx


def _make_direct_alloc_ctx(url: str = "http://node:8080") -> MagicMock:
    """Mock context manager returned by CFOrchClient.allocate()."""
    alloc = MagicMock()
    alloc.url = url
    ctx = MagicMock()
    ctx.__enter__ = MagicMock(return_value=alloc)
    ctx.__exit__ = MagicMock(return_value=False)
    return ctx


def test_task_path_returns_orch_router_on_success(monkeypatch):
    """get_meal_plan_router() returns _OrchTextRouter when task allocation succeeds."""
    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    import unittest.mock as um

    # Patch the name as it exists in llm_router's own namespace (module-level import).
    with um.patch("app.services.meal_plan.llm_router.task_allocate",
                  return_value=_make_task_ctx(url="http://node:9001")):
        from app.services.meal_plan.llm_router import get_meal_plan_router, _OrchTextRouter
        router, ctx = get_meal_plan_router()

    assert isinstance(router, _OrchTextRouter)
    assert router._base_url == "http://node:9001"


def test_task_not_registered_falls_back_to_direct_allocate(monkeypatch):
    """get_meal_plan_router() falls back to direct cf-text allocation on TaskNotRegistered."""
    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    direct_ctx = _make_direct_alloc_ctx(url="http://node:9002")
    import unittest.mock as um

    # Patch task_allocate in llm_router's namespace so TaskNotRegistered is raised.
    with um.patch("app.services.meal_plan.llm_router.task_allocate",
                  return_value=_make_task_ctx_not_registered()), \
         um.patch("app.services.meal_plan.llm_router.CFOrchClient") as MockClient:
        MockClient.return_value.allocate.return_value = direct_ctx
        from app.services.meal_plan.llm_router import get_meal_plan_router, _OrchTextRouter
        router, ctx = get_meal_plan_router()

    assert isinstance(router, _OrchTextRouter)
    assert router._base_url == "http://node:9002"


def test_no_cf_orch_url_returns_llm_router(monkeypatch):
    """get_meal_plan_router() returns LLMRouter when CF_ORCH_URL is not set."""
    monkeypatch.delenv("CF_ORCH_URL", raising=False)
    import unittest.mock as um

    mock_lr = MagicMock()
    with um.patch("app.services.meal_plan.llm_router.LLMRouter", return_value=mock_lr):
        from app.services.meal_plan.llm_router import get_meal_plan_router
        router, ctx = get_meal_plan_router()

    assert router is mock_lr


def test_tier1_general_exception_falls_back_to_direct_allocate(monkeypatch):
    """get_meal_plan_router() falls back to direct allocation when task_allocate raises RuntimeError."""
    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    direct_ctx = _make_direct_alloc_ctx(url="http://node:9003")
    import unittest.mock as um

    failing_ctx = MagicMock()
    failing_ctx.__enter__ = MagicMock(side_effect=RuntimeError("coordinator down"))
    failing_ctx.__exit__ = MagicMock(return_value=False)
    with um.patch("app.services.meal_plan.llm_router.task_allocate",
                  return_value=failing_ctx), \
         um.patch("app.services.meal_plan.llm_router.CFOrchClient") as MockClient:
        MockClient.return_value.allocate.return_value = direct_ctx
        from app.services.meal_plan.llm_router import get_meal_plan_router, _OrchTextRouter
        router, ctx = get_meal_plan_router()

    assert isinstance(router, _OrchTextRouter)
    assert router._base_url == "http://node:9003"


def test_tier2_none_alloc_releases_ctx_and_falls_through(monkeypatch):
    """get_meal_plan_router() releases Tier 2 ctx and falls through when alloc is None."""
    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    import unittest.mock as um

    none_alloc_ctx = MagicMock()
    none_alloc_ctx.__enter__ = MagicMock(return_value=None)
    none_alloc_ctx.__exit__ = MagicMock(return_value=False)
    mock_lr = MagicMock()
    with um.patch("app.services.meal_plan.llm_router.task_allocate",
                  return_value=_make_task_ctx_not_registered()), \
         um.patch("app.services.meal_plan.llm_router.CFOrchClient") as MockClient, \
         um.patch("app.services.meal_plan.llm_router.LLMRouter", return_value=mock_lr):
        MockClient.return_value.allocate.return_value = none_alloc_ctx
        from app.services.meal_plan.llm_router import get_meal_plan_router
        router, ctx = get_meal_plan_router()

    assert router is mock_lr
    none_alloc_ctx.__exit__.assert_called_once_with(None, None, None)

@ -0,0 +1,164 @@
"""Tests for app/services/task_inference.py"""
from __future__ import annotations

from unittest.mock import MagicMock, patch

import pytest


def _ok_resp(url: str = "http://node:8080", allocation_id: str = "alloc-123") -> MagicMock:
    m = MagicMock()
    m.status_code = 200
    m.is_success = True
    m.json.return_value = {
        "url": url,
        "allocation_id": allocation_id,
        "gpu_id": 0,
        "started": True,
        "warm": False,
    }
    return m


def _err_resp(status_code: int, text: str = "error") -> MagicMock:
    m = MagicMock()
    m.status_code = status_code
    m.is_success = False
    m.text = text
    return m


def test_task_allocate_yields_allocation_on_200(monkeypatch):
    """task_allocate() yields Allocation with url, allocation_id, service on 200."""
    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    with patch("app.services.task_inference.httpx.post", return_value=_ok_resp()) as mock_post, \
         patch("app.services.task_inference.httpx.delete") as mock_del:
        from app.services.task_inference import task_allocate
        with task_allocate("kiwi", "meal_plan", service_hint="cf-text") as alloc:
            assert alloc.url == "http://node:8080"
            assert alloc.allocation_id == "alloc-123"
            assert alloc.service == "cf-text"
        called_url = mock_post.call_args[0][0]
        assert called_url == "http://coord:7700/api/inference/task"
        mock_del.assert_called_once()


def test_task_allocate_uses_service_from_response_when_present(monkeypatch):
    """task_allocate() uses service from response dict over service_hint when available."""
    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    resp = _ok_resp()
    resp.json.return_value["service"] = "cf-vision"
    with patch("app.services.task_inference.httpx.post", return_value=resp), \
         patch("app.services.task_inference.httpx.delete"):
        from app.services.task_inference import task_allocate
        with task_allocate("kiwi", "ocr", service_hint="cf-docuvision") as alloc:
            assert alloc.service == "cf-vision"


def test_task_allocate_404_raises_task_not_registered(monkeypatch):
    """task_allocate() raises TaskNotRegistered on coordinator 404."""
    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    with patch("app.services.task_inference.httpx.post", return_value=_err_resp(404)):
        from app.services.task_inference import task_allocate, TaskNotRegistered
        with pytest.raises(TaskNotRegistered):
            with task_allocate("kiwi", "meal_plan", service_hint="cf-text"):
                pass


def test_task_allocate_503_raises_runtime_error(monkeypatch):
    """task_allocate() raises RuntimeError on non-404 coordinator errors."""
    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    with patch("app.services.task_inference.httpx.post", return_value=_err_resp(503, "no GPU")):
        from app.services.task_inference import task_allocate
        with pytest.raises(RuntimeError, match="HTTP 503"):
            with task_allocate("kiwi", "meal_plan", service_hint="cf-text"):
                pass


def test_task_allocate_release_called_on_clean_exit(monkeypatch):
    """task_allocate() DELETEs the allocation on clean context exit."""
    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    with patch("app.services.task_inference.httpx.post", return_value=_ok_resp(allocation_id="xyz")), \
         patch("app.services.task_inference.httpx.delete") as mock_del:
        from app.services.task_inference import task_allocate
        with task_allocate("kiwi", "meal_plan", service_hint="cf-text"):
            pass
        release_url = mock_del.call_args[0][0]
        assert "cf-text" in release_url
        assert "xyz" in release_url


def test_task_allocate_release_called_when_inner_block_raises(monkeypatch):
    """task_allocate() DELETEs the allocation even when the inner block raises."""
    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    with patch("app.services.task_inference.httpx.post", return_value=_ok_resp(allocation_id="abc")), \
         patch("app.services.task_inference.httpx.delete") as mock_del:
        from app.services.task_inference import task_allocate
        with pytest.raises(ValueError):
            with task_allocate("kiwi", "meal_plan", service_hint="cf-text"):
                raise ValueError("inner error")
        mock_del.assert_called_once()


def test_task_allocate_release_failure_is_swallowed(monkeypatch):
    """task_allocate() does not propagate DELETE failures."""
    import httpx as _httpx

    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    with patch("app.services.task_inference.httpx.post", return_value=_ok_resp()), \
         patch("app.services.task_inference.httpx.delete",
               side_effect=_httpx.RequestError("gone", request=MagicMock())):
        from app.services.task_inference import task_allocate
        with task_allocate("kiwi", "meal_plan", service_hint="cf-text") as alloc:
            assert alloc.url == "http://node:8080"
        # no exception raised
def test_task_allocate_no_orch_url_raises_runtime_error(monkeypatch):
"""task_allocate() raises RuntimeError when CF_ORCH_URL is not set."""
monkeypatch.delenv("CF_ORCH_URL", raising=False)
from app.services.task_inference import task_allocate
with pytest.raises(RuntimeError, match="CF_ORCH_URL"):
with task_allocate("kiwi", "meal_plan", service_hint="cf-text"):
pass
def test_task_allocate_network_error_raises_runtime_error(monkeypatch):
"""task_allocate() wraps httpx.RequestError in RuntimeError."""
import httpx as _httpx
monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
with patch("app.services.task_inference.httpx.post",
side_effect=_httpx.RequestError("timeout", request=MagicMock())):
from app.services.task_inference import task_allocate
with pytest.raises(RuntimeError, match="unreachable"):
with task_allocate("kiwi", "meal_plan", service_hint="cf-text"):
pass
def test_task_allocate_malformed_json_raises_runtime_error(monkeypatch):
"""task_allocate() raises RuntimeError when coordinator returns non-JSON on 200."""
monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
bad_resp = MagicMock()
bad_resp.status_code = 200
bad_resp.is_success = True
bad_resp.text = "<html>proxy error</html>"
bad_resp.json.side_effect = ValueError("not json")
with patch("app.services.task_inference.httpx.post", return_value=bad_resp):
from app.services.task_inference import task_allocate
with pytest.raises(RuntimeError, match="malformed"):
with task_allocate("kiwi", "meal_plan", service_hint="cf-text"):
pass
def test_task_allocate_missing_url_field_raises_runtime_error(monkeypatch):
"""task_allocate() raises RuntimeError when coordinator response is missing url field."""
monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
bad_resp = MagicMock()
bad_resp.status_code = 200
bad_resp.is_success = True
bad_resp.text = '{"allocation_id": "x"}'
bad_resp.json.return_value = {"allocation_id": "x"} # missing "url"
with patch("app.services.task_inference.httpx.post", return_value=bad_resp):
from app.services.task_inference import task_allocate
with pytest.raises(RuntimeError, match="malformed"):
with task_allocate("kiwi", "meal_plan", service_hint="cf-text"):
pass
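Reviewer sketch (not part of the diff): the tests above pin down a contract for `task_allocate()` without showing its body. The following is a minimal, stdlib-only reading of that contract, not the project's implementation. Several things here are assumptions: the `post`/`delete` parameters stand in for `httpx.post`/`httpx.delete` so the sketch runs without third-party packages, the request payload and the release URL path are hypothetical (the tests only check that the service name and allocation id appear in the release URL), and `OSError` stands in for `httpx.RequestError`.

```python
import os
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Any, Callable, Iterator


class TaskNotRegistered(Exception):
    """Raised when the coordinator answers 404 for the requested task."""


@dataclass(frozen=True)
class Allocation:
    url: str
    allocation_id: str
    service: str


@contextmanager
def task_allocate(
    product: str,
    task: str,
    *,
    service_hint: str,
    post: Callable[..., Any],    # hypothetical injection point (httpx.post in the tests)
    delete: Callable[..., Any],  # hypothetical injection point (httpx.delete in the tests)
) -> Iterator[Allocation]:
    base = os.environ.get("CF_ORCH_URL")
    if not base:
        raise RuntimeError("CF_ORCH_URL is not set")

    try:
        # The payload shape is an assumption; only the URL is asserted by the tests.
        resp = post(f"{base}/api/inference/task", json={"product": product, "task": task})
    except OSError as exc:  # stand-in for httpx.RequestError
        raise RuntimeError(f"coordinator unreachable: {exc}") from exc

    if resp.status_code == 404:
        raise TaskNotRegistered(f"{product}/{task}")
    if not resp.is_success:
        raise RuntimeError(f"coordinator returned HTTP {resp.status_code}: {resp.text}")

    try:
        data = resp.json()
        alloc = Allocation(
            url=data["url"],
            allocation_id=data["allocation_id"],
            # A service named in the response wins over the caller's hint.
            service=data.get("service", service_hint),
        )
    except (ValueError, KeyError, TypeError) as exc:
        raise RuntimeError(f"malformed coordinator response: {exc}") from exc

    try:
        yield alloc
    finally:
        try:
            # Hypothetical release path containing the service and allocation id.
            delete(f"{base}/api/services/{alloc.service}/allocations/{alloc.allocation_id}")
        except Exception:
            pass  # release failures are swallowed, as the tests require
```

Putting the release in a `finally` block is what lets both the clean-exit and the inner-block-raises tests pass with the same code path.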


@ -0,0 +1,88 @@
"""Tests for task-based routing added to _try_docuvision()."""
from __future__ import annotations

from unittest.mock import MagicMock, patch

import pytest


def _mock_doc_result(text: str = "RECEIPT TEXT") -> MagicMock:
    r = MagicMock()
    r.text = text
    return r


def _make_task_ctx(url: str = "http://node:9010") -> MagicMock:
    alloc = MagicMock()
    alloc.url = url
    alloc.allocation_id = "alloc-vis-1"
    alloc.service = "cf-docuvision"
    ctx = MagicMock()
    ctx.__enter__ = MagicMock(return_value=alloc)
    ctx.__exit__ = MagicMock(return_value=False)
    return ctx


def _make_task_not_registered() -> MagicMock:
    from app.services.task_inference import TaskNotRegistered
    ctx = MagicMock()
    ctx.__enter__ = MagicMock(side_effect=TaskNotRegistered("not registered"))
    ctx.__exit__ = MagicMock(return_value=False)
    return ctx


def _make_direct_alloc(url: str = "http://node:9011") -> MagicMock:
    alloc = MagicMock()
    alloc.url = url
    ctx = MagicMock()
    ctx.__enter__ = MagicMock(return_value=alloc)
    ctx.__exit__ = MagicMock(return_value=False)
    return ctx


def test_try_docuvision_task_path_returns_text(monkeypatch, tmp_path):
    """_try_docuvision() uses task allocation and returns extracted text on success."""
    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    fake_image = tmp_path / "receipt.jpg"
    fake_image.write_bytes(b"fake")
    with patch("app.services.task_inference.task_allocate",
               return_value=_make_task_ctx(url="http://node:9010")), \
         patch("app.services.ocr.docuvision_client.DocuvisionClient") as MockDoc:
        MockDoc.return_value.extract_text.return_value = _mock_doc_result("STORE $12.34")
        from app.services.ocr.vl_model import _try_docuvision
        result = _try_docuvision(str(fake_image))
    assert result == "STORE $12.34"
    MockDoc.assert_called_once_with("http://node:9010")


def test_try_docuvision_falls_back_to_direct_on_task_not_registered(monkeypatch, tmp_path):
    """_try_docuvision() falls back to direct cf-docuvision allocation on TaskNotRegistered."""
    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    fake_image = tmp_path / "receipt.jpg"
    fake_image.write_bytes(b"fake")
    with patch("app.services.task_inference.task_allocate",
               return_value=_make_task_not_registered()), \
         patch("circuitforge_orch.client.CFOrchClient") as MockClient, \
         patch("app.services.ocr.docuvision_client.DocuvisionClient") as MockDoc:
        MockClient.return_value.allocate.return_value = _make_direct_alloc("http://node:9011")
        MockDoc.return_value.extract_text.return_value = _mock_doc_result("FALLBACK TEXT")
        from app.services.ocr.vl_model import _try_docuvision
        result = _try_docuvision(str(fake_image))
    assert result == "FALLBACK TEXT"
    MockDoc.assert_called_once_with("http://node:9011")


def test_try_docuvision_returns_none_without_cf_orch_url(monkeypatch, tmp_path):
    """_try_docuvision() returns None immediately when CF_ORCH_URL is not set."""
    monkeypatch.delenv("CF_ORCH_URL", raising=False)
    fake_image = tmp_path / "receipt.jpg"
    fake_image.write_bytes(b"fake")
    from app.services.ocr.vl_model import _try_docuvision
    result = _try_docuvision(str(fake_image))
    assert result is None


@ -17,12 +17,17 @@ from app.services.ocr.docuvision_client import DocuvisionClient, DocuvisionResul
 def test_extract_text_sends_base64_image(tmp_path: Path) -> None:
-    """extract_text() POSTs a base64-encoded image and returns parsed text."""
+    """extract_text() POSTs image_b64 and returns parsed raw_text."""
     image_file = tmp_path / "test.jpg"
     image_file.write_bytes(b"fake-image-bytes")
     mock_response = MagicMock()
-    mock_response.json.return_value = {"text": "Cheerios", "confidence": 0.95}
+    mock_response.json.return_value = {
+        "raw_text": "Cheerios",
+        "elements": [],
+        "tables": [],
+        "metadata": {"hint": "text", "confidence": 0.95},
+    }
     mock_response.raise_for_status.return_value = None
     with patch("httpx.Client") as mock_client_cls:
@ -41,7 +46,8 @@ def test_extract_text_sends_base64_image(tmp_path: Path) -> None:
     assert call_kwargs[0][0] == "http://docuvision:8080/extract"
     posted_json = call_kwargs[1]["json"]
     expected_b64 = base64.b64encode(b"fake-image-bytes").decode()
-    assert posted_json["image"] == expected_b64
+    assert posted_json["image_b64"] == expected_b64
+    assert posted_json["hint"] == "text"


 def test_extract_text_raises_on_http_error(tmp_path: Path) -> None:


@ -95,14 +95,15 @@ class TestTimeExtraction:
 class TestTimeTotals:
     def test_active_passive_split(self):
         steps = [
-            "Chop onions finely.",  # active, no time
-            "Sear chicken for 5 minutes per side.",  # active, 5 min
-            "Simmer for 20 minutes.",  # passive, 20 min
+            "Chop onions finely.",  # active; chop action → 2 min prep
+            "Sear chicken for 5 minutes per side.",  # active, 5 min explicit
+            "Simmer for 20 minutes.",  # passive, 20 min explicit
         ]
         result = parse_time_effort(steps)
-        assert result.active_min == 5
+        # "Chop onions" now contributes prep_min (chop base=2.0) + 5 explicit = 7 active
+        assert result.active_min == 7
         assert result.passive_min == 20
-        assert result.total_min == 25
+        assert result.total_min == 27

     def test_all_active_passive_zero(self):
         steps = ["Dice vegetables.", "Season with salt.", "Plate and serve."]
@ -130,16 +131,28 @@ class TestEffortLabel:
         result = parse_time_effort(["a", "b", "c"])
         assert result.effort_label == "quick"

-    def test_four_steps_is_moderate(self):
-        result = parse_time_effort(["a", "b", "c", "d"])
+    def test_bake_recipe_is_moderate(self):
+        # Passive default for "bake" = 30 min → moderate (21-45 min range)
+        result = parse_time_effort([
+            "Mix dry ingredients.",
+            "Combine wet ingredients.",
+            "Fold together until just combined.",
+            "Bake until a toothpick comes out clean.",
+        ])
         assert result.effort_label == "moderate"

-    def test_seven_steps_is_moderate(self):
-        result = parse_time_effort(["a"] * 7)
-        assert result.effort_label == "moderate"
+    def test_slow_cook_recipe_is_involved(self):
+        # Passive default for "slow cook" = 300 min → involved (>45 min)
+        result = parse_time_effort([
+            "Brown the meat in batches.",
+            "Add vegetables and broth.",
+            "Slow cook until tender.",
+        ])
+        assert result.effort_label == "involved"

-    def test_eight_steps_is_involved(self):
-        result = parse_time_effort(["a"] * 8)
+    def test_explicit_time_drives_effort_label(self):
+        # Explicit passive time of 90 min → involved
+        result = parse_time_effort(["Braise for 90 minutes."])
         assert result.effort_label == "involved"