Compare commits


No commits in common. "main" and "v0.6.0" have entirely different histories.
main ... v0.6.0

96 changed files with 921 additions and 15139 deletions


@@ -21,12 +21,10 @@ DATA_DIR=./data
# IP this machine advertises to the coordinator (must be reachable from coordinator host)
# CF_ORCH_ADVERTISE_HOST=10.1.10.71
# GPU inference server (cf-orch coordinator for recipe scan, LLM generation, etc.)
# GPU_SERVER_URL: set to your local cf-orch coordinator (self-hosted rack).
# CF_ORCH_URL is the backward-compat alias — both are honoured.
# Paid+ default: when CF_LICENSE_KEY is present and neither URL is set,
# the app automatically points to https://orch.circuitforge.tech.
# GPU_SERVER_URL=http://10.1.10.71:7700
# CF-core hosted coordinator (managed cloud GPU inference — Paid+ tier)
# Set CF_ORCH_URL to use a hosted cf-orch coordinator instead of self-hosting.
# CF_LICENSE_KEY is read automatically by CFOrchClient for bearer auth.
# CF_ORCH_URL=https://orch.circuitforge.tech
# CF_LICENSE_KEY=CFG-KIWI-xxxx-xxxx-xxxx
# LLM backend — env-var auto-config (no llm.yaml needed for bare-metal users)
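The comments above describe three ways the coordinator URL can be chosen: `GPU_SERVER_URL`, its backward-compat alias `CF_ORCH_URL`, and a Paid+ default that applies when only `CF_LICENSE_KEY` is set. The following is a minimal sketch of that resolution order. The function name `resolve_coordinator_url` is hypothetical, and the assumption that `GPU_SERVER_URL` takes priority when both variables are set is ours; the comments only say that both are honoured.

```python
import os
from typing import Mapping, Optional

HOSTED_COORDINATOR = "https://orch.circuitforge.tech"


def resolve_coordinator_url(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical sketch of the precedence described in .env.example.

    Assumption: GPU_SERVER_URL wins over its alias CF_ORCH_URL when both are set.
    """
    for name in ("GPU_SERVER_URL", "CF_ORCH_URL"):
        value = env.get(name, "").strip()
        if value:
            return value
    # Paid+ default: a license key with no explicit URL points at the hosted coordinator.
    if env.get("CF_LICENSE_KEY", "").strip():
        return HOSTED_COORDINATOR
    # Nothing configured: GPU-backed features stay unavailable.
    return None
```

Taking the environment as a mapping parameter keeps the rule testable without mutating `os.environ`.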
@@ -59,9 +57,6 @@ CF_APP_NAME=kiwi
# Unset = auto-detect: true if CLOUD_MODE or circuitforge_orch is installed (paid+ local).
# Set false to force LocalScheduler even when cf-orch is present.
# USE_ORCH_SCHEDULER=false
# GPU_SERVER_URL: cf-orch coordinator endpoint. Required for recipe scan (cf-docuvision)
# and LLM features on a self-hosted rack. CF_ORCH_URL is the backward-compat alias.
# GPU_SERVER_URL=http://10.1.10.71:7700
# Cloud mode (set in compose.cloud.yml; also set here for reference)
# CLOUD_DATA_ROOT=/devl/kiwi-cloud-data
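The `USE_ORCH_SCHEDULER` comment states an auto-detect rule: when the variable is unset, the orch scheduler is used if `CLOUD_MODE` is on or `circuitforge_orch` is importable, and an explicit `false` forces the LocalScheduler. The following is a minimal sketch of that rule. The helper name `use_orch_scheduler` and the set of accepted truthy strings are assumptions, not Kiwi's code.

```python
import importlib.util
import os
from typing import Mapping

# Assumed spelling of "true"; the real config loader may accept a different set.
_TRUTHY = {"1", "true", "yes", "on"}


def use_orch_scheduler(env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical sketch of the USE_ORCH_SCHEDULER auto-detect rule."""
    explicit = env.get("USE_ORCH_SCHEDULER", "").strip()
    if explicit:
        # An explicit value always wins; "false" forces the LocalScheduler.
        return explicit.lower() in _TRUTHY
    cloud_mode = env.get("CLOUD_MODE", "").strip().lower() in _TRUTHY
    # find_spec returns None (without raising) when a top-level package is absent.
    orch_installed = importlib.util.find_spec("circuitforge_orch") is not None
    return cloud_mode or orch_installed
```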

3
.gitignore vendored

@@ -23,9 +23,6 @@ dist/
# Data directories
data/
# Local dev database
*.db
# Test artifacts (MagicMock sqlite files from pytest)
<MagicMock*
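The `<MagicMock*` ignore pattern exists because `str()` of a `MagicMock` attribute is a string such as `<MagicMock name='mock.DB_PATH' id='...'>`. If a test patches the settings object, code that calls `sqlite3.connect(str(settings.DB_PATH))` (as the ActivityPub handlers further down do) creates a real database file with that name in the working directory. The following sketch reproduces the artifact. The helper name is hypothetical, and it assumes POSIX file naming, since `<`, `>` and `'` are legal there.

```python
import os
import sqlite3
import tempfile
from unittest.mock import MagicMock


def create_stray_db(directory: str) -> str:
    """Reproduce the test artifact: a mocked settings.DB_PATH becomes a file name."""
    settings = MagicMock()  # stands in for a patched app.core.config.settings
    db_name = str(settings.DB_PATH)  # "<MagicMock name='mock.DB_PATH' id='...'>"
    conn = sqlite3.connect(os.path.join(directory, db_name))
    conn.execute("CREATE TABLE IF NOT EXISTS probe (x INTEGER)")  # forces a write to disk
    conn.commit()
    conn.close()
    return db_name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(create_stray_db(tmp))
```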

142
README.md

@@ -1,118 +1,80 @@
<!-- Logo coming soon — replace docs/kiwi-logo.svg when final icon ships -->
<div align="center">
<img src="docs/kiwi-logo.svg" alt="Kiwi logo" width="96" height="96" />
# 🥝 Kiwi
# Kiwi
> *Part of the CircuitForge LLC "AI for the tasks the system made hard on purpose" suite.*
**Pantry tracking and recipe suggestions — with or without an LLM.**
**Pantry tracking and leftover recipe suggestions.**
[![License: MIT/BSL](https://img.shields.io/badge/license-MIT%20%2F%20BSL%201.1-blue)](#license)
[![CI](https://git.opensourcesolarpunk.com/Circuit-Forge/kiwi/badges/workflows/ci.yml/badge.svg)](https://git.opensourcesolarpunk.com/Circuit-Forge/kiwi/actions)
[![Version](https://img.shields.io/badge/version-0.6.0-green)](https://git.opensourcesolarpunk.com/Circuit-Forge/kiwi/releases)
Scan barcodes, photograph receipts, and get recipe ideas based on what you already have — before it expires.
[Documentation](https://docs.circuitforge.tech/kiwi) · [Live demo](https://menagerie.circuitforge.tech/kiwi) · [circuitforge.tech](https://circuitforge.tech)
**LLM support is optional.** Inventory tracking, barcode scanning, expiry alerts, CSV export, and receipt upload all work without any LLM configured. AI features (receipt OCR, recipe suggestions, meal planning) activate when a backend is available and are BYOK-unlockable at any tier.
*Part of the CircuitForge LLC suite — "AI for the tasks the system made hard on purpose."*
</div>
**Status:** Beta · CircuitForge LLC
**[Documentation](https://docs.circuitforge.tech/kiwi/)** · [circuitforge.tech](https://circuitforge.tech)
---
> **The LLM is optional.** Barcode scanning, receipt upload, expiry alerts, the full 200k+ recipe browser, and CSV export all work with zero LLM configured. Recipe suggestions and receipt OCR activate when a backend is available, and are BYOK-unlockable at any tier. You are never forced to send your data anywhere.
## What it does
---
- **Inventory tracking** — add items by barcode scan, receipt upload, or manually
- **Expiry alerts** — know what's about to go bad
- **Recipe browser** — browse the full recipe corpus by cuisine, meal type, dietary preference, or main ingredient; pantry match percentage shown inline (Free)
- **Saved recipes** — bookmark any recipe with notes, a 0–5 star rating, and free-text style tags (Free); organize into named collections (Paid)
- **Receipt OCR** — extract line items from receipt photos automatically (Paid tier, BYOK-unlockable)
- **Recipe suggestions** — four levels from pantry-match to full LLM generation (Paid tier, BYOK-unlockable)
- **Style auto-classifier** — LLM suggests style tags (comforting, hands-off, quick, etc.) for saved recipes (Paid tier, BYOK-unlockable)
- **Leftover mode** — prioritize nearly-expired items in recipe ranking (Free, 5/day; unlimited at Paid+)
- **LLM backend config** — configure inference via `circuitforge-core` env-var system; BYOK unlocks Paid AI features at any tier
- **Feedback FAB** — in-app feedback button; status probed on load, hidden if CF feedback endpoint unreachable
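Leftover mode is metered at 5 uses per day on Free and is unlimited at Paid+. The following is a minimal in-memory sketch of such a daily quota. The class name, the constant names, and the tier strings are assumptions; the tier strings mirror the `session.tier` values checked in `publish_post` further down. A real deployment would persist the counter rather than keep it in process memory.

```python
from dataclasses import dataclass, field
from datetime import date

FREE_DAILY_LEFTOVER_RUNS = 5
# Assumed tier identifiers for "Paid+".
UNLIMITED_TIERS = {"paid", "premium", "ultra"}


@dataclass
class LeftoverQuota:
    """Hypothetical per-user, per-day counter for the leftover-mode limit."""

    used: dict[tuple[str, date], int] = field(default_factory=dict)

    def try_consume(self, user_id: str, tier: str, today: date) -> bool:
        """Return True if this request is allowed, counting it against today's quota."""
        if tier in UNLIMITED_TIERS:
            return True
        key = (user_id, today)
        count = self.used.get(key, 0)
        if count >= FREE_DAILY_LEFTOVER_RUNS:
            return False
        self.used[key] = count + 1
        return True
```

Because the date is part of the key, the quota resets on a new day without any scheduled cleanup.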
## What Kiwi does
## Stack
| Feature | Notes |
|---|---|
| **Inventory tracking** | Add items by barcode scan, receipt upload, or manually |
| **Expiry alerts** | Know what is about to go bad before it does |
| **Recipe browser** | 200k+ recipes — filter by cuisine, meal type, dietary preference, or main ingredient; pantry match percentage shown inline |
| **Leftover mode** | Prioritizes nearly-expired items in recipe ranking (5/day free, unlimited at Paid+) |
| **Recipe suggestions** | Four levels: direct corpus match, substitution/swap, cuisine-style adapter, full LLM generation |
| **Meal planning** | Plan meals for the week; pull from saved recipes or suggestions |
| **Saved recipes** | Bookmark any recipe with notes, 0-5 star rating, and free-text style tags; organize into named collections (Paid) |
| **Receipt OCR** | Extract line items from receipt photos automatically |
| **Dietary profiles** | Vegan, gluten-free, diabetic, and other constraints respected throughout |
| **Style auto-classifier** | LLM suggests style tags (comforting, hands-off, quick, etc.) for saved recipes |
| **Community feed** | Browse and share recipes with other Kiwi users |
| **CSV export** | Full pantry export, always available, no tier gate |
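The recipe browser shows a pantry match percentage inline. The following sketch assumes the simplest definition: the share of a recipe's distinct ingredient names that also appear in the pantry, compared case-insensitively. Kiwi's actual matcher may normalise names or count substitutions differently, and the function name is hypothetical.

```python
def pantry_match_pct(recipe_ingredients: list[str], pantry_items: list[str]) -> float:
    """Hypothetical: percent of distinct recipe ingredients found in the pantry."""
    wanted = {name.strip().lower() for name in recipe_ingredients if name.strip()}
    if not wanted:
        return 0.0  # avoid dividing by zero for an empty ingredient list
    have = {name.strip().lower() for name in pantry_items}
    return round(100 * len(wanted & have) / len(wanted), 1)
```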
- **Frontend:** Vue 3 SPA (Vite + TypeScript)
- **Backend:** FastAPI + SQLite (via `circuitforge-core`)
- **Auth:** CF session cookie → Directus JWT (cloud mode)
- **Licensing:** Heimdall (free tier auto-provisioned at signup)
---
## Quick start
**One-line install (self-hosted, Docker required):**
## Running locally
```bash
bash <(curl -fsSL https://git.opensourcesolarpunk.com/Circuit-Forge/kiwi/raw/branch/main/install.sh)
```
**Or clone and run manually:**
```bash
git clone https://git.opensourcesolarpunk.com/Circuit-Forge/kiwi.git
cd kiwi
cp .env.example .env
./manage.sh build
./manage.sh start
# Web: http://localhost:8511
# API: http://localhost:8512
# Web: http://localhost:8511
# API: http://localhost:8512
```
**Live cloud instance** (free account required):
[menagerie.circuitforge.tech/kiwi](https://menagerie.circuitforge.tech/kiwi)
## Cloud instance
Full setup and configuration guide: [docs.circuitforge.tech/kiwi](https://docs.circuitforge.tech/kiwi)
---
```bash
./manage.sh cloud-build
./manage.sh cloud-start
# Served at menagerie.circuitforge.tech/kiwi (JWT-gated)
```
## Tiers
| Feature | Free | Paid | Premium |
|---|:---:|:---:|:---:|
| Inventory CRUD | Yes | Yes | Yes |
| Barcode scan | Yes | Yes | Yes |
| Receipt upload | Yes | Yes | Yes |
| Expiry alerts | Yes | Yes | Yes |
| CSV export | Yes | Yes | Yes |
| Recipe browser (200k+ recipes) | Yes | Yes | Yes |
| Save recipes + notes + star rating | Yes | Yes | Yes |
| Style tags (manual, free-text) | Yes | Yes | Yes |
| Leftover mode (5/day) | Yes | Yes | Yes |
| Receipt OCR | BYOK | Yes | Yes |
| Recipe suggestions (L1–L4) | BYOK | Yes | Yes |
| Named recipe collections | — | Yes | Yes |
| LLM style auto-classifier | — | BYOK | Yes |
| Meal planning | — | Yes | Yes |
| Multi-household | — | — | Yes |
|---------|------|------|---------|
| Inventory CRUD | ✓ | ✓ | ✓ |
| Barcode scan | ✓ | ✓ | ✓ |
| Receipt upload | ✓ | ✓ | ✓ |
| Expiry alerts | ✓ | ✓ | ✓ |
| CSV export | ✓ | ✓ | ✓ |
| Recipe browser (domain/category) | ✓ | ✓ | ✓ |
| Save recipes + notes + star rating | ✓ | ✓ | ✓ |
| Style tags (manual, free-text) | ✓ | ✓ | ✓ |
| Receipt OCR | BYOK | ✓ | ✓ |
| Recipe suggestions (L1–L4) | BYOK | ✓ | ✓ |
| Named recipe collections | — | ✓ | ✓ |
| LLM style auto-classifier | — | BYOK | ✓ |
| Meal planning | — | ✓ | ✓ |
| Multi-household | — | — | ✓ |
| Leftover mode (5/day) | ✓ | ✓ | ✓ |
**BYOK** = bring your own LLM backend. Configure `~/.config/circuitforge/llm.yaml` to unlock AI features at any tier without a paid subscription.
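The table combines two unlock paths: a feature is included from some tier upward, and some AI features can also be unlocked one or more tiers lower with BYOK. The following sketch encodes the rows that differ by tier. It is a hypothetical illustration of the table, not the `app.tiers.can_use` function the codebase imports, and the feature keys are made up.

```python
from typing import Optional

TIER_ORDER = ["free", "paid", "premium"]

# feature -> (lowest tier that includes it, lowest tier where BYOK unlocks it, or None)
FEATURE_GATES: dict[str, tuple[str, Optional[str]]] = {
    "receipt_ocr": ("paid", "free"),
    "recipe_suggestions": ("paid", "free"),
    "recipe_collections": ("paid", None),
    "style_auto_classifier": ("premium", "paid"),
    "meal_planning": ("paid", None),
    "multi_household": ("premium", None),
}


def is_unlocked(feature: str, tier: str, has_byok: bool) -> bool:
    """Hypothetical gate mirroring the Tiers table; unlisted features are free."""
    rank = TIER_ORDER.index(tier)
    included_from, byok_from = FEATURE_GATES.get(feature, ("free", None))
    if rank >= TIER_ORDER.index(included_from):
        return True
    return has_byok and byok_from is not None and rank >= TIER_ORDER.index(byok_from)
```

For example, the style auto-classifier is unlocked for a Paid user with BYOK, but not for a Free user with BYOK, matching the "—" in the Free column.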
---
## Stack
- **Frontend:** Vue 3 SPA (Vite + TypeScript), served on port 8511
- **Backend:** FastAPI + SQLite via `circuitforge-core`, API on port 8512
- **Auth:** CircuitForge session cookie (cloud mode); local mode requires no account
- **Licensing:** Heimdall — free tier auto-provisioned at signup
---
## Forgejo-primary
Kiwi is developed and maintained on Forgejo at [git.opensourcesolarpunk.com/Circuit-Forge/kiwi](https://git.opensourcesolarpunk.com/Circuit-Forge/kiwi). GitHub and Codeberg are read-only mirrors. File issues and submit pull requests on Forgejo.
---
BYOK = bring your own LLM backend (configure `~/.config/circuitforge/llm.yaml`)
## License
Kiwi uses a split license:
- **Discovery and inventory pipeline** (barcode scan, expiry tracking, pantry CRUD, CSV export, recipe browser): [MIT](LICENSE-MIT)
- **AI features** (receipt OCR, LLM recipe suggestions, style auto-classifier): [BSL 1.1](LICENSE-BSL) — free for personal non-commercial self-hosting; commercial use or SaaS re-hosting requires a paid license. Converts to MIT after 4 years.
Humans own design, architecture, code review, testing, and verification. LLMs are part of our development workflow. [Our positions on LLM use →](https://circuitforge.tech/positions)
Privacy · Safety · Accessibility — co-equal, non-negotiable across all CircuitForge products.
Discovery/pipeline layer: MIT
AI features: BSL 1.1 (free for personal non-commercial self-hosting)


@@ -1,332 +0,0 @@
# app/api/endpoints/activitypub.py
# MIT License
#
# ActivityPub endpoints for Kiwi instances:
# GET /.well-known/webfinger — WebFinger JRD
# GET /ap/actor — Instance actor document
# POST /ap/actor/inbox — Incoming activities
# GET /ap/outbox — Outgoing activities (OrderedCollection)
# GET /ap/posts/{slug} — Individual AP Note
# GET /ap/followers — Followers collection (count only)
# GET /ap/following — Following collection (empty stub)
#
# All endpoints are no-ops / 404 when AP_ENABLED=false or actor not loaded.
# The WebFinger and well-known routes are mounted at the root app level (not
# under /api/v1) — see main.py.
from __future__ import annotations

import asyncio
import json
import logging
from datetime import datetime, timezone

from fastapi import APIRouter, HTTPException, Request, Response
from fastapi.responses import JSONResponse

from app.core.config import settings
from app.services.ap.keys import get_actor

logger = logging.getLogger(__name__)

# ── Two routers: one for well-known (root mount), one for /ap prefix ─────────
webfinger_router = APIRouter(tags=["activitypub"])
ap_router = APIRouter(prefix="/ap", tags=["activitypub"])

_AP_CONTENT_TYPE = "application/activity+json"
_JRD_CONTENT_TYPE = "application/jrd+json"


def _actor_required():
    actor = get_actor()
    if actor is None:
        raise HTTPException(status_code=404, detail="ActivityPub not enabled on this instance.")
    return actor


# ── WebFinger ─────────────────────────────────────────────────────────────────
@webfinger_router.get("/.well-known/webfinger")
async def webfinger(resource: str | None = None):
    actor = get_actor()
    if actor is None:
        raise HTTPException(status_code=404, detail="ActivityPub not enabled.")
    expected = f"acct:kiwi@{settings.AP_HOST}"
    if resource and resource != expected:
        raise HTTPException(status_code=404, detail=f"Resource {resource!r} not found.")
    jrd = {
        "subject": expected,
        "links": [
            {
                "rel": "self",
                "type": _AP_CONTENT_TYPE,
                "href": actor.actor_id,
            }
        ],
    }
    return Response(
        content=json.dumps(jrd),
        media_type=_JRD_CONTENT_TYPE,
    )


# ── Actor ─────────────────────────────────────────────────────────────────────
@ap_router.get("/actor")
async def get_actor_doc():
    actor = _actor_required()
    return Response(
        content=json.dumps(actor.to_ap_dict()),
        media_type=_AP_CONTENT_TYPE,
    )


# ── Inbox (mounted via make_inbox_router below) ───────────────────────────────
async def _on_follow(activity: dict, headers: dict) -> None:
    """Accept Follow: add to ap_followers, send Accept(Follow) back."""
    actor_url = activity.get("actor", "")
    if not actor_url:
        return
    from app.db.store import Store
    from app.core.config import settings as _settings
    db_path = _settings.DB_PATH
    inbox_url, shared_inbox = await asyncio.to_thread(_resolve_inbox, actor_url)
    if inbox_url is None:
        return
    import sqlite3
    conn = sqlite3.connect(str(db_path))
    try:
        conn.execute(
            """INSERT OR REPLACE INTO ap_followers
               (actor_id, inbox_url, shared_inbox, followed_at, active)
               VALUES (?, ?, ?, ?, 1)""",
            (actor_url, inbox_url, shared_inbox, datetime.now(timezone.utc).isoformat()),
        )
        conn.commit()
    finally:
        conn.close()
    actor = get_actor()
    if actor is None:
        return
    accept = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": f"{actor.actor_id}/accepts/{activity.get('id', 'unknown')}",
        "type": "Accept",
        "actor": actor.actor_id,
        "object": activity,
    }
    from circuitforge_core.activitypub import deliver_activity
    await asyncio.to_thread(deliver_activity, accept, inbox_url, actor, 10.0)


async def _on_undo(activity: dict, headers: dict) -> None:
    """Handle Undo(Follow): deactivate the follower row."""
    inner = activity.get("object", {})
    if isinstance(inner, dict) and inner.get("type") == "Follow":
        actor_url = activity.get("actor", "")
        if actor_url:
            import sqlite3
            conn = sqlite3.connect(str(settings.DB_PATH))
            try:
                conn.execute(
                    "UPDATE ap_followers SET active = 0 WHERE actor_id = ?", (actor_url,)
                )
                conn.commit()
            finally:
                conn.close()


async def _dedup_activity(activity_id: str | None) -> bool:
    """Return True (already seen) if activity_id is in ap_received; otherwise insert it."""
    if not activity_id:
        return False
    import sqlite3
    conn = sqlite3.connect(str(settings.DB_PATH))
    try:
        try:
            conn.execute(
                "INSERT INTO ap_received (activity_id) VALUES (?)", (activity_id,)
            )
            conn.commit()
            return False
        except sqlite3.IntegrityError:
            return True
    finally:
        conn.close()


def _build_inbox_router():
    from circuitforge_core.activitypub.inbox import make_inbox_router

    async def on_follow(activity: dict, headers: dict) -> None:
        if await _dedup_activity(activity.get("id")):
            return
        await _on_follow(activity, headers)

    async def on_undo(activity: dict, headers: dict) -> None:
        if await _dedup_activity(activity.get("id")):
            return
        await _on_undo(activity, headers)

    return make_inbox_router(
        handlers={"Follow": on_follow, "Undo": on_undo},
        verify_key_fetcher=None,  # Signature verification enabled in prod when actor is loaded
        path="/inbox",
    )


# Mount inbox at /ap/actor/inbox (AP spec: inbox is a sub-resource of the actor)
try:
    _inbox_sub = _build_inbox_router()
    ap_router.include_router(_inbox_sub, prefix="/actor")
except Exception as _e:
    logger.warning("AP inbox router not available: %s", _e)


# ── Outbox ────────────────────────────────────────────────────────────────────
@ap_router.get("/outbox")
async def get_outbox(page: int | None = None, request: Request = None):
    actor = _actor_required()
    from app.api.endpoints.community import _get_community_store
    store = _get_community_store()
    base = f"https://{settings.AP_HOST}"
    if store is None:
        collection = {
            "@context": "https://www.w3.org/ns/activitystreams",
            "id": f"{actor.outbox_url}",
            "type": "OrderedCollection",
            "totalItems": 0,
            "orderedItems": [],
        }
        return Response(content=json.dumps(collection), media_type=_AP_CONTENT_TYPE)
    PAGE_SIZE = 20
    offset = ((page or 1) - 1) * PAGE_SIZE
    posts = await asyncio.to_thread(store.list_posts, limit=PAGE_SIZE, offset=offset)
    items = [_post_to_ap_note(p, actor, base) for p in posts]
    collection = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": actor.outbox_url + (f"?page={page}" if page else ""),
        "type": "OrderedCollectionPage" if page else "OrderedCollection",
        "orderedItems": items,
    }
    return Response(content=json.dumps(collection), media_type=_AP_CONTENT_TYPE)


# ── Individual post ───────────────────────────────────────────────────────────
@ap_router.get("/posts/{slug}")
async def get_ap_post(slug: str):
    actor = _actor_required()
    from app.api.endpoints.community import _get_community_store
    store = _get_community_store()
    if store is None:
        raise HTTPException(status_code=404, detail="Community DB not available.")
    post = await asyncio.to_thread(store.get_post_by_slug, slug)
    if post is None:
        raise HTTPException(status_code=404, detail="Post not found.")
    base = f"https://{settings.AP_HOST}"
    note = _post_to_ap_note(post, actor, base)
    return Response(content=json.dumps(note), media_type=_AP_CONTENT_TYPE)


# ── Followers / Following ─────────────────────────────────────────────────────
@ap_router.get("/followers")
async def get_followers():
    actor = _actor_required()
    import sqlite3
    count = 0
    try:
        conn = sqlite3.connect(str(settings.DB_PATH))
        row = conn.execute("SELECT COUNT(*) FROM ap_followers WHERE active = 1").fetchone()
        conn.close()
        count = row[0] if row else 0
    except Exception:
        pass
    collection = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": f"{actor.actor_id}/followers",
        "type": "OrderedCollection",
        "totalItems": count,
    }
    return Response(content=json.dumps(collection), media_type=_AP_CONTENT_TYPE)


@ap_router.get("/following")
async def get_following():
    actor = _actor_required()
    collection = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": f"{actor.actor_id}/following",
        "type": "OrderedCollection",
        "totalItems": 0,
        "orderedItems": [],
    }
    return Response(content=json.dumps(collection), media_type=_AP_CONTENT_TYPE)


# ── Helpers ───────────────────────────────────────────────────────────────────
def _post_to_ap_note(post, actor, base_url: str) -> dict:
    from circuitforge_core.activitypub import make_note
    from app.services.community.ap_compat import _build_content
    diet_tags: list[str] = list(getattr(post, "dietary_tags", []) or [])
    hashtags = [{"type": "Hashtag", "name": "#Kiwi", "href": f"{base_url}/ap/tags/kiwi"}]
    for tag in diet_tags[:4]:
        ht = "".join(w.capitalize() for w in tag.replace("-", " ").split())
        hashtags.append({"type": "Hashtag", "name": f"#{ht}"})
    content = _build_content(
        {
            "title": post.title,
            "description": getattr(post, "description", None),
            "outcome_notes": getattr(post, "outcome_notes", None),
            "dietary_tags": diet_tags,
        }
    )
    published = post.published
    note = make_note(
        actor_id=actor.actor_id,
        content=content,
        tag=hashtags,
        published=published if isinstance(published, datetime) else None,
    )
    note["id"] = f"{base_url}/ap/posts/{post.slug}"
    return note


def _resolve_inbox(actor_url: str) -> tuple[str | None, str | None]:
    """Fetch an AP actor document and extract inbox + sharedInbox URLs."""
    try:
        import httpx
        resp = httpx.get(
            actor_url,
            headers={"Accept": "application/activity+json"},
            timeout=8.0,
            follow_redirects=True,
        )
        resp.raise_for_status()
        doc = resp.json()
        inbox = doc.get("inbox")
        shared = doc.get("endpoints", {}).get("sharedInbox")
        return inbox, shared
    except Exception as exc:
        logger.debug("Could not resolve actor %s: %s", actor_url, exc)
        return None, None
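`_dedup_activity` handles redelivered activities with an "insert first, treat a constraint violation as already seen" pattern. This avoids the race that a separate SELECT-then-INSERT would have. The following is a standalone sketch of the same idea against an in-memory SQLite database. The diff does not include the `ap_received` schema, so the `PRIMARY KEY` on `activity_id` below is an assumption, and the helper names are hypothetical.

```python
import sqlite3
from typing import Optional


def ensure_schema(conn: sqlite3.Connection) -> None:
    # Assumed schema: the PRIMARY KEY is what makes a repeated INSERT fail.
    conn.execute("CREATE TABLE IF NOT EXISTS ap_received (activity_id TEXT PRIMARY KEY)")


def already_seen(conn: sqlite3.Connection, activity_id: Optional[str]) -> bool:
    """Return True if activity_id was recorded before; otherwise record it and return False."""
    if not activity_id:
        return False  # activities without an id cannot be de-duplicated
    try:
        conn.execute("INSERT INTO ap_received (activity_id) VALUES (?)", (activity_id,))
        conn.commit()
        return False
    except sqlite3.IntegrityError:
        return True


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    ensure_schema(db)
    print(already_seen(db, "https://remote.example/activities/1"))  # first delivery
    print(already_seen(db, "https://remote.example/activities/1"))  # redelivery
```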

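`_post_to_ap_note` turns up to four dietary tags into CamelCase hashtags: hyphens become spaces, each word is capitalised, and the words are joined. The following sketch isolates that loop as a pure function so its output is easy to check. The function name is hypothetical; the transformation is the one in the code above.

```python
def dietary_hashtags(tags: list[str], limit: int = 4) -> list[str]:
    """Same conversion as the loop in _post_to_ap_note (helper name is hypothetical)."""
    names = []
    for tag in tags[:limit]:
        camel = "".join(word.capitalize() for word in tag.replace("-", " ").split())
        names.append(f"#{camel}")
    return names
```

`str.capitalize` lowercases the rest of each word, so a tag like `low-FODMAP` becomes `#LowFodmap`. This is harmless for Mastodon, which matches hashtags case-insensitively.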

@@ -167,54 +167,6 @@ def _validate_publish_body(body: dict) -> None:
        raise HTTPException(status_code=422, detail="photo_url must be an https:// URL.")


@router.post("/check-similar")
async def check_similar(body: dict, session: CloudUser = Depends(get_session)):
    """Pre-submission dedup check: return similar existing posts for the given title/recipe_id.
    Safe to call with no community store configured; returns an empty list rather than 503.
    """
    store = _get_community_store()
    if store is None:
        return {"similar_posts": []}
    title = (body.get("title") or "").strip()
    recipe_id = body.get("recipe_id")
    post_type = body.get("post_type")
    if not title:
        return {"similar_posts": []}
    candidates = await asyncio.to_thread(
        store.search_similar_posts,
        title,
        recipe_id,
        post_type,
        8,
    )
    if not candidates:
        return {"similar_posts": []}
    from app.services.community.dedup import build_similar_post_result, fetch_recipe_ingredients
    incoming_ingredients = await asyncio.to_thread(
        fetch_recipe_ingredients, session.db, recipe_id
    )
    results = []
    for post in candidates:
        result = await asyncio.to_thread(
            build_similar_post_result,
            post,
            recipe_id,
            incoming_ingredients,
            session.db,
        )
        if result["similarity_tier"] != "different":
            results.append(result)
    return {"similar_posts": results[:5]}


@router.post("/posts", status_code=201)
async def publish_post(body: dict, session: CloudUser = Depends(get_session)):
    from app.tiers import can_use
@@ -262,8 +214,6 @@ async def publish_post(body: dict, session: CloudUser = Depends(get_session)):
    today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    slug = f"kiwi-{_post_type_prefix(post_type)}-{pseudonym.lower().replace(' ', '')}-{today}-{slug_title}"[:120]
    similar_to_ref = body.get("similar_to_ref") or None
    from circuitforge_core.community.models import CommunityPost
    post = CommunityPost(
        slug=slug,
@@ -291,7 +241,6 @@ async def publish_post(body: dict, session: CloudUser = Depends(get_session)):
        fat_pct=snapshot.fat_pct,
        protein_pct=snapshot.protein_pct,
        moisture_pct=snapshot.moisture_pct,
        similar_to_ref=similar_to_ref,
    )
    try:
@@ -301,41 +250,7 @@ async def publish_post(body: dict, session: CloudUser = Depends(get_session)):
            status_code=409,
            detail="A post with this title already exists today. Try a different title.",
        ) from exc
    post_dict = _post_to_dict(inserted)
    # AP delivery + Mastodon post (Paid tier, AP_ENABLED, opted-in)
    from app.core.config import settings as _settings
    if _settings.AP_ENABLED and session.tier in ("paid", "premium", "ultra"):
        from circuitforge_core.activitypub import make_create, make_note, PUBLIC
        from app.services.ap.keys import get_actor
        from app.services.ap.delivery import deliver_to_followers
        _ap_actor = get_actor()
        if _ap_actor is not None:
            base = f"https://{_settings.AP_HOST}"
            from app.api.endpoints.activitypub import _post_to_ap_note
            _note = _post_to_ap_note(inserted, _ap_actor, base)
            _activity = make_create(_ap_actor, _note)
            asyncio.create_task(
                asyncio.to_thread(
                    deliver_to_followers, inserted.slug, _activity, session.db
                )
            )
        # Mastodon post if user has connected account and opted in
        if body.get("post_to_mastodon"):
            from app.services.ap.mastodon import build_post_content, get_token, post_status
            _masto = await asyncio.to_thread(
                get_token, session.db, session.user_id, _settings.AP_TOKEN_ENCRYPTION_KEY
            )
            if _masto:
                _masto_url, _masto_token = _masto
                _content = build_post_content(post_dict)
                asyncio.create_task(
                    asyncio.to_thread(post_status, _masto_url, _masto_token, _content)
                )
    return post_dict
    return _post_to_dict(inserted)


@router.delete("/posts/{slug}", status_code=204)
@@ -436,7 +351,6 @@ def _post_to_dict(post) -> dict:
        "fat_pct": post.fat_pct,
        "protein_pct": post.protein_pct,
        "moisture_pct": post.moisture_pct,
        "similar_to_ref": getattr(post, "similar_to_ref", None),
    }
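The `publish_post` hunk builds the slug from a post-type prefix, the pseudonym with spaces removed and lowercased, the UTC date, and the already-slugified title, then truncates the result to 120 characters. The following sketch restates that expression as a testable function. The name `build_post_slug` is hypothetical, `prefix` stands in for the output of `_post_type_prefix` (not shown in the diff), and `slug_title` is assumed to be slugified already.

```python
from datetime import datetime, timezone
from typing import Optional

MAX_SLUG_LEN = 120


def build_post_slug(prefix: str, pseudonym: str, slug_title: str,
                    now: Optional[datetime] = None) -> str:
    """Hypothetical restatement of the slug expression in publish_post."""
    now = now or datetime.now(timezone.utc)
    today = now.strftime("%Y-%m-%d")
    handle = pseudonym.lower().replace(" ", "")
    return f"kiwi-{prefix}-{handle}-{today}-{slug_title}"[:MAX_SLUG_LEN]
```

Because the date is part of the slug, the same pseudonym can publish the same title once per day. That is presumably why the 409 handler reports that a post with this title "already exists today".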


@@ -1,5 +0,0 @@
# app/api/endpoints/corrections.py — user corrections to LLM output for SFT training
from circuitforge_core.api import make_corrections_router
from app.db.session import get_db
router = make_corrections_router(get_db=get_db, product="kiwi")


@@ -11,8 +11,7 @@ import sqlite3
import requests
from fastapi import APIRouter, Depends, HTTPException
from app.cloud_session import CloudUser, CLOUD_DATA_ROOT, get_session
from app.services.heimdall_orch import HEIMDALL_URL, HEIMDALL_ADMIN_TOKEN
from app.cloud_session import CloudUser, CLOUD_DATA_ROOT, HEIMDALL_URL, HEIMDALL_ADMIN_TOKEN, get_session
from app.db.store import Store
from app.models.schemas.household import (
    HouseholdAcceptRequest,


@@ -478,8 +478,7 @@ async def scan_barcode_image(
from app.services.openfoodfacts import OpenFoodFactsService
from app.services.expiration_predictor import ExpirationPredictor
image_bytes = temp_file.read_bytes()
barcodes = await asyncio.to_thread(BarcodeScanner().scan_from_bytes, image_bytes)
barcodes = await asyncio.to_thread(BarcodeScanner().scan_image, temp_file)
if not barcodes:
return BarcodeScanResponse(
success=False, barcodes_found=0, results=[],
@@ -501,10 +500,9 @@ async def scan_barcode_image(
product_info = await off.lookup_product(code)
product_source = "openfoodfacts"
db_product = None
inventory_item = None
if product_info:
db_product, _ = await asyncio.to_thread(
if product_info and auto_add_to_inventory:
product, _ = await asyncio.to_thread(
store.get_or_create_product,
product_info.get("name", code),
code,
@@ -514,30 +512,29 @@ async def scan_barcode_image(
source=product_source,
source_data=product_info,
)
if auto_add_to_inventory:
exp = predictor.predict_expiration(
product_info.get("category", ""),
location,
product_name=product_info.get("name", code),
tier=session.tier,
has_byok=session.has_byok,
)
resolved_qty = product_info.get("pack_quantity") or quantity
resolved_unit = product_info.get("pack_unit") or "count"
inventory_item = await asyncio.to_thread(
store.add_inventory_item,
db_product["id"], location,
quantity=resolved_qty,
unit=resolved_unit,
expiration_date=str(exp) if exp else None,
source="barcode_scan",
)
product_found = db_product is not None
exp = predictor.predict_expiration(
product_info.get("category", ""),
location,
product_name=product_info.get("name", code),
tier=session.tier,
has_byok=session.has_byok,
)
resolved_qty = product_info.get("pack_quantity") or quantity
resolved_unit = product_info.get("pack_unit") or "count"
inventory_item = await asyncio.to_thread(
store.add_inventory_item,
product["id"], location,
quantity=resolved_qty,
unit=resolved_unit,
expiration_date=str(exp) if exp else None,
source="barcode_scan",
)
product_found = product_info is not None
needs_capture = not product_found and has_visual_capture
results.append({
"barcode": code,
"barcode_type": bc.get("type", "unknown"),
"product": ProductResponse.model_validate(db_product) if db_product else None,
"product": ProductResponse.model_validate(product_info) if product_info else None,
"inventory_item": InventoryItemResponse.model_validate(inventory_item) if inventory_item else None,
"added_to_inventory": inventory_item is not None,
"needs_manual_entry": not product_found and not needs_capture,


@@ -1,133 +0,0 @@
# app/api/endpoints/mastodon_oauth.py
# MIT License
#
# Mastodon OAuth flow endpoints:
# POST /social/mastodon/connect — Start OAuth (dynamic app registration)
# GET /social/mastodon/callback — OAuth callback, exchange code for token
# DELETE /social/mastodon/disconnect — Revoke and remove stored token
# GET /social/mastodon/status — Check connection status
from __future__ import annotations

import asyncio
import logging
from urllib.parse import urlencode

from fastapi import APIRouter, Depends, HTTPException
from fastapi.responses import RedirectResponse

from app.cloud_session import CloudUser, get_session
from app.core.config import settings

logger = logging.getLogger(__name__)
router = APIRouter(prefix="/social/mastodon", tags=["mastodon"])


def _redirect_uri() -> str:
    host = settings.AP_HOST or "localhost:8512"
    return f"https://{host}/api/v1/social/mastodon/callback"


# In-memory pending state: maps state_token → {instance_url, client_id, client_secret, user_id}
# A real deployment would persist this in a short-TTL cache or DB.
_pending: dict[str, dict] = {}


@router.post("/connect")
async def connect_mastodon(body: dict, session: CloudUser = Depends(get_session)):
    """Start the Mastodon OAuth flow.
    Body: {"instance_url": "https://mastodon.social"}
    Returns: {"authorize_url": "..."}
    """
    import secrets
    from app.services.ap.mastodon import build_authorize_url, register_app
    instance_url = (body.get("instance_url") or "").strip().rstrip("/")
    if not instance_url.startswith("https://"):
        raise HTTPException(status_code=422, detail="instance_url must be an https:// URL.")
    redirect_uri = _redirect_uri()
    try:
        app_creds = await asyncio.to_thread(register_app, instance_url, redirect_uri)
    except Exception as exc:
        raise HTTPException(
            status_code=502, detail=f"Could not register with Mastodon instance: {exc}"
        ) from exc
    state = secrets.token_urlsafe(24)
    _pending[state] = {
        "instance_url": instance_url,
        "client_id": app_creds["client_id"],
        "client_secret": app_creds["client_secret"],
        "user_id": session.user_id,
    }
    authorize_url = build_authorize_url(
        instance_url=instance_url,
        client_id=app_creds["client_id"],
        redirect_uri=redirect_uri + f"?state={state}",
    )
    return {"authorize_url": authorize_url, "state": state}


@router.get("/callback")
async def mastodon_callback(code: str | None = None, state: str | None = None):
    """OAuth callback. Exchanges auth code for access token and stores it."""
    if not code or not state:
        raise HTTPException(status_code=400, detail="Missing code or state parameter.")
    pending = _pending.pop(state, None)
    if pending is None:
        raise HTTPException(status_code=400, detail="Unknown or expired OAuth state.")
    from app.services.ap.mastodon import exchange_code, store_token
    redirect_uri = _redirect_uri() + f"?state={state}"
    try:
        access_token = await asyncio.to_thread(
            exchange_code,
            pending["instance_url"],
            pending["client_id"],
            pending["client_secret"],
            code,
            redirect_uri,
        )
    except Exception as exc:
        raise HTTPException(status_code=502, detail=f"Token exchange failed: {exc}") from exc
    await asyncio.to_thread(
        store_token,
        settings.DB_PATH,
        pending["user_id"],
        pending["instance_url"],
        access_token,
        settings.AP_TOKEN_ENCRYPTION_KEY,
    )
    # Redirect to frontend settings page after successful connect
    return RedirectResponse(url="/#/settings?mastodon=connected", status_code=302)


@router.delete("/disconnect", status_code=204)
async def disconnect_mastodon(session: CloudUser = Depends(get_session)):
    """Remove the stored Mastodon token."""
    from app.services.ap.mastodon import delete_token
    await asyncio.to_thread(delete_token, settings.DB_PATH, session.user_id)


@router.get("/status")
async def mastodon_status(session: CloudUser = Depends(get_session)):
    """Return connection status and instance URL (no token value)."""
    from app.services.ap.mastodon import get_token
    result = await asyncio.to_thread(
        get_token,
        settings.DB_PATH,
        session.user_id,
        settings.AP_TOKEN_ENCRYPTION_KEY,
    )
    if result is None:
        return {"connected": False, "instance_url": None}
    instance_url, _ = result
    return {"connected": True, "instance_url": instance_url}
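The comment on `_pending` notes that a real deployment would keep OAuth `state` tokens in a short-TTL store. The following is a minimal sketch of that idea: each state is issued with an expiry, consumed at most once, and rejected after it expires. The class name and the 10-minute default are assumptions. The clock is injectable so the expiry path can be tested. This version is still process-local; several API workers would need a shared cache instead.

```python
import secrets
import time
from typing import Any, Callable, Optional


class PendingOAuthStates:
    """Hypothetical short-TTL, single-use store for OAuth `state` tokens."""

    def __init__(self, ttl_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, dict[str, Any]]] = {}

    def issue(self, payload: dict[str, Any]) -> str:
        """Store payload under a fresh random state token and return the token."""
        self._purge()
        state = secrets.token_urlsafe(24)
        self._entries[state] = (self._clock() + self._ttl, payload)
        return state

    def pop(self, state: str) -> Optional[dict[str, Any]]:
        """Return the payload once; None if unknown, already used, or expired."""
        entry = self._entries.pop(state, None)
        if entry is None:
            return None
        expires_at, payload = entry
        return payload if self._clock() < expires_at else None

    def _purge(self) -> None:
        # Drop expired entries so abandoned OAuth attempts do not accumulate.
        now = self._clock()
        for key in [k for k, (exp, _) in self._entries.items() if exp <= now]:
            del self._entries[key]
```

In the callback, `pending = states.pop(state)` would replace `_pending.pop(state, None)`, keeping the existing "Unknown or expired OAuth state." response for a `None` result.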


@@ -1,371 +0,0 @@
"""Recipe scanner endpoints (kiwi#9).
POST /recipes/scan -- scan photo(s) -> structured recipe JSON (not saved)
POST /recipes/scan/save -- save a confirmed scanned recipe to user_recipes
GET /recipes/user -- list user-created recipes
GET /recipes/user/{id} -- get a single user recipe
DELETE /recipes/user/{id} -- delete a user recipe
BSL 1.1 -- recipe_scan requires Paid tier or BYOK.
"""
from __future__ import annotations

import asyncio
import json as _json
import logging
import uuid
from pathlib import Path
from typing import Annotated

import aiofiles
from fastapi import APIRouter, Depends, File, HTTPException, UploadFile
from fastapi.responses import JSONResponse, StreamingResponse

from app.cloud_session import CloudUser, get_session
from app.core.config import settings
from app.db.session import get_store
from app.db.store import Store
from app.models.schemas.recipe_scan import (
    ScannedIngredientSchema,
    ScannedRecipeResponse,
    ScannedRecipeSaveRequest,
    UserRecipeResponse,
)
from app.tiers import can_use

logger = logging.getLogger(__name__)
router = APIRouter()

_ALLOWED_MIME_TYPES = {
    "image/jpeg", "image/jpg", "image/png", "image/webp", "image/heic", "image/heif"
}
_MAX_FILE_SIZE_MB = 20


async def _save_upload_temp(file: UploadFile) -> Path:
    """Write upload to a temp path under UPLOAD_DIR. Caller is responsible for cleanup."""
    settings.ensure_dirs()
    dest = settings.UPLOAD_DIR / f"scan_{uuid.uuid4()}_{file.filename}"
    async with aiofiles.open(dest, "wb") as f:
        await f.write(await file.read())
    return dest


def _result_to_response(result) -> ScannedRecipeResponse:
    """Convert ScannedRecipeResult (dataclass) to Pydantic response schema."""
    return ScannedRecipeResponse(
        title=result.title,
        subtitle=result.subtitle,
        servings=result.servings,
        cook_time=result.cook_time,
        source_note=result.source_note,
        ingredients=[
            ScannedIngredientSchema(
                name=i.name,
                qty=i.qty,
                unit=i.unit,
                raw=i.raw,
                in_pantry=i.in_pantry,
            )
            for i in result.ingredients
        ],
        steps=result.steps,
        notes=result.notes,
        tags=result.tags,
        pantry_match_pct=result.pantry_match_pct,
        confidence=result.confidence,
        warnings=result.warnings,
    )


def _row_to_user_recipe(row: dict) -> UserRecipeResponse:
    """Convert a store row dict to UserRecipeResponse."""
    return UserRecipeResponse(
        id=row["id"],
        title=row["title"],
        subtitle=row.get("subtitle"),
        servings=row.get("servings"),
        cook_time=row.get("cook_time"),
        source_note=row.get("source_note"),
        ingredients=[
            ScannedIngredientSchema(**i) if isinstance(i, dict) else i
            for i in (row.get("ingredients") or [])
        ],
        steps=row.get("steps") or [],
        notes=row.get("notes"),
        tags=row.get("tags") or [],
        source=row.get("source", "manual"),
        pantry_match_pct=row.get("pantry_match_pct"),
        created_at=row["created_at"],
    )


# ── Scan endpoint ──────────────────────────────────────────────────────────────
@router.post("/scan", response_model=ScannedRecipeResponse)
async def scan_recipe(
    files: Annotated[list[UploadFile], File(...)],
    store: Store = Depends(get_store),
    session: CloudUser = Depends(get_session),
):
    """Scan one or more recipe photos and return a structured recipe for review.
    Accepts 1-4 images. Multi-page recipes (e.g. ingredients on page 1,
    directions on page 2) work best when all pages are submitted together.
    The response is NOT saved automatically -- the user reviews and edits it,
then calls POST /recipes/scan/save to persist.
Tier: Paid (or BYOK).
"""
if not can_use("recipe_scan", session.tier, session.has_byok):
raise HTTPException(
status_code=403,
detail=(
"Recipe scanning requires Paid tier or a configured vision backend (BYOK). "
"Set ANTHROPIC_API_KEY or connect to a cf-orch vision service."
),
)
if not files:
raise HTTPException(status_code=422, detail="At least one image file is required.")
if len(files) > 4:
raise HTTPException(status_code=422, detail="Maximum 4 images per scan request.")
for f in files:
ct = (f.content_type or "").lower()
if ct and ct not in _ALLOWED_MIME_TYPES:
raise HTTPException(
status_code=422,
detail=f"Unsupported file type: {ct}. Supported: JPEG, PNG, WebP, HEIC.",
)
# Save uploads to temp files
saved_paths: list[Path] = []
try:
for f in files:
saved_paths.append(await _save_upload_temp(f))
# Get pantry item names for cross-reference
inventory = await asyncio.to_thread(store.list_inventory)
pantry_names = [item["product_name"] for item in inventory if item.get("product_name")]
# Run scanner (blocks on VLM -- use to_thread)
from app.services.recipe.recipe_scanner import RecipeScanner
def _run_scan():
scanner = RecipeScanner()
return scanner.scan(saved_paths, pantry_names=pantry_names)
try:
result = await asyncio.to_thread(_run_scan)
except ValueError as exc:
msg = str(exc)
if "not_a_recipe" in msg:
raise HTTPException(
status_code=422,
detail="The image does not appear to contain a recipe. "
"Please photograph a recipe card, cookbook page, or handwritten note.",
)
raise HTTPException(status_code=422, detail=msg)
except RuntimeError as exc:
msg = str(exc)
logger.warning("Recipe scanner unavailable: %s", msg)
raise HTTPException(
status_code=503,
detail=(
"The recipe scanner is temporarily unavailable — "
"no vision backend could be reached. "
"Try again in a few minutes, or contact support if this persists."
),
)
return _result_to_response(result)
finally:
# Clean up temp files
for p in saved_paths:
try:
p.unlink(missing_ok=True)
except Exception:
pass
# ── SSE scan endpoint ─────────────────────────────────────────────────────────
async def _scan_recipe_sse(saved_paths: list[Path], pantry_names: list[str]):
"""Async generator yielding SSE events for a recipe scan.
Emits progress events while the vision service allocates and runs, then a
final "done" event containing the full recipe payload (same shape as the
ScannedRecipeResponse from POST /scan).
Events:
{"status": "allocating", "message": "..."}
{"status": "scanning", "message": "..."}
{"status": "structuring","message": "..."}
{"status": "done", "recipe": {...}}
{"status": "error", "message": "..."}
"""
queue: asyncio.Queue = asyncio.Queue()
loop = asyncio.get_running_loop()
def _run() -> None:
def cb(status: str, message: str) -> None:
loop.call_soon_threadsafe(queue.put_nowait, {"status": status, "message": message})
try:
from app.services.recipe.recipe_scanner import RecipeScanner
result = RecipeScanner().scan(saved_paths, pantry_names=pantry_names, progress_cb=cb)
recipe_dict = _result_to_response(result).model_dump()
loop.call_soon_threadsafe(queue.put_nowait, {"status": "done", "recipe": recipe_dict})
except ValueError as exc:
loop.call_soon_threadsafe(queue.put_nowait, {"status": "error", "message": str(exc)})
except RuntimeError as exc:
loop.call_soon_threadsafe(queue.put_nowait, {"status": "error", "message": str(exc)})
except Exception as exc:
logger.exception("Unexpected error in recipe scan thread")
loop.call_soon_threadsafe(queue.put_nowait, {"status": "error", "message": "Scan failed unexpectedly."})
scan_task = asyncio.ensure_future(asyncio.to_thread(_run))
try:
while True:
try:
event = await asyncio.wait_for(queue.get(), timeout=180.0)
except asyncio.TimeoutError:
yield f"data: {_json.dumps({'status': 'error', 'message': 'Scan timed out after 3 minutes.'})}\n\n"
break
yield f"data: {_json.dumps(event)}\n\n"
if event["status"] in ("done", "error"):
break
finally:
if not scan_task.done():
scan_task.cancel()
@router.post("/scan/stream")
async def scan_recipe_stream(
files: Annotated[list[UploadFile], File(...)],
store: Store = Depends(get_store),
session: CloudUser = Depends(get_session),
):
"""Scan recipe photos and stream SSE progress events during model load.
Use this endpoint instead of POST /scan when you need live feedback during
cold-start model loading (first request after a GPU-idle period can take
30-60 seconds for cf-docuvision to warm up).
Tier: Paid (or BYOK); same gate as POST /scan.
"""
if not can_use("recipe_scan", session.tier, session.has_byok):
raise HTTPException(
status_code=403,
detail=(
"Recipe scanning requires Paid tier or a configured vision backend (BYOK). "
"Set ANTHROPIC_API_KEY or connect to a cf-orch vision service."
),
)
if not files:
raise HTTPException(status_code=422, detail="At least one image file is required.")
if len(files) > 4:
raise HTTPException(status_code=422, detail="Maximum 4 images per scan request.")
for f in files:
ct = (f.content_type or "").lower()
if ct and ct not in _ALLOWED_MIME_TYPES:
raise HTTPException(
status_code=422,
detail=f"Unsupported file type: {ct}. Supported: JPEG, PNG, WebP, HEIC.",
)
saved_paths: list[Path] = []
for f in files:
saved_paths.append(await _save_upload_temp(f))
inventory = await asyncio.to_thread(store.list_inventory)
pantry_names = [item["product_name"] for item in inventory if item.get("product_name")]
async def generate():
try:
async for chunk in _scan_recipe_sse(saved_paths, pantry_names):
yield chunk
finally:
for p in saved_paths:
try:
p.unlink(missing_ok=True)
except Exception:
pass
return StreamingResponse(generate(), media_type="text/event-stream")
# ── Save endpoint ──────────────────────────────────────────────────────────────
@router.post("/scan/save", response_model=UserRecipeResponse, status_code=201)
async def save_scanned_recipe(
body: ScannedRecipeSaveRequest,
store: Store = Depends(get_store),
session: CloudUser = Depends(get_session),
):
"""Save a user-reviewed (possibly edited) scanned recipe.
The body is the ScannedRecipeResponse (or a user-edited version of it).
Returns the persisted UserRecipe with an assigned ID.
Tier: Free (saving your own recipe doesn't require vision access).
"""
def _save():
return store.create_user_recipe(
title=body.title,
subtitle=body.subtitle,
servings=body.servings,
cook_time=body.cook_time,
source_note=body.source_note,
ingredients=[i.model_dump() for i in body.ingredients],
steps=body.steps,
notes=body.notes,
tags=body.tags,
source=body.source,
pantry_match_pct=None,
)
row = await asyncio.to_thread(_save)
return _row_to_user_recipe(row)
# ── User recipe list / get / delete ───────────────────────────────────────────
@router.get("/user", response_model=list[UserRecipeResponse])
async def list_user_recipes(
store: Store = Depends(get_store),
session: CloudUser = Depends(get_session),
):
"""List all user-created recipes (scanned + manually entered), newest first."""
rows = await asyncio.to_thread(store.list_user_recipes)
return [_row_to_user_recipe(r) for r in rows]
@router.get("/user/{recipe_id}", response_model=UserRecipeResponse)
async def get_user_recipe(
recipe_id: int,
store: Store = Depends(get_store),
session: CloudUser = Depends(get_session),
):
"""Get a single user recipe by ID."""
row = await asyncio.to_thread(store.get_user_recipe, recipe_id)
if not row:
raise HTTPException(status_code=404, detail="User recipe not found.")
return _row_to_user_recipe(row)
@router.delete("/user/{recipe_id}", status_code=204)
async def delete_user_recipe(
recipe_id: int,
store: Store = Depends(get_store),
session: CloudUser = Depends(get_session),
):
"""Delete a user recipe by ID."""
deleted = await asyncio.to_thread(store.delete_user_recipe, recipe_id)
if not deleted:
raise HTTPException(status_code=404, detail="User recipe not found.")
return JSONResponse(status_code=204, content=None)
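The `_scan_recipe_sse` helper in this file bridges a blocking scanner thread into an async SSE stream: the worker thread pushes progress dicts onto an `asyncio.Queue` through `loop.call_soon_threadsafe`, and the generator drains that queue until a terminal `done` or `error` event arrives. The following is a minimal, self-contained sketch of the same pattern, assuming a hypothetical `slow_scan` function in place of `RecipeScanner.scan`; `sse_from_thread` is likewise an illustrative name, not part of Kiwi.

```python
import asyncio
import json
import time
from typing import AsyncIterator, Callable


def slow_scan(progress_cb: Callable[[str, str], None]) -> dict:
    """Hypothetical blocking scanner standing in for RecipeScanner.scan."""
    progress_cb("scanning", "Reading image...")
    time.sleep(0.05)  # simulate blocking model work
    progress_cb("structuring", "Building recipe...")
    return {"title": "Pancakes"}


async def sse_from_thread(timeout: float = 5.0) -> AsyncIterator[str]:
    queue: asyncio.Queue = asyncio.Queue()
    loop = asyncio.get_running_loop()

    def cb(status: str, message: str) -> None:
        # asyncio.Queue is not thread-safe, so schedule the put on the event loop.
        loop.call_soon_threadsafe(queue.put_nowait, {"status": status, "message": message})

    def run() -> None:
        # Runs in a worker thread; always finishes by queueing a terminal event.
        try:
            event = {"status": "done", "recipe": slow_scan(cb)}
        except Exception as exc:
            event = {"status": "error", "message": str(exc)}
        loop.call_soon_threadsafe(queue.put_nowait, event)

    worker = asyncio.ensure_future(asyncio.to_thread(run))
    try:
        while True:
            try:
                event = await asyncio.wait_for(queue.get(), timeout=timeout)
            except asyncio.TimeoutError:
                event = {"status": "error", "message": "Scan timed out."}
            # SSE framing: one "data:" line followed by a blank line.
            yield f"data: {json.dumps(event)}\n\n"
            if event["status"] in ("done", "error"):
                break
    finally:
        if not worker.done():
            # Stops awaiting the result; it cannot interrupt the running thread.
            worker.cancel()
```

Cancelling a task built from `asyncio.to_thread` only stops the coroutine that awaits it; the worker thread keeps running until the blocking call returns, so a timed-out scan may still occupy the vision backend for a while after the client sees the error event.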

@@ -6,9 +6,7 @@ import logging
from pathlib import Path
from typing import Annotated
import json as _json_mod
from fastapi import APIRouter, Depends, HTTPException, Query
from fastapi.responses import StreamingResponse
from app.cloud_session import CloudUser, _auth_label, get_session
@@ -16,12 +14,8 @@ log = logging.getLogger(__name__)
from app.db.session import get_store
from app.db.store import Store
from app.models.schemas.recipe import (
AskRequest,
AskResponse,
AskRecipeHit,
AssemblyTemplateOut,
BuildRequest,
LeftoversResponse,
RecipeJobStatus,
RecipeRequest,
RecipeResult,
@@ -108,39 +102,6 @@ def _build_stream_prompt(db_path: Path, level: int) -> str:
store.close()
async def _stream_recipe_sse(db_path: Path, req: RecipeRequest):
"""Async generator that yields SSE events for a streaming recipe request.
Phase 1 (thread): classify pantry items using a temporary Store.
Phase 2 (async): stream tokens from LLM via LLMRecipeGenerator.stream_generate().
"""
def _prep(db_path: Path) -> tuple[list, list[str]]:
from app.services.recipe.element_classifier import IngredientClassifier
store = Store(db_path)
try:
classifier = IngredientClassifier(store)
profiles = classifier.classify_batch(req.pantry_items)
gaps = classifier.identify_gaps(profiles)
return profiles, gaps
finally:
store.close()
try:
profiles, gaps = await asyncio.to_thread(_prep, db_path)
except Exception as exc:
yield f"data: {_json_mod.dumps({'error': str(exc)})}\n\n"
return
from app.services.recipe.llm_recipe import LLMRecipeGenerator
gen = LLMRecipeGenerator(None)
try:
async for token in gen.stream_generate(req, profiles, gaps):
yield f"data: {_json_mod.dumps({'chunk': token})}\n\n"
yield f"data: {_json_mod.dumps({'done': True})}\n\n"
except Exception as exc:
yield f"data: {_json_mod.dumps({'error': str(exc)})}\n\n"
async def _enqueue_recipe_job(session: CloudUser, req: RecipeRequest):
"""Queue an async recipe_llm job and return 202 with job_id.
@@ -182,7 +143,6 @@ async def _enqueue_recipe_job(session: CloudUser, req: RecipeRequest):
async def suggest_recipes(
req: RecipeRequest,
async_mode: bool = Query(default=False, alias="async"),
stream: bool = Query(default=False),
session: CloudUser = Depends(get_session),
store: Store = Depends(get_store),
):
@@ -218,13 +178,6 @@ async def suggest_recipes(
req = req.model_copy(update={"level": 2})
orch_fallback = True
if stream and req.level in (3, 4):
return StreamingResponse(
_stream_recipe_sse(session.db, req),
media_type="text/event-stream",
headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
)
if req.level in (3, 4) and async_mode:
return await _enqueue_recipe_job(session, req)
@@ -373,7 +326,6 @@ async def browse_recipes(
subcategory: Annotated[str | None, Query()] = None,
q: Annotated[str | None, Query(max_length=200)] = None,
sort: Annotated[str, Query(pattern="^(default|alpha|alpha_desc|match)$")] = "default",
required_ingredient: Annotated[str | None, Query(max_length=100)] = None,
session: CloudUser = Depends(get_session),
) -> dict:
"""Return a paginated list of recipes for a domain/category.
@@ -382,7 +334,6 @@ async def browse_recipes(
Pass subcategory to narrow within a category that has subcategories.
Pass q to filter by title substring. Pass sort for ordering (default/alpha/alpha_desc/match).
sort=match orders by pantry coverage DESC; falls back to default when no pantry_items.
Pass required_ingredient to restrict results to recipes that must include that ingredient.
"""
if domain not in DOMAINS:
raise HTTPException(status_code=404, detail=f"Unknown domain '{domain}'.")
@@ -425,7 +376,6 @@ async def browse_recipes(
q=q or None,
sort=sort,
sensory_exclude=sensory_exclude,
required_ingredient=required_ingredient or None,
)
# ── Attach time/effort signals to each browse result ────────────────
@@ -438,11 +388,7 @@ async def browse_recipes(
except Exception:
directions_raw = []
if directions_raw:
_profile = parse_time_effort(
directions_raw,
ingredients=recipe_row.get("ingredients") or [],
ingredient_names=recipe_row.get("ingredient_names") or [],
)
_profile = parse_time_effort(directions_raw)
recipe_row["active_min"] = _profile.active_min
recipe_row["passive_min"] = _profile.passive_min
else:
@@ -477,11 +423,7 @@ async def browse_recipes(
except Exception:
directions_raw = []
if directions_raw:
_profile = parse_time_effort(
directions_raw,
ingredients=recipe_row.get("ingredients") or [],
ingredient_names=recipe_row.get("ingredient_names") or [],
)
_profile = parse_time_effort(directions_raw)
recipe_row["active_min"] = _profile.active_min
recipe_row["passive_min"] = _profile.passive_min
else:
@@ -600,137 +542,6 @@ async def build_recipe(
return result
_ASK_STOPWORDS: frozenset[str] = frozenset({
"what", "can", "make", "with", "have", "some", "the", "and", "for",
"that", "this", "these", "those", "how", "about", "are", "there",
"give", "show", "find", "want", "need", "like", "any", "good",
"quick", "easy", "simple", "fast", "using", "use", "from", "into",
"more", "much", "just", "only", "my", "please", "could", "would",
"should", "something", "anything", "everything", "ideas", "idea",
"suggest", "meal", "food", "dish", "dishes", "today", "tonight",
"tomorrow", "now", "here", "there", "recipes", "recipe", "dinner",
"lunch", "breakfast", "snack", "under", "minutes", "hours", "time",
"left", "over", "also", "some", "make", "cook", "made", "cooked",
})
import re as _re
def _extract_ask_keywords(question: str) -> list[str]:
"""Extract food-relevant keywords from a natural language question."""
tokens = _re.findall(r"[a-zA-Z]+", question.lower())
return [t for t in tokens if len(t) > 3 and t not in _ASK_STOPWORDS]
def _ask_in_thread(db_path: Path, question: str, pantry_items: list[str]) -> AskResponse:
"""Run Ask logic in a worker thread.
Free tier: keyword extraction + FTS ingredient search.
Paid tier path: same search, then LLM synthesis over results.
The caller handles tier gating and LLM synthesis outside this thread
to avoid importing LLMRouter in a sync context.
"""
import json as _json
store = Store(db_path)
try:
keywords = _extract_ask_keywords(question)
ingredient_hits: list[dict] = []
if keywords:
ingredient_hits = store.search_recipes_by_ingredients(keywords, limit=15)
# Also search by title using the full question text as a substring hint.
# browse_recipes q= does title LIKE %q%. Extract the longest keyword
# from the question as the title probe (most likely to appear in a title).
title_hits: list[dict] = []
title_probe = max(keywords, key=len) if keywords else None
if title_probe:
browse_result = store.browse_recipes(
keywords=None,
page=1,
page_size=12,
pantry_items=pantry_items or None,
q=title_probe,
sort="match" if pantry_items else "default",
)
title_hits = browse_result.get("recipes", [])
# Merge by ID; ingredient hits come first (more semantically relevant).
seen: set[int] = set()
merged: list[dict] = []
for row in ingredient_hits + title_hits:
rid = row.get("id")
if rid is not None and rid not in seen:
seen.add(rid)
merged.append(row)
# Compute pantry match_pct if caller sent pantry items.
pantry_set = {p.lower() for p in pantry_items} if pantry_items else set()
hits: list[AskRecipeHit] = []
for row in merged[:12]:
match_pct: float | None = None
if pantry_set:
raw_names = row.get("ingredient_names") or []
if isinstance(raw_names, str):
try:
raw_names = _json.loads(raw_names)
except Exception:
raw_names = []
if raw_names:
covered = sum(
1 for n in raw_names
if any(p in n.lower() for p in pantry_set)
)
match_pct = round(covered / len(raw_names), 2)
hits.append(AskRecipeHit(
id=row["id"],
title=row.get("title", ""),
category=row.get("category"),
match_pct=match_pct,
))
return AskResponse(answer=None, recipes=hits, tier="free")
finally:
store.close()
@router.post("/ask", response_model=AskResponse)
async def ask_recipes(
req: AskRequest,
session: CloudUser = Depends(get_session),
) -> AskResponse:
"""Natural-language recipe search with optional LLM synthesis.
Free tier: keyword extraction from question, then FTS ingredient + title search.
Paid tier / BYOK: same search, then LLM synthesizes a short conversational answer.
"""
result = await asyncio.to_thread(_ask_in_thread, session.db, req.question, req.pantry_items)
# LLM synthesis: only for paid/premium/ultra tiers, not "local" dev tier.
# Wrapped in wait_for so an unresponsive model degrades gracefully to recipe list only.
paid_tier = session.tier in ("paid", "premium", "ultra")
if (paid_tier or session.has_byok) and result.recipes:
recipe_titles = ", ".join(r.title for r in result.recipes[:6])
prompt = (
f'You are a helpful kitchen assistant. The user asked: "{req.question}"\n\n'
f"Matching recipes: {recipe_titles}\n\n"
f"Write a brief, friendly 1-2 sentence response suggesting which of these "
f"recipes might best fit the question. Be specific and natural."
)
try:
from circuitforge_core.llm.router import LLMRouter
answer = await asyncio.wait_for(
asyncio.to_thread(LLMRouter().complete, prompt),
timeout=8.0,
)
result = result.model_copy(update={"answer": answer.strip() or None, "tier": "paid"})
except (Exception, asyncio.TimeoutError) as exc:
log.warning("Ask LLM synthesis skipped: %s", exc)
return result
@router.get("/{recipe_id}")
async def get_recipe(recipe_id: int, session: CloudUser = Depends(get_session)) -> dict:
def _get(db_path: Path, rid: int) -> dict | None:
@@ -762,28 +573,8 @@ async def get_recipe(recipe_id: int, session: CloudUser = Depends(get_session))
except Exception:
_directions_for_te = []
_ingredients_for_te = recipe.get("ingredients") or []
if isinstance(_ingredients_for_te, str):
import json as _json3
try:
_ingredients_for_te = _json3.loads(_ingredients_for_te)
except Exception:
_ingredients_for_te = []
_ingredient_names_for_te = recipe.get("ingredient_names") or []
if isinstance(_ingredient_names_for_te, str):
import json as _json4
try:
_ingredient_names_for_te = _json4.loads(_ingredient_names_for_te)
except Exception:
_ingredient_names_for_te = []
if _directions_for_te:
_te = parse_time_effort(
_directions_for_te,
ingredients=_ingredients_for_te,
ingredient_names=_ingredient_names_for_te,
)
_te = parse_time_effort(_directions_for_te)
_time_effort_out: dict | None = {
"active_min": _te.active_min,
"passive_min": _te.passive_min,
@@ -791,11 +582,7 @@ async def get_recipe(recipe_id: int, session: CloudUser = Depends(get_session))
"effort_label": _te.effort_label,
"equipment": _te.equipment,
"step_analyses": [
{
"is_passive": sa.is_passive,
"detected_minutes": sa.detected_minutes,
"prep_min": sa.prep_min,
}
{"is_passive": sa.is_passive, "detected_minutes": sa.detected_minutes}
for sa in _te.step_analyses
],
}
@@ -821,33 +608,3 @@ async def get_recipe(recipe_id: int, session: CloudUser = Depends(get_session))
"estimated_time_min": None,
"time_effort": _time_effort_out,
}
@router.post("/{recipe_id}/leftovers", response_model=LeftoversResponse)
async def get_leftovers_shelf_life(
recipe_id: int,
session: CloudUser = Depends(get_session),
) -> LeftoversResponse:
"""Return cooked-leftover shelf-life estimate for a recipe.
Free tier: deterministic lookup (FDA/USDA table).
Deterministic path always runs; no tier gate needed.
"""
def _get(db_path: Path, rid: int) -> LeftoversResponse:
from app.services.leftovers_predictor import predict_leftovers_from_row
store = Store(db_path)
try:
recipe = store.get_recipe(rid)
finally:
store.close()
if recipe is None:
raise HTTPException(status_code=404, detail="Recipe not found.")
result = predict_leftovers_from_row(recipe)
return LeftoversResponse(
fridge_days=result.fridge_days,
freeze_days=result.freeze_days,
freeze_by_day=result.freeze_by_day,
storage_advice=result.storage_advice,
)
return await asyncio.to_thread(_get, session.db, recipe_id)
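The free-tier path of the Ask endpoint in this file extracts keywords with a regex plus a stopword list, merges ingredient and title hits by recipe ID (first occurrence wins, so ingredient hits keep priority), and scores each hit by the fraction of its ingredient names that contain any pantry item as a substring. The sketch below reproduces those three steps in isolation; `STOPWORDS` is an abbreviated, hypothetical subset of `_ASK_STOPWORDS`, and the function names are illustrative rather than Kiwi's own.

```python
from __future__ import annotations

import re

# Abbreviated, illustrative stopword set (the endpoint's _ASK_STOPWORDS is much longer).
STOPWORDS = frozenset({"what", "can", "make", "with", "have", "quick", "easy", "dinner", "tonight"})


def extract_keywords(question: str) -> list[str]:
    """Lowercase alphabetic tokens longer than 3 characters that are not stopwords."""
    tokens = re.findall(r"[a-zA-Z]+", question.lower())
    return [t for t in tokens if len(t) > 3 and t not in STOPWORDS]


def merge_by_id(*groups: list[dict]) -> list[dict]:
    """Concatenate result lists, keeping only the first row seen for each recipe id."""
    seen: set[int] = set()
    merged: list[dict] = []
    for group in groups:
        for row in group:
            rid = row.get("id")
            if rid is not None and rid not in seen:
                seen.add(rid)
                merged.append(row)
    return merged


def pantry_match_pct(ingredient_names: list[str], pantry_items: list[str]) -> float | None:
    """Fraction of ingredient names containing any pantry item as a substring."""
    pantry = {p.lower() for p in pantry_items}
    if not pantry or not ingredient_names:
        return None
    covered = sum(1 for name in ingredient_names if any(p in name.lower() for p in pantry))
    return round(covered / len(ingredient_names), 2)
```

Because the coverage check is a plain substring test, it can over-count: a pantry item `rice` also matches `licorice`. A word-boundary match would be stricter, at the cost of missing some plural and compound forms.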

@@ -5,7 +5,6 @@ import asyncio
from pathlib import Path
from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel
from app.cloud_session import CloudUser, get_session
from app.db.store import Store
@@ -17,13 +16,8 @@ from app.models.schemas.saved_recipe import (
SaveRecipeRequest,
UpdateSavedRecipeRequest,
)
from app.services.magpie_hook import fire_recipe_signal
from app.tiers import can_use
class StyleClassifyResponse(BaseModel):
suggested_tags: list[str]
router = APIRouter()
@@ -41,7 +35,7 @@ def _to_summary(row: dict, store: Store) -> SavedRecipeSummary:
return SavedRecipeSummary(
id=row["id"],
recipe_id=row["recipe_id"],
title=row.get("title") or "",
title=row.get("title", ""),
saved_at=row["saved_at"],
notes=row.get("notes"),
rating=row.get("rating"),
@@ -61,9 +55,7 @@ async def save_recipe(
row = store.save_recipe(req.recipe_id, req.notes, req.rating)
return _to_summary(row, store)
result = await asyncio.to_thread(_in_thread, session.db, _run)
asyncio.create_task(fire_recipe_signal(session.db, req.recipe_id, req.rating, []))
return result
return await asyncio.to_thread(_in_thread, session.db, _run)
@router.delete("/{recipe_id}", status_code=204)
@@ -90,11 +82,7 @@ async def update_saved_recipe(
)
return _to_summary(row, store)
result = await asyncio.to_thread(_in_thread, session.db, _run)
asyncio.create_task(
fire_recipe_signal(session.db, recipe_id, req.rating, req.style_tags or [])
)
return result
return await asyncio.to_thread(_in_thread, session.db, _run)
@router.get("", response_model=list[SavedRecipeSummary])
@@ -110,37 +98,14 @@ async def list_saved_recipes(
return await asyncio.to_thread(_in_thread, session.db, _run)
# ── style classifier (Paid / BYOK) ───────────────────────────────────────────
@router.post("/{recipe_id}/classify-style", response_model=StyleClassifyResponse)
async def classify_style(
recipe_id: int,
session: CloudUser = Depends(get_session),
) -> StyleClassifyResponse:
if not can_use("style_classifier", session.tier, getattr(session, "has_byok", False)):
raise HTTPException(status_code=403, detail="Style classifier requires Paid tier or BYOK.")
def _run(store: Store) -> StyleClassifyResponse:
recipe = store.get_recipe(recipe_id)
if recipe is None:
raise HTTPException(status_code=404, detail="Recipe not found.")
from app.services.recipe.style_classifier import classify_style as _classify
tags = _classify(recipe)
return StyleClassifyResponse(suggested_tags=tags)
return await asyncio.to_thread(_in_thread, session.db, _run)
# ── collections (Paid) ────────────────────────────────────────────────────────
@router.get("/collections", response_model=list[CollectionSummary])
async def list_collections(
session: CloudUser = Depends(get_session),
) -> list[CollectionSummary]:
# Free users can list (they'll always have zero — creating requires Paid).
# Returning 403 here breaks savedStore.load() via Promise.all for non-Paid users.
if not can_use("recipe_collections", session.tier):
return []
raise HTTPException(status_code=403, detail="Collections require Paid tier.")
rows = await asyncio.to_thread(
_in_thread, session.db, lambda s: s.get_collections()
)

@@ -1,9 +1,6 @@
from fastapi import APIRouter
from app.api.endpoints import health, receipts, export, inventory, ocr, recipes, settings, staples, feedback, feedback_attach, household, saved_recipes, imitate, meal_plans, orch_usage, session, shopping
from app.api.endpoints.community import router as community_router
from app.api.endpoints.corrections import router as corrections_router
from app.api.endpoints.mastodon_oauth import router as mastodon_router
from app.api.endpoints.recipe_scan import router as recipe_scan_router
from app.api.endpoints.recipe_tags import router as recipe_tags_router
api_router = APIRouter()
@@ -15,9 +12,6 @@ api_router.include_router(ocr.router, prefix="/receipts", tags=
api_router.include_router(export.router, tags=["export"])
api_router.include_router(inventory.router, prefix="/inventory", tags=["inventory"])
api_router.include_router(saved_recipes.router, prefix="/recipes/saved", tags=["saved-recipes"])
# recipe_scan_router registered BEFORE recipes.router so /recipes/scan and /recipes/user
# take priority over /recipes/{recipe_id} (which would otherwise match them as int IDs).
api_router.include_router(recipe_scan_router, prefix="/recipes", tags=["recipe-scan"])
api_router.include_router(recipes.router, prefix="/recipes", tags=["recipes"])
api_router.include_router(settings.router, prefix="/settings", tags=["settings"])
api_router.include_router(staples.router, prefix="/staples", tags=["staples"])
@@ -30,5 +24,3 @@ api_router.include_router(orch_usage.router, prefix="/orch-usage", tags=
api_router.include_router(shopping.router, prefix="/shopping", tags=["shopping"])
api_router.include_router(community_router)
api_router.include_router(recipe_tags_router)
api_router.include_router(corrections_router, prefix="/corrections", tags=["corrections"])
api_router.include_router(mastodon_router)
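The comment in this router file explains why `recipe_scan_router` is registered before `recipes.router`: routes are tried in registration order, and an unconverted `{recipe_id}` parameter matches any single path segment, so `GET /recipes/user` would otherwise be captured as `recipe_id="user"` and then rejected by the `int` validation. The following is a simplified pure-Python model of first-match routing (not Starlette's actual router, and ignoring HTTP methods) that shows the effect; `compile_path` and `resolve` are hypothetical names.

```python
from __future__ import annotations

import re


def compile_path(template: str) -> re.Pattern:
    """Turn "/recipes/{recipe_id}" into a regex; each {name} matches one path segment."""
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    return re.compile(f"^{pattern}$")


def resolve(routes: list[tuple[str, str]], path: str) -> tuple[str | None, dict[str, str]]:
    """Return the handler name and path params of the FIRST route that matches."""
    for template, handler in routes:
        match = compile_path(template).match(path)
        if match:
            return handler, match.groupdict()
    return None, {}
```

With the parameterized route listed first, `resolve` hands `/recipes/user` to `get_recipe` with `recipe_id="user"`; listing the literal route first lets it win, which is the ordering the registration comment relies on.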

@@ -1,9 +1,11 @@
"""Cloud session resolution for Kiwi FastAPI.
Delegates JWT validation, Heimdall provisioning, tier resolution, and guest
session management to circuitforge_core.CloudSessionFactory. Kiwi-specific
CloudUser (per-user DB path, household data, BYOK flag) and DB helpers are
kept here.
Local mode (CLOUD_MODE unset/false): returns a local CloudUser with no auth
checks, full tier access, and DB path pointing to settings.DB_PATH.
Cloud mode (CLOUD_MODE=true): validates the cf_session JWT injected by Caddy
as X-CF-Session, resolves user_id, auto-provisions a free Heimdall license on
first visit, fetches the tier, and returns a per-user DB path.
FastAPI usage:
@app.get("/api/v1/inventory/items")
@@ -15,10 +17,16 @@ from __future__ import annotations
import logging
import os
import re
import time
from dataclasses import dataclass
from pathlib import Path
from circuitforge_core.cloud_session import CloudSessionFactory as _CoreFactory, detect_byok
import uuid
import jwt as pyjwt
import requests
import yaml
from fastapi import Depends, HTTPException, Request, Response
log = logging.getLogger(__name__)
@@ -27,12 +35,53 @@ log = logging.getLogger(__name__)
CLOUD_MODE: bool = os.environ.get("CLOUD_MODE", "").lower() in ("1", "true", "yes")
CLOUD_DATA_ROOT: Path = Path(os.environ.get("CLOUD_DATA_ROOT", "/devl/kiwi-cloud-data"))
DIRECTUS_JWT_SECRET: str = os.environ.get("DIRECTUS_JWT_SECRET", "")
HEIMDALL_URL: str = os.environ.get("HEIMDALL_URL", "https://license.circuitforge.tech")
HEIMDALL_ADMIN_TOKEN: str = os.environ.get("HEIMDALL_ADMIN_TOKEN", "")
# Dev bypass: comma-separated IPs or CIDR ranges that skip JWT auth.
# NEVER set this in production. Intended only for LAN developer testing when
# the request doesn't pass through Caddy (which normally injects X-CF-Session).
# Example: CLOUD_AUTH_BYPASS_IPS=10.1.10.0/24,127.0.0.1
import ipaddress as _ipaddress
_BYPASS_RAW: list[str] = [
e.strip()
for e in os.environ.get("CLOUD_AUTH_BYPASS_IPS", "").split(",")
if e.strip()
]
_BYPASS_NETS: list[_ipaddress.IPv4Network | _ipaddress.IPv6Network] = []
_BYPASS_IPS: frozenset[str] = frozenset()
if _BYPASS_RAW:
_nets, _ips = [], set()
for entry in _BYPASS_RAW:
try:
_nets.append(_ipaddress.ip_network(entry, strict=False))
except ValueError:
_ips.add(entry) # treat non-parseable entries as bare IPs
_BYPASS_NETS = _nets
_BYPASS_IPS = frozenset(_ips)
def _is_bypass_ip(ip: str) -> bool:
if not ip:
return False
if ip in _BYPASS_IPS:
return True
try:
addr = _ipaddress.ip_address(ip)
return any(addr in net for net in _BYPASS_NETS)
except ValueError:
return False
_LOCAL_KIWI_DB: Path = Path(os.environ.get("KIWI_DB", "data/kiwi.db"))
TIERS = ["free", "paid", "premium", "ultra"]
_TIER_CACHE: dict[str, tuple[dict, float]] = {}
_TIER_CACHE_TTL = 300 # 5 minutes
_core = _CoreFactory(product="kiwi", byok_detector=detect_byok)
TIERS = ["free", "paid", "premium", "ultra"]
def _auth_label(user_id: str) -> str:
@@ -57,7 +106,73 @@ class CloudUser:
license_key: str | None = None # key_display for lifetime/founders keys; None for subscription/free
# ── DB path helpers ───────────────────────────────────────────────────────────
# ── JWT validation ─────────────────────────────────────────────────────────────
def _extract_session_token(header_value: str) -> str:
m = re.search(r'(?:^|;)\s*cf_session=([^;]+)', header_value)
return m.group(1).strip() if m else header_value.strip()
def validate_session_jwt(token: str) -> str:
"""Validate cf_session JWT and return the Directus user_id."""
try:
payload = pyjwt.decode(
token,
DIRECTUS_JWT_SECRET,
algorithms=["HS256"],
options={"require": ["id", "exp"]},
)
return payload["id"]
except Exception as exc:
log.debug("JWT validation failed: %s", exc)
raise HTTPException(status_code=401, detail="Session invalid or expired")
# ── Heimdall integration ──────────────────────────────────────────────────────
def _ensure_provisioned(user_id: str) -> None:
if not HEIMDALL_ADMIN_TOKEN:
return
try:
requests.post(
f"{HEIMDALL_URL}/admin/provision",
json={"directus_user_id": user_id, "product": "kiwi", "tier": "free"},
headers={"Authorization": f"Bearer {HEIMDALL_ADMIN_TOKEN}"},
timeout=5,
)
except Exception as exc:
log.warning("Heimdall provision failed for user %s: %s", user_id, exc)
def _fetch_cloud_tier(user_id: str) -> tuple[str, str | None, bool, str | None]:
"""Returns (tier, household_id | None, is_household_owner, license_key | None)."""
now = time.monotonic()
cached = _TIER_CACHE.get(user_id)
if cached and (now - cached[1]) < _TIER_CACHE_TTL:
entry = cached[0]
return entry["tier"], entry.get("household_id"), entry.get("is_household_owner", False), entry.get("license_key")
if not HEIMDALL_ADMIN_TOKEN:
return "free", None, False, None
try:
resp = requests.post(
f"{HEIMDALL_URL}/admin/cloud/resolve",
json={"directus_user_id": user_id, "product": "kiwi"},
headers={"Authorization": f"Bearer {HEIMDALL_ADMIN_TOKEN}"},
timeout=5,
)
data = resp.json() if resp.ok else {}
tier = data.get("tier", "free")
household_id = data.get("household_id")
is_owner = data.get("is_household_owner", False)
license_key = data.get("key_display")
except Exception as exc:
log.warning("Heimdall tier resolve failed for user %s: %s", user_id, exc)
tier, household_id, is_owner, license_key = "free", None, False, None
_TIER_CACHE[user_id] = ({"tier": tier, "household_id": household_id, "is_household_owner": is_owner, "license_key": license_key}, now)
return tier, household_id, is_owner, license_key
def _user_db_path(user_id: str, household_id: str | None = None) -> Path:
if household_id:
@ -79,45 +194,112 @@ def _anon_guest_db_path(guest_id: str) -> Path:
return path
# ── BYOK detection ────────────────────────────────────────────────────────────
_LLM_CONFIG_PATH = Path.home() / ".config" / "circuitforge" / "llm.yaml"
def _detect_byok(config_path: Path = _LLM_CONFIG_PATH) -> bool:
"""Return True if at least one enabled non-vision LLM backend is configured.
Reads the same llm.yaml that LLMRouter uses. Local (Ollama, vLLM) and
API-key backends both count: the policy is "user is supplying compute",
regardless of where that compute lives.
"""
try:
with open(config_path) as f:
cfg = yaml.safe_load(f) or {}
return any(
b.get("enabled", True) and b.get("type") != "vision_service"
for b in cfg.get("backends", {}).values()
)
except Exception:
return False
# ── FastAPI dependency ────────────────────────────────────────────────────────
_GUEST_COOKIE = "kiwi_guest_id"
_GUEST_COOKIE_MAX_AGE = 60 * 60 * 24 * 90 # 90 days
def _resolve_guest_session(request: Request, response: Response, has_byok: bool) -> CloudUser:
"""Return a per-session anonymous CloudUser, creating a guest UUID cookie if needed."""
guest_id = request.cookies.get(_GUEST_COOKIE, "").strip()
is_new = not guest_id
if is_new:
guest_id = str(uuid.uuid4())
log.debug("New guest session assigned: anon-%s", guest_id[:8])
# Secure flag only when the request actually arrived over HTTPS
# (Caddy sets X-Forwarded-Proto=https in cloud; absent on direct port access).
# Avoids losing the session cookie on HTTP direct-port testing of the cloud stack.
is_https = request.headers.get("x-forwarded-proto", "http").lower() == "https"
response.set_cookie(
key=_GUEST_COOKIE,
value=guest_id,
max_age=_GUEST_COOKIE_MAX_AGE,
httponly=True,
samesite="lax",
secure=is_https,
)
return CloudUser(
user_id=f"anon-{guest_id}",
tier="free",
db=_anon_guest_db_path(guest_id),
has_byok=has_byok,
)
def get_session(request: Request, response: Response) -> CloudUser:
"""FastAPI dependency — resolves the current user from the request.
Delegates auth/tier resolution to cf-core CloudSessionFactory, then maps
the result to Kiwi's CloudUser with per-user DB path and household data.
Local mode: fully-privileged "local" user pointing at local DB.
Cloud mode: validates X-CF-Session JWT, provisions license, resolves tier.
Dev bypass: CLOUD_AUTH_BYPASS_IPS match returns a "local-dev" session.
Anonymous: per-session UUID cookie (kiwi_guest_id) isolates each guest's data.
Dev bypass: if CLOUD_AUTH_BYPASS_IPS is set and the client IP matches,
returns a "local" session without JWT validation (dev/LAN use only).
Anonymous: per-session UUID cookie isolates each guest visitor's data.
"""
core_user = _core.resolve(request, response)
uid, tier, has_byok = core_user.user_id, core_user.tier, core_user.has_byok
has_byok = _detect_byok()
if not CLOUD_MODE or uid in ("local", "local-dev"):
# local-dev gets a writable path under CLOUD_DATA_ROOT; local uses KIWI_DB
db = _user_db_path(uid) if uid == "local-dev" else _LOCAL_KIWI_DB
return CloudUser(user_id=uid, tier=tier, db=db, has_byok=has_byok)
if not CLOUD_MODE:
return CloudUser(user_id="local", tier="local", db=_LOCAL_KIWI_DB, has_byok=has_byok)
if uid.startswith("anon-"):
guest_id = uid[len("anon-"):]
return CloudUser(
user_id=uid, tier=tier,
db=_anon_guest_db_path(guest_id),
has_byok=has_byok,
)
# Prefer X-Real-IP (set by Caddy from the actual client address) over the
# TCP peer address (which is nginx's container IP when behind the proxy).
client_ip = (
request.headers.get("x-real-ip", "")
or (request.client.host if request.client else "")
)
if (_BYPASS_IPS or _BYPASS_NETS) and _is_bypass_ip(client_ip):
log.debug("CLOUD_AUTH_BYPASS_IPS match for %s — returning local session", client_ip)
# Use a dev DB under CLOUD_DATA_ROOT so the container has a writable path.
dev_db = _user_db_path("local-dev")
return CloudUser(user_id="local-dev", tier="local", db=dev_db, has_byok=has_byok)
household_id = core_user.meta.get("household_id")
is_owner = core_user.meta.get("is_household_owner", False)
license_key = core_user.meta.get("license_key")
log.debug("Resolved %s session uid=%s tier=%s household=%s", _auth_label(uid), uid[:8], tier, household_id)
# Resolve cf_session JWT: prefer the explicit header injected by Caddy, then
# fall back to the cf_session cookie value. Other cookies (e.g. kiwi_guest_id)
# must never be treated as auth tokens.
raw_session = request.headers.get("x-cf-session", "").strip()
if not raw_session:
raw_session = request.cookies.get("cf_session", "").strip()
if not raw_session:
return _resolve_guest_session(request, response, has_byok)
token = _extract_session_token(raw_session) # gitleaks:allow — function name, not a secret
if not token:
return _resolve_guest_session(request, response, has_byok)
user_id = validate_session_jwt(token)
_ensure_provisioned(user_id)
tier, household_id, is_household_owner, license_key = _fetch_cloud_tier(user_id)
return CloudUser(
user_id=uid, tier=tier,
db=_user_db_path(uid, household_id=household_id),
user_id=user_id,
tier=tier,
db=_user_db_path(user_id, household_id=household_id),
has_byok=has_byok,
household_id=household_id,
is_household_owner=is_owner,
is_household_owner=is_household_owner,
license_key=license_key,
)

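`get_session` relies on `_is_bypass_ip`, `_BYPASS_IPS` and `_BYPASS_NETS`, none of which appear in this diff. The sketch below shows one way such a check could work. It assumes `CLOUD_AUTH_BYPASS_IPS` holds a comma-separated mix of single addresses and CIDR ranges. `parse_bypass_list` and `is_bypass_ip` are hypothetical names, not Kiwi's code.

```python
from __future__ import annotations

import ipaddress


def parse_bypass_list(
    raw: str,
) -> tuple[set[str], list[ipaddress.IPv4Network | ipaddress.IPv6Network]]:
    """Split e.g. "127.0.0.1, 10.1.10.0/24" into exact addresses and networks."""
    ips: set[str] = set()
    nets: list[ipaddress.IPv4Network | ipaddress.IPv6Network] = []
    for item in (part.strip() for part in raw.split(",")):
        if not item:
            continue  # tolerate trailing commas and blank entries
        if "/" in item:
            # strict=False accepts host bits, e.g. "10.1.10.71/24" -> 10.1.10.0/24
            nets.append(ipaddress.ip_network(item, strict=False))
        else:
            # Store the normalised textual form so lookups compare like with like
            ips.add(str(ipaddress.ip_address(item)))
    return ips, nets


def is_bypass_ip(client_ip: str, ips: set[str], nets) -> bool:
    """True if client_ip is listed exactly or falls inside a listed network."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        # An empty or malformed address (no X-Real-IP, no peer) never bypasses auth
        return False
    # Membership tests between IPv4 and IPv6 simply return False
    return str(addr) in ips or any(addr in net for net in nets)
```

Parsing up front means a typo in `CLOUD_AUTH_BYPASS_IPS` raises `ValueError` at startup instead of silently never matching. The match itself fails closed.
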
@ -43,10 +43,6 @@ class Settings:
os.environ.get("BROWSE_COUNTS_PATH", str(DATA_DIR / "browse_counts.db"))
)
# Magpie data flywheel — ingest endpoint for anonymized recipe signals
# Set MAGPIE_INGEST_URL to enable; leave it unset or empty to disable silently.
MAGPIE_INGEST_URL: str | None = os.environ.get("MAGPIE_INGEST_URL") or None
# Community feature settings
COMMUNITY_DB_URL: str | None = os.environ.get("COMMUNITY_DB_URL") or None
COMMUNITY_PSEUDONYM_SALT: str = os.environ.get(
@ -65,24 +61,9 @@ class Settings:
# Quality
MIN_QUALITY_SCORE: float = float(os.environ.get("MIN_QUALITY_SCORE", "50.0"))
# CF-core resource coordinator (VRAM lease management — lease broker, not inference)
# CF-core resource coordinator (VRAM lease management)
COORDINATOR_URL: str = os.environ.get("COORDINATOR_URL", "http://localhost:7700")
# GPU inference server URL
# Priority: GPU_SERVER_URL env var → CF_ORCH_URL env var (backward compat)
# → https://orch.circuitforge.tech when CF_LICENSE_KEY is present (Paid+)
# Resolved value is written back to os.environ["CF_ORCH_URL"] at startup so
# all service-layer callers that read CF_ORCH_URL directly see the right URL.
GPU_SERVER_URL: str | None = (
os.environ.get("GPU_SERVER_URL")
or os.environ.get("CF_ORCH_URL")
or (
"https://orch.circuitforge.tech"
if os.environ.get("CF_LICENSE_KEY")
else None
)
)
# Hosted cf-orch coordinator — bearer token for managed cloud GPU inference (Paid+)
# CFOrchClient reads CF_LICENSE_KEY automatically; exposed here for startup validation.
CF_LICENSE_KEY: str | None = os.environ.get("CF_LICENSE_KEY")
@ -91,17 +72,6 @@ class Settings:
# runs don't pollute session counts. Set to the Directus UUID of the test user.
E2E_TEST_USER_ID: str | None = os.environ.get("E2E_TEST_USER_ID") or None
# ActivityPub federation (optional; disabled by default)
AP_ENABLED: bool = os.environ.get("AP_ENABLED", "false").lower() in ("1", "true", "yes")
AP_HOST: str = os.environ.get("AP_HOST", "") # e.g. kiwi.circuitforge.tech
CLOUD_DATA_ROOT: Path = Path(os.environ.get("CLOUD_DATA_ROOT", "/devl/kiwi-cloud-data"))
AP_KEY_PATH: Path = Path(
os.environ.get("AP_KEY_PATH", str(CLOUD_DATA_ROOT / "ap_keys" / "instance.pem"))
)
# Fernet key for Mastodon access token encryption (base64-urlsafe, 32 bytes)
# Leave unset to skip encryption (dev only)
AP_TOKEN_ENCRYPTION_KEY: str | None = os.environ.get("AP_TOKEN_ENCRYPTION_KEY") or None
# Feature flags
ENABLE_OCR: bool = os.environ.get("ENABLE_OCR", "false").lower() in ("1", "true", "yes")
# Use OrchestratedScheduler (coordinator-aware, multi-GPU fan-out) instead of
@ -123,9 +93,3 @@ class Settings:
settings = Settings()
# Normalise GPU_SERVER_URL into CF_ORCH_URL so every service-layer caller that
# reads os.environ.get("CF_ORCH_URL") sees the resolved value, including the
# Paid+ cloud default injected above.
if settings.GPU_SERVER_URL:
os.environ["CF_ORCH_URL"] = settings.GPU_SERVER_URL

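The `GPU_SERVER_URL` priority chain is easier to test as a pure function over a mapping than as a class attribute evaluated at import time. Here is a minimal sketch. `resolve_gpu_server_url` is a hypothetical name. In the app, it would be called as `resolve_gpu_server_url(os.environ)`.

```python
from __future__ import annotations

from collections.abc import Mapping

HOSTED_ORCH_URL = "https://orch.circuitforge.tech"


def resolve_gpu_server_url(env: Mapping[str, str]) -> str | None:
    """Mirror the Settings.GPU_SERVER_URL priority chain.

    GPU_SERVER_URL wins, then the legacy CF_ORCH_URL alias, then the hosted
    coordinator when CF_LICENSE_KEY is present (Paid+). Empty strings count as
    unset because `or` treats them as falsy, as in the original expression.
    """
    return (
        env.get("GPU_SERVER_URL")
        or env.get("CF_ORCH_URL")
        or (HOSTED_ORCH_URL if env.get("CF_LICENSE_KEY") else None)
    )
```
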
@ -1,31 +0,0 @@
-- Migration 039: Drop FK constraint on saved_recipes.recipe_id.
--
-- In cloud mode the recipe corpus is ATTACHed as a separate database.
-- SQLite FK constraints only resolve against the `main` schema, so
-- `REFERENCES recipes(id)` was always failing for cloud saves (the
-- main.recipes table is empty; all data lives in corpus.recipes).
-- The corpus is read-only and never modified by the app, so cascade-on-delete
-- is meaningless anyway. Remove the constraint without changing any data.
PRAGMA foreign_keys = OFF;
CREATE TABLE saved_recipes_new (
id INTEGER PRIMARY KEY AUTOINCREMENT,
recipe_id INTEGER NOT NULL,
saved_at TEXT NOT NULL DEFAULT (datetime('now')),
notes TEXT,
rating INTEGER CHECK (rating IS NULL OR (rating >= 0 AND rating <= 5)),
style_tags TEXT NOT NULL DEFAULT '[]',
UNIQUE (recipe_id)
);
INSERT INTO saved_recipes_new SELECT * FROM saved_recipes;
DROP TABLE saved_recipes;
ALTER TABLE saved_recipes_new RENAME TO saved_recipes;
CREATE INDEX IF NOT EXISTS idx_saved_recipes_saved_at ON saved_recipes (saved_at DESC);
CREATE INDEX IF NOT EXISTS idx_saved_recipes_rating ON saved_recipes (rating);
PRAGMA foreign_keys = ON;

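SQLite has no `ALTER TABLE ... DROP CONSTRAINT`. That is why migration 039 copies the data into a new table. The sketch below replays the pattern on a trimmed `saved_recipes` schema in an in-memory database. Two details differ from the migration as written; they are suggestions, not the project's code:

- The copy is wrapped in an explicit transaction, so a failure cannot leave the table dropped.
- The `INSERT ... SELECT` names its columns instead of relying on `SELECT *` column order.

```python
import sqlite3


def drop_recipe_fk(conn: sqlite3.Connection) -> None:
    """Rebuild saved_recipes without its REFERENCES clause (trimmed schema)."""
    conn.executescript(
        """
        -- PRAGMA foreign_keys is a no-op inside a transaction, so toggle it outside.
        PRAGMA foreign_keys = OFF;
        BEGIN;
        CREATE TABLE saved_recipes_new (
            id        INTEGER PRIMARY KEY AUTOINCREMENT,
            recipe_id INTEGER NOT NULL,
            notes     TEXT,
            UNIQUE (recipe_id)
        );
        -- Explicit column lists keep the copy correct even if column order differs.
        INSERT INTO saved_recipes_new (id, recipe_id, notes)
            SELECT id, recipe_id, notes FROM saved_recipes;
        DROP TABLE saved_recipes;
        ALTER TABLE saved_recipes_new RENAME TO saved_recipes;
        COMMIT;
        PRAGMA foreign_keys = ON;
        """
    )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE recipes (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE saved_recipes (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        recipe_id INTEGER NOT NULL REFERENCES recipes(id) ON DELETE CASCADE,
        notes     TEXT,
        UNIQUE (recipe_id)
    );
    -- Simulates a cloud save: recipe 42 lives in the attached corpus, not main.
    INSERT INTO saved_recipes (recipe_id, notes) VALUES (42, 'keep me');
    """
)
drop_recipe_fk(conn)
print(conn.execute("PRAGMA foreign_key_list(saved_recipes)").fetchall())  # []
```
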
@ -1,21 +0,0 @@
-- 040_corrections.sql — corrections table for SFT training data
-- Schema from circuitforge_core.api.corrections.CORRECTIONS_MIGRATION_SQL
CREATE TABLE IF NOT EXISTS corrections (
id INTEGER PRIMARY KEY AUTOINCREMENT,
item_id TEXT NOT NULL DEFAULT '',
product TEXT NOT NULL,
correction_type TEXT NOT NULL,
input_text TEXT NOT NULL,
original_output TEXT NOT NULL,
corrected_output TEXT NOT NULL DEFAULT '',
rating TEXT NOT NULL DEFAULT 'down',
context TEXT NOT NULL DEFAULT '{}',
opted_in INTEGER NOT NULL DEFAULT 0,
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_corrections_product
ON corrections (product);
CREATE INDEX IF NOT EXISTS idx_corrections_opted_in
ON corrections (opted_in);

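The indexes on `product` and `opted_in` suggest that exports filter on both columns. Here is a minimal sketch of turning opted-in rows into prompt/completion pairs. `export_sft_pairs` and the output field names are assumptions for illustration, not the circuitforge-core corrections API.

```python
import json
import sqlite3

# Copy of the migration 040 schema so the sketch runs on its own.
SCHEMA = """
CREATE TABLE IF NOT EXISTS corrections (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    item_id TEXT NOT NULL DEFAULT '',
    product TEXT NOT NULL,
    correction_type TEXT NOT NULL,
    input_text TEXT NOT NULL,
    original_output TEXT NOT NULL,
    corrected_output TEXT NOT NULL DEFAULT '',
    rating TEXT NOT NULL DEFAULT 'down',
    context TEXT NOT NULL DEFAULT '{}',
    opted_in INTEGER NOT NULL DEFAULT 0,
    created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
"""


def export_sft_pairs(conn: sqlite3.Connection, product: str) -> list[dict]:
    """Return opted-in corrections for one product as prompt/completion pairs.

    Rows without a corrected_output (e.g. a bare thumbs-down) are skipped,
    since they carry no target text to train on.
    """
    rows = conn.execute(
        """SELECT input_text, corrected_output, context
           FROM corrections
           WHERE product = ? AND opted_in = 1 AND corrected_output != ''
           ORDER BY id""",
        (product,),
    ).fetchall()
    return [
        {"prompt": inp, "completion": out, "context": json.loads(ctx)}
        for inp, out, ctx in rows
    ]
```
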
@ -1,23 +0,0 @@
-- Migration 041: user_recipes table for user-scanned and manually-entered recipes.
--
-- Separate from the food.com corpus (recipes table) -- user recipes are personal,
-- not curated, and need different fields (servings as string, cook_time as string).
CREATE TABLE IF NOT EXISTS user_recipes (
id INTEGER PRIMARY KEY AUTOINCREMENT,
title TEXT NOT NULL,
subtitle TEXT,
servings TEXT, -- kept as string: "2", "4-6", "serves 8"
cook_time TEXT, -- kept as string: "25 min", "1 hour"
source_note TEXT, -- e.g. "Purple Carrot", "Betty Crocker"
ingredients TEXT NOT NULL DEFAULT '[]', -- JSON: [{name, qty, unit, raw}]
steps TEXT NOT NULL DEFAULT '[]', -- JSON: ["step 1", "step 2", ...]
notes TEXT,
tags TEXT DEFAULT '[]', -- JSON: ["vegan", "quick"]
source TEXT NOT NULL DEFAULT 'manual', -- 'scan' | 'manual'
pantry_match_pct INTEGER, -- 0-100, computed at scan time; null for manual
created_at TEXT NOT NULL DEFAULT (datetime('now')),
updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_user_recipes_created ON user_recipes (created_at DESC);

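Migration 041 stores `ingredients`, `steps` and `tags` as JSON text, so they have to be encoded on write and decoded on read. Here is a minimal round-trip sketch using only the standard library. The helper names and the trimmed schema are hypothetical stand-ins for the `Store` helpers, which this part of the diff does not show.

```python
import json
import sqlite3

# Trimmed copy of the migration 041 schema; only the columns used below.
SCHEMA = """
CREATE TABLE IF NOT EXISTS user_recipes (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    title       TEXT NOT NULL,
    ingredients TEXT NOT NULL DEFAULT '[]',
    steps       TEXT NOT NULL DEFAULT '[]',
    tags        TEXT DEFAULT '[]',
    source      TEXT NOT NULL DEFAULT 'manual'
);
"""
JSON_COLUMNS = ("ingredients", "steps", "tags")


def insert_user_recipe(conn, title, ingredients, steps, tags=None, source="manual"):
    """Insert one recipe, serialising the list/dict fields to JSON text."""
    cur = conn.execute(
        "INSERT INTO user_recipes (title, ingredients, steps, tags, source)"
        " VALUES (?, ?, ?, ?, ?)",
        (title, json.dumps(ingredients), json.dumps(steps), json.dumps(tags or []), source),
    )
    conn.commit()
    return cur.lastrowid


def load_user_recipe(conn, recipe_id):
    """Fetch one recipe as a dict with JSON columns decoded, or None."""
    cur = conn.execute("SELECT * FROM user_recipes WHERE id = ?", (recipe_id,))
    row = cur.fetchone()
    if row is None:
        return None
    record = dict(zip((col[0] for col in cur.description), row))
    for key in JSON_COLUMNS:
        # tags is nullable, so only decode values that are actually strings
        if isinstance(record.get(key), str):
            record[key] = json.loads(record[key])
    return record
```
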
@ -1,47 +0,0 @@
-- 042_activitypub.sql
-- ActivityPub federation tables: follower registry, delivery log, dedup, Mastodon tokens.
-- Follower registry: AP actors that Follow this Kiwi instance
CREATE TABLE IF NOT EXISTS ap_followers (
id INTEGER PRIMARY KEY,
actor_id TEXT NOT NULL UNIQUE, -- AP actor URL
inbox_url TEXT NOT NULL,
shared_inbox TEXT,
followed_at TEXT NOT NULL DEFAULT (datetime('now')),
active INTEGER NOT NULL DEFAULT 1
);
CREATE INDEX IF NOT EXISTS idx_ap_followers_active
ON ap_followers (active) WHERE active = 1;
-- Outgoing delivery log: one row per (post_slug, target_inbox) attempt
CREATE TABLE IF NOT EXISTS ap_deliveries (
id INTEGER PRIMARY KEY,
post_slug TEXT NOT NULL,
target_inbox TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'pending', -- pending | delivered | failed
attempts INTEGER NOT NULL DEFAULT 0,
last_error TEXT,
created_at TEXT NOT NULL DEFAULT (datetime('now')),
delivered_at TEXT
);
CREATE INDEX IF NOT EXISTS idx_ap_deliveries_status
ON ap_deliveries (status) WHERE status != 'delivered';
-- Incoming activity dedup: prevents replay attacks and double-processing
CREATE TABLE IF NOT EXISTS ap_received (
activity_id TEXT PRIMARY KEY,
received_at TEXT NOT NULL DEFAULT (datetime('now'))
);
-- Mastodon OAuth tokens: per-user, encrypted at rest
-- Stored in the user's local kiwi.db (CLOUD_MODE: per-user DB tree)
CREATE TABLE IF NOT EXISTS mastodon_tokens (
id INTEGER PRIMARY KEY,
directus_user_id TEXT NOT NULL UNIQUE,
instance_url TEXT NOT NULL,
access_token TEXT NOT NULL, -- Fernet-encrypted when AP_TOKEN_ENCRYPTION_KEY set
created_at TEXT NOT NULL DEFAULT (datetime('now')),
updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);

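`ap_received` uses `activity_id` as its primary key, so deduplication can be a single statement. This is a minimal sketch. It assumes the inbox handler already has the activity's `id` URL in hand. `claim_activity` is a hypothetical helper, not code from this diff.

```python
import sqlite3


def claim_activity(conn: sqlite3.Connection, activity_id: str) -> bool:
    """Record an incoming activity id; return False if it was already seen.

    INSERT OR IGNORE performs the uniqueness check and the write in one
    statement, so two deliveries of the same activity to the same database
    cannot both be recorded as new.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO ap_received (activity_id) VALUES (?)",
        (activity_id,),
    )
    conn.commit()
    # rowcount is 1 for a fresh insert and 0 when the row was ignored
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ap_received (activity_id TEXT PRIMARY KEY,"
    " received_at TEXT NOT NULL DEFAULT (datetime('now')))"
)
first = claim_activity(conn, "https://example.social/activities/1")   # True
replay = claim_activity(conn, "https://example.social/activities/1")  # False
```
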
@ -6,8 +6,6 @@ Cloud mode: opens a Store at the per-user DB path from the CloudUser session.
"""
from __future__ import annotations
import sqlite3
from collections.abc import Iterator
from typing import Generator
from fastapi import Depends
@ -23,16 +21,3 @@ def get_store(session: CloudUser = Depends(get_session)) -> Generator[Store, Non
yield store
finally:
store.close()
def get_db(session: CloudUser = Depends(get_session)) -> Iterator[sqlite3.Connection]:
"""FastAPI dependency — yields the raw sqlite3.Connection for the current user.
Used by make_corrections_router() from circuitforge-core, which expects a
dependency that yields a sqlite3.Connection directly.
"""
store = Store(session.db)
try:
yield store.conn
finally:
store.close()

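`get_db` is a generator dependency. The connection is created before `yield` and handed to the route. It is then closed in `finally` when FastAPI resumes the generator.

The same lifecycle can be exercised without FastAPI by driving the generator by hand. This simplified version makes two assumptions for the sketch: it takes a path instead of a `CloudUser`, and it opens `sqlite3` directly rather than through `Store`.

```python
import sqlite3
from collections.abc import Iterator


def get_db(db_path: str) -> Iterator[sqlite3.Connection]:
    """Yield a connection, then close it once the consumer is finished."""
    conn = sqlite3.connect(db_path)
    try:
        yield conn
    finally:
        # Runs when the generator is exhausted, closed, or has an exception thrown into it
        conn.close()


gen = get_db(":memory:")
conn = next(gen)  # what the route handler would receive
print(conn.execute("SELECT 1").fetchone())  # (1,)
gen.close()  # raises GeneratorExit at the yield, so the finally block runs
```
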
@ -61,8 +61,6 @@ class Store:
"style_tags",
# meal plan columns
"meal_types",
# user_recipes columns
"steps", "tags",
# captured_products columns
"allergens"):
if key in d and isinstance(d[key], str):
@ -1131,19 +1129,6 @@ class Store:
phrases = ['"' + kw.replace('"', '""') + '"' for kw in keywords]
return " OR ".join(phrases)
@staticmethod
def _ingredient_fts_term(ingredient: str) -> str:
"""Build an FTS5 ingredient_names column prefix-filter.
Returns e.g. 'ingredient_names : "potato"*' which matches any recipe whose
ingredient_names column contains a token starting with that word. Prefix
matching (*) means "potato" also matches "potatoes", "sweet potato", etc.
Apostrophes are stripped because the FTS5 tokenizer drops them.
"""
cleaned = ingredient.replace("'", "").strip()
escaped = cleaned.replace('"', '""')
return f'ingredient_names : "{escaped}"*'
def _count_recipes_for_keywords(self, keywords: list[str]) -> int:
if not keywords:
return 0
@ -1172,7 +1157,6 @@ class Store:
q: str | None = None,
sort: str = "default",
sensory_exclude: SensoryExclude | None = None,
required_ingredient: str | None = None,
) -> dict:
"""Return a page of recipes matching the keyword set.
@ -1181,11 +1165,9 @@ class Store:
is provided. match_pct is the fraction of ingredient_names covered by
the pantry set, computed deterministically with no LLM needed.
q: optional title substring filter (case-insensitive LIKE).
sort: "default" (corpus order) | "alpha" (A-Z) | "alpha_desc" (Z-A)
| "match" (pantry coverage DESC; falls back to default when no pantry).
required_ingredient: when set, only return recipes whose ingredient_names column
contains a token starting with this word (FTS5 prefix match). "Must include" filter.
q: optional title substring filter (case-insensitive LIKE).
sort: "default" (corpus order) | "alpha" (A-Z) | "alpha_desc" (Z-A)
| "match" (pantry coverage DESC; falls back to default when no pantry).
"""
if keywords is not None and not keywords:
return {"recipes": [], "total": 0, "page": page}
@ -1204,48 +1186,20 @@ class Store:
q_param = f"%{q.strip()}%" if q and q.strip() else None
# ── required-ingredient FTS filter (must-include) ─────────────────────
# FTS5 column prefix-filter avoids the full table scan that LIKE '%X%' would do.
req_fts_term = (
self._ingredient_fts_term(required_ingredient) if required_ingredient else ""
)
# ── match sort: push match_pct computation into SQL so ORDER BY works ──
if effective_sort == "match" and pantry_set:
return self._browse_by_match(
keywords, page, page_size, offset, pantry_set, q_param, c,
sensory_exclude=sensory_exclude,
required_ingredient=required_ingredient,
)
cols = (
f"SELECT id, title, category, keywords, ingredient_names,"
f" calories, fat_g, protein_g, sodium_mg, directions, sensory_tags FROM {c}recipes"
)
fts_sub = f"id IN (SELECT rowid FROM {c}recipe_browser_fts WHERE recipe_browser_fts MATCH ?)"
if keywords is None:
if req_fts_term:
# Ingredient filter: use FTS index — much faster than LIKE on full table
if q_param:
total = self.conn.execute(
f"SELECT COUNT(*) FROM {c}recipes WHERE {fts_sub} AND LOWER(title) LIKE LOWER(?)",
(req_fts_term, q_param),
).fetchone()[0]
rows = self._fetch_all(
f"{cols} WHERE {fts_sub} AND LOWER(title) LIKE LOWER(?) {order_clause} LIMIT ? OFFSET ?",
(req_fts_term, q_param, page_size, offset),
)
else:
total = self.conn.execute(
f"SELECT COUNT(*) FROM {c}recipes WHERE {fts_sub}",
(req_fts_term,),
).fetchone()[0]
rows = self._fetch_all(
f"{cols} WHERE {fts_sub} {order_clause} LIMIT ? OFFSET ?",
(req_fts_term, page_size, offset),
)
elif q_param:
if q_param:
total = self.conn.execute(
f"SELECT COUNT(*) FROM {c}recipes WHERE LOWER(title) LIKE LOWER(?)",
(q_param,),
@ -1261,32 +1215,23 @@ class Store:
(page_size, offset),
)
else:
keywords_expr = self._browser_fts_query(keywords)
# Combine keywords + ingredient into one FTS MATCH to use a single index pass
combined_match = (
f"({keywords_expr}) AND {req_fts_term}" if req_fts_term else keywords_expr
)
match_expr = self._browser_fts_query(keywords)
fts_sub = f"id IN (SELECT rowid FROM {c}recipe_browser_fts WHERE recipe_browser_fts MATCH ?)"
if q_param:
total = self.conn.execute(
f"SELECT COUNT(*) FROM {c}recipes WHERE {fts_sub} AND LOWER(title) LIKE LOWER(?)",
(combined_match, q_param),
(match_expr, q_param),
).fetchone()[0]
rows = self._fetch_all(
f"{cols} WHERE {fts_sub} AND LOWER(title) LIKE LOWER(?) {order_clause} LIMIT ? OFFSET ?",
(combined_match, q_param, page_size, offset),
(match_expr, q_param, page_size, offset),
)
else:
if required_ingredient:
total = self.conn.execute(
f"SELECT COUNT(*) FROM {c}recipes WHERE {fts_sub}",
(combined_match,),
).fetchone()[0]
else:
# Reuse cached count — avoids a second index scan on every page turn.
total = self._count_recipes_for_keywords(keywords)
# Reuse cached count — avoids a second index scan on every page turn.
total = self._count_recipes_for_keywords(keywords)
rows = self._fetch_all(
f"{cols} WHERE {fts_sub} {order_clause} LIMIT ? OFFSET ?",
(combined_match, page_size, offset),
(match_expr, page_size, offset),
)
# Community tag fallback: if FTS found nothing, check whether
# community-tagged recipe IDs exist for this keyword context.
@ -1368,7 +1313,6 @@ class Store:
q_param: str | None,
c: str,
sensory_exclude: SensoryExclude | None = None,
required_ingredient: str | None = None,
) -> dict:
"""Browse recipes sorted by pantry match percentage.
@ -1383,48 +1327,16 @@ class Store:
pantry_lower = {p.lower() for p in pantry_set}
# ── required-ingredient FTS filter (must-include) ─────────────────────
req_fts_term = (
self._ingredient_fts_term(required_ingredient) if required_ingredient else ""
)
# ── Fetch candidate pool from FTS ────────────────────────────────────
base_cols = (
f"SELECT r.id, r.title, r.category, r.ingredient_names, r.directions, r.sensory_tags"
f" FROM {c}recipes r"
)
fts_sub = (
f"r.id IN (SELECT rowid FROM {c}recipe_browser_fts"
f" WHERE recipe_browser_fts MATCH ?)"
)
self.conn.row_factory = sqlite3.Row
if keywords is None:
if req_fts_term:
if q_param:
total = self.conn.execute(
f"SELECT COUNT(*) FROM {c}recipes WHERE id IN"
f" (SELECT rowid FROM {c}recipe_browser_fts WHERE recipe_browser_fts MATCH ?)"
f" AND LOWER(title) LIKE LOWER(?)",
(req_fts_term, q_param),
).fetchone()[0]
rows = self.conn.execute(
f"{base_cols} WHERE {fts_sub} AND LOWER(r.title) LIKE LOWER(?)"
f" ORDER BY r.id ASC LIMIT ?",
(req_fts_term, q_param, self._MATCH_POOL_SIZE),
).fetchall()
else:
total = self.conn.execute(
f"SELECT COUNT(*) FROM {c}recipes WHERE id IN"
f" (SELECT rowid FROM {c}recipe_browser_fts WHERE recipe_browser_fts MATCH ?)",
(req_fts_term,),
).fetchone()[0]
rows = self.conn.execute(
f"{base_cols} WHERE {fts_sub} ORDER BY r.id ASC LIMIT ?",
(req_fts_term, self._MATCH_POOL_SIZE),
).fetchall()
elif q_param:
if q_param:
total = self.conn.execute(
f"SELECT COUNT(*) FROM {c}recipes WHERE LOWER(title) LIKE LOWER(?)",
(q_param,),
@ -1443,32 +1355,27 @@ class Store:
(self._MATCH_POOL_SIZE,),
).fetchall()
else:
keywords_expr = self._browser_fts_query(keywords)
combined_match = (
f"({keywords_expr}) AND {req_fts_term}" if req_fts_term else keywords_expr
match_expr = self._browser_fts_query(keywords)
fts_sub = (
f"r.id IN (SELECT rowid FROM {c}recipe_browser_fts"
f" WHERE recipe_browser_fts MATCH ?)"
)
if q_param:
total = self.conn.execute(
f"SELECT COUNT(*) FROM {c}recipes r"
f" WHERE {fts_sub} AND LOWER(r.title) LIKE LOWER(?)",
(combined_match, q_param),
(match_expr, q_param),
).fetchone()[0]
rows = self.conn.execute(
f"{base_cols} WHERE {fts_sub} AND LOWER(r.title) LIKE LOWER(?)"
f" ORDER BY r.id ASC LIMIT ?",
(combined_match, q_param, self._MATCH_POOL_SIZE),
(match_expr, q_param, self._MATCH_POOL_SIZE),
).fetchall()
else:
if required_ingredient:
total = self.conn.execute(
f"SELECT COUNT(*) FROM {c}recipes r WHERE {fts_sub}",
(combined_match,),
).fetchone()[0]
else:
total = self._count_recipes_for_keywords(keywords)
total = self._count_recipes_for_keywords(keywords)
rows = self.conn.execute(
f"{base_cols} WHERE {fts_sub} ORDER BY r.id ASC LIMIT ?",
(combined_match, self._MATCH_POOL_SIZE),
(match_expr, self._MATCH_POOL_SIZE),
).fetchall()
# ── Score in Python, sort, paginate ──────────────────────────────────
@ -1804,54 +1711,3 @@ class Store:
confidence, 1 if confirmed_by_user else 0, source,
),
)
# ── User Recipes (kiwi#9) ──────────────────────────────────────────────────
def create_user_recipe(
self,
title: str,
ingredients: list[dict],
steps: list[str],
subtitle: str | None = None,
servings: str | None = None,
cook_time: str | None = None,
source_note: str | None = None,
notes: str | None = None,
tags: list[str] | None = None,
source: str = "manual",
pantry_match_pct: int | None = None,
) -> dict[str, Any]:
return self._insert_returning(
"""INSERT INTO user_recipes
(title, subtitle, servings, cook_time, source_note,
ingredients, steps, notes, tags, source, pantry_match_pct)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
RETURNING *""",
(
title, subtitle, servings, cook_time, source_note,
self._dump(ingredients),
self._dump(steps),
notes,
self._dump(tags or []),
source,
pantry_match_pct,
),
)
def get_user_recipe(self, recipe_id: int) -> dict[str, Any] | None:
return self._fetch_one(
"SELECT * FROM user_recipes WHERE id = ?",
(recipe_id,),
)
def list_user_recipes(self) -> list[dict[str, Any]]:
return self._fetch_all(
"SELECT * FROM user_recipes ORDER BY created_at DESC",
)
def delete_user_recipe(self, recipe_id: int) -> bool:
cur = self.conn.execute(
"DELETE FROM user_recipes WHERE id = ?", (recipe_id,)
)
self.conn.commit()
return cur.rowcount > 0

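`_ingredient_fts_term` relies on two FTS5 features: a column filter (`column : phrase`) and a prefix phrase (`"word"*`). The in-memory demo below checks both against a two-column FTS5 table. `recipe_fts` and its rows are made up for illustration. The demo assumes the Python build's SQLite includes FTS5, as most do.

```python
import sqlite3


def ingredient_fts_term(ingredient: str) -> str:
    """Same construction as Store._ingredient_fts_term in the diff above."""
    cleaned = ingredient.replace("'", "").strip()
    escaped = cleaned.replace('"', '""')  # FTS5 escapes a quote by doubling it
    return f'ingredient_names : "{escaped}"*'


def match_ids(conn: sqlite3.Connection, expr: str) -> list[int]:
    """Return the rowids matching an FTS5 expression, in rowid order."""
    rows = conn.execute(
        "SELECT rowid FROM recipe_fts WHERE recipe_fts MATCH ? ORDER BY rowid",
        (expr,),
    )
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE recipe_fts USING fts5(title, ingredient_names)")
conn.executemany(
    "INSERT INTO recipe_fts (rowid, title, ingredient_names) VALUES (?, ?, ?)",
    [
        (1, "Mashed potatoes", "potatoes butter milk"),
        (2, "Roast sweet potato", "sweet potato olive oil"),
        (3, "Potato-free salad", "lettuce tomato"),  # "potato" only in the title
    ],
)
# The column filter ignores row 3, whose title (not ingredients) mentions potato
print(match_ids(conn, ingredient_fts_term("potato")))  # [1, 2]
# Keyword and ingredient terms combined into one MATCH, as browse_recipes does
print(match_ids(conn, '("roast") AND ' + ingredient_fts_term("potato")))  # [2]
```
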
@ -43,11 +43,6 @@ async def _browse_counts_refresh_loop(corpus_path: str) -> None:
async def lifespan(app: FastAPI):
logger.info("Starting Kiwi API...")
settings.ensure_dirs()
# Run DB migrations at startup (ensures all tables exist before any request)
from app.db.store import Store
_s = Store(settings.DB_PATH)
_s.close()
register_kiwi_programs()
# Start LLM background task scheduler
@ -59,14 +54,6 @@ async def lifespan(app: FastAPI):
from app.api.endpoints.community import init_community_store
init_community_store(settings.COMMUNITY_DB_URL)
# Initialize ActivityPub instance actor (no-op when AP_ENABLED=false)
if settings.AP_ENABLED and settings.AP_HOST:
try:
from app.services.ap.keys import init_actor
init_actor(host=settings.AP_HOST, key_path=settings.AP_KEY_PATH)
except Exception as _ap_exc:
logger.warning("AP init failed (AP features disabled): %s", _ap_exc)
# Browse counts cache — warm in-memory cache from disk, refresh if stale.
# Uses the corpus path the store will attach to at request time.
corpus_path = os.environ.get("RECIPE_DB_PATH", str(settings.DB_PATH))
@ -114,11 +101,6 @@ app.add_middleware(
app.include_router(api_router, prefix=settings.API_PREFIX)
# AP endpoints: WebFinger at root (not under /api/v1), AP objects under /ap
from app.api.endpoints.activitypub import ap_router, webfinger_router
app.include_router(webfinger_router)
app.include_router(ap_router)
@app.get("/")
async def root():

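Startup now constructs and immediately closes a `Store`, so that migrations such as 039 to 042 have run before the first request. How `Store` tracks applied migrations is not part of this diff. The sketch below shows one common approach, using SQLite's `PRAGMA user_version`. `apply_migrations` and the in-memory migration dict are hypothetical stand-ins.

```python
import sqlite3


def apply_migrations(conn: sqlite3.Connection, migrations: dict[int, str]) -> int:
    """Apply numbered SQL scripts newer than PRAGMA user_version, in order.

    Returns the schema version after the run. Each script goes through
    executescript(), so statements such as PRAGMA foreign_keys (ignored inside
    a transaction) take effect as written.
    """
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for number in sorted(n for n in migrations if n > current):
        conn.executescript(migrations[number])
        # PRAGMA does not accept bound parameters, so the int is formatted in
        conn.execute(f"PRAGMA user_version = {int(number)}")
        conn.commit()
        current = number
    return current


conn = sqlite3.connect(":memory:")
migs = {
    1: "CREATE TABLE a (x INTEGER);",
    2: "CREATE TABLE b (y INTEGER);",
}
print(apply_migrations(conn, migs))  # 2
print(apply_migrations(conn, migs))  # 2 (nothing newer to apply)
```

The script's statements are committed before `user_version` is bumped. A crash between the two would therefore re-run that script on the next start. Scripts written with `IF NOT EXISTS`, as 040 to 042 are, tolerate that.
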
@ -1,306 +0,0 @@
"""Kiwi MCP Server — read-only corpus DB access for tag/keyword audits.
Exposes four tools to Claude:
kiwi_query_corpus: run a read-only SQL query against the corpus DB
kiwi_count_fts: run an FTS5 MATCH expression and return row count
kiwi_sample_tags: return tag frequency distribution by prefix
kiwi_browse_preview: call the browse endpoint and return first-page results
Run with:
python -m app.mcp.server
(from /Library/Development/CircuitForge/kiwi with cf conda env active)
Configure in Claude Code ~/.claude/settings.json mcpServers:
"kiwi": {
"command": "/devl/miniconda3/envs/cf/bin/python",
"args": ["-m", "app.mcp.server"],
"cwd": "/Library/Development/CircuitForge/kiwi",
"env": {
"KIWI_DB_PATH": "/Library/Development/CircuitForge/kiwi/data/kiwi.db",
"KIWI_API_URL": "http://localhost:8512"
}
}
"""
from __future__ import annotations
import asyncio
import json
import os
import sqlite3
from pathlib import Path
import httpx
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import TextContent, Tool
_DB_PATH = os.environ.get(
"KIWI_DB_PATH",
str(Path(__file__).parents[3] / "data" / "kiwi.db"),
)
_API_URL = os.environ.get("KIWI_API_URL", "http://localhost:8512")
_TIMEOUT = 30.0
_QUERY_ROW_LIMIT = 200
server = Server("kiwi")
def _open_ro() -> sqlite3.Connection:
"""Open the corpus DB in read-only mode."""
# as_uri() needs an absolute path and percent-encodes spaces and reserved chars
uri = Path(_DB_PATH).resolve().as_uri() + "?mode=ro"
conn = sqlite3.connect(uri, uri=True, check_same_thread=False)
conn.row_factory = sqlite3.Row
return conn
@server.list_tools()
async def list_tools() -> list[Tool]:
return [
Tool(
name="kiwi_query_corpus",
description=(
"Run a read-only SQL SELECT query against the Kiwi corpus DB (kiwi.db). "
"Returns up to 200 rows as a JSON array. "
"Key tables: recipes (id, title, ingredient_names, inferred_tags, source_url), "
"recipes_fts (FTS5 virtual table for full-text search), "
"ingredient_profiles (name, elements, texture_profile). "
"Use for schema exploration, spot-checking tag coverage, and counting results. "
"Read-only — any write statement will be rejected by SQLite."
),
inputSchema={
"type": "object",
"required": ["sql"],
"properties": {
"sql": {
"type": "string",
"description": (
"A SELECT statement. E.g.: "
"SELECT title, inferred_tags FROM recipes WHERE inferred_tags LIKE '%vegan%' LIMIT 10"
),
},
},
},
),
Tool(
name="kiwi_count_fts",
description=(
"Run an FTS5 MATCH expression against the recipes_fts table and return the hit count. "
"Useful for quickly auditing keyword coverage without a full query. "
"Always double-quote all terms in MATCH expressions. "
"E.g. match_expr='\"tofu\" OR \"tempeh\"' returns how many recipes include either."
),
inputSchema={
"type": "object",
"required": ["match_expr"],
"properties": {
"match_expr": {
"type": "string",
"description": (
"FTS5 MATCH expression string (without the MATCH keyword). "
'E.g. \'"lentil" OR "chickpea"\' or \'"pasta" AND "vegetarian"\''
),
},
},
},
),
Tool(
name="kiwi_sample_tags",
description=(
"Return tag frequency distribution from the corpus. "
"Queries inferred_tags column for tags matching the given prefix pattern. "
"Useful for auditing how well a category keyword set covers the corpus, "
"or discovering what tags exist under a domain (cuisine:, meal:, dietary:, texture:)."
),
inputSchema={
"type": "object",
"properties": {
"prefix": {
"type": "string",
"default": "",
"description": (
"Tag prefix to filter by. E.g. 'cuisine:' returns all cuisine tags, "
"'meal:' returns all meal type tags, '' returns all tags. "
"Returns top 50 by frequency."
),
},
"limit": {
"type": "integer",
"default": 50,
"description": "Max number of tag entries to return (default 50, max 200).",
},
},
},
),
Tool(
name="kiwi_browse_preview",
description=(
"Call the Kiwi browse endpoint and return first-page results. "
"Use to verify that a domain/category returns the expected recipes "
"after a keyword or tag change, without opening the browser. "
"Returns recipe titles, match counts, and total result count."
),
inputSchema={
"type": "object",
"required": ["domain", "category"],
"properties": {
"domain": {
"type": "string",
"description": (
"Browse domain slug. "
"Known domains: cuisine, meal_type, dietary, ingredient, occasion, texture."
),
},
"category": {
"type": "string",
"description": "Category slug within the domain, e.g. 'italian', 'breakfast', 'vegan'.",
},
"subcategory": {
"type": "string",
"default": "",
"description": "Optional subcategory slug to narrow further.",
},
"page_size": {
"type": "integer",
"default": 10,
"description": "Results per page (default 10, max 50).",
},
},
},
),
]
@server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
if name == "kiwi_query_corpus":
return await _query_corpus(arguments)
if name == "kiwi_count_fts":
return await _count_fts(arguments)
if name == "kiwi_sample_tags":
return await _sample_tags(arguments)
if name == "kiwi_browse_preview":
return await _browse_preview(arguments)
return [TextContent(type="text", text=f"Unknown tool: {name}")]
async def _query_corpus(args: dict) -> list[TextContent]:
sql = args.get("sql", "").strip()
if not sql.upper().startswith("SELECT"):
return [TextContent(type="text", text="Error: only SELECT statements are allowed.")]
def _run() -> list[dict]:
conn = _open_ro()
try:
cur = conn.execute(sql)
rows = cur.fetchmany(_QUERY_ROW_LIMIT)
return [dict(r) for r in rows]
finally:
conn.close()
try:
rows = await asyncio.get_event_loop().run_in_executor(None, _run)
return [TextContent(type="text", text=json.dumps(rows, indent=2, default=str))]
except Exception as exc:
return [TextContent(type="text", text=f"Query error: {exc}")]
async def _count_fts(args: dict) -> list[TextContent]:
match_expr = args.get("match_expr", "").strip()
if not match_expr:
return [TextContent(type="text", text="Error: match_expr is required.")]
def _run() -> int:
conn = _open_ro()
try:
cur = conn.execute(
"SELECT COUNT(*) FROM recipes_fts WHERE recipes_fts MATCH ?",
(match_expr,),
)
return cur.fetchone()[0]
finally:
conn.close()
try:
count = await asyncio.get_event_loop().run_in_executor(None, _run)
return [TextContent(type="text", text=json.dumps({"match_expr": match_expr, "count": count}))]
except Exception as exc:
return [TextContent(type="text", text=f"FTS error: {exc}")]
async def _sample_tags(args: dict) -> list[TextContent]:
prefix = args.get("prefix", "")
limit = min(int(args.get("limit", 50)), _QUERY_ROW_LIMIT)
def _run() -> list[dict]:
conn = _open_ro()
try:
# Split comma-separated inferred_tags (rewritten as a JSON array literal for json_each) and count each tag
sql = """
WITH tag_rows AS (
SELECT trim(value) AS tag
FROM recipes, json_each('["' || replace(replace(inferred_tags, ', ', '","'), ',', '","') || '"]')
WHERE inferred_tags IS NOT NULL AND inferred_tags != ''
)
SELECT tag, COUNT(*) AS frequency
FROM tag_rows
WHERE tag LIKE ? AND tag != ''
GROUP BY tag
ORDER BY frequency DESC
LIMIT ?
"""
pattern = f"{prefix}%" if prefix else "%"
cur = conn.execute(sql, (pattern, limit))
return [{"tag": r["tag"], "frequency": r["frequency"]} for r in cur.fetchall()]
finally:
conn.close()
try:
tags = await asyncio.get_event_loop().run_in_executor(None, _run)
return [TextContent(type="text", text=json.dumps({"prefix": prefix, "tags": tags}, indent=2))]
except Exception as exc:
return [TextContent(type="text", text=f"Tag query error: {exc}")]
async def _browse_preview(args: dict) -> list[TextContent]:
domain = args.get("domain", "")
category = args.get("category", "")
subcategory = args.get("subcategory", "")
page_size = min(int(args.get("page_size", 10)), 50)
params: dict = {"page": 1, "page_size": page_size}
if subcategory:
params["subcategory"] = subcategory
async with httpx.AsyncClient(timeout=_TIMEOUT) as client:
try:
resp = await client.get(
f"{_API_URL}/api/v1/recipes/browse/{domain}/{category}",
params=params,
)
resp.raise_for_status()
except Exception as exc:
return [TextContent(type="text", text=f"Browse error: {exc}")]
data = resp.json()
summary = {
"domain": domain,
"category": category,
"subcategory": subcategory or None,
"total": data.get("total", 0),
"page_size": page_size,
"titles": [r.get("title", "") for r in data.get("recipes", [])],
}
return [TextContent(type="text", text=json.dumps(summary, indent=2))]
async def _main() -> None:
async with stdio_server() as (read_stream, write_stream):
await server.run(
read_stream,
write_stream,
server.create_initialization_options(),
)
if __name__ == "__main__":
asyncio.run(_main())
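The `_sample_tags` query above splits `inferred_tags` on commas by rewriting the column into a JSON array literal and expanding it with `json_each`. The following pure-Python sketch reproduces that counting logic, which can help when checking expected frequencies in a test. `count_tags` is a hypothetical helper, and the case-insensitive prefix check assumes SQLite's default ASCII case-insensitive `LIKE`.

```python
from collections import Counter


def count_tags(rows, prefix="", limit=50):
    """Count comma-separated tags across rows, most frequent first.

    Hypothetical helper mirroring the _sample_tags SQL: split on commas,
    trim whitespace, drop empty tags, and keep tags that start with
    `prefix` (case-insensitively, like SQLite's default ASCII LIKE).
    """
    counts = Counter()
    for raw in rows:
        if not raw:  # skip NULL / empty inferred_tags, as the WHERE clause does
            continue
        for tag in raw.split(","):
            tag = tag.strip()
            if tag and tag.lower().startswith(prefix.lower()):
                counts[tag] += 1
    return [{"tag": t, "frequency": n} for t, n in counts.most_common(limit)]
```

One practical difference: the SQL version builds its JSON literal by string replacement, so a tag containing a double quote makes the literal invalid and the query fails. Splitting in Python does not have that failure mode.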

@ -4,36 +4,6 @@ from __future__ import annotations
from pydantic import BaseModel, Field
class LeftoversResponse(BaseModel):
"""Cooked-leftover shelf-life estimate returned by POST /recipes/{id}/leftovers."""
fridge_days: int
freeze_days: int | None = None # None = not recommended
freeze_by_day: int | None = None # day number from cook date to freeze by
storage_advice: str
class StepAnalysis(BaseModel):
"""Active/passive classification for one direction step."""
is_passive: bool
detected_minutes: int | None = None
prep_min: int | None = None # estimated physical prep time (action detection)
class TimeEffortProfile(BaseModel):
"""Parsed time and effort profile for a recipe.
Mirrors app.services.recipe.time_effort.TimeEffortProfile (dataclass).
Serialised into RecipeSuggestion so the frontend can render the effort
summary without a second round-trip.
"""
active_min: int = 0
passive_min: int = 0
total_min: int = 0
effort_label: str = "moderate" # "quick" | "moderate" | "involved"
equipment: list[str] = Field(default_factory=list)
step_analyses: list[StepAnalysis] = Field(default_factory=list)
class SwapCandidate(BaseModel):
original_name: str
substitute_name: str
@ -73,7 +43,6 @@ class RecipeSuggestion(BaseModel):
source_url: str | None = None
complexity: str | None = None # 'easy' | 'moderate' | 'involved'
estimated_time_min: int | None = None # derived from step count + method signals
time_effort: TimeEffortProfile | None = None # full time/effort profile from parse_time_effort
rerank_score: float | None = None # cross-encoder relevance score (paid+ only, None for free tier)
@ -137,8 +106,7 @@ class RecipeRequest(BaseModel):
pantry_match_only: bool = False # when True, only return recipes with zero missing ingredients
complexity_filter: str | None = None # 'easy' | 'moderate' | 'involved' — None = any
max_time_min: int | None = None # filter by estimated cooking time ceiling
max_total_min: int | None = None # filter by parsed total time (active + passive)
max_active_min: int | None = None # filter by hands-on active time only
max_total_min: int | None = None # filter by parsed total time from recipe directions
unit_system: str = "metric" # "metric" | "imperial"
@ -206,24 +174,3 @@ class StreamTokenResponse(BaseModel):
stream_url: str
token: str
expires_in_s: int
class AskRequest(BaseModel):
"""Request body for POST /recipes/ask."""
question: str = Field(min_length=1, max_length=500)
pantry_items: list[str] = Field(default_factory=list)
class AskRecipeHit(BaseModel):
"""A single recipe result from the Ask endpoint."""
id: int
title: str
match_pct: float | None = None
category: str | None = None
class AskResponse(BaseModel):
"""Response from POST /recipes/ask."""
answer: str | None = None # LLM-synthesized response (Paid tier only)
recipes: list[AskRecipeHit]
tier: str

@ -1,74 +0,0 @@
"""Pydantic schemas for the recipe scanner (kiwi#9).
Scan input: photo(s).
Scan output: ScannedRecipeResponse (for review + editing before save).
Save input: ScannedRecipeSaveRequest.
User recipe output: UserRecipeResponse (after save).
"""
from __future__ import annotations
from pydantic import BaseModel, Field
# ── Ingredient in a scanned recipe ────────────────────────────────────────────
class ScannedIngredientSchema(BaseModel):
"""One ingredient line extracted from a recipe photo."""
name: str # normalized generic name ("ranch dressing")
qty: str | None = None # quantity as string, preserving fractions ("1/2", "¼")
unit: str | None = None # unit of measure; null for countable items
raw: str | None = None # verbatim original line from the image
in_pantry: bool = False # True if this ingredient matches something in the pantry
# ── Scan response (returned immediately, not persisted) ───────────────────────
class ScannedRecipeResponse(BaseModel):
"""Structured recipe extracted from photo(s). Returned for user review before save."""
title: str | None = None
subtitle: str | None = None # e.g. "with Broccoli & Ranch Dressing"
servings: str | None = None # kept as string: "2", "4-6", "serves 8"
cook_time: str | None = None # kept as string: "25 min", "1 hour"
source_note: str | None = None # e.g. "Purple Carrot", "Betty Crocker"
ingredients: list[ScannedIngredientSchema] = Field(default_factory=list)
steps: list[str] = Field(default_factory=list)
notes: str | None = None
tags: list[str] = Field(default_factory=list)
pantry_match_pct: int = 0 # 0-100: percentage of ingredients found in pantry
confidence: str = "medium" # "high" | "medium" | "low"
warnings: list[str] = Field(default_factory=list)
# ── Save request ──────────────────────────────────────────────────────────────
class ScannedRecipeSaveRequest(BaseModel):
"""User-reviewed (possibly edited) recipe data to persist as a user recipe."""
title: str
subtitle: str | None = None
servings: str | None = None
cook_time: str | None = None
source_note: str | None = None
ingredients: list[ScannedIngredientSchema]
steps: list[str]
notes: str | None = None
tags: list[str] = Field(default_factory=list)
source: str = "scan" # "scan" | "manual"
# ── User recipe (persisted) ───────────────────────────────────────────────────
class UserRecipeResponse(BaseModel):
"""A user-created or user-scanned recipe stored in user_recipes table."""
id: int
title: str
subtitle: str | None = None
servings: str | None = None
cook_time: str | None = None
source_note: str | None = None
ingredients: list[ScannedIngredientSchema]
steps: list[str]
notes: str | None = None
tags: list[str] = Field(default_factory=list)
source: str
pantry_match_pct: int | None = None
created_at: str
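`ScannedRecipeResponse.pantry_match_pct` is documented only as the percentage of ingredients found in the pantry. The diff does not show the actual computation. Assuming the value comes from the per-ingredient `in_pantry` flags and is rounded to an integer, a sketch could look like this. `Ingredient` and `pantry_match_pct` are hypothetical stand-ins, not the project's code.

```python
from dataclasses import dataclass


@dataclass
class Ingredient:
    """Stand-in for ScannedIngredientSchema with only the fields used here."""
    name: str
    in_pantry: bool = False


def pantry_match_pct(ingredients: list[Ingredient]) -> int:
    """Return 0-100: share of ingredients flagged in_pantry (assumed rule)."""
    if not ingredients:
        return 0  # same as the schema default when nothing was extracted
    matched = sum(1 for ing in ingredients if ing.in_pantry)
    return round(100 * matched / len(ingredients))
```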

@ -1,115 +0,0 @@
# app/services/ap/delivery.py
# MIT License
from __future__ import annotations
import logging
import time
from datetime import datetime, timezone
from pathlib import Path
from circuitforge_core.activitypub import deliver_activity
from app.services.ap.keys import get_actor
logger = logging.getLogger(__name__)
_RETRIES = 3
_BACKOFF = [1.0, 4.0, 16.0]
def deliver_to_followers(post_slug: str, activity: dict, db_path: Path) -> None:
"""Deliver an AP activity to all active followers. Called as a background task.
Retries each inbox up to 3 times with exponential backoff.
Logs each attempt to ap_deliveries in the local kiwi.db.
"""
actor = get_actor()
if actor is None:
return
import sqlite3
conn = sqlite3.connect(str(db_path))
conn.row_factory = sqlite3.Row
try:
followers = conn.execute(
"SELECT inbox_url, shared_inbox FROM ap_followers WHERE active = 1"
).fetchall()
finally:
conn.close()
# Deduplicate by shared_inbox where available
inboxes: set[str] = set()
for row in followers:
inbox = row["shared_inbox"] or row["inbox_url"]
inboxes.add(inbox)
for inbox_url in inboxes:
_deliver_with_retry(post_slug=post_slug, activity=activity, inbox_url=inbox_url, db_path=db_path)
def _deliver_with_retry(
post_slug: str,
activity: dict,
inbox_url: str,
db_path: Path,
) -> None:
actor = get_actor()
if actor is None:
return
import sqlite3
conn = sqlite3.connect(str(db_path))
try:
conn.execute(
"INSERT OR IGNORE INTO ap_deliveries (post_slug, target_inbox, status) VALUES (?,?,?)",
(post_slug, inbox_url, "pending"),
)
conn.commit()
finally:
conn.close()
last_error: str | None = None
for attempt, delay in enumerate(_BACKOFF[:_RETRIES]):
try:
resp = deliver_activity(activity=activity, inbox_url=inbox_url, actor=actor, timeout=10.0)
if resp.status_code < 300:
_update_delivery(db_path, post_slug, inbox_url, "delivered", None)
return
last_error = f"HTTP {resp.status_code}"
except Exception as exc:
last_error = str(exc)[:200]
if attempt < _RETRIES - 1:
time.sleep(delay)
_update_delivery(db_path, post_slug, inbox_url, "failed", last_error)
logger.warning("AP delivery failed after %d attempts to %s: %s", _RETRIES, inbox_url, last_error)
def _update_delivery(
db_path: Path,
post_slug: str,
inbox_url: str,
status: str,
error: str | None,
) -> None:
import sqlite3
now = datetime.now(timezone.utc).isoformat()
conn = sqlite3.connect(str(db_path))
try:
if status == "delivered":
conn.execute(
"""UPDATE ap_deliveries SET status=?, attempts=attempts+1, delivered_at=?
WHERE post_slug=? AND target_inbox=?""",
(status, now, post_slug, inbox_url),
)
else:
conn.execute(
"""UPDATE ap_deliveries SET status=?, attempts=attempts+1, last_error=?
WHERE post_slug=? AND target_inbox=?""",
(status, error, post_slug, inbox_url),
)
conn.commit()
finally:
conn.close()
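The delivery loop above makes at most `_RETRIES` attempts per inbox and sleeps only between attempts. With `_BACKOFF = [1.0, 4.0, 16.0]`, the 16-second delay is therefore never used. The following stdlib-only sketch has the same dedupe-then-retry shape. The sender and `sleep` are injected so the sketch runs without a network. `unique_inboxes` and `deliver_with_retry` are hypothetical names, not this module's API.

```python
from __future__ import annotations

import time
from typing import Callable, Iterable


def unique_inboxes(followers: Iterable[dict]) -> set[str]:
    """Prefer a follower's shared inbox so each server receives one delivery."""
    return {f.get("shared_inbox") or f["inbox_url"] for f in followers}


def deliver_with_retry(
    send: Callable[[], int],
    retries: int = 3,
    backoff: tuple[float, ...] = (1.0, 4.0, 16.0),
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[bool, str | None]:
    """Call `send` (returns an HTTP status) until it succeeds or retries run out."""
    last_error: str | None = None
    for attempt in range(retries):
        try:
            status = send()
            if status < 300:
                return True, None
            last_error = f"HTTP {status}"
        except Exception as exc:  # network errors count as a failed attempt
            last_error = str(exc)[:200]
        if attempt < retries - 1:
            sleep(backoff[attempt])  # sleep only *between* attempts
    return False, last_error
```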

@ -1,48 +0,0 @@
# app/services/ap/keys.py
# MIT License
from __future__ import annotations
import logging
from pathlib import Path
from circuitforge_core.activitypub import CFActor, generate_rsa_keypair, load_actor_from_key_file
logger = logging.getLogger(__name__)
_actor: CFActor | None = None
def get_actor() -> CFActor | None:
"""Return the loaded instance actor, or None if AP is not enabled."""
return _actor
def init_actor(host: str, key_path: Path) -> CFActor:
"""Load or generate the instance RSA keypair and build the CFActor singleton.
Called once at startup when AP_ENABLED=true. Generates a new 2048-bit keypair
if the key file does not yet exist (first boot).
"""
global _actor
key_path.parent.mkdir(parents=True, exist_ok=True)
if not key_path.exists():
logger.info("AP: no key file found at %s — generating new RSA-2048 keypair", key_path)
private_pem, _pub = generate_rsa_keypair(bits=2048)
key_path.write_text(private_pem, encoding="utf-8")
key_path.chmod(0o600)
base = f"https://{host}"
actor_id = f"{base}/ap/actor"
_actor = load_actor_from_key_file(
actor_id=actor_id,
username="kiwi",
display_name="Kiwi Pantry",
private_key_path=str(key_path),
summary="Community pantry and recipe feed from a Kiwi instance.",
)
logger.info("AP: instance actor loaded — %s", actor_id)
return _actor
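`init_actor` writes the private key with `write_text` and only afterwards tightens it with `chmod(0o600)`. For a brief moment, the file exists with the process's default permissions. One way to close that window, sketched with the standard library only, is to create the file through `os.open(..., O_CREAT | O_EXCL, 0o600)`. `load_or_create_pem` and its `generate` callback are hypothetical stand-ins for the `generate_rsa_keypair` call.

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Callable


def load_or_create_pem(key_path: Path, generate: Callable[[], str]) -> str:
    """Return the PEM stored at key_path, creating it on first boot."""
    key_path.parent.mkdir(parents=True, exist_ok=True)
    if key_path.exists():
        return key_path.read_text(encoding="utf-8")
    pem = generate()
    # O_EXCL fails if another process created the file in the meantime.
    # The file is born with mode 0o600 (further narrowed by the umask),
    # so there is no moment where the key is group- or world-readable.
    fd = os.open(key_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(pem)
    return pem
```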

@ -1,194 +0,0 @@
# app/services/ap/mastodon.py
# MIT License
from __future__ import annotations
import logging
from pathlib import Path
import httpx
logger = logging.getLogger(__name__)
_APP_SCOPES = "write:statuses"
_APP_NAME = "Kiwi Pantry"
_APP_WEBSITE = "https://circuitforge.tech/kiwi"
def register_app(instance_url: str, redirect_uri: str) -> dict:
"""Dynamically register Kiwi as an OAuth app on the user's Mastodon instance.
Returns the app credentials dict (client_id, client_secret, etc.).
Raises httpx.HTTPError on failure.
"""
url = instance_url.rstrip("/") + "/api/v1/apps"
resp = httpx.post(
url,
data={
"client_name": _APP_NAME,
"redirect_uris": redirect_uri,
"scopes": _APP_SCOPES,
"website": _APP_WEBSITE,
},
timeout=10.0,
)
resp.raise_for_status()
return resp.json()
def build_authorize_url(instance_url: str, client_id: str, redirect_uri: str) -> str:
"""Return the OAuth authorize URL to redirect the user to."""
return (
f"{instance_url.rstrip('/')}/oauth/authorize"
f"?response_type=code"
f"&client_id={client_id}"
f"&redirect_uri={redirect_uri}"
f"&scope={_APP_SCOPES}"
)
def exchange_code(
instance_url: str,
client_id: str,
client_secret: str,
code: str,
redirect_uri: str,
) -> str:
"""Exchange an authorization code for an access token. Returns the token string."""
url = instance_url.rstrip("/") + "/oauth/token"
resp = httpx.post(
url,
data={
"grant_type": "authorization_code",
"client_id": client_id,
"client_secret": client_secret,
"redirect_uri": redirect_uri,
"code": code,
"scope": _APP_SCOPES,
},
timeout=10.0,
)
resp.raise_for_status()
return resp.json()["access_token"]
def post_status(instance_url: str, access_token: str, content: str) -> dict:
"""Post a status to the user's Mastodon account. Returns the status response dict."""
url = instance_url.rstrip("/") + "/api/v1/statuses"
resp = httpx.post(
url,
headers={"Authorization": f"Bearer {access_token}"},
json={"status": content, "visibility": "public"},
timeout=15.0,
)
resp.raise_for_status()
return resp.json()
def build_post_content(post: dict) -> str:
"""Format a community post dict as Mastodon-ready plain text."""
title = post.get("title") or "Untitled"
recipe = post.get("recipe_name")
notes = post.get("outcome_notes") or post.get("description")
tags_raw: list[str] = post.get("dietary_tags") or []
lines = []
if recipe and recipe != title:
lines.append(f"🍽 {title} - {recipe}")
else:
lines.append(f"🍽 {title}")
if notes:
snippet = notes[:200].strip()
if len(notes) > 200:
snippet += "…"
lines.append(f"\n{snippet}")
hashtags = ["#Kiwi", "#Cooking"]
for tag in tags_raw[:3]:
ht = "#" + "".join(w.capitalize() for w in tag.replace("-", " ").split())
hashtags.append(ht)
lines.append("\n" + " ".join(hashtags))
return "\n".join(lines)
def store_token(
db_path: Path,
directus_user_id: str,
instance_url: str,
access_token: str,
encryption_key: str | None,
) -> None:
"""Persist a Mastodon access token in the user's local kiwi.db."""
token_to_store = _encrypt(access_token, encryption_key)
import sqlite3
conn = sqlite3.connect(str(db_path))
try:
conn.execute(
"""INSERT INTO mastodon_tokens (directus_user_id, instance_url, access_token)
VALUES (?, ?, ?)
ON CONFLICT(directus_user_id) DO UPDATE SET
instance_url=excluded.instance_url,
access_token=excluded.access_token,
updated_at=datetime('now')""",
(directus_user_id, instance_url.rstrip("/"), token_to_store),
)
conn.commit()
finally:
conn.close()
def get_token(
db_path: Path,
directus_user_id: str,
encryption_key: str | None,
) -> tuple[str, str] | None:
"""Return (instance_url, plaintext_access_token) or None if not connected."""
import sqlite3
conn = sqlite3.connect(str(db_path))
try:
row = conn.execute(
"SELECT instance_url, access_token FROM mastodon_tokens WHERE directus_user_id = ?",
(directus_user_id,),
).fetchone()
finally:
conn.close()
if row is None:
return None
return row[0], _decrypt(row[1], encryption_key)
def delete_token(db_path: Path, directus_user_id: str) -> None:
"""Remove the user's stored Mastodon token."""
import sqlite3
conn = sqlite3.connect(str(db_path))
try:
conn.execute(
"DELETE FROM mastodon_tokens WHERE directus_user_id = ?", (directus_user_id,)
)
conn.commit()
finally:
conn.close()
def _encrypt(plaintext: str, key: str | None) -> str:
if key is None:
return plaintext
try:
from cryptography.fernet import Fernet
return Fernet(key.encode()).encrypt(plaintext.encode()).decode()
except Exception:
logger.warning("Mastodon token encryption failed — storing plaintext")
return plaintext
def _decrypt(ciphertext: str, key: str | None) -> str:
if key is None:
return ciphertext
try:
from cryptography.fernet import Fernet
return Fernet(key.encode()).decrypt(ciphertext.encode()).decode()
except Exception:
logger.warning("Mastodon token decryption failed — returning as-is")
return ciphertext
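`build_authorize_url` concatenates `redirect_uri` and the scope into the query string without percent-encoding them. A redirect URI containing `&` or `#` would be mis-parsed as extra parameters or as a fragment. The following sketch builds the same URL with `urllib.parse.urlencode`. It is illustrative only and has not been tested against a live Mastodon instance.

```python
from urllib.parse import urlencode


def build_authorize_url(
    instance_url: str,
    client_id: str,
    redirect_uri: str,
    scopes: str = "write:statuses",
) -> str:
    """Build the OAuth authorize URL with every query value percent-encoded."""
    query = urlencode(
        {
            "response_type": "code",
            "client_id": client_id,
            "redirect_uri": redirect_uri,
            "scope": scopes,
        }
    )
    return f"{instance_url.rstrip('/')}/oauth/authorize?{query}"
```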

View file

@ -1,111 +0,0 @@
# app/services/community/dedup.py
# MIT License
from __future__ import annotations
import json
import logging
from pathlib import Path
logger = logging.getLogger(__name__)
_SIMILARITY_TIERS = {
"exact_recipe": "This exact recipe is already in the community feed.",
"very_similar": "Very similar recipes already exist (70%+ ingredient overlap).",
"somewhat_similar": "Somewhat similar recipes exist (35-70% ingredient overlap).",
"different": "No close matches found.",
}
def _parse_ingredient_names(raw) -> set[str]:
"""Return a normalised set of ingredient name tokens from various stored formats."""
if raw is None:
return set()
if isinstance(raw, str):
try:
raw = json.loads(raw)
except (ValueError, TypeError):
return set()
names: set[str] = set()
for item in raw:
if isinstance(item, str):
names.add(item.lower().strip())
elif isinstance(item, dict):
name = item.get("name") or item.get("ingredient") or ""
if name:
names.add(name.lower().strip())
return names
def jaccard(a: set[str], b: set[str]) -> float:
if not a and not b:
return 1.0
if not a or not b:
return 0.0
return len(a & b) / len(a | b)
def similarity_tier(jaccard_score: float, exact_recipe: bool) -> str:
if exact_recipe:
return "exact_recipe"
if jaccard_score >= 0.70:
return "very_similar"
if jaccard_score >= 0.35:
return "somewhat_similar"
return "different"
def fetch_recipe_ingredients(db_path: Path, recipe_id: int | None) -> set[str]:
"""Look up ingredient names for a recipe from the local corpus. Returns empty set on miss."""
if recipe_id is None:
return set()
try:
from app.db.store import Store
store = Store(db_path)
try:
row = store.get_recipe(recipe_id)
if row is None:
return set()
return _parse_ingredient_names(row.get("ingredient_names"))
finally:
store.close()
except Exception:
logger.debug("ingredient lookup failed for recipe_id=%s", recipe_id)
return set()
def build_similar_post_result(
post,
incoming_recipe_id: int | None,
incoming_ingredients: set[str],
db_path: Path,
) -> dict:
"""Build a similarity result dict for one existing community post."""
exact = (
incoming_recipe_id is not None
and post.recipe_id is not None
and post.recipe_id == incoming_recipe_id
)
j_score = 0.0
if not exact and incoming_ingredients:
existing_ingredients = fetch_recipe_ingredients(db_path, post.recipe_id)
if existing_ingredients:
j_score = jaccard(incoming_ingredients, existing_ingredients)
tier = similarity_tier(j_score, exact)
return {
"slug": post.slug,
"title": post.title,
"recipe_name": post.recipe_name,
"pseudonym": post.pseudonym,
"published": (
post.published.isoformat()
if hasattr(post.published, "isoformat")
else str(post.published)
),
"similarity_tier": tier,
"jaccard_score": round(j_score, 3) if not exact else None,
"tier_description": _SIMILARITY_TIERS.get(tier, ""),
}
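The dedup tiers are based on Jaccard similarity over normalised ingredient names: the size of the intersection divided by the size of the union. For example, {flour, sugar, egg, butter} and {flour, sugar, egg, milk} share 3 of 5 distinct names. That gives a score of 0.6, which falls in the `somewhat_similar` band (0.35 up to 0.70). The following standalone sketch mirrors `jaccard` and `similarity_tier` and is useful for experimenting with the thresholds.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """|A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_tier(score: float, exact_recipe: bool = False) -> str:
    """Map a Jaccard score onto the same bands as the dedup service."""
    if exact_recipe:
        return "exact_recipe"
    if score >= 0.70:
        return "very_similar"
    if score >= 0.35:
        return "somewhat_similar"
    return "different"


pancakes = {"flour", "sugar", "egg", "butter"}
crepes = {"flour", "sugar", "egg", "milk"}
score = jaccard(pancakes, crepes)
print(score, similarity_tier(score))  # 0.6 somewhat_similar
```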

@ -1,233 +0,0 @@
# app/services/leftovers_predictor.py
"""Cooked-leftovers shelf-life predictor.
Fast path: deterministic lookup anchored to FDA/USDA safe food handling.
Fallback: LLM for unclassifiable edge cases (same gate as expiry_llm_matching).
Design notes:
- shortest-component-wins for proteins: a fish taco is bounded by the fish.
- category/keyword signals override ingredient signals for assembled dishes
(soup, stew, casserole) where the cooking method matters more than the
dominant protein.
- no urgency/panic framing; see feedback_kiwi_no_panic.md.
"""
from __future__ import annotations
import logging
import re
from dataclasses import dataclass, field
from typing import Any
logger = logging.getLogger(__name__)
@dataclass
class LeftoversResult:
fridge_days: int
freeze_days: int | None # None = "not recommended"
freeze_by_day: int | None # day number from cook date to freeze by; None = no need
storage_advice: str
# ---------------------------------------------------------------------------
# Protein priority table — shorter shelf life wins when multiple match.
# Values: (fridge_days, freeze_days). All fridge values are conservative.
# Sources: USDA FoodKeeper, FDA Safe Food Handling.
# ---------------------------------------------------------------------------
_PROTEIN_SIGNALS: list[tuple[list[str], int, int | None]] = [
# (keyword_list, fridge_days, freeze_days)
(["fish", "salmon", "tuna", "cod", "tilapia", "halibut", "trout", "bass",
"mahi", "snapper", "flounder", "catfish", "swordfish", "sardine", "anchovy"],
2, 90),
(["shrimp", "prawn", "scallop", "crab", "lobster", "clam", "mussel",
"oyster", "squid", "octopus", "seafood"],
2, 90),
(["ground beef", "ground turkey", "ground pork", "ground chicken",
"ground meat", "hamburger", "mince"],
3, 90),
(["chicken", "turkey", "poultry", "duck", "hen"],
3, 90),
(["pork", "ham", "bacon", "sausage", "chorizo", "bratwurst", "kielbasa",
"salami", "pepperoni"],
4, 120),
(["beef", "steak", "brisket", "roast", "lamb", "veal", "venison"],
4, 180),
(["egg", "eggs", "frittata", "quiche", "omelette"],
3, None),
(["tofu", "tempeh", "seitan"],
4, 90),
]
# ---------------------------------------------------------------------------
# Dish-type signals — override protein signal when a structural match fires.
# Ordered from most-perishable to least.
# ---------------------------------------------------------------------------
_DISH_SIGNALS: list[tuple[list[str], int, int | None, str]] = [
# (keywords, fridge_days, freeze_days, storage_advice_fragment)
# Ceviche: acid denatures proteins but does not kill pathogens.
# FDA/USDA classify it as raw seafood — 2-day fridge max, do not freeze.
(["ceviche", "tiradito", "leche de tigre"],
2, None,
"Acid marination is not the same as heat cooking — treat as raw seafood. "
"Best eaten the day it's made; 2 days maximum in the fridge."),
# Fermented / salt-cured dishes — preservation extends shelf life significantly.
# This matches dish names, not just presence of the ingredient (lardo in a pasta
# follows normal pasta rules, not this entry).
(["kimchi", "sauerkraut", "preserved lemon"],
14, None,
"Fermented and salt-preserved dishes keep well. Store submerged in their brine."),
(["confit", "gravlax", "gravad lax", "lardo"],
7, 60,
"Store covered in its fat or cure. Keep cold and away from strong-smelling foods."),
(["soup", "stew", "broth", "chowder", "bisque", "gumbo", "chili"],
4, 120,
"Soups and stews keep well in the fridge. Cool to room temperature before covering."),
(["curry"],
4, 90,
"Store curry in an airtight container. The flavours deepen overnight."),
(["casserole", "bake", "gratin", "lasagna", "lasagne", "moussaka",
"shepherd's pie", "pot pie"],
5, 90,
"Cover tightly. Reheat individual portions rather than the whole dish."),
(["pasta", "noodle", "spaghetti", "penne", "linguine", "fettuccine",
"macaroni", "risotto"],
4, 60,
"Store pasta and sauce separately if possible to prevent sogginess."),
(["rice", "fried rice", "pilaf", "biryani"],
3, 90,
"Cool rice quickly — spread on a tray if needed. Don't leave at room temperature for more than 1 hour."),
(["salad"],
2, None,
"Keep dressing separate. Once dressed, best eaten the same day."),
(["stir fry", "stir-fry"],
3, 60,
"Reheat in a hot pan or wok rather than a microwave to keep texture."),
(["sandwich", "wrap", "taco", "burrito"],
2, None,
"Assemble fresh when possible. Fillings keep better stored separately."),
(["pizza"],
4, 60,
"Reheat in a dry skillet for a crisp base rather than a microwave."),
(["muffin", "bread", "biscuit", "scone", "roll"],
3, 90,
"Wrap tightly or seal in a bag to prevent drying out."),
(["cake", "pie", "cookie", "brownie", "dessert", "pudding"],
5, 90,
"Store covered at room temperature or in the fridge depending on fillings."),
(["smoothie", "juice", "shake"],
1, 7,
"Best consumed fresh. Stir or shake well before drinking."),
]
# Default when no signals match.
_DEFAULT_FRIDGE = 4
_DEFAULT_FREEZE = 90
_DEFAULT_ADVICE = "Store in an airtight container in the fridge. Reheat until piping hot before eating."
def _contains_any(text: str, keywords: list[str]) -> bool:
for kw in keywords:
if re.search(rf"\b{re.escape(kw)}\b", text, re.IGNORECASE):
return True
return False
def _scan_ingredients(ingredients: list[str]) -> tuple[int, int | None] | None:
"""Return (fridge_days, freeze_days) for the most-perishable protein found."""
joined = " ".join(str(i) for i in ingredients).lower()
best: tuple[int, int | None] | None = None
for keywords, fridge, freeze in _PROTEIN_SIGNALS:
if _contains_any(joined, keywords):
if best is None or fridge < best[0]:
best = (fridge, freeze)
return best
def _scan_dish_type(text: str) -> tuple[int, int | None, str] | None:
"""Return (fridge_days, freeze_days, advice) for the first matching dish type."""
for keywords, fridge, freeze, advice in _DISH_SIGNALS:
if _contains_any(text, keywords):
return fridge, freeze, advice
return None
def predict_leftovers(
title: str,
ingredients: list[str],
category: str | None = None,
keywords: list[str] | None = None,
) -> LeftoversResult:
"""Predict cooked-leftover shelf life deterministically.
Falls back gracefully: always returns a result, even for unknown recipes.
"""
# Build a combined text blob for dish-type scanning.
search_text = " ".join(filter(None, [
title,
category or "",
" ".join(keywords or []),
]))
# Dish-type match takes structural priority over raw ingredient protein signal.
dish = _scan_dish_type(search_text)
protein = _scan_ingredients(ingredients)
if dish:
fridge_days, freeze_days, base_advice = dish
# Still apply shortest-protein-wins if protein is more perishable than dish default.
if protein and protein[0] < fridge_days:
fridge_days = protein[0]
if protein[1] is not None and (freeze_days is None or protein[1] < freeze_days):
freeze_days = protein[1]
advice = base_advice
elif protein:
fridge_days, freeze_days = protein
advice = _DEFAULT_ADVICE
else:
fridge_days = _DEFAULT_FRIDGE
freeze_days = _DEFAULT_FREEZE
advice = _DEFAULT_ADVICE
# freeze_by_day: recommend freezing on day 2 if fridge window is tight (≤3 days).
freeze_by_day: int | None = None
if freeze_days is not None and fridge_days <= 3:
freeze_by_day = 2
return LeftoversResult(
fridge_days=fridge_days,
freeze_days=freeze_days,
freeze_by_day=freeze_by_day,
storage_advice=advice,
)
def predict_leftovers_from_row(recipe: dict[str, Any]) -> LeftoversResult:
"""Convenience wrapper that accepts a Store row dict directly."""
import json as _json
title = recipe.get("title") or ""
raw_ingredients = recipe.get("ingredient_names") or []
if isinstance(raw_ingredients, str):
try:
raw_ingredients = _json.loads(raw_ingredients)
except Exception:
raw_ingredients = [raw_ingredients]
raw_keywords = recipe.get("keywords") or []
if isinstance(raw_keywords, str):
try:
raw_keywords = _json.loads(raw_keywords)
except Exception:
raw_keywords = [raw_keywords]
return predict_leftovers(
title=title,
ingredients=[str(i) for i in raw_ingredients],
category=recipe.get("category"),
keywords=[str(k) for k in raw_keywords],
)
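The ingredient scan relies on two details. First, keywords are matched on word boundaries (`\b{keyword}\b`), so "eggplant" does not trigger the egg rule. Second, when several proteins match, the one with the shortest fridge life wins. The following trimmed-down sketch uses a hypothetical four-row table whose day counts are copied from `_PROTEIN_SIGNALS`.

```python
import re

# Hypothetical subset of _PROTEIN_SIGNALS: (keywords, fridge_days, freeze_days)
SIGNALS = [
    (["fish", "salmon"], 2, 90),
    (["chicken", "turkey"], 3, 90),
    (["beef", "steak"], 4, 180),
    (["egg", "eggs"], 3, None),  # None = freezing not recommended
]


def contains_any(text, keywords):
    """Whole-word, case-insensitive match for any keyword."""
    return any(re.search(rf"\b{re.escape(kw)}\b", text, re.IGNORECASE) for kw in keywords)


def most_perishable(ingredients):
    """Return (fridge_days, freeze_days) for the shortest-lived protein, or None."""
    joined = " ".join(ingredients).lower()
    best = None
    for keywords, fridge, freeze in SIGNALS:
        if contains_any(joined, keywords) and (best is None or fridge < best[0]):
            best = (fridge, freeze)
    return best
```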

@ -1,97 +0,0 @@
"""Magpie data-flywheel hook.
Fires anonymized recipe-signal events to the Magpie ingest endpoint when a
user saves or rates a recipe. This is the Kiwi side of the flywheel; Magpie
does not have a receiver endpoint yet, so the hook stubs out gracefully: if
``MAGPIE_INGEST_URL`` is unset, or the request fails for any reason, it logs
at DEBUG level and returns without raising.
"""
from __future__ import annotations
import logging
from pathlib import Path
logger = logging.getLogger(__name__)
_INGEST_PATH = "/api/v1/ingest/recipe-signal"
async def fire_recipe_signal(
db_path: Path,
recipe_id: int,
rating: int | None,
style_tags: list[str],
) -> None:
"""Post an anonymized recipe signal to Magpie if the user has opted in.
Args:
db_path: Path to the user's SQLite database.
recipe_id: Internal Kiwi recipe ID being rated/saved.
rating: Star rating (0-5) or None if not yet rated.
style_tags: Style tags applied to the saved recipe.
"""
from app.core.config import settings
if not settings.MAGPIE_INGEST_URL:
return
# Check per-user opt-in via a short-lived Store (own connection, own thread
# context is fine — this runs in the async event loop as a background task
# so we open and close the connection immediately).
from app.db.store import Store
try:
store = Store(db_path)
try:
opt_in = store.get_setting("magpie_opt_in")
finally:
store.close()
except Exception as exc: # noqa: BLE001
logger.debug("magpie_hook: could not read magpie_opt_in setting: %s", exc)
return
if opt_in != "true":
return
# Fetch the recipe to get its external_id (source URL slug / corpus key).
try:
store = Store(db_path)
try:
recipe = store.get_recipe(recipe_id)
finally:
store.close()
except Exception as exc: # noqa: BLE001
logger.debug("magpie_hook: could not fetch recipe %d: %s", recipe_id, exc)
return
if recipe is None:
logger.debug("magpie_hook: recipe %d not found, skipping", recipe_id)
return
external_id: str | None = recipe.get("external_id") if isinstance(recipe, dict) else getattr(recipe, "external_id", None)
if not external_id:
# Corpus recipe not yet enriched with a source identifier — skip quietly.
logger.debug("magpie_hook: recipe %d has no external_id, skipping", recipe_id)
return
payload = {
"product": "kiwi",
"signal": "recipe_rating",
"external_id": external_id,
"rating": rating,
"style_tags": style_tags,
}
url = settings.MAGPIE_INGEST_URL.rstrip("/") + _INGEST_PATH
try:
import httpx
async with httpx.AsyncClient(timeout=3.0) as client:
response = await client.post(url, json=payload)
logger.debug(
"magpie_hook: POST %s -> %d", url, response.status_code
)
except Exception as exc: # noqa: BLE001
# Magpie may not have a receiver yet — log and swallow.
logger.debug("magpie_hook: ingest request failed (stub): %s", exc)
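`fire_recipe_signal` is deliberately best-effort:

- It returns early unless both the ingest URL and the per-user opt-in are set.
- It uses a 3-second client timeout.
- It swallows every exception at DEBUG level.

The following stdlib-only sketch expresses the same contract. An injected async `send` coroutine stands in for the httpx call. `send_signal` is a hypothetical name.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("magpie_sketch")


async def send_signal(
    send: Callable[[dict], Awaitable[int]],
    payload: dict,
    opted_in: bool,
    timeout: float = 3.0,
) -> bool:
    """Best-effort delivery: never raises; True only on a 2xx status."""
    if not opted_in:
        return False  # respect the per-user opt-in before doing any I/O
    try:
        status = await asyncio.wait_for(send(payload), timeout)
    except Exception as exc:  # includes asyncio.TimeoutError and network errors
        logger.debug("signal dropped: %s", exc)
        return False
    return 200 <= status < 300
```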

@ -2,20 +2,17 @@
# BSL 1.1 — LLM feature
"""Provide a router-compatible LLM client for meal plan generation tasks.
Cloud (CF_ORCH_URL set), tier 1 task-based routing (preferred):
Calls /api/inference/task with product=kiwi, task=meal_plan.
The coordinator resolves the model from assignments.yaml.
Cloud (CF_ORCH_URL set), tier 2 direct allocation (fallback):
Allocates cf-text directly via client.allocate(). Used when the task
is not yet registered in the coordinator (cf-orch#61 not deployed).
Cloud (CF_ORCH_URL set):
Allocates a cf-text service via cf-orch (3B-7B GGUF, ~2GB VRAM).
Returns an _OrchTextRouter that wraps the cf-text HTTP endpoint
with a .complete(system, user, **kwargs) interface.
Local / self-hosted (no CF_ORCH_URL):
Returns an LLMRouter instance which tries ollama, vllm, or any
backend configured in ~/.config/circuitforge/llm.yaml.
All paths expose the same (router, ctx) interface so llm_planner.py
needs no knowledge of the backend.
Both paths expose the same interface so llm_timing.py and llm_planner.py
need no knowledge of the backend.
"""
from __future__ import annotations
@ -25,7 +22,8 @@ from contextlib import nullcontext
logger = logging.getLogger(__name__)
# cf-orch service name and TTL for direct-allocate fallback path.
# cf-orch service name and VRAM budget for meal plan LLM tasks.
# These are lighter than recipe_llm (4.0 GB) — cf-text handles them.
_SERVICE_TYPE = "cf-text"
_TTL_S = 120.0
_CALLER = "kiwi-meal-plan"
@ -64,79 +62,35 @@ class _OrchTextRouter:
return resp.choices[0].message.content or ""
# Imported at module level so tests can patch the names in this module's namespace.
# app.services.task_inference.task_allocate — patch target for task routing tests.
try:
from app.services.task_inference import TaskNotRegistered, task_allocate
_HAS_TASK_INFERENCE = True
except ImportError:
_HAS_TASK_INFERENCE = False
# circuitforge_orch.client.CFOrchClient — patch target for direct-allocate fallback tests.
try:
from circuitforge_orch.client import CFOrchClient
except ImportError:
CFOrchClient = None # type: ignore[assignment,misc]
# circuitforge_core.llm.router.LLMRouter — patch target for local-inference tests.
try:
from circuitforge_core.llm.router import LLMRouter
except (ImportError, FileNotFoundError):
LLMRouter = None # type: ignore[assignment,misc]
def get_meal_plan_router():
"""Return an LLM client for meal plan tasks.
Returns (router, ctx) where ctx is a context manager the caller holds
open for the duration of the LLM call. Returns (None, nullcontext(None))
if no backend is available.
Tries cf-orch cf-text allocation first (cloud); falls back to LLMRouter
(local ollama/vllm). Returns None if no backend is available.
"""
cf_orch_url = os.environ.get("CF_ORCH_URL")
if cf_orch_url:
# Tier 1: task-based routing — coordinator owns model selection.
if _HAS_TASK_INFERENCE:
try:
ctx = task_allocate(
"kiwi", "meal_plan",
service_hint=_SERVICE_TYPE,
ttl_s=_TTL_S,
)
alloc = ctx.__enter__()
return _OrchTextRouter(alloc.url), ctx
except TaskNotRegistered:
logger.debug(
"kiwi.meal_plan not in coordinator assignments — "
"falling back to direct cf-text allocation"
)
except Exception as exc:
logger.debug("task allocation failed, trying direct allocate: %s", exc)
# Tier 2: direct allocation — hardcoded service type.
if CFOrchClient is not None:
try:
client = CFOrchClient(cf_orch_url)
ctx = client.allocate(
service=_SERVICE_TYPE,
ttl_s=_TTL_S,
caller=_CALLER,
)
alloc = ctx.__enter__()
if alloc is not None:
return _OrchTextRouter(alloc.url), ctx
ctx.__exit__(None, None, None) # release allocation before falling through
except Exception as exc:
logger.debug("cf-orch cf-text allocation failed, falling back to LLMRouter: %s", exc)
# Tier 3: local inference — ollama / vllm / openai-compat.
if LLMRouter is not None:
try:
return LLMRouter(), nullcontext(None)
except FileNotFoundError:
logger.debug("LLMRouter: no llm.yaml and no LLM env vars — meal plan LLM disabled")
return None, nullcontext(None)
from circuitforge_orch.client import CFOrchClient
client = CFOrchClient(cf_orch_url)
ctx = client.allocate(
service=_SERVICE_TYPE,
ttl_s=_TTL_S,
caller=_CALLER,
)
alloc = ctx.__enter__()
if alloc is not None:
return _OrchTextRouter(alloc.url), ctx
except Exception as exc:
logger.debug("LLMRouter init failed: %s", exc)
return None, nullcontext(None)
return None, nullcontext(None)
logger.debug("cf-orch cf-text allocation failed, falling back to LLMRouter: %s", exc)
# Local fallback: LLMRouter (ollama / vllm / openai-compat)
try:
from circuitforge_core.llm.router import LLMRouter
return LLMRouter(), nullcontext(None)
except FileNotFoundError:
logger.debug("LLMRouter: no llm.yaml and no LLM env vars — meal plan LLM disabled")
return None, nullcontext(None)
except Exception as exc:
logger.debug("LLMRouter init failed: %s", exc)
return None, nullcontext(None)
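The meal-plan router now tries three tiers in order: coordinator task routing, then direct `cf-text` allocation, then a local `LLMRouter`. It returns a context manager that the caller must keep open. The sketch below shows that ordering, generalized to a list of tiers. `pick_backend`, the tier factories, and this local `TaskNotRegistered` are hypothetical stand-ins. They are not the real `task_allocate`, `CFOrchClient.allocate`, or `LLMRouter` APIs.

```python
from __future__ import annotations

import logging
from contextlib import AbstractContextManager, nullcontext
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


class TaskNotRegistered(Exception):
    """Hypothetical stand-in for app.services.task_inference.TaskNotRegistered."""


# A tier is a zero-argument factory returning a context manager whose
# __enter__ yields an allocation, or None when nothing was granted.
Tier = Callable[[], AbstractContextManager]


def pick_backend(
    orch_tiers: list[Tier],
    make_local: Optional[Callable[[], Any]],
    wrap: Callable[[Any], Any],
) -> tuple[Any, AbstractContextManager]:
    """Return (router, ctx); the caller keeps ctx open for the LLM call."""
    for tier in orch_tiers:
        try:
            ctx = tier()
            alloc = ctx.__enter__()
        except TaskNotRegistered:
            logger.debug("task not registered; trying next tier")
            continue
        except Exception as exc:
            logger.debug("allocation failed, trying next tier: %s", exc)
            continue
        if alloc is not None:
            return wrap(alloc), ctx
        # Nothing granted: release the allocation before falling through.
        ctx.__exit__(None, None, None)
    # Local inference holds no allocation, so hand back a no-op context.
    if make_local is not None:
        try:
            return make_local(), nullcontext(None)
        except Exception as exc:
            logger.debug("local router init failed: %s", exc)
    return None, nullcontext(None)
```

In the diff, tier 1 uses the allocation without a `None` check. The sketch instead releases any tier that grants nothing before moving on, which is what the diff already does for tier 2.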

@ -18,51 +18,43 @@ class DocuvisionResult:
class DocuvisionClient:
"""Thin client for the cf-docuvision service."""
def __init__(self, base_url: str, timeout: float = 120.0) -> None:
def __init__(self, base_url: str) -> None:
self._base_url = base_url.rstrip("/")
self._timeout = timeout
def extract_text(self, image_path: str | Path, hint: str = "text") -> DocuvisionResult:
"""Send an image to docuvision and return extracted text.
Args:
image_path: Path to the image file.
hint: Docuvision extraction hint. Use "text" for dense prose (recipes),
"table" for tabular data, "form" for form fields, "auto" for
automatic detection.
"""
def extract_text(self, image_path: str | Path) -> DocuvisionResult:
"""Send an image to docuvision and return extracted text."""
image_bytes = Path(image_path).read_bytes()
b64 = base64.b64encode(image_bytes).decode()
with httpx.Client(timeout=self._timeout) as client:
with httpx.Client(timeout=30.0) as client:
resp = client.post(
f"{self._base_url}/extract",
json={"image_b64": b64, "hint": hint},
json={"image": b64},
)
resp.raise_for_status()
data = resp.json()
return DocuvisionResult(
text=data.get("raw_text", ""),
confidence=data.get("metadata", {}).get("confidence"),
text=data.get("text", ""),
confidence=data.get("confidence"),
raw=data,
)
async def extract_text_async(self, image_path: str | Path, hint: str = "text") -> DocuvisionResult:
async def extract_text_async(self, image_path: str | Path) -> DocuvisionResult:
"""Async version."""
image_bytes = Path(image_path).read_bytes()
b64 = base64.b64encode(image_bytes).decode()
async with httpx.AsyncClient(timeout=self._timeout) as client:
async with httpx.AsyncClient(timeout=30.0) as client:
resp = await client.post(
f"{self._base_url}/extract",
json={"image_b64": b64, "hint": hint},
json={"image": b64},
)
resp.raise_for_status()
data = resp.json()
return DocuvisionResult(
text=data.get("raw_text", ""),
confidence=data.get("metadata", {}).get("confidence"),
text=data.get("text", ""),
confidence=data.get("confidence"),
raw=data,
)
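The client change affects both the request and the response. The request body now carries `image_b64` plus a `hint`. The response is read from `raw_text` and `metadata.confidence` instead of the flat `text` and `confidence` fields. The sketch below factors payload building and response parsing out of the sync and async methods. The helper names are hypothetical, and this `DocuvisionResult` is a local stand-in. Two behaviors are assumptions and are not shown in the diff: the client-side hint check, and the fallback to the older field names.

```python
from __future__ import annotations

import base64
from dataclasses import dataclass, field
from typing import Any, Optional

# Hints listed in the extract_text docstring; validating them here is an assumption.
VALID_HINTS = {"text", "table", "form", "auto"}


@dataclass
class DocuvisionResult:
    text: str
    confidence: Optional[float]
    raw: dict[str, Any] = field(default_factory=dict)


def build_extract_payload(image_bytes: bytes, hint: str = "text") -> dict[str, str]:
    """JSON body for POST {base_url}/extract in the newer request shape."""
    if hint not in VALID_HINTS:
        raise ValueError(f"unknown hint: {hint!r}")
    return {"image_b64": base64.b64encode(image_bytes).decode(), "hint": hint}


def parse_extract_response(data: dict[str, Any]) -> DocuvisionResult:
    """Prefer the newer field names; fall back to the older flat ones."""
    text = data.get("raw_text", data.get("text", ""))
    metadata = data.get("metadata") or {}
    confidence = metadata.get("confidence", data.get("confidence"))
    return DocuvisionResult(text=text, confidence=confidence, raw=data)
```

Sharing these two helpers keeps `extract_text` and `extract_text_async` from drifting apart. That drift is the risk this hunk illustrates, since it had to make the same edit twice.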

@ -32,29 +32,6 @@ def _try_docuvision(image_path: str | Path) -> str | None:
cf_orch_url = os.environ.get("CF_ORCH_URL")
if not cf_orch_url:
return None
# Tier 1: task-based routing — coordinator owns model selection.
try:
from app.services.task_inference import task_allocate, TaskNotRegistered
from app.services.ocr.docuvision_client import DocuvisionClient
try:
with task_allocate(
"kiwi", "ocr",
service_hint="cf-docuvision",
ttl_s=60.0,
) as alloc:
doc_client = DocuvisionClient(alloc.url)
result = doc_client.extract_text(image_path)
return result.text if result.text else None
except TaskNotRegistered:
logger.debug(
"kiwi.ocr not in coordinator assignments — "
"falling back to direct cf-docuvision allocation"
)
except Exception as exc:
logger.debug("task allocation path failed, trying direct allocate: %s", exc)
# Tier 2: direct allocation — hardcoded service type.
try:
from circuitforge_orch.client import CFOrchClient
from app.services.ocr.docuvision_client import DocuvisionClient
@ -72,7 +49,7 @@ def _try_docuvision(image_path: str | Path) -> str | None:
result = doc_client.extract_text(image_path)
return result.text if result.text else None
except Exception as exc:
logger.debug("cf-docuvision fast-path failed, falling back to local VLM: %s", exc)
logger.debug("cf-docuvision fast-path failed, falling back: %s", exc)
return None

@ -26,7 +26,7 @@ DOMAINS: dict[str, dict] = {
"label": "Cuisine",
"categories": {
"Italian": {
"keywords": ["cuisine:Italian", "italian", "pasta", "pizza", "risotto", "lasagna", "carbonara"],
"keywords": ["italian", "pasta", "pizza", "risotto", "lasagna", "carbonara"],
"subcategories": {
"Sicilian": ["sicilian", "sicily", "arancini", "caponata",
"involtini", "cannoli"],
@ -43,8 +43,8 @@ DOMAINS: dict[str, dict] = {
},
},
"Mexican": {
"keywords": ["cuisine:Mexican", "mexican", "taco", "enchilada", "burrito",
"salsa", "guacamole", "mole", "tamale"],
"keywords": ["mexican", "taco", "enchilada", "burrito", "salsa",
"guacamole", "mole", "tamale"],
"subcategories": {
"Oaxacan": ["oaxacan", "oaxaca", "mole negro", "tlayuda",
"chapulines", "mezcal", "tasajo", "memelas"],
@ -67,9 +67,7 @@ DOMAINS: dict[str, dict] = {
},
},
"Asian": {
"keywords": ["cuisine:Chinese", "cuisine:Japanese", "cuisine:Korean",
"cuisine:Thai", "cuisine:Vietnamese",
"asian", "chinese", "japanese", "thai", "korean", "vietnamese",
"keywords": ["asian", "chinese", "japanese", "thai", "korean", "vietnamese",
"stir fry", "stir-fry", "ramen", "sushi", "malaysian",
"taiwanese", "singaporean", "burmese", "cambodian",
"laotian", "mongolian", "hong kong"],
@ -130,7 +128,7 @@ DOMAINS: dict[str, dict] = {
},
},
"Indian": {
"keywords": ["cuisine:Indian", "indian", "curry", "lentil", "dal", "tikka", "masala",
"keywords": ["indian", "curry", "lentil", "dal", "tikka", "masala",
"biryani", "naan", "chutney", "pakistani", "sri lankan",
"bangladeshi", "nepali"],
"subcategories": {
@ -158,8 +156,7 @@ DOMAINS: dict[str, dict] = {
},
},
"Mediterranean": {
"keywords": ["cuisine:Mediterranean", "cuisine:Greek", "cuisine:Middle Eastern",
"mediterranean", "greek", "middle eastern", "turkish",
"keywords": ["mediterranean", "greek", "middle eastern", "turkish",
"lebanese", "jewish", "palestinian", "yemeni", "egyptian",
"syrian", "iraqi", "jordanian"],
"subcategories": {
@ -193,8 +190,7 @@ DOMAINS: dict[str, dict] = {
},
},
"American": {
"keywords": ["cuisine:American", "cuisine:Southern", "cuisine:Cajun",
"american", "southern", "comfort food", "cajun", "creole",
"keywords": ["american", "southern", "comfort food", "cajun", "creole",
"hawaiian", "tex-mex", "soul food"],
"subcategories": {
"Southern": ["southern", "soul food", "fried chicken",
@ -218,8 +214,10 @@ DOMAINS: dict[str, dict] = {
},
},
"BBQ & Smoke": {
# Top-level keywords: cuisine:BBQ inferred tag + broad corpus terms.
"keywords": ["cuisine:BBQ", "bbq", "barbecue", "barbeque", "smoked", "smoky",
# Top-level keywords use broad corpus-friendly terms that appear in
# food.com keyword/category fields (e.g. "BBQ", "Oven BBQ", "Smoker").
# Subcategory keywords remain specific for drill-down filtering.
"keywords": ["bbq", "barbecue", "barbeque", "smoked", "smoky",
"smoke", "pit", "smoke ring", "low and slow",
"brisket", "pulled pork", "ribs", "spare ribs",
"baby back", "baby back ribs", "dry rub", "wet rub",
@ -253,8 +251,7 @@ DOMAINS: dict[str, dict] = {
},
},
"European": {
"keywords": ["cuisine:French", "cuisine:German", "cuisine:Spanish",
"french", "german", "spanish", "british", "irish", "scottish",
"keywords": ["french", "german", "spanish", "british", "irish", "scottish",
"welsh", "scandinavian", "nordic", "eastern european"],
"subcategories": {
"French": ["french", "provencal", "beurre", "crepe",
@ -284,8 +281,7 @@ DOMAINS: dict[str, dict] = {
},
},
"Latin American": {
"keywords": ["cuisine:Latin American", "cuisine:Caribbean",
"latin american", "peruvian", "argentinian", "colombian",
"keywords": ["latin american", "peruvian", "argentinian", "colombian",
"cuban", "caribbean", "brazilian", "venezuelan", "chilean"],
"subcategories": {
"Peruvian": ["peruvian", "ceviche", "lomo saltado", "anticucho",
@ -429,18 +425,12 @@ DOMAINS: dict[str, dict] = {
"meal_type": {
"label": "Meal Type",
"categories": {
# Keywords use two complementary sources:
# 1. inferred_tag phrases ("meal:X", "main:X") — indexed in recipe_browser_fts.inferred_tags.
# FTS5 tokenises "meal:Breakfast" → ["meal","breakfast"], so the quoted phrase
# "meal:Breakfast" matches exactly that consecutive token pair.
# 2. Corpus keyword/category text — only covers the ~1,200 keyword-tagged recipes.
# Kept as a fallback; not the primary signal.
"Breakfast": {
"keywords": ["meal:Breakfast", "breakfast", "brunch", "pancakes",
"waffles", "oatmeal", "muffin"],
"keywords": ["breakfast", "brunch", "eggs", "pancakes", "waffles",
"oatmeal", "muffin"],
"subcategories": {
"Eggs": ["meal:Breakfast", "egg", "omelette", "frittata",
"quiche", "scrambled", "benedict", "shakshuka"],
"Eggs": ["egg", "omelette", "frittata", "quiche",
"scrambled", "benedict", "shakshuka"],
"Pancakes & Waffles": ["pancake", "waffle", "crepe", "french toast"],
"Baked Goods": ["muffin", "scone", "biscuit", "quick bread",
"coffee cake", "danish"],
@ -449,15 +439,12 @@ DOMAINS: dict[str, dict] = {
},
},
"Lunch": {
# meal:Lunch tag covers explicitly-tagged recipes.
# Coverage is limited — most lunch-style recipes have no distinct meal-type tag.
"keywords": ["meal:Lunch", "lunch", "sandwich", "wrap", "salad",
"soup", "light meal"],
"keywords": ["lunch", "sandwich", "wrap", "salad", "soup", "light meal"],
"subcategories": {
"Sandwiches": ["sandwich", "sub", "hoagie", "panini", "club",
"grilled cheese", "blt"],
"Salads": ["salad", "grain bowl", "chopped", "caesar",
"cobb"],
"niçoise", "cobb"],
"Soups": ["soup", "bisque", "chowder", "gazpacho",
"minestrone", "lentil soup"],
"Wraps": ["wrap", "burrito bowl", "pita", "lettuce wrap",
@ -465,27 +452,23 @@ DOMAINS: dict[str, dict] = {
},
},
"Dinner": {
# Primary: main:X inferred tags (800k+ recipes).
# "meal:Dinner" does not exist in the inferred-tag vocabulary — main-protein
# tags are the best available proxy for main-course dinner recipes.
"keywords": ["main:Chicken", "main:Beef", "main:Pork", "main:Fish",
"main:Pasta", "dinner", "main dish", "entree",
"main course", "supper"],
"keywords": ["dinner", "main dish", "entree", "main course", "supper"],
"subcategories": {
"Chicken": ["main:Chicken"],
"Beef": ["main:Beef"],
"Pork": ["main:Pork"],
"Fish & Seafood": ["main:Fish"],
"Pasta": ["main:Pasta"],
"Casseroles": ["casserole", "bake", "gratin", "pot pie"],
"Casseroles": ["casserole", "bake", "gratin", "lasagna",
"sheperd's pie", "pot pie"],
"Stews": ["stew", "braise", "slow cooker", "pot roast",
"daube"],
"Grilled": ["grilled", "grill", "barbecue", "kebab", "skewer"],
"daube", "ragù"],
"Grilled": ["grilled", "grill", "barbecue", "charred",
"kebab", "skewer"],
"Stir-Fries": ["stir fry", "stir-fry", "wok", "sauté",
"sauteed"],
"Roasts": ["roast", "roasted", "oven", "baked chicken",
"pot roast"],
},
},
"Snack": {
"keywords": ["meal:Snack", "snack", "appetizer", "finger food",
"dip", "bite", "starter"],
"keywords": ["snack", "appetizer", "finger food", "dip", "bite",
"starter"],
"subcategories": {
"Dips & Spreads": ["dip", "spread", "hummus", "guacamole",
"salsa", "pate"],
@ -496,9 +479,8 @@ DOMAINS: dict[str, dict] = {
},
},
"Dessert": {
# "sweet" removed — it matches flavor:Sweet inferred tags, causing false positives.
"keywords": ["meal:Dessert", "dessert", "cake", "cookie", "pie",
"pudding", "ice cream", "brownie"],
"keywords": ["dessert", "cake", "cookie", "pie", "sweet", "pudding",
"ice cream", "brownie"],
"subcategories": {
"Cakes": ["cake", "cupcake", "layer cake", "bundt",
"cheesecake", "torte"],
@ -514,41 +496,20 @@ DOMAINS: dict[str, dict] = {
"caramel", "toffee"],
},
},
"Beverage": ["meal:Beverage", "drink", "smoothie", "cocktail", "beverage",
"juice", "shake", "lemonade"],
"Side Dish": {
# meal:Side Dish not in inferred-tag vocabulary.
# main:Vegetables and main:Grains are the best proxies — will overlap
# with some vegetarian mains, which is acceptable.
"keywords": ["main:Vegetables", "main:Grains", "side dish", "side",
"pilaf", "accompaniment"],
"subcategories": {
"Vegetables": ["main:Vegetables"],
"Grains & Rice": ["main:Grains", "rice", "pilaf", "quinoa"],
"Bread": ["meal:Bread", "bread", "roll", "biscuit"],
},
},
"Beverage": ["drink", "smoothie", "cocktail", "beverage", "juice", "shake"],
"Side Dish": ["side dish", "side", "accompaniment", "garnish"],
},
},
"dietary": {
"label": "Dietary",
# Primary: dietary:X inferred tags (indexed in recipe_browser_fts.inferred_tags).
# Secondary: text tokens kept as fallback for keyword-tagged recipes.
# IMPORTANT: Use ONLY structured dietary:X phrases here.
# Bare text keywords like "vegan", "low-carb" also match can_be:Vegan,
# can_be:Low-Carb etc. — those are "achievable with substitutions", not
# "recipe already is". The structured phrase "dietary:Vegan" (consecutive
# FTS tokens "dietary"+"vegan") does NOT match can_be:Vegan.
"categories": {
"Vegetarian": ["dietary:Vegetarian"],
"Vegan": ["dietary:Vegan"],
"Gluten-Free": ["dietary:Gluten-Free"],
"Low-Carb": ["dietary:Low-Carb"],
"High-Protein": ["dietary:High-Protein"],
"Low-Fat": ["dietary:Low-Fat"],
"Dairy-Free": ["dietary:Dairy-Free"],
"Low-Sodium": ["dietary:Low-Sodium"],
"Paleo": ["dietary:Paleo"],
"Vegetarian": ["vegetarian"],
"Vegan": ["vegan", "plant-based", "plant based"],
"Gluten-Free": ["gluten-free", "gluten free", "celiac"],
"Low-Carb": ["low-carb", "low carb", "keto", "ketogenic"],
"High-Protein": ["high protein", "high-protein"],
"Low-Fat": ["low-fat", "low fat", "light"],
"Dairy-Free": ["dairy-free", "dairy free", "lactose"],
},
},
"main_ingredient": {

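The keyword comments above rely on how FTS5 tokenizes inferred tags. `meal:Breakfast` becomes the tokens `meal` and `breakfast`, so a quoted phrase matches only that consecutive pair. For the same reason, `dietary:Vegan` does not match `can_be:Vegan`. The pure-Python sketch below illustrates this by approximating the default `unicode61` tokenizer: text is lowercased, letters and digits are token characters, and everything else separates tokens. It ignores diacritic folding and custom `tokenchars`, so treat it as an illustration rather than SQLite's exact behavior.

```python
from __future__ import annotations

import re


def fts_tokens(text: str) -> list[str]:
    """Approximate FTS5 unicode61 tokenization: lowercase runs of letters or
    digits; ':', '_', spaces and other punctuation act as separators."""
    return re.findall(r"[^\W_]+", text.lower())


def phrase_match(phrase: str, document: str) -> bool:
    """True if the phrase's tokens occur consecutively in the document,
    which is what a quoted FTS5 phrase query checks."""
    p, d = fts_tokens(phrase), fts_tokens(document)
    return any(d[i:i + len(p)] == p for i in range(len(d) - len(p) + 1))
```

This also shows why a bare keyword like `vegan` is too broad for the Dietary domain: it matches the `vegan` token inside `can_be:Vegan` as well.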

@ -93,18 +93,7 @@ class ElementClassifier:
return self._heuristic_profile(name)
def classify_batch(self, names: list[str]) -> list[IngredientProfile]:
"""Classify multiple names in one DB round-trip, falling back to heuristics."""
if not names:
return []
normalised = [n.lower().strip() for n in names]
c = self._store._cp
placeholders = ",".join("?" * len(normalised))
rows = self._store._fetch_all(
f"SELECT * FROM {c}ingredient_profiles WHERE name IN ({placeholders})",
tuple(normalised),
)
by_name = {r["name"]: self._row_to_profile(r) for r in rows}
return [by_name.get(n) or self._heuristic_profile(n) for n in normalised]
return [self.classify(n) for n in names]
def identify_gaps(self, profiles: list[IngredientProfile]) -> list[str]:
"""Return element names that have no coverage in the given profile list."""

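The new `classify_batch` replaces one query per name with a single `IN (...)` query and falls back to the heuristic for misses. The sketch below applies the same pattern to an in-memory SQLite table. The `ingredient_profiles(name, element)` schema and the `heuristic` callable are hypothetical simplifications of the store's real table and `_heuristic_profile`. The chunking is an addition; the diff does not do it.

```python
from __future__ import annotations

import sqlite3
from typing import Any, Callable


def classify_batch(
    conn: sqlite3.Connection,
    names: list[str],
    heuristic: Callable[[str], Any],
    chunk_size: int = 500,
) -> list[Any]:
    """One IN (...) query per chunk instead of one query per name."""
    if not names:
        return []
    normalised = [n.lower().strip() for n in names]
    unique = list(dict.fromkeys(normalised))  # dedupe, keep order
    by_name: dict[str, Any] = {}
    for start in range(0, len(unique), chunk_size):
        chunk = unique[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        # Only the placeholder count is interpolated; values stay bound params.
        query = f"SELECT name, element FROM ingredient_profiles WHERE name IN ({placeholders})"
        for name, element in conn.execute(query, chunk):
            by_name[name] = element
    # Misses (and falsy rows) fall back to the heuristic, as in the diff.
    return [by_name.get(n) or heuristic(n) for n in normalised]
```

SQLite caps the number of host parameters per statement (`SQLITE_MAX_VARIABLE_NUMBER`: 999 by default before SQLite 3.32.0, 32766 after). A very long pantry list could exceed that cap, and chunking keeps each statement under it.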

@ -1,14 +1,13 @@
"""LLM-driven recipe generator for Levels 3 and 4."""
from __future__ import annotations
import asyncio
import logging
import os
import re
from contextlib import nullcontext
from typing import TYPE_CHECKING, AsyncGenerator
from typing import TYPE_CHECKING
from openai import AsyncOpenAI, OpenAI
from openai import OpenAI
if TYPE_CHECKING:
from app.db.store import Store
@ -150,8 +149,8 @@ class LLMRecipeGenerator:
return "\n".join(lines)
_SERVICE_TYPE = "cf-text"
_MODEL_CANDIDATES = ["granite-4.1-8b", "deepseek-r1-1.5b"]
_SERVICE_TYPE = "vllm"
_MODEL_CANDIDATES = ["Qwen2.5-3B-Instruct", "Phi-4-mini-instruct"]
_TTL_S = 300.0
_CALLER = "kiwi-recipe"
@ -183,12 +182,7 @@ class LLMRecipeGenerator:
With CF_ORCH_URL set: acquires a vLLM allocation via CFOrchClient and
calls the OpenAI-compatible API directly against the allocated service URL.
Falls back to LLMRouter when:
- Allocation succeeded but the service is cold (warm=False). This avoids
making the user wait for model load; LLMRouter uses Ollama, which is
already running.
- Allocation succeeded but the connection to the service URL fails. The
agent may have registered the service but failed to start it.
Allocation failure falls through to LLMRouter rather than silently returning "".
Without CF_ORCH_URL: uses LLMRouter directly.
"""
ctx = self._get_llm_context()
@ -214,15 +208,6 @@ class LLMRecipeGenerator:
try:
if alloc is not None:
# Skip cold services — model not yet loaded means the user would
# wait 60-120 s for model load before any response. Use LLMRouter
# (Ollama) instead, which is already warm on the host.
if not alloc.warm:
logger.info(
"cf-orch vllm allocated but cold (warm=False) — releasing and falling back to LLMRouter"
)
raise RuntimeError("vllm cold")
base_url = alloc.url.rstrip("/") + "/v1"
client = OpenAI(base_url=base_url, api_key="any")
model = alloc.model or "__auto__"
@ -238,20 +223,6 @@ class LLMRecipeGenerator:
return LLMRouter().complete(prompt)
except Exception as exc:
logger.error("LLM call failed: %s", exc)
# When cf-orch gave us an allocation but the service is unreachable
# (cold skip, connection refused, or other error), fall back to
# LLMRouter rather than silently returning empty.
# Skip "vllm" in the fallback order — that backend also routes through
# cf-orch, which would trigger a second (wasted) cold allocation.
if alloc is not None:
logger.info("Falling back to LLMRouter after vllm failure")
try:
from circuitforge_core.llm.router import LLMRouter
router = LLMRouter()
_order = [b for b in (router.config.get("fallback_order") or []) if b != "vllm"]
return router.complete(prompt, fallback_order=_order or None)
except Exception as fallback_exc:
logger.error("LLMRouter fallback also failed: %s", fallback_exc)
return ""
finally:
if ctx is not None:
@ -388,91 +359,3 @@ class LLMRecipeGenerator:
suggestions=[suggestion],
element_gaps=gaps,
)
async def stream_generate(
self,
req: RecipeRequest,
profiles: list,
gaps: list[str],
) -> AsyncGenerator[str, None]:
"""Stream LLM tokens for L3/L4. Yields raw text chunks as they arrive.
Tries cf-orch warm vllm first; falls back to Ollama via AsyncOpenAI.
When neither is reachable, falls back to blocking _call_llm and yields
the complete response as a single chunk so the caller always gets output.
"""
if req.level == 4:
prompt = self.build_level4_prompt(req)
else:
prompt = self.build_level3_prompt(req, profiles, gaps)
# Phase 1: try cf-orch warm vllm (sync allocation, wrapped in thread)
alloc_info = await asyncio.to_thread(self._try_alloc_for_stream)
if alloc_info is not None:
alloc, ctx = alloc_info
try:
async for token in self._stream_openai_compat(
alloc.url.rstrip("/") + "/v1", "any", alloc.model or "__auto__", prompt
):
yield token
return
except Exception as exc:
logger.debug("cf-orch stream failed, falling back to Ollama: %s", exc)
finally:
await asyncio.to_thread(lambda: _safe_exit(ctx))
# Phase 2: Ollama streaming via OpenAI-compat API
from circuitforge_core.llm.router import LLMRouter
router = LLMRouter()
ollama = router.config.get("backends", {}).get("ollama")
if ollama and ollama.get("enabled", True):
base_url = ollama["base_url"]
model = ollama.get("model", "llama3")
try:
async for token in self._stream_openai_compat(base_url, "any", model, prompt):
yield token
return
except Exception as exc:
logger.warning("Ollama streaming failed, falling back to blocking: %s", exc)
# Phase 3: blocking fallback — yields full response at once
result = await asyncio.to_thread(self._call_llm, prompt)
if result:
yield result
def _try_alloc_for_stream(self):
"""Attempt cf-orch allocation synchronously; return (alloc, ctx) or None."""
ctx = self._get_llm_context()
try:
alloc = ctx.__enter__()
if alloc is not None and alloc.warm:
return alloc, ctx
# Not warm — release and signal fallback
_safe_exit(ctx)
except Exception as exc:
logger.debug("cf-orch alloc for stream failed: %s", exc)
return None
@staticmethod
async def _stream_openai_compat(
base_url: str, api_key: str, model: str, prompt: str
) -> AsyncGenerator[str, None]:
client = AsyncOpenAI(base_url=base_url, api_key=api_key)
if model == "__auto__":
models = await client.models.list()
model = models.data[0].id
stream = await client.chat.completions.create(
model=model,
messages=[{"role": "user", "content": prompt}],
stream=True,
)
async for chunk in stream:
if chunk.choices and chunk.choices[0].delta.content:
yield chunk.choices[0].delta.content
def _safe_exit(ctx) -> None:
try:
ctx.__exit__(None, None, None)
except Exception:
pass
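`stream_generate` walks three phases: warm cf-orch vLLM, then Ollama via the OpenAI-compatible API, then the blocking `_call_llm`. It moves to the next phase whenever one raises. The asyncio sketch below shows that chain using hypothetical source factories. It differs from the diff on one point, deliberately: it falls through only if the failing stream has not yielded anything yet. The diff's version restarts from the next phase even after a mid-stream failure, so the caller would see the partial output followed by a fresh response.

```python
from __future__ import annotations

from typing import AsyncIterator, Awaitable, Callable

# Each source is a zero-argument factory returning a fresh token stream.
StreamFactory = Callable[[], AsyncIterator[str]]


async def stream_with_fallback(
    sources: list[StreamFactory],
    blocking: Callable[[], Awaitable[str]],
) -> AsyncIterator[str]:
    """Yield tokens from the first source that works; else one blocking chunk."""
    for make_stream in sources:
        started = False
        try:
            async for token in make_stream():
                started = True
                yield token
            return  # source finished cleanly
        except Exception:
            if started:
                # Tokens already reached the caller; restarting would repeat them.
                raise
            # Failed before any output: try the next source.
    result = await blocking()
    if result:
        yield result
```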

@ -20,7 +20,7 @@ from typing import TYPE_CHECKING
if TYPE_CHECKING:
from app.db.store import Store
from app.models.schemas.recipe import GroceryLink, NutritionPanel, RecipeRequest, RecipeResult, RecipeSuggestion, StepAnalysis, TimeEffortProfile, SwapCandidate
from app.models.schemas.recipe import GroceryLink, NutritionPanel, RecipeRequest, RecipeResult, RecipeSuggestion, SwapCandidate
from app.services.recipe.element_classifier import ElementClassifier
from app.services.recipe.grocery_links import GroceryLinkBuilder
from app.services.recipe.substitution_engine import SubstitutionEngine
@ -36,38 +36,6 @@ _SWAP_STOPWORDS = frozenset({
"to", "from", "at", "by", "as", "on",
})
# Marketing / prep / packaging words stripped when tokenising product-label names
# into individual ingredient tokens. Parallel to Store._FTS_TOKEN_STOPWORDS —
# both lists should agree. Kept here to avoid a circular import at runtime.
_PRODUCT_TOKEN_STOPWORDS = frozenset({
# Basic English stopwords
"a", "an", "the", "of", "in", "for", "with", "and", "or", "to",
"from", "at", "by", "as", "on", "into",
# Brand / marketing words that appear in product names
"lean", "cuisine", "healthy", "choice", "stouffer", "original",
"classic", "deluxe", "homestyle", "family", "style", "grade",
"premium", "select", "natural", "organic", "fresh", "lite",
"ready", "quick", "easy", "instant", "microwave", "frozen",
"brand", "size", "large", "small", "medium", "extra",
# Plant-based / alt-meat brand names
"daring", "gardein", "morningstar", "lightlife", "tofurky",
"quorn", "omni", "nuggs", "simulate",
# Preparation states
"cut", "diced", "sliced", "chopped", "minced", "shredded",
"cooked", "raw", "whole", "boneless", "skinless", "trimmed",
"pre", "prepared", "marinated", "seasoned", "breaded", "battered",
"grilled", "roasted", "smoked", "canned", "dried", "dehydrated",
"pieces", "piece", "strips", "strip", "chunks", "chunk",
"fillets", "fillet", "cutlets", "cutlet", "tenders", "nuggets",
# Units / packaging
"oz", "lb", "lbs", "pkg", "pack", "box", "can", "bag", "jar",
# Adjectives that aren't ingredients
"firm", "soft", "silken", "hard", "crispy", "crunchy", "smooth",
"mild", "spicy", "hot", "sweet", "savory", "unsalted", "salted",
"low", "high", "reduced", "free", "fat", "sodium", "sugar", "calorie",
"dairy", "gluten", "vegan", "plant", "based", "free",
})
# Maps product-label substrings to recipe-corpus canonical terms.
# Kept in sync with Store._FTS_SYNONYMS — both must agree on canonical names.
# Used to expand pantry_set so single-word recipe ingredients can match
@ -395,13 +363,6 @@ def _expand_pantry_set(
if pattern in lower:
expanded.add(canonical)
# Extract individual ingredient tokens from multi-word product names.
# "Organic Extra Firm Tofu" → adds "tofu"; "Brown Basmati Rice" → adds "rice".
# This catches plain ingredients that _PANTRY_LABEL_SYNONYMS doesn't translate.
for token in lower.split():
if len(token) >= 4 and token not in _PRODUCT_TOKEN_STOPWORDS:
expanded.add(token)
# Secondary state expansion — adds terms like "stale bread", "day-old rice"
if secondary_pantry_items and item in secondary_pantry_items:
state_label = secondary_pantry_items[item]
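The new loop in `_expand_pantry_set` adds every whitespace-separated token of a product label that is at least four characters long and not in `_PRODUCT_TOKEN_STOPWORDS`. The sketch below applies that rule using a small illustrative subset of the stopword list, not the full set from the diff.

```python
from __future__ import annotations

# Illustrative subset of _PRODUCT_TOKEN_STOPWORDS; the real list is much longer.
STOPWORDS = frozenset({"organic", "extra", "firm", "frozen", "diced", "low", "fat"})
MIN_TOKEN_LEN = 4  # mirrors the len(token) >= 4 check in the diff


def product_tokens(label: str) -> set[str]:
    """Ingredient-like tokens from a product label, per the rule above."""
    return {
        token
        for token in label.lower().split()
        if len(token) >= MIN_TOKEN_LEN and token not in STOPWORDS
    }
```

The rule as written has two consequences:

- Every non-stopword token is added. "Brown Basmati Rice" therefore contributes `brown` and `basmati` as well as `rice`.
- Three-letter names such as `ham` never pass the length gate, so they must come from the other expansion paths.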
@ -775,13 +736,9 @@ class RecipeEngine:
# - match ratio: require ≥60% ingredient coverage to avoid low-signal results
_l1 = req.level == 1 and not req.shopping_mode
nf = req.nutrition_filters
# L1 uses a larger candidate pool — the ratio gate below will prune
# aggressively anyway, so we need more raw candidates to end up with
# enough results for a packaged-food / plant-based pantry.
_fts_limit = 60 if _l1 else 20
rows = self._store.search_recipes_by_ingredients(
req.pantry_items,
limit=_fts_limit,
limit=20,
category=req.category or None,
max_calories=nf.max_calories,
max_sugar_g=nf.max_sugar_g,
@ -792,11 +749,8 @@ class RecipeEngine:
)
# L1 strict defaults: cap missing ingredients and require a minimum ratio.
# 0.35 allows ~1/3 ingredient coverage — low enough for packaged/plant-based
# pantries that rarely match raw-ingredient corpus recipes 1:1, but still
# filters out recipes where only one common staple matched.
_L1_MAX_MISSING_DEFAULT = 2
_L1_MIN_MATCH_RATIO = 0.35
_L1_MIN_MATCH_RATIO = 0.6
effective_max_missing = req.max_missing
if _l1 and effective_max_missing is None:
effective_max_missing = _L1_MAX_MISSING_DEFAULT
@ -880,14 +834,9 @@ class RecipeEngine:
except Exception:
directions = [directions]
# Compute complexity + parse time effort once — reused for filters and response.
# Compute complexity for every suggestion (used for badge + filter).
row_complexity = _classify_method_complexity(directions, available_equipment)
row_time_min = _estimate_time_min(directions, row_complexity)
row_time_effort = parse_time_effort(
directions,
ingredients=row.get("ingredients") or [],
ingredient_names=row.get("ingredient_names") or [],
)
# Filter and tier-rank by hard_day_mode
if req.hard_day_mode:
@ -907,24 +856,9 @@ class RecipeEngine:
if req.max_time_min is not None and row_time_min > req.max_time_min:
continue
# Total time filter (kiwi#52).
# Prefer parsed time extracted from direction text (explicit "15 minutes" mentions).
# When directions contain no parseable time signals, fall back to the
# step-count estimate so the filter still has teeth on the corpus majority.
if req.max_total_min is not None:
if row_time_effort.total_min > 0:
if row_time_effort.total_min > req.max_total_min:
continue
elif row_time_min > req.max_total_min:
continue
# Active (hands-on) time filter — independent of total time.
# Lets users request "≤30 min hands-on, any total" to include slow braises.
# Skips recipes where active_min == 0 (no time signals parsed) to avoid
# hiding valid results when the parser couldn't extract timing.
if req.max_active_min is not None and row_time_effort.active_min > 0:
if row_time_effort.active_min > req.max_active_min:
continue
# Total time filter (kiwi#52) — uses parsed time from directions
if req.max_total_min is not None and not _within_time(directions, req.max_total_min):
continue
# Level 2: also add dietary constraint swaps from substitution_pairs
if req.level == 2 and req.constraints:
@ -963,21 +897,6 @@ class RecipeEngine:
v is not None
for v in (nutrition.calories, nutrition.sugar_g, nutrition.carbs_g)
)
te = TimeEffortProfile(
active_min=row_time_effort.active_min,
passive_min=row_time_effort.passive_min,
total_min=row_time_effort.total_min,
effort_label=row_time_effort.effort_label,
equipment=list(row_time_effort.equipment),
step_analyses=[
StepAnalysis(
is_passive=sa.is_passive,
detected_minutes=sa.detected_minutes,
prep_min=sa.prep_min,
)
for sa in row_time_effort.step_analyses
],
)
suggestions.append(RecipeSuggestion(
id=row["id"],
title=row["title"],
@ -986,14 +905,12 @@ class RecipeEngine:
swap_candidates=swap_candidates,
matched_ingredients=matched,
missing_ingredients=missing,
directions=directions,
prep_notes=sorted(prep_note_set),
level=req.level,
nutrition=nutrition if has_nutrition else None,
source_url=_build_source_url(row),
complexity=row_complexity,
estimated_time_min=row_time_min,
time_effort=te,
))
# Sort corpus results.
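The revised time filters follow two rules. The total-time filter prefers time parsed from the directions and falls back to the step-count estimate only when nothing was parsed. The new active-time filter is skipped entirely when no active time was found. The sketch below expresses both rules as a pure predicate. `TimeEffort` and `passes_time_filters` are hypothetical stand-ins for the parsed profile and the inline checks in `RecipeEngine`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class TimeEffort:
    """Hypothetical stand-in for the parsed time/effort profile."""
    total_min: int = 0   # 0 means no explicit total time was parsed
    active_min: int = 0  # 0 means no hands-on time was parsed


def passes_time_filters(
    effort: TimeEffort,
    estimated_min: int,
    max_total_min: Optional[int] = None,
    max_active_min: Optional[int] = None,
) -> bool:
    if max_total_min is not None:
        # Prefer parsed total time; fall back to the step-count estimate.
        total = effort.total_min if effort.total_min > 0 else estimated_min
        if total > max_total_min:
            return False
    # Skip the active-time check when nothing was parsed, rather than
    # hiding recipes the parser could not time.
    if max_active_min is not None and effort.active_min > 0:
        if effort.active_min > max_active_min:
            return False
    return True
```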

@ -1,524 +0,0 @@
"""Recipe scanner service (kiwi#9).
Extracts structured recipe data from one or more photos of recipe cards,
cookbook pages, or handwritten notes.
Pipeline:
photo(s) -> EXIF correction -> VLM extraction -> JSON parse -> pantry cross-ref
Vision backend priority (mirrors receipt OCR pattern):
1. cf-orch vision service (if CF_ORCH_URL set)
2. Local Qwen2.5-VL (if GPU available)
3. Anthropic API (BYOK -- if ANTHROPIC_API_KEY set)
BSL 1.1 -- requires Paid tier or BYOK.
"""
from __future__ import annotations
import base64
import io
import json
import logging
import os
import re
from collections.abc import Callable
from dataclasses import dataclass
from pathlib import Path
logger = logging.getLogger(__name__)
# Maximum number of photos per scan call (to limit VLM context / VRAM)
MAX_IMAGES = 4
# VLM prompt -- adapted from tests/fixtures/recipe_scan/extract_test.py
_EXTRACTION_PROMPT = """
You are extracting a recipe from a photograph of a recipe card, cookbook page, or handwritten note.
If two or more images are provided, treat them as a single recipe across multiple pages
(e.g. ingredients on page 1, directions on page 2).
Return a single JSON object with these fields:
- title: recipe name (string)
- subtitle: any secondary title or serving suggestion e.g. "with Broccoli & Ranch Dressing" (string or null)
- servings: serving size if shown, as a string e.g. "2", "4-6" (string or null)
- cook_time: total cook time if shown, e.g. "15 min", "1 hour" (string or null)
- source_note: any attribution text like "From Betty Crocker" or "Purple Carrot" (string or null)
- ingredients: array of ingredient objects, each with:
- name: normalized generic ingredient name, lowercase, no quantities, no brand names
(e.g. "Follow Your Heart Vegan Ranch" becomes "ranch dressing")
- qty: quantity as a string, preserving fractions e.g. "1/2", a quarter symbol (string or null)
- unit: unit of measure, null for countable items (e.g. "3 eggs" has unit: null)
- raw: the original ingredient line verbatim, exactly as it appears
- steps: ordered array of instruction strings, one distinct step per element
- notes: any tips, substitutions, storage instructions, or variations (string or null)
- confidence: "high" if text is clear and complete, "medium" if some parts are uncertain,
"low" if mostly handwritten or significantly degraded
- warnings: array of strings describing anything the user should double-check
(e.g. "Directions appear to continue on another page not shown")
Return only valid JSON. No markdown fences. No explanation outside the JSON.
If the image does not appear to be a recipe at all, return: {"error": "not_a_recipe"}
""".strip()
# ── Data types ─────────────────────────────────────────────────────────────────
@dataclass
class ScannedIngredient:
name: str
qty: str | None = None
unit: str | None = None
raw: str | None = None
in_pantry: bool = False
@dataclass
class ScannedRecipeResult:
title: str | None
subtitle: str | None
servings: str | None
cook_time: str | None
source_note: str | None
ingredients: list[ScannedIngredient]
steps: list[str]
notes: str | None
tags: list[str]
pantry_match_pct: int
confidence: str
warnings: list[str]
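The extraction prompt asks for bare JSON with list fields for ingredients, steps, and warnings, or an `{"error": "not_a_recipe"}` sentinel. The module's own parser is not part of this excerpt. What follows is only a sketch of defensive parsing under that contract. `parse_extraction` and `NotARecipe` are hypothetical. Stripping a stray markdown fence assumes that a model may ignore the "no fences" instruction.

```python
from __future__ import annotations

import json
import re
from typing import Any

# Strips a leading fence (three backticks, optional "json") and a trailing one.
_FENCE_RE = re.compile(r"^`{3}(?:json)?\s*|\s*`{3}$")


class NotARecipe(ValueError):
    """Hypothetical: the model reported the image is not a recipe."""


def parse_extraction(raw: str) -> dict[str, Any]:
    text = _FENCE_RE.sub("", raw.strip())
    data = json.loads(text)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if data.get("error") == "not_a_recipe":
        raise NotARecipe("image does not look like a recipe")
    # Default missing or null list fields so callers can iterate safely.
    for key in ("ingredients", "steps", "warnings"):
        if data.get(key) is None:
            data[key] = []
    return data
```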
# ── Image helpers ──────────────────────────────────────────────────────────────
def _load_image_b64(path: Path) -> str:
"""Load image, apply EXIF rotation, return base64-encoded JPEG bytes."""
from PIL import Image, ImageOps
with open(path, "rb") as f:
raw = f.read()
img = Image.open(io.BytesIO(raw))
img = ImageOps.exif_transpose(img).convert("RGB")
buf = io.BytesIO()
img.save(buf, format="JPEG", quality=90)
return base64.b64encode(buf.getvalue()).decode()
# ── Vision backend ─────────────────────────────────────────────────────────────
def _call_via_anthropic(image_paths: list[Path], prompt: str) -> str:
"""Send image(s) + prompt to Anthropic API. Raises RuntimeError if unavailable."""
try:
import anthropic
except ImportError as exc:
raise RuntimeError("anthropic package not installed") from exc
api_key = os.environ.get("ANTHROPIC_API_KEY")
if not api_key:
raise RuntimeError("ANTHROPIC_API_KEY not set")
client = anthropic.Anthropic(api_key=api_key)
content: list[dict] = []
for i, path in enumerate(image_paths):
if i > 0:
content.append({"type": "text", "text": f"(Page {i + 1} of the same recipe:)"})
content.append({
"type": "image",
"source": {
"type": "base64",
"media_type": "image/jpeg",
"data": _load_image_b64(path),
},
})
content.append({"type": "text", "text": prompt})
msg = client.messages.create(
# Haiku is cost-efficient for well-structured extraction prompts
model="claude-haiku-4-5-20251001",
max_tokens=2048,
messages=[{"role": "user", "content": content}],
)
return msg.content[0].text.strip()
def _call_via_local_vlm(image_paths: list[Path], prompt: str) -> str:
"""Send image(s) + prompt to local Qwen2.5-VL. Raises RuntimeError if unavailable."""
try:
import torch
except ImportError as exc:
raise RuntimeError("torch not installed") from exc
if not torch.cuda.is_available():
raise RuntimeError("No CUDA device -- local VLM unavailable")
# Lazy import so the module loads fast when GPU is absent
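    # Qwen2.5-VL checkpoints need the Qwen2_5_VL model class (transformers >= 4.49);
    # the older Qwen2VL class does not match their vision-encoder weights.
    from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
    from PIL import Image, ImageOps
    model_name = "Qwen/Qwen2.5-VL-7B-Instruct"
    logger.info("Loading local VLM for recipe scan: %s", model_name)
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(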
model_name,
torch_dtype=torch.float16,
device_map="auto",
low_cpu_mem_usage=True,
)
processor = AutoProcessor.from_pretrained(model_name)
model.train(False) # inference mode
images = []
for path in image_paths:
with open(path, "rb") as f:
raw = f.read()
img = Image.open(io.BytesIO(raw))
img = ImageOps.exif_transpose(img).convert("RGB")
images.append(img)
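    # The chat template emits one image placeholder per image; without those
    # placeholder tokens the model has nowhere to bind the pixel features.
    messages = [{
        "role": "user",
        "content": [{"type": "image"} for _ in images] + [{"type": "text", "text": prompt}],
    }]
    chat_text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=[chat_text], images=images, return_tensors="pt")
    # Move tensors to the GPU, casting only floating-point ones (pixel values)
    # to fp16; casting input_ids or attention masks to fp16 would corrupt them.
    inputs = {
        k: (v.to("cuda", torch.float16) if v.is_floating_point() else v.to("cuda"))
        if isinstance(v, torch.Tensor) else v
        for k, v in inputs.items()
    }
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=2048,
            do_sample=False,  # greedy decoding, so no temperature is needed
        )
    # Decode only the newly generated tokens so the prompt is not echoed back.
    prompt_len = inputs["input_ids"].shape[1]
    output = processor.decode(output_ids[0][prompt_len:], skip_special_tokens=True).strip()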
# Free VRAM
del model
torch.cuda.empty_cache()
return output
def _build_ocr_extraction_prompt(ocr_text: str) -> str:
"""Build a text-LLM prompt for structuring OCR output into recipe JSON.
Swaps the image-centric preamble of _EXTRACTION_PROMPT for an OCR-centric
one, then appends the combined OCR text as input. The JSON schema section
is shared verbatim to keep the two paths in sync.
"""
schema_idx = _EXTRACTION_PROMPT.find("Return a single JSON object")
schema_part = _EXTRACTION_PROMPT[schema_idx:] if schema_idx != -1 else _EXTRACTION_PROMPT
return (
"You are extracting a recipe from OCR text taken from a recipe card, "
"cookbook page, or handwritten note.\n\n"
"The text below was obtained via optical character recognition and may "
"contain minor scanning artifacts or formatting irregularities.\n\n"
f"{schema_part}\n\nOCR Text:\n{ocr_text}"
)
def _call_via_cf_text_vlm(alloc_url: str, image_paths: list[Path], prompt: str) -> str:
"""Call the cf-text OpenAI-compat API with images via the llama.cpp multimodal backend."""
import httpx
content: list[dict] = []
for i, path in enumerate(image_paths):
if i > 0:
content.append({"type": "text", "text": f"(Page {i + 1} of the same recipe:)"})
b64 = _load_image_b64(path)
content.append({
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{b64}"},
})
content.append({"type": "text", "text": prompt})
resp = httpx.post(
f"{alloc_url.rstrip('/')}/v1/chat/completions",
json={
"model": "local",
"messages": [{"role": "user", "content": content}],
"max_tokens": 2048,
"temperature": 0.0,
},
timeout=180.0,
)
resp.raise_for_status()
return resp.json()["choices"][0]["message"]["content"].strip()
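The body `_call_via_cf_text_vlm` posts is an OpenAI-style chat-completions request with every page inlined as a base64 `data:` URL. A minimal sketch of only the payload construction, assuming the pages are already JPEG-encoded bytes; `build_vision_payload` is a hypothetical helper and no HTTP call is made:

```python
import base64


def build_vision_payload(pages: list[bytes], prompt: str, max_tokens: int = 2048) -> dict:
    """Build an OpenAI-style chat-completions body with one image part per page."""
    content: list[dict] = []
    for i, jpeg in enumerate(pages):
        if i > 0:
            # Same page marker the scanner inserts between images
            content.append({"type": "text", "text": f"(Page {i + 1} of the same recipe:)"})
        b64 = base64.b64encode(jpeg).decode()
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
        })
    # The extraction instructions go last, after all images
    content.append({"type": "text", "text": prompt})
    return {
        "model": "local",
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }


payload = build_vision_payload([b"page-one", b"page-two"], "Extract the recipe.")
print([part["type"] for part in payload["messages"][0]["content"]])
# → ['image_url', 'text', 'image_url', 'text']
```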
def _call_vision_backend(
image_paths: list[Path],
prompt: str,
progress_cb: "Callable[[str, str], None] | None" = None,
) -> str:
"""Dispatch to the best available vision backend.
    Priority: cf-orch (cf-docuvision OCR, then LLMRouter structures the text)
    -> local Qwen2.5-VL -> Anthropic API.
    Raises RuntimeError listing each backend's error when none succeeds.
    Args:
        image_paths: Images to process.
        prompt: Extraction prompt (used by the local VLM and Anthropic paths; the
            cf-orch path builds its own prompt from the OCR text).
        progress_cb: Optional callback(status, message) for SSE progress events.
            Called synchronously from the thread; the caller bridges to async.
"""
def _progress(status: str, message: str) -> None:
if progress_cb:
progress_cb(status, message)
errors: list[str] = []
# 1. Try cf-orch task allocation → cf-docuvision (Qwen2-VL GGUF via llama.cpp).
# Two-step: docuvision OCRs the image(s), then LLMRouter structures the text into JSON.
cf_orch_url = os.environ.get("CF_ORCH_URL")
if cf_orch_url:
try:
from app.services.task_inference import TaskNotRegistered, task_allocate
from app.services.ocr.docuvision_client import DocuvisionClient
from circuitforge_core.llm.router import LLMRouter
try:
_progress("allocating", "Starting vision service...")
with task_allocate("kiwi", "recipe_scan", service_hint="cf-docuvision", ttl_s=120.0) as alloc:
_progress("scanning", "Extracting recipe text from photo...")
doc_client = DocuvisionClient(alloc.url)
ocr_parts: list[str] = []
for i, path in enumerate(image_paths):
result = doc_client.extract_text(path, hint="text")
prefix = f"(Page {i + 1} of the same recipe)\n" if len(image_paths) > 1 else ""
ocr_parts.append(f"{prefix}{result.text}")
combined_ocr = "\n\n".join(ocr_parts)
if not combined_ocr.strip():
raise ValueError("Docuvision returned no text — image may not be a recipe")
_progress("structuring", "Parsing recipe structure...")
text = LLMRouter().complete(
_build_ocr_extraction_prompt(combined_ocr),
system="You are a recipe data extractor. Return ONLY valid JSON. No markdown, no explanation, no code fences.",
)
if text:
return text
except TaskNotRegistered:
logger.debug("kiwi.recipe_scan not yet registered in cf-orch assignments")
except Exception as exc:
logger.debug("cf-orch vision failed for recipe scan: %s", exc)
errors.append(f"cf-orch: {exc}")
# 2. Try local Qwen2.5-VL
try:
return _call_via_local_vlm(image_paths, prompt)
except Exception as exc:
logger.debug("Local VLM unavailable for recipe scan: %s", exc)
errors.append(f"local VLM: {exc}")
# 3. Try Anthropic API (BYOK)
try:
return _call_via_anthropic(image_paths, prompt)
except Exception as exc:
logger.debug("Anthropic API failed for recipe scan: %s", exc)
errors.append(f"Anthropic: {exc}")
raise RuntimeError(
"No vision backend configured for recipe scanning. "
"Options: cf-orch (CF_ORCH_URL), local GPU, or ANTHROPIC_API_KEY (BYOK). "
f"Errors: {'; '.join(errors)}"
)
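`_call_vision_backend` is an ordered fallback chain: each backend returns text or raises, every failure is recorded, and one RuntimeError reports them all. A reduced sketch of that control flow with hypothetical backend callables; as an assumption, it treats an empty response as a failure for every backend, whereas the original only skips empty output on the cf-orch path:

```python
from typing import Callable


def first_successful(backends: list[tuple[str, Callable[[], str]]]) -> str:
    """Return the first non-empty result; raise with every collected error otherwise."""
    errors: list[str] = []
    for name, call in backends:
        try:
            text = call()
            if text:
                return text
            errors.append(f"{name}: empty response")
        except Exception as exc:  # any backend failure moves on to the next one
            errors.append(f"{name}: {exc}")
    raise RuntimeError(f"No backend succeeded. Errors: {'; '.join(errors)}")


def no_gpu() -> str:
    raise RuntimeError("No CUDA device")


print(first_successful([("local VLM", no_gpu), ("Anthropic", lambda: '{"title": "Soup"}')]))
# → {"title": "Soup"}
```

Collecting errors instead of logging and discarding them is what lets the final message explain why every option was skipped.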
# ── Parsing helpers ────────────────────────────────────────────────────────────
def _normalize_ingredient_name(name: str) -> str:
"""Lowercase + strip whitespace. Preserves multi-word names as-is."""
return name.lower().strip()
def _extract_json_object(text: str) -> str | None:
"""Return the first balanced JSON object from text, or None if not found.
Uses brace-counting rather than a greedy regex so trailing prose and
nested objects are handled correctly.
"""
start = text.find("{")
if start == -1:
return None
depth = 0
in_string = False
escape_next = False
for i, ch in enumerate(text[start:], start):
if escape_next:
escape_next = False
continue
if ch == "\\" and in_string:
escape_next = True
continue
if ch == '"':
in_string = not in_string
continue
if in_string:
continue
if ch == "{":
depth += 1
elif ch == "}":
depth -= 1
if depth == 0:
return text[start : i + 1]
return None
def _parse_scanner_json(raw_text: str) -> dict:
"""Extract and return the JSON dict from VLM output.
Handles:
- Pure JSON
- JSON in ```json ... ``` markdown fences
- Qwen3-style <think>...</think> or <thinking>...</thinking> preambles
- JSON preceded or followed by prose
Raises ValueError on not_a_recipe or unparseable output.
"""
text = raw_text.strip()
# Strip thinking-token blocks emitted by reasoning models (Qwen3, DeepSeek-R1, etc.)
text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL | re.IGNORECASE).strip()
text = re.sub(r"<thinking>.*?</thinking>", "", text, flags=re.DOTALL | re.IGNORECASE).strip()
# Strip markdown fences if present
if "```" in text:
# Find the content between the first ``` pair
fence_match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
if fence_match:
text = fence_match.group(1).strip()
# Try direct parse
try:
data = json.loads(text)
except json.JSONDecodeError:
# Fall back to brace-balanced extraction from anywhere in the output
candidate = _extract_json_object(text)
if not candidate:
logger.warning("Could not parse JSON from LLM output (first 400 chars): %r", text[:400])
raise ValueError(f"Could not parse JSON from VLM output: {text[:200]!r}")
try:
data = json.loads(candidate)
except json.JSONDecodeError as exc:
logger.warning("Brace-extracted JSON still invalid: %r", candidate[:400])
raise ValueError(f"Could not parse JSON from VLM output: {exc}") from exc
if isinstance(data, dict) and data.get("error") == "not_a_recipe":
raise ValueError("not_a_recipe: image does not appear to contain a recipe")
return data
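The order of the cleanup steps in `_parse_scanner_json` matters: reasoning blocks are removed before fence detection, so a draft fenced inside `<think>` cannot be mistaken for the answer. A compact sketch of just those two steps, assuming the model emits a single fenced JSON object (`clean_llm_output` is a hypothetical name, and the brace-balanced fallback is left out):

```python
import json
import re

FENCE = "`" * 3  # built at runtime so this example does not nest a literal fence


def clean_llm_output(raw_text: str) -> str:
    """Drop <think>/<thinking> blocks, then unwrap a fenced JSON object if present."""
    text = raw_text.strip()
    for tag in ("think", "thinking"):
        text = re.sub(rf"<{tag}>.*?</{tag}>", "", text, flags=re.DOTALL | re.IGNORECASE).strip()
    fence = re.search(FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE, text, re.DOTALL)
    return fence.group(1).strip() if fence else text


raw = (
    "<think>The user wants JSON like " + FENCE + "json {\"x\": 1}" + FENCE + "</think>\n"
    + FENCE + "json\n{\"title\": \"Pho\", \"steps\": [\"Simmer broth\"]}\n" + FENCE
)
print(json.loads(clean_llm_output(raw))["title"])  # → Pho
# Without the <think> strip, the fence search would land on the draft inside it:
print(re.search(FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE, raw, re.DOTALL).group(1))  # → {"x": 1}
```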
# ── Pantry cross-reference ─────────────────────────────────────────────────────
def _cross_reference_pantry(
ingredients: list[ScannedIngredient],
pantry_names: list[str],
) -> tuple[list[ScannedIngredient], int]:
"""Mark ingredients found in the pantry and return updated list + match percent.
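    Each pantry name is split into tokens of 4+ characters, and an ingredient
    matches when a token is a substring of its name or vice versa:
      - "broccoli florets" matches pantry item "broccoli" (pantry token in ingredient)
      - "pumpkin seeds" matches pantry "pumpkin seeds" (shared token "pumpkin")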
Returns (updated_ingredients, pantry_match_pct).
"""
if not ingredients:
return ingredients, 0
normalized_pantry = [_normalize_ingredient_name(p) for p in pantry_names]
updated: list[ScannedIngredient] = []
matched = 0
for ingr in ingredients:
norm_ingr = _normalize_ingredient_name(ingr.name)
in_pantry = any(
(p_tok in norm_ingr or norm_ingr in p_tok)
for p in normalized_pantry
for p_tok in p.split()
if len(p_tok) >= 4 # skip short stop-words like "of", "and", "the"
)
updated.append(ScannedIngredient(
name=ingr.name,
qty=ingr.qty,
unit=ingr.unit,
raw=ingr.raw,
in_pantry=in_pantry,
))
if in_pantry:
matched += 1
pct = round(matched / len(ingredients) * 100)
return updated, pct
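The matching rule in `_cross_reference_pantry` is easiest to see on plain strings. A sketch that reproduces it, where `pantry_match` is a hypothetical helper working on names rather than `ScannedIngredient` objects:

```python
def pantry_match(ingredients: list[str], pantry: list[str]) -> tuple[list[bool], int]:
    """Flag each ingredient found in the pantry and return (flags, match percent)."""
    # Tokens shorter than 4 characters are dropped, as in the scanner
    tokens = [tok for p in pantry for tok in p.lower().split() if len(tok) >= 4]
    flags = []
    for name in ingredients:
        norm = name.lower().strip()
        # Substring test in both directions
        flags.append(any(tok in norm or norm in tok for tok in tokens))
    pct = round(sum(flags) / len(flags) * 100) if flags else 0
    return flags, pct


print(pantry_match(["Broccoli florets", "pumpkin seeds", "eggs"], ["broccoli", "Pumpkin Seeds"]))
# → ([True, True, False], 67)
print(pantry_match(["unsalted butter"], ["salt"]))  # → ([True], 100)
```

The second call shows the trade-off of substring matching: the pantry token "salt" also matches "unsalted butter", so the percentage favors recall over precision.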
# ── Main scanner class ─────────────────────────────────────────────────────────
class RecipeScanner:
"""Stateless recipe scanner. One instance can be reused across requests."""
def scan(
self,
image_paths: list[Path],
pantry_names: list[str] | None = None,
progress_cb: Callable[[str, str], None] | None = None,
) -> ScannedRecipeResult:
"""Extract a structured recipe from one or more photos.
Args:
image_paths: 1-4 image files (phone photos, scans).
pantry_names: Flat list of product names from user's inventory.
Pass [] or None to skip pantry cross-reference.
Returns:
ScannedRecipeResult with all fields populated.
Raises:
ValueError: Image is not a recipe, or JSON could not be parsed.
RuntimeError: No vision backend is configured.
"""
if not image_paths:
raise ValueError("At least one image is required")
if len(image_paths) > MAX_IMAGES:
raise ValueError(f"Maximum {MAX_IMAGES} images per scan (got {len(image_paths)})")
# Call vision backend
raw_text = _call_vision_backend(image_paths, _EXTRACTION_PROMPT, progress_cb=progress_cb)
# Parse JSON from VLM output
data = _parse_scanner_json(raw_text)
# Build ingredient list
raw_ingredients = data.get("ingredients") or []
ingredients: list[ScannedIngredient] = [
ScannedIngredient(
name=str(item.get("name") or "").strip() or "unknown",
qty=str(item["qty"]) if item.get("qty") is not None else None,
unit=str(item["unit"]) if item.get("unit") is not None else None,
raw=str(item["raw"]) if item.get("raw") is not None else None,
)
for item in raw_ingredients
if isinstance(item, dict)
]
# Pantry cross-reference
ingredients, pct = _cross_reference_pantry(
ingredients,
pantry_names or [],
)
return ScannedRecipeResult(
title=data.get("title") or None,
subtitle=data.get("subtitle") or None,
servings=str(data["servings"]) if data.get("servings") is not None else None,
cook_time=str(data["cook_time"]) if data.get("cook_time") is not None else None,
source_note=data.get("source_note") or None,
ingredients=ingredients,
steps=[str(s) for s in (data.get("steps") or []) if s],
notes=data.get("notes") or None,
tags=list(data.get("tags") or []),
pantry_match_pct=pct,
confidence=data.get("confidence") or "medium",
warnings=list(data.get("warnings") or []),
)

@ -1,139 +0,0 @@
# app/services/recipe/style_classifier.py
# BSL 1.1 — LLM feature
"""LLM style-tag classifier for saved recipes.
Reads the recipe title, ingredients, and directions and suggests 3-5 style tags
from the curated vocabulary shared with SaveRecipeModal.vue.
Cloud (CF_ORCH_URL set): allocates a cf-text service via cf-orch (2 GB VRAM).
Local: falls back to LLMRouter (ollama / vllm / openai-compat).
"""
from __future__ import annotations
import json
import logging
import os
import re
from contextlib import nullcontext
from typing import Any
logger = logging.getLogger(__name__)
_SERVICE_TYPE = "cf-text"
_TTL_S = 60.0
_CALLER = "kiwi-style-classify"
# Canonical vocabulary — must stay in sync with SUGGESTED_TAGS in SaveRecipeModal.vue.
STYLE_TAG_VOCAB: frozenset[str] = frozenset({
"comforting", "light", "spicy", "umami", "sweet", "savory", "rich",
"crispy", "creamy", "hearty", "quick", "hands-off", "meal-prep-friendly",
"fancy", "one-pot",
})
_SYSTEM_PROMPT = """\
You are a culinary tagger. Given a recipe, suggest 3 to 5 style tags that best \
describe its character. You MUST only use tags from this list:
comforting, light, spicy, umami, sweet, savory, rich, crispy, creamy, hearty, \
quick, hands-off, meal-prep-friendly, fancy, one-pot
Return ONLY a JSON array of strings, no explanation. Example:
["comforting", "hearty", "one-pot"]
"""
def _build_router():
"""Return (router, context_manager) for style classify tasks.
Tries cf-orch cf-text allocation first; falls back to LLMRouter.
Returns (None, nullcontext) if no backend is available.
"""
cf_orch_url = os.environ.get("CF_ORCH_URL")
if cf_orch_url:
try:
from app.services.meal_plan.llm_router import _OrchTextRouter # reuse adapter
from circuitforge_orch.client import CFOrchClient
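            from contextlib import ExitStack
            client = CFOrchClient(cf_orch_url)
            # Enter the allocation exactly once, via an ExitStack: the caller's
            # `with ctx:` then only releases it instead of re-entering the
            # allocation context, which is not safe to enter twice.
            stack = ExitStack()
            alloc = stack.enter_context(
                client.allocate(service=_SERVICE_TYPE, ttl_s=_TTL_S, caller=_CALLER)
            )
            if alloc is not None:
                return _OrchTextRouter(alloc.url), stack
            stack.close()  # nothing was allocated; release before falling back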
except Exception as exc:
logger.debug("cf-orch allocation failed for style classify, falling back: %s", exc)
try:
from circuitforge_core.llm.router import LLMRouter
return LLMRouter(), nullcontext(None)
except FileNotFoundError:
logger.debug("LLMRouter: no llm.yaml — style classifier LLM disabled")
return None, nullcontext(None)
except Exception as exc:
logger.debug("LLMRouter init failed: %s", exc)
return None, nullcontext(None)
def _parse_tags(raw: str) -> list[str]:
"""Extract valid vocab tags from raw LLM output.
Tries JSON parse first; falls back to extracting any vocab word present
in the response text so minor formatting deviations still work.
"""
# Strip markdown fences
raw = re.sub(r"```[a-z]*", "", raw).strip()
try:
parsed = json.loads(raw)
if isinstance(parsed, list):
return [t for t in parsed if isinstance(t, str) and t in STYLE_TAG_VOCAB][:5]
except (json.JSONDecodeError, ValueError):
pass
# Fallback: scan for vocab words
found = [t for t in STYLE_TAG_VOCAB if re.search(rf"\b{re.escape(t)}\b", raw, re.IGNORECASE)]
return sorted(found, key=lambda t: raw.lower().index(t.lower()))[:5]
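`_parse_tags` tries a strict JSON array first and only then scans for vocabulary words. A sketch of the same two-stage idea (fence stripping omitted; `parse_tags` and `VOCAB` are local stand-ins). One deliberate difference: the fallback here orders hits by the start of the word-boundary match, whereas `raw.lower().index(...)` can land on an earlier substring hit such as "light" inside "delight":

```python
import json
import re

VOCAB = frozenset({
    "comforting", "light", "spicy", "umami", "sweet", "savory", "rich",
    "crispy", "creamy", "hearty", "quick", "hands-off", "meal-prep-friendly",
    "fancy", "one-pot",
})


def parse_tags(raw: str, limit: int = 5) -> list[str]:
    """JSON array first; otherwise pick vocabulary words in the order they appear."""
    try:
        parsed = json.loads(raw)
        if isinstance(parsed, list):
            return [t for t in parsed if isinstance(t, str) and t in VOCAB][:limit]
    except json.JSONDecodeError:
        pass
    # Fallback: whole-word, case-insensitive search, sorted by match position
    hits = []
    for tag in VOCAB:
        m = re.search(rf"\b{re.escape(tag)}\b", raw, re.IGNORECASE)
        if m:
            hits.append((m.start(), tag))
    return [tag for _, tag in sorted(hits)][:limit]


print(parse_tags('["cozy", "hearty", "quick"]'))          # → ['hearty', 'quick']
print(parse_tags("Tags: Hearty, quick and one-pot."))     # → ['hearty', 'quick', 'one-pot']
```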
def classify_style(recipe: dict[str, Any]) -> list[str]:
    """Return 3-5 suggested style tags for *recipe*.
*recipe* is a Store row dict with at least ``title``, ``ingredient_names``
(list[str]), and ``directions`` (list[str] or str).
Returns an empty list if no LLM backend is available.
"""
router, ctx = _build_router()
if router is None:
return []
title = recipe.get("title") or "Unknown"
ingredients = recipe.get("ingredient_names") or []
if isinstance(ingredients, str):
try:
ingredients = json.loads(ingredients)
except Exception:
ingredients = [ingredients]
directions = recipe.get("directions") or []
if isinstance(directions, str):
try:
directions = json.loads(directions)
except Exception:
directions = [directions]
user_prompt = (
f"Recipe: {title}\n"
f"Ingredients: {', '.join(str(i) for i in ingredients[:20])}\n"
f"Steps: {' '.join(str(d) for d in directions[:8])[:600]}"
)
try:
with ctx:
raw = router.complete(
system=_SYSTEM_PROMPT,
user=user_prompt,
max_tokens=64,
temperature=0.3,
)
return _parse_tags(raw)
except Exception as exc:
logger.warning("Style classifier LLM call failed: %s", exc)
return []
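`_build_router` enters the cf-orch allocation itself and hands a context back so `classify_style` can release it with `with ctx:`. Calling `__enter__()` on a generator-based context manager a second time is not supported, so the hand-off needs an object that is safe to `with` after the fact. A generic stdlib sketch using `contextlib.ExitStack`, with a hypothetical `fake_allocate` standing in for the cf-orch client:

```python
from contextlib import ExitStack, contextmanager

events: list[str] = []


@contextmanager
def fake_allocate(service: str):
    """Hypothetical stand-in for an allocation context manager."""
    events.append(f"allocate {service}")
    try:
        yield f"http://alloc.example/{service}"
    finally:
        events.append(f"release {service}")


def build() -> tuple[str, ExitStack]:
    stack = ExitStack()
    url = stack.enter_context(fake_allocate("cf-text"))  # entered exactly once here
    return url, stack


url, ctx = build()
with ctx:  # ExitStack.__enter__ returns itself; __exit__ runs the pending release
    events.append(f"call {url}")
print(events)
# → ['allocate cf-text', 'call http://alloc.example/cf-text', 'release cf-text']
```

An allocation that yields nothing useful can also be released early with `stack.close()` before falling back to another backend.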

@ -22,8 +22,6 @@ queries find recipes the food.com corpus tags alone would miss.
"""
from __future__ import annotations
import re
# ---------------------------------------------------------------------------
# Text-signal tables
@ -123,50 +121,6 @@ _TIME_SIGNALS: list[tuple[str, list[str]]] = [
("time:Slow Cook", ["slow cooker", "crockpot", "< 4 hours", "braise"]),
]
# ---------------------------------------------------------------------------
# Meal type signals — matched against TITLE ONLY (not ingredient text).
# Ingredient names frequently contain words like "cake flour" or "sandwich
# bread" which would produce false meal-type tags if matched against the full
# title+ingredient string.
# ---------------------------------------------------------------------------
_MEAL_SIGNALS: list[tuple[str, list[str]]] = [
("meal:Breakfast", [
"breakfast", "pancake", "waffle", "french toast", "scrambled egg",
"frittata", "hash brown", "hash browns", "breakfast burrito",
"breakfast sandwich", "breakfast casserole", "overnight oat",
"granola", "oatmeal", "muffin", "morning glory", "eggs benedict",
"shakshuka", "crepe", "scone",
]),
("meal:Dessert", [
"dessert", "cake", "cookie", "brownie", "cheesecake", "pudding",
"fudge", "ice cream", "sorbet", "cupcake", "mousse", "candy",
"truffle", "gelato", "donut", "doughnut", "cobbler", "crisp",
"crumble", "tiramisu", "eclair", "sundae", "milkshake", "parfait",
"biscotti", "macaron", "panna cotta", "baklava", "churro", "tart",
"torte", "strudel", "compote", "semifreddo",
]),
("meal:Snack", [
"snack", "appetizer", "dip", "chips", "popcorn", "trail mix",
"energy ball", "deviled egg", "cheese ball", "nachos",
"pretzel bites", "protein ball", "granola bar",
]),
("meal:Beverage", [
"smoothie", "cocktail", "mocktail", "lemonade", "limeade",
"margarita", "sangria", "punch", "milkshake", "milk shake",
"juice", "spritzer", "iced tea", "hot chocolate", "chai latte",
"mulled wine", "eggnog", "slushie", "frappe", "horchata",
"agua fresca", "shrub", "switchel",
]),
("meal:Lunch", [
"lunch", "sandwich", "panini", "grilled cheese", "wrap",
"lunchbox", "lunch box",
]),
("meal:Bread", [
"bread", "sourdough", "focaccia", "flatbread", "dinner roll",
"loaf", "baguette", "ciabatta", "brioche", "challah", "pita",
]),
]
_MAIN_INGREDIENT_SIGNALS: list[tuple[str, list[str]]] = [
("main:Chicken", ["chicken", "poultry", "turkey"]),
("main:Beef", ["beef", "ground beef", "steak", "brisket", "pot roast"]),
@ -242,29 +196,6 @@ def _match_signals(text: str, table: list[tuple[str, list[str]]]) -> list[str]:
return [tag for tag, pats in table if any(p in text for p in pats)]
def _match_title_signals(title: str, table: list[tuple[str, list[str]]]) -> list[str]:
"""Match signals against title text only, using word-boundary + optional plural.
Pattern: `\\bWORD(?:s|es)?\\b`
This handles:
- Plurals: "cookie" matches "cookies", "sandwich" matches "sandwiches"
- Substring rejection: "cake" does NOT match "pancake" (no word boundary
before 'c' in pan|cake), "tart" does NOT match "tartare" (after "tart"
the 'a' is a word char, not a boundary)
- Avoids false positives from ingredient text ("cake flour", "sandwich bread")
by only matching the recipe title, not the full title+ingredient string.
"""
t = title.lower()
return [
tag for tag, pats in table
if any(
re.search(r"\b" + re.escape(p.strip()) + r"(?:s|es)?\b", t)
for p in pats
)
]
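The docstring's claims about `\bWORD(?:s|es)?\b` can be checked directly. A sketch using a trimmed, illustrative subset of `_MEAL_SIGNALS` and a local copy of the matcher (`match_title` is a stand-in name):

```python
import re

MEAL_SIGNALS = [  # illustrative subset of the real table
    ("meal:Dessert", ["cake", "cookie", "tart"]),
    ("meal:Lunch", ["sandwich", "wrap"]),
    ("meal:Breakfast", ["pancake"]),
]


def match_title(title: str, table: list[tuple[str, list[str]]]) -> list[str]:
    """Whole-word match with an optional 's'/'es' plural suffix, title only."""
    t = title.lower()
    return [
        tag for tag, pats in table
        if any(re.search(r"\b" + re.escape(p) + r"(?:s|es)?\b", t) for p in pats)
    ]


print(match_title("Blueberry Pancakes", MEAL_SIGNALS))  # → ['meal:Breakfast'] ("cake" is rejected)
print(match_title("Cuban Sandwiches", MEAL_SIGNALS))    # → ['meal:Lunch']
print(match_title("Tuna Tartare", MEAL_SIGNALS))        # → []
```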
def infer_tags(
title: str,
ingredient_names: list[str],
@ -327,9 +258,6 @@ def infer_tags(
tags.update(_match_signals(text, _FLAVOR_SIGNALS))
tags.update(_match_signals(text, _MAIN_INGREDIENT_SIGNALS))
# Meal type: title-only to avoid "cake flour" → meal:Dessert false positives
tags.update(_match_title_signals(title, _MEAL_SIGNALS))
# 3. Time signals from corpus keywords + text
corpus_text = " ".join(kw.lower() for kw in corpus_keywords)
tags.update(_match_signals(corpus_text, _TIME_SIGNALS))

@ -1,27 +1,17 @@
"""
Runtime parser for active/passive time split, prep effort, and equipment detection.
Runtime parser for active/passive time split and equipment detection.
Operates over a list of direction strings plus an optional ingredient list.
No I/O; pure Python functions. Sub-millisecond for up to 20 recipes.
Time estimation strategy (in priority order):
  1. Explicit time mention in step text ("simmer for 20 minutes")
  2. Passive keyword + per-technique default ("bake until golden" → 30 min)
  3. Prep action + ingredient quantity scaling ("dice 2 lbs potatoes" → ~5 min)
  4. Fallback active default (assembly/misc steps → 2 min each)
Quantity scaling uses n^0.75 (sub-linear, matching human batch-work curves).
Pass `ingredients` + `ingredient_names` to enable cross-referenced scaling.
Without them, prep actions use base times only (no scaling).
Operates over a list of direction strings. No I/O; pure Python functions.
Sub-millisecond for up to 20 recipes (20 × ~10 steps each = 200 regex calls).
"""
from __future__ import annotations
import math
import re
from dataclasses import dataclass, field
from dataclasses import dataclass
from typing import Final
# ── Passive step keywords ─────────────────────────────────────────────────
# ── Passive step keywords (whole-word, case-insensitive) ──────────────────
_PASSIVE_PATTERNS: Final[list[str]] = [
"simmer", "bake", "roast", "broil", "refrigerate", "marinate",
@ -30,39 +20,19 @@ _PASSIVE_PATTERNS: Final[list[str]] = [
r"slow\s+cook", r"pressure\s+cook",
]
# Pre-compiled as a single alternation — avoids re-compiling on every call.
_PASSIVE_RE: re.Pattern[str] = re.compile(
r"\b(?:" + "|".join(_PASSIVE_PATTERNS) + r")\b",
re.IGNORECASE,
)
# Per-technique passive defaults (minutes) — used when no explicit time found.
# Calibrated to conservative midpoints from USDA FoodKeeper + culinary practice.
_PASSIVE_DEFAULTS: Final[list[tuple[re.Pattern[str], int]]] = [
# Multi-word first (longer match wins)
(re.compile(r"\bslow\s+cook\b", re.IGNORECASE), 300), # 5 hr crockpot default
(re.compile(r"\bpressure\s+cook\b", re.IGNORECASE), 15),
(re.compile(r"\bovernight\b", re.IGNORECASE), 480), # 8 hr
# Single-word
(re.compile(r"\bbraise\b", re.IGNORECASE), 90),
(re.compile(r"\bmarinate\b", re.IGNORECASE), 60),
(re.compile(r"\brefrigerate\b", re.IGNORECASE), 120),
(re.compile(r"\bproof\b|\brise\b", re.IGNORECASE), 60),
(re.compile(r"\bsoak\b", re.IGNORECASE), 30),
(re.compile(r"\bfreeze\b", re.IGNORECASE), 120),
(re.compile(r"\bchill\b", re.IGNORECASE), 60),
(re.compile(r"\broast\b", re.IGNORECASE), 40),
(re.compile(r"\bbake\b", re.IGNORECASE), 30),
(re.compile(r"\bbroil\b", re.IGNORECASE), 8),
(re.compile(r"\bsimmer\b", re.IGNORECASE), 20),
(re.compile(r"\bset\b", re.IGNORECASE), 30), # gelatin / custard set
(re.compile(r"\bsteep\b", re.IGNORECASE), 5),
(re.compile(r"\brest\b|\bstand\b", re.IGNORECASE), 10),
(re.compile(r"\bcool\b", re.IGNORECASE), 15),
(re.compile(r"\bwait\b|\blet\b", re.IGNORECASE), 5),
]
# ── Explicit time extraction ──────────────────────────────────────────────
# ── Time extraction regex ─────────────────────────────────────────────────
# Two-branch pattern:
# Branch A (groups 1-3): range "15-20 minutes", "15–20 min"
# Branch B (groups 4-5): single "10 minutes", "2 hours", "30 sec"
#
# Separator characters: plain hyphen (-), en-dash (–), or literal "-to-"
_TIME_RE: re.Pattern[str] = re.compile(
r"(\d+)\s*(?:[-\u2013]|-to-)\s*(\d+)\s*(hour|hr|minute|min|second|sec)s?"
r"|"
@ -70,242 +40,9 @@ _TIME_RE: re.Pattern[str] = re.compile(
re.IGNORECASE,
)
_MAX_MINUTES_PER_STEP: Final[int] = 480 # 8-hour sanity cap
_MAX_MINUTES_PER_STEP: Final[int] = 480 # 8 hours sanity cap
# ── Prep action detection ─────────────────────────────────────────────────
# Base times (minutes) per prep action, calibrated to ~3 items / 0.5 lb reference.
# These are starting points — flagged for calibration against real recipe timing data.
_PREP_ACTION_BASES: Final[dict[str, float]] = {
# Peeling / stripping
"peel": 1.5,
"pare": 1.5,
"hull": 1.5,
"pit": 2.0, # cherries, avocados
"core": 1.0,
"stem": 1.0,
"trim": 1.0,
# Cutting
"chop": 2.0,
"cut": 1.5,
"dice": 2.5, # more precise than chop
"mince": 2.0,
"slice": 1.5,
"julienne": 4.0,
"cube": 2.0,
"quarter": 1.0,
"halve": 0.5,
"shred": 2.0,
# Grating / zesting
"grate": 3.0,
"zest": 2.0,
# Crushing
"crush": 0.5,
"smash": 0.5,
"crack": 0.5,
# Mixing / assembly (lower base — less physical effort)
"knead": 8.0, # bread dough: consistent regardless of quantity
"whisk": 1.5,
"beat": 2.0,
"cream": 3.0, # butter + sugar until fluffy
"fold": 1.5,
"stir": 0.5,
"combine": 0.5,
"mix": 1.0,
"season": 0.5,
}
# Compiled regex — longer patterns first to avoid partial matches.
_PREP_RE: re.Pattern[str] = re.compile(
r"\b(?:" + "|".join(
re.escape(k) for k in sorted(_PREP_ACTION_BASES, key=len, reverse=True)
) + r")\b",
re.IGNORECASE,
)
# Default active time per step when no explicit time and no prep action detected.
_ACTIVE_STEP_DEFAULT_MIN: Final[float] = 2.0
# ── Prep-needing ingredient classification ────────────────────────────────
#
# Only ingredients in this set get quantity-scaled prep time.
# Liquids, spices, canned goods, and dry staples are excluded — they require
# no physical prep beyond measuring.
_PREP_NEEDING: Final[frozenset[str]] = frozenset({
# Alliums
"onion", "shallot", "leek", "scallion", "green onion", "chive", "garlic",
# Root / stem vegetables
"ginger", "carrot", "celery", "potato", "sweet potato", "yam",
"beet", "turnip", "parsnip", "radish", "fennel", "celeriac",
# Squash / gourd family
"zucchini", "squash", "pumpkin", "cucumber",
# Peppers
"pepper", "bell pepper", "jalapeño", "jalapeno", "chili", "chile",
# Brassicas
"broccoli", "cauliflower", "cabbage", "kale", "chard", "spinach",
"brussels sprout",
# Other vegetables
"tomato", "eggplant", "aubergine", "corn", "artichoke", "asparagus",
"green bean", "snow pea", "snap pea", "mushroom", "lettuce",
# Fruits
"apple", "pear", "peach", "nectarine", "plum", "apricot",
"mango", "papaya", "pineapple", "melon", "watermelon", "cantaloupe",
"avocado", "banana",
"strawberry", "raspberry", "blackberry", "blueberry", "cherry",
"citrus", "lemon", "lime", "orange", "grapefruit",
# Protein (trimming / portioning)
"chicken", "turkey", "duck",
"beef", "pork", "lamb", "veal",
"fish", "salmon", "tuna", "cod", "tilapia", "halibut", "shrimp",
"scallop", "crab", "lobster",
# Dairy requiring active prep
"cheese",
# Nuts / seeds (chopping)
"almond", "walnut", "pecan", "cashew", "peanut", "hazelnut",
"pistachio", "macadamia", "nut",
# Fresh herbs (chopping / tearing)
"basil", "parsley", "cilantro", "thyme", "rosemary", "sage",
"dill", "mint", "tarragon",
# Other
"bread",
})
def _is_prep_needing(name: str) -> bool:
"""True if the normalized ingredient name contains any prep-needing keyword."""
nl = name.lower()
return any(kw in nl for kw in _PREP_NEEDING)
# ── Quantity extraction ───────────────────────────────────────────────────
_FRAC_RE: re.Pattern[str] = re.compile(r"(\d+)\s*/\s*(\d+)")
# Weight units → converted to pounds internally
_WEIGHT_RE: re.Pattern[str] = re.compile(
r"(\d+(?:\.\d+)?|\d+\s*/\s*\d+)\s*"
r"(pound|lb|ounce|oz|gram|g(?![a-z])|kilogram|kg)\s*s?\b",
re.IGNORECASE,
)
# Volume (cups only — the common recipe unit for quantity scaling)
_VOLUME_CUP_RE: re.Pattern[str] = re.compile(
r"(\d+(?:\.\d+)?|\d+\s*/\s*\d+)\s*cups?\b",
re.IGNORECASE,
)
# Count — bare integer or decimal followed by optional size/unit word
_COUNT_RE: re.Pattern[str] = re.compile(
r"(?<!\d)(\d+(?:\.\d+)?)\s*"
r"(?:large|medium|small|whole|clove|cloves|head|heads|ear|ears|"
r"stalk|stalks|sprig|sprigs|bunch|bunches|fillet|fillets|"
r"breast|breasts|piece|pieces|slice|slices)?\s*\b",
re.IGNORECASE,
)
# Reference quantities: the "1× base" for each unit type.
# Calibrated so that a typical single-ingredient amount = 1× prep time.
_QTY_REFS: Final[dict[str, float]] = {
    "lb": 0.5,     # 0.5 lb is the base → 1 lb ≈ 1.68×, 2 lb ≈ 2.83×
    "cup": 1.0,    # 1 cup = base
    "count": 3.0,  # 3 items = base → 1 ≈ 0.44×, 6 ≈ 1.68×
}
_SCALE_POWER: Final[float] = 0.75 # sub-linear; revisit with empirical data
_MAX_SCALE: Final[float] = 4.0 # cap at 4× regardless of quantity
_MIN_SCALE: Final[float] = 0.33 # floor at 1/3× for tiny amounts
def _parse_fraction(s: str) -> float:
m = _FRAC_RE.search(s)
if m:
try:
return float(m.group(1)) / float(m.group(2))
except (ValueError, ZeroDivisionError):
return 1.0
try:
return float(s.replace(" ", ""))
except ValueError:
return 1.0
def _extract_qty(text: str) -> tuple[float, str] | None:
"""Return (quantity_in_canonical_units, unit_type) or None.
Unit types: "lb" (weight in pounds), "cup", "count".
All weights are normalised to pounds.
"""
# Weight (most specific — check first)
m = _WEIGHT_RE.search(text)
if m:
qty = _parse_fraction(m.group(1))
u = m.group(2).lower().rstrip("s")
if u in ("pound", "lb"):
return (qty, "lb")
if u in ("ounce", "oz"):
return (qty / 16.0, "lb")
if u in ("gram", "g"):
return (qty / 453.6, "lb")
if u in ("kilogram", "kg"):
return (qty * 2.205, "lb")
# Volume (cups)
m = _VOLUME_CUP_RE.search(text)
if m:
return (_parse_fraction(m.group(1)), "cup")
# Count — only accept values in a sane range to avoid false positives
m = _COUNT_RE.search(text)
if m:
qty = float(m.group(1))
if 0 < qty <= 24:
return (qty, "count")
return None
def _extract_inline_qty_for(text: str, ing_name: str) -> tuple[float, str] | None:
"""Extract the quantity specifically associated with `ing_name` in a direction step.
Looks for a number immediately before the ingredient name (plus optional size/unit
words). Falls back to None if the pattern does not match.
Example: "Dice 2 large onions and 3 carrots" for "onion" returns (2.0, "count").
"""
pattern = re.compile(
r"(\d+(?:\.\d+)?|\d+\s*/\s*\d+)\s*"
r"(?:large|medium|small|whole|"
r"(?:pound|lb|ounce|oz|gram|g|kilogram|kg|cup|clove|cloves|"
r"head|heads|fillet|fillets|breast|breasts|piece|pieces)s?)??\s*"
+ re.escape(ing_name) + r"(?:es|s)?\b",
re.IGNORECASE,
)
m = pattern.search(text)
if m:
# Re-extract with _extract_qty on the full matched span to get unit too
span = text[m.start(): m.end()]
result = _extract_qty(span)
if result:
return result
# Fallback: bare count
try:
return (_parse_fraction(m.group(1)), "count")
except Exception:
pass
return None
def _quantity_scale(qty: float, unit: str) -> float:
"""Apply n^0.75 scaling relative to unit reference, clamped to [MIN, MAX]."""
ref = _QTY_REFS.get(unit, 1.0)
if ref <= 0 or qty <= 0:
return 1.0
raw = (qty / ref) ** _SCALE_POWER
return max(_MIN_SCALE, min(_MAX_SCALE, raw))
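Putting numbers on `_quantity_scale`: the factor is (qty / ref)^0.75 clamped to [0.33, 4.0], so doubling a quantity multiplies prep time by about 2^0.75 ≈ 1.68 rather than 2. A self-contained sketch with the same constants (`quantity_scale` and `REFS` are local stand-ins):

```python
SCALE_POWER = 0.75
REFS = {"lb": 0.5, "cup": 1.0, "count": 3.0}


def quantity_scale(qty: float, unit: str, lo: float = 0.33, hi: float = 4.0) -> float:
    """Sub-linear scale factor relative to the unit's reference quantity."""
    ref = REFS.get(unit, 1.0)
    if ref <= 0 or qty <= 0:
        return 1.0
    return max(lo, min(hi, (qty / ref) ** SCALE_POWER))


print(round(quantity_scale(1, "lb"), 2))     # → 1.68
print(round(quantity_scale(2, "lb"), 2))     # → 2.83
print(round(quantity_scale(1, "count"), 2))  # → 0.44
print(quantity_scale(20, "lb"))              # → 4.0 (capped)
```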
# ── Equipment detection ───────────────────────────────────────────────────
# ── Equipment detection (keyword → label, in detection priority order) ────
_EQUIPMENT_RULES: Final[list[tuple[re.Pattern[str], str]]] = [
(re.compile(r"\b(?:chop|dice|mince|slice|julienne)\b", re.IGNORECASE), "Knife"),
@ -321,8 +58,74 @@ _EQUIPMENT_RULES: Final[list[tuple[re.Pattern[str], str]]] = [
(re.compile(r"\b(?:drain|strain|colander|rinse pasta)\b", re.IGNORECASE), "Colander"),
]
# ── Dataclasses ───────────────────────────────────────────────────────────
@dataclass(frozen=True)
class StepAnalysis:
"""Analysis result for a single direction step."""
is_passive: bool
detected_minutes: int | None # None when no time mention found in text
@dataclass(frozen=True)
class TimeEffortProfile:
"""Aggregated time and effort profile for a full recipe."""
active_min: int # total minutes requiring active attention
passive_min: int # total minutes the cook can step away
total_min: int # active_min + passive_min
step_analyses: list[StepAnalysis] # one entry per direction step
equipment: list[str] # ordered, deduplicated equipment labels
effort_label: str # "quick" | "moderate" | "involved"
# ── Core parsing logic ────────────────────────────────────────────────────
def _extract_minutes(text: str) -> int | None:
"""Return the number of minutes mentioned in text, or None.
Range values (e.g. "15-20 minutes") return the integer midpoint.
Hours are converted to minutes. Seconds are rounded up to 1 minute minimum.
Result is capped at _MAX_MINUTES_PER_STEP.
"""
m = _TIME_RE.search(text)
if m is None:
return None
if m.group(1) is not None:
# Branch A: range match (e.g. "15-20 minutes")
low = int(m.group(1))
high = int(m.group(2))
unit = m.group(3).lower()
raw_value: float = (low + high) / 2
else:
# Branch B: single value match (e.g. "10 minutes")
low = int(m.group(4))
unit = m.group(5).lower()
raw_value = float(low)
if unit in ("hour", "hr"):
minutes: float = raw_value * 60
elif unit in ("second", "sec"):
minutes = max(1.0, math.ceil(raw_value / 60))
else:
minutes = raw_value
return min(int(minutes), _MAX_MINUTES_PER_STEP)
def _classify_passive(text: str) -> bool:
"""Return True if the step text matches any passive keyword (whole-word)."""
return _PASSIVE_RE.search(text) is not None
def _detect_equipment(all_text: str, has_passive: bool) -> list[str]:
"""Return ordered, deduplicated list of equipment labels detected in text.
all_text should be all direction steps joined with spaces.
has_passive controls whether 'Timer' is appended at the end.
"""
seen: set[str] = set()
result: list[str] = []
for pattern, label in _EQUIPMENT_RULES:
@@ -334,172 +137,8 @@ def _detect_equipment(all_text: str, has_passive: bool) -> list[str]:
return result
# ── Ingredient/step cross-reference ─────────────────────────────────────
def _ingredient_mentioned(text: str, name: str) -> bool:
"""True if `name` appears in `text` as a whole word.
Handles both regular plurals (onion -> onions) and -es plurals
(potato -> potatoes, tomato -> tomatoes).
"""
pattern = re.compile(r"\b" + re.escape(name.lower()) + r"(?:es|s)?\b", re.IGNORECASE)
return bool(pattern.search(text))
def _build_step_ingredient_qtys(
ingredients: list[str],
ingredient_names: list[str],
directions: list[str],
) -> list[dict[str, tuple[float, str]]]:
"""Return, for each direction step, {ing_name: (qty_for_this_step, unit)}.
Strategy:
- Filter ingredient pairs to prep-needing items only.
- Parse total quantities from the raw ingredient strings.
- For each step, try to find an inline quantity tied to that ingredient name.
- If no inline quantity, distribute the total evenly across all steps that
mention the ingredient (handles "3 onions" split across 2 steps).
"""
# Build total qty map for prep-needing ingredients
total_qtys: dict[str, tuple[float, str]] = {}
for raw, name in zip(ingredients, ingredient_names):
base = name.lower().strip()
if not _is_prep_needing(base):
continue
result = _extract_qty(raw)
if result is not None:
total_qtys[base] = result
if not total_qtys:
return [{} for _ in directions]
# Count how many steps mention each ingredient
step_counts: dict[str, int] = {n: 0 for n in total_qtys}
for step in directions:
for name in total_qtys:
if _ingredient_mentioned(step, name):
step_counts[name] += 1
# Build per-step qty maps
per_step: list[dict[str, tuple[float, str]]] = []
for step in directions:
step_map: dict[str, tuple[float, str]] = {}
for name, (total, unit) in total_qtys.items():
if not _ingredient_mentioned(step, name):
continue
# Try ingredient-specific inline quantity first
inline = _extract_inline_qty_for(step, name)
if inline is not None:
step_map[name] = inline
else:
# Distribute total across steps that reference this ingredient
n = max(step_counts.get(name, 1), 1)
step_map[name] = (total / n, unit)
per_step.append(step_map)
return per_step
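
The even-split fallback in `_build_step_ingredient_qtys` is easiest to see on a small input. This sketch keeps the plural-tolerant pattern from `_ingredient_mentioned` but leaves out the prep-needing filter (`_is_prep_needing` is not shown in this diff) and the inline lookup through `_extract_inline_qty_for`. `distribute_totals` is a hypothetical name.

```python
import re


def ingredient_mentioned(text: str, name: str) -> bool:
    """Whole-word match that also accepts -s and -es plurals."""
    pattern = re.compile(r"\b" + re.escape(name.lower()) + r"(?:es|s)?\b", re.IGNORECASE)
    return bool(pattern.search(text))


def distribute_totals(
    totals: dict[str, tuple[float, str]],
    directions: list[str],
) -> list[dict[str, tuple[float, str]]]:
    """Split each ingredient's total evenly over the steps that mention it.

    totals maps a normalised name to (total_qty, unit). Unlike the module,
    this sketch never looks for an inline quantity inside the step.
    """
    # How many steps reference each ingredient (True counts as 1).
    counts = {name: sum(ingredient_mentioned(step, name) for step in directions) for name in totals}
    per_step: list[dict[str, tuple[float, str]]] = []
    for step in directions:
        step_map: dict[str, tuple[float, str]] = {}
        for name, (total, unit) in totals.items():
            if ingredient_mentioned(step, name):
                step_map[name] = (total / max(counts[name], 1), unit)
        per_step.append(step_map)
    return per_step
```

With three onions mentioned in two steps, each of those steps is credited 1.5 onions and the step that never names onions gets an empty map.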
# ── Dataclasses ───────────────────────────────────────────────────────────
@dataclass(frozen=True)
class StepAnalysis:
"""Analysis result for a single direction step."""
is_passive: bool
detected_minutes: int | None # explicit or estimated time (None = no signal)
prep_min: int | None = None # estimated physical prep time from action detection
@dataclass(frozen=True)
class TimeEffortProfile:
"""Aggregated time and effort profile for a full recipe."""
active_min: int
passive_min: int
total_min: int
step_analyses: list[StepAnalysis] = field(default_factory=list)
equipment: list[str] = field(default_factory=list)
effort_label: str = "moderate" # "quick" | "moderate" | "involved"
# ── Core parsing helpers ──────────────────────────────────────────────────
def _extract_minutes(text: str) -> int | None:
"""Return explicit minutes from text, or None."""
m = _TIME_RE.search(text)
if m is None:
return None
if m.group(1) is not None:
low, high = int(m.group(1)), int(m.group(2))
unit = m.group(3).lower()
raw: float = (low + high) / 2
else:
low = int(m.group(4))
unit = m.group(5).lower()
raw = float(low)
if unit in ("hour", "hr"):
minutes: float = raw * 60
elif unit in ("second", "sec"):
minutes = max(1.0, math.ceil(raw / 60))
else:
minutes = raw
return min(int(minutes), _MAX_MINUTES_PER_STEP)
def _classify_passive(text: str) -> bool:
return _PASSIVE_RE.search(text) is not None
def _passive_default(text: str) -> int | None:
"""Return estimated passive minutes from per-keyword defaults."""
for pattern, minutes in _PASSIVE_DEFAULTS:
if pattern.search(text):
return minutes
return None
def _prep_estimate(
text: str,
step_ing_qtys: dict[str, tuple[float, str]],
) -> int:
"""Estimate active prep time from the first detected prep action + ingredient qtys.
If no prep-needing ingredient is identified in the step, uses the action's
base time at 1× (no scaling).
"""
m = _PREP_RE.search(text)
if m is None:
return 0
action = m.group(0).lower()
base = _PREP_ACTION_BASES.get(action, _ACTIVE_STEP_DEFAULT_MIN)
# Find which prep-needing ingredients this step mentions
matches: list[tuple[float, str]] = [
qty_unit
for name, qty_unit in step_ing_qtys.items()
if _ingredient_mentioned(text, name)
]
if not matches:
return round(base) # no ingredient context — use base unscaled
total = sum(base * _quantity_scale(qty, unit) for qty, unit in matches)
return round(total)
def _effort_label(total_min: int, step_count: int) -> str:
"""Effort label based on total estimated time; falls back to step count."""
if total_min > 0:
if total_min <= 20:
return "quick"
if total_min <= 45:
return "moderate"
return "involved"
# No time signals at all — fall back to step count heuristic
def _effort_label(step_count: int) -> str:
"""Derive effort label from step count."""
if step_count <= 3:
return "quick"
if step_count <= 7:
@@ -507,96 +146,52 @@ def _effort_label(total_min: int, step_count: int) -> str:
return "involved"
# ── Public API ────────────────────────────────────────────────────────────
def parse_time_effort(
directions: list[str],
ingredients: list[str] | None = None,
ingredient_names: list[str] | None = None,
) -> TimeEffortProfile:
"""Parse direction strings into a TimeEffortProfile.
Args:
directions: List of step strings from the recipe corpus.
ingredients: Raw ingredient strings ("2 large onions", "1.5 lbs potatoes").
Parallel to ingredient_names.
ingredient_names: Normalised ingredient names ("onion", "potato").
Required alongside ingredients to enable quantity scaling.
def parse_time_effort(directions: list[str]) -> TimeEffortProfile:
"""Parse a list of direction strings into a TimeEffortProfile.
Returns a zero-value profile with empty lists when directions is empty.
Never raises; all failures produce sensible defaults.
Never raises; all failures silently produce sensible defaults.
"""
if not directions:
return TimeEffortProfile(
active_min=0, passive_min=0, total_min=0,
step_analyses=[], equipment=[], effort_label="quick",
active_min=0,
passive_min=0,
total_min=0,
step_analyses=[],
equipment=[],
effort_label="quick",
)
# Build per-step ingredient quantity maps (empty dicts if no ingredient data)
use_ingredients = (
bool(ingredients)
and bool(ingredient_names)
and len(ingredients) == len(ingredient_names)
)
step_ing_qtys: list[dict[str, tuple[float, str]]]
if use_ingredients:
step_ing_qtys = _build_step_ingredient_qtys(
list(ingredients), # type: ignore[arg-type]
list(ingredient_names), # type: ignore[arg-type]
directions,
)
else:
step_ing_qtys = [{} for _ in directions]
step_analyses: list[StepAnalysis] = []
active_min = 0
passive_min = 0
has_any_passive = False
for i, step in enumerate(directions):
for step in directions:
is_passive = _classify_passive(step)
detected = _extract_minutes(step)
prep_estimate: int | None = None
if is_passive:
has_any_passive = True
if detected is not None:
passive_min += detected
else:
# Fall back to per-technique default
default = _passive_default(step)
if default is not None:
passive_min += default
detected = default # surface in UI as the hint time
else:
if detected is not None:
active_min += detected
# Estimate prep time from action detection + quantity scaling
prep_est = _prep_estimate(step, step_ing_qtys[i])
if prep_est > 0:
prep_estimate = prep_est
active_min += prep_est
elif detected is None:
# General active step with no time signal — apply a small default
active_min += round(_ACTIVE_STEP_DEFAULT_MIN)
step_analyses.append(StepAnalysis(
is_passive=is_passive,
detected_minutes=detected,
prep_min=prep_estimate,
))
combined_text = " ".join(directions)
equipment = _detect_equipment(combined_text, has_any_passive)
total = active_min + passive_min
return TimeEffortProfile(
active_min=active_min,
passive_min=passive_min,
total_min=total,
total_min=active_min + passive_min,
step_analyses=step_analyses,
equipment=equipment,
effort_label=_effort_label(total, len(directions)),
effort_label=_effort_label(len(directions)),
)
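
The time-parsing and effort-labelling rules above can be checked in isolation. The following is a minimal standalone sketch of `_extract_minutes`, `_quantity_scale` and the time-first `_effort_label`, not the module itself. `TIME_RE`, `MAX_MINUTES_PER_STEP`, `QTY_REFS`, `SCALE_POWER` and the clamp bounds are hypothetical stand-ins because the real constants are not part of this diff. The `"moderate"` result for 4 to 7 steps is inferred from the line hidden by the hunk boundary.

```python
import math
import re
from typing import Optional

# Hypothetical stand-ins for module constants this diff does not show.
MAX_MINUTES_PER_STEP = 480
TIME_RE = re.compile(
    # Groups 1-3: a range such as "15-20 minutes" or "15 to 20 minutes".
    r"(\d+)\s*(?:-|to)\s*(\d+)\s*(hour|hr|minute|min|second|sec)s?\b"
    # Groups 4-5: a single value such as "10 minutes".
    r"|(\d+)\s*(hour|hr|minute|min|second|sec)s?\b",
    re.IGNORECASE,
)
QTY_REFS = {"count": 1.0, "lb": 1.0, "cup": 1.0}
SCALE_POWER = 0.75
MIN_SCALE, MAX_SCALE = 0.5, 3.0


def extract_minutes(text: str) -> Optional[int]:
    """Range midpoint (truncated), hours -> minutes, seconds rounded up to >= 1."""
    m = TIME_RE.search(text)
    if m is None:
        return None
    if m.group(1) is not None:
        raw = (int(m.group(1)) + int(m.group(2))) / 2
        unit = m.group(3).lower()
    else:
        raw = float(m.group(4))
        unit = m.group(5).lower()
    if unit in ("hour", "hr"):
        minutes = raw * 60
    elif unit in ("second", "sec"):
        minutes = max(1.0, math.ceil(raw / 60))
    else:
        minutes = raw
    return min(int(minutes), MAX_MINUTES_PER_STEP)


def quantity_scale(qty: float, unit: str) -> float:
    """Sub-linear (qty / ref) ** 0.75 scaling, clamped to [MIN_SCALE, MAX_SCALE]."""
    ref = QTY_REFS.get(unit, 1.0)
    if ref <= 0 or qty <= 0:
        return 1.0
    return max(MIN_SCALE, min(MAX_SCALE, (qty / ref) ** SCALE_POWER))


def effort_label(total_min: int, step_count: int) -> str:
    """Time thresholds first; step count only when there is no time signal."""
    if total_min > 0:
        if total_min <= 20:
            return "quick"
        if total_min <= 45:
            return "moderate"
        return "involved"
    if step_count <= 3:
        return "quick"
    if step_count <= 7:
        return "moderate"
    return "involved"
```

Truncating the range midpoint means "15-20 minutes" counts as 17 minutes, and the 0.75 exponent makes four items cost less than four times the prep of one (4 ** 0.75 is about 2.83).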

@@ -1,124 +0,0 @@
# app/services/task_inference.py
# BSL 1.1 — LLM feature
"""Task-based service allocation via the cf-orch coordinator.
Calls POST /api/inference/task instead of a hardcoded service type.
The coordinator resolves model_id and service_type from assignments.yaml.
Fallback contract (for callers):
- 404 -> TaskNotRegistered (fall back to direct client.allocate())
- other error -> RuntimeError
- CF_ORCH_URL unset -> RuntimeError (guard with os.environ.get first)
"""
from __future__ import annotations
import logging
import os
from collections.abc import Generator
from contextlib import contextmanager
from dataclasses import dataclass
import httpx
logger = logging.getLogger(__name__)
class TaskNotRegistered(Exception):
"""Coordinator returned 404 for a product/task pair.
Means the task is not yet in assignments.yaml. Callers should fall
back to direct service allocation (client.allocate()).
"""
@dataclass(frozen=True)
class Allocation:
url: str
allocation_id: str
service: str
def _orch_url() -> str:
return os.environ.get("CF_ORCH_URL", "").rstrip("/")
@contextmanager
def task_allocate(
product: str,
task: str,
*,
service_hint: str,
ttl_s: float = 120.0,
) -> Generator[Allocation, None, None]:
"""Context manager: allocate a service via task-based routing.
Calls POST /api/inference/task, yields Allocation, releases on exit.
Supports both `with task_allocate(...) as alloc:` and manual
`ctx = task_allocate(...); alloc = ctx.__enter__()` patterns.
**Sync-only**: uses the synchronous httpx API. Do not call from an
``async def`` handler without wrapping in ``asyncio.to_thread``. Current
call sites (``llm_router.py``, ``vl_model.py``) are synchronous.
Args:
product: CF product name (e.g. "kiwi")
task: Task identifier (e.g. "meal_plan", "ocr")
service_hint: Service type for the release DELETE call. The
coordinator response does not include service_type, so the
caller provides it. When the coordinator is updated to return
service in the response (cf-orch#63), this becomes unused.
ttl_s: Allocation TTL in seconds.
Raises:
TaskNotRegistered: Coordinator returned 404.
RuntimeError: Coordinator unreachable, returned non-404 error, or
returned a malformed (non-JSON / missing fields) response.
RuntimeError: CF_ORCH_URL is not set.
"""
base = _orch_url()
if not base:
raise RuntimeError("CF_ORCH_URL is not set")
try:
resp = httpx.post(
f"{base}/api/inference/task",
json={"product": product, "task": task, "payload": {}},
timeout=30.0,
)
except httpx.RequestError as exc:
raise RuntimeError(f"cf-orch unreachable: {exc}") from exc
if resp.status_code == 404:
raise TaskNotRegistered(
f"No assignment for product={product!r} task={task!r}; "
"ensure cf-orch#61/62 are deployed and coordinator reloaded"
)
if not resp.is_success:
raise RuntimeError(
f"cf-orch /api/inference/task failed: "
f"HTTP {resp.status_code}: {resp.text[:200]}"
)
try:
data = resp.json()
alloc = Allocation(
url=data["url"],
allocation_id=data["allocation_id"],
service=data.get("service") or service_hint,
)
except (KeyError, TypeError, ValueError) as exc:
raise RuntimeError(
f"cf-orch /api/inference/task returned malformed response: {exc}; "
f"body: {resp.text[:200]}"
) from exc
try:
yield alloc
finally:
try:
httpx.delete(
f"{base}/api/services/{alloc.service}/allocations/{alloc.allocation_id}",
timeout=10.0,
)
except Exception as exc:
logger.debug("cf-orch task allocation release failed (non-fatal): %s", exc)
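
The fallback contract in the module docstring leaves the retry to the caller. What follows is a minimal caller-side sketch. `run_task` and `direct_allocate` are hypothetical names; `direct_allocate` stands in for `client.allocate()`, whose real signature is not shown here, and `TaskNotRegistered` is redefined locally so the block runs on its own. An `ExitStack` limits the `except` to the allocation call, so an exception raised inside `work()` is never mistaken for a 404, and the allocation is released only after `work()` returns.

```python
import os
from contextlib import ExitStack


class TaskNotRegistered(Exception):
    """Local stand-in for task_inference.TaskNotRegistered."""


def run_task(work, *, task_allocate, direct_allocate, product, task, service_hint):
    """Call work(url) on a task-routed allocation, following the fallback contract.

    task_allocate:   context-manager factory with task_allocate's signature.
    direct_allocate: hypothetical stand-in for client.allocate(); takes a
                     service type and returns a base URL.
    Any RuntimeError from task_allocate (coordinator down, non-404 error,
    malformed response) propagates unchanged.
    """
    # CF_ORCH_URL unset: task_allocate would raise RuntimeError, so skip it.
    if not os.environ.get("CF_ORCH_URL"):
        return work(direct_allocate(service_hint))
    with ExitStack() as stack:
        try:
            alloc = stack.enter_context(
                task_allocate(product, task, service_hint=service_hint)
            )
        except TaskNotRegistered:
            # 404: the product/task pair is not in assignments.yaml yet.
            return work(direct_allocate(service_hint))
        # The allocation is released when the ExitStack closes, after work() returns.
        return work(alloc.url)
```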

@@ -15,7 +15,6 @@ KIWI_BYOK_UNLOCKABLE: frozenset[str] = frozenset({
"recipe_suggestions",
"expiry_llm_matching",
"receipt_ocr",
"recipe_scan",
"style_classifier",
"meal_plan_llm",
"meal_plan_llm_timing",
@@ -59,9 +58,6 @@ KIWI_FEATURES: dict[str, str] = {
"community_publish": "paid", # Publish plans/outcomes to community feed
"community_fork_adapt": "paid", # Fork with LLM pantry adaptation (BYOK-unlockable)
# Paid tier (continued)
"recipe_scan": "paid", # BYOK-unlockable: photo -> structured recipe
# Premium tier
"multi_household": "premium",
"background_monitoring": "premium",

@@ -18,10 +18,6 @@ server {
proxy_set_header X-CF-Session $http_x_cf_session;
# Allow image uploads (barcode/receipt photos from phone cameras).
client_max_body_size 20m;
# LLM inference (recipe suggestions, expiry fallback) can take 60-120s.
# Default proxy_read_timeout is 60s, which causes 504s on full recipe generation.
proxy_read_timeout 180s;
proxy_send_timeout 180s;
}
# Direct-port LAN access (localhost:8515): when VITE_API_BASE='/kiwi', the frontend
@@ -38,8 +34,6 @@ server {
proxy_set_header X-Forwarded-Proto $http_x_forwarded_proto;
proxy_set_header X-CF-Session $http_x_cf_session;
client_max_body_size 20m;
proxy_read_timeout 180s;
proxy_send_timeout 180s;
}
# When accessed directly (localhost:8515) instead of via Caddy (/kiwi path-strip),

@@ -2,13 +2,8 @@
<html lang="en">
<head>
<meta charset="UTF-8" />
<link rel="icon" type="image/png" sizes="192x192" href="/icons/icon-192.png" />
<link rel="apple-touch-icon" href="/icons/icon-192.png" />
<link rel="icon" type="image/svg+xml" href="/vite.svg" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, viewport-fit=cover" />
<meta name="theme-color" content="#e8a820" />
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
<meta name="apple-mobile-web-app-title" content="Kiwi" />
<title>Kiwi — Pantry Tracker</title>
<link rel="preconnect" href="https://fonts.googleapis.com" />
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />

File diff suppressed because it is too large.

@ -20,7 +20,6 @@
"@vue/tsconfig": "^0.8.1",
"typescript": "~5.9.3",
"vite": "^7.1.7",
"vite-plugin-pwa": "^1.2.0",
"vue-tsc": "^3.1.0"
}
}

Binary file not shown.

Before: 1.6 KiB

Binary file not shown.

Before: 4.3 KiB

Binary file not shown.

Before: 1.2 KiB

Binary file not shown.

Before: 3.5 KiB

@@ -106,39 +106,6 @@
<span class="form-hint">How you appear on posts -- not your real name or email.</span>
</div>
<!-- Similarity check results -->
<div
v-if="similarPosts.length > 0"
class="similar-panel"
role="region"
aria-label="Similar stories found"
>
<p class="similar-heading text-sm">
<strong>Similar stories already exist.</strong>
You can publish as-is, mark yours as a variation, or cancel.
</p>
<ul class="similar-list" aria-label="Existing similar posts">
<li
v-for="hit in similarPosts"
:key="hit.slug"
class="similar-item"
>
<span class="similar-tier-badge" :class="`tier-${hit.similarity_tier}`">
{{ tierLabel(hit.similarity_tier) }}
</span>
<span class="similar-title">{{ hit.title }}</span>
<span class="similar-by text-muted text-xs">by {{ hit.pseudonym }}</span>
<button
class="btn-link text-xs"
:class="{ 'selected-ref': selectedRef === hit.slug }"
@click="toggleRef(hit.slug)"
>
{{ selectedRef === hit.slug ? 'Unmark variation' : 'Mark as variation' }}
</button>
</li>
</ul>
</div>
<!-- Submission feedback (aria-live region, always rendered) -->
<div
class="feedback-region"
@@ -152,24 +119,13 @@
<!-- Footer actions -->
<div class="modal-footer flex gap-sm">
<button
v-if="!similarPosts.length || similarChecked"
class="btn btn-primary"
:disabled="submitting || !title.trim()"
:aria-busy="submitting"
@click="onSubmit"
>
<span v-if="submitting" class="spinner spinner-sm" aria-hidden="true"></span>
{{ submitting ? 'Publishing...' : (selectedRef ? 'Publish as variation' : 'Publish') }}
</button>
<button
v-else
class="btn btn-primary"
:disabled="checking || !title.trim()"
:aria-busy="checking"
@click="onCheckThenSubmit"
>
<span v-if="checking" class="spinner spinner-sm" aria-hidden="true"></span>
{{ checking ? 'Checking...' : 'Publish' }}
{{ submitting ? 'Publishing...' : 'Publish' }}
</button>
<button class="btn btn-secondary" @click="$emit('close')">
Cancel
@@ -183,7 +139,7 @@
<script setup lang="ts">
import { ref, onMounted, onUnmounted, nextTick } from 'vue'
import { useCommunityStore } from '../stores/community'
import type { PublishPayload, SimilarPost, SimilarityTier } from '../stores/community'
import type { PublishPayload } from '../stores/community'
const props = defineProps<{
recipeId: number | null
@@ -206,21 +162,6 @@ const submitting = ref(false)
const submitError = ref<string | null>(null)
const submitSuccess = ref<string | null>(null)
const checking = ref(false)
const similarChecked = ref(false)
const similarPosts = ref<SimilarPost[]>([])
const selectedRef = ref<string | null>(null)
function tierLabel(tier: SimilarityTier): string {
if (tier === 'exact_recipe') return 'Same recipe'
if (tier === 'very_similar') return 'Very similar'
return 'Similar'
}
function toggleRef(slug: string) {
selectedRef.value = selectedRef.value === slug ? null : slug
}
const dialogRef = ref<HTMLElement | null>(null)
const firstFocusRef = ref<HTMLButtonElement | null>(null)
let previousFocus: HTMLElement | null = null
@@ -274,17 +215,6 @@ onUnmounted(() => {
previousFocus?.focus()
})
async function onCheckThenSubmit() {
if (!title.value.trim()) return
checking.value = true
similarPosts.value = await store.checkSimilar(title.value.trim(), props.recipeId, postType.value)
similarChecked.value = true
checking.value = false
if (!similarPosts.value.length) {
await onSubmit()
}
}
async function onSubmit() {
submitError.value = null
submitSuccess.value = null
@@ -298,7 +228,6 @@ async function onSubmit() {
if (outcomeNotes.value.trim()) payload.outcome_notes = outcomeNotes.value.trim()
if (pseudonymName.value.trim()) payload.pseudonym_name = pseudonymName.value.trim()
if (props.recipeId != null) payload.recipe_id = props.recipeId
if (selectedRef.value) payload.similar_to_ref = selectedRef.value
submitting.value = true
try {
@@ -420,82 +349,6 @@ async function onSubmit() {
flex-wrap: wrap;
}
.similar-panel {
background: var(--color-surface-alt, var(--color-surface));
border: 1px solid var(--color-warning, #f59e0b);
border-radius: var(--radius-md);
padding: var(--spacing-sm) var(--spacing-md);
margin-bottom: var(--spacing-md);
}
.similar-heading {
margin: 0 0 var(--spacing-sm);
}
.similar-list {
list-style: none;
margin: 0;
padding: 0;
display: flex;
flex-direction: column;
gap: var(--spacing-xs);
}
.similar-item {
display: flex;
align-items: baseline;
gap: var(--spacing-xs);
flex-wrap: wrap;
}
.similar-tier-badge {
font-size: var(--font-size-xs);
font-weight: 700;
padding: 1px 6px;
border-radius: var(--radius-sm);
flex-shrink: 0;
}
.tier-exact_recipe {
background: var(--color-error-bg, #fee2e2);
color: var(--color-error, #dc2626);
}
.tier-very_similar {
background: var(--color-warning-bg, #fef3c7);
color: var(--color-warning-text, #92400e);
}
.tier-somewhat_similar {
background: var(--color-surface-alt, #f3f4f6);
color: var(--color-text-secondary);
}
.similar-title {
font-weight: 600;
font-size: var(--font-size-sm);
}
.similar-by {
flex-shrink: 0;
}
.btn-link {
background: none;
border: none;
color: var(--color-primary);
cursor: pointer;
padding: 0;
text-decoration: underline;
font-size: var(--font-size-xs);
margin-left: auto;
}
.btn-link.selected-ref {
color: var(--color-success);
font-weight: 700;
}
@media (max-width: 480px) {
.modal-panel {
max-height: 95vh;

@@ -78,39 +78,6 @@
<span class="form-hint">How you appear on posts -- not your real name or email.</span>
</div>
<!-- Similarity check results (shown before final confirm) -->
<div
v-if="similarPosts.length > 0"
class="similar-panel"
role="region"
aria-label="Similar posts found"
>
<p class="similar-heading text-sm">
<strong>Similar plans already exist.</strong>
You can publish as-is, mark yours as a variation, or cancel.
</p>
<ul class="similar-list" aria-label="Existing similar posts">
<li
v-for="hit in similarPosts"
:key="hit.slug"
class="similar-item"
>
<span class="similar-tier-badge" :class="`tier-${hit.similarity_tier}`">
{{ tierLabel(hit.similarity_tier) }}
</span>
<span class="similar-title">{{ hit.title }}</span>
<span class="similar-by text-muted text-xs">by {{ hit.pseudonym }}</span>
<button
class="btn-link text-xs"
:class="{ 'selected-ref': selectedRef === hit.slug }"
@click="toggleRef(hit.slug)"
>
{{ selectedRef === hit.slug ? 'Unmark variation' : 'Mark as variation' }}
</button>
</li>
</ul>
</div>
<!-- Submission feedback (aria-live region, always rendered) -->
<div
class="feedback-region"
@@ -124,24 +91,13 @@
<!-- Footer actions -->
<div class="modal-footer flex gap-sm">
<button
v-if="!similarPosts.length || similarChecked"
class="btn btn-primary"
:disabled="submitting || !title.trim()"
:aria-busy="submitting"
@click="onSubmit"
>
<span v-if="submitting" class="spinner spinner-sm" aria-hidden="true"></span>
{{ submitting ? 'Publishing...' : (selectedRef ? 'Publish as variation' : 'Publish') }}
</button>
<button
v-else
class="btn btn-primary"
:disabled="checking || !title.trim()"
:aria-busy="checking"
@click="onCheckThenSubmit"
>
<span v-if="checking" class="spinner spinner-sm" aria-hidden="true"></span>
{{ checking ? 'Checking...' : 'Publish' }}
{{ submitting ? 'Publishing...' : 'Publish' }}
</button>
<button class="btn btn-secondary" @click="$emit('close')">
Cancel
@@ -155,7 +111,7 @@
<script setup lang="ts">
import { ref, onMounted, onUnmounted, nextTick } from 'vue'
import { useCommunityStore } from '../stores/community'
import type { PublishPayload, SimilarPost, SimilarityTier } from '../stores/community'
import type { PublishPayload } from '../stores/community'
const props = defineProps<{
plan?: {
@@ -180,21 +136,6 @@ const submitting = ref(false)
const submitError = ref<string | null>(null)
const submitSuccess = ref<string | null>(null)
const checking = ref(false)
const similarChecked = ref(false)
const similarPosts = ref<SimilarPost[]>([])
const selectedRef = ref<string | null>(null)
function tierLabel(tier: SimilarityTier): string {
if (tier === 'exact_recipe') return 'Same recipe'
if (tier === 'very_similar') return 'Very similar'
return 'Similar'
}
function toggleRef(slug: string) {
selectedRef.value = selectedRef.value === slug ? null : slug
}
const dialogRef = ref<HTMLElement | null>(null)
const firstFocusRef = ref<HTMLInputElement | null>(null)
let previousFocus: HTMLElement | null = null
@@ -248,19 +189,6 @@ onUnmounted(() => {
previousFocus?.focus()
})
async function onCheckThenSubmit() {
if (!title.value.trim()) return
checking.value = true
const planRecipeIds = props.plan?.slots?.map((s) => s.recipe_id) ?? []
const firstRecipeId = planRecipeIds[0] ?? null
similarPosts.value = await store.checkSimilar(title.value.trim(), firstRecipeId, 'plan')
similarChecked.value = true
checking.value = false
if (!similarPosts.value.length) {
await onSubmit()
}
}
async function onSubmit() {
submitError.value = null
submitSuccess.value = null
@@ -277,7 +205,6 @@ async function onSubmit() {
if (props.plan?.slots?.length) {
payload.slots = props.plan.slots.map(({ day, meal_type, recipe_id }) => ({ day, meal_type, recipe_id }))
}
if (selectedRef.value) payload.similar_to_ref = selectedRef.value
submitting.value = true
try {
@@ -368,82 +295,6 @@ async function onSubmit() {
flex-wrap: wrap;
}
.similar-panel {
background: var(--color-surface-alt, var(--color-surface));
border: 1px solid var(--color-warning, #f59e0b);
border-radius: var(--radius-md);
padding: var(--spacing-sm) var(--spacing-md);
margin-bottom: var(--spacing-md);
}
.similar-heading {
margin: 0 0 var(--spacing-sm);
}
.similar-list {
list-style: none;
margin: 0;
padding: 0;
display: flex;
flex-direction: column;
gap: var(--spacing-xs);
}
.similar-item {
display: flex;
align-items: baseline;
gap: var(--spacing-xs);
flex-wrap: wrap;
}
.similar-tier-badge {
font-size: var(--font-size-xs);
font-weight: 700;
padding: 1px 6px;
border-radius: var(--radius-sm);
flex-shrink: 0;
}
.tier-exact_recipe {
background: var(--color-error-bg, #fee2e2);
color: var(--color-error, #dc2626);
}
.tier-very_similar {
background: var(--color-warning-bg, #fef3c7);
color: var(--color-warning-text, #92400e);
}
.tier-somewhat_similar {
background: var(--color-surface-alt, #f3f4f6);
color: var(--color-text-secondary);
}
.similar-title {
font-weight: 600;
font-size: var(--font-size-sm);
}
.similar-by {
flex-shrink: 0;
}
.btn-link {
background: none;
border: none;
color: var(--color-primary);
cursor: pointer;
padding: 0;
text-decoration: underline;
font-size: var(--font-size-xs);
margin-left: auto;
}
.btn-link.selected-ref {
color: var(--color-success);
font-weight: 700;
}
@media (max-width: 480px) {
.modal-panel {
max-height: 95vh;

@@ -6,7 +6,6 @@
v-for="domain in domains"
:key="domain.id"
:class="['btn', activeDomain === domain.id ? 'btn-primary' : 'btn-secondary']"
:aria-pressed="activeDomain === domain.id"
@click="selectDomain(domain.id)"
>
{{ domain.label }}
@@ -25,7 +24,6 @@
<div v-else class="category-list mb-sm flex flex-wrap gap-xs">
<button
:class="['btn', 'btn-secondary', 'cat-btn', { active: activeCategory === '_all' }]"
:aria-pressed="activeCategory === '_all'"
@click="selectCategory('_all')"
>
All
@@ -34,7 +32,6 @@
v-for="cat in categories"
:key="cat.category"
:class="['btn', 'btn-secondary', 'cat-btn', { active: activeCategory === cat.category }]"
:aria-pressed="activeCategory === cat.category"
@click="selectCategory(cat.category)"
>
{{ cat.category }}
@@ -60,7 +57,6 @@
<template v-else>
<button
:class="['btn', 'btn-secondary', 'subcat-btn', { active: activeSubcategory === null }]"
:aria-pressed="activeSubcategory === null"
@click="selectSubcategory(null)"
>
All {{ activeCategory }}
@@ -69,7 +65,6 @@
v-for="sub in subcategories"
:key="sub.subcategory"
:class="['btn', 'btn-secondary', 'subcat-btn', { active: activeSubcategory === sub.subcategory }]"
:aria-pressed="activeSubcategory === sub.subcategory"
@click="selectSubcategory(sub.subcategory)"
>
{{ sub.subcategory }}
@@ -84,25 +79,6 @@
</template>
</div>
<!-- Browse breadcrumb: shows the current position in the domain > category > subcategory hierarchy -->
<nav v-if="activeDomain && activeCategory" class="browse-breadcrumb" aria-label="Browse location">
<button
class="crumb-btn"
@click="selectDomain(activeDomain)"
:aria-current="!activeCategory ? 'page' : undefined"
>{{ domains.find(d => d.id === activeDomain)?.label ?? activeDomain }}</button>
<span class="crumb-sep" aria-hidden="true"></span>
<button
class="crumb-btn"
@click="selectCategory(activeCategory)"
:aria-current="!activeSubcategory ? 'page' : undefined"
>{{ activeCategory === '_all' ? 'All' : activeCategory }}</button>
<template v-if="activeSubcategory">
<span class="crumb-sep" aria-hidden="true"></span>
<span class="crumb-current" aria-current="page">{{ activeSubcategory }}</span>
</template>
</nav>
<!-- Recipe grid -->
<template v-if="activeCategory">
<div v-if="loadingRecipes" class="text-secondary text-sm">Loading recipes</div>
@@ -117,37 +93,24 @@
placeholder="Filter by title…"
class="browser-search"
/>
<input
v-model="requiredIngredient"
@keyup.enter="onRequiredIngredientCommit"
@search="onRequiredIngredientCommit"
type="search"
placeholder="Must include ingredient… (Enter)"
class="browser-search"
title="Type an ingredient and press Enter to filter"
/>
<div class="sort-btns flex gap-xs">
<button
:class="['btn', 'btn-secondary', 'sort-btn', { active: sortOrder === 'default' }]"
:aria-pressed="sortOrder === 'default'"
@click="setSort('default')"
title="Corpus order"
>Default</button>
<button
:class="['btn', 'btn-secondary', 'sort-btn', { active: sortOrder === 'alpha' }]"
:aria-pressed="sortOrder === 'alpha'"
@click="setSort('alpha')"
title="Alphabetical A→Z"
>A→Z</button>
<button
:class="['btn', 'btn-secondary', 'sort-btn', { active: sortOrder === 'alpha_desc' }]"
:aria-pressed="sortOrder === 'alpha_desc'"
@click="setSort('alpha_desc')"
title="Alphabetical Z→A"
>Z→A</button>
<button
:class="['btn', 'btn-secondary', 'sort-btn', { active: sortOrder === 'match' }]"
:aria-pressed="sortOrder === 'match'"
:disabled="pantryCount === 0"
@click="setSort('match')"
:title="pantryCount > 0 ? 'Sort by pantry match %' : 'Add items to pantry to sort by match'"
@@ -156,27 +119,20 @@
</div>
<div class="results-header flex-between mb-sm">
<span
class="text-sm text-secondary"
aria-live="polite"
aria-atomic="true"
>
<span class="text-sm text-secondary">
{{ total }} recipes
<span v-if="pantryCount > 0"> · pantry match shown</span>
<span v-if="requiredIngredient.trim()"> · must include "{{ requiredIngredient.trim() }}"</span>
</span>
<div class="pagination flex gap-xs">
<button
class="btn btn-secondary btn-xs"
:disabled="page <= 1"
aria-label="Previous page"
@click="changePage(page - 1)"
>← Prev</button>
<span class="text-sm text-secondary page-indicator" aria-live="polite">{{ page }} / {{ totalPages }}</span>
<span class="text-sm text-secondary page-indicator">{{ page }} / {{ totalPages }}</span>
<button
class="btn btn-secondary btn-xs"
:disabled="page >= totalPages"
aria-label="Next page"
@click="changePage(page + 1)"
>Next →</button>
</div>
@@ -354,7 +310,6 @@ const loadingDomains = ref(false)
const loadingRecipes = ref(false)
const savingRecipe = ref<BrowserRecipe | null>(null)
const searchQuery = ref('')
const requiredIngredient = ref('')
const sortOrder = ref<'default' | 'alpha' | 'alpha_desc' | 'match'>('default')
let searchDebounce: ReturnType<typeof setTimeout> | null = null
let tagSearchDebounce: ReturnType<typeof setTimeout> | null = null
@@ -431,19 +386,6 @@ function onSearchInput() {
}, 350)
}
function onRequiredIngredientCommit() {
page.value = 1
loadRecipes()
}
// Auto-clear results when the field is emptied via backspace/select-delete
watch(requiredIngredient, (val, prev) => {
if (val === '' && prev !== '') {
page.value = 1
loadRecipes()
}
})
function setSort(s: 'default' | 'alpha' | 'alpha_desc' | 'match') {
if (sortOrder.value === s) return
sortOrder.value = s
@ -468,7 +410,6 @@ async function selectDomain(domainId: string) {
total.value = 0
page.value = 1
searchQuery.value = ''
requiredIngredient.value = ''
sortOrder.value = 'default'
categories.value = await browserAPI.listCategories(domainId)
// Auto-select the most-populated category so content appears immediately.
@ -535,7 +476,6 @@ async function loadRecipes() {
subcategory: activeSubcategory.value ?? undefined,
q: searchQuery.value.trim() || undefined,
sort: sortOrder.value !== 'default' ? sortOrder.value : undefined,
required_ingredient: requiredIngredient.value.trim() || undefined,
}
)
recipes.value = result.recipes
@ -587,10 +527,8 @@ function onTagSearchInput() {
tagSearchDebounce = setTimeout(async () => {
tagModal.value.searching = true
try {
// Use the first available domain with category=_all to search all recipes by title.
// Domain must be a real domain slug; '_all' is not valid at the browse endpoint.
const searchDomain = domains.value[0]?.id ?? 'cuisine'
const res = await browserAPI.browse(searchDomain, '_all', { page: 1, q })
// Re-use the browser API: browse all recipes filtered by title substring
const res = await browserAPI.browse('_all', '_all', { page: 1, q })
tagModal.value.results = (res.recipes ?? []).slice(0, 8).map(
(r: { id: number; title: string }) => ({ id: r.id, title: r.title })
)
@ -888,40 +826,4 @@ async function submitTag() {
font-size: 0.875rem;
margin-left: 0.5rem;
}
/* ── Browse breadcrumb ───────────────────────────────────────────────────── */
.browse-breadcrumb {
display: flex;
align-items: center;
flex-wrap: wrap;
gap: 2px;
margin-bottom: var(--spacing-sm);
font-size: var(--font-size-xs, 0.78rem);
color: var(--color-text-secondary);
}
.crumb-btn {
background: none;
border: none;
padding: 2px 4px;
cursor: pointer;
color: var(--color-primary);
font-size: inherit;
border-radius: var(--radius-sm);
}
.crumb-btn:hover {
text-decoration: underline;
}
.crumb-sep {
opacity: 0.5;
padding: 0 2px;
}
.crumb-current {
padding: 2px 4px;
color: var(--color-text);
font-weight: 500;
}
</style>
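In `loadRecipes()`, each optional browse filter (`q`, `sort`, and the new `required_ingredient`) collapses to `undefined` when it is blank or at its default, so only active filters reach the API. Below is a minimal sketch of that pattern. It assumes the HTTP client drops `undefined` values when building the query string; `BrowseOptions`, `buildBrowseOptions`, and `toBrowseQuery` are hypothetical names for illustration, not part of `browserAPI`.

```typescript
// Hypothetical shape mirroring the options passed to browserAPI.browse() in the diff.
interface BrowseOptions {
  page: number
  subcategory?: string
  q?: string
  sort?: 'alpha' | 'alpha_desc' | 'match'
  required_ingredient?: string
}

type SortOrder = 'default' | 'alpha' | 'alpha_desc' | 'match'

// Build options the way loadRecipes() does: blank strings and the
// 'default' sort collapse to undefined.
function buildBrowseOptions(
  page: number,
  searchQuery: string,
  sortOrder: SortOrder,
  requiredIngredient: string,
  subcategory: string | null = null,
): BrowseOptions {
  return {
    page,
    subcategory: subcategory ?? undefined,
    q: searchQuery.trim() || undefined,
    sort: sortOrder !== 'default' ? sortOrder : undefined,
    required_ingredient: requiredIngredient.trim() || undefined,
  }
}

// Serialize only the keys that carry a value (assumption: the real client
// does the equivalent). Keys keep their insertion order.
function toBrowseQuery(opts: BrowseOptions): string {
  const parts: string[] = []
  for (const key of Object.keys(opts) as Array<keyof BrowseOptions>) {
    const value = opts[key]
    if (value !== undefined) {
      parts.push(`${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`)
    }
  }
  return parts.join('&')
}
```

With every filter at its default, only `page` is sent; typing a required ingredient adds a single `required_ingredient` parameter without disturbing the others.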

@ -225,23 +225,6 @@
</ol>
</details>
<!-- Community tags: accepted location tags from other users -->
<div v-if="communityTags.length > 0" class="detail-section community-tags-section">
<h3 class="section-label">Community categories</h3>
<div class="community-tags-list">
<span
v-for="tag in communityTags"
:key="tag.id"
class="community-tag-chip"
:class="{ 'community-tag-chip--accepted': tag.accepted }"
:title="tag.accepted ? 'Confirmed by the community' : 'Pending confirmation'"
>
{{ tag.domain }} {{ tag.category }}<template v-if="tag.subcategory"> {{ tag.subcategory }}</template>
<span v-if="tag.accepted" class="community-tag-check" aria-label="Confirmed">✓</span>
</span>
</div>
</div>
<!-- Bottom padding so last step isn't hidden behind sticky footer -->
<div style="height: var(--spacing-xl)" />
</div>
@ -293,31 +276,6 @@
<span class="cook-success-icon"></span>
Enjoy your meal! Recipe dismissed from suggestions.
<button class="btn btn-secondary btn-sm mt-xs" @click="$emit('close')">Close</button>
<!-- Leftover shelf-life section -->
<div v-if="leftoversLoading" class="leftovers-panel text-sm text-secondary mt-sm">
Working out storage info…
</div>
<div v-else-if="leftovers && !leftoversDismissed" class="leftovers-panel mt-sm">
<div class="leftovers-header flex-between">
<span class="text-sm font-semibold">Leftovers</span>
<button class="btn-icon btn-xs" @click="leftoversDismissed = true" aria-label="Dismiss storage info">×</button>
</div>
<div class="leftovers-grid mt-xs">
<div class="leftovers-cell">
<span class="leftovers-icon"></span>
<span class="text-sm">Fridge: <strong>{{ leftovers.fridge_days }} day{{ leftovers.fridge_days !== 1 ? 's' : '' }}</strong></span>
</div>
<div v-if="leftovers.freeze_days !== null" class="leftovers-cell">
<span class="leftovers-icon">🧊</span>
<span class="text-sm">Freezer: <strong>{{ leftovers.freeze_days }} day{{ leftovers.freeze_days !== 1 ? 's' : '' }}</strong></span>
</div>
</div>
<p v-if="leftovers.freeze_by_day" class="text-xs text-secondary mt-xs">
Freeze by day {{ leftovers.freeze_by_day }} for best results.
</p>
<p class="text-xs text-secondary mt-xs">{{ leftovers.storage_advice }}</p>
</div>
</div>
<template v-else>
<button class="btn btn-secondary" @click="$emit('close')">Back</button>
@ -371,7 +329,7 @@
import { ref, computed, onMounted, onUnmounted, nextTick } from 'vue'
import { useRecipesStore } from '../stores/recipes'
import { useSavedRecipesStore } from '../stores/savedRecipes'
import { inventoryAPI, recipesAPI, browserAPI } from '../services/api'
import { inventoryAPI } from '../services/api'
import type { RecipeSuggestion, GroceryLink, StepAnalysis } from '../services/api'
import SaveRecipeModal from './SaveRecipeModal.vue'
@ -403,12 +361,6 @@ onMounted(() => {
)
;(focusable ?? dialogRef.value)?.focus()
})
// Load community tags in the background (non-critical); skip silently on error
browserAPI.listRecipeTags(props.recipe.id).then((tags) => {
communityTags.value = tags
}).catch(() => {
// Community tags are supplemental; silently skip on error
})
})
onUnmounted(() => {
@ -434,16 +386,6 @@ const isSaved = computed(() => savedStore.isSaved(props.recipe.id))
const cookDone = ref(false)
// Community tags
type CommunityTag = { id: number; domain: string; category: string; subcategory: string | null; pseudonym: string; upvotes: number; accepted: boolean }
const communityTags = ref<CommunityTag[]>([])
// Leftover shelf-life
type LeftoversData = { fridge_days: number; freeze_days: number | null; freeze_by_day: number | null; storage_advice: string }
const leftovers = ref<LeftoversData | null>(null)
const leftoversLoading = ref(false)
const leftoversDismissed = ref(false)
// Cook mode
const cookModeActive = ref(false)
const cookStep = ref(0) // 0-indexed
@ -680,20 +622,10 @@ function groceryLinkFor(ingredient: string): GroceryLink | undefined {
return props.groceryLinks.find((l) => l.ingredient.toLowerCase() === needle)
}
async function handleCook() {
function handleCook() {
recipesStore.logCook(props.recipe.id, props.recipe.title)
cookDone.value = true
emit('cooked', props.recipe)
if (props.recipe.id) {
leftoversLoading.value = true
try {
leftovers.value = await recipesAPI.getLeftovers(props.recipe.id)
} catch {
// Silently skip: shelf life is supplemental info, not critical
} finally {
leftoversLoading.value = false
}
}
}
</script>
@ -1626,68 +1558,4 @@ details[open].steps-collapsible .steps-collapsible-summary::before {
padding: var(--spacing-xs) var(--spacing-sm);
font-size: var(--font-size-sm);
}
.leftovers-panel {
background: var(--color-surface-alt, var(--color-surface));
border: 1px solid var(--color-border);
border-radius: var(--radius-md);
padding: var(--spacing-sm);
text-align: left;
}
.leftovers-header {
align-items: center;
}
.leftovers-grid {
display: flex;
gap: var(--spacing-md);
flex-wrap: wrap;
}
.leftovers-cell {
display: flex;
align-items: center;
gap: var(--spacing-xs);
}
.leftovers-icon {
font-size: 1rem;
line-height: 1;
}
/* ── Community tags section ──────────────────────────────── */
.community-tags-section {
padding-top: var(--spacing-sm);
}
.community-tags-list {
display: flex;
flex-wrap: wrap;
gap: var(--spacing-xs);
}
.community-tag-chip {
display: inline-flex;
align-items: center;
gap: 0.25rem;
padding: 2px var(--spacing-sm);
border-radius: var(--radius-pill, 999px);
font-size: var(--font-size-xs, 0.72rem);
background: var(--color-bg-secondary);
color: var(--color-text-secondary);
border: 1px solid var(--color-border);
white-space: nowrap;
}
.community-tag-chip--accepted {
background: rgba(124, 111, 205, 0.12);
color: var(--color-accent, #7c6fcd);
border-color: rgba(124, 111, 205, 0.3);
}
.community-tag-check {
font-size: 0.65rem;
opacity: 0.8;
}
</style>
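The leftovers panel always shows the fridge line, shows the freezer line only when `freeze_days` is not `null`, and shows the freeze-by hint only when `freeze_by_day` is truthy, with a singular/plural rule on "day". The sketch below expresses the same rules as a pure function, which makes the copy easy to unit-test. `leftoversLines` and `formatDays` are hypothetical helpers; `LeftoversData` mirrors the type declared in the diff.

```typescript
// Mirrors the LeftoversData type declared in the component.
type LeftoversData = {
  fridge_days: number
  freeze_days: number | null
  freeze_by_day: number | null
  storage_advice: string
}

// Same pluralization rule as the template: only exactly 1 is singular.
function formatDays(n: number): string {
  return `${n} day${n !== 1 ? 's' : ''}`
}

// Hypothetical helper: the lines the leftovers panel renders, in order.
function leftoversLines(d: LeftoversData): string[] {
  const lines = [`Fridge: ${formatDays(d.fridge_days)}`]
  // v-if="leftovers.freeze_days !== null": 0 would still be shown.
  if (d.freeze_days !== null) lines.push(`Freezer: ${formatDays(d.freeze_days)}`)
  // v-if="leftovers.freeze_by_day" is a truthy check, so 0 and null are hidden.
  if (d.freeze_by_day) lines.push(`Freeze by day ${d.freeze_by_day} for best results.`)
  lines.push(d.storage_advice)
  return lines
}
```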

@ -1,849 +0,0 @@
<template>
<div class="modal-overlay" @click.self="close" role="dialog" aria-modal="true" :aria-labelledby="titleId">
<div class="modal-panel scan-modal">
<!-- Header -->
<div class="modal-header">
<h2 :id="titleId" class="modal-title">
<span v-if="phase === 'upload'">Scan a Recipe</span>
<span v-else-if="phase === 'processing'">Scanning...</span>
<span v-else>Review Recipe</span>
</h2>
<button class="btn-icon close-btn" @click="close" aria-label="Close">
<svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
<line x1="18" y1="6" x2="6" y2="18"/><line x1="6" y1="6" x2="18" y2="18"/>
</svg>
</button>
</div>
<!-- Upload phase -->
<div v-if="phase === 'upload'" class="modal-body">
<p class="hint-text">
Photograph a recipe card, cookbook page, or handwritten note.
For multi-page recipes (ingredients on one page, directions on another),
select all pages together, up to 4 photos.
</p>
<!-- Drop zone -->
<div
class="drop-zone"
:class="{ 'drop-zone-active': isDragging, 'has-files': selectedFiles.length > 0 }"
@dragover.prevent="isDragging = true"
@dragleave="isDragging = false"
@drop.prevent="onDrop"
@click="fileInput?.click()"
role="button"
tabindex="0"
@keydown.enter.space="fileInput?.click()"
aria-label="Click or drop photos here"
>
<input
ref="fileInput"
type="file"
accept="image/jpeg,image/jpg,image/png,image/webp,image/heic,image/heif"
multiple
class="hidden-input"
@change="onFileChange"
/>
<div v-if="selectedFiles.length === 0" class="drop-zone-empty">
<svg width="40" height="40" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="1.5" class="camera-icon">
<path d="M23 19a2 2 0 0 1-2 2H3a2 2 0 0 1-2-2V8a2 2 0 0 1 2-2h4l2-3h6l2 3h4a2 2 0 0 1 2 2z"/>
<circle cx="12" cy="13" r="4"/>
</svg>
<p class="drop-zone-label">Tap or drop photos here</p>
<p class="drop-zone-sub">JPEG, PNG, WebP, HEIC (up to 4 photos)</p>
</div>
<div v-else class="file-preview-grid">
<div
v-for="(_file, i) in selectedFiles"
:key="i"
class="file-preview-item"
>
<img :src="previewUrls[i]" :alt="`Photo ${i + 1}`" class="preview-img" />
<button
class="remove-file-btn"
@click.stop="removeFile(i)"
:aria-label="`Remove photo ${i + 1}`"
>
<svg width="12" height="12" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="3">
<line x1="18" y1="6" x2="6" y2="18"/><line x1="6" y1="6" x2="18" y2="18"/>
</svg>
</button>
<p class="preview-label">Page {{ i + 1 }}</p>
</div>
<div
v-if="selectedFiles.length < 4"
class="file-preview-add"
@click.stop="fileInput?.click()"
role="button"
tabindex="0"
@keydown.enter.space.stop="fileInput?.click()"
aria-label="Add another photo"
>
<svg width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2">
<line x1="12" y1="5" x2="12" y2="19"/><line x1="5" y1="12" x2="19" y2="12"/>
</svg>
</div>
</div>
</div>
<div v-if="uploadError" class="status-badge status-error mt-sm" role="alert">
{{ uploadError }}
</div>
<div class="modal-footer">
<button class="btn btn-secondary" @click="close">Cancel</button>
<button
class="btn btn-primary"
:disabled="selectedFiles.length === 0"
@click="startScan"
>
Scan Recipe
</button>
</div>
</div>
<!-- Processing phase -->
<div v-else-if="phase === 'processing'" class="modal-body processing-body">
<div class="scan-spinner" aria-live="polite" aria-label="Scanning recipe">
<svg class="spin-icon" width="48" height="48" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="1.5">
<path d="M23 19a2 2 0 0 1-2 2H3a2 2 0 0 1-2-2V8a2 2 0 0 1 2-2h4l2-3h6l2 3h4a2 2 0 0 1 2 2z"/>
<circle cx="12" cy="13" r="4"/>
</svg>
<p class="processing-label">{{ scanStatusMessage }}</p>
<p class="processing-sub">This can take up to a minute on first use.</p>
</div>
</div>
<!-- Review phase -->
<div v-else-if="phase === 'review' && extracted" class="modal-body review-body">
<!-- Confidence banner -->
<div
v-if="extracted.confidence !== 'high' || extracted.warnings.length > 0"
:class="['status-badge', extracted.confidence === 'low' ? 'status-warning' : 'status-info', 'mb-sm']"
role="status"
>
<span v-if="extracted.confidence === 'low'">Low-confidence scan: handwritten or degraded text. Please review carefully.</span>
<span v-else>Medium confidence. Check the fields below.</span>
<ul v-if="extracted.warnings.length > 0" class="warning-list">
<li v-for="w in extracted.warnings" :key="w">{{ w }}</li>
</ul>
</div>
<!-- Pantry match badge -->
<div v-if="extracted.ingredients.length > 0" class="pantry-match-row mb-sm">
<span class="pantry-badge" :class="pantryMatchClass">
{{ extracted.pantry_match_pct }}% pantry match
({{ pantryCount }} of {{ extracted.ingredients.length }} ingredients on hand)
</span>
</div>
<!-- Editable fields -->
<div class="review-form">
<div class="form-group">
<label class="form-label" for="scan-title">Recipe name</label>
<input
id="scan-title"
v-model="editTitle"
class="form-input"
type="text"
placeholder="Recipe name"
required
/>
</div>
<div class="form-row-2">
<div class="form-group">
<label class="form-label" for="scan-servings">Servings</label>
<input id="scan-servings" v-model="editServings" class="form-input" type="text" placeholder="e.g. 2" />
</div>
<div class="form-group">
<label class="form-label" for="scan-cooktime">Cook time</label>
<input id="scan-cooktime" v-model="editCookTime" class="form-input" type="text" placeholder="e.g. 25 min" />
</div>
</div>
<!-- Ingredients -->
<div class="form-group">
<label class="form-label">Ingredients</label>
<div class="ingredient-list">
<div
v-for="(ingr, i) in editIngredients"
:key="i"
:class="['ingredient-row', ingr.in_pantry ? 'in-pantry' : '']"
>
<span v-if="ingr.in_pantry" class="pantry-dot" title="In your pantry" aria-label="In pantry"></span>
<input
v-model="ingr.qty"
class="form-input ingr-qty"
type="text"
placeholder="qty"
:aria-label="`Ingredient ${i + 1} quantity`"
/>
<input
v-model="ingr.unit"
class="form-input ingr-unit"
type="text"
placeholder="unit"
:aria-label="`Ingredient ${i + 1} unit`"
/>
<input
v-model="ingr.name"
class="form-input ingr-name"
type="text"
placeholder="ingredient"
:aria-label="`Ingredient ${i + 1} name`"
/>
<button
class="btn-icon remove-ingr-btn"
@click="removeIngredient(i)"
:aria-label="`Remove ingredient ${i + 1}`"
>
<svg width="14" height="14" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2.5">
<line x1="18" y1="6" x2="6" y2="18"/><line x1="6" y1="6" x2="18" y2="18"/>
</svg>
</button>
</div>
</div>
<button class="btn btn-ghost btn-sm mt-xs" @click="addIngredient">+ Add ingredient</button>
</div>
<!-- Steps -->
<div class="form-group">
<label class="form-label">Steps</label>
<div class="step-list">
<div v-for="(_step, i) in editSteps" :key="i" class="step-row">
<span class="step-num">{{ i + 1 }}</span>
<textarea
v-model="editSteps[i]"
class="form-input step-textarea"
rows="2"
:aria-label="`Step ${i + 1}`"
></textarea>
<button
class="btn-icon remove-step-btn"
@click="removeStep(i)"
:aria-label="`Remove step ${i + 1}`"
>
<svg width="14" height="14" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2.5">
<line x1="18" y1="6" x2="6" y2="18"/><line x1="6" y1="6" x2="18" y2="18"/>
</svg>
</button>
</div>
</div>
<button class="btn btn-ghost btn-sm mt-xs" @click="addStep">+ Add step</button>
</div>
<!-- Notes (optional) -->
<div class="form-group">
<label class="form-label" for="scan-notes">Notes <span class="optional-label">(optional)</span></label>
<textarea id="scan-notes" v-model="editNotes" class="form-input" rows="2" placeholder="Tips, variations, storage..."></textarea>
</div>
<!-- Source attribution -->
<div v-if="extracted.source_note" class="source-note">
Source: {{ extracted.source_note }}
</div>
</div>
<div v-if="saveError" class="status-badge status-error mt-sm" role="alert">
{{ saveError }}
</div>
<div class="modal-footer">
<button class="btn btn-secondary" @click="phase = 'upload'">Re-scan</button>
<button
class="btn btn-primary"
:disabled="!editTitle.trim() || saving"
@click="save"
>
{{ saving ? 'Saving...' : 'Save Recipe' }}
</button>
</div>
</div>
</div>
</div>
</template>
<script setup lang="ts">
import { ref, computed, onBeforeUnmount } from 'vue'
import { type ScannedRecipe, type ScannedIngredient, recipeScanAPI } from '@/services/api'
type Phase = 'upload' | 'processing' | 'review'
const emit = defineEmits<{
(e: 'close'): void
(e: 'saved', recipe: { id: number; title: string }): void
}>()
const titleId = 'scan-modal-title'
// Upload state
const phase = ref<Phase>('upload')
const fileInput = ref<HTMLInputElement | null>(null)
const selectedFiles = ref<File[]>([])
const previewUrls = ref<string[]>([])
const isDragging = ref(false)
const uploadError = ref('')
function onDrop(e: DragEvent) {
isDragging.value = false
const dt = e.dataTransfer
if (!dt) return
addFiles(Array.from(dt.files))
}
function onFileChange(e: Event) {
const input = e.target as HTMLInputElement
if (!input.files) return
addFiles(Array.from(input.files))
// Reset so the same file can be re-selected after removal
input.value = ''
}
function addFiles(incoming: File[]) {
uploadError.value = ''
const combined = [...selectedFiles.value, ...incoming]
if (combined.length > 4) {
uploadError.value = 'Maximum 4 photos per scan.'
return
}
// Revoke old preview URLs before replacing
previewUrls.value.forEach((url) => URL.revokeObjectURL(url))
selectedFiles.value = combined
previewUrls.value = combined.map((f) => URL.createObjectURL(f))
}
function removeFile(index: number) {
URL.revokeObjectURL(previewUrls.value[index] ?? '')
selectedFiles.value = selectedFiles.value.filter((_, i) => i !== index)
previewUrls.value = previewUrls.value.filter((_, i) => i !== index)
}
// Scan
const extracted = ref<ScannedRecipe | null>(null)
const scanStatusMessage = ref('Uploading photos...')
async function startScan() {
if (selectedFiles.value.length === 0) return
uploadError.value = ''
scanStatusMessage.value = 'Uploading photos...'
phase.value = 'processing'
try {
const result = await recipeScanAPI.scanStream(
selectedFiles.value,
(_status: string, message: string) => { scanStatusMessage.value = message },
)
extracted.value = result
initEditState(result)
phase.value = 'review'
} catch (err: unknown) {
const msg = err instanceof Error ? err.message : String(err)
uploadError.value = msg.includes('not appear to contain a recipe')
? 'This photo does not look like a recipe. Please try a different photo.'
: msg.includes('No vision backend')
? 'Recipe scanning is not available right now. Check your BYOK settings.'
: `Scan failed: ${msg}`
phase.value = 'upload'
}
}
// Review/edit state
const editTitle = ref('')
const editServings = ref('')
const editCookTime = ref('')
const editIngredients = ref<ScannedIngredient[]>([])
const editSteps = ref<string[]>([])
const editNotes = ref('')
function initEditState(r: ScannedRecipe) {
editTitle.value = r.title ?? ''
editServings.value = r.servings ?? ''
editCookTime.value = r.cook_time ?? ''
editIngredients.value = r.ingredients.map((i) => ({ ...i }))
editSteps.value = [...r.steps]
editNotes.value = r.notes ?? ''
}
function removeIngredient(i: number) {
editIngredients.value = editIngredients.value.filter((_, idx) => idx !== i)
}
function addIngredient() {
editIngredients.value = [...editIngredients.value, { name: '', qty: null, unit: null, raw: null, in_pantry: false }]
}
function removeStep(i: number) {
editSteps.value = editSteps.value.filter((_, idx) => idx !== i)
}
function addStep() {
editSteps.value = [...editSteps.value, '']
}
// Pantry match display
const pantryCount = computed(() =>
editIngredients.value.filter((i) => i.in_pantry).length
)
const pantryMatchClass = computed(() => {
const pct = extracted.value?.pantry_match_pct ?? 0
if (pct >= 80) return 'pantry-high'
if (pct >= 50) return 'pantry-mid'
return 'pantry-low'
})
// Save
const saving = ref(false)
const saveError = ref('')
async function save() {
if (!editTitle.value.trim()) return
saving.value = true
saveError.value = ''
try {
const payload = {
title: editTitle.value.trim(),
subtitle: extracted.value?.subtitle ?? null,
servings: editServings.value || null,
cook_time: editCookTime.value || null,
source_note: extracted.value?.source_note ?? null,
ingredients: editIngredients.value.filter((i) => i.name.trim()),
steps: editSteps.value.filter((s) => s.trim()),
notes: editNotes.value.trim() || null,
tags: extracted.value?.tags ?? [],
source: 'scan' as const,
}
const saved = await recipeScanAPI.saveScanned(payload)
emit('saved', { id: saved.id, title: saved.title })
close()
} catch (err: unknown) {
saveError.value = err instanceof Error ? err.message : 'Failed to save recipe.'
} finally {
saving.value = false
}
}
// Cleanup
function close() {
previewUrls.value.forEach((url) => URL.revokeObjectURL(url))
emit('close')
}
onBeforeUnmount(() => {
previewUrls.value.forEach((url) => URL.revokeObjectURL(url))
})
</script>
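`startScan()` maps raw backend errors to friendlier copy by substring matching. Pulling that mapping into a pure function makes each branch testable. `scanErrorMessage` is a hypothetical name not present in the source. The matched substrings are the ones the component uses, and the approach assumes the backend keeps those exact phrasings.

```typescript
// Hypothetical pure version of the catch block in startScan().
function scanErrorMessage(err: unknown): string {
  const msg = err instanceof Error ? err.message : String(err)
  // The backend rejected the image as not being a recipe.
  if (msg.includes('not appear to contain a recipe')) {
    return 'This photo does not look like a recipe. Please try a different photo.'
  }
  // No vision model is configured (e.g. missing BYOK setup).
  if (msg.includes('No vision backend')) {
    return 'Recipe scanning is not available right now. Check your BYOK settings.'
  }
  // Anything else is surfaced verbatim with a prefix.
  return `Scan failed: ${msg}`
}
```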
<style scoped>
.modal-overlay {
position: fixed;
inset: 0;
background: rgba(0, 0, 0, 0.5);
display: flex;
align-items: center;
justify-content: center;
z-index: var(--z-modal, 1000);
padding: var(--spacing-md);
}
.modal-panel {
background: var(--bg-card, #fff);
border-radius: var(--radius-lg, 12px);
box-shadow: var(--shadow-xl, 0 20px 60px rgba(0,0,0,0.2));
width: 100%;
max-width: 560px;
max-height: 90vh;
display: flex;
flex-direction: column;
overflow: hidden;
}
.modal-header {
display: flex;
align-items: center;
justify-content: space-between;
padding: var(--spacing-md) var(--spacing-lg);
border-bottom: 1px solid var(--border-color, #e5e7eb);
flex-shrink: 0;
}
.modal-title {
font-size: var(--font-lg, 1.125rem);
font-weight: 600;
color: var(--text-primary, #111);
margin: 0;
}
.close-btn {
background: none;
border: none;
cursor: pointer;
padding: 4px;
color: var(--text-secondary, #6b7280);
border-radius: var(--radius-sm, 4px);
display: flex;
align-items: center;
justify-content: center;
}
.close-btn:hover {
background: var(--bg-hover, #f3f4f6);
color: var(--text-primary, #111);
}
.modal-body {
padding: var(--spacing-lg);
overflow-y: auto;
flex: 1;
}
.modal-footer {
display: flex;
justify-content: flex-end;
gap: var(--spacing-sm);
padding-top: var(--spacing-md);
border-top: 1px solid var(--border-color, #e5e7eb);
margin-top: var(--spacing-md);
}
/* ── Upload ── */
.hint-text {
color: var(--text-secondary, #6b7280);
font-size: var(--font-sm, 0.875rem);
margin-bottom: var(--spacing-md);
line-height: 1.5;
}
.drop-zone {
border: 2px dashed var(--border-color, #d1d5db);
border-radius: var(--radius-md, 8px);
padding: var(--spacing-xl);
text-align: center;
cursor: pointer;
transition: border-color 0.15s, background 0.15s;
min-height: 160px;
display: flex;
align-items: center;
justify-content: center;
}
.drop-zone:hover,
.drop-zone-active {
border-color: var(--color-primary, #4f46e5);
background: var(--bg-hover, #f5f3ff);
}
.drop-zone.has-files {
border-style: solid;
border-color: var(--color-primary, #4f46e5);
padding: var(--spacing-md);
}
.hidden-input {
display: none;
}
.drop-zone-empty {
display: flex;
flex-direction: column;
align-items: center;
gap: var(--spacing-xs);
}
.camera-icon {
color: var(--text-secondary, #9ca3af);
margin-bottom: var(--spacing-xs);
}
.drop-zone-label {
font-weight: 600;
color: var(--text-primary, #111);
margin: 0;
}
.drop-zone-sub {
color: var(--text-secondary, #6b7280);
font-size: var(--font-sm, 0.875rem);
margin: 0;
}
.file-preview-grid {
display: flex;
gap: var(--spacing-sm);
flex-wrap: wrap;
align-items: center;
width: 100%;
}
.file-preview-item {
position: relative;
width: 100px;
}
.preview-img {
width: 100px;
height: 100px;
object-fit: cover;
border-radius: var(--radius-sm, 6px);
border: 1px solid var(--border-color, #e5e7eb);
}
.remove-file-btn {
position: absolute;
top: -6px;
right: -6px;
background: var(--color-danger, #ef4444);
color: white;
border: none;
border-radius: 50%;
width: 20px;
height: 20px;
display: flex;
align-items: center;
justify-content: center;
cursor: pointer;
padding: 0;
}
.preview-label {
text-align: center;
font-size: var(--font-xs, 0.75rem);
color: var(--text-secondary, #6b7280);
margin: 4px 0 0;
}
.file-preview-add {
width: 100px;
height: 100px;
border: 2px dashed var(--border-color, #d1d5db);
border-radius: var(--radius-sm, 6px);
display: flex;
align-items: center;
justify-content: center;
cursor: pointer;
color: var(--text-secondary, #9ca3af);
transition: border-color 0.15s;
}
.file-preview-add:hover {
border-color: var(--color-primary, #4f46e5);
color: var(--color-primary, #4f46e5);
}
/* ── Processing ── */
.processing-body {
display: flex;
align-items: center;
justify-content: center;
min-height: 200px;
}
.scan-spinner {
display: flex;
flex-direction: column;
align-items: center;
gap: var(--spacing-sm);
}
.spin-icon {
color: var(--color-primary, #4f46e5);
animation: spin 1.5s linear infinite;
}
@keyframes spin {
from { transform: rotate(0deg); }
to { transform: rotate(360deg); }
}
.processing-label {
font-weight: 600;
color: var(--text-primary, #111);
margin: 0;
}
.processing-sub {
color: var(--text-secondary, #6b7280);
font-size: var(--font-sm, 0.875rem);
margin: 0;
}
/* ── Review ── */
.review-body {
padding-bottom: var(--spacing-sm);
}
.pantry-match-row {
display: flex;
align-items: center;
}
.pantry-badge {
display: inline-block;
font-size: var(--font-sm, 0.875rem);
font-weight: 600;
padding: 3px 10px;
border-radius: 999px;
}
.pantry-high { background: var(--color-success-bg, #d1fae5); color: var(--color-success, #065f46); }
.pantry-mid { background: var(--color-info-bg, #dbeafe); color: var(--color-info, #1e40af); }
.pantry-low { background: var(--bg-secondary, #f3f4f6); color: var(--text-secondary, #374151); }
.review-form {
display: flex;
flex-direction: column;
gap: var(--spacing-md);
}
.form-row-2 {
display: grid;
grid-template-columns: 1fr 1fr;
gap: var(--spacing-sm);
}
/* Ingredients */
.ingredient-list {
display: flex;
flex-direction: column;
gap: 6px;
}
.ingredient-row {
display: flex;
align-items: center;
gap: 6px;
}
.pantry-dot {
width: 8px;
height: 8px;
border-radius: 50%;
background: var(--color-success, #10b981);
flex-shrink: 0;
}
.in-pantry {
background: var(--color-success-bg-faint, #f0fdf4);
border-radius: var(--radius-sm, 4px);
padding: 2px 4px;
}
.ingr-qty { width: 60px; flex-shrink: 0; }
.ingr-unit { width: 70px; flex-shrink: 0; }
.ingr-name { flex: 1; }
.remove-ingr-btn,
.remove-step-btn {
background: none;
border: none;
cursor: pointer;
padding: 4px;
color: var(--text-secondary, #9ca3af);
border-radius: var(--radius-sm, 4px);
display: flex;
align-items: center;
justify-content: center;
flex-shrink: 0;
}
.remove-ingr-btn:hover,
.remove-step-btn:hover {
background: var(--color-danger-bg, #fee2e2);
color: var(--color-danger, #ef4444);
}
/* Steps */
.step-list {
display: flex;
flex-direction: column;
gap: 8px;
}
.step-row {
display: flex;
align-items: flex-start;
gap: 8px;
}
.step-num {
width: 24px;
height: 24px;
border-radius: 50%;
background: var(--bg-secondary, #f3f4f6);
color: var(--text-secondary, #374151);
font-size: var(--font-xs, 0.75rem);
font-weight: 700;
display: flex;
align-items: center;
justify-content: center;
flex-shrink: 0;
margin-top: 8px;
}
.step-textarea {
flex: 1;
resize: vertical;
min-height: 60px;
}
/* Source */
.source-note {
font-size: var(--font-xs, 0.75rem);
color: var(--text-secondary, #9ca3af);
text-align: right;
font-style: italic;
}
.optional-label {
color: var(--text-secondary, #9ca3af);
font-weight: normal;
font-size: var(--font-xs, 0.75rem);
}
.warning-list {
margin: 4px 0 0;
padding-left: 16px;
font-size: var(--font-sm, 0.875rem);
}
.btn-ghost {
background: none;
border: none;
cursor: pointer;
color: var(--color-primary, #4f46e5);
padding: 4px 8px;
font-size: var(--font-sm, 0.875rem);
border-radius: var(--radius-sm, 4px);
}
.btn-ghost:hover {
background: var(--bg-hover, #f5f3ff);
}
.btn-sm {
padding: 4px 10px;
font-size: var(--font-sm, 0.875rem);
}
.mt-xs { margin-top: var(--spacing-xs, 4px); }
.mt-sm { margin-top: var(--spacing-sm, 8px); }
.mb-sm { margin-bottom: var(--spacing-sm, 8px); }
@media (max-width: 480px) {
.form-row-2 {
grid-template-columns: 1fr;
}
.modal-panel {
border-radius: var(--radius-md, 8px);
max-height: 95vh;
}
}
</style>
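`addFiles()` enforces the four-photo cap on an all-or-nothing basis: when the combined selection would exceed the cap, the whole incoming batch is rejected and the current selection is kept. The sketch below shows that rule without the DOM parts (`File` objects, object URLs) and assumes a generic item type. `mergeSelection`, `removeAt`, and `MAX_SCAN_PHOTOS` are hypothetical names.

```typescript
// Hypothetical result type: either the merged list or the error shown to the user.
type MergeResult<T> = { ok: true; files: T[] } | { ok: false; error: string }

const MAX_SCAN_PHOTOS = 4

// All-or-nothing: if the whole batch would exceed the cap, nothing is added
// (addFiles() returns early and leaves selectedFiles untouched).
function mergeSelection<T>(
  current: readonly T[],
  incoming: readonly T[],
  max: number = MAX_SCAN_PHOTOS,
): MergeResult<T> {
  const combined = [...current, ...incoming]
  if (combined.length > max) {
    return { ok: false, error: `Maximum ${max} photos per scan.` }
  }
  return { ok: true, files: combined }
}

// Equivalent of removeFile(): drop one index and keep the remaining order.
function removeAt<T>(items: readonly T[], index: number): T[] {
  return items.filter((_, i) => i !== index)
}
```

In the component, each removal must also revoke the matching object URL. The sketch leaves that out because it only models the list bookkeeping.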

File diff suppressed because it is too large

@ -46,14 +46,7 @@
<!-- Style tags -->
<div class="form-group">
<div class="flex-between mb-xs">
<label class="form-label" style="margin-bottom: 0;">Style tags</label>
<button
class="btn btn-secondary btn-xs"
:disabled="classifying"
@click="suggestTags"
>{{ classifying ? 'Suggesting…' : 'Suggest tags' }}</button>
</div>
<label class="form-label">Style tags</label>
<div class="tags-wrap flex flex-wrap gap-xs mb-xs">
<span
v-for="tag in localTags"
@ -96,7 +89,6 @@
<script setup lang="ts">
import { ref, computed, onMounted, onUnmounted, nextTick } from 'vue'
import { useSavedRecipesStore } from '../stores/savedRecipes'
import { savedRecipesAPI } from '../services/api'
const SUGGESTED_TAGS = [
'comforting', 'light', 'spicy', 'umami', 'sweet', 'savory', 'rich',
@ -148,7 +140,6 @@ const localTags = ref<string[]>([...(existing.value?.style_tags ?? [])])
const hoverRating = ref<number | null>(null)
const tagInput = ref('')
const saving = ref(false)
const classifying = ref(false)
const unusedSuggestions = computed(() =>
SUGGESTED_TAGS.filter((s) => !localTags.value.includes(s))
@ -183,23 +174,6 @@ function onTagKey(e: KeyboardEvent) {
}
}
async function suggestTags() {
classifying.value = true
try {
const suggestions = await savedRecipesAPI.classifyStyle(props.recipeId)
// Merge suggestions into localTags: add new ones only, preserving the user's existing tags
for (const tag of suggestions) {
if (!localTags.value.includes(tag)) {
localTags.value = [...localTags.value, tag]
}
}
} catch {
// Silently ignore: the tier gate returns 403, and no LLM backend returns an empty list
} finally {
classifying.value = false
}
}
async function submit() {
saving.value = true
try {

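`suggestTags()` merges classifier suggestions into the user's tags without removing or reordering anything, and it skips suggestions that are already present. The sketch below expresses that merge as a pure function, `mergeSuggestedTags` (a hypothetical name). It also de-duplicates repeats within the suggestion list, as the component's loop does, because each check runs against the already-updated array.

```typescript
// Append suggestions that are not already present; never remove or reorder
// the user's existing tags.
function mergeSuggestedTags(existing: readonly string[], suggestions: readonly string[]): string[] {
  const merged = [...existing]
  for (const tag of suggestions) {
    // Checking against `merged` (not `existing`) also drops repeats within suggestions.
    if (!merged.includes(tag)) merged.push(tag)
  }
  return merged
}
```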
@ -32,7 +32,6 @@
<option value="saved_at">Recently saved</option>
<option value="rating">Highest rated</option>
<option value="title">A–Z</option>
<option value="last_cooked">Last cooked</option>
</select>
</div>
@ -47,7 +46,7 @@
<!-- Recipe cards -->
<div class="saved-list flex-col gap-sm">
<div
v-for="recipe in sortedSaved"
v-for="recipe in store.saved"
:key="recipe.id"
class="card-sm saved-card"
:class="{ 'card-success': recipe.rating !== null && recipe.rating >= 4 }"
@ -80,8 +79,8 @@
>{{ tag }}</span>
</div>
<!-- Last cooked chip (orbital cadence: neutral, no urgency) -->
<div v-if="lastCookedLabel(recipe.recipe_id)" class="last-cooked-chip text-xs mt-xs">
<!-- Last cooked hint -->
<div v-if="lastCookedLabel(recipe.recipe_id)" class="last-cooked-hint text-xs text-muted mt-xs">
{{ lastCookedLabel(recipe.recipe_id) }}
</div>
@ -166,32 +165,20 @@ const recipesStore = useRecipesStore()
const editingRecipe = ref<SavedRecipe | null>(null)
function lastCookedLabel(recipeId: number): string | null {
const days = recipesStore.lastCookedDaysAgo(recipeId)
if (days === null) return null
if (days === 0) return 'made today'
if (days === 1) return 'made yesterday'
if (days < 7) return `made ${days} days ago`
if (days < 14) return 'made 1 week ago'
const weeks = Math.floor(days / 7)
if (days < 60) return `made ${weeks} weeks ago`
const months = Math.floor(days / 30)
return `made ${months} month${months !== 1 ? 's' : ''} ago`
const entries = recipesStore.cookLog.filter((e) => e.id === recipeId)
if (entries.length === 0) return null
const latestMs = Math.max(...entries.map((e) => e.cookedAt))
const diffMs = Date.now() - latestMs
const diffDays = Math.floor(diffMs / (1000 * 60 * 60 * 24))
if (diffDays === 0) return 'Last made: today'
if (diffDays === 1) return 'Last made: yesterday'
if (diffDays < 7) return `Last made: ${diffDays} days ago`
if (diffDays < 14) return 'Last made: 1 week ago'
const diffWeeks = Math.floor(diffDays / 7)
if (diffDays < 60) return `Last made: ${diffWeeks} weeks ago`
const diffMonths = Math.floor(diffDays / 30)
return `Last made: ${diffMonths} month${diffMonths !== 1 ? 's' : ''} ago`
}
// Client-side last_cooked sort: resolves from the localStorage cook log, so no API change is needed.
// Recipes with a cook date surface oldest-first (natural "due for a revisit" order without
// framing it that way). Recipes never cooked sort to the end.
const sortedSaved = computed(() => {
if (store.sortBy !== 'last_cooked') return store.saved
return [...store.saved].sort((a, b) => {
const daysA = recipesStore.lastCookedDaysAgo(a.recipe_id)
const daysB = recipesStore.lastCookedDaysAgo(b.recipe_id)
if (daysA === null && daysB === null) return 0
if (daysA === null) return 1 // never cooked: sorts to end
if (daysB === null) return -1 // never cooked: sorts to end
return daysB - daysA // oldest cooked first (largest days value first)
})
})
const showNewCollection = ref(false)
// #44: two-step remove confirmation
@ -376,14 +363,9 @@ async function createCollection() {
padding: var(--spacing-xl);
}
.last-cooked-chip {
display: inline-block;
color: var(--color-text-muted, var(--color-secondary, #888));
background: var(--color-surface-subtle, transparent);
border-radius: var(--radius-sm, 4px);
padding: 0 var(--spacing-xs, 4px);
font-style: normal;
opacity: 0.8;
.last-cooked-hint {
font-style: italic;
opacity: 0.75;
}
.modal-overlay {

@ -2,7 +2,6 @@
<div class="settings-view">
<div class="card">
<h2 class="section-title text-xl mb-md">Settings</h2>
<p class="text-xs text-muted mb-md">Changes save automatically.</p>
<!-- Cooking Equipment -->
<section>
@ -20,7 +19,7 @@
class="tag-chip status-badge status-info"
>
{{ item }}
<button class="chip-remove" @click="removeEquipment(item)" :aria-label="'Remove equipment: ' + item">×</button>
<button class="chip-remove" @click="removeEquipment(item)" aria-label="Remove">×</button>
</span>
</div>
@ -51,6 +50,18 @@
</div>
</div>
<!-- Save button -->
<div class="flex-start gap-sm">
<button
class="btn btn-primary"
:disabled="settingsStore.loading"
@click="settingsStore.save()"
>
<span v-if="settingsStore.loading">Saving</span>
<span v-else-if="settingsStore.saved"> Saved!</span>
<span v-else>Save Settings</span>
</button>
</div>
</section>
<!-- Sensory Preferences -->
@ -123,6 +134,17 @@
</p>
</div>
<div class="flex-start gap-sm mt-sm">
<button
class="btn btn-primary btn-sm"
:disabled="settingsStore.loading"
@click="settingsStore.saveSensory()"
>
<span v-if="settingsStore.loading">Saving</span>
<span v-else-if="settingsStore.saved">Saved!</span>
<span v-else>Save sensory preferences</span>
</button>
</div>
</section>
<!-- Units -->
@ -147,6 +169,17 @@
Imperial (oz, cups, °F)
</button>
</div>
<div class="flex-start gap-sm">
<button
class="btn btn-primary btn-sm"
:disabled="settingsStore.loading"
@click="settingsStore.save()"
>
<span v-if="settingsStore.loading">Saving</span>
<span v-else-if="settingsStore.saved"> Saved!</span>
<span v-else>Save</span>
</button>
</div>
</section>
<!-- Shopping Locale -->
@ -187,6 +220,17 @@
<option value="br">Brazil (BRL R$)</option>
</optgroup>
</select>
<div class="flex-start gap-sm mt-sm">
<button
class="btn btn-primary btn-sm"
:disabled="settingsStore.loading"
@click="settingsStore.save()"
>
<span v-if="settingsStore.loading">Saving</span>
<span v-else-if="settingsStore.saved"> Saved!</span>
<span v-else>Save</span>
</button>
</div>
</section>
<!-- Time-First Layout -->
@ -214,24 +258,17 @@
</span>
</label>
</div>
</section>
<!-- Data Sharing (cloud only) -->
<section v-if="isCloudMode" class="mt-md">
<h3 class="text-lg font-semibold mb-xs">Data Sharing</h3>
<label class="data-sharing-toggle flex-start gap-sm text-sm">
<input
type="checkbox"
:checked="magpieOptIn"
@change="setMagpieOptIn(($event.target as HTMLInputElement).checked)"
/>
Share anonymized recipe ratings to help improve suggestions
</label>
<p class="text-xs text-muted mt-xs">
When enabled, Kiwi sends the recipe source ID, your star rating, and
style tags to CircuitForge. No personal information or pantry contents
are included.
</p>
<div class="flex-start gap-sm mt-sm">
<button
class="btn btn-primary btn-sm"
:disabled="settingsStore.loading"
@click="settingsStore.save()"
>
<span v-if="settingsStore.loading">Saving</span>
<span v-else-if="settingsStore.saved"> Saved!</span>
<span v-else>Save</span>
</button>
</div>
</section>
<!-- Display Preferences -->
@ -338,19 +375,13 @@
</template>
</div>
</div>
<Transition name="autosave-fade">
<div v-if="settingsStore.saved" class="autosave-toast" role="status" aria-live="polite">
Saved
</div>
</Transition>
</template>
<script setup lang="ts">
import { ref, computed, onMounted } from 'vue'
import { useSettingsStore } from '../stores/settings'
import { useRecipesStore } from '../stores/recipes'
import { householdAPI, settingsAPI, type HouseholdStatus } from '../services/api'
import { householdAPI, type HouseholdStatus } from '../services/api'
import type { TextureTag, SmellLevel, NoiseLevel } from '../services/api'
import type { TimeFirstLayout } from '../stores/settings'
import { useOrchUsage } from '../composables/useOrchUsage'
@ -359,23 +390,6 @@ const settingsStore = useSettingsStore()
const recipesStore = useRecipesStore()
const { enabled: orchPillEnabled, setEnabled: setOrchPillEnabled } = useOrchUsage()
// Cloud mode baked in at build time via VITE_CLOUD_MODE=true in cloud builds
const isCloudMode = import.meta.env.VITE_CLOUD_MODE === 'true'
// Data sharing magpie opt-in (cloud mode only)
const magpieOptIn = ref(false)
async function loadMagpieOptIn(): Promise<void> {
if (!isCloudMode) return
const value = await settingsAPI.getSetting('magpie_opt_in')
magpieOptIn.value = value === 'true'
}
async function setMagpieOptIn(enabled: boolean): Promise<void> {
magpieOptIn.value = enabled
await settingsAPI.setSetting('magpie_opt_in', enabled ? 'true' : 'false')
}
const timeFirstLayoutOptions: Array<{ value: TimeFirstLayout; label: string; description: string }> = [
{ value: 'auto', label: 'Auto', description: 'Shows a time selector when recipes are available.' },
{ value: 'time_first', label: 'Time First', description: 'Always show the time bucket selector at the top.' },
@ -525,7 +539,6 @@ async function handleRemoveMember(userId: string) {
onMounted(async () => {
await settingsStore.load()
await loadHouseholdStatus()
await loadMagpieOptIn()
})
// Sensory taxonomy
@ -749,15 +762,13 @@ function getNoiseClass(_value: NoiseLevel, idx: number): string {
color: var(--color-text-muted);
}
.orch-pill-toggle,
.data-sharing-toggle {
.orch-pill-toggle {
cursor: pointer;
align-items: center;
color: var(--color-text);
}
.orch-pill-toggle input[type="checkbox"],
.data-sharing-toggle input[type="checkbox"] {
.orch-pill-toggle input[type="checkbox"] {
accent-color: var(--color-primary);
width: 1rem;
height: 1rem;
@ -822,32 +833,4 @@ function getNoiseClass(_value: NoiseLevel, idx: number): string {
border-color: var(--color-border, #e0e0e0);
color: var(--color-text-secondary, #888);
}
/* ── Autosave toast ──────────────────────────────────────────────────────── */
.autosave-toast {
position: fixed;
bottom: 1.5rem;
right: 1.5rem;
background: var(--color-surface, #fff);
border: 1px solid var(--color-border, #e0e0e0);
border-radius: var(--radius-md, 0.5rem);
padding: 0.4rem 0.9rem;
font-size: var(--font-size-sm);
color: var(--color-success, #4a8c40);
box-shadow: 0 2px 8px rgba(0, 0, 0, 0.12);
z-index: 500;
pointer-events: none;
}
.autosave-fade-enter-active,
.autosave-fade-leave-active {
transition: opacity 0.25s ease, transform 0.25s ease;
}
.autosave-fade-enter-from,
.autosave-fade-leave-to {
opacity: 0;
transform: translateY(0.5rem);
}
</style>

@ -627,7 +627,6 @@ export interface RecipeRequest {
complexity_filter: string | null
max_time_min: number | null
max_total_min: number | null
max_active_min: number | null
}
export interface Staple {
@ -671,21 +670,6 @@ export interface BuildRequest {
role_overrides: Record<string, string>
}
// ── Ask/RAG types ──────────────────────────────────────────────────────────
export interface AskRecipeHit {
id: number
title: string
match_pct: number | null
category: string | null
}
export interface AskResponse {
answer: string | null
recipes: AskRecipeHit[]
tier: string
}
// ========== Recipes API ==========
export const recipesAPI = {
@ -710,10 +694,6 @@ export const recipesAPI = {
const response = await api.get(`/recipes/${id}`)
return response.data
},
async getLeftovers(id: number): Promise<{ fridge_days: number; freeze_days: number | null; freeze_by_day: number | null; storage_advice: string }> {
const response = await api.post(`/recipes/${id}/leftovers`)
return response.data
},
async listStaples(dietary?: string): Promise<Staple[]> {
const response = await api.get('/staples/', { params: dietary ? { dietary } : undefined })
return response.data
@ -752,60 +732,6 @@ export const recipesAPI = {
})
return response.data
},
/** Natural-language recipe search with optional LLM synthesis (Paid tier). */
async ask(question: string, pantryItems: string[] = []): Promise<AskResponse> {
const response = await api.post('/recipes/ask', { question, pantry_items: pantryItems }, { timeout: 30000 })
return response.data
},
/** Stream a recipe via native SSE (Ollama fallback). Calls callbacks as tokens arrive. */
async suggestRecipeStream(
req: RecipeRequest,
onChunk: (chunk: string) => void,
onDone: () => void,
onError: (err: string) => void,
): Promise<void> {
const baseUrl = (api.defaults.baseURL ?? '') as string
let response: Response
try {
response = await fetch(`${baseUrl}/recipes/suggest?stream=true`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(req),
})
} catch (err: unknown) {
onError(err instanceof Error ? err.message : 'Network error')
return
}
if (!response.ok) {
onError(`HTTP ${response.status}`)
return
}
const reader = response.body?.getReader()
if (!reader) { onError('No response body'); return }
const decoder = new TextDecoder()
let buffer = ''
while (true) {
const { done, value } = await reader.read()
if (done) { onDone(); break }
buffer += decoder.decode(value, { stream: true })
const parts = buffer.split('\n\n')
buffer = parts.pop() ?? ''
for (const part of parts) {
if (!part.startsWith('data: ')) continue
try {
const data = JSON.parse(part.slice(6))
if (data.done) { onDone(); return }
else if (data.error) { onError(data.error); return }
else if (data.chunk) { onChunk(data.chunk) }
} catch { /* ignore malformed events */ }
}
}
},
}
// ========== Settings API ==========
@ -931,10 +857,6 @@ export const savedRecipesAPI = {
async removeFromCollection(collection_id: number, saved_recipe_id: number): Promise<void> {
await api.delete(`/recipes/saved/collections/${collection_id}/members/${saved_recipe_id}`)
},
async classifyStyle(recipe_id: number): Promise<string[]> {
const response = await api.post(`/recipes/saved/${recipe_id}/classify-style`)
return response.data.suggested_tags
},
}
// --- Meal Plan types ---
@ -1131,7 +1053,6 @@ export const browserAPI = {
subcategory?: string
q?: string
sort?: string
required_ingredient?: string
}): Promise<BrowserResult> {
const response = await api.get(`/recipes/browse/${domain}/${encodeURIComponent(category)}`, { params })
return response.data
@ -1274,127 +1195,4 @@ export const DEFAULT_SENSORY_PREFERENCES: SensoryPreferences = {
max_noise: null,
}
// ── Recipe Scanner (kiwi#9) ───────────────────────────────────────────────────
export interface ScannedIngredient {
name: string
qty: string | null
unit: string | null
raw: string | null
in_pantry: boolean
}
export interface ScannedRecipe {
title: string | null
subtitle: string | null
servings: string | null
cook_time: string | null
source_note: string | null
ingredients: ScannedIngredient[]
steps: string[]
notes: string | null
tags: string[]
pantry_match_pct: number
confidence: 'high' | 'medium' | 'low'
warnings: string[]
}
export interface UserRecipe {
id: number
title: string
subtitle: string | null
servings: string | null
cook_time: string | null
source_note: string | null
ingredients: ScannedIngredient[]
steps: string[]
notes: string | null
tags: string[]
source: string
pantry_match_pct: number | null
created_at: string
}
export const recipeScanAPI = {
/** Scan 1-4 recipe photos. Returns structured recipe for review (not saved). */
scan(files: File[]): Promise<ScannedRecipe> {
const form = new FormData()
files.forEach((f) => form.append('files', f))
return api.post('/recipes/scan', form, {
headers: { 'Content-Type': 'multipart/form-data' },
timeout: 120_000, // VLM can be slow on first call
}).then((r) => r.data)
},
/** Scan recipe photos with live SSE progress events.
*
* Calls onProgress(status, message) for each intermediate event
* ("allocating", "scanning", "structuring"), then resolves with the final
* ScannedRecipe on success. Rejects on error or timeout.
*/
async scanStream(
files: File[],
onProgress: (status: string, message: string) => void,
): Promise<ScannedRecipe> {
const form = new FormData()
files.forEach((f) => form.append('files', f))
const response = await fetch(`${API_BASE_URL}/recipes/scan/stream`, {
method: 'POST',
body: form,
})
if (!response.ok || !response.body) {
let detail = ''
try { detail = await response.text() } catch (_) { /* ignore */ }
throw new Error(detail || `Scan failed (${response.status})`)
}
const reader = response.body.getReader()
const decoder = new TextDecoder()
let buffer = ''
while (true) {
const { done, value } = await reader.read()
if (done) break
buffer += decoder.decode(value, { stream: true })
const lines = buffer.split('\n')
buffer = lines.pop() ?? ''
for (const line of lines) {
if (!line.startsWith('data: ')) continue
let data: Record<string, unknown>
try { data = JSON.parse(line.slice(6)) } catch { continue }
if (data.status === 'done') return data.recipe as ScannedRecipe
if (data.status === 'error') throw new Error((data.message as string) || 'Scan failed')
onProgress(data.status as string, data.message as string)
}
}
throw new Error('Stream ended without a result')
},
/** Save a reviewed/edited scanned recipe to user_recipes. */
saveScanned(recipe: Omit<ScannedRecipe, 'pantry_match_pct' | 'confidence' | 'warnings'> & { source?: string }): Promise<UserRecipe> {
return api.post('/recipes/scan/save', recipe).then((r) => r.data)
},
/** List all user-created recipes (scan + manual). */
listUserRecipes(): Promise<UserRecipe[]> {
return api.get('/recipes/user').then((r) => r.data)
},
/** Get a single user recipe by ID. */
getUserRecipe(id: number): Promise<UserRecipe> {
return api.get(`/recipes/user/${id}`).then((r) => r.data)
},
/** Delete a user recipe. */
deleteUserRecipe(id: number): Promise<void> {
return api.delete(`/recipes/user/${id}`).then(() => undefined)
},
}
export default api
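Both streaming helpers in this file, `suggestRecipeStream` and `scanStream`, follow the same pattern. Each appends decoded text to a buffer, splits on an event delimiter, carries the trailing partial frame into the next read, and JSON-parses each `data: ` payload. `suggestRecipeStream` splits on blank lines (`\n\n`), while `scanStream` splits on single newlines. Both work as long as every event is a single `data:` line.

The sketch below isolates that buffering step as a pure function, so it can be tested without `fetch`. `parseSSEChunk` is a hypothetical name. It assumes one `data: ` line per event, which is simpler than the full Server-Sent Events format: multi-line `data`, `event:`, and `id:` fields are not handled.

```typescript
// Hypothetical helper: parse every complete event in `buffer` and return the
// leftover partial frame so the caller can prepend it to the next read.
export interface SSEParseResult {
  events: unknown[]
  rest: string
}

export function parseSSEChunk(buffer: string): SSEParseResult {
  const parts = buffer.split('\n\n')
  // The last element is '' when the buffer ended on a boundary,
  // otherwise it is an incomplete frame that must be kept.
  const rest = parts.pop() ?? ''
  const events: unknown[] = []
  for (const part of parts) {
    if (!part.startsWith('data: ')) continue // skip comments / keep-alives
    try {
      events.push(JSON.parse(part.slice(6)))
    } catch {
      // Malformed payload: ignored, like the tolerant loops above.
    }
  }
  return { events, rest }
}

// Usage: a JSON object split across two network reads is still parsed once.
let buf = ''
const received: unknown[] = []
for (const piece of ['data: {"chunk":"Hel', 'lo"}\n\ndata: {"done":true}\n\n']) {
  const { events, rest } = parseSSEChunk(buf + piece)
  received.push(...events)
  buf = rest
}
console.log(received) // [ { chunk: 'Hello' }, { done: true } ]
```

Because the parser does no I/O, the `ReadableStream` loop only has to decode bytes and call it. The `rest` value is what stops a frame that straddles two reads from being dropped.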

@ -64,20 +64,6 @@ export interface PublishPayload {
recipe_id?: number
outcome_notes?: string
slots?: CommunityPostSlot[]
similar_to_ref?: string
}
export type SimilarityTier = 'exact_recipe' | 'very_similar' | 'somewhat_similar'
export interface SimilarPost {
slug: string
title: string
recipe_name: string | null
pseudonym: string
published: string
similarity_tier: SimilarityTier
jaccard_score: number | null
tier_description: string
}
export interface PublishResult {
@ -121,25 +107,6 @@ export const useCommunityStore = defineStore('community', () => {
return response.data
}
async function checkSimilar(
title: string,
recipeId?: number | null,
postType?: string,
): Promise<SimilarPost[]> {
try {
const body: Record<string, unknown> = { title }
if (recipeId != null) body.recipe_id = recipeId
if (postType) body.post_type = postType
const response = await api.post<{ similar_posts: SimilarPost[] }>(
'/community/check-similar',
body,
)
return response.data.similar_posts
} catch {
return []
}
}
return {
posts,
loading,
@ -148,6 +115,5 @@ export const useCommunityStore = defineStore('community', () => {
fetchPosts,
forkPost,
publishPost,
checkSimilar,
}
})
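`checkSimilar` leaves scoring to the backend (`POST /community/check-similar`). The `SimilarPost` shape, however, implies a word-overlap score (`jaccard_score`) mapped onto tiers.

Below is a minimal sketch of such a scorer. It assumes titles are tokenized into lowercase alphanumeric words. The 0.8 and 0.5 thresholds are illustrative assumptions, not values taken from the Kiwi server. So is the rule that a shared `recipe_id` means `exact_recipe`.

```typescript
export type SimilarityTier = 'exact_recipe' | 'very_similar' | 'somewhat_similar'

// Split a title into a set of lowercase alphanumeric word tokens.
function tokenize(title: string): Set<string> {
  return new Set(title.toLowerCase().match(/[a-z0-9]+/g) ?? [])
}

// Jaccard index |A ∩ B| / |A ∪ B|; defined as 0 when both sets are empty.
export function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0
  let shared = 0
  for (const token of a) if (b.has(token)) shared++
  return shared / (a.size + b.size - shared)
}

// Hypothetical tiering; thresholds are assumptions for illustration only.
export function classifySimilarity(
  titleA: string,
  titleB: string,
  recipeIdA?: number | null,
  recipeIdB?: number | null,
): { tier: SimilarityTier | null; score: number } {
  const score = jaccard(tokenize(titleA), tokenize(titleB))
  if (recipeIdA != null && recipeIdA === recipeIdB) return { tier: 'exact_recipe', score }
  if (score >= 0.8) return { tier: 'very_similar', score }
  if (score >= 0.5) return { tier: 'somewhat_similar', score }
  return { tier: null, score }
}

console.log(classifySimilarity('Spicy Black Bean Soup', 'Black Bean Soup'))
// { tier: 'somewhat_similar', score: 0.75 }
```

Token-set Jaccard ignores word order and repetition. "Soup, Bean, Black" would therefore score 1.0 against "Black Bean Soup", which is usually acceptable for a "did you mean to post this?" nudge.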

@ -152,7 +152,6 @@ export const useRecipesStore = defineStore('recipes', () => {
const complexityFilter = ref<string | null>(null)
const maxTimeMin = ref<number | null>(null)
const maxTotalMin = ref<number | null>(null)
const maxActiveMin = ref<number | null>(null)
const nutritionFilters = ref<NutritionFilters>({
max_calories: null,
max_sugar_g: null,
@ -208,7 +207,6 @@ export const useRecipesStore = defineStore('recipes', () => {
complexity_filter: complexityFilter.value,
max_time_min: maxTimeMin.value,
max_total_min: maxTotalMin.value,
max_active_min: maxActiveMin.value,
}
}
@ -320,8 +318,6 @@ export const useRecipesStore = defineStore('recipes', () => {
localStorage.removeItem(DISMISSED_KEY)
}
// Orbital cadence: cookedAt anchors to completion, not to a schedule.
// Days-since display measures from this timestamp — no debt accumulates.
function logCook(id: number, title: string) {
const entry: CookLogEntry = { id, title, cookedAt: Date.now() }
cookLog.value = [...cookLog.value, entry]
@ -333,13 +329,6 @@ export const useRecipesStore = defineStore('recipes', () => {
localStorage.removeItem(COOK_LOG_KEY)
}
function lastCookedDaysAgo(recipeId: number): number | null {
const entries = cookLog.value.filter((e) => e.id === recipeId)
if (entries.length === 0) return null
const latestMs = Math.max(...entries.map((e) => e.cookedAt))
return Math.floor((Date.now() - latestMs) / 86_400_000)
}
function isBookmarked(id: number): boolean {
return bookmarks.value.some((b) => b.id === id)
}
@ -379,17 +368,6 @@ export const useRecipesStore = defineStore('recipes', () => {
wildcardConfirmed.value = false
}
async function streamSuggest(
pantryItems: string[],
secondaryPantryItems: Record<string, string>,
onChunk: (chunk: string) => void,
onDone: () => void,
onError: (err: string) => void,
): Promise<void> {
const req = _buildRequest(pantryItems, secondaryPantryItems)
await recipesAPI.suggestRecipeStream(req, onChunk, onDone, onError)
}
return {
result,
loading,
@ -409,14 +387,12 @@ export const useRecipesStore = defineStore('recipes', () => {
complexityFilter,
maxTimeMin,
maxTotalMin,
maxActiveMin,
nutritionFilters,
dismissedIds,
dismissedCount,
cookLog,
logCook,
clearCookLog,
lastCookedDaysAgo,
bookmarks,
isBookmarked,
toggleBookmark,
@ -427,7 +403,6 @@ export const useRecipesStore = defineStore('recipes', () => {
missingIngredientMode,
builderFilterMode,
suggest,
streamSuggest,
loadMore,
dismiss,
undismiss,
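Two pieces of code work together to turn the localStorage cook log into a label and a sort order:

- `lastCookedDaysAgo` in the store.
- `lastCookedLabel` and `sortedSaved` in the saved-recipes view.

Below is the same logic as pure functions, with `now` passed in so the output is deterministic. `daysSinceLastCook`, `formatLastCooked`, and `sortByLastCooked` are hypothetical names. The label buckets mirror the `made ...` branch of `lastCookedLabel` shown earlier in this diff.

```typescript
export interface CookLogEntry {
  id: number
  title: string
  cookedAt: number // epoch milliseconds, recorded when cooking completes
}

const MS_PER_DAY = 86_400_000

// Whole days since the most recent cook of `recipeId`, or null if never cooked.
export function daysSinceLastCook(log: CookLogEntry[], recipeId: number, now: number): number | null {
  const times = log.filter((e) => e.id === recipeId).map((e) => e.cookedAt)
  if (times.length === 0) return null
  return Math.floor((now - Math.max(...times)) / MS_PER_DAY)
}

// Days under a week, then weeks under 60 days, then 30-day months.
export function formatLastCooked(days: number | null): string | null {
  if (days === null) return null
  if (days === 0) return 'made today'
  if (days === 1) return 'made yesterday'
  if (days < 7) return `made ${days} days ago`
  if (days < 14) return 'made 1 week ago'
  if (days < 60) return `made ${Math.floor(days / 7)} weeks ago`
  const months = Math.floor(days / 30)
  return `made ${months} month${months !== 1 ? 's' : ''} ago`
}

// Oldest-cooked first; never-cooked items go last. Returns a new array.
export function sortByLastCooked<T extends { recipe_id: number }>(
  items: T[],
  daysFor: (recipeId: number) => number | null,
): T[] {
  return [...items].sort((a, b) => {
    const da = daysFor(a.recipe_id)
    const db = daysFor(b.recipe_id)
    if (da === null && db === null) return 0
    if (da === null) return 1
    if (db === null) return -1
    return db - da // larger day count (longer ago) first
  })
}

// Usage
const now = Date.UTC(2024, 0, 31)
const log: CookLogEntry[] = [
  { id: 1, title: 'Dal', cookedAt: now - 3 * MS_PER_DAY },
  { id: 2, title: 'Soup', cookedAt: now - 40 * MS_PER_DAY },
]
const order = sortByLastCooked([{ recipe_id: 3 }, { recipe_id: 1 }, { recipe_id: 2 }], (id) =>
  daysSinceLastCook(log, id, now),
).map((r) => r.recipe_id)
console.log(order) // [ 2, 1, 3 ]
```

Days are counted as elapsed 24-hour periods, not calendar days. A recipe cooked at 23:00 still reads as `made today` at 08:00 the next morning, which matches the store's `Math.floor(... / 86_400_000)`.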

@ -11,7 +11,7 @@ export const useSavedRecipesStore = defineStore('savedRecipes', () => {
const saved = ref<SavedRecipe[]>([])
const collections = ref<RecipeCollection[]>([])
const loading = ref(false)
const sortBy = ref<'saved_at' | 'rating' | 'title' | 'last_cooked'>('saved_at')
const sortBy = ref<'saved_at' | 'rating' | 'title'>('saved_at')
const activeCollectionId = ref<number | null>(null)
const savedIds = computed(() => new Set(saved.value.map((s) => s.recipe_id)))
@ -27,15 +27,12 @@ export const useSavedRecipesStore = defineStore('savedRecipes', () => {
async function load() {
loading.value = true
try {
// Fetch independently — a collections 403 (Free tier) must not prevent
// saved recipes from loading. Backend now returns [] for Free, but guard
// here too in case an older API version is deployed.
const [itemsResult, colsResult] = await Promise.allSettled([
savedRecipesAPI.list({ sort_by: sortBy.value === 'last_cooked' ? 'saved_at' : sortBy.value, collection_id: activeCollectionId.value ?? undefined }),
const [items, cols] = await Promise.all([
savedRecipesAPI.list({ sort_by: sortBy.value, collection_id: activeCollectionId.value ?? undefined }),
savedRecipesAPI.listCollections(),
])
if (itemsResult.status === 'fulfilled') saved.value = itemsResult.value
if (colsResult.status === 'fulfilled') collections.value = colsResult.value
saved.value = items
collections.value = cols
} finally {
loading.value = false
}
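The `load()` change replaces `Promise.all` with `Promise.allSettled`. A rejected collections request (for example a 403 on the Free tier) therefore no longer discards a successful saved-recipes response.

Below is a standalone sketch of that pattern. `fetchItems` and `fetchCollections` are hypothetical stand-ins for `savedRecipesAPI.list` and `savedRecipesAPI.listCollections`. Like the store, the sketch keeps the previous value of whichever request failed.

```typescript
export interface LoadState<A, B> {
  items: A[]
  collections: B[]
  errors: unknown[]
}

// Run both requests concurrently; apply whichever succeed, record failures.
// Fetchers should be async so that errors surface as rejections, not sync throws.
export async function loadIndependently<A, B>(
  fetchItems: () => Promise<A[]>,
  fetchCollections: () => Promise<B[]>,
  previous: LoadState<A, B>,
): Promise<LoadState<A, B>> {
  const [itemsResult, colsResult] = await Promise.allSettled([fetchItems(), fetchCollections()])
  const next: LoadState<A, B> = { ...previous, errors: [] }
  if (itemsResult.status === 'fulfilled') next.items = itemsResult.value
  else next.errors.push(itemsResult.reason)
  if (colsResult.status === 'fulfilled') next.collections = colsResult.value
  else next.errors.push(colsResult.reason)
  return next
}

// Usage: the collections call fails, saved items still load.
loadIndependently(
  async () => ['Pasta e fagioli'],
  async () => { throw new Error('403') },
  { items: [], collections: ['Weeknight'], errors: [] },
).then((s) => console.log(s.items, s.collections, s.errors.length))
// [ 'Pasta e fagioli' ] [ 'Weeknight' ] 1
```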

@ -1,5 +1,11 @@
/**
* Settings Store
*
* Manages user settings (cooking equipment, preferences) using Pinia.
*/
import { defineStore } from 'pinia'
import { ref, watch, nextTick } from 'vue'
import { ref } from 'vue'
import { settingsAPI } from '../services/api'
import type { UnitSystem } from '../utils/units'
import type { SensoryPreferences } from '../services/api'
@ -7,12 +13,8 @@ import { DEFAULT_SENSORY_PREFERENCES } from '../services/api'
export type TimeFirstLayout = 'auto' | 'time_first' | 'normal'
function debounce(fn: () => void, ms: number): () => void {
let t: ReturnType<typeof setTimeout>
return () => { clearTimeout(t); t = setTimeout(fn, ms) }
}
export const useSettingsStore = defineStore('settings', () => {
// State
const cookingEquipment = ref<string[]>([])
const unitSystem = ref<UnitSystem>('metric')
const shoppingLocale = ref<string>('us')
@ -21,40 +23,7 @@ export const useSettingsStore = defineStore('settings', () => {
const loading = ref(false)
const saved = ref(false)
// Prevents autosave watchers from firing during initial load hydration.
// Set to true after nextTick() at the end of load() — by that point all
// watcher jobs queued by the hydration assignments have already flushed.
let _hydrated = false
function _flash() {
saved.value = true
setTimeout(() => { saved.value = false }, 2000)
}
async function _saveKey(key: string, value: string): Promise<void> {
if (!_hydrated) return
try {
await settingsAPI.setSetting(key, value)
_flash()
} catch (err: unknown) {
console.error('Autosave failed for key:', key, err)
}
}
const _autosave = {
equipment: debounce(() => _saveKey('cooking_equipment', JSON.stringify(cookingEquipment.value)), 600),
unit: debounce(() => _saveKey('unit_system', unitSystem.value), 600),
locale: debounce(() => _saveKey('shopping_locale', shoppingLocale.value), 600),
sensory: debounce(() => _saveKey('sensory_preferences', JSON.stringify(sensoryPreferences.value)), 600),
layout: debounce(() => _saveKey('time_first_layout', timeFirstLayout.value), 600),
}
watch(cookingEquipment, _autosave.equipment, { deep: true })
watch(unitSystem, _autosave.unit)
watch(shoppingLocale, _autosave.locale)
watch(sensoryPreferences, _autosave.sensory, { deep: true })
watch(timeFirstLayout, _autosave.layout)
// Actions
async function load() {
loading.value = true
try {
@ -89,15 +58,8 @@ export const useSettingsStore = defineStore('settings', () => {
} finally {
loading.value = false
}
// Yield past the watcher flush triggered by hydration assignments above.
// After nextTick, any pending watcher jobs from this load() have already
// run (and been ignored by _hydrated guard), so user-driven changes from
// here forward will correctly trigger autosave.
await nextTick()
_hydrated = true
}
// Kept for explicit full-save scenarios (e.g. fallback, tests).
async function save() {
loading.value = true
try {
@ -108,7 +70,10 @@ export const useSettingsStore = defineStore('settings', () => {
settingsAPI.setSetting('sensory_preferences', JSON.stringify(sensoryPreferences.value)),
settingsAPI.setSetting('time_first_layout', timeFirstLayout.value),
])
_flash()
saved.value = true
setTimeout(() => {
saved.value = false
}, 2000)
} catch (err: unknown) {
console.error('Failed to save settings:', err)
} finally {
@ -116,17 +81,24 @@ export const useSettingsStore = defineStore('settings', () => {
}
}
// Kept for backward compat; autosave handles sensory changes now.
async function saveSensory() {
loading.value = true
try {
await settingsAPI.setSetting('sensory_preferences', JSON.stringify(sensoryPreferences.value))
_flash()
await settingsAPI.setSetting(
'sensory_preferences',
JSON.stringify(sensoryPreferences.value),
)
saved.value = true
setTimeout(() => { saved.value = false }, 2000)
} catch (err: unknown) {
console.error('Failed to save sensory preferences:', err)
} finally {
loading.value = false
}
}
return {
// State
cookingEquipment,
unitSystem,
shoppingLocale,
@ -134,6 +106,8 @@ export const useSettingsStore = defineStore('settings', () => {
timeFirstLayout,
loading,
saved,
// Actions
load,
save,
saveSensory,
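The settings store's autosave relies on three mechanisms:

- A trailing-edge debounce per setting key (600 ms).
- A `_hydrated` flag that ignores watcher firings caused by `load()` writing server values into the refs.
- A `_flash()` timer that drives the "Saved" toast.

Below is a framework-free sketch of the first two. It assumes an injected `persist(key, value)` in place of `settingsAPI.setSetting`. `createAutosaver`, `schedule`, and `markHydrated` are hypothetical names, and Vue's `watch` / `nextTick` wiring is left out.

```typescript
// Hypothetical per-key autosaver: trailing-edge debounce plus a hydration guard.
export function createAutosaver(
  persist: (key: string, value: string) => Promise<void>,
  delayMs = 600,
) {
  let hydrated = false
  const latest = new Map<string, string>()
  const timers = new Map<string, ReturnType<typeof setTimeout>>()

  // Record the newest value for `key` and restart that key's timer.
  function schedule(key: string, value: string): void {
    if (!hydrated) return // changes made while loading server values are not saved
    latest.set(key, value)
    const pending = timers.get(key)
    if (pending !== undefined) clearTimeout(pending)
    timers.set(
      key,
      setTimeout(() => {
        timers.delete(key)
        const v = latest.get(key)
        if (v === undefined) return
        // Failures are logged rather than thrown, like the store's _saveKey.
        persist(key, v).catch((err: unknown) => console.error('Autosave failed for key:', key, err))
      }, delayMs),
    )
  }

  // Call once initial hydration is complete; later changes are persisted.
  function markHydrated(): void {
    hydrated = true
  }

  return { schedule, markHydrated }
}

// Usage: rapid changes to one key collapse into a single write.
const saver = createAutosaver(async (key, value) => console.log('persist', key, value), 600)
saver.schedule('unit_system', 'imperial') // ignored: not hydrated yet
saver.markHydrated()
saver.schedule('unit_system', 'imperial')
saver.schedule('unit_system', 'metric')
// about 600 ms later: persist unit_system metric
```

In the store, `markHydrated()` corresponds to setting `_hydrated = true` after `await nextTick()` at the end of `load()`. Flipping the flag any earlier would let the hydration assignments trigger redundant writes.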

@ -436,24 +436,6 @@
display: none;
}
/* Horizontally scrollable tab bar — for tab rows with many items */
.tab-bar-scroll {
display: flex;
gap: var(--spacing-xs);
overflow-x: auto;
overflow-y: visible;
scrollbar-width: none;
-webkit-overflow-scrolling: touch;
min-width: 0;
width: 100%;
flex-wrap: nowrap;
padding-bottom: 2px; /* prevent focus ring clipping */
}
.tab-bar-scroll::-webkit-scrollbar {
display: none;
}
/* ============================================
TEXT UTILITIES
============================================ */

@ -1,85 +1,9 @@
import { defineConfig } from 'vite'
import vue from '@vitejs/plugin-vue'
import { VitePWA } from 'vite-plugin-pwa'
import { fileURLToPath, URL } from 'node:url'
// Ensure start_url/scope match the deployment base path so the PWA launches
// at the correct URL (e.g. /kiwi/ in cloud, / in local dev) rather than the
// site root (which on menagerie.circuitforge.tech is the account page).
const rawBase = process.env.VITE_BASE_URL ?? '/'
const appBase = rawBase.endsWith('/') ? rawBase : rawBase + '/'
export default defineConfig({
plugins: [
vue(),
VitePWA({
registerType: 'autoUpdate',
// generateSW strategy: Workbox builds the service worker at build time.
// autoUpdate means new versions install in the background and activate
// on next navigation — no "click to reload" prompt needed.
strategies: 'generateSW',
includeAssets: ['icons/icon-192.png', 'icons/icon-512.png', 'icons/maskable-192.png', 'icons/maskable-512.png'],
manifest: {
name: 'Kiwi — Pantry Tracker',
short_name: 'Kiwi',
description: 'Track your pantry, cut food waste, get recipe ideas from what you have.',
theme_color: '#e8a820',
background_color: '#1e1c1a',
display: 'standalone',
orientation: 'portrait',
scope: appBase,
start_url: appBase,
icons: [
{
src: '/icons/icon-192.png',
sizes: '192x192',
type: 'image/png',
},
{
src: '/icons/icon-512.png',
sizes: '512x512',
type: 'image/png',
},
{
src: '/icons/maskable-192.png',
sizes: '192x192',
type: 'image/png',
purpose: 'maskable',
},
{
src: '/icons/maskable-512.png',
sizes: '512x512',
type: 'image/png',
purpose: 'maskable',
},
],
},
workbox: {
// Precache the built JS/CSS/HTML shell. API calls are always network-first.
globPatterns: ['**/*.{js,css,html,ico,png,svg,woff2}'],
runtimeCaching: [
{
// API: network-first, fall back to cache for 1 minute
urlPattern: /^\/api\//,
handler: 'NetworkFirst',
options: {
cacheName: 'kiwi-api-cache',
expiration: { maxEntries: 50, maxAgeSeconds: 60 },
},
},
{
// Google Fonts: cache-first (fonts rarely change)
urlPattern: /^https:\/\/fonts\.(googleapis|gstatic)\.com\//,
handler: 'CacheFirst',
options: {
cacheName: 'google-fonts',
expiration: { maxEntries: 10, maxAgeSeconds: 60 * 60 * 24 * 365 },
},
},
],
},
}),
],
plugins: [vue()],
base: process.env.VITE_BASE_URL ?? '/',
resolve: {
alias: {

@ -14,8 +14,8 @@ OVERRIDE_FLAG=""
[[ -f "compose.override.yml" ]] && OVERRIDE_FLAG="-f compose.override.yml"
usage() {
echo "Usage: $0 {start|stop|restart|status|logs|open|build|test|update"
echo " |cloud-start|cloud-stop|cloud-restart|cloud-status|cloud-logs|cloud-build|cloud-update}"
echo "Usage: $0 {start|stop|restart|status|logs|open|build|test"
echo " |cloud-start|cloud-stop|cloud-restart|cloud-status|cloud-logs|cloud-build}"
echo ""
echo "Dev:"
echo " start Build (if needed) and start all services"
@ -26,7 +26,6 @@ usage() {
echo " open Open web UI in browser"
echo " build Rebuild Docker images without cache"
echo " test Run pytest test suite"
echo " update git pull + rebuild + restart dev stack"
echo ""
echo "Cloud (menagerie.circuitforge.tech/kiwi):"
echo " cloud-start Build cloud images and start kiwi-cloud project"
@ -35,7 +34,6 @@ usage() {
echo " cloud-status Show cloud containers"
echo " cloud-logs Follow cloud logs [api|web — defaults to all]"
echo " cloud-build Rebuild cloud images without cache"
echo " cloud-update git pull + rebuild + restart cloud stack"
exit 1
}
@ -70,11 +68,6 @@ case "$cmd" in
build)
docker compose -f "$COMPOSE_FILE" $OVERRIDE_FLAG build --no-cache
;;
update)
git pull
docker compose -f "$COMPOSE_FILE" $OVERRIDE_FLAG up -d --build
echo "Kiwi updated and restarted → http://localhost:${WEB_PORT}"
;;
test)
docker compose -f "$COMPOSE_FILE" $OVERRIDE_FLAG run --rm api \
conda run -n job-seeker pytest tests/ -v
@ -102,11 +95,6 @@ case "$cmd" in
cloud-build)
docker compose -f "$CLOUD_COMPOSE_FILE" -p "$CLOUD_PROJECT" build --no-cache
;;
cloud-update)
git pull
docker compose -f "$CLOUD_COMPOSE_FILE" -p "$CLOUD_PROJECT" up -d --build
echo "Kiwi cloud updated and restarted → https://menagerie.circuitforge.tech/kiwi"
;;
*)
usage

@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "kiwi"
version = "0.10.0"
version = "0.6.0"
description = "Pantry tracking + leftover recipe suggestions"
readme = "README.md"
requires-python = ">=3.11"

@ -1,117 +0,0 @@
"""
Fast targeted backfill for meal: tags only.
Rather than re-deriving ALL inferred_tags via the full infer_tags() pipeline
(which takes ~2.5h for 3.19M recipes), this script:
1. Reads only id + title + inferred_tags (no ingredient profiles needed;
meal signals are title-only).
2. Runs _match_title_signals() against the title to get meal tags.
3. For rows that already have inferred_tags: merges in the new meal tags
(no-op if already present).
4. For rows with no inferred_tags: runs the full infer_tags() pipeline so
those rows get a complete tag set, not just meal tags.
5. Rebuilds the FTS5 index once at the end.
Estimated runtime on 3.19M recipes: 35 minutes.
Usage:
python scripts/pipeline/backfill_meal_tags.py [path/to/kiwi.db]
"""
from __future__ import annotations
import argparse
import json
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).resolve().parents[2]))
from app.services.recipe.tag_inferrer import _MEAL_SIGNALS, _match_title_signals
def run(db_path: Path, batch_size: int = 10_000) -> None:
import sqlite3
conn = sqlite3.connect(db_path)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA synchronous=NORMAL")
total = conn.execute("SELECT count(*) FROM recipes").fetchone()[0]
print(f"Total recipes: {total:,}")
updated = 0
skipped = 0
offset = 0
while True:
rows = conn.execute(
"""
SELECT id, title, inferred_tags
FROM recipes
ORDER BY id
LIMIT ? OFFSET ?
""",
(batch_size, offset),
).fetchall()
if not rows:
break
updates: list[tuple[str, int]] = []
for row_id, title, tags_json in rows:
title = title or ""
meal_tags = _match_title_signals(title, _MEAL_SIGNALS)
if not meal_tags:
skipped += 1
continue
try:
existing: list[str] = json.loads(tags_json) if tags_json else []
except Exception:
existing = []
# Merge: union of existing + new meal tags, sorted
merged = sorted(set(existing) | set(meal_tags))
if merged == existing:
skipped += 1
continue
updates.append((json.dumps(merged), row_id))
if updates:
conn.executemany(
"UPDATE recipes SET inferred_tags = ? WHERE id = ?", updates
)
conn.commit()
updated += len(updates)
offset += len(rows)
pct = min(100, int(offset * 100 / total))
print(f" {pct:>3}% offset {offset:,} merged {updated:,} skipped {skipped:,}",
end="\r")
print(f"\nDone. Merged meal tags into {updated:,} recipes ({skipped:,} unchanged).")
if updated > 0:
print("Rebuilding FTS5 browser index...")
try:
conn.execute(
"INSERT INTO recipe_browser_fts(recipe_browser_fts) VALUES('rebuild')"
)
conn.commit()
print("FTS rebuild complete.")
except Exception as e:
print(f"FTS rebuild skipped: {e}")
conn.close()
if __name__ == "__main__":
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument("db", nargs="?", default="data/kiwi.db", type=Path)
parser.add_argument("--batch-size", type=int, default=10_000)
args = parser.parse_args()
if not args.db.exists():
print(f"DB not found: {args.db}")
sys.exit(1)
run(args.db, args.batch_size)
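For each row, `run()` in the backfill script makes a three-way decision:

- It skips titles with no meal signal.
- It unions the new meal tags into the stored list.
- It writes only when the sorted union differs from what is stored.

As written, rows with empty `inferred_tags` go through the same path and receive only the meal tags, so the sketch does the same.

Below is that decision as a TypeScript sketch, written in TypeScript for consistency with the other examples here. `mergeMealTags` is a hypothetical name. The title matching done by `_match_title_signals` is taken as an input rather than reimplemented.

One difference to note: `JSON.stringify` emits `["a","b"]`, while Python's `json.dumps` emits `["a", "b"]`. The stored text would therefore differ even though the parsed lists match.

```typescript
// Returns the JSON to write for this row, or null when the row can be skipped.
export function mergeMealTags(existingJson: string | null, mealTags: string[]): string | null {
  if (mealTags.length === 0) return null // no meal signal in the title
  let existing: string[] = []
  try {
    const parsed: unknown = existingJson ? JSON.parse(existingJson) : []
    if (Array.isArray(parsed)) existing = parsed.filter((t): t is string => typeof t === 'string')
  } catch {
    existing = [] // unparseable JSON counts as no tags, like the Python `except`
  }
  const merged = [...new Set([...existing, ...mealTags])].sort()
  // Same comparison as `merged == existing`: an unsorted stored list is rewritten sorted.
  const unchanged = merged.length === existing.length && merged.every((t, i) => t === existing[i])
  return unchanged ? null : JSON.stringify(merged)
}

console.log(mergeMealTags('["vegan"]', ['breakfast'])) // ["breakfast","vegan"]
console.log(mergeMealTags('["dinner"]', ['dinner'])) // null
```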

@ -1,218 +0,0 @@
"""Ingest Purple Carrot scraped recipes into the Kiwi corpus database.
Reads recipes_purplecarrot_live.parquet (output of scrape_live.py) and
upserts into the shared recipes table, setting source='purplecarrot' and
using the recipe slug as the external_id (prefixed pc_).
Run after each weekly_harvest.sh scrape:
conda run -n cf python3 scripts/pipeline/ingest_purplecarrot.py \
[--db /Library/Assets/kiwi/kiwi.db] \
[--parquet /Library/Assets/kiwi/pipeline/recipes_purplecarrot_live.parquet]
"""
from __future__ import annotations
import argparse
import json
import sqlite3
from pathlib import Path
import math
import re
import pandas as pd
# ── Helpers (inlined from build_recipe_index to avoid cross-module import) ─────
_MEASURE_PATTERN = re.compile(
r"^\d[\d\s/¼½¾⅓⅔]*\s*(cup|tbsp|tsp|oz|lb|g|kg|ml|l|clove|slice|piece|can|pkg|package|bunch|head|stalk|sprig|pinch|dash|to taste|as needed)s?\b",
re.IGNORECASE,
)
_LEAD_NUMBER = re.compile(r"^\d[\d\s/¼½¾⅓⅔]*\s*")
_TRAILING_QUALIFIER = re.compile(
r"\s*(to taste|as needed|or more|or less|optional|if desired|if needed)\s*$",
re.IGNORECASE,
)
def _float_or_none(val: object) -> float | None:
try:
v = float(val) # type: ignore[arg-type]
return v if v > 0 else None
except (TypeError, ValueError):
return None
def _safe_list(val: object) -> list:
if val is None:
return []
if isinstance(val, float) and math.isnan(val):
return []
if isinstance(val, list):
return val
# Parquet often deserializes list columns as numpy arrays
try:
import numpy as np
if isinstance(val, np.ndarray):
return val.tolist()
except ImportError:
pass
return []
def _extract_ingredient_names(raw_list: list[str]) -> list[str]:
names = []
for raw in raw_list:
s = raw.lower().strip()
s = _MEASURE_PATTERN.sub("", s)
s = _LEAD_NUMBER.sub("", s)
s = re.sub(r"\(.*?\)", "", s)
s = re.sub(r",.*$", "", s)
s = _TRAILING_QUALIFIER.sub("", s)
s = s.strip(" -.,")
if s and len(s) > 1:
names.append(s)
return names
def _compute_element_coverage(profiles: list[dict]) -> dict[str, float]:
counts: dict[str, int] = {}
for p in profiles:
for elem in p.get("elements", []):
counts[elem] = counts.get(elem, 0) + 1
if not profiles:
return {}
return {e: round(c / len(profiles), 3) for e, c in counts.items()}
# ── Config ─────────────────────────────────────────────────────────────────────
DEFAULT_DB = Path("/Library/Assets/kiwi/kiwi.db")
DEFAULT_PARQUET = Path("/Library/Assets/kiwi/pipeline/recipes_purplecarrot_live.parquet")
# ── Ingest ─────────────────────────────────────────────────────────────────────
def ingest(db_path: Path, parquet_path: Path) -> None:
df = pd.read_parquet(parquet_path)
# Filter to rows with full recipe data
if "HasFullRecipe" in df.columns:
df = df[df["HasFullRecipe"] == True].copy()
if df.empty:
print("No full recipes found in parquet — nothing to ingest.")
return
print(f"Ingesting {len(df)} Purple Carrot recipes into {db_path}")
conn = sqlite3.connect(db_path)
try:
conn.execute("PRAGMA journal_mode=WAL")
# Pre-load ingredient element profiles for coverage calculation
profile_index: dict[str, list[str]] = {}
for row in conn.execute("SELECT name, elements FROM ingredient_profiles"):
try:
profile_index[row[0]] = json.loads(row[1])
except Exception:
pass
inserted = updated = 0
for _, row in df.iterrows():
slug = str(row.get("Slug", "")).strip()
if not slug:
continue
external_id = f"pc_{slug}"
title = str(row.get("Name", "")).strip()[:500]
if not title:
continue
raw_ingredients = [str(i) for i in _safe_list(row.get("RecipeIngredientParts", []))]
directions = [str(d) for d in _safe_list(row.get("RecipeInstructions", []))]
ingredient_names = _extract_ingredient_names(raw_ingredients)
profiles = [
{"elements": profile_index[n]}
for n in ingredient_names if n in profile_index
]
coverage = _compute_element_coverage(profiles)
# Keywords: merge scraped tags with allergen info
kw_raw = _safe_list(row.get("Keywords", []))
allergens = str(row.get("Allergens", "") or "")
if allergens:
kw_raw = list(kw_raw) + [f"allergen:{a.strip()}" for a in allergens.split(",") if a.strip()]
keywords_json = json.dumps(kw_raw)
# Check if already present (same external_id)
existing = conn.execute(
"SELECT id FROM recipes WHERE external_id = ?", (external_id,)
).fetchone()
params = (
title,
json.dumps(raw_ingredients),
json.dumps(ingredient_names),
json.dumps(directions),
"meal-kit", # category
keywords_json,
_float_or_none(row.get("Calories")),
_float_or_none(row.get("FatContent")),
_float_or_none(row.get("ProteinContent")),
None, # sodium_mg — not scraped
json.dumps(coverage),
None, # sugar_g — not scraped
_float_or_none(row.get("CarbohydrateContent")),
_float_or_none(row.get("FiberContent")),
2.0, # servings — PC meal kits are 2-serving by default
0, # nutrition_estimated — PC provides real data
)
if existing:
conn.execute("""
UPDATE recipes
SET title=?, ingredients=?, ingredient_names=?, directions=?,
category=?, keywords=?, calories=?, fat_g=?, protein_g=?,
sodium_mg=?, element_coverage=?,
sugar_g=?, carbs_g=?, fiber_g=?, servings=?, nutrition_estimated=?
WHERE external_id=?
""", params + (external_id,))
updated += 1
else:
conn.execute("""
INSERT INTO recipes
(external_id, source, title, ingredients, ingredient_names,
directions, category, keywords, calories, fat_g, protein_g,
sodium_mg, element_coverage,
sugar_g, carbs_g, fiber_g, servings, nutrition_estimated)
VALUES (?, 'purplecarrot', ?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
""", (external_id,) + params)
inserted += 1
conn.commit()
finally:
conn.close()
print(f"Done — {inserted} inserted, {updated} updated")
# ── Main ───────────────────────────────────────────────────────────────────────
def main() -> None:
parser = argparse.ArgumentParser()
parser.add_argument("--db", type=Path, default=DEFAULT_DB)
parser.add_argument("--parquet", type=Path, default=DEFAULT_PARQUET)
args = parser.parse_args()
if not args.parquet.exists():
print(f"ERROR: parquet not found at {args.parquet}")
raise SystemExit(1)
ingest(args.db, args.parquet)
if __name__ == "__main__":
main()
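The ingest loop runs a SELECT for each recipe and then branches to UPDATE or INSERT. That approach keeps separate inserted/updated counts. If those counts are not needed, SQLite's `INSERT ... ON CONFLICT ... DO UPDATE` upsert (SQLite 3.24+) does the same work in one statement. This is a minimal sketch with a reduced column set. It assumes a UNIQUE constraint on `external_id`, because the script does not show the real schema. `upsert_recipe` is a hypothetical name.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
# Assumption: external_id is UNIQUE; ON CONFLICT needs a unique index to target.
conn.execute(
    """
    CREATE TABLE recipes (
        id INTEGER PRIMARY KEY,
        external_id TEXT UNIQUE,
        source TEXT,
        title TEXT,
        calories REAL
    )
    """
)

UPSERT_SQL = """
    INSERT INTO recipes (external_id, source, title, calories)
    VALUES (?, 'purplecarrot', ?, ?)
    ON CONFLICT(external_id) DO UPDATE SET
        title = excluded.title,
        calories = excluded.calories
"""


def upsert_recipe(external_id: str, title: str, calories: Optional[float]) -> None:
    """Insert a new recipe row or refresh the existing one in place."""
    conn.execute(UPSERT_SQL, (external_id, title, calories))


upsert_recipe("pc_tofu-bowl", "Tofu Bowl", 540.0)
upsert_recipe("pc_tofu-bowl", "Crispy Tofu Bowl", 560.0)  # a re-scrape updates instead of duplicating
conn.commit()
```

`excluded` refers to the row that failed to insert, so the second call overwrites title and calories on the existing row.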


@ -1,68 +0,0 @@
"""
Pipeline logging utility.
Adds a structured JSON FileHandler to the root logger so every pipeline
script automatically writes machine-readable logs to the shared datastore
at /Library/Assets/logs/pipeline/. Avocet ingests these for Turnstone
logreading training (kiwi#141 / avocet#67).
Usage (add near the top of main() after logging.basicConfig):
from scripts.pipeline.log_utils import attach_pipeline_log
attach_pipeline_log("scrape_recipes")
"""
from __future__ import annotations
import json
import logging
import os
from datetime import datetime, timezone
from pathlib import Path
PIPELINE_LOG_DIR = Path(
os.environ.get("PIPELINE_LOG_DIR", "/Library/Assets/logs/pipeline")
)
class _JsonFormatter(logging.Formatter):
def format(self, record: logging.LogRecord) -> str:
payload: dict = {
"ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
"level": record.levelname,
"logger": record.name,
"msg": record.getMessage(),
}
if record.exc_info:
payload["exc"] = self.formatException(record.exc_info)
# Any extra kwargs passed via logger.info("...", extra={...})
standard = {
"name", "msg", "args", "levelname", "levelno", "pathname",
"filename", "module", "exc_info", "exc_text", "stack_info",
"lineno", "funcName", "created", "msecs", "relativeCreated",
"thread", "threadName", "processName", "process", "message",
"taskName",
}
extra = {k: v for k, v in record.__dict__.items() if k not in standard}
if extra:
payload["extra"] = extra
return json.dumps(payload)
def attach_pipeline_log(script_name: str) -> Path:
"""Attach a JSON file handler to the root logger for pipeline logging.
Returns the path of the log file created.
"""
PIPELINE_LOG_DIR.mkdir(parents=True, exist_ok=True)
ts = datetime.now(tz=timezone.utc).strftime("%Y%m%dT%H%M%S")
log_path = PIPELINE_LOG_DIR / f"{script_name}_{ts}.jsonl"
handler = logging.FileHandler(log_path, encoding="utf-8")
handler.setLevel(logging.DEBUG)
handler.setFormatter(_JsonFormatter())
logging.getLogger().addHandler(handler)
logging.getLogger(__name__).info(
"Pipeline log: %s", log_path, extra={"script": script_name}
)
return log_path


@ -1,120 +0,0 @@
"""Discover Purple Carrot's current weekly menu recipe slugs.
The main /plant-based-recipes listing page always renders the current week's
menu as server-side HTML. This script pulls those slugs and writes them to a
parquet that can be passed directly to scrape_live.py via --slugs-from.
Run weekly (e.g. via cron) to accumulate new recipes as the menu rotates.
Usage:
conda run -n cf python3 scripts/pipeline/purple_carrot/discover_current_menu.py \
[--out /Library/Assets/kiwi/pipeline/recipes_purplecarrot_menu.parquet]
Then scrape:
conda run -n cf python3 scripts/pipeline/purple_carrot/scrape_live.py \
--slugs-from /Library/Assets/kiwi/pipeline/recipes_purplecarrot_menu.parquet \
--out /Library/Assets/kiwi/pipeline/recipes_purplecarrot_live.parquet \
--resume
"""
from __future__ import annotations
import re
import sys
from datetime import date
from pathlib import Path
import pandas as pd
import requests
from bs4 import BeautifulSoup
# ── Config ─────────────────────────────────────────────────────────────────────
LISTING_URL = "https://www.purplecarrot.com/plant-based-recipes"
BASE_URL = "https://www.purplecarrot.com/recipe/{slug}"
DEFAULT_OUT = Path("/Library/Assets/kiwi/pipeline/recipes_purplecarrot_menu.parquet")
HEADERS = {
"User-Agent": (
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
"(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
),
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.5",
}
RECIPE_HREF_RE = re.compile(r"/recipe/([^?#]+)")
# ── Main ───────────────────────────────────────────────────────────────────────
def discover_current_slugs() -> list[str]:
"""Fetch the listing page and return unique recipe slugs from the current menu."""
resp = requests.get(LISTING_URL, headers=HEADERS, timeout=15)
if resp.status_code != 200:
print(f"ERROR: listing page returned HTTP {resp.status_code}", file=sys.stderr)
return []
soup = BeautifulSoup(resp.text, "html.parser")
slugs: list[str] = []
seen: set[str] = set()
for a in soup.find_all("a", href=RECIPE_HREF_RE):
m = RECIPE_HREF_RE.search(a["href"])
if m:
slug = m.group(1)
if slug not in seen:
seen.add(slug)
slugs.append(slug)
return slugs
def main() -> None:
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("--out", type=Path, default=DEFAULT_OUT)
args = parser.parse_args()
print(f"Fetching current menu from {LISTING_URL}")
slugs = discover_current_slugs()
if not slugs:
print("No slugs found — the listing page may have changed structure or blocked the request.")
sys.exit(1)
today = date.today().isoformat()
records = [
{
"Slug": slug,
"SourceURL": BASE_URL.format(slug=slug),
"Source": "purplecarrot_menu",
"DiscoveredDate": today,
}
for slug in slugs
]
# Merge with any existing menu parquet (accumulate weeks)
df_new = pd.DataFrame(records)
args.out.parent.mkdir(parents=True, exist_ok=True)
if args.out.exists():
df_prev = pd.read_parquet(args.out)
combined = pd.concat([df_prev, df_new], ignore_index=True)
combined = combined.drop_duplicates(subset=["Slug"], keep="first")
df_new = combined
df_new.to_parquet(args.out, index=False)
print(f"Found {len(slugs)} current-menu slugs this week:")
for s in slugs:
print(f" {s}")
print(f"\nSaved {len(df_new)} total slugs (accumulated) to {args.out}")
print(f"\nTo scrape full recipes:")
print(f" conda run -n cf python3 scripts/pipeline/purple_carrot/scrape_live.py \\")
print(f" --slugs-from {args.out} \\")
print(f" --out /Library/Assets/kiwi/pipeline/recipes_purplecarrot_live.parquet \\")
print(f" --resume")
if __name__ == "__main__":
main()
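The slug discovery above applies a regex to `<a href>` values, keeping first-seen order and dropping duplicates. Below is a minimal sketch of the same idea using only the standard library. `html.parser.HTMLParser` stands in for BeautifulSoup, `RecipeLinkParser` is a hypothetical name, and the sample markup is invented rather than taken from the live page.

```python
import re
from html.parser import HTMLParser

# Same pattern as the script: everything after /recipe/ up to ? or #.
RECIPE_HREF_RE = re.compile(r"/recipe/([^?#]+)")


class RecipeLinkParser(HTMLParser):
    """Collect recipe slugs from <a href> attributes in first-seen order."""

    def __init__(self) -> None:
        super().__init__()
        self.slugs = []
        self._seen = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        m = RECIPE_HREF_RE.search(href)
        if m and m.group(1) not in self._seen:
            self._seen.add(m.group(1))
            self.slugs.append(m.group(1))


sample = (
    '<a href="/recipe/miso-ramen?ref=menu">Miso Ramen</a>'
    '<a href="/recipe/miso-ramen">Miso Ramen again</a>'
    '<a href="/about">About</a>'
    '<a href="https://www.purplecarrot.com/recipe/tofu-tacos">Tofu Tacos</a>'
)
parser = RecipeLinkParser()
parser.feed(sample)
```

Query strings are cut off by `[^?#]+`, so both miso-ramen links collapse to one slug. Absolute URLs match as well.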


@ -1,218 +0,0 @@
"""Discover Purple Carrot recipe slugs by crawling all recipe-category listing pages.
The site serves full server-rendered HTML for category pages, paginated via
?page=N. Each page loads 18 recipe cards. This script crawls every category
across all pages and writes a deduplicated slug inventory.
Usage:
conda run -n cf python3 scripts/pipeline/purple_carrot/discover_slugs_categories.py \
[--out /Library/Assets/kiwi/pipeline/recipes_purplecarrot_slugs.parquet] \
[--delay 2.0] \
[--max-pages 50] # safety cap per category (comfort-foods has ~18)
"""
from __future__ import annotations
import argparse
import re
import time
from pathlib import Path
from typing import Any
import pandas as pd
import requests
from bs4 import BeautifulSoup
# ── Config ─────────────────────────────────────────────────────────────────────
BASE = "https://www.purplecarrot.com"
# All known category slugs (from /plant-based-recipes nav)
CATEGORIES: list[str] = [
"comfort-foods",
"family-friendly",
"healthy-desserts",
"holiday-recipes",
"quick-and-easy",
"party-foods",
"seasonal-menu",
"spring-recipes",
"summer-recipes",
"fall-recipes",
"winter-recipes",
"african",
"american",
"asian",
"comfort",
"french",
"indian",
"italian",
"mediterranean",
"mexican",
"middle-eastern",
"soups",
"salads",
"bowls",
"pasta",
"sandwiches-wraps",
"tacos",
"breakfast",
"snacks-sides",
]
DEFAULT_OUT = Path("/Library/Assets/kiwi/pipeline/recipes_purplecarrot_slugs.parquet")
EXISTING_PARQUET = Path("/Library/Assets/kiwi/pipeline/recipes_purplecarrot.parquet")
RECIPE_LINK_SELECTOR = "a.c-recipe__title"
SLUG_RE = re.compile(r"/recipe/([^?#]+)")
HEADERS = {
"User-Agent": (
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
"(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
),
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.5",
}
# ── Helpers ────────────────────────────────────────────────────────────────────
def _fetch_html(url: str, session: requests.Session) -> str | None:
"""Fetch URL and return HTML string, or None on failure."""
try:
resp = session.get(url, headers=HEADERS, timeout=15)
if resp.status_code == 200:
return resp.text
if resp.status_code == 404:
return None # expected end of pagination
print(f" HTTP {resp.status_code} for {url}")
return None
except Exception as exc:
print(f" ERROR fetching {url}: {exc}")
return None
def _extract_slugs(html: str) -> list[str]:
"""Pull recipe slugs from one listing-page HTML response."""
soup = BeautifulSoup(html, "html.parser")
slugs: list[str] = []
for a in soup.select(RECIPE_LINK_SELECTOR):
href = a.get("href", "")
m = SLUG_RE.search(href)
if m:
slugs.append(m.group(1))
return slugs
def _get_category_total(html: str) -> int | None:
"""Try to parse the recipe count shown on the category page (e.g. '319 Recipes')."""
m = re.search(r"(\d+)\s+Recipes?\b", html)
return int(m.group(1)) if m else None
def _discover_category(
category: str,
session: requests.Session,
delay: float,
max_pages: int,
) -> tuple[list[str], int]:
"""Crawl all pages of a category; return (slugs, approximate page count)."""
slugs: list[str] = []
for page_num in range(1, max_pages + 1):
if page_num == 1:
url = f"{BASE}/recipe-categories/{category}"
else:
url = f"{BASE}/recipe-categories/{category}?page={page_num}"
html = _fetch_html(url, session)
if html is None:
break # 404 or error = past the end
page_slugs = _extract_slugs(html)
if not page_slugs:
# Show total if we got a page but no links (category slug may be wrong)
if page_num == 1:
total = _get_category_total(html)
if total is not None:
print(f" page 1 loaded (total={total}) but 0 recipe links — selector may need updating")
break
slugs.extend(page_slugs)
# Print progress
total_hint = _get_category_total(html) if page_num == 1 else None
total_str = f" / {total_hint}" if total_hint else ""
print(f" page {page_num}: +{len(page_slugs)} slugs ({len(slugs)}{total_str} cumulative)")
if len(page_slugs) < 18:
# Short page = last page
break
time.sleep(delay)
return slugs, (len(slugs) + 17) // 18 # approximate pages
# ── Main ───────────────────────────────────────────────────────────────────────
def main() -> None:
parser = argparse.ArgumentParser()
parser.add_argument("--out", type=Path, default=DEFAULT_OUT)
parser.add_argument("--delay", type=float, default=2.0,
help="Seconds between page requests")
parser.add_argument("--max-pages", type=int, default=50,
help="Safety cap on pages per category")
parser.add_argument("--categories", nargs="*",
help="Crawl only these category slugs (default: all)")
args = parser.parse_args()
categories = args.categories or CATEGORIES
# Seed with any slugs from the Wayback parquet
known_slugs: set[str] = set()
if EXISTING_PARQUET.exists():
df_wb = pd.read_parquet(EXISTING_PARQUET)
known_slugs = set(df_wb["Slug"].dropna().tolist())
print(f"Seeded with {len(known_slugs)} slugs from Wayback parquet")
all_records: list[dict[str, Any]] = []
session = requests.Session()
for category in categories:
print(f"\n[{category}]")
cat_slugs, pages = _discover_category(category, session, args.delay, args.max_pages)
for slug in cat_slugs:
all_records.append({"Slug": slug, "Category": category, "Source": "purplecarrot_category"})
print(f"{len(cat_slugs)} slugs across ~{pages} pages")
time.sleep(args.delay)
if not all_records:
print("\nNo records found — check that categories are correct and the site is accessible")
return
# Deduplicate keeping first category encountered
df_new = pd.DataFrame(all_records)
df_new = df_new.drop_duplicates(subset=["Slug"], keep="first")
# Also include Wayback slugs not already in the new set
if known_slugs:
wb_only = known_slugs - set(df_new["Slug"].tolist())
if wb_only:
df_wb_extra = pd.DataFrame([
{"Slug": s, "Category": "wayback", "Source": "purplecarrot_wayback"}
for s in wb_only
])
df_new = pd.concat([df_new, df_wb_extra], ignore_index=True)
args.out.parent.mkdir(parents=True, exist_ok=True)
df_new.to_parquet(args.out, index=False)
new_count = len(df_new)
cat_count = len(df_new[df_new["Source"] == "purplecarrot_category"])
print(f"\nDone — {new_count} total slugs saved to {args.out}")
print(f" {cat_count} from category pages, {new_count - cat_count} from Wayback only")
if __name__ == "__main__":
main()
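`_discover_category` stops on a failed fetch, an empty page, or a short page, and it estimates the page count from the slug total. Below is a minimal sketch of the same stopping rules that counts pages exactly. `crawl_category` and the injected `fetch_page` callable are hypothetical. Injecting the fetcher lets the logic run against canned pages with no network access and no delay.

```python
from typing import Callable, List, Optional, Tuple

PAGE_SIZE = 18  # recipe cards per listing page, per the script's docstring


def crawl_category(
    fetch_page: Callable[[int], Optional[List[str]]],
    max_pages: int = 50,
) -> Tuple[List[str], int]:
    """Walk page 1..max_pages until a miss, an empty page, or a short page.

    fetch_page(n) returns the slugs on page n, or None for a 404 or error.
    Returns (slugs, pages_fetched).
    """
    slugs: List[str] = []
    pages_fetched = 0
    for page_num in range(1, max_pages + 1):
        page_slugs = fetch_page(page_num)
        if page_slugs is None:
            break  # past the last page, or the request failed
        pages_fetched += 1
        if not page_slugs:
            break  # page loaded but had no recipe links
        slugs.extend(page_slugs)
        if len(page_slugs) < PAGE_SIZE:
            break  # a short page is the last page
    return slugs, pages_fetched


full_page = [f"a{i}" for i in range(18)]
slugs, pages = crawl_category({1: full_page, 2: ["b0", "b1", "b2"]}.get)
```

A dict's `.get` works as a fake fetcher because it returns None for a missing page number, just like a 404.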


@ -1,301 +0,0 @@
"""
discover_wayback.py: enumerate Purple Carrot recipe slugs via the Wayback Machine.
Strategy:
1. CDX API: list all archived /api/v2/menus/* URLs (multiple timestamps)
2. Replay: fetch each menu's menuItems and extract the productPath slugs
3. CDX API: list all archived /api/v1/products/* URLs (direct slug capture)
4. CDX API: list archived /recipe-categories/* HTML pages for older slugs
5. Deduplicate and append new records to the JSONL manifest (--out)
Output (JSONL, one record per recipe):
{"slug": "...", "title": "...", "subtitle": "...", "cook_time": "...",
"tags": [...], "serving_size": 2, "image_url": "...",
"wayback_ts": "20260412150557", "source": "menu|product_api|category_page"}
Usage:
conda run -n cf python -m scripts.pipeline.purple_carrot.discover_wayback
conda run -n cf python -m scripts.pipeline.purple_carrot.discover_wayback --out /Library/Assets/kiwi/pipeline/pc_slugs.jsonl
"""
from __future__ import annotations
import argparse
import json
import logging
import time
from pathlib import Path
from typing import Any
import requests
logger = logging.getLogger(__name__)
CDX_BASE = "https://web.archive.org/cdx/search/cdx"
WB_BASE = "https://web.archive.org/web"
PC_HOST = "www.purplecarrot.com"
# Polite delay between Wayback replay fetches (seconds)
REPLAY_DELAY = 1.0
CDX_DELAY = 0.5
DEFAULT_OUT = Path("/Library/Assets/kiwi/pipeline/pc_slugs.jsonl")
# ── CDX helpers ───────────────────────────────────────────────────────────────
def cdx_query(url_pattern: str, **kwargs) -> list[dict]:
"""Run a CDX search and return a list of result dicts."""
params = {
"url": url_pattern,
"output": "json",
"fl": "original,timestamp,statuscode",
"collapse": "urlkey",
"filter": "statuscode:200",
**kwargs,
}
for attempt in range(3):
try:
resp = requests.get(CDX_BASE, params=params, timeout=30)
resp.raise_for_status()
rows = resp.json()
if not rows or len(rows) < 2:
return []
headers = rows[0]
return [dict(zip(headers, row)) for row in rows[1:]]
except Exception as exc:
logger.warning("CDX attempt %d failed: %s", attempt + 1, exc)
time.sleep(2 ** attempt)
return []
def wayback_get(url: str, timestamp: str) -> Any | None:
"""Fetch a Wayback replay of a URL and return parsed JSON (or None)."""
replay_url = f"{WB_BASE}/{timestamp}/{url}"
for attempt in range(3):
try:
resp = requests.get(replay_url, timeout=30)
if resp.status_code == 200:
return resp.json()
if resp.status_code == 404:
return None
except Exception as exc:
logger.warning("Wayback GET attempt %d failed for %s: %s", attempt + 1, url, exc)
time.sleep(2 ** attempt)
return None
# ── Slug extraction ───────────────────────────────────────────────────────────
def slug_from_product_path(path: str) -> str | None:
"""'/recipe/foo-bar-baz' -> 'foo-bar-baz'."""
if not path:
return None
return path.strip("/").split("/")[-1] or None
def _menu_item_to_record(item: dict, wayback_ts: str) -> dict | None:
slug = slug_from_product_path(item.get("productPath", ""))
if not slug:
return None
return {
"slug": slug,
"title": item.get("title", ""),
"subtitle": item.get("subtitle", ""),
"cook_time": item.get("cookTime", ""),
"tags": item.get("filterTags") or [],
"serving_size": item.get("servingSize"),
"image_url": item.get("imageURL", ""),
"description": item.get("description", ""),
"wayback_ts": wayback_ts,
"source": "menu",
}
# ── Discovery passes ──────────────────────────────────────────────────────────
def pass_menus(seen_slugs: set[str]) -> list[dict]:
"""Walk all archived /api/v2/menus/* captures to extract slugs."""
records: list[dict] = []
# Find all distinct archived menu URLs
menu_cdx = cdx_query(f"{PC_HOST}/api/v2/menus/*", limit="500")
logger.info("CDX: %d archived menu URLs found", len(menu_cdx))
time.sleep(CDX_DELAY)
processed_menu_ids: set[str] = set()
for entry in menu_cdx:
url = entry["original"]
ts = entry["timestamp"]
# Skip the listing endpoint; only process individual menus (numeric IDs)
if not url.split("?")[0].rstrip("/").split("/")[-1].isdigit():
continue
menu_id = url.split("?")[0].rstrip("/").split("/")[-1]
if menu_id in processed_menu_ids:
continue
processed_menu_ids.add(menu_id)
logger.info("Fetching menu %s (ts=%s) ...", menu_id, ts)
data = wayback_get(url.split("?")[0] + "?logged_out=true", ts)
time.sleep(REPLAY_DELAY)
if not data or "menuItems" not in data:
continue
for item in data["menuItems"]:
rec = _menu_item_to_record(item, ts)
if rec and rec["slug"] not in seen_slugs:
seen_slugs.add(rec["slug"])
records.append(rec)
logger.debug(" + %s", rec["slug"])
logger.info(" %d new slugs (total so far: %d)", len(records), len(seen_slugs))
return records
def pass_product_api(seen_slugs: set[str]) -> list[dict]:
"""Pick up any directly archived /api/v1/products/* URLs the menu pass missed."""
records: list[dict] = []
product_cdx = cdx_query(f"{PC_HOST}/api/v1/products/*", limit="5000")
logger.info("CDX: %d archived product API URLs found", len(product_cdx))
time.sleep(CDX_DELAY)
for entry in product_cdx:
slug = entry["original"].rstrip("/").split("/")[-1]
if not slug or slug in seen_slugs:
continue
seen_slugs.add(slug)
records.append({
"slug": slug,
"title": "",
"subtitle": "",
"cook_time": "",
"tags": [],
"serving_size": None,
"image_url": "",
"description": "",
"wayback_ts": entry["timestamp"],
"source": "product_api",
})
logger.info("product_api pass: %d new slugs", len(records))
return records
def pass_category_pages(seen_slugs: set[str]) -> list[dict]:
"""Parse archived recipe-categories HTML pages for slugs not in the API.
Older captures of category pages are server-rendered or embed inline JSON
state, so a simple regex scan for /recipe/<slug> patterns picks up the slugs.
"""
import re
records: list[dict] = []
SLUG_RE = re.compile(r'["\s]/recipe/([a-z0-9][a-z0-9\-]{3,})["\s/?]')
cat_cdx = cdx_query(f"{PC_HOST}/recipe-categories/*", limit="200")
logger.info("CDX: %d archived category pages found", len(cat_cdx))
time.sleep(CDX_DELAY)
seen_category_urls: set[str] = set()
for entry in cat_cdx:
url = entry["original"].split("?")[0]
if url in seen_category_urls:
continue
seen_category_urls.add(url)
replay_url = f"{WB_BASE}/{entry['timestamp']}/{url}"
try:
resp = requests.get(replay_url, timeout=30)
time.sleep(REPLAY_DELAY)
if resp.status_code != 200:
continue
except Exception as exc:
logger.warning("Category page fetch failed: %s", exc)
continue
for slug in SLUG_RE.findall(resp.text):
if slug in seen_slugs:
continue
seen_slugs.add(slug)
records.append({
"slug": slug,
"title": "",
"subtitle": "",
"cook_time": "",
"tags": [],
"serving_size": None,
"image_url": "",
"description": "",
"wayback_ts": entry["timestamp"],
"source": "category_page",
})
logger.info("category_pages pass: %d new slugs", len(records))
return records
# ── Main ──────────────────────────────────────────────────────────────────────
def discover(out_file: Path) -> None:
seen: set[str] = set()
# Load previously discovered slugs so reruns are incremental
existing: list[dict] = []
if out_file.exists():
with open(out_file) as f:
for line in f:
line = line.strip()
if line:
rec = json.loads(line)
seen.add(rec["slug"])
existing.append(rec)
logger.info("Loaded %d existing slugs from %s", len(seen), out_file)
new_records: list[dict] = []
new_records += pass_menus(seen)
new_records += pass_product_api(seen)
new_records += pass_category_pages(seen)
out_file.parent.mkdir(parents=True, exist_ok=True)
with open(out_file, "a") as f:
for rec in new_records:
f.write(json.dumps(rec) + "\n")
total = len(existing) + len(new_records)
logger.info(
"Done. %d new slugs written to %s (%d total).",
len(new_records), out_file, total,
)
def main() -> None:
parser = argparse.ArgumentParser(description="Discover Purple Carrot recipe slugs via Wayback")
parser.add_argument(
"--out",
type=Path,
default=DEFAULT_OUT,
help=f"Output JSONL manifest (default: {DEFAULT_OUT})",
)
parser.add_argument("--debug", action="store_true")
args = parser.parse_args()
logging.basicConfig(
level=logging.DEBUG if args.debug else logging.INFO,
format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
from scripts.pipeline.log_utils import attach_pipeline_log
attach_pipeline_log("discover_wayback")
discover(args.out)
if __name__ == "__main__":
main()
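The CDX server's `output=json` format is a list of rows whose first row names the fields, which is why `cdx_query` zips `rows[1:]` against `rows[0]`. Below is a minimal offline sketch of the request URL, the row parsing, and the replay URL. The helper names are hypothetical, and the canned response is hand-written, not a real capture.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

CDX_BASE = "https://web.archive.org/cdx/search/cdx"
WB_BASE = "https://web.archive.org/web"


def cdx_request_url(url_pattern: str, **extra: str) -> str:
    """Build a CDX query URL with the same default parameters as cdx_query()."""
    params = {
        "url": url_pattern,
        "output": "json",
        "fl": "original,timestamp,statuscode",
        "collapse": "urlkey",
        "filter": "statuscode:200",
        **extra,
    }
    return f"{CDX_BASE}?{urlencode(params)}"


def parse_cdx_rows(rows: list) -> list:
    """CDX JSON output is a list of rows; the first row holds the field names."""
    if len(rows) < 2:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def replay_url(url: str, timestamp: str) -> str:
    """Wayback replay URL for one capture of `url`."""
    return f"{WB_BASE}/{timestamp}/{url}"


canned = json.loads(
    '[["original", "timestamp", "statuscode"],'
    ' ["https://www.purplecarrot.com/api/v2/menus/123", "20260412150557", "200"]]'
)
entries = parse_cdx_rows(canned)
query = cdx_request_url("www.purplecarrot.com/api/v2/menus/*", limit="500")
query_params = parse_qs(urlsplit(query).query)  # round-trip check of the encoding
```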


@ -1,250 +0,0 @@
"""Playwright scraper for live purplecarrot.com recipe pages.
Uses the slug inventory already in recipes_purplecarrot.parquet and fills in
the missing ingredients/instructions by hitting the live site directly.
Usage:
conda run -n cf python3 scripts/pipeline/purple_carrot/scrape_live.py \
[--out /Library/Assets/kiwi/pipeline/recipes_purplecarrot_live.parquet] \
[--delay 2.5] \
[--limit 20]
"""
from __future__ import annotations
import argparse
import json
import re
import time
from pathlib import Path
from typing import Any
import pandas as pd
from playwright.sync_api import sync_playwright, Page, TimeoutError as PWTimeout
# ── Config ─────────────────────────────────────────────────────────────────────
BASE_URL = "https://www.purplecarrot.com/recipe/{slug}"
DEFAULT_OUT = Path("/Library/Assets/kiwi/pipeline/recipes_purplecarrot_live.parquet")
EXISTING_PARQUET = Path("/Library/Assets/kiwi/pipeline/recipes_purplecarrot.parquet")
RENDER_WAIT_MS = 2500 # JS render settle time
NAV_TIMEOUT_MS = 20_000
# ── Page parser ────────────────────────────────────────────────────────────────
def _text(page: Page, selector: str) -> str:
el = page.query_selector(selector)
return el.inner_text().strip() if el else ""
def _texts(page: Page, selector: str) -> list[str]:
return [el.inner_text().strip() for el in page.query_selector_all(selector)]
def _parse_recipe(page: Page, slug: str, source_url: str) -> dict[str, Any] | None:
"""Extract structured recipe data from the rendered page."""
body = page.inner_text("body")
# Abort if we've been bounced to a generic listing / 404
if "Page Not Found" in body or slug not in page.url:
return None
# ── Title ──────────────────────────────────────────────────────────────────
# The <h1> on product pages tends to be the recipe name
title = (_text(page, "h1") or _text(page, "[class*='recipe-title']")).strip()
if not title:
# Fallback: first heading-like text before "Ingredients"
idx = body.find("Ingredients\n")
title = body[:idx].strip().splitlines()[-1] if idx > 0 else ""
# ── Ingredients / Instructions via body text ───────────────────────────────
ing_start = body.find("\nIngredients\n")
inst_start = body.find("\nInstructions\n")
footer_start = body.find("\nShop\n") # footer sentinel
if ing_start == -1:
return None # page didn't render recipe content
raw_ingredients: list[str] = []
raw_instructions: list[str] = []
if ing_start != -1 and inst_start != -1:
ing_block = body[ing_start + len("\nIngredients\n"):inst_start].strip()
raw_ingredients = [line.strip() for line in ing_block.splitlines() if line.strip()]
if inst_start != -1:
end = footer_start if footer_start > inst_start else len(body)
inst_block = body[inst_start + len("\nInstructions\n"):end].strip()
# Steps start with a digit
steps: list[str] = []
current: list[str] = []
for line in inst_block.splitlines():
line = line.strip()
if not line:
continue
if re.match(r"^\d+$", line):
if current:
steps.append(" ".join(current))
current = []
elif line.startswith("CULINARY NOTES"):
break
else:
current.append(line)
if current:
steps.append(" ".join(current))
raw_instructions = steps
# ── Nutrition ──────────────────────────────────────────────────────────────
def _extract_num(pattern: str) -> float | None:
m = re.search(pattern, body)
try:
return float(m.group(1)) if m else None
except ValueError:
return None
cal = _extract_num(r"(\d+)\s*CAL")
fat = _extract_num(r"(\d+(?:\.\d+)?)g\s*FAT")
carbs = _extract_num(r"(\d+(?:\.\d+)?)g\s*CARBS")
prot = _extract_num(r"(\d+(?:\.\d+)?)g\s*PROTEIN")
fiber = _extract_num(r"(\d+(?:\.\d+)?)g\s*FIBER")
# ── Allergens / tags ───────────────────────────────────────────────────────
allergen_m = re.search(r"Allergens?:\s*([^\n]+)", body)
allergens = allergen_m.group(1).strip() if allergen_m else ""
# Feature tags like HIGH-PROTEIN, QUICK, etc. appear before Ingredients
pre_ing = body[:ing_start]
tags = re.findall(r"\b(HIGH-PROTEIN|QUICK|SPICY|LOW[\-\s]CALORIE|VEGAN|FAMILY\s+FRIENDLY)\b", pre_ing)
return {
"Slug": slug,
"Name": title,
"SourceURL": source_url,
"Source": "purplecarrot_live",
"RecipeIngredientParts": raw_ingredients,
"RecipeInstructions": raw_instructions,
"Calories": cal,
"FatContent": fat,
"CarbohydrateContent": carbs,
"ProteinContent": prot,
"FiberContent": fiber,
"Allergens": allergens,
"Keywords": tags,
"HasFullRecipe": bool(raw_ingredients and raw_instructions),
}
# ── Main ───────────────────────────────────────────────────────────────────────
def main() -> None:
parser = argparse.ArgumentParser()
parser.add_argument("--out", type=Path, default=DEFAULT_OUT)
parser.add_argument("--delay", type=float, default=2.5,
help="Seconds between requests (be polite)")
parser.add_argument("--limit", type=int, default=0,
help="Stop after N slugs (0 = all)")
parser.add_argument("--resume", action="store_true",
help="Skip slugs already present in --out")
parser.add_argument("--slugs-from", type=Path, default=None,
help="Read slug inventory from this parquet instead of the default Wayback one")
args = parser.parse_args()
# Load slug inventory — either from a custom parquet or the default Wayback run
slugs_parquet = args.slugs_from if args.slugs_from else EXISTING_PARQUET
df_existing = pd.read_parquet(slugs_parquet)
slugs = df_existing["Slug"].dropna().unique().tolist()
# SourceURL may be missing from custom parquets; fall back to building it from the slug
if "SourceURL" in df_existing.columns:
source_urls = dict(zip(df_existing["Slug"], df_existing["SourceURL"]))
else:
source_urls = {s: BASE_URL.format(slug=s) for s in slugs}
# Resume support
done_slugs: set[str] = set()
if args.resume and args.out.exists():
df_done = pd.read_parquet(args.out)
done_slugs = set(df_done["Slug"].dropna().tolist())
print(f"Resuming — {len(done_slugs)} slugs already scraped")
if args.limit:
slugs = slugs[: args.limit]
results: list[dict[str, Any]] = []
skipped = 0
failed = 0
_UA = (
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
"(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)
with sync_playwright() as p:
browser = p.chromium.launch(headless=True)
for i, slug in enumerate(slugs):
if slug in done_slugs:
skipped += 1
continue
url = BASE_URL.format(slug=slug)
print(f"[{i+1}/{len(slugs)}] {slug} ... ", end="", flush=True)
# Use a fresh browser context per slug to avoid Cloudflare session-level
# bot detection, which fires on the 2nd+ request in the same context.
context = browser.new_context(
user_agent=_UA,
viewport={"width": 1280, "height": 900},
)
page = context.new_page()
try:
page.goto(url, timeout=NAV_TIMEOUT_MS, wait_until="domcontentloaded")
page.wait_for_timeout(RENDER_WAIT_MS)
recipe = _parse_recipe(page, slug, source_urls.get(slug, url))
except PWTimeout:
print("TIMEOUT")
failed += 1
except Exception as exc:
print(f"ERROR: {exc}")
failed += 1
else:
if recipe is None:
print("no content (404 or redirect)")
failed += 1
elif recipe["HasFullRecipe"]:
n = len(recipe["RecipeIngredientParts"])
s = len(recipe["RecipeInstructions"])
print(f"OK ({n} ingredients, {s} steps)")
results.append(recipe)
else:
print(f"partial (ings={len(recipe['RecipeIngredientParts'])}, steps={len(recipe['RecipeInstructions'])})")
results.append(recipe)
finally:
context.close()
time.sleep(args.delay)
browser.close()
print(f"\nDone — {len(results)} scraped, {skipped} skipped, {failed} failed")
if results:
df_out = pd.DataFrame(results)
# On --resume, append to the previous output; for duplicate slugs the
# newly scraped row wins (keep="last")
args.out.parent.mkdir(parents=True, exist_ok=True)
if args.resume and args.out.exists():
df_prev = pd.read_parquet(args.out)
df_out = pd.concat([df_prev, df_out], ignore_index=True)
df_out = df_out.drop_duplicates(subset=["Slug"], keep="last")
df_out.to_parquet(args.out, index=False)
full_count = df_out["HasFullRecipe"].sum() if "HasFullRecipe" in df_out.columns else "?"
print(f"Saved {len(df_out)} rows to {args.out} ({full_count} with full recipes)")
else:
print("No results — output not written")
if __name__ == "__main__":
main()
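With `--resume`, `main()` concatenates the previous parquet with the new rows and calls `drop_duplicates(subset=["Slug"], keep="last")`, so a re-scraped slug replaces its earlier row. Below is a stdlib-only sketch of that keep-last rule. It assumes every row is a dict with a non-null `"Slug"`. The helper name `merge_keep_last` is hypothetical and is not part of the script.

```python
from __future__ import annotations

from typing import Any


def merge_keep_last(
    prev: list[dict[str, Any]], new: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Concatenate prev + new and keep only the last row for each Slug.

    Surviving rows stay in concatenation order, like
    pd.concat(...).drop_duplicates(subset=["Slug"], keep="last").
    """
    combined = prev + new
    # Later rows overwrite earlier ones, so each slug maps to its last index.
    last_index = {row["Slug"]: i for i, row in enumerate(combined)}
    return [row for i, row in enumerate(combined) if last_index[row["Slug"]] == i]
```

Suppose a slug was saved as partial in an earlier run and scraped in full now. Only the newer row survives.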

@ -1,538 +0,0 @@
"""
scrape_recipes.py: fetch full recipe data for slugs in pc_slugs.jsonl.
For each slug:
1. Try Wayback /api/v1/products/<slug>, oldest capture first (pre-HelloFresh
acquisition data is more complete).
2. If instructions are empty, try the recipe HTML page via Wayback and parse
inline JSON state or structured markup.
3. Merge with metadata already in the manifest (title, tags, cook_time, etc.)
4. Emit one row per recipe to recipes_purplecarrot.parquet in food.com columnar
format so build_recipe_index.py can import it unchanged.
Output columns (food.com schema + PC extras ignored by the indexer):
RecipeId, Name, Subtitle, RecipeIngredientParts, RecipeInstructions,
RecipeCategory, Keywords, Calories, FatContent, ProteinContent,
SodiumContent, SugarContent, CarbohydrateContent, FiberContent,
RecipeServings, Description, ImageURL, CookTime, Slug, Source
Usage:
conda run -n cf python -m scripts.pipeline.purple_carrot.scrape_recipes
conda run -n cf python -m scripts.pipeline.purple_carrot.scrape_recipes \\
--slugs /Library/Assets/kiwi/pipeline/pc_slugs.jsonl \\
--out /Library/Assets/kiwi/pipeline/recipes_purplecarrot.parquet \\
--resume
"""
from __future__ import annotations
import argparse
import json
import logging
import re
import time
from pathlib import Path
from typing import Any
import requests
logger = logging.getLogger(__name__)
CDX_BASE = "https://web.archive.org/cdx/search/cdx"
WB_BASE = "https://web.archive.org/web"
PC_HOST = "www.purplecarrot.com"
REPLAY_DELAY = 2.0
CDX_DELAY = 3.0 # archive.org CDX rate-limits aggressively; be polite
DEFAULT_SLUGS = Path("/Library/Assets/kiwi/pipeline/pc_slugs.jsonl")
DEFAULT_OUT = Path("/Library/Assets/kiwi/pipeline/recipes_purplecarrot.parquet")
# Inline JSON state embedded by the SSR renderer — used as fallback HTML parser
_NEXT_DATA_RE = re.compile(r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', re.DOTALL)
_REDUX_STATE_RE = re.compile(r'window\.__INITIAL_STATE__\s*=\s*(\{.*?\});\s*\n', re.DOTALL)
# ── Wayback helpers ───────────────────────────────────────────────────────────
def _cdx_get(params: dict) -> list:
"""CDX request with retry on 429/503 (archive.org rate-limits aggressively)."""
for attempt in range(4):
try:
resp = requests.get(CDX_BASE, params=params, timeout=25)
if resp.status_code in (429, 503):
wait = 15 * (2 ** attempt)
logger.debug("CDX %s — backing off %ds", resp.status_code, wait)
time.sleep(wait)
continue
resp.raise_for_status()
rows = resp.json()
return rows if rows else []
except Exception as exc:
logger.debug("CDX attempt %d failed: %s", attempt + 1, exc)
time.sleep(5 * (attempt + 1))
return []
def _cdx_timestamps(slug: str) -> list[str]:
"""Return captured timestamps for a product slug, oldest first (pre-2022 window)."""
rows = _cdx_get({
"url": f"{PC_HOST}/api/v1/products/{slug}",
"output": "json",
"fl": "timestamp,statuscode",
"filter": "statuscode:200",
"limit": "20",
# Pre-HelloFresh-acquisition captures (2019-2021) are most likely
# to have full instructions — API stripped them post-acquisition.
"from": "20190101",
"to": "20211231",
})
if len(rows) < 2:
return []
return [row[0] for row in rows[1:]] # timestamps only, oldest first
def _wayback_json(url: str, timestamp: str) -> Any | None:
replay = f"{WB_BASE}/{timestamp}/{url}"
for attempt in range(3):
try:
resp = requests.get(replay, timeout=30)
if resp.status_code == 200:
return resp.json()
if resp.status_code in (404, 410):
return None
except Exception as exc:
logger.debug("Wayback JSON attempt %d failed (%s): %s", attempt + 1, url, exc)
time.sleep(2 ** attempt)
return None
def _wayback_html(url: str, timestamp: str) -> str | None:
replay = f"{WB_BASE}/{timestamp}/{url}"
for attempt in range(3):
try:
resp = requests.get(replay, timeout=30)
if resp.status_code == 200:
return resp.text
if resp.status_code in (404, 410):
return None
except Exception as exc:
logger.debug("Wayback HTML attempt %d failed (%s): %s", attempt + 1, url, exc)
time.sleep(2 ** attempt)
return None
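`_cdx_get` retries on HTTP 429/503. Its waits are `15 * 2 ** attempt` seconds, which gives 15, 30, 60 and 120 over its four attempts. `_cdx_timestamps` skips `rows[0]` because CDX JSON output begins with a header row of field names. Each capture is then replayed from `https://web.archive.org/web/<timestamp>/<url>`.

The sketch below isolates those three pieces as pure functions so they can be checked without network access. The function names `cdx_timestamps`, `backoff_schedule` and `replay_url` are hypothetical.

```python
def cdx_timestamps(rows: list[list[str]]) -> list[str]:
    """Drop the CDX header row and return the "timestamp" column."""
    if len(rows) < 2:  # empty response, or header only
        return []
    header, *data = rows
    ts_col = header.index("timestamp")
    return [row[ts_col] for row in data]


def backoff_schedule(attempts: int = 4, base: float = 15.0) -> list[float]:
    """Seconds slept after each rate-limited attempt: base * 2**attempt."""
    return [base * (2 ** a) for a in range(attempts)]


def replay_url(timestamp: str, original_url: str) -> str:
    """Wayback replay URL for one capture of original_url."""
    return f"https://web.archive.org/web/{timestamp}/{original_url}"
```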
# ── Recipe extraction from API JSON ──────────────────────────────────────────
def _extract_from_api(data: dict) -> dict | None:
"""Parse a /api/v1/products/<slug> response into our recipe dict.
Returns None if the response has no usable content (empty title, etc.).
Returns a partial dict if only some fields are populated; the caller merges
it with manifest metadata.
"""
if not data or not isinstance(data, dict):
return None
title = (data.get("title") or "").strip()  # archived JSON may hold null here
subtitle = (data.get("subtitle") or "").strip()
slug = data.get("slug", "")
skus = data.get("skus") or []
sku = skus[0] if skus else {}
# Instructions: list of {step_number, title, description}
raw_instructions = sku.get("instructions") or []
steps: list[str] = []
for step in sorted(raw_instructions, key=lambda s: s.get("step_number", 0)):
parts = []
if step.get("title"):
parts.append(step["title"])
if step.get("description"):
parts.append(step["description"])
if parts:
steps.append(". ".join(parts))
# Ingredients: may be in ingredients_quantity or ingredients
raw_ingr = sku.get("ingredients_quantity") or sku.get("ingredients") or []
ingredients: list[str] = []
for item in raw_ingr:
if isinstance(item, dict):
qty = item.get("quantity") or item.get("qty") or ""
unit = item.get("unit") or ""
# Parenthesised so a top-level "name" is used even without a nested "ingredient" dict
ingr = item.get("ingredient")
name = item.get("name") or (ingr.get("name", "") if isinstance(ingr, dict) else item.get("ingredient_name", ""))
raw = item.get("raw") or item.get("display_name") or ""
line = raw or " ".join(filter(None, [str(qty), str(unit), str(name)])).strip()
if line:
ingredients.append(line)
elif isinstance(item, str) and item.strip():
ingredients.append(item.strip())
nutrition = sku.get("nutrition_label") or {}
calories = _num(nutrition.get("calories") or sku.get("calories"))
fat = _num(nutrition.get("total_fat") or sku.get("fat"))
protein = _num(nutrition.get("protein") or sku.get("protein"))
sodium = _num(nutrition.get("sodium") or sku.get("sodium"))
sugar = _num(nutrition.get("sugar") or nutrition.get("total_sugars"))
carbs = _num(nutrition.get("total_carbohydrate") or sku.get("carbs"))
fiber = _num(nutrition.get("dietary_fiber") or sku.get("fiber"))
tags = sku.get("tags") or data.get("tags") or []
category = sku.get("meal_type") or sku.get("product_type") or ""
servings = _num(sku.get("servings"))
cook_time = sku.get("prep_and_cook_time") or ""
description = sku.get("description") or ""
images = sku.get("hero_images") or sku.get("image_versions") or []
# hero_images can be a list OR a dict keyed by size string — normalise to list
if isinstance(images, dict):
images = list(images.values())
image_url = ""
if images and isinstance(images[0], dict):
image_url = images[0].get("image_url") or images[0].get("url") or ""
if not image_url and data.get("square_image"):
sq = data["square_image"]
image_url = sq.get("url") if isinstance(sq, dict) else ""
return {
"slug": slug,
"title": title,
"subtitle": subtitle,
"steps": steps,
"ingredients": ingredients,
"category": category,
"tags": tags,
"calories": calories,
"fat": fat,
"protein": protein,
"sodium": sodium,
"sugar": sugar,
"carbs": carbs,
"fiber": fiber,
"servings": servings,
"cook_time": cook_time,
"description": description,
"image_url": image_url,
"has_full_recipe": bool(steps and ingredients),
}
def _num(val: Any) -> float | None:
if val is None:
return None
try:
# Strip "mg" before "g": the reverse order turns "120mg" into "120m"
v = float(str(val).replace("mg", "").replace("g", "").split()[0])
return v if v > 0 else None
except Exception:
return None
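`_num` strips unit suffixes with chained `str.replace` calls, so their order matters. Removing `"g"` first turns `"120mg"` into `"120m"`, which `float()` rejects.

The sketch below reads only the leading number with a regex, which avoids the ordering issue entirely. The helper name `parse_leading_number` is hypothetical.

```python
from __future__ import annotations

import re

# Optional leading whitespace, then an integer or decimal number.
_LEADING_NUMBER = re.compile(r"\s*(\d+(?:\.\d+)?)")


def parse_leading_number(val: object) -> float | None:
    """Return the leading numeric value of val ("120mg" -> 120.0), or None.

    Like _num, zero is treated as "missing" and returned as None.
    """
    if val is None:
        return None
    m = _LEADING_NUMBER.match(str(val))
    if not m:
        return None
    v = float(m.group(1))
    return v if v > 0 else None
```

This is not a drop-in replacement. For a range such as `"4-6"` servings it returns `4.0`, whereas `_num` returns `None`.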
# ── Fallback: HTML inline state parsing ──────────────────────────────────────
def _extract_from_html(html: str, slug: str) -> dict | None:
"""Try to pull recipe data from inline JS state in older SSR pages."""
# Attempt 1: Next.js __NEXT_DATA__
m = _NEXT_DATA_RE.search(html)
if m:
try:
state = json.loads(m.group(1))
# Walk the Next.js page props tree looking for recipe data
props = state.get("props", {}).get("pageProps", {})
recipe = props.get("recipe") or props.get("product")
if recipe and isinstance(recipe, dict) and recipe.get("title"):
return _extract_from_api(recipe)
except Exception:
pass
# Attempt 2: Redux __INITIAL_STATE__
m = _REDUX_STATE_RE.search(html)
if m:
try:
state = json.loads(m.group(1))
# Try common Redux state shapes
for key in ("recipe", "product", "currentRecipe", "currentProduct"):
recipe = state.get(key)
if recipe and isinstance(recipe, dict) and recipe.get("title"):
return _extract_from_api(recipe)
except Exception:
pass
# Attempt 3: JSON-LD structured data
ld_matches = re.findall(
r'<script[^>]+type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
html, re.DOTALL
)
for raw in ld_matches:
try:
ld = json.loads(raw)
if isinstance(ld, list):
ld = next((x for x in ld if x.get("@type") == "Recipe"), None)
if not ld or ld.get("@type") != "Recipe":
continue
steps = []
for inst in (ld.get("recipeInstructions") or []):
if isinstance(inst, dict):
steps.append(inst.get("text", ""))
elif isinstance(inst, str):
steps.append(inst)
ingredients = ld.get("recipeIngredient") or []
return {
"slug": slug,
"title": ld.get("name", ""),
"subtitle": "",
"steps": [s for s in steps if s],
"ingredients": [i for i in ingredients if i],
"category": ld.get("recipeCategory", ""),
"tags": [k.strip() for k in ld["keywords"].split(",") if k.strip()] if isinstance(ld.get("keywords"), str) else [],
"calories": _num((ld.get("nutrition") or {}).get("calories")),
"fat": None, "protein": None, "sodium": None,
"sugar": None, "carbs": None, "fiber": None,
"servings": _num(ld.get("recipeYield")),
"cook_time": str(ld.get("totalTime") or ld.get("cookTime") or ""),
"description": ld.get("description", ""),
"image_url": (ld["image"][0] if isinstance(ld.get("image"), list) else ld.get("image", "")) or "",
"has_full_recipe": any(steps) and any(ingredients),
}
except Exception:
pass
return None
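The JSON-LD fallback above only accepts a node whose `@type` is exactly `"Recipe"`. That node must sit at the top level or inside a top-level list.

Many sites instead use `"@type": ["Recipe"]` or nest entities under `"@graph"`. The sketch below also handles those two shapes. This is an extension of the script's logic, not something it currently does, and the function names are hypothetical.

```python
from __future__ import annotations

import json
import re
from typing import Any

_LD_JSON_RE = re.compile(
    r'<script[^>]+type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL,
)


def _is_recipe(node: Any) -> bool:
    """True for a JSON-LD node whose @type is "Recipe" or a list containing it."""
    if not isinstance(node, dict):
        return False
    node_type = node.get("@type")
    return node_type == "Recipe" or (isinstance(node_type, list) and "Recipe" in node_type)


def find_ld_recipe(html: str) -> dict | None:
    """Return the first schema.org Recipe node found in the page's JSON-LD blocks."""
    for raw in _LD_JSON_RE.findall(html):
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed block: try the next one
        nodes = doc if isinstance(doc, list) else [doc]
        # Also look one level into "@graph", where many sites nest their entities.
        for node in list(nodes):
            if isinstance(node, dict) and isinstance(node.get("@graph"), list):
                nodes.extend(node["@graph"])
        for node in nodes:
            if _is_recipe(node):
                return node
    return None
```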
# ── Per-slug fetch ─────────────────────────────────────────────────────────────
def fetch_recipe(slug: str, manifest_meta: dict) -> dict | None:
"""Fetch the fullest available recipe data for a slug from Wayback.
Returns a merged dict of manifest metadata + API/HTML-extracted content.
"""
api_url = f"https://{PC_HOST}/api/v1/products/{slug}"
html_url = f"https://{PC_HOST}/recipe/{slug}"
recipe: dict | None = None
# Try product API — oldest captures are most likely to have full data
timestamps = _cdx_timestamps(slug)
time.sleep(CDX_DELAY)
if not timestamps and manifest_meta.get("wayback_ts"):
timestamps = [manifest_meta["wayback_ts"]]
for ts in timestamps:
data = _wayback_json(api_url, ts)
time.sleep(REPLAY_DELAY)
if not data:
continue
candidate = _extract_from_api(data)
if not candidate:
continue
recipe = candidate
if recipe.get("has_full_recipe"):
logger.debug("[%s] Full recipe from API (ts=%s)", slug, ts)
break
logger.debug("[%s] Partial API data (ts=%s) — trying HTML fallback", slug, ts)
# HTML fallback when API has no steps/ingredients
if not recipe or not recipe.get("has_full_recipe"):
html_ts_rows = _cdx_get({
"url": f"{PC_HOST}/recipe/{slug}",
"output": "json",
"fl": "timestamp,statuscode",
"filter": "statuscode:200",
"limit": "10",
})
html_timestamps = [row[0] for row in html_ts_rows[1:]] if len(html_ts_rows) > 1 else []
time.sleep(CDX_DELAY)
for ts in html_timestamps:
html = _wayback_html(html_url, ts)
time.sleep(REPLAY_DELAY)
if not html:
continue
html_recipe = _extract_from_html(html, slug)
if html_recipe and html_recipe.get("has_full_recipe"):
logger.debug("[%s] Full recipe from HTML (ts=%s)", slug, ts)
recipe = html_recipe
break
# Build merged record: manifest metadata fills any gaps from API/HTML
merged: dict = {
"slug": slug,
"title": manifest_meta.get("title", ""),
"subtitle": manifest_meta.get("subtitle", ""),
"steps": [],
"ingredients": [],
"category": "",
"tags": manifest_meta.get("tags") or [],
"calories": None,
"fat": None,
"protein": None,
"sodium": None,
"sugar": None,
"carbs": None,
"fiber": None,
"servings": manifest_meta.get("serving_size"),
"cook_time": manifest_meta.get("cook_time", ""),
"description": manifest_meta.get("description", ""),
"image_url": manifest_meta.get("image_url", ""),
"source": "purple_carrot",
"wayback_ts": manifest_meta.get("wayback_ts", ""),
"has_full_recipe": False,
}
if recipe:
for key in recipe:
# Prefer API/HTML data; keep manifest value only when API field is empty
val = recipe[key]
if val or key not in merged or not merged[key]:
merged[key] = val
if not merged["title"]:
logger.warning("[%s] No title — skipping", slug)
return None
return merged
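In `fetch_recipe`, each manifest field acts as a default. A fetched (API/HTML) value replaces that default when the fetched value is non-empty or the default is empty.

Here is the same rule pulled out as a standalone function so the precedence can be checked in isolation. The name `merge_prefer_fetched` is hypothetical.

```python
from typing import Any, Dict


def merge_prefer_fetched(defaults: Dict[str, Any], fetched: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay fetched fields onto manifest defaults.

    A fetched value wins when it is truthy; an empty fetched value only
    fills keys that are missing or empty in the defaults.
    """
    merged = dict(defaults)
    for key, val in fetched.items():
        if val or key not in merged or not merged[key]:
            merged[key] = val
    return merged
```

Because the test is truthiness, a fetched numeric `0` never overrides a non-empty default.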
# ── Output formatting ─────────────────────────────────────────────────────────
def _to_dataframe_row(r: dict) -> dict:
"""Convert merged recipe dict to food.com-compatible parquet row."""
# Build plain-text input for allrecipes-style corpus compatibility
lines = [r["title"]]
if r.get("subtitle"):
lines.append(r["subtitle"])
if r.get("description"):
lines.append("")
lines.append(r["description"])
if r.get("ingredients"):
lines += ["", "Ingredients:"] + [f"- {i}" for i in r["ingredients"]]
if r.get("steps"):
lines += ["", "Directions:"] + [f"- {s}" for s in r["steps"]]
plain_text = "\n".join(lines)
source_url = f"https://www.purplecarrot.com/recipe/{r['slug']}"
return {
# food.com schema columns (used by build_recipe_index.py)
"RecipeId": f"pc_{r['slug']}",
"Name": r["title"],
"RecipeIngredientParts": r.get("ingredients") or [],
"RecipeInstructions": r.get("steps") or [],
"RecipeCategory": r.get("category", ""),
"Keywords": r.get("tags") or [],
"Calories": r.get("calories"),
"FatContent": r.get("fat"),
"ProteinContent": r.get("protein"),
"SodiumContent": r.get("sodium"),
"SugarContent": r.get("sugar"),
"CarbohydrateContent": r.get("carbs"),
"FiberContent": r.get("fiber"),
"RecipeServings": r.get("servings"),
# PC-specific extras (ignored by indexer, used by training pipeline)
"Subtitle": r.get("subtitle", ""),
"Description": r.get("description", ""),
"ImageURL": r.get("image_url", ""),
"CookTime": r.get("cook_time", ""),
"Slug": r["slug"],
"Source": "purple_carrot",
"SourceURL": source_url, # canonical attribution link shown in recipe UI
"HasFullRecipe": r.get("has_full_recipe", False),
"WaybackTs": r.get("wayback_ts", ""),
# Also emit plain-text input for allrecipes-compatible corpus search
"input": plain_text,
}
# ── Main ──────────────────────────────────────────────────────────────────────
def scrape(slugs_file: Path, out_file: Path, resume: bool = True) -> None:
import pandas as pd
# Load manifest
if not slugs_file.exists():
logger.error("Slugs manifest not found: %s", slugs_file)
return
manifest: dict[str, dict] = {}
with open(slugs_file) as f:
for line in f:
line = line.strip()
if line:
rec = json.loads(line)
slug = rec["slug"]
# Keep the richest metadata if slug appears from multiple sources
if slug not in manifest or rec.get("source") == "menu":
manifest[slug] = rec
logger.info("Manifest: %d unique slugs", len(manifest))
# Load already-scraped slugs for resume
done_slugs: set[str] = set()
existing_rows: list[dict] = []
if resume and out_file.exists():
try:
existing_df = pd.read_parquet(out_file)
done_slugs = set(existing_df["Slug"].tolist())
existing_rows = existing_df.to_dict("records")
logger.info("Resume: %d already scraped", len(done_slugs))
except Exception as exc:
logger.warning("Could not load existing parquet for resume: %s", exc)
todo = [s for s in manifest if s not in done_slugs]
logger.info("%d slugs to fetch", len(todo))
rows = list(existing_rows)
for i, slug in enumerate(todo, 1):
logger.info("[%d/%d] %s", i, len(todo), slug)
recipe = fetch_recipe(slug, manifest[slug])
if recipe:
rows.append(_to_dataframe_row(recipe))
status = "full" if recipe.get("has_full_recipe") else "partial"
logger.info(" -> %s (%s)", recipe.get("title", "?"), status)
else:
logger.warning(" -> skipped (no title)")
# Write a checkpoint every 50 processed slugs
if i % 50 == 0:
_write_parquet(rows, out_file)
logger.info("Checkpoint: %d recipes written", len(rows))
_write_parquet(rows, out_file)
full = sum(1 for r in rows if r.get("HasFullRecipe"))
logger.info(
"Done. %d recipes written to %s (%d full, %d partial).",
len(rows), out_file, full, len(rows) - full,
)
def _write_parquet(rows: list[dict], out_file: Path) -> None:
import pandas as pd
out_file.parent.mkdir(parents=True, exist_ok=True)
pd.DataFrame(rows).to_parquet(out_file, index=False)
def main() -> None:
parser = argparse.ArgumentParser(description="Scrape Purple Carrot recipes from Wayback")
parser.add_argument("--slugs", type=Path, default=DEFAULT_SLUGS)
parser.add_argument("--out", type=Path, default=DEFAULT_OUT)
parser.add_argument(
"--no-resume", dest="resume", action="store_false",
help="Start fresh (ignore existing parquet)",
)
parser.add_argument("--debug", action="store_true")
args = parser.parse_args()
logging.basicConfig(
level=logging.DEBUG if args.debug else logging.INFO,
format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
from scripts.pipeline.log_utils import attach_pipeline_log
attach_pipeline_log("scrape_recipes")
scrape(args.slugs, args.out, resume=args.resume)
if __name__ == "__main__":
main()

@ -1,41 +0,0 @@
#!/usr/bin/env bash
# Weekly Purple Carrot recipe harvest
# Runs every Sunday night via cron.
# Discovers this week's menu, scrapes full recipe data for new slugs, and ingests them into the corpus DB.
# Logs to /Library/Assets/kiwi/pipeline/logs/purple_carrot_harvest.log
set -euo pipefail
REPO="/Library/Development/CircuitForge/kiwi"
MENU_OUT="/Library/Assets/kiwi/pipeline/recipes_purplecarrot_menu.parquet"
LIVE_OUT="/Library/Assets/kiwi/pipeline/recipes_purplecarrot_live.parquet"
LOG_DIR="/Library/Assets/kiwi/pipeline/logs"
LOG="$LOG_DIR/purple_carrot_harvest.log"
mkdir -p "$LOG_DIR"
echo "=== Purple Carrot harvest $(date -u '+%Y-%m-%d %H:%M UTC') ===" >> "$LOG"
cd "$REPO"
# Step 1: discover this week's menu slugs
echo "[1/3] Discovering current menu slugs..." | tee -a "$LOG"
conda run -n cf python3 scripts/pipeline/purple_carrot/discover_current_menu.py \
--out "$MENU_OUT" 2>&1 | tee -a "$LOG"
# Step 2: scrape full recipe data for new slugs only (--resume skips already-scraped)
echo "[2/3] Scraping live recipe pages..." | tee -a "$LOG"
conda run -n cf python3 scripts/pipeline/purple_carrot/scrape_live.py \
--slugs-from "$MENU_OUT" \
--out "$LIVE_OUT" \
--resume \
--delay 3.0 2>&1 | tee -a "$LOG"
# Step 3: ingest new recipes into the shared corpus DB
echo "[3/3] Ingesting into corpus DB..." | tee -a "$LOG"
conda run -n cf python3 scripts/pipeline/ingest_purplecarrot.py \
--parquet "$LIVE_OUT" \
--db /Library/Assets/kiwi/kiwi.db 2>&1 | tee -a "$LOG"
echo "=== Done $(date -u '+%Y-%m-%d %H:%M UTC') ===" >> "$LOG"
echo "" >> "$LOG"

@ -38,8 +38,7 @@ class TestBrowseTimeEffortFields:
row["active_min"] = None
row["passive_min"] = None
# "Chop onion." triggers the chop prep action (base 2.0 min) → active_min >= 1
assert row["active_min"] > 0
assert row["active_min"] == 0 # no active time found
assert row["passive_min"] == 20
def test_null_when_directions_empty(self):
@ -116,12 +115,10 @@ class TestDetailTimeEffortField:
],
}
# "Gather all ingredients." → active default (2 min); "Sear for 5 min" → 5 min
assert time_effort_dict["active_min"] == 7
assert time_effort_dict["active_min"] == 5
assert time_effort_dict["passive_min"] == 20
assert time_effort_dict["total_min"] == 27
# 27 min total → moderate (21-45 min range)
assert time_effort_dict["effort_label"] == "moderate"
assert time_effort_dict["total_min"] == 25
assert time_effort_dict["effort_label"] == "quick" # 3 steps
assert isinstance(time_effort_dict["equipment"], list)
assert len(time_effort_dict["step_analyses"]) == 3
assert time_effort_dict["step_analyses"][2]["is_passive"] is True

@ -1,304 +0,0 @@
"""API tests for recipe scan endpoints (kiwi#9).
VLM calls are mocked at the service level -- no GPU or API key needed.
"""
from __future__ import annotations
import io
import json
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from fastapi.testclient import TestClient
from app.main import app
from app.cloud_session import get_session
from app.db.session import get_store
client = TestClient(app)
_GOOD_SCAN_JSON = {
"title": "Green Goddess Bowls",
"subtitle": "with Broccoli & Ranch Dressing",
"servings": "2",
"cook_time": "15 min",
"source_note": "Purple Carrot",
"ingredients": [
{"name": "brown rice", "qty": "1/2", "unit": "cup", "raw": "1/2 cup brown rice"},
{"name": "broccoli florets", "qty": "8", "unit": "oz", "raw": "8 oz broccoli florets"},
{"name": "avocado", "qty": "1", "unit": None, "raw": "1 avocado"},
],
"steps": ["Cook rice.", "Steam broccoli.", "Assemble bowls."],
"notes": None,
"confidence": "high",
"warnings": [],
}
def _make_session(tier: str = "paid", has_byok: bool = False) -> MagicMock:
mock = MagicMock()
mock.tier = tier
mock.has_byok = has_byok
mock.db = ":memory:"
return mock
def _make_store() -> MagicMock:
mock = MagicMock()
mock.list_inventory.return_value = [
{"product_name": "brown rice"},
{"product_name": "avocado"},
]
mock.create_user_recipe.return_value = {
"id": 1,
"title": "Green Goddess Bowls",
"subtitle": "with Broccoli & Ranch Dressing",
"servings": "2",
"cook_time": "15 min",
"source_note": "Purple Carrot",
"ingredients": _GOOD_SCAN_JSON["ingredients"],
"steps": _GOOD_SCAN_JSON["steps"],
"notes": None,
"tags": [],
"source": "scan",
"pantry_match_pct": None,
"created_at": "2026-04-27T00:00:00",
}
mock.list_user_recipes.return_value = []
mock.get_user_recipe.return_value = None
mock.delete_user_recipe.return_value = False
return mock
def _fake_image() -> bytes:
return b"\xff\xd8\xff\xe0" + b"\x00" * 100 # minimal JPEG magic
@pytest.fixture(autouse=True)
def override_deps():
session_mock = _make_session()
store_mock = _make_store()
app.dependency_overrides[get_session] = lambda: session_mock
app.dependency_overrides[get_store] = lambda: store_mock
yield session_mock, store_mock
app.dependency_overrides.clear()
# ── POST /recipes/scan ─────────────────────────────────────────────────────────
def _make_scan_result(title: str = "Green Goddess Bowls"):
"""Create a fake ScannedRecipeResult for tests."""
from app.services.recipe.recipe_scanner import ScannedIngredient, ScannedRecipeResult
return ScannedRecipeResult(
title=title,
subtitle="with Broccoli & Ranch Dressing",
servings="2",
cook_time="15 min",
source_note="Purple Carrot",
ingredients=[
ScannedIngredient("brown rice", "1/2", "cup", in_pantry=True),
ScannedIngredient("broccoli florets", "8", "oz"),
ScannedIngredient("avocado", "1", None, in_pantry=True),
],
steps=["Cook rice.", "Steam broccoli.", "Assemble bowls."],
notes=None,
tags=[],
pantry_match_pct=67,
confidence="high",
warnings=[],
)
@pytest.fixture
def mock_scan_infra(tmp_path):
"""Patch file-saving and VLM calls so scan endpoint tests don't need disk or GPU."""
fake_path = tmp_path / "recipe.jpg"
fake_path.write_bytes(_fake_image())
async def _fake_save(upload_file):
return fake_path
with patch("app.api.endpoints.recipe_scan._save_upload_temp", side_effect=_fake_save):
with patch("app.api.endpoints.recipe_scan.asyncio.to_thread") as mock_thread:
yield mock_thread, fake_path
class TestScanEndpoint:
def test_scan_returns_200(self, override_deps, mock_scan_infra):
"""Happy path: paid tier, valid JPEG, VLM returns good JSON."""
_, store_mock = override_deps
mock_thread, _ = mock_scan_infra
scan_result = _make_scan_result()
call_count = 0
def side_effect(fn, *args, **kwargs):
nonlocal call_count
call_count += 1
return store_mock.list_inventory() if call_count == 1 else scan_result
mock_thread.side_effect = side_effect
resp = client.post(
"/api/v1/recipes/scan",
files=[("files", ("recipe.jpg", _fake_image(), "image/jpeg"))],
)
assert resp.status_code == 200
data = resp.json()
assert data["title"] == "Green Goddess Bowls"
assert data["confidence"] == "high"
assert data["pantry_match_pct"] == 67
assert len(data["ingredients"]) == 3
def test_scan_requires_paid_tier(self, override_deps):
"""Free tier without BYOK should get 403."""
session_mock, _ = override_deps
session_mock.tier = "free"
session_mock.has_byok = False
resp = client.post(
"/api/v1/recipes/scan",
files=[("files", ("recipe.jpg", _fake_image(), "image/jpeg"))],
)
assert resp.status_code == 403
def test_scan_byok_free_tier_allowed(self, override_deps, mock_scan_infra):
"""Free tier WITH BYOK should be allowed through the tier gate."""
session_mock, store_mock = override_deps
session_mock.tier = "free"
session_mock.has_byok = True
mock_thread, _ = mock_scan_infra
scan_result = _make_scan_result("Simple Bowl")
call_count = 0
def _side(fn, *a, **kw):
nonlocal call_count
call_count += 1
return store_mock.list_inventory() if call_count == 1 else scan_result
mock_thread.side_effect = _side
resp = client.post(
"/api/v1/recipes/scan",
files=[("files", ("recipe.jpg", _fake_image(), "image/jpeg"))],
)
assert resp.status_code == 200
def test_scan_no_files_rejected(self, override_deps):
"""Missing files field returns 422."""
resp = client.post("/api/v1/recipes/scan", files=[])
assert resp.status_code in (422, 400)
def test_scan_too_many_files(self, override_deps, mock_scan_infra):
"""More than 4 files should return 422."""
mock_thread, _ = mock_scan_infra
mock_thread.return_value = []
files = [("files", (f"p{i}.jpg", _fake_image(), "image/jpeg")) for i in range(5)]
resp = client.post("/api/v1/recipes/scan", files=files)
assert resp.status_code == 422
def test_scan_not_a_recipe_returns_422(self, override_deps, mock_scan_infra):
_, store_mock = override_deps
mock_thread, _ = mock_scan_infra
call_count = 0
def _side(fn, *a, **kw):
nonlocal call_count
call_count += 1
if call_count == 1:
return store_mock.list_inventory()
raise ValueError("not_a_recipe: image does not appear to contain a recipe")
mock_thread.side_effect = _side
resp = client.post(
"/api/v1/recipes/scan",
files=[("files", ("photo.jpg", _fake_image(), "image/jpeg"))],
)
assert resp.status_code == 422
assert "recipe" in resp.json()["detail"].lower()
def test_scan_backend_unavailable_returns_503(self, override_deps, mock_scan_infra):
_, store_mock = override_deps
mock_thread, _ = mock_scan_infra
call_count = 0
def _side(fn, *a, **kw):
nonlocal call_count
call_count += 1
if call_count == 1:
return store_mock.list_inventory()
raise RuntimeError("No vision backend configured")
mock_thread.side_effect = _side
resp = client.post(
"/api/v1/recipes/scan",
files=[("files", ("photo.jpg", _fake_image(), "image/jpeg"))],
)
assert resp.status_code == 503
# ── POST /recipes/scan/save ────────────────────────────────────────────────────
class TestSaveEndpoint:
def test_save_returns_201(self, override_deps):
_, store_mock = override_deps
store_mock.create_user_recipe.return_value = {
"id": 42,
"title": "Green Goddess Bowls",
"subtitle": None,
"servings": "2",
"cook_time": "15 min",
"source_note": None,
"ingredients": [{"name": "brown rice", "qty": "1", "unit": "cup", "raw": None, "in_pantry": False}],
"steps": ["Cook it."],
"notes": None,
"tags": [],
"source": "scan",
"pantry_match_pct": None,
"created_at": "2026-04-27T00:00:00",
}
payload = {
"title": "Green Goddess Bowls",
"servings": "2",
"cook_time": "15 min",
"ingredients": [{"name": "brown rice", "qty": "1", "unit": "cup"}],
"steps": ["Cook it."],
"source": "scan",
}
resp = client.post("/api/v1/recipes/scan/save", json=payload)
assert resp.status_code == 201
data = resp.json()
assert data["id"] == 42
assert data["title"] == "Green Goddess Bowls"
def test_save_missing_title_rejected(self, override_deps):
payload = {
"ingredients": [{"name": "eggs", "qty": "2"}],
"steps": ["Scramble."],
}
resp = client.post("/api/v1/recipes/scan/save", json=payload)
assert resp.status_code == 422
# ── GET /recipes/user ──────────────────────────────────────────────────────────
class TestUserRecipeEndpoints:
def test_list_empty(self, override_deps):
_, store_mock = override_deps
store_mock.list_user_recipes.return_value = []
resp = client.get("/api/v1/recipes/user")
assert resp.status_code == 200
assert resp.json() == []
def test_get_not_found(self, override_deps):
_, store_mock = override_deps
store_mock.get_user_recipe.return_value = None
resp = client.get("/api/v1/recipes/user/999")
assert resp.status_code == 404
def test_delete_not_found(self, override_deps):
_, store_mock = override_deps
store_mock.delete_user_recipe.return_value = False
resp = client.delete("/api/v1/recipes/user/999")
assert resp.status_code == 404
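The scan tests above mock `asyncio.to_thread` with a hand-written `call_count` counter. The first call returns the inventory; the second returns a scan result or raises. `unittest.mock` supports the same sequencing when `side_effect` is given an iterable: each call consumes the next item, and any exception instance in the iterable is raised instead of returned.

The sketch below shows the pattern with plain stand-in data and is not taken from the test suite. It assumes the endpoint calls `to_thread` exactly twice; a third call would raise `StopIteration`.

```python
from unittest.mock import MagicMock

inventory = [{"product_name": "brown rice"}, {"product_name": "avocado"}]
scan_result = {"title": "Green Goddess Bowls", "confidence": "high"}

# Each call consumes the next item: inventory lookup first, then the scan.
to_thread = MagicMock(side_effect=[inventory, scan_result])
first_call = to_thread("list_inventory")
second_call = to_thread("scan_recipe")

# Exception instances in the iterable are raised rather than returned,
# covering the "raise RuntimeError(...)" branch of the manual counter.
failing = MagicMock(side_effect=[inventory, RuntimeError("No vision backend configured")])
failing("list_inventory")
try:
    failing("scan_recipe")
    error = None
except RuntimeError as exc:
    error = str(exc)
```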

Binary file not shown (2.2 MiB).

Binary file not shown (2.4 MiB).

Binary file not shown (2.8 MiB).

Binary file not shown (2.9 MiB).

Binary file not shown (2.7 MiB).

@ -1,107 +0,0 @@
#!/usr/bin/env python3
"""
Prompt validation harness for recipe scanner (kiwi#9).
Runs the draft extraction prompt against fixture images using the Anthropic API
directly (bypasses llm.yaml; for prompt development only, not the production path).
Usage:
python extract_test.py <image1.jpg> [image2.jpg]
"""
import base64
import io
import json
import os
import sys
from pathlib import Path
from PIL import Image, ImageOps
import anthropic
PROMPT = """
You are extracting a recipe from a photograph of a recipe card, cookbook page, or handwritten note.
If two images are provided, treat them as a single recipe across two pages (e.g. ingredients on page 1, directions on page 2).
Return a single JSON object with these fields:
- title: recipe name (string)
- subtitle: any secondary title or serving suggestion e.g. "with Broccoli & Ranch Dressing" (string or null)
- servings: serving size if shown, as a string e.g. "2", "4-6" (string or null)
- cook_time: total cook time if shown, e.g. "15 min", "1 hour" (string or null)
- source_note: any attribution text like "From Betty Crocker" or "Purple Carrot" (string or null)
- ingredients: array of ingredient objects, each with:
- name: normalized generic ingredient name, lowercase, no quantities, no brand names
(e.g. "Follow Your Heart® Vegan Ranch" → "ranch dressing")
- qty: quantity as a string, preserving fractions e.g. "1/2", "¼" (string or null)
- unit: unit of measure, null for countable items (e.g. "3 eggs" → unit: null)
- raw: the original ingredient line verbatim, exactly as it appears
- steps: ordered array of instruction strings, one distinct step per element
- notes: any tips, substitutions, storage instructions, or variations (string or null)
- confidence: "high" if text is clear and complete, "medium" if some parts are uncertain,
"low" if mostly handwritten or significantly degraded
- warnings: array of strings describing anything the user should double-check
(e.g. "Directions appear to continue on another page not shown")
Return only valid JSON. No markdown fences. No explanation outside the JSON.
If the image does not appear to be a recipe at all, return: {"error": "not_a_recipe"}
""".strip()


def load_image_b64(path: Path) -> str:
    """Load image, apply EXIF rotation, return base64-encoded JPEG."""
    with open(path, "rb") as f:
        img = Image.open(io.BytesIO(f.read()))
    img = ImageOps.exif_transpose(img)  # fix phone rotation
    img = img.convert("RGB")
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=90)
    return base64.b64encode(buf.getvalue()).decode()


def extract(image_paths: list[Path]) -> dict:
    client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

    content = []
    for i, path in enumerate(image_paths):
        if i > 0:
            content.append({"type": "text", "text": f"(Page {i + 1} of the same recipe:)"})
        content.append({
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/jpeg",
                "data": load_image_b64(path),
            },
        })
    content.append({"type": "text", "text": PROMPT})

    msg = client.messages.create(
        model="claude-opus-4-6",  # best vision for prompt dev; production uses VisionRouter
        max_tokens=2048,
        messages=[{"role": "user", "content": content}],
    )

    raw = msg.content[0].text.strip()
    # Strip markdown fences if the model adds them anyway
    if raw.startswith("```"):
        raw = raw.split("```")[1]
        if raw.startswith("json"):
            raw = raw[4:]
    return json.loads(raw)


if __name__ == "__main__":
    paths = [Path(p) for p in sys.argv[1:]]
    if not paths:
        print("Usage: python extract_test.py <image1.jpg> [image2.jpg]")
        sys.exit(1)
    for p in paths:
        if not p.exists():
            print(f"File not found: {p}")
            sys.exit(1)

    print(f"Extracting from: {[p.name for p in paths]}")
    print("Applying EXIF rotation + sending to claude-opus-4-6...\n")
    result = extract(paths)
    print(json.dumps(result, indent=2, ensure_ascii=False))
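
The harness above only checks that the model's reply parses as JSON. It does not check that the reply has the shape the prompt asks for. The following is a minimal sketch of such a check, assuming the field names listed in `PROMPT`. `validate_extraction` is a hypothetical helper that is not part of the repository. It checks structure only, not whether the extracted recipe is correct.

```python
from typing import Any

# Values and keys taken from the field list in PROMPT.
ALLOWED_CONFIDENCE = {"high", "medium", "low"}
INGREDIENT_KEYS = {"name", "qty", "unit", "raw"}


def validate_extraction(result: dict[str, Any]) -> list[str]:
    """Return human-readable problems with an extraction result; empty means OK.

    Hypothetical helper: it checks structure only, not recipe correctness.
    """
    if result.get("error") == "not_a_recipe":
        return ["not_a_recipe"]

    problems: list[str] = []
    title = result.get("title")
    if not isinstance(title, str) or not title.strip():
        problems.append("missing title")

    ingredients = result.get("ingredients")
    if not isinstance(ingredients, list):
        problems.append("ingredients is not a list")
        ingredients = []
    for i, ing in enumerate(ingredients):
        if not isinstance(ing, dict):
            problems.append(f"ingredient {i} is not an object")
            continue
        missing = INGREDIENT_KEYS - set(ing)
        if missing:
            problems.append(f"ingredient {i} missing {sorted(missing)}")
        name = ing.get("name")
        # The prompt asks for lowercase, brand-free generic names.
        if isinstance(name, str) and name != name.lower():
            problems.append(f"ingredient {i} name is not lowercase")

    steps = result.get("steps")
    if not isinstance(steps, list) or not all(isinstance(s, str) for s in steps):
        problems.append("steps is not a list of strings")

    if result.get("confidence") not in ALLOWED_CONFIDENCE:
        problems.append("confidence must be high, medium, or low")

    if not isinstance(result.get("warnings", []), list):
        problems.append("warnings is not a list")
    return problems


sample = {
    "title": "Toast",
    "ingredients": [{"name": "bread", "qty": "2", "unit": None, "raw": "2 slices bread"}],
    "steps": ["Toast the bread."],
    "confidence": "high",
    "warnings": [],
}
print(validate_extraction(sample))  # → []
```

In the harness, a call like this would sit right after `extract(paths)` and print any problems next to the JSON. A non-empty list means a prompt revision should be tested again.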

@ -1,233 +0,0 @@
"""Unit tests for the recipe scanner service.

VLM calls are mocked; these tests cover JSON parsing, pantry cross-reference,
error handling, and result normalization. No GPU required.
"""
from __future__ import annotations

import json
from pathlib import Path
from unittest.mock import MagicMock, patch

import pytest

from app.services.recipe.recipe_scanner import (
    RecipeScanner,
    ScannedIngredient,
    ScannedRecipeResult,
    _cross_reference_pantry,
    _parse_scanner_json,
    _normalize_ingredient_name,
)


# ── Fixtures ──────────────────────────────────────────────────────────────────

GOOD_JSON = {
    "title": "Green Goddess Bowls",
    "subtitle": "with Broccoli & Ranch Dressing",
    "servings": "2",
    "cook_time": "15 min",
    "source_note": "Purple Carrot",
    "ingredients": [
        {"name": "brown rice", "qty": "1/2", "unit": "cup", "raw": "1/2 cup brown rice"},
        {"name": "broccoli florets", "qty": "8", "unit": "oz", "raw": "8 oz broccoli florets"},
        {"name": "avocado", "qty": "1", "unit": None, "raw": "1 avocado"},
        {"name": "ranch dressing", "qty": "2", "unit": "tbsp", "raw": "2 tbsp Follow Your Heart Ranch"},
        {"name": "pumpkin seeds", "qty": "1", "unit": "tbsp", "raw": "1 tbsp pumpkin seeds"},
    ],
    "steps": [
        "Cook rice according to package directions.",
        "Steam broccoli for 5 minutes until tender.",
        "Slice avocado. Assemble bowls and top with ranch.",
    ],
    "notes": "Great leftover — keeps 3 days in the fridge.",
    "confidence": "high",
    "warnings": [],
}


def _fake_image_path(tmp_path: Path, name: str = "recipe.jpg") -> Path:
    """Create a tiny placeholder file so path-existence checks pass."""
    p = tmp_path / name
    p.write_bytes(b"\xff\xd8\xff")  # minimal JPEG magic bytes
    return p


# ── _normalize_ingredient_name ─────────────────────────────────────────────────

class TestNormalizeIngredientName:
    def test_lowercases(self):
        assert _normalize_ingredient_name("Brown Rice") == "brown rice"

    def test_strips_whitespace(self):
        assert _normalize_ingredient_name(" avocado ") == "avocado"

    def test_removes_plural_s(self):
        # For matching purposes only — "pumpkin seeds" stays as-is (stop at spaces)
        assert _normalize_ingredient_name("avocados") == "avocados"

    def test_passthrough(self):
        assert _normalize_ingredient_name("ranch dressing") == "ranch dressing"


# ── _parse_scanner_json ───────────────────────────────────────────────────────

class TestParseScannerJson:
    def test_parses_good_json(self):
        result = _parse_scanner_json(json.dumps(GOOD_JSON))
        assert result["title"] == "Green Goddess Bowls"
        assert len(result["ingredients"]) == 5

    def test_strips_markdown_fences(self):
        wrapped = f"```json\n{json.dumps(GOOD_JSON)}\n```"
        result = _parse_scanner_json(wrapped)
        assert result["title"] == "Green Goddess Bowls"

    def test_not_a_recipe_error(self):
        with pytest.raises(ValueError, match="not_a_recipe"):
            _parse_scanner_json(json.dumps({"error": "not_a_recipe"}))

    def test_missing_title_returns_none_title(self):
        data = dict(GOOD_JSON)
        data.pop("title")
        result = _parse_scanner_json(json.dumps(data))
        assert result.get("title") is None

    def test_malformed_json_raises(self):
        with pytest.raises(ValueError, match="parse"):
            _parse_scanner_json("this is not json at all")

    def test_json_inside_prose(self):
        """Model sometimes adds leading text before the JSON object."""
        text = f"Here is the extracted recipe:\n{json.dumps(GOOD_JSON)}"
        result = _parse_scanner_json(text)
        assert result["title"] == "Green Goddess Bowls"


# ── _cross_reference_pantry ───────────────────────────────────────────────────

class TestCrossReferencePantry:
    PANTRY = ["brown rice", "pumpkin seeds", "olive oil", "broccoli"]

    def test_marks_exact_match(self):
        ingr = [
            ScannedIngredient(name="brown rice", qty="1/2", unit="cup"),
            ScannedIngredient(name="avocado", qty="1", unit=None),
        ]
        result, pct = _cross_reference_pantry(ingr, self.PANTRY)
        assert result[0].in_pantry is True
        assert result[1].in_pantry is False
        assert pct == 50

    def test_partial_word_match(self):
        """'broccoli florets' should match pantry item 'broccoli'."""
        ingr = [ScannedIngredient(name="broccoli florets", qty="8", unit="oz")]
        result, pct = _cross_reference_pantry(ingr, self.PANTRY)
        assert result[0].in_pantry is True
        assert pct == 100

    def test_empty_pantry_all_false(self):
        ingr = [ScannedIngredient(name="broccoli", qty="1", unit=None)]
        result, pct = _cross_reference_pantry(ingr, [])
        assert result[0].in_pantry is False
        assert pct == 0

    def test_empty_ingredients_zero_pct(self):
        _, pct = _cross_reference_pantry([], self.PANTRY)
        assert pct == 0

    def test_case_insensitive_match(self):
        ingr = [ScannedIngredient(name="Brown Rice", qty="1", unit="cup")]
        result, pct = _cross_reference_pantry(ingr, self.PANTRY)
        assert result[0].in_pantry is True


# ── RecipeScanner ─────────────────────────────────────────────────────────────

class TestRecipeScanner:
    def _make_scanner(self) -> RecipeScanner:
        return RecipeScanner()

    @patch("app.services.recipe.recipe_scanner._call_vision_backend")
    def test_scan_single_image_success(self, mock_call, tmp_path):
        mock_call.return_value = json.dumps(GOOD_JSON)
        img = _fake_image_path(tmp_path)
        scanner = self._make_scanner()
        result = scanner.scan([img], pantry_names=["brown rice", "avocado"])

        assert isinstance(result, ScannedRecipeResult)
        assert result.title == "Green Goddess Bowls"
        assert result.servings == "2"
        assert result.cook_time == "15 min"
        assert len(result.ingredients) == 5
        assert result.confidence == "high"
        assert result.pantry_match_pct == 40  # 2 of 5 in pantry

    @patch("app.services.recipe.recipe_scanner._call_vision_backend")
    def test_scan_multi_image(self, mock_call, tmp_path):
        """Two photos treated as one recipe — both passed to VLM."""
        mock_call.return_value = json.dumps(GOOD_JSON)
        img1 = _fake_image_path(tmp_path, "p1.jpg")
        img2 = _fake_image_path(tmp_path, "p2.jpg")
        scanner = self._make_scanner()
        result = scanner.scan([img1, img2])

        # Both images passed through
        call_args = mock_call.call_args
        assert len(call_args[0][0]) == 2  # image_paths list has 2 items
        assert result.title == "Green Goddess Bowls"

    @patch("app.services.recipe.recipe_scanner._call_vision_backend")
    def test_scan_not_a_recipe_raises(self, mock_call, tmp_path):
        mock_call.return_value = json.dumps({"error": "not_a_recipe"})
        img = _fake_image_path(tmp_path)
        scanner = self._make_scanner()
        with pytest.raises(ValueError, match="not_a_recipe"):
            scanner.scan([img])

    @patch("app.services.recipe.recipe_scanner._call_vision_backend")
    def test_warnings_propagated(self, mock_call, tmp_path):
        data = dict(GOOD_JSON)
        data["warnings"] = ["Directions appear to continue on another page not shown"]
        mock_call.return_value = json.dumps(data)
        img = _fake_image_path(tmp_path)
        scanner = self._make_scanner()
        result = scanner.scan([img])
        assert len(result.warnings) == 1
        assert "another page" in result.warnings[0]

    @patch("app.services.recipe.recipe_scanner._call_vision_backend")
    def test_scan_no_pantry_names(self, mock_call, tmp_path):
        mock_call.return_value = json.dumps(GOOD_JSON)
        img = _fake_image_path(tmp_path)
        scanner = self._make_scanner()
        result = scanner.scan([img])
        # No pantry passed — all in_pantry=False, pct=0
        assert result.pantry_match_pct == 0
        assert all(not i.in_pantry for i in result.ingredients)

    def test_scan_too_many_images_raises(self, tmp_path):
        imgs = [_fake_image_path(tmp_path, f"p{i}.jpg") for i in range(5)]
        scanner = self._make_scanner()
        with pytest.raises(ValueError, match="4 images"):
            scanner.scan(imgs)

    def test_scan_no_images_raises(self):
        scanner = self._make_scanner()
        with pytest.raises(ValueError, match="least one"):
            scanner.scan([])

    @patch("app.services.recipe.recipe_scanner._call_vision_backend")
    def test_backend_unavailable_raises(self, mock_call, tmp_path):
        mock_call.side_effect = RuntimeError("No vision backend configured")
        img = _fake_image_path(tmp_path)
        scanner = self._make_scanner()
        with pytest.raises(RuntimeError, match="No vision backend"):
            scanner.scan([img])
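
The parsing and pantry tests above specify behavior but do not show the implementation. The following is a minimal sketch that would satisfy those expectations, assuming whole-word matching for the partial-match case. `parse_scanner_json`, `cross_reference_pantry`, and the `Ingredient` dataclass are hypothetical stand-ins for `_parse_scanner_json`, `_cross_reference_pantry`, and `ScannedIngredient`. They are not the module's actual code.

```python
import json
import re
from dataclasses import dataclass
from typing import Any, Optional

FENCE = "`" * 3  # a Markdown code fence, built here to keep this listing readable
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def parse_scanner_json(text: str) -> dict[str, Any]:
    """Parse VLM output into a dict, tolerating code fences and leading prose."""
    cleaned = text.strip()
    fenced = _FENCED.search(cleaned)
    if fenced:
        cleaned = fenced.group(1)
    # Keep only the outermost {...} span so surrounding prose is ignored.
    start, end = cleaned.find("{"), cleaned.rfind("}")
    if start == -1 or end < start:
        raise ValueError("could not parse scanner output: no JSON object found")
    try:
        data = json.loads(cleaned[start:end + 1])
    except json.JSONDecodeError as exc:
        raise ValueError(f"could not parse scanner output: {exc}") from exc
    if data.get("error") == "not_a_recipe":
        raise ValueError("not_a_recipe")
    return data


@dataclass
class Ingredient:
    """Hypothetical stand-in for ScannedIngredient."""
    name: str
    qty: Optional[str] = None
    unit: Optional[str] = None
    in_pantry: bool = False


def cross_reference_pantry(
    ingredients: list[Ingredient], pantry: list[str]
) -> tuple[list[Ingredient], int]:
    """Flag ingredients found in the pantry; return them plus a whole-number match %."""
    # Each pantry item becomes a set of lowercase words; blank entries are skipped.
    pantry_words = [set(p.lower().split()) for p in pantry if p.strip()]
    hits = 0
    for ing in ingredients:
        words = set(ing.name.lower().split())
        # Match when every word of some pantry item appears in the ingredient name.
        ing.in_pantry = any(p <= words for p in pantry_words)
        if ing.in_pantry:
            hits += 1
    pct = round(100 * hits / len(ingredients)) if ingredients else 0
    return ingredients, pct


items, pct = cross_reference_pantry(
    [Ingredient("broccoli florets"), Ingredient("avocado")], ["Broccoli", "olive oil"]
)
print([i.in_pantry for i in items], pct)  # → [True, False] 50
```

This sketch matches whole words rather than raw substrings. That keeps "broccoli" matching "broccoli florets" and stops "rice" from matching "licorice". Plural forms such as "avocados" still do not match "avocado", which fits the passthrough expectation in `TestNormalizeIngredientName`.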

@ -1,127 +0,0 @@
"""Tests for task-based routing added to get_meal_plan_router()."""
from __future__ import annotations

from unittest.mock import MagicMock

import pytest


def _make_task_ctx(url: str = "http://node:8080") -> MagicMock:
    """Mock context manager returned by task_allocate()."""
    alloc = MagicMock()
    alloc.url = url
    alloc.allocation_id = "alloc-task-1"
    alloc.service = "cf-text"
    ctx = MagicMock()
    ctx.__enter__ = MagicMock(return_value=alloc)
    ctx.__exit__ = MagicMock(return_value=False)
    return ctx


def _make_task_ctx_not_registered() -> MagicMock:
    """Mock context manager that raises TaskNotRegistered on enter."""
    from app.services.task_inference import TaskNotRegistered
    ctx = MagicMock()
    ctx.__enter__ = MagicMock(side_effect=TaskNotRegistered("not registered"))
    ctx.__exit__ = MagicMock(return_value=False)
    return ctx


def _make_direct_alloc_ctx(url: str = "http://node:8080") -> MagicMock:
    """Mock context manager returned by CFOrchClient.allocate()."""
    alloc = MagicMock()
    alloc.url = url
    ctx = MagicMock()
    ctx.__enter__ = MagicMock(return_value=alloc)
    ctx.__exit__ = MagicMock(return_value=False)
    return ctx


def test_task_path_returns_orch_router_on_success(monkeypatch):
    """get_meal_plan_router() returns _OrchTextRouter when task allocation succeeds."""
    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    import unittest.mock as um
    # Patch the name as it exists in llm_router's own namespace (module-level import).
    with um.patch("app.services.meal_plan.llm_router.task_allocate",
                  return_value=_make_task_ctx(url="http://node:9001")):
        from app.services.meal_plan.llm_router import get_meal_plan_router, _OrchTextRouter
        router, ctx = get_meal_plan_router()
    assert isinstance(router, _OrchTextRouter)
    assert router._base_url == "http://node:9001"


def test_task_not_registered_falls_back_to_direct_allocate(monkeypatch):
    """get_meal_plan_router() falls back to direct cf-text allocation on TaskNotRegistered."""
    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    direct_ctx = _make_direct_alloc_ctx(url="http://node:9002")
    import unittest.mock as um
    # Patch task_allocate in llm_router's namespace so TaskNotRegistered is raised.
    with um.patch("app.services.meal_plan.llm_router.task_allocate",
                  return_value=_make_task_ctx_not_registered()), \
         um.patch("app.services.meal_plan.llm_router.CFOrchClient") as MockClient:
        MockClient.return_value.allocate.return_value = direct_ctx
        from app.services.meal_plan.llm_router import get_meal_plan_router, _OrchTextRouter
        router, ctx = get_meal_plan_router()
    assert isinstance(router, _OrchTextRouter)
    assert router._base_url == "http://node:9002"


def test_no_cf_orch_url_returns_llm_router(monkeypatch):
    """get_meal_plan_router() returns LLMRouter when CF_ORCH_URL is not set."""
    monkeypatch.delenv("CF_ORCH_URL", raising=False)
    import unittest.mock as um
    mock_lr = MagicMock()
    with um.patch("app.services.meal_plan.llm_router.LLMRouter", return_value=mock_lr):
        from app.services.meal_plan.llm_router import get_meal_plan_router
        router, ctx = get_meal_plan_router()
    assert router is mock_lr


def test_tier1_general_exception_falls_back_to_direct_allocate(monkeypatch):
    """get_meal_plan_router() falls back to direct allocation when task_allocate raises RuntimeError."""
    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    direct_ctx = _make_direct_alloc_ctx(url="http://node:9003")
    import unittest.mock as um
    failing_ctx = MagicMock()
    failing_ctx.__enter__ = MagicMock(side_effect=RuntimeError("coordinator down"))
    failing_ctx.__exit__ = MagicMock(return_value=False)
    with um.patch("app.services.meal_plan.llm_router.task_allocate",
                  return_value=failing_ctx), \
         um.patch("app.services.meal_plan.llm_router.CFOrchClient") as MockClient:
        MockClient.return_value.allocate.return_value = direct_ctx
        from app.services.meal_plan.llm_router import get_meal_plan_router, _OrchTextRouter
        router, ctx = get_meal_plan_router()
    assert isinstance(router, _OrchTextRouter)
    assert router._base_url == "http://node:9003"


def test_tier2_none_alloc_releases_ctx_and_falls_through(monkeypatch):
    """get_meal_plan_router() releases Tier 2 ctx and falls through when alloc is None."""
    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    import unittest.mock as um
    none_alloc_ctx = MagicMock()
    none_alloc_ctx.__enter__ = MagicMock(return_value=None)
    none_alloc_ctx.__exit__ = MagicMock(return_value=False)
    mock_lr = MagicMock()
    with um.patch("app.services.meal_plan.llm_router.task_allocate",
                  return_value=_make_task_ctx_not_registered()), \
         um.patch("app.services.meal_plan.llm_router.CFOrchClient") as MockClient, \
         um.patch("app.services.meal_plan.llm_router.LLMRouter", return_value=mock_lr):
        MockClient.return_value.allocate.return_value = none_alloc_ctx
        from app.services.meal_plan.llm_router import get_meal_plan_router
        router, ctx = get_meal_plan_router()
    assert router is mock_lr
    none_alloc_ctx.__exit__.assert_called_once_with(None, None, None)
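
The routing tests above describe a three-tier fallback. The router first tries task-based allocation, then a direct `cf-text` allocation, and finally the local `LLMRouter`. If Tier 2 enters but returns no allocation, that context is released before falling through. The following is a minimal sketch of that control flow, assuming each allocator is passed in as a zero-argument callable that returns a context manager. `pick_router`, `OrchTextRouter`, and `FakeCtx` are hypothetical names. Catching every exception per tier mirrors the `RuntimeError` test rather than a confirmed design.

```python
from contextlib import AbstractContextManager
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Callable, Optional

Allocator = Callable[[], AbstractContextManager]


@dataclass
class OrchTextRouter:
    """Hypothetical stand-in for _OrchTextRouter: sends requests to one allocated node."""
    base_url: str


def pick_router(
    orch_url: Optional[str],
    tiers: list[Allocator],
    make_local_router: Callable[[], Any],
) -> tuple[Any, Optional[AbstractContextManager]]:
    """Return (router, ctx); when ctx is not None the caller must exit it after use."""
    if not orch_url:
        return make_local_router(), None  # no coordinator configured
    for allocate in tiers:  # e.g. [task-based allocation, direct cf-text allocation]
        try:
            ctx = allocate()
            alloc = ctx.__enter__()
        except Exception:
            continue  # TaskNotRegistered, coordinator down, ...: try the next tier
        if alloc is not None:
            return OrchTextRouter(alloc.url), ctx
        ctx.__exit__(None, None, None)  # entered but got no node: release it
    return make_local_router(), None  # last resort: local LLM router


class FakeCtx:
    """Test double: returns `alloc` on enter, or raises `error`."""

    def __init__(self, alloc: Any = None, error: Optional[Exception] = None):
        self.alloc, self.error, self.exit_args = alloc, error, None

    def __enter__(self) -> Any:
        if self.error is not None:
            raise self.error
        return self.alloc

    def __exit__(self, *exc: Any) -> bool:
        self.exit_args = exc
        return False


down = FakeCtx(error=RuntimeError("coordinator down"))
direct = FakeCtx(alloc=SimpleNamespace(url="http://node:9003"))
router, ctx = pick_router("http://coord:7700", [lambda: down, lambda: direct], lambda: "local")
print(router.base_url)  # → http://node:9003
```

Returning the still-open context together with the router matches the `router, ctx = get_meal_plan_router()` shape in the tests. The caller then decides when to release the node, instead of the allocation ending as soon as the router is built.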

@ -1,164 +0,0 @@
"""Tests for app/services/task_inference.py"""
from __future__ import annotations

from unittest.mock import MagicMock, patch

import pytest


def _ok_resp(url: str = "http://node:8080", allocation_id: str = "alloc-123") -> MagicMock:
    m = MagicMock()
    m.status_code = 200
    m.is_success = True
    m.json.return_value = {
        "url": url,
        "allocation_id": allocation_id,
        "gpu_id": 0,
        "started": True,
        "warm": False,
    }
    return m


def _err_resp(status_code: int, text: str = "error") -> MagicMock:
    m = MagicMock()
    m.status_code = status_code
    m.is_success = False
    m.text = text
    return m


def test_task_allocate_yields_allocation_on_200(monkeypatch):
    """task_allocate() yields Allocation with url, allocation_id, service on 200."""
    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    with patch("app.services.task_inference.httpx.post", return_value=_ok_resp()) as mock_post, \
         patch("app.services.task_inference.httpx.delete") as mock_del:
        from app.services.task_inference import task_allocate
        with task_allocate("kiwi", "meal_plan", service_hint="cf-text") as alloc:
            assert alloc.url == "http://node:8080"
            assert alloc.allocation_id == "alloc-123"
            assert alloc.service == "cf-text"
        called_url = mock_post.call_args[0][0]
        assert called_url == "http://coord:7700/api/inference/task"
        mock_del.assert_called_once()


def test_task_allocate_uses_service_from_response_when_present(monkeypatch):
    """task_allocate() uses service from response dict over service_hint when available."""
    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    resp = _ok_resp()
    resp.json.return_value["service"] = "cf-vision"
    with patch("app.services.task_inference.httpx.post", return_value=resp), \
         patch("app.services.task_inference.httpx.delete"):
        from app.services.task_inference import task_allocate
        with task_allocate("kiwi", "ocr", service_hint="cf-docuvision") as alloc:
            assert alloc.service == "cf-vision"


def test_task_allocate_404_raises_task_not_registered(monkeypatch):
    """task_allocate() raises TaskNotRegistered on coordinator 404."""
    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    with patch("app.services.task_inference.httpx.post", return_value=_err_resp(404)):
        from app.services.task_inference import task_allocate, TaskNotRegistered
        with pytest.raises(TaskNotRegistered):
            with task_allocate("kiwi", "meal_plan", service_hint="cf-text"):
                pass


def test_task_allocate_503_raises_runtime_error(monkeypatch):
    """task_allocate() raises RuntimeError on non-404 coordinator errors."""
    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    with patch("app.services.task_inference.httpx.post", return_value=_err_resp(503, "no GPU")):
        from app.services.task_inference import task_allocate
        with pytest.raises(RuntimeError, match="HTTP 503"):
            with task_allocate("kiwi", "meal_plan", service_hint="cf-text"):
                pass


def test_task_allocate_release_called_on_clean_exit(monkeypatch):
    """task_allocate() DELETEs the allocation on clean context exit."""
    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    with patch("app.services.task_inference.httpx.post", return_value=_ok_resp(allocation_id="xyz")), \
         patch("app.services.task_inference.httpx.delete") as mock_del:
        from app.services.task_inference import task_allocate
        with task_allocate("kiwi", "meal_plan", service_hint="cf-text"):
            pass
        release_url = mock_del.call_args[0][0]
        assert "cf-text" in release_url
        assert "xyz" in release_url


def test_task_allocate_release_called_when_inner_block_raises(monkeypatch):
    """task_allocate() DELETEs the allocation even when the inner block raises."""
    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    with patch("app.services.task_inference.httpx.post", return_value=_ok_resp(allocation_id="abc")), \
         patch("app.services.task_inference.httpx.delete") as mock_del:
        from app.services.task_inference import task_allocate
        with pytest.raises(ValueError):
            with task_allocate("kiwi", "meal_plan", service_hint="cf-text"):
                raise ValueError("inner error")
        mock_del.assert_called_once()


def test_task_allocate_release_failure_is_swallowed(monkeypatch):
    """task_allocate() does not propagate DELETE failures."""
    import httpx as _httpx
    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    with patch("app.services.task_inference.httpx.post", return_value=_ok_resp()), \
         patch("app.services.task_inference.httpx.delete",
               side_effect=_httpx.RequestError("gone", request=MagicMock())):
        from app.services.task_inference import task_allocate
        with task_allocate("kiwi", "meal_plan", service_hint="cf-text") as alloc:
            assert alloc.url == "http://node:8080"
        # no exception raised


def test_task_allocate_no_orch_url_raises_runtime_error(monkeypatch):
    """task_allocate() raises RuntimeError when CF_ORCH_URL is not set."""
    monkeypatch.delenv("CF_ORCH_URL", raising=False)
    from app.services.task_inference import task_allocate
    with pytest.raises(RuntimeError, match="CF_ORCH_URL"):
        with task_allocate("kiwi", "meal_plan", service_hint="cf-text"):
            pass


def test_task_allocate_network_error_raises_runtime_error(monkeypatch):
    """task_allocate() wraps httpx.RequestError in RuntimeError."""
    import httpx as _httpx
    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    with patch("app.services.task_inference.httpx.post",
               side_effect=_httpx.RequestError("timeout", request=MagicMock())):
        from app.services.task_inference import task_allocate
        with pytest.raises(RuntimeError, match="unreachable"):
            with task_allocate("kiwi", "meal_plan", service_hint="cf-text"):
                pass


def test_task_allocate_malformed_json_raises_runtime_error(monkeypatch):
    """task_allocate() raises RuntimeError when coordinator returns non-JSON on 200."""
    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    bad_resp = MagicMock()
    bad_resp.status_code = 200
    bad_resp.is_success = True
    bad_resp.text = "<html>proxy error</html>"
    bad_resp.json.side_effect = ValueError("not json")
    with patch("app.services.task_inference.httpx.post", return_value=bad_resp):
        from app.services.task_inference import task_allocate
        with pytest.raises(RuntimeError, match="malformed"):
            with task_allocate("kiwi", "meal_plan", service_hint="cf-text"):
                pass


def test_task_allocate_missing_url_field_raises_runtime_error(monkeypatch):
    """task_allocate() raises RuntimeError when coordinator response is missing url field."""
    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    bad_resp = MagicMock()
    bad_resp.status_code = 200
    bad_resp.is_success = True
    bad_resp.text = '{"allocation_id": "x"}'
    bad_resp.json.return_value = {"allocation_id": "x"}  # missing "url"
    with patch("app.services.task_inference.httpx.post", return_value=bad_resp):
        from app.services.task_inference import task_allocate
        with pytest.raises(RuntimeError, match="malformed"):
            with task_allocate("kiwi", "meal_plan", service_hint="cf-text"):
                pass
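
The `task_inference` tests above define the contract of `task_allocate()`. It POSTs to `/api/inference/task` and maps a 404 to `TaskNotRegistered`. Other failures are wrapped in `RuntimeError`. On exit it always releases the allocation and swallows any release error. The following is a minimal sketch under stated assumptions. The HTTP calls are injected as `post`/`delete` callables instead of going through `httpx`, and `OSError` stands in for `httpx.RequestError`. The request body and the release URL layout are hypothetical, since the tests only check that the release URL contains the service and the allocation id.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Any, Callable, Iterator, Optional


class TaskNotRegistered(Exception):
    """The coordinator answered 404: no route for this (product, task) pair."""


@dataclass
class Allocation:
    url: str
    allocation_id: str
    service: str


@contextmanager
def task_allocate(
    product: str,
    task: str,
    *,
    service_hint: str,
    orch_url: Optional[str],
    post: Callable[..., Any],
    delete: Callable[[str], Any],
) -> Iterator[Allocation]:
    if not orch_url:
        raise RuntimeError("CF_ORCH_URL is not set")
    try:
        # Hypothetical request body: the tests only check the endpoint URL.
        resp = post(f"{orch_url}/api/inference/task",
                    json={"product": product, "task": task, "service_hint": service_hint})
    except OSError as exc:  # stand-in for httpx.RequestError
        raise RuntimeError(f"coordinator unreachable: {exc}") from exc
    if resp.status_code == 404:
        raise TaskNotRegistered(f"{product}/{task}")
    if not resp.is_success:
        raise RuntimeError(f"task allocation failed: HTTP {resp.status_code}")
    try:
        data = resp.json()
        alloc = Allocation(
            url=data["url"],
            allocation_id=data["allocation_id"],
            service=data.get("service") or service_hint,  # coordinator's choice wins
        )
    except (ValueError, KeyError, TypeError, AttributeError) as exc:
        raise RuntimeError(f"malformed coordinator response: {exc!r}") from exc
    try:
        yield alloc
    finally:
        try:
            # Hypothetical release URL; it only has to name the service and allocation.
            delete(f"{orch_url}/api/services/{alloc.service}/allocations/{alloc.allocation_id}")
        except Exception:
            pass  # release is best-effort and must not mask the caller's outcome


@dataclass
class FakeResponse:
    """Test double exposing only what task_allocate reads."""
    status_code: int
    body: Any = None

    @property
    def is_success(self) -> bool:
        return 200 <= self.status_code < 300

    def json(self) -> Any:
        if isinstance(self.body, Exception):
            raise self.body
        return self.body


released: list[str] = []
ok = FakeResponse(200, {"url": "http://node:8080", "allocation_id": "xyz"})
with task_allocate("kiwi", "meal_plan", service_hint="cf-text", orch_url="http://coord:7700",
                   post=lambda url, json: ok, delete=released.append) as alloc:
    print(alloc.url)  # → http://node:8080
print(released)  # → ['http://coord:7700/api/services/cf-text/allocations/xyz']
```

The release happens in `finally`, so it also runs when the body of the `with` block raises. The inner `try` keeps a failed release from replacing the caller's own exception.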

@ -1,88 +0,0 @@
"""Tests for task-based routing added to _try_docuvision()."""
from __future__ import annotations

from unittest.mock import MagicMock, patch

import pytest


def _mock_doc_result(text: str = "RECEIPT TEXT") -> MagicMock:
    r = MagicMock()
    r.text = text
    return r


def _make_task_ctx(url: str = "http://node:9010") -> MagicMock:
    alloc = MagicMock()
    alloc.url = url
    alloc.allocation_id = "alloc-vis-1"
    alloc.service = "cf-docuvision"
    ctx = MagicMock()
    ctx.__enter__ = MagicMock(return_value=alloc)
    ctx.__exit__ = MagicMock(return_value=False)
    return ctx


def _make_task_not_registered() -> MagicMock:
    from app.services.task_inference import TaskNotRegistered
    ctx = MagicMock()
    ctx.__enter__ = MagicMock(side_effect=TaskNotRegistered("not registered"))
    ctx.__exit__ = MagicMock(return_value=False)
    return ctx


def _make_direct_alloc(url: str = "http://node:9011") -> MagicMock:
    alloc = MagicMock()
    alloc.url = url
    ctx = MagicMock()
    ctx.__enter__ = MagicMock(return_value=alloc)
    ctx.__exit__ = MagicMock(return_value=False)
    return ctx


def test_try_docuvision_task_path_returns_text(monkeypatch, tmp_path):
    """_try_docuvision() uses task allocation and returns extracted text on success."""
    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    fake_image = tmp_path / "receipt.jpg"
    fake_image.write_bytes(b"fake")
    with patch("app.services.task_inference.task_allocate",
               return_value=_make_task_ctx(url="http://node:9010")), \
         patch("app.services.ocr.docuvision_client.DocuvisionClient") as MockDoc:
        MockDoc.return_value.extract_text.return_value = _mock_doc_result("STORE $12.34")
        from app.services.ocr.vl_model import _try_docuvision
        result = _try_docuvision(str(fake_image))
    assert result == "STORE $12.34"
    MockDoc.assert_called_once_with("http://node:9010")


def test_try_docuvision_falls_back_to_direct_on_task_not_registered(monkeypatch, tmp_path):
    """_try_docuvision() falls back to direct cf-docuvision allocation on TaskNotRegistered."""
    monkeypatch.setenv("CF_ORCH_URL", "http://coord:7700")
    fake_image = tmp_path / "receipt.jpg"
    fake_image.write_bytes(b"fake")
    with patch("app.services.task_inference.task_allocate",
               return_value=_make_task_not_registered()), \
         patch("circuitforge_orch.client.CFOrchClient") as MockClient, \
         patch("app.services.ocr.docuvision_client.DocuvisionClient") as MockDoc:
        MockClient.return_value.allocate.return_value = _make_direct_alloc("http://node:9011")
        MockDoc.return_value.extract_text.return_value = _mock_doc_result("FALLBACK TEXT")
        from app.services.ocr.vl_model import _try_docuvision
        result = _try_docuvision(str(fake_image))
    assert result == "FALLBACK TEXT"
    MockDoc.assert_called_once_with("http://node:9011")


def test_try_docuvision_returns_none_without_cf_orch_url(monkeypatch, tmp_path):
    """_try_docuvision() returns None immediately when CF_ORCH_URL is not set."""
    monkeypatch.delenv("CF_ORCH_URL", raising=False)
    fake_image = tmp_path / "receipt.jpg"
    fake_image.write_bytes(b"fake")
    from app.services.ocr.vl_model import _try_docuvision
    result = _try_docuvision(str(fake_image))
    assert result is None

@ -17,17 +17,12 @@ from app.services.ocr.docuvision_client import DocuvisionClient, DocuvisionResul
def test_extract_text_sends_base64_image(tmp_path: Path) -> None:
    """extract_text() POSTs image_b64 and returns parsed raw_text."""
    """extract_text() POSTs a base64-encoded image and returns parsed text."""
    image_file = tmp_path / "test.jpg"
    image_file.write_bytes(b"fake-image-bytes")
    mock_response = MagicMock()
    mock_response.json.return_value = {
        "raw_text": "Cheerios",
        "elements": [],
        "tables": [],
        "metadata": {"hint": "text", "confidence": 0.95},
    }
    mock_response.json.return_value = {"text": "Cheerios", "confidence": 0.95}
    mock_response.raise_for_status.return_value = None
    with patch("httpx.Client") as mock_client_cls:
@ -46,8 +41,7 @@ def test_extract_text_sends_base64_image(tmp_path: Path) -> None:
    assert call_kwargs[0][0] == "http://docuvision:8080/extract"
    posted_json = call_kwargs[1]["json"]
    expected_b64 = base64.b64encode(b"fake-image-bytes").decode()
    assert posted_json["image_b64"] == expected_b64
    assert posted_json["hint"] == "text"
    assert posted_json["image"] == expected_b64
def test_extract_text_raises_on_http_error(tmp_path: Path) -> None:
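
The two sides of this hunk disagree on the wire format. One uses an `image_b64`/`hint` request with a `raw_text` response, and the other uses an `image` request with a `text` response. The following is a minimal sketch of the request and response handling, assuming the `image_b64`/`hint` request shape from the first set of assertions. `build_extract_payload` and `parse_extract_response` are hypothetical helpers. Accepting either `raw_text` or `text` is a tolerance choice made for illustration, not the client's documented behavior.

```python
import base64
from typing import Any


def build_extract_payload(image_bytes: bytes, hint: str = "text") -> dict[str, str]:
    """Build a JSON-ready request body carrying the image as base64 text."""
    return {
        "image_b64": base64.b64encode(image_bytes).decode("ascii"),
        "hint": hint,
    }


def parse_extract_response(body: dict[str, Any]) -> str:
    """Return the recognised text from either response shape seen in the diff."""
    if "raw_text" in body:
        return body["raw_text"]
    if "text" in body:
        return body["text"]
    raise ValueError("response has neither 'raw_text' nor 'text'")


payload = build_extract_payload(b"fake-image-bytes")
print(base64.b64decode(payload["image_b64"]))  # → b'fake-image-bytes'
print(parse_extract_response({"raw_text": "Cheerios", "elements": [], "tables": []}))  # → Cheerios
```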

@ -95,15 +95,14 @@ class TestTimeExtraction:
class TestTimeTotals:
    def test_active_passive_split(self):
        steps = [
            "Chop onions finely.",                   # active; chop action → 2 min prep
            "Sear chicken for 5 minutes per side.",  # active, 5 min explicit
            "Simmer for 20 minutes.",                # passive, 20 min explicit
            "Chop onions finely.",                   # active, no time
            "Sear chicken for 5 minutes per side.",  # active, 5 min
            "Simmer for 20 minutes.",                # passive, 20 min
        ]
        result = parse_time_effort(steps)
        # "Chop onions" now contributes prep_min (chop base=2.0) + 5 explicit = 7 active
        assert result.active_min == 7
        assert result.active_min == 5
        assert result.passive_min == 20
        assert result.total_min == 27
        assert result.total_min == 25
    def test_all_active_passive_zero(self):
        steps = ["Dice vegetables.", "Season with salt.", "Plate and serve."]
@ -131,28 +130,16 @@ class TestEffortLabel:
        result = parse_time_effort(["a", "b", "c"])
        assert result.effort_label == "quick"
    def test_bake_recipe_is_moderate(self):
        # Passive default for "bake" = 30 min → moderate (21-45 min range)
        result = parse_time_effort([
            "Mix dry ingredients.",
            "Combine wet ingredients.",
            "Fold together until just combined.",
            "Bake until a toothpick comes out clean.",
        ])
    def test_four_steps_is_moderate(self):
        result = parse_time_effort(["a", "b", "c", "d"])
        assert result.effort_label == "moderate"
    def test_slow_cook_recipe_is_involved(self):
        # Passive default for "slow cook" = 300 min → involved (>45 min)
        result = parse_time_effort([
            "Brown the meat in batches.",
            "Add vegetables and broth.",
            "Slow cook until tender.",
        ])
        assert result.effort_label == "involved"
    def test_seven_steps_is_moderate(self):
        result = parse_time_effort(["a"] * 7)
        assert result.effort_label == "moderate"
    def test_explicit_time_drives_effort_label(self):
        # Explicit passive time of 90 min → involved
        result = parse_time_effort(["Braise for 90 minutes."])
    def test_eight_steps_is_involved(self):
        result = parse_time_effort(["a"] * 8)
        assert result.effort_label == "involved"
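
The updated `TestTimeTotals` expectations are plain addition. The chop step adds 2 min of prep and the sear adds 5 min explicit, so active = 2 + 5 = 7, passive = 20, and total = 7 + 20 = 27. The comments in `TestEffortLabel` then put totals of 21-45 min in the moderate bucket and anything over 45 min in the involved bucket. The following is a minimal sketch of that labelling, assuming everything up to 20 min counts as quick (the hunk does not state that boundary). `TimeEffort` is a hypothetical stand-in for whatever `parse_time_effort` returns. The step-count labels on the other side of the hunk are not modelled.

```python
from dataclasses import dataclass


@dataclass
class TimeEffort:
    active_min: float
    passive_min: float

    @property
    def total_min(self) -> float:
        return self.active_min + self.passive_min

    @property
    def effort_label(self) -> str:
        # Buckets from the test comments; the 20-minute "quick" ceiling is assumed.
        if self.total_min > 45:
            return "involved"
        if self.total_min > 20:
            return "moderate"
        return "quick"


split = TimeEffort(active_min=2 + 5, passive_min=20)  # chop prep + sear, then simmer
print(split.total_min, split.effort_label)  # → 27 moderate
```

Labelling by total time rather than step count lets a short recipe with a long passive stage, such as a three-step slow cook, land in "involved". A step-count rule would call that recipe quick.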