Compare commits

...

No commits in common. "0336ef834c8608033d13c357a12483b73952dcbe" and "c418d04c305ae49c12b1ca863369f5822f69696e" have entirely different histories.

165 changed files with 23242 additions and 2 deletions

.env.example Normal file

@@ -0,0 +1,55 @@
# Kiwi — environment variables
# Copy to .env and fill in values
# API
API_PREFIX=/api/v1
CORS_ORIGINS=http://localhost:5173,http://localhost:8509
# Storage
DATA_DIR=./data
# Database (defaults to DATA_DIR/kiwi.db)
# DB_PATH=./data/kiwi.db
# Pipeline data directory for downloaded parquets (used by download_datasets.py)
# Override to store large datasets on a separate drive or NAS
# KIWI_PIPELINE_DATA_DIR=./data/pipeline
# CF-core resource coordinator (VRAM lease management)
# Set to the coordinator URL when running alongside cf-core orchestration
# COORDINATOR_URL=http://localhost:7700
# IP this machine advertises to the coordinator (must be reachable from coordinator host)
# CF_ORCH_ADVERTISE_HOST=10.1.10.71
# Processing
USE_GPU=true
GPU_MEMORY_LIMIT=6144
MAX_CONCURRENT_JOBS=4
MIN_QUALITY_SCORE=50.0
# Feature flags
ENABLE_OCR=false
# Runtime
DEBUG=false
CLOUD_MODE=false
DEMO_MODE=false
# Cloud mode (set in compose.cloud.yml; also set here for reference)
# CLOUD_DATA_ROOT=/devl/kiwi-cloud-data
# KIWI_DB=data/kiwi.db # local-mode DB path override
# DEV ONLY: bypass JWT auth for these IPs/CIDRs (LAN testing without Caddy in the path).
# NEVER set in production.
# IMPORTANT: Docker port mapping NATs source IPs to the bridge gateway. When hitting
# localhost:8515 (host → Docker → nginx → API), nginx sees 192.168.80.1, not 127.0.0.1.
# Include the Docker bridge CIDR to allow localhost and LAN access through nginx.
# Run: docker network inspect kiwi-cloud_kiwi-cloud-net | grep Subnet
# Example: CLOUD_AUTH_BYPASS_IPS=10.1.10.0/24,127.0.0.1,::1,192.168.80.0/20
# CLOUD_AUTH_BYPASS_IPS=
# Heimdall license server (required for cloud tier resolution)
# HEIMDALL_URL=https://license.circuitforge.tech
# HEIMDALL_ADMIN_TOKEN=
# Directus JWT (must match cf-directus SECRET env var)
# DIRECTUS_JWT_SECRET=
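The `CLOUD_AUTH_BYPASS_IPS` comments above describe matching a client IP against a comma-separated mix of plain addresses and CIDR ranges. A minimal sketch of such a check using only the standard library (the helper name and parsing are assumptions, not the actual middleware):

```python
import ipaddress

def ip_in_bypass_list(client_ip: str, bypass_spec: str) -> bool:
    """Return True if client_ip matches any entry in a comma-separated
    list of addresses and CIDR ranges (the CLOUD_AUTH_BYPASS_IPS format)."""
    if not bypass_spec:
        return False
    addr = ipaddress.ip_address(client_ip)
    for entry in bypass_spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # ip_network accepts bare addresses ("127.0.0.1") and CIDRs ("10.1.10.0/24");
        # a version mismatch (v4 addr vs v6 net) simply fails the membership test.
        if addr in ipaddress.ip_network(entry, strict=False):
            return True
    return False

# The Docker-bridge case from the comments: nginx sees 192.168.80.1, not 127.0.0.1
spec = "10.1.10.0/24,127.0.0.1,::1,192.168.80.0/20"
print(ip_in_bypass_list("192.168.80.1", spec))  # True — inside 192.168.80.0/20
print(ip_in_bypass_list("8.8.8.8", spec))       # False
```

This is why the example value includes the Docker bridge CIDR: without it, requests through the host port mapping would never match `127.0.0.1`.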

.gitignore vendored Normal file

@@ -0,0 +1,24 @@
# Superpowers brainstorming artifacts
.superpowers/
# Git worktrees
.worktrees/
# Python bytecode
__pycache__/
*.pyc
*.pyo
# Environment files (keep .env.example)
.env
# Node modules
node_modules/
dist/
# Data directories
data/
# Test artifacts (MagicMock sqlite files from pytest)
<MagicMock*

Dockerfile Normal file

@@ -0,0 +1,26 @@
FROM continuumio/miniconda3:latest
WORKDIR /app
# Install system dependencies for OpenCV + pyzbar
RUN apt-get update && apt-get install -y --no-install-recommends \
libzbar0 libgl1 libglib2.0-0 \
&& rm -rf /var/lib/apt/lists/*
# Install circuitforge-core from sibling directory (compose sets context: ..)
COPY circuitforge-core/ ./circuitforge-core/
RUN conda run -n base pip install --no-cache-dir -e ./circuitforge-core
# Create kiwi conda env and install app
COPY kiwi/environment.yml .
RUN conda env create -f environment.yml
COPY kiwi/ ./kiwi/
# Install cf-core into the kiwi env BEFORE installing kiwi (kiwi lists it as a dep)
RUN conda run -n kiwi pip install --no-cache-dir -e /app/circuitforge-core
WORKDIR /app/kiwi
RUN conda run -n kiwi pip install --no-cache-dir -e .
EXPOSE 8512
CMD ["conda", "run", "--no-capture-output", "-n", "kiwi", \
"uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8512"]

PRIVACY.md Normal file

@@ -0,0 +1,7 @@
# Privacy Policy
CircuitForge LLC's privacy policy applies to this product and is published at:
**<https://circuitforge.tech/privacy>**
Last reviewed: March 2026.


@@ -1,3 +1,66 @@
# kiwi
# 🥝 Kiwi
Kiwi by Circuit Forge LLC — Pantry tracking + leftover recipe suggestions; barcode/receipt OCR
> *Part of the CircuitForge LLC "AI for the tasks the system made hard on purpose" suite.*
**Pantry tracking and leftover recipe suggestions.**
Scan barcodes, photograph receipts, and get recipe ideas based on what you already have — before it expires.
**Status:** Pre-alpha · CircuitForge LLC
---
## What it does
- **Inventory tracking** — add items by barcode scan, receipt upload, or manually
- **Expiry alerts** — know what's about to go bad
- **Receipt OCR** — extract line items from receipt photos automatically (Paid tier)
- **Recipe suggestions** — LLM-powered ideas based on what's expiring (Paid tier, BYOK-unlockable)
- **Leftover mode** — prioritize nearly-expired items in recipe ranking (Premium tier)
## Stack
- **Frontend:** Vue 3 SPA (Vite + TypeScript)
- **Backend:** FastAPI + SQLite (via `circuitforge-core`)
- **Auth:** CF session cookie → Directus JWT (cloud mode)
- **Licensing:** Heimdall (free tier auto-provisioned at signup)
## Running locally
```bash
cp .env.example .env
./manage.sh build
./manage.sh start
# Web: http://localhost:8511
# API: http://localhost:8512
```
## Cloud instance
```bash
./manage.sh cloud-build
./manage.sh cloud-start
# Served at menagerie.circuitforge.tech/kiwi (JWT-gated)
```
## Tiers
| Feature | Free | Paid | Premium |
|---------|------|------|---------|
| Inventory CRUD | ✓ | ✓ | ✓ |
| Barcode scan | ✓ | ✓ | ✓ |
| Receipt upload | ✓ | ✓ | ✓ |
| Expiry alerts | ✓ | ✓ | ✓ |
| CSV export | ✓ | ✓ | ✓ |
| Receipt OCR | BYOK | ✓ | ✓ |
| Recipe suggestions | BYOK | ✓ | ✓ |
| Meal planning | — | ✓ | ✓ |
| Multi-household | — | — | ✓ |
| Leftover mode | — | — | ✓ |
BYOK = bring your own LLM backend (configure `~/.config/circuitforge/llm.yaml`)
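The tier matrix above maps onto the `can_use(feature, tier, has_byok)` checks that appear throughout the API endpoints. A minimal sketch of one way such a lookup could work — the table is transcribed from the matrix, but the function body is an assumption; the real `app.tiers` module may differ:

```python
# Tier ordering and the lowest tier that unlocks each gated feature.
_TIER_ORDER = {"free": 0, "paid": 1, "premium": 2}

_FEATURE_MIN_TIER = {
    "receipt_ocr": "paid",
    "recipe_suggestions": "paid",
    "meal_planning": "paid",
    "multi_household": "premium",
    "leftover_mode": "premium",
}

# Features the free tier can unlock with a self-hosted LLM backend (BYOK).
_BYOK_UNLOCKABLE = {"receipt_ocr", "recipe_suggestions"}

def can_use(feature: str, tier: str, has_byok: bool = False) -> bool:
    """Return True if `tier` (optionally with BYOK) unlocks `feature`.
    Features absent from the matrix (inventory CRUD, barcode scan,
    CSV export, expiry alerts) are available to every tier."""
    min_tier = _FEATURE_MIN_TIER.get(feature)
    if min_tier is None:
        return True
    if _TIER_ORDER[tier] >= _TIER_ORDER[min_tier]:
        return True
    return has_byok and feature in _BYOK_UNLOCKABLE
```

Under this sketch, `can_use("receipt_ocr", "free", has_byok=True)` is truthy — matching the BYOK cell in the Receipt OCR row — while `can_use("leftover_mode", "paid")` is not.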
## License
Discovery/pipeline layer: MIT
AI features: BSL 1.1 (free for personal non-commercial self-hosting)

app/__init__.py Normal file

@@ -0,0 +1,7 @@
# app/__init__.py
"""
Kiwi: Pantry tracking and leftover recipe suggestions.
"""
__version__ = "0.1.0"
__author__ = "Alan 'pyr0ball' Weinstock"

app/api/__init__.py Normal file

@@ -0,0 +1,5 @@
# app/api/__init__.py
"""
API package for Kiwi.
Contains all API routes and endpoint handlers.
"""


@@ -0,0 +1,4 @@
# app/api/endpoints/__init__.py
"""
API endpoint implementations for Kiwi.
"""


@@ -0,0 +1,47 @@
"""Export endpoints — CSV/Excel of receipt and inventory data."""
from __future__ import annotations
import asyncio
import csv
import io
from fastapi import APIRouter, Depends
from fastapi.responses import StreamingResponse
from app.db.session import get_store
from app.db.store import Store
router = APIRouter(prefix="/export", tags=["export"])
@router.get("/receipts/csv")
async def export_receipts_csv(store: Store = Depends(get_store)):
receipts = await asyncio.to_thread(store.list_receipts, 1000, 0)
output = io.StringIO()
fields = ["id", "filename", "status", "created_at", "updated_at"]
writer = csv.DictWriter(output, fieldnames=fields, extrasaction="ignore")
writer.writeheader()
writer.writerows(receipts)
output.seek(0)
return StreamingResponse(
iter([output.getvalue()]),
media_type="text/csv",
headers={"Content-Disposition": "attachment; filename=receipts.csv"},
)
@router.get("/inventory/csv")
async def export_inventory_csv(store: Store = Depends(get_store)):
items = await asyncio.to_thread(store.list_inventory)
output = io.StringIO()
fields = ["id", "product_name", "barcode", "category", "quantity", "unit",
"location", "expiration_date", "status", "created_at"]
writer = csv.DictWriter(output, fieldnames=fields, extrasaction="ignore")
writer.writeheader()
writer.writerows(items)
output.seek(0)
return StreamingResponse(
iter([output.getvalue()]),
media_type="text/csv",
headers={"Content-Disposition": "attachment; filename=inventory.csv"},
)


@@ -0,0 +1,14 @@
# app/api/endpoints/health.py
from fastapi import APIRouter
router = APIRouter()
@router.get("/")
async def health_check():
return {"status": "ok", "service": "kiwi-api"}
@router.get("/ping")
async def ping():
return {"ping": "pong"}


@@ -0,0 +1,411 @@
"""Inventory API endpoints — products, items, barcode scanning, tags, stats."""
from __future__ import annotations
import asyncio
import uuid
from pathlib import Path
from typing import Any, Dict, List, Optional
import aiofiles
from fastapi import APIRouter, Depends, File, Form, HTTPException, UploadFile, status
from pydantic import BaseModel
from app.cloud_session import CloudUser, get_session
from app.db.session import get_store
from app.db.store import Store
from app.models.schemas.inventory import (
BarcodeScanResponse,
InventoryItemCreate,
InventoryItemResponse,
InventoryItemUpdate,
InventoryStats,
ProductCreate,
ProductResponse,
ProductUpdate,
TagCreate,
TagResponse,
)
router = APIRouter()
# ── Products ──────────────────────────────────────────────────────────────────
@router.post("/products", response_model=ProductResponse, status_code=status.HTTP_201_CREATED)
async def create_product(body: ProductCreate, store: Store = Depends(get_store)):
product, _ = await asyncio.to_thread(
store.get_or_create_product,
body.name,
body.barcode,
brand=body.brand,
category=body.category,
description=body.description,
image_url=body.image_url,
nutrition_data=body.nutrition_data,
source=body.source,
source_data=body.source_data,
)
return ProductResponse.model_validate(product)
@router.get("/products", response_model=List[ProductResponse])
async def list_products(store: Store = Depends(get_store)):
products = await asyncio.to_thread(store.list_products)
return [ProductResponse.model_validate(p) for p in products]
@router.get("/products/{product_id}", response_model=ProductResponse)
async def get_product(product_id: int, store: Store = Depends(get_store)):
product = await asyncio.to_thread(store.get_product, product_id)
if not product:
raise HTTPException(status_code=404, detail="Product not found")
return ProductResponse.model_validate(product)
@router.get("/products/barcode/{barcode}", response_model=ProductResponse)
async def get_product_by_barcode(barcode: str, store: Store = Depends(get_store)):
product = await asyncio.to_thread(
store._fetch_one, "SELECT * FROM products WHERE barcode = ?", (barcode,)
)
if not product:
raise HTTPException(status_code=404, detail="Product not found")
return ProductResponse.model_validate(product)
@router.patch("/products/{product_id}", response_model=ProductResponse)
async def update_product(
product_id: int, body: ProductUpdate, store: Store = Depends(get_store)
):
updates = body.model_dump(exclude_none=True)
if not updates:
product = await asyncio.to_thread(store.get_product, product_id)
else:
import json
sets = ", ".join(f"{k} = ?" for k in updates)
values = []
for k, v in updates.items():
values.append(json.dumps(v) if isinstance(v, dict) else v)
values.append(product_id)
await asyncio.to_thread(
store.conn.execute,
f"UPDATE products SET {sets}, updated_at = datetime('now') WHERE id = ?",
values,
)
store.conn.commit()
product = await asyncio.to_thread(store.get_product, product_id)
if not product:
raise HTTPException(status_code=404, detail="Product not found")
return ProductResponse.model_validate(product)
@router.delete("/products/{product_id}", status_code=status.HTTP_204_NO_CONTENT)
async def delete_product(product_id: int, store: Store = Depends(get_store)):
existing = await asyncio.to_thread(store.get_product, product_id)
if not existing:
raise HTTPException(status_code=404, detail="Product not found")
await asyncio.to_thread(
store.conn.execute, "DELETE FROM products WHERE id = ?", (product_id,)
)
store.conn.commit()
# ── Inventory items ───────────────────────────────────────────────────────────
@router.post("/items", response_model=InventoryItemResponse, status_code=status.HTTP_201_CREATED)
async def create_inventory_item(body: InventoryItemCreate, store: Store = Depends(get_store)):
item = await asyncio.to_thread(
store.add_inventory_item,
body.product_id,
body.location,
quantity=body.quantity,
unit=body.unit,
sublocation=body.sublocation,
purchase_date=str(body.purchase_date) if body.purchase_date else None,
expiration_date=str(body.expiration_date) if body.expiration_date else None,
notes=body.notes,
source=body.source,
)
return InventoryItemResponse.model_validate(item)
@router.get("/items", response_model=List[InventoryItemResponse])
async def list_inventory_items(
location: Optional[str] = None,
item_status: str = "available",
store: Store = Depends(get_store),
):
items = await asyncio.to_thread(store.list_inventory, location, item_status)
return [InventoryItemResponse.model_validate(i) for i in items]
@router.get("/items/expiring", response_model=List[InventoryItemResponse])
async def get_expiring_items(days: int = 7, store: Store = Depends(get_store)):
items = await asyncio.to_thread(store.expiring_soon, days)
return [InventoryItemResponse.model_validate(i) for i in items]
@router.get("/items/{item_id}", response_model=InventoryItemResponse)
async def get_inventory_item(item_id: int, store: Store = Depends(get_store)):
item = await asyncio.to_thread(store.get_inventory_item, item_id)
if not item:
raise HTTPException(status_code=404, detail="Inventory item not found")
return InventoryItemResponse.model_validate(item)
@router.patch("/items/{item_id}", response_model=InventoryItemResponse)
async def update_inventory_item(
item_id: int, body: InventoryItemUpdate, store: Store = Depends(get_store)
):
updates = body.model_dump(exclude_none=True)
if "purchase_date" in updates and updates["purchase_date"]:
updates["purchase_date"] = str(updates["purchase_date"])
if "expiration_date" in updates and updates["expiration_date"]:
updates["expiration_date"] = str(updates["expiration_date"])
item = await asyncio.to_thread(store.update_inventory_item, item_id, **updates)
if not item:
raise HTTPException(status_code=404, detail="Inventory item not found")
return InventoryItemResponse.model_validate(item)
@router.post("/items/{item_id}/consume", response_model=InventoryItemResponse)
async def consume_item(item_id: int, store: Store = Depends(get_store)):
from datetime import datetime, timezone
item = await asyncio.to_thread(
store.update_inventory_item,
item_id,
status="consumed",
consumed_at=datetime.now(timezone.utc).isoformat(),
)
if not item:
raise HTTPException(status_code=404, detail="Inventory item not found")
return InventoryItemResponse.model_validate(item)
@router.delete("/items/{item_id}", status_code=status.HTTP_204_NO_CONTENT)
async def delete_inventory_item(item_id: int, store: Store = Depends(get_store)):
existing = await asyncio.to_thread(store.get_inventory_item, item_id)
if not existing:
raise HTTPException(status_code=404, detail="Inventory item not found")
await asyncio.to_thread(
store.conn.execute, "DELETE FROM inventory_items WHERE id = ?", (item_id,)
)
store.conn.commit()
# ── Barcode scanning ──────────────────────────────────────────────────────────
class BarcodeScanTextRequest(BaseModel):
barcode: str
location: str = "pantry"
quantity: float = 1.0
auto_add_to_inventory: bool = True
@router.post("/scan/text", response_model=BarcodeScanResponse)
async def scan_barcode_text(
body: BarcodeScanTextRequest,
store: Store = Depends(get_store),
session: CloudUser = Depends(get_session),
):
"""Scan a barcode from a text string (e.g. from a hardware scanner or manual entry)."""
from app.services.openfoodfacts import OpenFoodFactsService
from app.services.expiration_predictor import ExpirationPredictor
off = OpenFoodFactsService()
predictor = ExpirationPredictor()
product_info = await off.lookup_product(body.barcode)
inventory_item = None
if product_info and body.auto_add_to_inventory:
product, _ = await asyncio.to_thread(
store.get_or_create_product,
product_info.get("name", body.barcode),
body.barcode,
brand=product_info.get("brand"),
category=product_info.get("category"),
nutrition_data=product_info.get("nutrition_data", {}),
source="openfoodfacts",
source_data=product_info,
)
exp = predictor.predict_expiration(
product_info.get("category", ""),
body.location,
product_name=product_info.get("name", body.barcode),
tier=session.tier,
has_byok=session.has_byok,
)
inventory_item = await asyncio.to_thread(
store.add_inventory_item,
product["id"], body.location,
quantity=body.quantity,
expiration_date=str(exp) if exp else None,
source="barcode_scan",
)
result_product = ProductResponse.model_validate(product)
else:
result_product = None
return BarcodeScanResponse(
success=True,
barcodes_found=1,
results=[{
"barcode": body.barcode,
"barcode_type": "text",
"product": result_product,
"inventory_item": InventoryItemResponse.model_validate(inventory_item) if inventory_item else None,
"added_to_inventory": inventory_item is not None,
"message": "Added to inventory" if inventory_item else "Product not found in database",
}],
message="Barcode processed",
)
@router.post("/scan", response_model=BarcodeScanResponse)
async def scan_barcode_image(
file: UploadFile = File(...),
auto_add_to_inventory: bool = Form(True),
location: str = Form("pantry"),
quantity: float = Form(1.0),
store: Store = Depends(get_store),
session: CloudUser = Depends(get_session),
):
"""Scan a barcode from an uploaded image. Requires Phase 2 scanner integration."""
temp_dir = Path("/tmp/kiwi_barcode_scans")
temp_dir.mkdir(parents=True, exist_ok=True)
temp_file = temp_dir / f"{uuid.uuid4()}_{file.filename}"
try:
async with aiofiles.open(temp_file, "wb") as f:
await f.write(await file.read())
from app.services.barcode_scanner import BarcodeScanner
from app.services.openfoodfacts import OpenFoodFactsService
from app.services.expiration_predictor import ExpirationPredictor
barcodes = await asyncio.to_thread(BarcodeScanner().scan_image, temp_file)
if not barcodes:
return BarcodeScanResponse(
success=False, barcodes_found=0, results=[],
message="No barcodes detected in image"
)
off = OpenFoodFactsService()
predictor = ExpirationPredictor()
results = []
for bc in barcodes:
code = bc["data"]
product_info = await off.lookup_product(code)
inventory_item = None
if product_info and auto_add_to_inventory:
product, _ = await asyncio.to_thread(
store.get_or_create_product,
product_info.get("name", code),
code,
brand=product_info.get("brand"),
category=product_info.get("category"),
nutrition_data=product_info.get("nutrition_data", {}),
source="openfoodfacts",
source_data=product_info,
)
exp = predictor.predict_expiration(
product_info.get("category", ""),
location,
product_name=product_info.get("name", code),
tier=session.tier,
has_byok=session.has_byok,
)
inventory_item = await asyncio.to_thread(
store.add_inventory_item,
product["id"], location,
quantity=quantity,
expiration_date=str(exp) if exp else None,
source="barcode_scan",
)
results.append({
"barcode": code,
"barcode_type": bc.get("type", "unknown"),
"product": ProductResponse.model_validate(product) if product_info else None,
"inventory_item": InventoryItemResponse.model_validate(inventory_item) if inventory_item else None,
"added_to_inventory": inventory_item is not None,
"message": "Added to inventory" if inventory_item else "Barcode scanned",
})
return BarcodeScanResponse(
success=True, barcodes_found=len(barcodes), results=results,
message=f"Processed {len(barcodes)} barcode(s)"
)
finally:
if temp_file.exists():
temp_file.unlink()
# ── Tags ──────────────────────────────────────────────────────────────────────
@router.post("/tags", response_model=TagResponse, status_code=status.HTTP_201_CREATED)
async def create_tag(body: TagCreate, store: Store = Depends(get_store)):
import sqlite3
# Set row_factory BEFORE executing: assigning it after execute() has no
# effect on the already-created cursor, so fetchone() would return a tuple.
store.conn.row_factory = sqlite3.Row
cur = await asyncio.to_thread(
store.conn.execute,
"INSERT INTO tags (name, slug, description, color, category) VALUES (?,?,?,?,?) RETURNING *",
(body.name, body.slug, body.description, body.color, body.category),
)
store.conn.commit()
return TagResponse.model_validate(store._row_to_dict(cur.fetchone()))
@router.get("/tags", response_model=List[TagResponse])
async def list_tags(
category: Optional[str] = None, store: Store = Depends(get_store)
):
if category:
tags = await asyncio.to_thread(
store._fetch_all, "SELECT * FROM tags WHERE category = ? ORDER BY name", (category,)
)
else:
tags = await asyncio.to_thread(
store._fetch_all, "SELECT * FROM tags ORDER BY name"
)
return [TagResponse.model_validate(t) for t in tags]
# ── Stats ─────────────────────────────────────────────────────────────────────
@router.post("/recalculate-expiry")
async def recalculate_expiry(
session: CloudUser = Depends(get_session),
store: Store = Depends(get_store),
) -> dict:
"""Re-run the expiration predictor over all available inventory items.
Uses each item's stored purchase_date and current location. Safe to call
multiple times; the operation is idempotent per session.
"""
def _run(s: Store) -> tuple[int, int]:
return s.recalculate_expiry(tier=session.tier, has_byok=session.has_byok)
updated, skipped = await asyncio.to_thread(_run, store)
return {"updated": updated, "skipped": skipped}
@router.get("/stats", response_model=InventoryStats)
async def get_inventory_stats(store: Store = Depends(get_store)):
def _stats():
rows = store._fetch_all(
"""SELECT status, location, COUNT(*) as cnt
FROM inventory_items GROUP BY status, location"""
)
total = sum(r["cnt"] for r in rows)
available = sum(r["cnt"] for r in rows if r["status"] == "available")
expired = sum(r["cnt"] for r in rows if r["status"] == "expired")
expiring = len(store.expiring_soon(7))
locations = {}
for r in rows:
if r["status"] == "available":
locations[r["location"]] = locations.get(r["location"], 0) + r["cnt"]
return {
"total_items": total,
"available_items": available,
"expiring_soon": expiring,
"expired_items": expired,
"locations": locations,
}
return InventoryStats.model_validate(await asyncio.to_thread(_stats))

app/api/endpoints/ocr.py Normal file

@@ -0,0 +1,233 @@
"""OCR status, trigger, and approval endpoints."""
from __future__ import annotations
import asyncio
import json
import logging
from datetime import date
from pathlib import Path
from typing import Any
from fastapi import APIRouter, BackgroundTasks, Depends, HTTPException
from app.cloud_session import CloudUser, get_session
from app.core.config import settings
from app.db.session import get_store
from app.db.store import Store
from app.models.schemas.receipt import (
ApproveOCRRequest,
ApproveOCRResponse,
ApprovedInventoryItem,
)
from app.services.expiration_predictor import ExpirationPredictor
from app.tiers import can_use
from app.utils.units import normalize_to_metric
logger = logging.getLogger(__name__)
router = APIRouter()
# ── Status ────────────────────────────────────────────────────────────────────
@router.get("/{receipt_id}/ocr/status")
async def get_ocr_status(receipt_id: int, store: Store = Depends(get_store)):
receipt = await asyncio.to_thread(store.get_receipt, receipt_id)
if not receipt:
raise HTTPException(status_code=404, detail="Receipt not found")
rd = await asyncio.to_thread(
store._fetch_one,
"SELECT id, processing_time FROM receipt_data WHERE receipt_id = ?",
(receipt_id,),
)
return {
"receipt_id": receipt_id,
"status": receipt["status"],
"ocr_complete": rd is not None,
"ocr_enabled": settings.ENABLE_OCR,
}
# ── Trigger ───────────────────────────────────────────────────────────────────
@router.post("/{receipt_id}/ocr/trigger")
async def trigger_ocr(
receipt_id: int,
background_tasks: BackgroundTasks,
store: Store = Depends(get_store),
session: CloudUser = Depends(get_session),
):
"""Manually trigger OCR processing for an already-uploaded receipt."""
if not can_use("receipt_ocr", session.tier, session.has_byok):
raise HTTPException(
status_code=403,
detail="Receipt OCR requires Paid tier or a configured local LLM backend (BYOK).",
)
if not settings.ENABLE_OCR:
raise HTTPException(status_code=503, detail="OCR not enabled on this server.")
receipt = await asyncio.to_thread(store.get_receipt, receipt_id)
if not receipt:
raise HTTPException(status_code=404, detail="Receipt not found")
if receipt["status"] == "processing":
raise HTTPException(status_code=409, detail="OCR already in progress for this receipt.")
image_path = Path(receipt["original_path"])
if not image_path.exists():
raise HTTPException(status_code=404, detail="Image file not found on disk.")
async def _run() -> None:
try:
await asyncio.to_thread(store.update_receipt_status, receipt_id, "processing")
from app.services.receipt_service import ReceiptService
await ReceiptService(store).process(receipt_id, image_path)
except Exception as exc:
logger.exception("OCR pipeline failed for receipt %s", receipt_id)
await asyncio.to_thread(store.update_receipt_status, receipt_id, "error", str(exc))
background_tasks.add_task(_run)
return {"receipt_id": receipt_id, "status": "queued"}
# ── Data ──────────────────────────────────────────────────────────────────────
@router.get("/{receipt_id}/ocr/data")
async def get_ocr_data(receipt_id: int, store: Store = Depends(get_store)):
rd = await asyncio.to_thread(
store._fetch_one,
"SELECT * FROM receipt_data WHERE receipt_id = ?",
(receipt_id,),
)
if not rd:
raise HTTPException(status_code=404, detail="No OCR data for this receipt")
return rd
# ── Approve ───────────────────────────────────────────────────────────────────
@router.post("/{receipt_id}/ocr/approve", response_model=ApproveOCRResponse)
async def approve_ocr_items(
receipt_id: int,
body: ApproveOCRRequest,
store: Store = Depends(get_store),
session: CloudUser = Depends(get_session),
):
"""Commit reviewed OCR line items into inventory.
Reads items from receipt_data, optionally filtered by item_indices,
and creates inventory entries. Receipt status moves to 'processed'.
"""
receipt = await asyncio.to_thread(store.get_receipt, receipt_id)
if not receipt:
raise HTTPException(status_code=404, detail="Receipt not found")
if receipt["status"] not in ("staged", "processed"):
raise HTTPException(
status_code=409,
detail=f"Receipt is not staged for approval (status={receipt['status']}).",
)
rd = await asyncio.to_thread(
store._fetch_one,
"SELECT items, transaction_date FROM receipt_data WHERE receipt_id = ?",
(receipt_id,),
)
if not rd:
raise HTTPException(status_code=404, detail="No OCR data found for this receipt.")
raw_items: list[dict[str, Any]] = json.loads(rd["items"] or "[]")
if not raw_items:
raise HTTPException(status_code=422, detail="No items found in OCR data.")
# Filter to requested indices, or use all
if body.item_indices is not None:
invalid = [i for i in body.item_indices if i >= len(raw_items) or i < 0]
if invalid:
raise HTTPException(
status_code=422,
detail=f"Item indices out of range: {invalid} (receipt has {len(raw_items)} items)",
)
selected = [(i, raw_items[i]) for i in body.item_indices]
skipped = len(raw_items) - len(selected)
else:
selected = list(enumerate(raw_items))
skipped = 0
created = await asyncio.to_thread(
_commit_items, store, receipt_id, selected, body.location, rd.get("transaction_date")
)
await asyncio.to_thread(store.update_receipt_status, receipt_id, "processed")
return ApproveOCRResponse(
receipt_id=receipt_id,
approved=len(created),
skipped=skipped,
inventory_items=created,
)
def _commit_items(
store: Store,
receipt_id: int,
selected: list[tuple[int, dict[str, Any]]],
location: str,
transaction_date: str | None,
) -> list[ApprovedInventoryItem]:
"""Create product + inventory entries for approved OCR line items.
Runs synchronously inside asyncio.to_thread.
"""
predictor = ExpirationPredictor()
purchase_date: date | None = None
if transaction_date:
try:
purchase_date = date.fromisoformat(transaction_date)
except ValueError:
logger.warning("Could not parse transaction_date %r", transaction_date)
created: list[ApprovedInventoryItem] = []
for _idx, item in selected:
name = (item.get("name") or "").strip()
if not name:
logger.debug("Skipping nameless item at index %d", _idx)
continue
category = (item.get("category") or "").strip()
quantity = float(item.get("quantity") or 1.0)
raw_unit = (item.get("unit") or "each").strip()
metric_qty, base_unit = normalize_to_metric(quantity, raw_unit)
product, _ = store.get_or_create_product(
name,
category=category or None,
source="receipt_ocr",
)
exp = predictor.predict_expiration(
category, location,
purchase_date=purchase_date,
product_name=name,
)
inv = store.add_inventory_item(
product["id"],
location,
quantity=metric_qty,
unit=base_unit,
receipt_id=receipt_id,
purchase_date=str(purchase_date) if purchase_date else None,
expiration_date=str(exp) if exp else None,
source="receipt_ocr",
)
created.append(ApprovedInventoryItem(
inventory_id=inv["id"],
product_name=name,
quantity=quantity,
location=location,
expiration_date=str(exp) if exp else None,
))
return created


@@ -0,0 +1,110 @@
"""Receipt upload, OCR, and quality endpoints."""
from __future__ import annotations
import asyncio
import uuid
from pathlib import Path
from typing import List
import aiofiles
from fastapi import APIRouter, BackgroundTasks, Depends, File, HTTPException, UploadFile
from app.cloud_session import CloudUser, get_session
from app.core.config import settings
from app.db.session import get_store
from app.db.store import Store
from app.models.schemas.receipt import ReceiptResponse
from app.models.schemas.quality import QualityAssessment
from app.tiers import can_use
router = APIRouter()
async def _save_upload(file: UploadFile, dest_dir: Path) -> Path:
dest = dest_dir / f"{uuid.uuid4()}_{file.filename}"
async with aiofiles.open(dest, "wb") as f:
await f.write(await file.read())
return dest
@router.post("/", response_model=ReceiptResponse, status_code=201)
async def upload_receipt(
background_tasks: BackgroundTasks,
file: UploadFile = File(...),
store: Store = Depends(get_store),
session: CloudUser = Depends(get_session),
):
settings.ensure_dirs()
saved = await _save_upload(file, settings.UPLOAD_DIR)
receipt = await asyncio.to_thread(
store.create_receipt, file.filename, str(saved)
)
# Only queue OCR if the feature is enabled server-side AND the user's tier allows it.
# Check tier here, not inside the background task — once dispatched it can't be cancelled.
ocr_allowed = settings.ENABLE_OCR and can_use("receipt_ocr", session.tier, session.has_byok)
if ocr_allowed:
background_tasks.add_task(_process_receipt_ocr, receipt["id"], saved, store)
return ReceiptResponse.model_validate(receipt)
@router.post("/batch", response_model=List[ReceiptResponse], status_code=201)
async def upload_receipts_batch(
background_tasks: BackgroundTasks,
files: List[UploadFile] = File(...),
store: Store = Depends(get_store),
session: CloudUser = Depends(get_session),
):
settings.ensure_dirs()
ocr_allowed = settings.ENABLE_OCR and can_use("receipt_ocr", session.tier, session.has_byok)
results = []
for file in files:
saved = await _save_upload(file, settings.UPLOAD_DIR)
receipt = await asyncio.to_thread(
store.create_receipt, file.filename, str(saved)
)
if ocr_allowed:
background_tasks.add_task(_process_receipt_ocr, receipt["id"], saved, store)
results.append(ReceiptResponse.model_validate(receipt))
return results
@router.get("/{receipt_id}", response_model=ReceiptResponse)
async def get_receipt(receipt_id: int, store: Store = Depends(get_store)):
receipt = await asyncio.to_thread(store.get_receipt, receipt_id)
if not receipt:
raise HTTPException(status_code=404, detail="Receipt not found")
return ReceiptResponse.model_validate(receipt)
@router.get("/", response_model=List[ReceiptResponse])
async def list_receipts(
limit: int = 50, offset: int = 0, store: Store = Depends(get_store)
):
receipts = await asyncio.to_thread(store.list_receipts, limit, offset)
return [ReceiptResponse.model_validate(r) for r in receipts]
@router.get("/{receipt_id}/quality", response_model=QualityAssessment)
async def get_receipt_quality(receipt_id: int, store: Store = Depends(get_store)):
qa = await asyncio.to_thread(
store._fetch_one,
"SELECT * FROM quality_assessments WHERE receipt_id = ?",
(receipt_id,),
)
if not qa:
raise HTTPException(status_code=404, detail="Quality assessment not found")
return QualityAssessment.model_validate(qa)
async def _process_receipt_ocr(receipt_id: int, image_path: Path, store: Store) -> None:
"""Background task: run OCR pipeline on an uploaded receipt."""
try:
await asyncio.to_thread(store.update_receipt_status, receipt_id, "processing")
from app.services.receipt_service import ReceiptService
service = ReceiptService(store)
await service.process(receipt_id, image_path)
except Exception as exc:
await asyncio.to_thread(
store.update_receipt_status, receipt_id, "error", str(exc)
)
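The endpoints above consistently push blocking `Store` calls through `asyncio.to_thread` so the event loop stays responsive. A minimal, self-contained sketch of that offloading pattern (`blocking_db_call` is a hypothetical stand-in, not part of the app):

```python
import asyncio
import time

def blocking_db_call(receipt_id: int) -> dict:
    # Hypothetical stand-in for a synchronous Store method (e.g. get_receipt).
    time.sleep(0.01)  # simulate SQLite I/O
    return {"id": receipt_id, "status": "processed"}

async def handler(receipt_id: int) -> dict:
    # Offload the blocking call to a worker thread so the event loop
    # keeps serving other requests while SQLite does its work.
    return await asyncio.to_thread(blocking_db_call, receipt_id)

result = asyncio.run(handler(7))
```

`asyncio.to_thread` (Python 3.9+) runs the callable in the default thread pool and awaits its result, which is exactly the shape the upload, get, and list handlers use.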

@@ -0,0 +1,67 @@
@@ -0,0 +1,67 @@
"""Recipe suggestion endpoints."""
from __future__ import annotations
import asyncio
from pathlib import Path
from fastapi import APIRouter, Depends, HTTPException
from app.cloud_session import CloudUser, get_session
from app.db.store import Store
from app.models.schemas.recipe import RecipeRequest, RecipeResult
from app.services.recipe.recipe_engine import RecipeEngine
from app.tiers import can_use
router = APIRouter()
def _suggest_in_thread(db_path: Path, req: RecipeRequest) -> RecipeResult:
"""Run recipe suggestion in a worker thread with its own Store connection.
SQLite connections cannot be shared across threads. This function creates
a fresh Store (and therefore a fresh sqlite3.Connection) in the same thread
where it will be used, avoiding ProgrammingError: SQLite objects created in
a thread can only be used in that same thread.
"""
store = Store(db_path)
try:
return RecipeEngine(store).suggest(req)
finally:
store.close()
@router.post("/suggest", response_model=RecipeResult)
async def suggest_recipes(
req: RecipeRequest,
session: CloudUser = Depends(get_session),
) -> RecipeResult:
# Inject session-authoritative tier/byok immediately — client-supplied values are ignored.
req = req.model_copy(update={"tier": session.tier, "has_byok": session.has_byok})
if req.level == 4 and not req.wildcard_confirmed:
raise HTTPException(
status_code=400,
detail="Level 4 (Wildcard) requires wildcard_confirmed=true.",
)
if req.level in (3, 4) and not can_use("recipe_suggestions", req.tier, req.has_byok):
raise HTTPException(
status_code=403,
detail="LLM recipe levels require Paid tier or a configured LLM backend.",
)
if req.style_id and not can_use("style_picker", req.tier):
raise HTTPException(status_code=403, detail="Style picker requires Paid tier.")
return await asyncio.to_thread(_suggest_in_thread, session.db, req)
@router.get("/{recipe_id}")
async def get_recipe(recipe_id: int, session: CloudUser = Depends(get_session)) -> dict:
def _get(db_path: Path, rid: int) -> dict | None:
store = Store(db_path)
try:
return store.get_recipe(rid)
finally:
store.close()
recipe = await asyncio.to_thread(_get, session.db, recipe_id)
if not recipe:
raise HTTPException(status_code=404, detail="Recipe not found.")
return recipe
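The thread-affinity constraint described in `_suggest_in_thread`'s docstring is easy to reproduce with the stdlib alone. A small demonstration (in-memory DB, illustrative helper names): a connection created and used inside one worker thread works, while sharing a connection across threads raises `sqlite3.ProgrammingError` by default (`check_same_thread=True`):

```python
import sqlite3
import threading

def query_in_own_thread(path: str, out: list) -> None:
    # Connection created *and* used in this thread: allowed.
    conn = sqlite3.connect(path)
    try:
        out.append(conn.execute("SELECT 1").fetchone()[0])
    finally:
        conn.close()

results: list = []
t = threading.Thread(target=query_in_own_thread, args=(":memory:", results))
t.start(); t.join()

# Reusing a connection created in the main thread fails by default:
shared = sqlite3.connect(":memory:")
errors: list = []

def misuse() -> None:
    try:
        shared.execute("SELECT 1")
    except sqlite3.ProgrammingError as exc:
        errors.append(exc)

t2 = threading.Thread(target=misuse)
t2.start(); t2.join()
shared.close()
```

This is why the route builds a fresh `Store` inside the function handed to `asyncio.to_thread` rather than reusing a request-scoped one.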

@@ -0,0 +1,46 @@
"""User settings endpoints."""
from __future__ import annotations
from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel
from app.cloud_session import CloudUser, get_session
from app.db.session import get_store
from app.db.store import Store
router = APIRouter()
_ALLOWED_KEYS = frozenset({"cooking_equipment"})
class SettingBody(BaseModel):
value: str
@router.get("/{key}")
async def get_setting(
key: str,
session: CloudUser = Depends(get_session),
store: Store = Depends(get_store),
) -> dict:
"""Return the stored value for a settings key."""
if key not in _ALLOWED_KEYS:
raise HTTPException(status_code=422, detail=f"Unknown settings key: '{key}'.")
value = store.get_setting(key)
if value is None:
raise HTTPException(status_code=404, detail=f"Setting '{key}' not found.")
return {"key": key, "value": value}
@router.put("/{key}")
async def set_setting(
key: str,
body: SettingBody,
session: CloudUser = Depends(get_session),
store: Store = Depends(get_store),
) -> dict:
"""Upsert a settings key-value pair."""
if key not in _ALLOWED_KEYS:
raise HTTPException(status_code=422, detail=f"Unknown settings key: '{key}'.")
store.set_setting(key, body.value)
return {"key": key, "value": body.value}

@@ -0,0 +1,42 @@
"""Staple library endpoints."""
from __future__ import annotations
from fastapi import APIRouter, HTTPException
from app.services.recipe.staple_library import StapleLibrary
router = APIRouter()
_lib = StapleLibrary()
@router.get("/")
async def list_staples(dietary: str | None = None) -> list[dict]:
staples = _lib.filter_by_dietary(dietary) if dietary else _lib.list_all()
return [
{
"slug": s.slug,
"name": s.name,
"description": s.description,
"dietary_labels": s.dietary_labels,
"yield_formats": list(s.yield_formats.keys()),
}
for s in staples
]
@router.get("/{slug}")
async def get_staple(slug: str) -> dict:
staple = _lib.get(slug)
if not staple:
raise HTTPException(status_code=404, detail=f"Staple '{slug}' not found.")
return {
"slug": staple.slug,
"name": staple.name,
"description": staple.description,
"dietary_labels": staple.dietary_labels,
"base_ingredients": staple.base_ingredients,
"base_method": staple.base_method,
"base_time_minutes": staple.base_time_minutes,
"yield_formats": staple.yield_formats,
"compatible_styles": staple.compatible_styles,
}

app/api/routes.py Normal file
@@ -0,0 +1,13 @@
from fastapi import APIRouter
from app.api.endpoints import health, receipts, export, inventory, ocr, recipes, settings, staples
api_router = APIRouter()
api_router.include_router(health.router, prefix="/health", tags=["health"])
api_router.include_router(receipts.router, prefix="/receipts", tags=["receipts"])
api_router.include_router(ocr.router, prefix="/receipts", tags=["ocr"])
api_router.include_router(export.router, tags=["export"])
api_router.include_router(inventory.router, prefix="/inventory", tags=["inventory"])
api_router.include_router(recipes.router, prefix="/recipes", tags=["recipes"])
api_router.include_router(settings.router, prefix="/settings", tags=["settings"])
api_router.include_router(staples.router, prefix="/staples", tags=["staples"])

app/cloud_session.py Normal file
@@ -0,0 +1,249 @@
"""Cloud session resolution for Kiwi FastAPI.
Local mode (CLOUD_MODE unset/false): returns a local CloudUser with no auth
checks, full tier access, and DB path pointing to settings.DB_PATH.
Cloud mode (CLOUD_MODE=true): validates the cf_session JWT injected by Caddy
as X-CF-Session, resolves user_id, auto-provisions a free Heimdall license on
first visit, fetches the tier, and returns a per-user DB path.
FastAPI usage:
@app.get("/api/v1/inventory/items")
def list_items(session: CloudUser = Depends(get_session)):
store = Store(session.db)
...
"""
from __future__ import annotations
import logging
import os
import re
import time
from dataclasses import dataclass
from pathlib import Path
import jwt as pyjwt
import requests
import yaml
from fastapi import Depends, HTTPException, Request
log = logging.getLogger(__name__)
# ── Config ────────────────────────────────────────────────────────────────────
CLOUD_MODE: bool = os.environ.get("CLOUD_MODE", "").lower() in ("1", "true", "yes")
CLOUD_DATA_ROOT: Path = Path(os.environ.get("CLOUD_DATA_ROOT", "/devl/kiwi-cloud-data"))
DIRECTUS_JWT_SECRET: str = os.environ.get("DIRECTUS_JWT_SECRET", "")
HEIMDALL_URL: str = os.environ.get("HEIMDALL_URL", "https://license.circuitforge.tech")
HEIMDALL_ADMIN_TOKEN: str = os.environ.get("HEIMDALL_ADMIN_TOKEN", "")
# Dev bypass: comma-separated IPs or CIDR ranges that skip JWT auth.
# NEVER set this in production. Intended only for LAN developer testing when
# the request doesn't pass through Caddy (which normally injects X-CF-Session).
# Example: CLOUD_AUTH_BYPASS_IPS=10.1.10.0/24,127.0.0.1
import ipaddress as _ipaddress
_BYPASS_RAW: list[str] = [
e.strip()
for e in os.environ.get("CLOUD_AUTH_BYPASS_IPS", "").split(",")
if e.strip()
]
_BYPASS_NETS: list[_ipaddress.IPv4Network | _ipaddress.IPv6Network] = []
_BYPASS_IPS: frozenset[str] = frozenset()
if _BYPASS_RAW:
_nets, _ips = [], set()
for entry in _BYPASS_RAW:
try:
_nets.append(_ipaddress.ip_network(entry, strict=False))
except ValueError:
_ips.add(entry) # treat non-parseable entries as bare IPs
_BYPASS_NETS = _nets
_BYPASS_IPS = frozenset(_ips)
def _is_bypass_ip(ip: str) -> bool:
if not ip:
return False
if ip in _BYPASS_IPS:
return True
try:
addr = _ipaddress.ip_address(ip)
return any(addr in net for net in _BYPASS_NETS)
except ValueError:
return False
_LOCAL_KIWI_DB: Path = Path(os.environ.get("KIWI_DB", "data/kiwi.db"))
_TIER_CACHE: dict[str, tuple[str, float]] = {}
_TIER_CACHE_TTL = 300 # 5 minutes
TIERS = ["free", "paid", "premium", "ultra"]
# ── Domain ────────────────────────────────────────────────────────────────────
@dataclass(frozen=True)
class CloudUser:
user_id: str # Directus UUID, or "local"
tier: str # free | paid | premium | ultra | local
db: Path # per-user SQLite DB path
has_byok: bool # True if a configured LLM backend is present in llm.yaml
# ── JWT validation ─────────────────────────────────────────────────────────────
def _extract_session_token(header_value: str) -> str:
m = re.search(r'(?:^|;)\s*cf_session=([^;]+)', header_value)
return m.group(1).strip() if m else header_value.strip()
def validate_session_jwt(token: str) -> str:
"""Validate cf_session JWT and return the Directus user_id."""
try:
payload = pyjwt.decode(
token,
DIRECTUS_JWT_SECRET,
algorithms=["HS256"],
options={"require": ["id", "exp"]},
)
return payload["id"]
except Exception as exc:
log.debug("JWT validation failed: %s", exc)
raise HTTPException(status_code=401, detail="Session invalid or expired")
# ── Heimdall integration ──────────────────────────────────────────────────────
def _ensure_provisioned(user_id: str) -> None:
if not HEIMDALL_ADMIN_TOKEN:
return
try:
requests.post(
f"{HEIMDALL_URL}/admin/provision",
json={"directus_user_id": user_id, "product": "kiwi", "tier": "free"},
headers={"Authorization": f"Bearer {HEIMDALL_ADMIN_TOKEN}"},
timeout=5,
)
except Exception as exc:
log.warning("Heimdall provision failed for user %s: %s", user_id, exc)
def _fetch_cloud_tier(user_id: str) -> str:
now = time.monotonic()
cached = _TIER_CACHE.get(user_id)
if cached and (now - cached[1]) < _TIER_CACHE_TTL:
return cached[0]
if not HEIMDALL_ADMIN_TOKEN:
return "free"
try:
resp = requests.post(
f"{HEIMDALL_URL}/admin/cloud/resolve",
json={"directus_user_id": user_id, "product": "kiwi"},
headers={"Authorization": f"Bearer {HEIMDALL_ADMIN_TOKEN}"},
timeout=5,
)
tier = resp.json().get("tier", "free") if resp.ok else "free"
except Exception as exc:
log.warning("Heimdall tier resolve failed for user %s: %s", user_id, exc)
tier = "free"
_TIER_CACHE[user_id] = (tier, now)
return tier
def _user_db_path(user_id: str) -> Path:
path = CLOUD_DATA_ROOT / user_id / "kiwi.db"
path.parent.mkdir(parents=True, exist_ok=True)
return path
# ── BYOK detection ────────────────────────────────────────────────────────────
_LLM_CONFIG_PATH = Path.home() / ".config" / "circuitforge" / "llm.yaml"
def _detect_byok(config_path: Path = _LLM_CONFIG_PATH) -> bool:
"""Return True if at least one enabled non-vision LLM backend is configured.
Reads the same llm.yaml that LLMRouter uses. Local (Ollama, vLLM) and
API-key backends both count: the policy is "user is supplying compute",
regardless of where that compute lives.
"""
try:
with open(config_path) as f:
cfg = yaml.safe_load(f) or {}
return any(
b.get("enabled", True) and b.get("type") != "vision_service"
for b in cfg.get("backends", {}).values()
)
except Exception:
return False
# ── FastAPI dependency ────────────────────────────────────────────────────────
def get_session(request: Request) -> CloudUser:
"""FastAPI dependency — resolves the current user from the request.
Local mode: fully-privileged "local" user pointing at local DB.
Cloud mode: validates X-CF-Session JWT, provisions license, resolves tier.
Dev bypass: if CLOUD_AUTH_BYPASS_IPS is set and the client IP matches,
returns a "local" session without JWT validation (dev/LAN use only).
"""
has_byok = _detect_byok()
if not CLOUD_MODE:
return CloudUser(user_id="local", tier="local", db=_LOCAL_KIWI_DB, has_byok=has_byok)
# Prefer X-Real-IP (set by nginx from the actual client address) over the
# TCP peer address (which is nginx's container IP when behind the proxy).
client_ip = (
request.headers.get("x-real-ip", "")
or (request.client.host if request.client else "")
)
if (_BYPASS_IPS or _BYPASS_NETS) and _is_bypass_ip(client_ip):
log.debug("CLOUD_AUTH_BYPASS_IPS match for %s — returning local session", client_ip)
# Use a dev DB under CLOUD_DATA_ROOT so the container has a writable path.
dev_db = _user_db_path("local-dev")
return CloudUser(user_id="local-dev", tier="local", db=dev_db, has_byok=has_byok)
raw_header = (
request.headers.get("x-cf-session", "")
or request.headers.get("cookie", "")
)
if not raw_header:
raise HTTPException(status_code=401, detail="Not authenticated")
token = _extract_session_token(raw_header)
if not token:
raise HTTPException(status_code=401, detail="Not authenticated")
user_id = validate_session_jwt(token)
_ensure_provisioned(user_id)
tier = _fetch_cloud_tier(user_id)
return CloudUser(user_id=user_id, tier=tier, db=_user_db_path(user_id), has_byok=has_byok)
def require_tier(min_tier: str):
"""Dependency factory — raises 403 if tier is below min_tier."""
min_idx = TIERS.index(min_tier)
def _check(session: CloudUser = Depends(get_session)) -> CloudUser:
if session.tier == "local":
return session
try:
if TIERS.index(session.tier) < min_idx:
raise HTTPException(
status_code=403,
detail=f"This feature requires {min_tier} tier or above.",
)
except ValueError:
raise HTTPException(status_code=403, detail="Unknown tier.")
return session
return _check
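The `CLOUD_AUTH_BYPASS_IPS` parsing above splits entries into CIDR networks and bare IPs, then matches the client address against both. A standalone sketch of that matching logic (function names are illustrative):

```python
import ipaddress

def parse_bypass(raw: str):
    """Split a comma-separated env value into CIDR networks and bare IPs."""
    nets, ips = [], set()
    for entry in (e.strip() for e in raw.split(",") if e.strip()):
        try:
            nets.append(ipaddress.ip_network(entry, strict=False))
        except ValueError:
            ips.add(entry)  # non-parseable entries kept as literal strings
    return nets, frozenset(ips)

def is_bypass(ip: str, nets, ips) -> bool:
    if not ip:
        return False
    if ip in ips:
        return True
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False
    return any(addr in net for net in nets)

# Mirrors the example from .env.example (minus the Docker bridge range).
nets, ips = parse_bypass("10.1.10.0/24,127.0.0.1")
```

Note that a bare IP like `127.0.0.1` also parses as a `/32` network via `ip_network`, so it matches through the network path as well as the literal set.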

app/core/__init__.py Normal file
@@ -0,0 +1,5 @@
# app/core/__init__.py
"""
Core components for Kiwi.
Contains configuration, dependencies, and other core functionality.
"""

app/core/config.py Normal file
@@ -0,0 +1,62 @@
"""
Kiwi application config.
Uses circuitforge-core for env loading; no pydantic-settings dependency.
"""
from __future__ import annotations
import os
from pathlib import Path
from circuitforge_core.config.settings import load_env
# Load .env from the repo root (two levels up from app/core/)
_ROOT = Path(__file__).resolve().parents[2]
load_env(_ROOT / ".env")
class Settings:
# API
API_PREFIX: str = os.environ.get("API_PREFIX", "/api/v1")
PROJECT_NAME: str = "Kiwi — Pantry Intelligence"
# CORS
CORS_ORIGINS: list[str] = [
o.strip()
for o in os.environ.get("CORS_ORIGINS", "").split(",")
if o.strip()
]
# File storage
DATA_DIR: Path = Path(os.environ.get("DATA_DIR", str(_ROOT / "data")))
UPLOAD_DIR: Path = DATA_DIR / "uploads"
PROCESSING_DIR: Path = DATA_DIR / "processing"
ARCHIVE_DIR: Path = DATA_DIR / "archive"
# Database
DB_PATH: Path = Path(os.environ.get("DB_PATH", str(DATA_DIR / "kiwi.db")))
# Processing
MAX_CONCURRENT_JOBS: int = int(os.environ.get("MAX_CONCURRENT_JOBS", "4"))
USE_GPU: bool = os.environ.get("USE_GPU", "true").lower() in ("1", "true", "yes")
GPU_MEMORY_LIMIT: int = int(os.environ.get("GPU_MEMORY_LIMIT", "6144"))
# Quality
MIN_QUALITY_SCORE: float = float(os.environ.get("MIN_QUALITY_SCORE", "50.0"))
# CF-core resource coordinator (VRAM lease management)
COORDINATOR_URL: str = os.environ.get("COORDINATOR_URL", "http://localhost:7700")
# Feature flags
ENABLE_OCR: bool = os.environ.get("ENABLE_OCR", "false").lower() in ("1", "true", "yes")
# Runtime
DEBUG: bool = os.environ.get("DEBUG", "false").lower() in ("1", "true", "yes")
CLOUD_MODE: bool = os.environ.get("CLOUD_MODE", "false").lower() in ("1", "true", "yes")
DEMO_MODE: bool = os.environ.get("DEMO_MODE", "false").lower() in ("1", "true", "yes")
def ensure_dirs(self) -> None:
for d in (self.UPLOAD_DIR, self.PROCESSING_DIR, self.ARCHIVE_DIR):
d.mkdir(parents=True, exist_ok=True)
settings = Settings()
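The `Settings` class relies on two small env-parsing conventions: truthy strings (`"1"`/`"true"`/`"yes"`, case-insensitive) and comma-separated lists with blanks stripped. A minimal sketch of both helpers (`DEMO_FLAG`/`DEMO_ORIGINS` are made-up variable names for the demonstration):

```python
import os

def env_bool(name: str, default: str = "false") -> bool:
    # Same truthiness rule as USE_GPU / ENABLE_OCR / DEBUG above.
    return os.environ.get(name, default).lower() in ("1", "true", "yes")

def env_list(name: str) -> list[str]:
    # Same split-strip-filter rule as CORS_ORIGINS above.
    return [o.strip() for o in os.environ.get(name, "").split(",") if o.strip()]

os.environ["DEMO_FLAG"] = "Yes"
os.environ["DEMO_ORIGINS"] = "http://a, http://b ,"
```

An unset variable falls back to the default (`false` for flags, an empty list for lists), which matches how `.env.example` leaves most keys commented out.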

app/db/__init__.py Normal file
@@ -0,0 +1 @@
# DB package — use app.db.store.Store for all database access

app/db/base.py Normal file
@@ -0,0 +1 @@
# Replaced by app.db.store — SQLAlchemy removed in favour of CF-core SQLite stack

@@ -0,0 +1,32 @@
-- Migration 001: receipts + quality assessments (ported from Alembic f31d9044277e)
CREATE TABLE IF NOT EXISTS receipts (
id INTEGER PRIMARY KEY AUTOINCREMENT,
filename TEXT NOT NULL,
original_path TEXT NOT NULL,
processed_path TEXT,
status TEXT NOT NULL DEFAULT 'uploaded'
CHECK (status IN ('uploaded', 'processing', 'processed', 'error')),
error TEXT,
metadata TEXT NOT NULL DEFAULT '{}',
created_at TEXT NOT NULL DEFAULT (datetime('now')),
updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_receipts_status ON receipts (status);
CREATE INDEX IF NOT EXISTS idx_receipts_created_at ON receipts (created_at DESC);
CREATE TABLE IF NOT EXISTS quality_assessments (
id INTEGER PRIMARY KEY AUTOINCREMENT,
receipt_id INTEGER NOT NULL UNIQUE
REFERENCES receipts (id) ON DELETE CASCADE,
overall_score REAL NOT NULL CHECK (overall_score >= 0 AND overall_score <= 100),
is_acceptable INTEGER NOT NULL DEFAULT 0,
metrics TEXT NOT NULL DEFAULT '{}',
improvement_suggestions TEXT NOT NULL DEFAULT '[]',
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_quality_receipt_id ON quality_assessments (receipt_id);
CREATE INDEX IF NOT EXISTS idx_quality_score ON quality_assessments (overall_score);
CREATE INDEX IF NOT EXISTS idx_quality_acceptable ON quality_assessments (is_acceptable);

@@ -0,0 +1,53 @@
-- Migration 002: products + inventory items (ported from Alembic 8fc1bf4e7a91)
CREATE TABLE IF NOT EXISTS products (
id INTEGER PRIMARY KEY AUTOINCREMENT,
barcode TEXT UNIQUE,
name TEXT NOT NULL,
brand TEXT,
category TEXT,
description TEXT,
image_url TEXT,
nutrition_data TEXT NOT NULL DEFAULT '{}',
source TEXT NOT NULL DEFAULT 'manual'
CHECK (source IN ('openfoodfacts', 'manual', 'receipt_ocr')),
source_data TEXT NOT NULL DEFAULT '{}',
created_at TEXT NOT NULL DEFAULT (datetime('now')),
updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_products_barcode ON products (barcode);
CREATE INDEX IF NOT EXISTS idx_products_name ON products (name);
CREATE INDEX IF NOT EXISTS idx_products_category ON products (category);
CREATE INDEX IF NOT EXISTS idx_products_source ON products (source);
CREATE TABLE IF NOT EXISTS inventory_items (
id INTEGER PRIMARY KEY AUTOINCREMENT,
product_id INTEGER NOT NULL
REFERENCES products (id) ON DELETE RESTRICT,
receipt_id INTEGER
REFERENCES receipts (id) ON DELETE SET NULL,
quantity REAL NOT NULL DEFAULT 1 CHECK (quantity > 0),
unit TEXT NOT NULL DEFAULT 'count',
location TEXT NOT NULL,
sublocation TEXT,
purchase_date TEXT,
expiration_date TEXT,
status TEXT NOT NULL DEFAULT 'available'
CHECK (status IN ('available', 'consumed', 'expired', 'discarded')),
consumed_at TEXT,
notes TEXT,
source TEXT NOT NULL DEFAULT 'manual'
CHECK (source IN ('barcode_scan', 'manual', 'receipt')),
created_at TEXT NOT NULL DEFAULT (datetime('now')),
updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_inventory_product ON inventory_items (product_id);
CREATE INDEX IF NOT EXISTS idx_inventory_receipt ON inventory_items (receipt_id);
CREATE INDEX IF NOT EXISTS idx_inventory_status ON inventory_items (status);
CREATE INDEX IF NOT EXISTS idx_inventory_location ON inventory_items (location);
CREATE INDEX IF NOT EXISTS idx_inventory_expiration ON inventory_items (expiration_date);
CREATE INDEX IF NOT EXISTS idx_inventory_created ON inventory_items (created_at DESC);
CREATE INDEX IF NOT EXISTS idx_inventory_active_loc ON inventory_items (status, location)
WHERE status = 'available';

@@ -0,0 +1,38 @@
-- Migration 003: OCR receipt data table (ported from Alembic 54cddaf4f4e2)
CREATE TABLE IF NOT EXISTS receipt_data (
id INTEGER PRIMARY KEY AUTOINCREMENT,
receipt_id INTEGER NOT NULL UNIQUE
REFERENCES receipts (id) ON DELETE CASCADE,
merchant_name TEXT,
merchant_address TEXT,
merchant_phone TEXT,
merchant_email TEXT,
merchant_website TEXT,
merchant_tax_id TEXT,
transaction_date TEXT,
transaction_time TEXT,
receipt_number TEXT,
register_number TEXT,
cashier_name TEXT,
transaction_id TEXT,
items TEXT NOT NULL DEFAULT '[]',
subtotal REAL,
tax REAL,
discount REAL,
tip REAL,
total REAL,
payment_method TEXT,
amount_paid REAL,
change_given REAL,
raw_text TEXT,
confidence_scores TEXT NOT NULL DEFAULT '{}',
warnings TEXT NOT NULL DEFAULT '[]',
processing_time REAL,
created_at TEXT NOT NULL DEFAULT (datetime('now')),
updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_receipt_data_receipt_id ON receipt_data (receipt_id);
CREATE INDEX IF NOT EXISTS idx_receipt_data_merchant ON receipt_data (merchant_name);
CREATE INDEX IF NOT EXISTS idx_receipt_data_date ON receipt_data (transaction_date);

@@ -0,0 +1,23 @@
-- Migration 004: tags + product_tags join table (ported from Alembic 14f688cde2ca)
CREATE TABLE IF NOT EXISTS tags (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL UNIQUE,
slug TEXT NOT NULL UNIQUE,
description TEXT,
color TEXT,
category TEXT CHECK (category IN ('food_type', 'dietary', 'allergen', 'custom') OR category IS NULL),
created_at TEXT NOT NULL DEFAULT (datetime('now')),
updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_tags_name ON tags (name);
CREATE INDEX IF NOT EXISTS idx_tags_slug ON tags (slug);
CREATE INDEX IF NOT EXISTS idx_tags_category ON tags (category);
CREATE TABLE IF NOT EXISTS product_tags (
product_id INTEGER NOT NULL REFERENCES products (id) ON DELETE CASCADE,
tag_id INTEGER NOT NULL REFERENCES tags (id) ON DELETE CASCADE,
created_at TEXT NOT NULL DEFAULT (datetime('now')),
PRIMARY KEY (product_id, tag_id)
);

@@ -0,0 +1,37 @@
-- Migration 005: Add 'staged' and 'low_quality' to receipts status constraint.
--
-- SQLite does not support ALTER TABLE to modify CHECK constraints.
-- Pattern: create new table → copy data → drop old → rename.
PRAGMA foreign_keys = OFF;
CREATE TABLE receipts_new (
id INTEGER PRIMARY KEY AUTOINCREMENT,
filename TEXT NOT NULL,
original_path TEXT NOT NULL,
processed_path TEXT,
status TEXT NOT NULL DEFAULT 'uploaded'
CHECK (status IN (
'uploaded',
'processing',
'processed',
'staged',
'low_quality',
'error'
)),
error TEXT,
metadata TEXT NOT NULL DEFAULT '{}',
created_at TEXT NOT NULL DEFAULT (datetime('now')),
updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
INSERT INTO receipts_new SELECT * FROM receipts;
DROP TABLE receipts;
ALTER TABLE receipts_new RENAME TO receipts;
CREATE INDEX IF NOT EXISTS idx_receipts_status ON receipts (status);
CREATE INDEX IF NOT EXISTS idx_receipts_created_at ON receipts (created_at DESC);
PRAGMA foreign_keys = ON;
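The create → copy → drop → rename pattern in migration 005 can be exercised end-to-end with the stdlib `sqlite3` module. A reduced sketch (two-column table instead of the full receipts schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original table with the narrow CHECK constraint.
conn.executescript("""
    CREATE TABLE receipts (
        id INTEGER PRIMARY KEY,
        status TEXT NOT NULL DEFAULT 'uploaded'
            CHECK (status IN ('uploaded', 'processed'))
    );
    INSERT INTO receipts (status) VALUES ('uploaded'), ('processed');
""")

# SQLite cannot ALTER a CHECK constraint, so rebuild the table:
conn.executescript("""
    PRAGMA foreign_keys = OFF;
    CREATE TABLE receipts_new (
        id INTEGER PRIMARY KEY,
        status TEXT NOT NULL DEFAULT 'uploaded'
            CHECK (status IN ('uploaded', 'processed', 'staged'))
    );
    INSERT INTO receipts_new SELECT * FROM receipts;
    DROP TABLE receipts;
    ALTER TABLE receipts_new RENAME TO receipts;
    PRAGMA foreign_keys = ON;
""")

# The widened constraint now accepts the new status value.
conn.execute("INSERT INTO receipts (status) VALUES ('staged')")
rows = conn.execute("SELECT COUNT(*) FROM receipts").fetchone()[0]
```

Disabling `foreign_keys` during the swap matters in the real migration because `quality_assessments` and other tables reference `receipts(id)`; the rename would otherwise trip referential checks mid-rebuild.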

@@ -0,0 +1,48 @@
-- Migration 006: Ingredient element profiles + FlavorGraph molecule index.
CREATE TABLE ingredient_profiles (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
name_variants TEXT NOT NULL DEFAULT '[]', -- JSON array of aliases/alternate spellings
elements TEXT NOT NULL DEFAULT '[]', -- JSON array: ["Richness","Depth"]
-- Functional submetadata (from USDA FDC)
fat_pct REAL DEFAULT 0.0,
fat_saturated_pct REAL DEFAULT 0.0,
moisture_pct REAL DEFAULT 0.0,
protein_pct REAL DEFAULT 0.0,
starch_pct REAL DEFAULT 0.0,
binding_score INTEGER DEFAULT 0 CHECK (binding_score BETWEEN 0 AND 3),
glutamate_mg REAL DEFAULT 0.0,
ph_estimate REAL,
sodium_mg_per_100g REAL DEFAULT 0.0,
smoke_point_c REAL,
is_fermented INTEGER NOT NULL DEFAULT 0,
is_emulsifier INTEGER NOT NULL DEFAULT 0,
-- Aroma submetadata
flavor_molecule_ids TEXT NOT NULL DEFAULT '[]', -- JSON array of FlavorGraph compound IDs
heat_stable INTEGER NOT NULL DEFAULT 1,
add_timing TEXT NOT NULL DEFAULT 'any'
CHECK (add_timing IN ('early','finish','any')),
-- Brightness submetadata
    acid_type TEXT CHECK (acid_type IN ('citric','acetic','lactic') OR acid_type IS NULL),
-- Texture submetadata
texture_profile TEXT NOT NULL DEFAULT 'neutral',
water_activity REAL,
-- Source
usda_fdc_id TEXT,
source TEXT NOT NULL DEFAULT 'usda',
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE UNIQUE INDEX idx_ingredient_profiles_name ON ingredient_profiles (name);
CREATE INDEX idx_ingredient_profiles_elements ON ingredient_profiles (elements);
CREATE TABLE flavor_molecules (
id INTEGER PRIMARY KEY AUTOINCREMENT,
compound_id TEXT NOT NULL UNIQUE, -- FlavorGraph node ID
compound_name TEXT NOT NULL,
ingredient_names TEXT NOT NULL DEFAULT '[]', -- JSON array of ingredient names
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_flavor_molecules_compound_id ON flavor_molecules (compound_id);

@@ -0,0 +1,24 @@
-- Migration 007: Recipe corpus index (food.com dataset).
CREATE TABLE recipes (
id INTEGER PRIMARY KEY AUTOINCREMENT,
external_id TEXT,
title TEXT NOT NULL,
ingredients TEXT NOT NULL DEFAULT '[]', -- JSON array of raw ingredient strings
ingredient_names TEXT NOT NULL DEFAULT '[]', -- JSON array of normalized names
directions TEXT NOT NULL DEFAULT '[]', -- JSON array of step strings
category TEXT,
keywords TEXT NOT NULL DEFAULT '[]', -- JSON array
calories REAL,
fat_g REAL,
protein_g REAL,
sodium_mg REAL,
-- Element coverage scores computed at import time
element_coverage TEXT NOT NULL DEFAULT '{}', -- JSON {element: 0.0-1.0}
source TEXT NOT NULL DEFAULT 'foodcom',
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_recipes_title ON recipes (title);
CREATE INDEX idx_recipes_category ON recipes (category);
CREATE UNIQUE INDEX idx_recipes_external_id ON recipes (external_id);

@@ -0,0 +1,22 @@
-- Migration 008: Derived substitution pairs.
-- Source: diff of lishuyang/recipepairs (GPL-3.0 derivation — raw data not shipped).
CREATE TABLE substitution_pairs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
original_name TEXT NOT NULL,
substitute_name TEXT NOT NULL,
constraint_label TEXT NOT NULL, -- 'vegan'|'vegetarian'|'dairy_free'|'gluten_free'|'low_fat'|'low_sodium'
fat_delta REAL DEFAULT 0.0,
moisture_delta REAL DEFAULT 0.0,
glutamate_delta REAL DEFAULT 0.0,
protein_delta REAL DEFAULT 0.0,
occurrence_count INTEGER DEFAULT 1,
compensation_hints TEXT NOT NULL DEFAULT '[]', -- JSON [{ingredient, reason, element}]
source TEXT NOT NULL DEFAULT 'derived',
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_substitution_pairs_original ON substitution_pairs (original_name);
CREATE INDEX idx_substitution_pairs_constraint ON substitution_pairs (constraint_label);
CREATE UNIQUE INDEX idx_substitution_pairs_pair
ON substitution_pairs (original_name, substitute_name, constraint_label);

@@ -0,0 +1,27 @@
-- Migration 009: Staple library (bulk-preparable base components).
CREATE TABLE staples (
id INTEGER PRIMARY KEY AUTOINCREMENT,
slug TEXT NOT NULL UNIQUE,
name TEXT NOT NULL,
description TEXT,
base_ingredients TEXT NOT NULL DEFAULT '[]', -- JSON array of ingredient strings
base_method TEXT,
base_time_minutes INTEGER,
yield_formats TEXT NOT NULL DEFAULT '{}', -- JSON {format_name: {elements, shelf_days, methods, texture}}
dietary_labels TEXT NOT NULL DEFAULT '[]', -- JSON ['vegan','high-protein']
compatible_styles TEXT NOT NULL DEFAULT '[]', -- JSON [style_id]
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE TABLE user_staples (
id INTEGER PRIMARY KEY AUTOINCREMENT,
staple_slug TEXT NOT NULL REFERENCES staples(slug) ON DELETE CASCADE,
active_format TEXT NOT NULL,
quantity_g REAL,
prepared_at TEXT,
notes TEXT,
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_user_staples_slug ON user_staples (staple_slug);

@@ -0,0 +1,15 @@
-- Migration 010: User substitution approval log (opt-in dataset moat).
CREATE TABLE substitution_feedback (
id INTEGER PRIMARY KEY AUTOINCREMENT,
original_name TEXT NOT NULL,
substitute_name TEXT NOT NULL,
constraint_label TEXT,
compensation_used TEXT NOT NULL DEFAULT '[]', -- JSON array of compensation ingredient names
approved INTEGER NOT NULL DEFAULT 0,
opted_in INTEGER NOT NULL DEFAULT 0, -- user consented to anonymized sharing
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_substitution_feedback_original ON substitution_feedback (original_name);
CREATE INDEX idx_substitution_feedback_opted_in ON substitution_feedback (opted_in);

@@ -0,0 +1,11 @@
-- Migration 011: Daily rate limits (leftover mode: 5/day free tier).
CREATE TABLE rate_limits (
id INTEGER PRIMARY KEY AUTOINCREMENT,
feature TEXT NOT NULL,
window_date TEXT NOT NULL, -- YYYY-MM-DD
count INTEGER NOT NULL DEFAULT 0,
UNIQUE (feature, window_date)
);
CREATE INDEX idx_rate_limits_feature_date ON rate_limits (feature, window_date);
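The `UNIQUE (feature, window_date)` constraint makes the daily counter a natural fit for SQLite's upsert syntax (`ON CONFLICT … DO UPDATE`, available since SQLite 3.24). A sketch of how a per-day increment might look (the `bump` helper is illustrative, not the app's actual code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE rate_limits (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        feature TEXT NOT NULL,
        window_date TEXT NOT NULL,   -- YYYY-MM-DD
        count INTEGER NOT NULL DEFAULT 0,
        UNIQUE (feature, window_date)
    )
""")

def bump(feature: str, day: str) -> int:
    """Increment today's counter for a feature; returns the new count."""
    conn.execute(
        """INSERT INTO rate_limits (feature, window_date, count) VALUES (?, ?, 1)
           ON CONFLICT (feature, window_date) DO UPDATE SET count = count + 1""",
        (feature, day),
    )
    return conn.execute(
        "SELECT count FROM rate_limits WHERE feature = ? AND window_date = ?",
        (feature, day),
    ).fetchone()[0]

first = bump("leftover_mode", "2025-01-01")
second = bump("leftover_mode", "2025-01-01")
other_day = bump("leftover_mode", "2025-01-02")
```

A caller enforcing the free-tier cap would reject the request once `bump(...)` exceeds 5 for the current date.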

@@ -0,0 +1,6 @@
-- Migration 012: User settings key-value store.
CREATE TABLE IF NOT EXISTS user_settings (
key TEXT PRIMARY KEY,
value TEXT NOT NULL
);

@@ -0,0 +1,18 @@
-- 006_background_tasks.sql
-- Shared background task queue used by the LLM task scheduler.
-- Schema mirrors Peregrine's background_tasks for circuitforge-core compatibility.
CREATE TABLE IF NOT EXISTS background_tasks (
id INTEGER PRIMARY KEY AUTOINCREMENT,
task_type TEXT NOT NULL,
job_id INTEGER NOT NULL DEFAULT 0,
status TEXT NOT NULL DEFAULT 'queued',
params TEXT,
error TEXT,
stage TEXT,
created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_bg_tasks_status_type
ON background_tasks (status, task_type);

@@ -0,0 +1,18 @@
-- Migration 014: Add macro nutrition columns to recipes and ingredient_profiles.
--
-- recipes: sugar, carbs, fiber, servings, and an estimated flag.
-- ingredient_profiles: carbs, fiber, calories, sugar per 100g (for estimation fallback).
ALTER TABLE recipes ADD COLUMN sugar_g REAL;
ALTER TABLE recipes ADD COLUMN carbs_g REAL;
ALTER TABLE recipes ADD COLUMN fiber_g REAL;
ALTER TABLE recipes ADD COLUMN servings REAL;
ALTER TABLE recipes ADD COLUMN nutrition_estimated INTEGER NOT NULL DEFAULT 0;
ALTER TABLE ingredient_profiles ADD COLUMN carbs_g_per_100g REAL DEFAULT 0.0;
ALTER TABLE ingredient_profiles ADD COLUMN fiber_g_per_100g REAL DEFAULT 0.0;
ALTER TABLE ingredient_profiles ADD COLUMN calories_per_100g REAL DEFAULT 0.0;
ALTER TABLE ingredient_profiles ADD COLUMN sugar_g_per_100g REAL DEFAULT 0.0;
CREATE INDEX idx_recipes_sugar_g ON recipes (sugar_g);
CREATE INDEX idx_recipes_carbs_g ON recipes (carbs_g);

@@ -0,0 +1,16 @@
-- Migration 015: FTS5 inverted index for recipe ingredient lookup.
--
-- Content table backed by `recipes` — stores only the inverted index, no text duplication.
-- MATCH queries replace O(N) LIKE scans with O(log N) token lookups.
--
-- One-time rebuild cost on 3.2M rows: ~15-30 seconds at startup.
-- Subsequent startups skip this migration entirely.
CREATE VIRTUAL TABLE IF NOT EXISTS recipes_fts USING fts5(
ingredient_names,
content=recipes,
content_rowid=id,
tokenize="unicode61"
);
INSERT INTO recipes_fts(recipes_fts) VALUES('rebuild');
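Migration 015's external-content FTS5 table can be tried in isolation, assuming the bundled SQLite was compiled with FTS5 (true for standard CPython builds). A reduced sketch with two rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE recipes (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        ingredient_names TEXT NOT NULL DEFAULT '[]'
    );
    INSERT INTO recipes (title, ingredient_names) VALUES
        ('Garlic pasta', '["garlic","pasta","olive oil"]'),
        ('Tomato soup',  '["tomato","onion","basil"]');

    -- content= points the index at the recipes table; no text is duplicated.
    CREATE VIRTUAL TABLE recipes_fts USING fts5(
        ingredient_names,
        content=recipes,
        content_rowid=id,
        tokenize="unicode61"
    );
    -- Populate the inverted index from the content table in one pass.
    INSERT INTO recipes_fts(recipes_fts) VALUES('rebuild');
""")

hits = [r[0] for r in conn.execute(
    "SELECT rowid FROM recipes_fts WHERE recipes_fts MATCH 'garlic'"
)]
misses = [r[0] for r in conn.execute(
    "SELECT rowid FROM recipes_fts WHERE recipes_fts MATCH 'tofu'"
)]
```

The `unicode61` tokenizer splits the JSON array text on punctuation, so each ingredient word becomes a searchable token; `rowid` maps straight back to `recipes.id`.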

app/db/models.py Normal file
@@ -0,0 +1,577 @@
"""
REMOVED: schema is now managed by plain SQL migrations in app/db/migrations/.
This file is kept for historical reference only. Nothing imports it.
"""
# fmt: off # noqa — dead file, not linted
from sqlalchemy import (
Column,
String,
Text,
Boolean,
Numeric,
DateTime,
Date,
ForeignKey,
CheckConstraint,
Index,
Table,
)
from sqlalchemy.dialects.postgresql import UUID, JSONB
from sqlalchemy.orm import relationship
from sqlalchemy.sql import func
from datetime import datetime
import uuid
from app.db.base import Base
# Association table for many-to-many relationship between products and tags
product_tags = Table(
"product_tags",
Base.metadata,
Column(
"product_id",
UUID(as_uuid=True),
ForeignKey("products.id", ondelete="CASCADE"),
primary_key=True,
),
Column(
"tag_id",
UUID(as_uuid=True),
ForeignKey("tags.id", ondelete="CASCADE"),
primary_key=True,
),
Column(
"created_at",
DateTime(timezone=True),
nullable=False,
server_default=func.now(),
),
)
class Receipt(Base):
"""
Receipt model - stores receipt metadata and processing status.
Corresponds to the 'receipts' table in the database schema.
"""
__tablename__ = "receipts"
# Primary Key
id = Column(
UUID(as_uuid=True),
primary_key=True,
default=uuid.uuid4,
server_default=func.gen_random_uuid(),
)
# File Information
filename = Column(String(255), nullable=False)
original_path = Column(Text, nullable=False)
processed_path = Column(Text, nullable=True)
# Processing Status
status = Column(
String(50),
nullable=False,
default="uploaded",
server_default="uploaded",
)
error = Column(Text, nullable=True)
# Metadata (JSONB for flexibility)
# Using 'receipt_metadata' to avoid conflict with SQLAlchemy's metadata attribute
receipt_metadata = Column("metadata", JSONB, nullable=False, default={}, server_default="{}")
# Timestamps
created_at = Column(
DateTime(timezone=True),
nullable=False,
default=datetime.utcnow,
server_default=func.now(),
)
updated_at = Column(
DateTime(timezone=True),
nullable=False,
default=datetime.utcnow,
server_default=func.now(),
onupdate=func.now(),
)
# Relationships
quality_assessment = relationship(
"QualityAssessment",
back_populates="receipt",
uselist=False, # One-to-one relationship
cascade="all, delete-orphan",
)
receipt_data = relationship(
"ReceiptData",
back_populates="receipt",
uselist=False, # One-to-one relationship
cascade="all, delete-orphan",
)
# Constraints and Indexes
__table_args__ = (
CheckConstraint(
"status IN ('uploaded', 'processing', 'processed', 'error')",
name="receipts_status_check",
),
# Indexes will be created after table definition
)
def __repr__(self) -> str:
return f"<Receipt(id={self.id}, filename={self.filename}, status={self.status})>"
# Create indexes for Receipt table
Index("idx_receipts_status", Receipt.status)
Index("idx_receipts_created_at", Receipt.created_at.desc())
Index("idx_receipts_metadata", Receipt.receipt_metadata, postgresql_using="gin")
class QualityAssessment(Base):
"""
Quality Assessment model - stores quality evaluation results.
One-to-one relationship with Receipt.
Corresponds to the 'quality_assessments' table in the database schema.
"""
__tablename__ = "quality_assessments"
# Primary Key
id = Column(
UUID(as_uuid=True),
primary_key=True,
default=uuid.uuid4,
server_default=func.gen_random_uuid(),
)
# Foreign Key (1:1 with receipts)
receipt_id = Column(
UUID(as_uuid=True),
ForeignKey("receipts.id", ondelete="CASCADE"),
nullable=False,
unique=True,
)
# Quality Scores
overall_score = Column(Numeric(5, 2), nullable=False)
is_acceptable = Column(Boolean, nullable=False, default=False, server_default="false")
# Detailed Metrics (JSONB)
metrics = Column(JSONB, nullable=False, default={}, server_default="{}")
# Improvement Suggestions
improvement_suggestions = Column(JSONB, nullable=False, default=[], server_default="[]")
# Timestamp
created_at = Column(
DateTime(timezone=True),
nullable=False,
default=datetime.utcnow,
server_default=func.now(),
)
# Relationships
receipt = relationship("Receipt", back_populates="quality_assessment")
# Constraints
__table_args__ = (
CheckConstraint(
"overall_score >= 0 AND overall_score <= 100",
name="quality_assessments_score_range",
),
Index("idx_quality_assessments_receipt_id", "receipt_id"),
Index("idx_quality_assessments_score", "overall_score"),
Index("idx_quality_assessments_acceptable", "is_acceptable"),
Index("idx_quality_assessments_metrics", "metrics", postgresql_using="gin"),
)
def __repr__(self) -> str:
return (
f"<QualityAssessment(id={self.id}, receipt_id={self.receipt_id}, "
f"score={self.overall_score}, acceptable={self.is_acceptable})>"
)
class Product(Base):
"""
Product model - stores product catalog information.
Products can come from:
- Barcode scans (OpenFoodFacts API)
- Manual user entries
- Future: OCR extraction from receipts
One product can have many inventory items.
"""
__tablename__ = "products"
# Primary Key
id = Column(
UUID(as_uuid=True),
primary_key=True,
default=uuid.uuid4,
server_default=func.gen_random_uuid(),
)
# Identifiers
barcode = Column(String(50), unique=True, nullable=True) # UPC/EAN code
# Product Information
name = Column(String(500), nullable=False)
brand = Column(String(255), nullable=True)
category = Column(String(255), nullable=True)
# Additional Details
description = Column(Text, nullable=True)
image_url = Column(Text, nullable=True)
# Nutritional Data (JSONB for flexibility)
nutrition_data = Column(JSONB, nullable=False, default={}, server_default="{}")
# Source Tracking
source = Column(
String(50),
nullable=False,
default="manual",
server_default="manual",
) # 'openfoodfacts', 'manual', 'receipt_ocr'
source_data = Column(JSONB, nullable=False, default={}, server_default="{}")
# Timestamps
created_at = Column(
DateTime(timezone=True),
nullable=False,
default=datetime.utcnow,
server_default=func.now(),
)
updated_at = Column(
DateTime(timezone=True),
nullable=False,
default=datetime.utcnow,
server_default=func.now(),
onupdate=func.now(),
)
# Relationships
inventory_items = relationship(
"InventoryItem",
back_populates="product",
cascade="all, delete-orphan",
)
tags = relationship(
"Tag",
secondary=product_tags,
back_populates="products",
)
# Constraints
__table_args__ = (
CheckConstraint(
"source IN ('openfoodfacts', 'manual', 'receipt_ocr')",
name="products_source_check",
),
)
def __repr__(self) -> str:
return f"<Product(id={self.id}, name={self.name}, barcode={self.barcode})>"
class Tag(Base):
"""
Tag model - stores tags/labels for organizing products.
Tags can be used to categorize products by:
- Food type (dairy, meat, vegetables, fruit, etc.)
- Dietary restrictions (vegan, gluten-free, kosher, halal, etc.)
- Allergens (contains nuts, contains dairy, etc.)
- Custom user categories
Many-to-many relationship with products.
"""
__tablename__ = "tags"
# Primary Key
id = Column(
UUID(as_uuid=True),
primary_key=True,
default=uuid.uuid4,
server_default=func.gen_random_uuid(),
)
# Tag Information
name = Column(String(100), nullable=False, unique=True)
slug = Column(String(100), nullable=False, unique=True) # URL-safe version
description = Column(Text, nullable=True)
color = Column(String(7), nullable=True) # Hex color code for UI (#FF5733)
# Category (optional grouping)
category = Column(String(50), nullable=True) # 'food_type', 'dietary', 'allergen', 'custom'
# Timestamps
created_at = Column(
DateTime(timezone=True),
nullable=False,
default=datetime.utcnow,
server_default=func.now(),
)
updated_at = Column(
DateTime(timezone=True),
nullable=False,
default=datetime.utcnow,
server_default=func.now(),
onupdate=func.now(),
)
# Relationships
products = relationship(
"Product",
secondary=product_tags,
back_populates="tags",
)
# Constraints
__table_args__ = (
CheckConstraint(
"category IS NULL OR category IN ('food_type', 'dietary', 'allergen', 'custom')",
name="tags_category_check",
),
)
def __repr__(self) -> str:
return f"<Tag(id={self.id}, name={self.name}, category={self.category})>"
class InventoryItem(Base):
"""
Inventory Item model - tracks individual items in user's inventory.
Links to a Product and adds user-specific information like
quantity, location, expiration date, etc.
"""
__tablename__ = "inventory_items"
# Primary Key
id = Column(
UUID(as_uuid=True),
primary_key=True,
default=uuid.uuid4,
server_default=func.gen_random_uuid(),
)
# Foreign Keys
product_id = Column(
UUID(as_uuid=True),
ForeignKey("products.id", ondelete="RESTRICT"),
nullable=False,
)
receipt_id = Column(
UUID(as_uuid=True),
ForeignKey("receipts.id", ondelete="SET NULL"),
nullable=True,
)
# Quantity
quantity = Column(Numeric(10, 2), nullable=False, default=1)
unit = Column(String(50), nullable=False, default="count", server_default="count")
# Location
location = Column(String(100), nullable=False)
sublocation = Column(String(255), nullable=True)
# Dates
purchase_date = Column(Date, nullable=True)
expiration_date = Column(Date, nullable=True)
# Status
status = Column(
String(50),
nullable=False,
default="available",
server_default="available",
)
consumed_at = Column(DateTime(timezone=True), nullable=True)
# Notes
notes = Column(Text, nullable=True)
# Source Tracking
source = Column(
String(50),
nullable=False,
default="manual",
server_default="manual",
) # 'barcode_scan', 'manual', 'receipt'
# Timestamps
created_at = Column(
DateTime(timezone=True),
nullable=False,
default=datetime.utcnow,
server_default=func.now(),
)
updated_at = Column(
DateTime(timezone=True),
nullable=False,
default=datetime.utcnow,
server_default=func.now(),
onupdate=func.now(),
)
# Relationships
product = relationship("Product", back_populates="inventory_items")
receipt = relationship("Receipt")
# Constraints
__table_args__ = (
CheckConstraint(
"status IN ('available', 'consumed', 'expired', 'discarded')",
name="inventory_items_status_check",
),
CheckConstraint(
"source IN ('barcode_scan', 'manual', 'receipt')",
name="inventory_items_source_check",
),
CheckConstraint(
"quantity > 0",
name="inventory_items_quantity_positive",
),
)
def __repr__(self) -> str:
return (
f"<InventoryItem(id={self.id}, product_id={self.product_id}, "
f"quantity={self.quantity}, location={self.location}, status={self.status})>"
)
# Create indexes for Product table
Index("idx_products_barcode", Product.barcode)
Index("idx_products_name", Product.name)
Index("idx_products_category", Product.category)
Index("idx_products_source", Product.source)
Index("idx_products_nutrition_data", Product.nutrition_data, postgresql_using="gin")
# Create indexes for Tag table
Index("idx_tags_name", Tag.name)
Index("idx_tags_slug", Tag.slug)
Index("idx_tags_category", Tag.category)
# Create indexes for InventoryItem table
Index("idx_inventory_items_product", InventoryItem.product_id)
Index("idx_inventory_items_receipt", InventoryItem.receipt_id)
Index("idx_inventory_items_status", InventoryItem.status)
Index("idx_inventory_items_location", InventoryItem.location)
Index("idx_inventory_items_expiration", InventoryItem.expiration_date)
Index("idx_inventory_items_created", InventoryItem.created_at.desc())
# Composite index for common query: active items by location
Index(
"idx_inventory_items_active_by_location",
InventoryItem.status,
InventoryItem.location,
postgresql_where=(InventoryItem.status == "available"),
)
class ReceiptData(Base):
"""
Receipt Data model - stores OCR-extracted structured data from receipts.
One-to-one relationship with Receipt.
Stores merchant info, transaction details, line items, and totals.
"""
__tablename__ = "receipt_data"
# Primary Key
id = Column(
UUID(as_uuid=True),
primary_key=True,
default=uuid.uuid4,
server_default=func.gen_random_uuid(),
)
# Foreign Key (1:1 with receipts)
receipt_id = Column(
UUID(as_uuid=True),
ForeignKey("receipts.id", ondelete="CASCADE"),
nullable=False,
unique=True,
)
# Merchant Information
merchant_name = Column(String(500), nullable=True)
merchant_address = Column(Text, nullable=True)
merchant_phone = Column(String(50), nullable=True)
merchant_email = Column(String(255), nullable=True)
merchant_website = Column(String(255), nullable=True)
merchant_tax_id = Column(String(100), nullable=True)
# Transaction Information
transaction_date = Column(Date, nullable=True)
transaction_time = Column(String(20), nullable=True) # Store as string for flexibility
receipt_number = Column(String(100), nullable=True)
register_number = Column(String(50), nullable=True)
cashier_name = Column(String(255), nullable=True)
transaction_id = Column(String(100), nullable=True)
# Line Items (JSONB array)
items = Column(JSONB, nullable=False, default=[], server_default="[]")
# Financial Totals
subtotal = Column(Numeric(12, 2), nullable=True)
tax = Column(Numeric(12, 2), nullable=True)
discount = Column(Numeric(12, 2), nullable=True)
tip = Column(Numeric(12, 2), nullable=True)
total = Column(Numeric(12, 2), nullable=True)
payment_method = Column(String(100), nullable=True)
amount_paid = Column(Numeric(12, 2), nullable=True)
change_given = Column(Numeric(12, 2), nullable=True)
# OCR Metadata
raw_text = Column(Text, nullable=True) # Full OCR text output
confidence_scores = Column(JSONB, nullable=False, default={}, server_default="{}")
warnings = Column(JSONB, nullable=False, default=[], server_default="[]")
processing_time = Column(Numeric(8, 3), nullable=True) # seconds
# Timestamps
created_at = Column(
DateTime(timezone=True),
nullable=False,
default=datetime.utcnow,
server_default=func.now(),
)
updated_at = Column(
DateTime(timezone=True),
nullable=False,
default=datetime.utcnow,
server_default=func.now(),
onupdate=func.now(),
)
# Relationships
receipt = relationship("Receipt", back_populates="receipt_data")
def __repr__(self) -> str:
return (
f"<ReceiptData(id={self.id}, receipt_id={self.receipt_id}, "
f"merchant={self.merchant_name}, total={self.total})>"
)
# Create indexes for ReceiptData table
Index("idx_receipt_data_receipt_id", ReceiptData.receipt_id)
Index("idx_receipt_data_merchant", ReceiptData.merchant_name)
Index("idx_receipt_data_date", ReceiptData.transaction_date)
Index("idx_receipt_data_items", ReceiptData.items, postgresql_using="gin")
Index("idx_receipt_data_confidence", ReceiptData.confidence_scores, postgresql_using="gin")

app/db/session.py Normal file

@@ -0,0 +1,23 @@
"""
FastAPI dependency that provides a Store instance per request.
Local mode: opens a Store at settings.DB_PATH.
Cloud mode: opens a Store at the per-user DB path from the CloudUser session.
"""
from __future__ import annotations
from typing import Generator
from fastapi import Depends
from app.cloud_session import CloudUser, get_session
from app.db.store import Store
def get_store(session: CloudUser = Depends(get_session)) -> Generator[Store, None, None]:
"""FastAPI dependency — yields a Store for the current user, closes on completion."""
store = Store(session.db)
try:
yield store
finally:
store.close()
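The yield/finally shape in `get_store` is what guarantees the Store is closed even if the request handler raises. A dependency-free sketch of the same lifecycle, with a hypothetical `DummyStore` standing in for `app.db.store.Store` and manual generator stepping standing in for FastAPI's dependency machinery:

```python
# DummyStore is a stand-in for Store (hypothetical, for illustration);
# only the close() contract matters here.
class DummyStore:
    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def get_store_like():
    """Same generator shape as get_store: yield the resource, close in finally."""
    store = DummyStore()
    try:
        yield store
    finally:
        store.close()


# FastAPI advances the generator to obtain the dependency value, then
# finalises it after the response; generator.close() fires the finally block.
gen = get_store_like()
store = next(gen)
assert store.closed is False   # open for the duration of the "request"
gen.close()                    # what the framework does on request teardown
assert store.closed is True
```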

app/db/store.py Normal file

@@ -0,0 +1,737 @@
"""
SQLite data store for Kiwi.
Uses circuitforge-core for connection management and migrations.
"""
from __future__ import annotations
import json
import sqlite3
from pathlib import Path
from typing import Any
from circuitforge_core.db.base import get_connection
from circuitforge_core.db.migrations import run_migrations
MIGRATIONS_DIR = Path(__file__).parent / "migrations"
class Store:
def __init__(self, db_path: Path, key: str = "") -> None:
self.conn: sqlite3.Connection = get_connection(db_path, key)
self.conn.execute("PRAGMA journal_mode=WAL")
self.conn.execute("PRAGMA foreign_keys=ON")
run_migrations(self.conn, MIGRATIONS_DIR)
def close(self) -> None:
self.conn.close()
# ── helpers ───────────────────────────────────────────────────────────
def _row_to_dict(self, row: sqlite3.Row) -> dict[str, Any]:
d = dict(row)
# Deserialise any TEXT columns that contain JSON
for key in ("metadata", "nutrition_data", "source_data", "items",
"metrics", "improvement_suggestions", "confidence_scores",
"warnings",
# recipe columns
"ingredients", "ingredient_names", "directions",
"keywords", "element_coverage"):
if key in d and isinstance(d[key], str):
try:
d[key] = json.loads(d[key])
except (json.JSONDecodeError, TypeError):
pass
return d
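The try/except passthrough in `_row_to_dict` is deliberate: TEXT columns that hold JSON become Python objects, while plain strings survive untouched. A standalone sketch of that behaviour, with an abbreviated key set and invented row data:

```python
import json

# Abbreviated version of the JSON-column set Store._row_to_dict checks.
JSON_KEYS = {"metadata", "items", "ingredient_names"}

def row_to_dict(row: dict) -> dict:
    d = dict(row)
    for key in JSON_KEYS & d.keys():
        if isinstance(d[key], str):
            try:
                d[key] = json.loads(d[key])
            except (json.JSONDecodeError, TypeError):
                pass  # not valid JSON: keep the raw string as-is
    return d

row = {"id": 1, "metadata": '{"store": "demo"}', "filename": "r.jpg",
       "items": "not json ["}
out = row_to_dict(row)
print(out["metadata"])   # parsed: {'store': 'demo'}
print(out["items"])      # unchanged: 'not json ['
```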
def _fetch_one(self, sql: str, params: tuple = ()) -> dict[str, Any] | None:
self.conn.row_factory = sqlite3.Row
row = self.conn.execute(sql, params).fetchone()
return self._row_to_dict(row) if row else None
def _fetch_all(self, sql: str, params: tuple = ()) -> list[dict[str, Any]]:
self.conn.row_factory = sqlite3.Row
rows = self.conn.execute(sql, params).fetchall()
return [self._row_to_dict(r) for r in rows]
def _dump(self, value: Any) -> str:
"""Serialise a Python object to a JSON string for storage."""
return json.dumps(value)
# ── receipts ──────────────────────────────────────────────────────────
def _insert_returning(self, sql: str, params: tuple = ()) -> dict[str, Any]:
"""Execute an INSERT ... RETURNING * and return the new row as a dict.
Fetches the row BEFORE committing: SQLite requires the cursor to be
fully consumed before the transaction is committed."""
self.conn.row_factory = sqlite3.Row
cur = self.conn.execute(sql, params)
row = self._row_to_dict(cur.fetchone())
self.conn.commit()
return row
def create_receipt(self, filename: str, original_path: str) -> dict[str, Any]:
return self._insert_returning(
"INSERT INTO receipts (filename, original_path) VALUES (?, ?) RETURNING *",
(filename, original_path),
)
def get_receipt(self, receipt_id: int) -> dict[str, Any] | None:
return self._fetch_one("SELECT * FROM receipts WHERE id = ?", (receipt_id,))
def list_receipts(self, limit: int = 50, offset: int = 0) -> list[dict[str, Any]]:
return self._fetch_all(
"SELECT * FROM receipts ORDER BY created_at DESC LIMIT ? OFFSET ?",
(limit, offset),
)
def update_receipt_status(self, receipt_id: int, status: str,
error: str | None = None) -> None:
self.conn.execute(
"UPDATE receipts SET status = ?, error = ?, updated_at = datetime('now') WHERE id = ?",
(status, error, receipt_id),
)
self.conn.commit()
def update_receipt_metadata(self, receipt_id: int, metadata: dict) -> None:
self.conn.execute(
"UPDATE receipts SET metadata = ?, updated_at = datetime('now') WHERE id = ?",
(self._dump(metadata), receipt_id),
)
self.conn.commit()
# ── quality assessments ───────────────────────────────────────────────
def upsert_quality_assessment(self, receipt_id: int, overall_score: float,
is_acceptable: bool, metrics: dict,
suggestions: list) -> dict[str, Any]:
self.conn.execute(
"""INSERT INTO quality_assessments
(receipt_id, overall_score, is_acceptable, metrics, improvement_suggestions)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (receipt_id) DO UPDATE SET
overall_score = excluded.overall_score,
is_acceptable = excluded.is_acceptable,
metrics = excluded.metrics,
improvement_suggestions = excluded.improvement_suggestions""",
(receipt_id, overall_score, int(is_acceptable),
self._dump(metrics), self._dump(suggestions)),
)
self.conn.commit()
return self._fetch_one(
"SELECT * FROM quality_assessments WHERE receipt_id = ?", (receipt_id,)
)
# ── products ──────────────────────────────────────────────────────────
def get_or_create_product(self, name: str, barcode: str | None = None,
**kwargs) -> tuple[dict[str, Any], bool]:
"""Returns (product, created). Looks up by barcode first, then name."""
if barcode:
existing = self._fetch_one(
"SELECT * FROM products WHERE barcode = ?", (barcode,)
)
if existing:
return existing, False
existing = self._fetch_one("SELECT * FROM products WHERE name = ?", (name,))
if existing:
return existing, False
row = self._insert_returning(
"""INSERT INTO products (name, barcode, brand, category, description,
image_url, nutrition_data, source, source_data)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?) RETURNING *""",
(
name, barcode,
kwargs.get("brand"), kwargs.get("category"),
kwargs.get("description"), kwargs.get("image_url"),
self._dump(kwargs.get("nutrition_data", {})),
kwargs.get("source", "manual"),
self._dump(kwargs.get("source_data", {})),
),
)
return row, True
def get_product(self, product_id: int) -> dict[str, Any] | None:
return self._fetch_one("SELECT * FROM products WHERE id = ?", (product_id,))
def list_products(self) -> list[dict[str, Any]]:
return self._fetch_all("SELECT * FROM products ORDER BY name")
# ── inventory ─────────────────────────────────────────────────────────
def add_inventory_item(self, product_id: int, location: str,
quantity: float = 1.0, unit: str = "count",
**kwargs) -> dict[str, Any]:
return self._insert_returning(
"""INSERT INTO inventory_items
(product_id, receipt_id, quantity, unit, location, sublocation,
purchase_date, expiration_date, notes, source)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?) RETURNING *""",
(
product_id, kwargs.get("receipt_id"),
quantity, unit, location, kwargs.get("sublocation"),
kwargs.get("purchase_date"), kwargs.get("expiration_date"),
kwargs.get("notes"), kwargs.get("source", "manual"),
),
)
def get_inventory_item(self, item_id: int) -> dict[str, Any] | None:
return self._fetch_one(
"""SELECT i.*, p.name as product_name, p.barcode, p.category
FROM inventory_items i
JOIN products p ON p.id = i.product_id
WHERE i.id = ?""",
(item_id,),
)
def list_inventory(self, location: str | None = None,
status: str = "available") -> list[dict[str, Any]]:
if location:
return self._fetch_all(
"""SELECT i.*, p.name as product_name, p.barcode, p.category
FROM inventory_items i
JOIN products p ON p.id = i.product_id
WHERE i.status = ? AND i.location = ?
ORDER BY i.expiration_date ASC NULLS LAST""",
(status, location),
)
return self._fetch_all(
"""SELECT i.*, p.name as product_name, p.barcode, p.category
FROM inventory_items i
JOIN products p ON p.id = i.product_id
WHERE i.status = ?
ORDER BY i.expiration_date ASC NULLS LAST""",
(status,),
)
def update_inventory_item(self, item_id: int, **kwargs) -> dict[str, Any] | None:
allowed = {"quantity", "unit", "location", "sublocation",
"expiration_date", "status", "notes", "consumed_at"}
updates = {k: v for k, v in kwargs.items() if k in allowed}
if not updates:
return self.get_inventory_item(item_id)
sets = ", ".join(f"{k} = ?" for k in updates)
values = list(updates.values()) + [item_id]
self.conn.execute(
f"UPDATE inventory_items SET {sets}, updated_at = datetime('now') WHERE id = ?",
values,
)
self.conn.commit()
return self.get_inventory_item(item_id)
def expiring_soon(self, days: int = 7) -> list[dict[str, Any]]:
return self._fetch_all(
"""SELECT i.*, p.name as product_name, p.category
FROM inventory_items i
JOIN products p ON p.id = i.product_id
WHERE i.status = 'available'
AND i.expiration_date IS NOT NULL
AND date(i.expiration_date) <= date('now', ? || ' days')
ORDER BY i.expiration_date ASC""",
(str(days),),
)
def recalculate_expiry(
self,
tier: str = "local",
has_byok: bool = False,
) -> tuple[int, int]:
"""Re-run the expiration predictor over all available inventory items.
Uses each item's existing purchase_date (falls back to today if NULL)
and its current location. Items with an explicit expiration_date from
a source other than auto-prediction (i.e. expiry found on a receipt or
entered by the user) cannot be distinguished, so all available items
are recalculated.
Returns (updated_count, skipped_count).
"""
from datetime import date
from app.services.expiration_predictor import ExpirationPredictor
predictor = ExpirationPredictor()
rows = self._fetch_all(
"""SELECT i.id, i.location, i.purchase_date,
p.name AS product_name, p.category AS product_category
FROM inventory_items i
JOIN products p ON p.id = i.product_id
WHERE i.status = 'available'""",
(),
)
updated = skipped = 0
for row in rows:
cat = predictor.get_category_from_product(
row["product_name"] or "",
product_category=row.get("product_category"),
location=row.get("location"),
)
purchase_date_raw = row.get("purchase_date")
try:
purchase_date = (
date.fromisoformat(purchase_date_raw)
if purchase_date_raw
else date.today()
)
except (ValueError, TypeError):
purchase_date = date.today()
exp = predictor.predict_expiration(
cat,
row["location"] or "pantry",
purchase_date=purchase_date,
product_name=row["product_name"],
tier=tier,
has_byok=has_byok,
)
if exp is None:
skipped += 1
continue
self.conn.execute(
"UPDATE inventory_items SET expiration_date = ?, updated_at = datetime('now') WHERE id = ?",
(str(exp), row["id"]),
)
updated += 1
self.conn.commit()
return updated, skipped
# ── receipt_data ──────────────────────────────────────────────────────
def upsert_receipt_data(self, receipt_id: int, data: dict) -> dict[str, Any]:
fields = [
"merchant_name", "merchant_address", "merchant_phone", "merchant_email",
"merchant_website", "merchant_tax_id", "transaction_date", "transaction_time",
"receipt_number", "register_number", "cashier_name", "transaction_id",
"items", "subtotal", "tax", "discount", "tip", "total",
"payment_method", "amount_paid", "change_given",
"raw_text", "confidence_scores", "warnings", "processing_time",
]
json_fields = {"items", "confidence_scores", "warnings"}
cols = ", ".join(fields)
placeholders = ", ".join("?" for _ in fields)
values = [
self._dump(data.get(f)) if f in json_fields and data.get(f) is not None
else data.get(f)
for f in fields
]
self.conn.execute(
f"""INSERT INTO receipt_data (receipt_id, {cols})
VALUES (?, {placeholders})
ON CONFLICT (receipt_id) DO UPDATE SET
{', '.join(f'{f} = excluded.{f}' for f in fields)},
updated_at = datetime('now')""",
[receipt_id] + values,
)
self.conn.commit()
return self._fetch_one(
"SELECT * FROM receipt_data WHERE receipt_id = ?", (receipt_id,)
)
# ── recipes ───────────────────────────────────────────────────────────
def _fts_ready(self) -> bool:
"""Return True if the recipes_fts virtual table exists."""
row = self._fetch_one(
"SELECT 1 FROM sqlite_master WHERE type='table' AND name='recipes_fts'"
)
return row is not None
# Words that carry no recipe-ingredient signal and should be filtered
# out when tokenising multi-word product names for FTS expansion.
_FTS_TOKEN_STOPWORDS: frozenset[str] = frozenset({
# Common English stopwords
"a", "an", "the", "of", "in", "for", "with", "and", "or", "to",
"from", "at", "by", "as", "on", "into",
# Brand / marketing words that appear in product names
"lean", "cuisine", "healthy", "choice", "stouffer", "original",
"classic", "deluxe", "homestyle", "family", "style", "grade",
"premium", "select", "natural", "organic", "fresh", "lite",
"ready", "quick", "easy", "instant", "microwave", "frozen",
"brand", "size", "large", "small", "medium", "extra",
# Plant-based / alt-meat brand names
"daring", "gardein", "morningstar", "lightlife", "tofurky",
"quorn", "omni", "nuggs", "simulate",
# Preparation states — "cut up chicken" is still chicken
"cut", "diced", "sliced", "chopped", "minced", "shredded",
"cooked", "raw", "whole", "boneless", "skinless", "trimmed",
"pre", "prepared", "marinated", "seasoned", "breaded", "battered",
"grilled", "roasted", "smoked", "canned", "dried", "dehydrated",
"pieces", "piece", "strips", "strip", "chunks", "chunk",
"fillets", "fillet", "cutlets", "cutlet", "tenders", "nuggets",
# Units / packaging
"oz", "lb", "lbs", "pkg", "pack", "box", "can", "bag", "jar",
})
# Maps substrings found in product-label names to canonical recipe-corpus
# ingredient terms. Checked as substring matches against the lower-cased
# full product name, then against each individual token.
_FTS_SYNONYMS: dict[str, str] = {
# Ground / minced beef
"burger patt": "hamburger",
"beef patt": "hamburger",
"ground beef": "hamburger",
"ground chuck": "hamburger",
"ground round": "hamburger",
"mince": "hamburger",
"veggie burger": "hamburger",
"beyond burger": "hamburger",
"impossible burger": "hamburger",
"plant burger": "hamburger",
"chicken patt": "hamburger", # FTS match only — recipe scoring still works
# Sausages
"kielbasa": "sausage",
"bratwurst": "sausage",
"brat ": "sausage",
"frankfurter": "hotdog",
"wiener": "hotdog",
# Chicken cuts + plant-based chicken → generic chicken for broader matching
"chicken breast": "chicken",
"chicken thigh": "chicken",
"chicken drumstick": "chicken",
"chicken wing": "chicken",
"rotisserie chicken": "chicken",
"chicken tender": "chicken",
"chicken strip": "chicken",
"chicken piece": "chicken",
"fake chicken": "chicken",
"plant chicken": "chicken",
"vegan chicken": "chicken",
"daring": "chicken", # Daring Foods brand
"gardein chick": "chicken",
"quorn chick": "chicken",
"chick'n": "chicken",
"chikn": "chicken",
"not-chicken": "chicken",
"no-chicken": "chicken",
# Plant-based beef subs — map to broad "beef" not "hamburger"
# (texture varies: strips ≠ ground; let corpus handle the specific form)
"not-beef": "beef",
"no-beef": "beef",
"plant beef": "beef",
"vegan beef": "beef",
# Plant-based pork subs
"not-pork": "pork",
"no-pork": "pork",
"plant pork": "pork",
"vegan pork": "pork",
"omnipork": "pork",
"omni pork": "pork",
# Generic alt-meat catch-alls → broad "beef" (safer than hamburger)
"fake meat": "beef",
"plant meat": "beef",
"vegan meat": "beef",
"meat-free": "beef",
"meatless": "beef",
# Pork cuts
"pork chop": "pork",
"pork loin": "pork",
"pork tenderloin": "pork",
# Tomato-based sauces
"marinara": "tomato sauce",
"pasta sauce": "tomato sauce",
"spaghetti sauce": "tomato sauce",
"pizza sauce": "tomato sauce",
# Pasta shapes — map to generic "pasta" so FTS finds any pasta recipe
"macaroni": "pasta",
"noodles": "pasta",
"spaghetti": "pasta",
"penne": "pasta",
"fettuccine": "pasta",
"rigatoni": "pasta",
"linguine": "pasta",
"rotini": "pasta",
"farfalle": "pasta",
# Cheese variants → "cheese" for broad matching
"shredded cheese": "cheese",
"sliced cheese": "cheese",
"american cheese": "cheese",
"cheddar": "cheese",
"mozzarella": "cheese",
# Cream variants
"heavy cream": "cream",
"whipping cream": "cream",
"half and half": "cream",
# Buns / rolls
"burger bun": "buns",
"hamburger bun": "buns",
"hot dog bun": "buns",
"bread roll": "buns",
"dinner roll": "buns",
# Tortillas / wraps
"flour tortilla": "tortillas",
"corn tortilla": "tortillas",
"tortilla wrap": "tortillas",
"soft taco shell": "tortillas",
"taco shell": "taco shells",
"pita bread": "pita",
"flatbread": "flatbread",
# Canned beans
"black bean": "beans",
"pinto bean": "beans",
"kidney bean": "beans",
"refried bean": "beans",
"chickpea": "beans",
"garbanzo": "beans",
# Rice variants
"white rice": "rice",
"brown rice": "rice",
"jasmine rice": "rice",
"basmati rice": "rice",
"instant rice": "rice",
"microwavable rice": "rice",
# Salsa / hot sauce
"hot sauce": "salsa",
"taco sauce": "salsa",
"enchilada sauce": "salsa",
# Sour cream substitute
"greek yogurt": "sour cream",
# Prepackaged meals
"lean cuisine": "casserole",
"stouffer": "casserole",
"healthy choice": "casserole",
"marie callender": "casserole",
}
@staticmethod
def _normalize_for_fts(name: str) -> list[str]:
"""Expand one pantry item to all FTS search terms it should contribute.
Returns the original name plus:
- Any synonym-map canonical terms (handles product-label corpus name)
- Individual significant tokens from multi-word product names
(so a packaged meal like "Lean Cuisine Chicken Alfredo" also
searches for "chicken" and "alfredo" independently)
"""
lower = name.lower().strip()
if not lower:
return []
terms: list[str] = [lower]
# Substring synonym check on full name
for pattern, canonical in Store._FTS_SYNONYMS.items():
if pattern in lower:
terms.append(canonical)
# For multi-word product names, also add individual significant tokens
if " " in lower:
for token in lower.split():
if len(token) <= 3 or token in Store._FTS_TOKEN_STOPWORDS:
continue
if token not in terms:
terms.append(token)
# Synonym-expand individual tokens too
if token in Store._FTS_SYNONYMS:
canonical = Store._FTS_SYNONYMS[token]
if canonical not in terms:
terms.append(canonical)
return terms
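The expansion logic above can be demonstrated standalone. The sketch below mirrors `_normalize_for_fts` with heavily abbreviated stopword and synonym tables (the real ones in `Store` are far larger), so the exact output depends only on these two toy maps:

```python
# Abbreviated stand-ins for Store._FTS_TOKEN_STOPWORDS / _FTS_SYNONYMS.
STOPWORDS = {"lean", "cuisine", "frozen", "oz"}
SYNONYMS = {"lean cuisine": "casserole", "macaroni": "pasta"}

def normalize_for_fts(name: str) -> list[str]:
    lower = name.lower().strip()
    if not lower:
        return []
    terms = [lower]
    # Substring synonym check against the full product name.
    for pattern, canonical in SYNONYMS.items():
        if pattern in lower and canonical not in terms:
            terms.append(canonical)
    # Multi-word names also contribute their significant tokens.
    if " " in lower:
        for token in lower.split():
            if len(token) <= 3 or token in STOPWORDS:
                continue
            if token not in terms:
                terms.append(token)
            if token in SYNONYMS and SYNONYMS[token] not in terms:
                terms.append(SYNONYMS[token])
    return terms

print(normalize_for_fts("Lean Cuisine Chicken Alfredo"))
# ['lean cuisine chicken alfredo', 'casserole', 'chicken', 'alfredo']
```

The packaged-meal name thus contributes its brand-level synonym ("casserole") plus its individual ingredient tokens, while the marketing words are filtered out, which is exactly the headroom `_build_fts_query` needs to OR everything into one MATCH expression.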
@staticmethod
def _build_fts_query(ingredient_names: list[str]) -> str:
"""Build an FTS5 MATCH expression ORing all ingredient terms.
Each pantry item is expanded via _normalize_for_fts so that
product-label names (e.g. "burger patties") also search for their
recipe-corpus equivalents (e.g. "hamburger"), and multi-word packaged
product names contribute their individual ingredient tokens.
"""
parts: list[str] = []
seen: set[str] = set()
for name in ingredient_names:
for term in Store._normalize_for_fts(name):
# Strip characters that break FTS5 query syntax
clean = term.replace('"', "").replace("'", "")
if not clean or clean in seen:
continue
seen.add(clean)
parts.append(f'"{clean}"')
return " OR ".join(parts)
def search_recipes_by_ingredients(
self,
ingredient_names: list[str],
limit: int = 20,
category: str | None = None,
max_calories: float | None = None,
max_sugar_g: float | None = None,
max_carbs_g: float | None = None,
max_sodium_mg: float | None = None,
excluded_ids: list[int] | None = None,
) -> list[dict]:
"""Find recipes containing any of the given ingredient names.
Scores by match count and returns highest-scoring first.
Uses the FTS5 index (migration 015) when available: O(log N) per query.
Falls back to LIKE scans on older databases.
Nutrition filters use NULL-passthrough: rows without nutrition data
always pass (they may be estimated or absent entirely).
"""
if not ingredient_names:
return []
extra_clauses: list[str] = []
extra_params: list = []
if category:
extra_clauses.append("r.category = ?")
extra_params.append(category)
if max_calories is not None:
extra_clauses.append("(r.calories IS NULL OR r.calories <= ?)")
extra_params.append(max_calories)
if max_sugar_g is not None:
extra_clauses.append("(r.sugar_g IS NULL OR r.sugar_g <= ?)")
extra_params.append(max_sugar_g)
if max_carbs_g is not None:
extra_clauses.append("(r.carbs_g IS NULL OR r.carbs_g <= ?)")
extra_params.append(max_carbs_g)
if max_sodium_mg is not None:
extra_clauses.append("(r.sodium_mg IS NULL OR r.sodium_mg <= ?)")
extra_params.append(max_sodium_mg)
if excluded_ids:
placeholders = ",".join("?" * len(excluded_ids))
extra_clauses.append(f"r.id NOT IN ({placeholders})")
extra_params.extend(excluded_ids)
where_extra = (" AND " + " AND ".join(extra_clauses)) if extra_clauses else ""
if self._fts_ready():
return self._search_recipes_fts(
ingredient_names, limit, where_extra, extra_params
)
return self._search_recipes_like(
ingredient_names, limit, where_extra, extra_params
)
def _search_recipes_fts(
self,
ingredient_names: list[str],
limit: int,
where_extra: str,
extra_params: list,
) -> list[dict]:
"""FTS5-backed ingredient search. Candidates fetched via inverted index;
match_count computed in Python over the small candidate set."""
fts_query = self._build_fts_query(ingredient_names)
if not fts_query:
return []
# Pull up to 10× limit candidates so ranking has enough headroom.
sql = f"""
SELECT r.*
FROM recipes_fts
JOIN recipes r ON r.id = recipes_fts.rowid
WHERE recipes_fts MATCH ?
{where_extra}
LIMIT ?
"""
rows = self._fetch_all(sql, (fts_query, *extra_params, limit * 10))
pantry_set = {n.lower().strip() for n in ingredient_names}
scored: list[dict] = []
for row in rows:
raw = row.get("ingredient_names") or []
names: list[str] = raw if isinstance(raw, list) else json.loads(raw or "[]")
match_count = sum(1 for n in names if n.lower() in pantry_set)
scored.append({**row, "match_count": match_count})
scored.sort(key=lambda r: (-r["match_count"], r["id"]))
return scored[:limit]
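The Python-side ranking above can be demonstrated in isolation: rows come back from FTS in arbitrary order, match_count is computed against the pantry set, and ties break on id for deterministic output. The rows here are made-up fixtures, not real corpus data:

```python
import json

rows = [
    {"id": 7, "ingredient_names": json.dumps(["flour", "egg", "milk"])},
    {"id": 3, "ingredient_names": json.dumps(["egg", "milk", "sugar"])},
    {"id": 5, "ingredient_names": json.dumps(["flour", "water"])},
]
pantry_set = {"egg", "milk"}

scored = []
for row in rows:
    # ingredient_names is stored as a JSON array; count pantry overlaps.
    names = json.loads(row["ingredient_names"])
    match_count = sum(1 for n in names if n.lower() in pantry_set)
    scored.append({**row, "match_count": match_count})

# Highest match_count first; id ascending breaks ties deterministically.
scored.sort(key=lambda r: (-r["match_count"], r["id"]))
ranked_ids = [r["id"] for r in scored]
```

Ids 3 and 7 both match twice, so the lower id wins the tie and id 5 (no matches) sorts last.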
def _search_recipes_like(
self,
ingredient_names: list[str],
limit: int,
where_extra: str,
extra_params: list,
) -> list[dict]:
"""Legacy LIKE-based ingredient search (O(N×rows) — slow on large corpora)."""
like_params = [f'%"{n}"%' for n in ingredient_names]
like_clauses = " OR ".join(
"r.ingredient_names LIKE ?" for _ in ingredient_names
)
match_score = " + ".join(
"CASE WHEN r.ingredient_names LIKE ? THEN 1 ELSE 0 END"
for _ in ingredient_names
)
sql = f"""
SELECT r.*, ({match_score}) AS match_count
FROM recipes r
WHERE ({like_clauses})
{where_extra}
ORDER BY match_count DESC, r.id ASC
LIMIT ?
"""
all_params = like_params + like_params + extra_params + [limit]
return self._fetch_all(sql, tuple(all_params))
def get_recipe(self, recipe_id: int) -> dict | None:
return self._fetch_one("SELECT * FROM recipes WHERE id = ?", (recipe_id,))
# ── rate limits ───────────────────────────────────────────────────────
def check_and_increment_rate_limit(
self, feature: str, daily_max: int
) -> tuple[bool, int]:
"""Check daily counter for feature; only increment if under the limit.
Returns (allowed, current_count). Rejected calls do not consume quota."""
from datetime import date
today = date.today().isoformat()
row = self._fetch_one(
"SELECT count FROM rate_limits WHERE feature = ? AND window_date = ?",
(feature, today),
)
current = row["count"] if row else 0
if current >= daily_max:
return (False, current)
self.conn.execute("""
INSERT INTO rate_limits (feature, window_date, count)
VALUES (?, ?, 1)
ON CONFLICT(feature, window_date) DO UPDATE SET count = count + 1
""", (feature, today))
self.conn.commit()
return (True, current + 1)
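The check-then-increment pattern above can be exercised against an in-memory table. The real schema lives in the migrations and may differ; this sketch assumes only the UNIQUE constraint on (feature, window_date) that ON CONFLICT requires:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE rate_limits (
        feature TEXT NOT NULL,
        window_date TEXT NOT NULL,
        count INTEGER NOT NULL DEFAULT 0,
        UNIQUE (feature, window_date)
    )
""")

def check_and_increment(feature: str, today: str, daily_max: int) -> tuple[bool, int]:
    row = conn.execute(
        "SELECT count FROM rate_limits WHERE feature = ? AND window_date = ?",
        (feature, today),
    ).fetchone()
    current = row[0] if row else 0
    if current >= daily_max:
        return (False, current)  # rejected calls do not consume quota
    conn.execute(
        "INSERT INTO rate_limits (feature, window_date, count) VALUES (?, ?, 1) "
        "ON CONFLICT(feature, window_date) DO UPDATE SET count = count + 1",
        (feature, today),
    )
    conn.commit()
    return (True, current + 1)

# Two allowed calls, then the daily cap of 2 rejects the third.
results = [check_and_increment("llm_recipes", "2024-01-01", 2) for _ in range(3)]
```

Because the rejection path returns before the INSERT, a client hammering a capped feature never inflates the counter.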
# ── user settings ────────────────────────────────────────────────────
def get_setting(self, key: str) -> str | None:
"""Return the value for a settings key, or None if not set."""
row = self._fetch_one(
"SELECT value FROM user_settings WHERE key = ?", (key,)
)
return row["value"] if row else None
def set_setting(self, key: str, value: str) -> None:
"""Upsert a settings key-value pair."""
self.conn.execute(
"INSERT INTO user_settings (key, value) VALUES (?, ?)"
" ON CONFLICT(key) DO UPDATE SET value = excluded.value",
(key, value),
)
self.conn.commit()
# ── substitution feedback ─────────────────────────────────────────────
def log_substitution_feedback(
self,
original: str,
substitute: str,
constraint: str | None,
compensation_used: list[str],
approved: bool,
opted_in: bool,
) -> None:
self.conn.execute("""
INSERT INTO substitution_feedback
(original_name, substitute_name, constraint_label,
compensation_used, approved, opted_in)
VALUES (?,?,?,?,?,?)
""", (
original, substitute, constraint,
self._dump(compensation_used),
int(approved), int(opted_in),
))
self.conn.commit()

app/main.py (new file)
@@ -0,0 +1,55 @@
#!/usr/bin/env python
# app/main.py
import logging
from contextlib import asynccontextmanager
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from app.api.routes import api_router
from app.core.config import settings
logger = logging.getLogger(__name__)
@asynccontextmanager
async def lifespan(app: FastAPI):
logger.info("Starting Kiwi API...")
settings.ensure_dirs()
# Start LLM background task scheduler
from app.tasks.scheduler import get_scheduler
get_scheduler(settings.DB_PATH)
logger.info("Task scheduler started.")
yield
# Graceful scheduler shutdown
from app.tasks.scheduler import get_scheduler, reset_scheduler
get_scheduler(settings.DB_PATH).shutdown(timeout=10.0)
reset_scheduler()
logger.info("Kiwi API shutting down.")
app = FastAPI(
title=settings.PROJECT_NAME,
description="Pantry tracking + leftover recipe suggestions",
version="0.1.0",
lifespan=lifespan,
)
app.add_middleware(
CORSMiddleware,
allow_origins=settings.CORS_ORIGINS,
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
app.include_router(api_router, prefix=settings.API_PREFIX)
@app.get("/")
async def root():
return {"service": "kiwi-api", "docs": "/docs"}
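The lifespan wiring in main.py follows FastAPI's async-context-manager protocol: code before `yield` runs at startup, the app serves while suspended at `yield`, and code after `yield` runs at shutdown. A stdlib-only sketch of that ordering (no FastAPI import; the `events` list just records the sequence):

```python
import asyncio
from contextlib import asynccontextmanager

events: list[str] = []

@asynccontextmanager
async def lifespan(app: object):
    events.append("startup")    # e.g. settings.ensure_dirs(), start scheduler
    yield
    events.append("shutdown")   # e.g. scheduler.shutdown(timeout=10.0)

async def serve() -> None:
    # FastAPI does the equivalent of this `async with` around request handling.
    async with lifespan(None):
        events.append("handling requests")

asyncio.run(serve())
```

The shutdown branch is guaranteed to run even if serving raises, which is why the scheduler teardown lives after the `yield` rather than in a signal handler.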

app/models/__init__.py (new file)
@@ -0,0 +1,5 @@
# app/models/__init__.py
"""
Data models for Kiwi.
Contains domain models and Pydantic schemas.
"""

app/models/domain/__init__.py (new file)
@@ -0,0 +1,5 @@
# app/models/domain/__init__.py
"""
Domain models for Kiwi.
These represent the core business entities.
"""

app/models/schemas/__init__.py (new file)
@@ -0,0 +1,4 @@
from app.models.schemas.receipt import ReceiptResponse
from app.models.schemas.quality import QualityAssessment
__all__ = ["ReceiptResponse", "QualityAssessment"]

app/models/schemas/inventory.py (new file)
@@ -0,0 +1,143 @@
"""Pydantic schemas for inventory management (integer IDs, SQLite-compatible)."""
from __future__ import annotations
from datetime import date, datetime
from typing import Any, Dict, List, Optional
from pydantic import BaseModel, Field
# ── Tags ──────────────────────────────────────────────────────────────────────
class TagCreate(BaseModel):
name: str = Field(..., max_length=100)
slug: str = Field(..., max_length=100)
description: Optional[str] = None
color: Optional[str] = Field(None, max_length=7)
category: Optional[str] = None
class TagResponse(BaseModel):
id: int
name: str
slug: str
description: Optional[str]
color: Optional[str]
category: Optional[str]
created_at: str
updated_at: str
model_config = {"from_attributes": True}
# ── Products ──────────────────────────────────────────────────────────────────
class ProductCreate(BaseModel):
name: str = Field(..., max_length=500)
barcode: Optional[str] = Field(None, max_length=50)
brand: Optional[str] = None
category: Optional[str] = None
description: Optional[str] = None
image_url: Optional[str] = None
nutrition_data: Dict[str, Any] = Field(default_factory=dict)
source: str = "manual"
source_data: Dict[str, Any] = Field(default_factory=dict)
class ProductUpdate(BaseModel):
name: Optional[str] = None
brand: Optional[str] = None
category: Optional[str] = None
description: Optional[str] = None
image_url: Optional[str] = None
nutrition_data: Optional[Dict[str, Any]] = None
class ProductResponse(BaseModel):
id: int
barcode: Optional[str]
name: str
brand: Optional[str]
category: Optional[str]
description: Optional[str]
image_url: Optional[str]
nutrition_data: Dict[str, Any]
source: str
created_at: str
updated_at: str
model_config = {"from_attributes": True}
# ── Inventory Items ───────────────────────────────────────────────────────────
class InventoryItemCreate(BaseModel):
product_id: int
quantity: float = Field(default=1.0, gt=0)
unit: str = "count"
location: str
sublocation: Optional[str] = None
purchase_date: Optional[date] = None
expiration_date: Optional[date] = None
notes: Optional[str] = None
source: str = "manual"
class InventoryItemUpdate(BaseModel):
quantity: Optional[float] = Field(None, gt=0)
unit: Optional[str] = None
location: Optional[str] = None
sublocation: Optional[str] = None
expiration_date: Optional[date] = None
status: Optional[str] = None
notes: Optional[str] = None
class InventoryItemResponse(BaseModel):
id: int
product_id: int
product_name: Optional[str] = None
barcode: Optional[str] = None
category: Optional[str] = None
quantity: float
unit: str
location: str
sublocation: Optional[str]
purchase_date: Optional[str]
expiration_date: Optional[str]
status: str
notes: Optional[str]
source: str
created_at: str
updated_at: str
model_config = {"from_attributes": True}
# ── Barcode scan ──────────────────────────────────────────────────────────────
class BarcodeScanResult(BaseModel):
barcode: str
barcode_type: str
product: Optional[ProductResponse]
inventory_item: Optional[InventoryItemResponse]
added_to_inventory: bool
message: str
class BarcodeScanResponse(BaseModel):
success: bool
barcodes_found: int
results: List[BarcodeScanResult]
message: str
# ── Stats ─────────────────────────────────────────────────────────────────────
class InventoryStats(BaseModel):
total_items: int
available_items: int
expiring_soon: int
expired_items: int
locations: Dict[str, int]

app/models/schemas/ocr.py (new file)
@@ -0,0 +1,138 @@
#!/usr/bin/env python
"""
Pydantic schemas for OCR data models.
"""
from datetime import datetime, date, time
from typing import Optional, List, Dict, Any
from uuid import UUID
from pydantic import BaseModel, Field
class MerchantInfo(BaseModel):
"""Merchant/store information from receipt."""
name: Optional[str] = None
address: Optional[str] = None
phone: Optional[str] = None
email: Optional[str] = None
website: Optional[str] = None
tax_id: Optional[str] = None
class TransactionInfo(BaseModel):
"""Transaction details from receipt."""
date: Optional[date] = None
time: Optional[time] = None
receipt_number: Optional[str] = None
register: Optional[str] = None
cashier: Optional[str] = None
transaction_id: Optional[str] = None
class ReceiptItem(BaseModel):
"""Individual line item from receipt."""
name: str
quantity: float = 1.0
unit_price: Optional[float] = None
total_price: float
category: Optional[str] = None
tax_code: Optional[str] = None
discount: Optional[float] = 0.0
barcode: Optional[str] = None
notes: Optional[str] = None
class ReceiptTotals(BaseModel):
"""Financial totals from receipt."""
subtotal: float
tax: Optional[float] = 0.0
discount: Optional[float] = 0.0
tip: Optional[float] = 0.0
total: float
payment_method: Optional[str] = None
amount_paid: Optional[float] = None
change: Optional[float] = 0.0
calculated_subtotal: Optional[float] = None # For validation
class ConfidenceScores(BaseModel):
"""Confidence scores for extracted data."""
overall: float = Field(ge=0.0, le=1.0)
merchant: Optional[float] = Field(default=0.5, ge=0.0, le=1.0)
items: Optional[float] = Field(default=0.5, ge=0.0, le=1.0)
totals: Optional[float] = Field(default=0.5, ge=0.0, le=1.0)
transaction: Optional[float] = Field(default=0.5, ge=0.0, le=1.0)
class OCRResult(BaseModel):
"""Complete OCR extraction result."""
merchant: MerchantInfo
transaction: TransactionInfo
items: List[ReceiptItem]
totals: ReceiptTotals
confidence: ConfidenceScores
raw_text: Optional[str] = None
warnings: List[str] = Field(default_factory=list)
processing_time: Optional[float] = None # seconds
class ReceiptDataCreate(BaseModel):
"""Schema for creating receipt data."""
receipt_id: UUID
merchant_name: Optional[str] = None
merchant_address: Optional[str] = None
merchant_phone: Optional[str] = None
transaction_date: Optional[date] = None
transaction_time: Optional[time] = None
receipt_number: Optional[str] = None
items: List[Dict[str, Any]] = Field(default_factory=list)
subtotal: Optional[float] = None
tax: Optional[float] = None
tip: Optional[float] = None
total: Optional[float] = None
payment_method: Optional[str] = None
raw_text: Optional[str] = None
confidence_scores: Optional[Dict[str, float]] = None
warnings: List[str] = Field(default_factory=list)
class ReceiptDataResponse(BaseModel):
"""Schema for receipt data response."""
id: UUID
receipt_id: UUID
merchant_name: Optional[str]
merchant_address: Optional[str]
merchant_phone: Optional[str]
transaction_date: Optional[date]
transaction_time: Optional[time]
receipt_number: Optional[str]
items: List[Dict[str, Any]]
subtotal: Optional[float]
tax: Optional[float]
tip: Optional[float]
total: Optional[float]
payment_method: Optional[str]
raw_text: Optional[str]
confidence_scores: Optional[Dict[str, float]]
warnings: List[str]
created_at: datetime
updated_at: datetime
    model_config = {"from_attributes": True}
class OCRStatusResponse(BaseModel):
"""OCR processing status response."""
receipt_id: UUID
ocr_completed: bool
has_data: bool
confidence: Optional[float] = None
item_count: Optional[int] = None
warnings: List[str] = Field(default_factory=list)
class OCRTriggerRequest(BaseModel):
"""Request to trigger OCR processing."""
force_reprocess: bool = False
use_quantization: bool = False

app/models/schemas/quality.py (new file)
@@ -0,0 +1,17 @@
"""Quality assessment schemas (integer IDs, SQLite-compatible)."""
from __future__ import annotations
from typing import Any, Dict, List
from pydantic import BaseModel
class QualityAssessment(BaseModel):
id: int
receipt_id: int
overall_score: float
is_acceptable: bool
metrics: Dict[str, Any] = {}
improvement_suggestions: List[str] = []
created_at: str
model_config = {"from_attributes": True}

app/models/schemas/receipt.py (new file)
@@ -0,0 +1,46 @@
"""Receipt schemas (integer IDs, SQLite-compatible)."""
from __future__ import annotations
from typing import Any, Dict, List, Optional
from pydantic import BaseModel, Field
class ReceiptResponse(BaseModel):
id: int
filename: str
status: str
error: Optional[str] = None
metadata: Dict[str, Any] = {}
created_at: str
updated_at: str
model_config = {"from_attributes": True}
class ApproveOCRRequest(BaseModel):
"""Approve staged OCR items for inventory population.
item_indices: which items (by 0-based index) to approve.
Omit or pass null to approve all items.
location: pantry location for created inventory items.
"""
item_indices: Optional[List[int]] = Field(
default=None,
description="0-based indices of items to approve. Null = approve all.",
)
location: str = Field(default="pantry")
class ApprovedInventoryItem(BaseModel):
inventory_id: int
product_name: str
quantity: float
location: str
expiration_date: Optional[str] = None
class ApproveOCRResponse(BaseModel):
receipt_id: int
approved: int
skipped: int
inventory_items: List[ApprovedInventoryItem]

@@ -0,0 +1,81 @@
"""Pydantic schemas for the recipe engine API."""
from __future__ import annotations
from pydantic import BaseModel, Field
class SwapCandidate(BaseModel):
original_name: str
substitute_name: str
constraint_label: str
explanation: str
compensation_hints: list[dict] = Field(default_factory=list)
class NutritionPanel(BaseModel):
"""Per-recipe macro summary. All values are per-serving when servings is known,
otherwise for the full recipe. None means data is unavailable."""
calories: float | None = None
fat_g: float | None = None
protein_g: float | None = None
carbs_g: float | None = None
fiber_g: float | None = None
sugar_g: float | None = None
sodium_mg: float | None = None
servings: float | None = None
estimated: bool = False # True when nutrition was inferred from ingredient profiles
class RecipeSuggestion(BaseModel):
id: int
title: str
match_count: int
element_coverage: dict[str, float] = Field(default_factory=dict)
swap_candidates: list[SwapCandidate] = Field(default_factory=list)
missing_ingredients: list[str] = Field(default_factory=list)
directions: list[str] = Field(default_factory=list)
prep_notes: list[str] = Field(default_factory=list)
notes: str = ""
level: int = 1
is_wildcard: bool = False
nutrition: NutritionPanel | None = None
class GroceryLink(BaseModel):
ingredient: str
retailer: str
url: str
class RecipeResult(BaseModel):
suggestions: list[RecipeSuggestion]
element_gaps: list[str]
grocery_list: list[str] = Field(default_factory=list)
grocery_links: list[GroceryLink] = Field(default_factory=list)
rate_limited: bool = False
rate_limit_count: int = 0
class NutritionFilters(BaseModel):
"""Optional per-serving upper bounds for macro filtering. None = no filter."""
max_calories: float | None = None
max_sugar_g: float | None = None
max_carbs_g: float | None = None
max_sodium_mg: float | None = None
class RecipeRequest(BaseModel):
pantry_items: list[str]
level: int = Field(default=1, ge=1, le=4)
constraints: list[str] = Field(default_factory=list)
expiry_first: bool = False
hard_day_mode: bool = False
max_missing: int | None = None
style_id: str | None = None
category: str | None = None
tier: str = "free"
has_byok: bool = False
wildcard_confirmed: bool = False
allergies: list[str] = Field(default_factory=list)
nutrition_filters: NutritionFilters = Field(default_factory=NutritionFilters)
excluded_ids: list[int] = Field(default_factory=list)

app/services/__init__.py (new file)
@@ -0,0 +1,8 @@
# app/services/__init__.py
"""
Business logic services for Kiwi.
"""
from app.services.receipt_service import ReceiptService
__all__ = ["ReceiptService"]

@@ -0,0 +1,396 @@
"""
Barcode scanning service using pyzbar.
This module provides functionality to detect and decode barcodes
from images (UPC, EAN, QR codes, etc.).
"""
import io
import cv2
import numpy as np
from pyzbar import pyzbar
from pathlib import Path
from typing import List, Dict, Any, Optional
import logging
try:
from PIL import Image as _PILImage
_HAS_PIL = True
except ImportError:
_HAS_PIL = False
logger = logging.getLogger(__name__)
class BarcodeScanner:
"""
Service for scanning barcodes from images.
Supports various barcode formats:
- UPC-A, UPC-E
- EAN-8, EAN-13
- Code 39, Code 128
- QR codes
- And more via pyzbar/libzbar
"""
def scan_image(self, image_path: Path) -> List[Dict[str, Any]]:
"""
Scan an image for barcodes.
Args:
image_path: Path to the image file
Returns:
List of detected barcodes, each as a dictionary with:
- data: Barcode data (string)
- type: Barcode type (e.g., 'EAN13', 'QRCODE')
- quality: Quality score (0-100)
- rect: Bounding box (x, y, width, height)
"""
try:
# Read image
image = cv2.imread(str(image_path))
if image is None:
logger.error(f"Failed to load image: {image_path}")
return []
# Convert to grayscale for better detection
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Try multiple preprocessing techniques and rotations for better detection
barcodes = []
# 1. Try on original grayscale
barcodes.extend(self._detect_barcodes(gray, image))
# 2. Try with adaptive thresholding (helps with poor lighting)
if not barcodes:
thresh = cv2.adaptiveThreshold(
gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
cv2.THRESH_BINARY, 11, 2
)
barcodes.extend(self._detect_barcodes(thresh, image))
# 3. Try with sharpening (helps with blurry images)
if not barcodes:
kernel = np.array([[-1, -1, -1],
[-1, 9, -1],
[-1, -1, -1]])
sharpened = cv2.filter2D(gray, -1, kernel)
barcodes.extend(self._detect_barcodes(sharpened, image))
# 4. Try rotations if still no barcodes found (handles tilted/rotated barcodes)
if not barcodes:
logger.info("No barcodes found in standard orientation, trying rotations...")
for angle in [90, 180, 270, 45, 135]:
rotated_gray = self._rotate_image(gray, angle)
rotated_color = self._rotate_image(image, angle)
detected = self._detect_barcodes(rotated_gray, rotated_color)
if detected:
logger.info(f"Found barcode(s) at {angle}° rotation")
barcodes.extend(detected)
break # Stop after first successful rotation
# Remove duplicates (same data)
unique_barcodes = self._deduplicate_barcodes(barcodes)
logger.info(f"Found {len(unique_barcodes)} barcode(s) in {image_path}")
return unique_barcodes
except Exception as e:
logger.error(f"Error scanning image {image_path}: {e}")
return []
def _detect_barcodes(
self,
image: np.ndarray,
original_image: np.ndarray
) -> List[Dict[str, Any]]:
"""
Detect barcodes in a preprocessed image.
Args:
image: Preprocessed image (grayscale)
original_image: Original color image (for quality assessment)
Returns:
List of detected barcodes
"""
detected = pyzbar.decode(image)
barcodes = []
for barcode in detected:
# Decode barcode data
barcode_data = barcode.data.decode("utf-8")
barcode_type = barcode.type
# Get bounding box
rect = barcode.rect
bbox = {
"x": rect.left,
"y": rect.top,
"width": rect.width,
"height": rect.height,
}
# Assess quality of barcode region
quality = self._assess_barcode_quality(original_image, bbox)
barcodes.append({
"data": barcode_data,
"type": barcode_type,
"quality": quality,
"rect": bbox,
})
return barcodes
def _assess_barcode_quality(
self,
image: np.ndarray,
bbox: Dict[str, int]
) -> int:
"""
Assess the quality of a detected barcode.
Args:
image: Original image
bbox: Bounding box of barcode
Returns:
Quality score (0-100)
"""
try:
# Extract barcode region
x, y, w, h = bbox["x"], bbox["y"], bbox["width"], bbox["height"]
# Add padding
pad = 10
y1 = max(0, y - pad)
y2 = min(image.shape[0], y + h + pad)
x1 = max(0, x - pad)
x2 = min(image.shape[1], x + w + pad)
region = image[y1:y2, x1:x2]
if region.size == 0:
return 50
# Convert to grayscale if needed
if len(region.shape) == 3:
region = cv2.cvtColor(region, cv2.COLOR_BGR2GRAY)
# Calculate sharpness (Laplacian variance)
laplacian_var = cv2.Laplacian(region, cv2.CV_64F).var()
sharpness_score = min(100, laplacian_var / 10) # Normalize
# Calculate contrast
min_val, max_val = region.min(), region.max()
contrast = (max_val - min_val) / 255.0 * 100
# Calculate size score (larger is better, up to a point)
area = w * h
size_score = min(100, area / 100) # Normalize
# Weighted average
quality = (sharpness_score * 0.4 + contrast * 0.4 + size_score * 0.2)
return int(quality)
except Exception as e:
logger.warning(f"Error assessing barcode quality: {e}")
return 50
def _rotate_image(self, image: np.ndarray, angle: float) -> np.ndarray:
"""
Rotate an image by a given angle.
Args:
image: Input image
angle: Rotation angle in degrees (any angle, but optimized for 90° increments)
Returns:
Rotated image
"""
# Use fast optimized rotation for common angles
if angle == 90:
return cv2.rotate(image, cv2.ROTATE_90_CLOCKWISE)
elif angle == 180:
return cv2.rotate(image, cv2.ROTATE_180)
elif angle == 270:
return cv2.rotate(image, cv2.ROTATE_90_COUNTERCLOCKWISE)
elif angle == 0:
return image
else:
# For arbitrary angles, use affine transformation
(h, w) = image.shape[:2]
center = (w // 2, h // 2)
# Get rotation matrix
M = cv2.getRotationMatrix2D(center, angle, 1.0)
# Calculate new bounding dimensions
cos = np.abs(M[0, 0])
sin = np.abs(M[0, 1])
new_w = int((h * sin) + (w * cos))
new_h = int((h * cos) + (w * sin))
# Adjust rotation matrix for new dimensions
M[0, 2] += (new_w / 2) - center[0]
M[1, 2] += (new_h / 2) - center[1]
# Perform rotation
return cv2.warpAffine(image, M, (new_w, new_h),
flags=cv2.INTER_CUBIC,
borderMode=cv2.BORDER_REPLICATE)
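The expanded-canvas arithmetic in `_rotate_image` can be checked with pure math, without cv2: for an arbitrary angle the new bounding box is (h·|sin| + w·|cos|, h·|cos| + w·|sin|), which guarantees no pixels are clipped after the warp:

```python
import math

def rotated_bounds(w: int, h: int, angle_deg: float) -> tuple[int, int]:
    """New (width, height) that fully contains a w x h image rotated by angle_deg."""
    rad = math.radians(angle_deg)
    cos, sin = abs(math.cos(rad)), abs(math.sin(rad))
    # Same formulas as the new_w / new_h computation above.
    return int(h * sin + w * cos), int(h * cos + w * sin)

print(rotated_bounds(100, 200, 90))   # width and height swap at 90 degrees
print(rotated_bounds(100, 200, 45))   # diagonal rotation needs a larger square canvas
```

This is why the rotation matrix's translation terms are adjusted: the center of the original image must be re-mapped to the center of the larger canvas.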
def _deduplicate_barcodes(
self,
barcodes: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
"""
Remove duplicate barcodes (same data).
If multiple detections of the same barcode, keep the one
with the highest quality score.
Args:
barcodes: List of detected barcodes
Returns:
Deduplicated list
"""
seen = {}
for barcode in barcodes:
data = barcode["data"]
if data not in seen or barcode["quality"] > seen[data]["quality"]:
seen[data] = barcode
return list(seen.values())
def _fix_exif_orientation(self, image_bytes: bytes) -> bytes:
"""Apply EXIF orientation correction so cv2 sees an upright image.
Phone cameras embed rotation in EXIF; cv2.imdecode ignores it,
so a photo taken in portrait may arrive physically sideways in memory.
"""
if not _HAS_PIL:
return image_bytes
try:
            pil = _PILImage.open(io.BytesIO(image_bytes))
            # exif_transpose must run while the EXIF tag is still attached; a
            # numpy round-trip here would strip it and make this a no-op.
            import PIL.ImageOps
            pil = PIL.ImageOps.exif_transpose(pil)
buf = io.BytesIO()
pil.save(buf, format="JPEG")
return buf.getvalue()
except Exception:
return image_bytes
def scan_from_bytes(self, image_bytes: bytes) -> List[Dict[str, Any]]:
"""
Scan barcodes from image bytes (uploaded file).
Args:
image_bytes: Image data as bytes
Returns:
List of detected barcodes
"""
try:
# Apply EXIF orientation correction first (phone cameras embed rotation in EXIF;
# cv2.imdecode ignores it, causing sideways barcodes to appear rotated in memory).
image_bytes = self._fix_exif_orientation(image_bytes)
# Convert bytes to numpy array
nparr = np.frombuffer(image_bytes, np.uint8)
image = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
if image is None:
logger.error("Failed to decode image from bytes")
return []
# Convert to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Try multiple approaches for better detection
barcodes = []
# 1. Try original orientation
barcodes.extend(self._detect_barcodes(gray, image))
# 2. Try with adaptive thresholding
if not barcodes:
thresh = cv2.adaptiveThreshold(
gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
cv2.THRESH_BINARY, 11, 2
)
barcodes.extend(self._detect_barcodes(thresh, image))
# 3. Try all 90° rotations + common tilt angles
# 90/270 catches truly sideways barcodes; 180 catches upside-down;
# 45/135 catches tilted barcodes on flat surfaces.
if not barcodes:
logger.info("No barcodes found in uploaded image, trying rotations...")
for angle in [90, 180, 270, 45, 135]:
rotated_gray = self._rotate_image(gray, angle)
rotated_color = self._rotate_image(image, angle)
detected = self._detect_barcodes(rotated_gray, rotated_color)
if detected:
logger.info(f"Found barcode(s) in uploaded image at {angle}° rotation")
barcodes.extend(detected)
break
return self._deduplicate_barcodes(barcodes)
except Exception as e:
logger.error(f"Error scanning image from bytes: {e}")
return []
def validate_barcode(self, barcode: str, barcode_type: str) -> bool:
"""
Validate a barcode using check digits (for EAN/UPC).
Args:
barcode: Barcode string
barcode_type: Type of barcode (e.g., 'EAN13', 'UPCA')
Returns:
True if valid, False otherwise
"""
        if barcode_type in ["EAN13", "UPCA"]:
            # pyzbar reports UPC-A as 12 digits; zero-pad so the EAN-13
            # check-digit algorithm (a superset of UPC-A) applies.
            if barcode_type == "UPCA" and len(barcode) == 12:
                barcode = "0" + barcode
            return self._validate_ean13(barcode)
elif barcode_type == "EAN8":
return self._validate_ean8(barcode)
# For other types, assume valid if detected
return True
def _validate_ean13(self, barcode: str) -> bool:
"""Validate EAN-13 barcode using check digit."""
if len(barcode) != 13 or not barcode.isdigit():
return False
# Calculate check digit
odd_sum = sum(int(barcode[i]) for i in range(0, 12, 2))
even_sum = sum(int(barcode[i]) for i in range(1, 12, 2))
total = odd_sum + (even_sum * 3)
check_digit = (10 - (total % 10)) % 10
return int(barcode[12]) == check_digit
    def _validate_ean8(self, barcode: str) -> bool:
        """Validate EAN-8 barcode using check digit."""
        if len(barcode) != 8 or not barcode.isdigit():
            return False
        # EAN-8 weights alternate 3,1,3,1,... starting from the first digit,
        # so indices 0,2,4,6 carry weight 3 and indices 1,3,5 carry weight 1.
        weighted = sum(int(barcode[i]) * 3 for i in range(0, 7, 2))
        plain = sum(int(barcode[i]) for i in range(1, 7, 2))
        check_digit = (10 - ((weighted + plain) % 10)) % 10
        return int(barcode[7]) == check_digit
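The EAN-13 check-digit scheme used by `_validate_ean13` can be worked through on a known-good code: digits in odd positions (1st, 3rd, ...) get weight 1, even positions get weight 3, and the 13th digit makes the weighted sum a multiple of 10. 4006381333931 is a commonly cited valid EAN-13:

```python
def ean13_check_digit(first12: str) -> int:
    """Compute the 13th digit for the first 12 digits of an EAN-13 code."""
    odd_sum = sum(int(first12[i]) for i in range(0, 12, 2))    # weight 1
    even_sum = sum(int(first12[i]) for i in range(1, 12, 2))   # weight 3
    return (10 - (odd_sum + even_sum * 3) % 10) % 10

print(ean13_check_digit("400638133393"))  # → 1, so 4006381333931 validates
```

Working it out: the weight-1 digits sum to 20, the weight-3 digits to 23 (x3 = 69), total 89, so the check digit is (10 - 9) % 10 = 1.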

@@ -0,0 +1,450 @@
"""
Expiration Date Prediction Service.
Predicts expiration dates for food items based on category and storage location.
Fast path: deterministic lookup table (USDA FoodKeeper / FDA guidelines).
Fallback path: LLMRouter fires only for unknown products, when the tier
allows it and an LLM backend is configured.
"""
import logging
import re
from datetime import date, timedelta
from typing import Optional, List
from circuitforge_core.llm.router import LLMRouter
from app.tiers import can_use
logger = logging.getLogger(__name__)
class ExpirationPredictor:
"""Predict expiration dates based on product category and storage location."""
# Canonical location names and their aliases.
# All location strings are normalised through this before table lookup.
LOCATION_ALIASES: dict[str, str] = {
'garage_freezer': 'freezer',
'chest_freezer': 'freezer',
'deep_freezer': 'freezer',
'upright_freezer': 'freezer',
'refrigerator': 'fridge',
'frig': 'fridge',
'cupboard': 'cabinet',
'shelf': 'pantry',
'counter': 'pantry',
}
# When a category has no entry for the requested location, try these
# alternatives in order — prioritising same-temperature storage first.
LOCATION_FALLBACK: dict[str, tuple[str, ...]] = {
'freezer': ('freezer', 'fridge', 'pantry', 'cabinet'),
'fridge': ('fridge', 'pantry', 'cabinet', 'freezer'),
'pantry': ('pantry', 'cabinet', 'fridge', 'freezer'),
'cabinet': ('cabinet', 'pantry', 'fridge', 'freezer'),
}
# Default shelf life in days by category and location
# Sources: USDA FoodKeeper app, FDA guidelines
SHELF_LIFE = {
# Dairy
'dairy': {'fridge': 7, 'freezer': 90},
'milk': {'fridge': 7, 'freezer': 90},
'cheese': {'fridge': 21, 'freezer': 180},
'yogurt': {'fridge': 14, 'freezer': 60},
'butter': {'fridge': 30, 'freezer': 365},
'cream': {'fridge': 5, 'freezer': 60},
# Meat & Poultry
'meat': {'fridge': 3, 'freezer': 180},
'beef': {'fridge': 3, 'freezer': 270},
'pork': {'fridge': 3, 'freezer': 180},
'lamb': {'fridge': 3, 'freezer': 270},
'poultry': {'fridge': 2, 'freezer': 270},
'chicken': {'fridge': 2, 'freezer': 270},
'turkey': {'fridge': 2, 'freezer': 270},
'tempeh': {'fridge': 10, 'freezer': 365},
'tofu': {'fridge': 5, 'freezer': 180},
'ground_meat': {'fridge': 2, 'freezer': 120},
# Seafood
'fish': {'fridge': 2, 'freezer': 180},
'seafood': {'fridge': 2, 'freezer': 180},
'shrimp': {'fridge': 2, 'freezer': 180},
'salmon': {'fridge': 2, 'freezer': 180},
# Eggs
'eggs': {'fridge': 35, 'freezer': None},
# Produce
'vegetables': {'fridge': 7, 'pantry': 5, 'freezer': 270},
'fruits': {'fridge': 7, 'pantry': 5, 'freezer': 365},
'leafy_greens': {'fridge': 5, 'freezer': 270},
'berries': {'fridge': 5, 'freezer': 270},
'apples': {'fridge': 30, 'pantry': 14},
'bananas': {'pantry': 5, 'fridge': 7},
'citrus': {'fridge': 21, 'pantry': 7},
# Bread & Bakery
'bread': {'pantry': 5, 'freezer': 90},
'bakery': {'pantry': 3, 'fridge': 7, 'freezer': 90},
# Frozen
'frozen_foods': {'freezer': 180, 'fridge': 3},
'frozen_vegetables': {'freezer': 270, 'fridge': 4},
'frozen_fruit': {'freezer': 365, 'fridge': 4},
'ice_cream': {'freezer': 60},
# Pantry Staples
'canned_goods': {'pantry': 730, 'cabinet': 730},
'dry_goods': {'pantry': 365, 'cabinet': 365},
'pasta': {'pantry': 730, 'cabinet': 730},
'rice': {'pantry': 730, 'cabinet': 730},
'flour': {'pantry': 180, 'cabinet': 180},
'sugar': {'pantry': 730, 'cabinet': 730},
'cereal': {'pantry': 180, 'cabinet': 180},
'chips': {'pantry': 90, 'cabinet': 90},
'cookies': {'pantry': 90, 'cabinet': 90},
# Condiments
'condiments': {'fridge': 90, 'pantry': 180},
'ketchup': {'fridge': 180, 'pantry': 365},
'mustard': {'fridge': 365, 'pantry': 365},
'mayo': {'fridge': 60, 'pantry': 180},
'salad_dressing': {'fridge': 90, 'pantry': 180},
'soy_sauce': {'fridge': 730, 'pantry': 730},
# Beverages
'beverages': {'fridge': 14, 'pantry': 180},
'juice': {'fridge': 7, 'freezer': 90},
'soda': {'fridge': 270, 'pantry': 270},
'water': {'fridge': 365, 'pantry': 365},
# Other
'deli_meat': {'fridge': 5, 'freezer': 60},
'leftovers': {'fridge': 4, 'freezer': 90},
'prepared_foods': {'fridge': 4, 'freezer': 90},
}
# Keyword lists are checked in declaration order — most specific first.
# Rules:
# - canned/processed goods BEFORE raw-meat terms (canned chicken != raw chicken)
# - frozen prepared foods BEFORE generic protein terms
# - multi-word phrases before single words where ambiguity exists
CATEGORY_KEYWORDS = {
# ── Frozen prepared foods ─────────────────────────────────────────────
# Before raw protein entries so plant-based frozen products don't
# inherit 2-3 day raw-meat shelf lives.
'ice_cream': ['ice cream', 'gelato', 'frozen yogurt', 'sorbet', 'sherbet'],
'frozen_fruit': [
'frozen berries', 'frozen mango', 'frozen strawberries',
'frozen blueberries', 'frozen raspberries', 'frozen peaches',
'frozen fruit', 'frozen cherries',
],
'frozen_vegetables': [
'frozen veg', 'frozen corn', 'frozen peas', 'frozen broccoli',
'frozen spinach', 'frozen edamame', 'frozen green beans',
'frozen mixed vegetables', 'frozen carrots',
'peas & carrots', 'peas and carrots', 'mixed vegetables',
'spring rolls', 'vegetable spring rolls',
],
'frozen_foods': [
'plant-based', 'plant based', 'meatless', 'impossible',
"chik'n", 'chikn', 'veggie burger', 'veggie patty',
'nugget', 'tater tot', 'waffle fries', 'hash brown',
'onion ring', 'fish stick', 'fish fillet', 'potsticker',
'dumpling', 'egg roll', 'empanada', 'tamale', 'falafel',
'mac & cheese bite', 'cauliflower wing', 'ranchero potato',
],
# ── Canned / shelf-stable processed goods ─────────────────────────────
# Before raw protein keywords so "canned chicken", "cream of chicken",
# and "lentil soup" resolve here rather than to raw chicken/cream.
'canned_goods': [
'canned', 'can of', 'tin of', 'tinned',
'cream of ', 'condensed soup', 'condensed cream',
'baked beans', 'refried beans',
'canned beans', 'canned tomatoes', 'canned corn', 'canned peas',
'canned soup', 'canned tuna', 'canned salmon', 'canned chicken',
'canned fruit', 'canned peaches', 'canned pears',
'enchilada sauce', 'tomato sauce', 'tomato paste',
'lentil soup', 'bean soup', 'chicken noodle soup',
],
# ── Condiments & brined items ─────────────────────────────────────────
# Before produce/protein terms so brined olives, jarred peppers, etc.
# don't inherit raw vegetable shelf lives.
'ketchup': ['ketchup', 'catsup'],
'mustard': ['mustard', 'dijon', 'dijion', 'stoneground mustard'],
'mayo': ['mayo', 'mayonnaise', 'miracle whip'],
'soy_sauce': ['soy sauce', 'tamari', 'shoyu'],
'salad_dressing': ['salad dressing', 'ranch', 'italian dressing', 'vinaigrette'],
'condiments': [
# brined / jarred items
'dill chips', 'hamburger chips', 'gherkin',
'olive', 'capers', 'jalapeño', 'jalapeno', 'pepperoncini',
'pimiento', 'banana pepper', 'cornichon',
# sauces
'hot sauce', 'hot pepper sauce', 'sriracha', 'cholula',
'worcestershire', 'barbecue sauce', 'bbq sauce',
'chipotle sauce', 'chipotle mayo', 'chipotle creamy',
'salsa', 'chutney', 'relish',
'teriyaki', 'hoisin', 'oyster sauce', 'fish sauce',
'miso', 'ssamjang', 'gochujang', 'doenjang',
'soybean paste', 'fermented soybean',
# nut butters / spreads
'peanut butter', 'almond butter', 'tahini', 'hummus',
# seasoning mixes
'seasoning', 'spice blend', 'borracho',
# other shelf-stable sauces
'yuzu', 'ponzu', 'lizano',
],
# ── Soy / fermented proteins ──────────────────────────────────────────
'tempeh': ['tempeh'],
'tofu': ['tofu', 'bean curd'],
# ── Dairy ─────────────────────────────────────────────────────────────
'milk': ['milk', 'whole milk', '2% milk', 'skim milk', 'almond milk', 'oat milk', 'soy milk'],
'cheese': ['cheese', 'cheddar', 'mozzarella', 'swiss', 'parmesan', 'feta', 'gouda', 'velveeta'],
'yogurt': ['yogurt', 'greek yogurt', 'yoghurt'],
'butter': ['butter', 'margarine'],
# Bare 'cream' removed — "cream of X" is canned_goods (matched above).
'cream': ['heavy cream', 'whipping cream', 'sour cream', 'crème fraîche',
'cream cheese', 'whipped topping', 'whipped cream'],
'eggs': ['eggs', 'egg'],
# ── Raw proteins ──────────────────────────────────────────────────────
# After canned/frozen so "canned chicken" is already resolved above.
'salmon': ['salmon'],
'shrimp': ['shrimp', 'prawns'],
'fish': ['fish', 'cod', 'tilapia', 'halibut', 'pollock'],
# Specific chicken cuts only — bare 'chicken' handled in generic fallback
'chicken': ['chicken breast', 'chicken thigh', 'chicken wings', 'chicken leg',
'whole chicken', 'rotisserie chicken', 'raw chicken'],
'turkey': ['turkey breast', 'whole turkey'],
'ground_meat': ['ground beef', 'ground pork', 'ground chicken', 'ground turkey',
'ground lamb', 'ground bison'],
'pork': ['pork', 'bacon', 'ham', 'pork chop', 'pork loin'],
'beef': ['beef', 'steak', 'brisket', 'ribeye', 'sirloin', 'roast beef'],
'deli_meat': ['deli', 'sliced turkey', 'sliced ham', 'lunch meat', 'cold cuts',
'prosciutto', 'salami', 'pepperoni'],
# ── Produce ───────────────────────────────────────────────────────────
'leafy_greens': ['lettuce', 'spinach', 'kale', 'arugula', 'mixed greens'],
'berries': ['strawberries', 'blueberries', 'raspberries', 'blackberries'],
'apples': ['apple', 'apples'],
'bananas': ['banana', 'bananas'],
'citrus': ['orange', 'lemon', 'lime', 'grapefruit', 'tangerine'],
# ── Bakery ────────────────────────────────────────────────────────────
'bakery': [
'muffin', 'croissant', 'donut', 'danish', 'puff pastry', 'pastry puff',
'cinnamon roll', 'dinner roll', 'parkerhouse roll', 'scone',
],
'bread': ['bread', 'loaf', 'baguette', 'bagel', 'bun', 'pita', 'naan',
'english muffin', 'sourdough'],
# ── Dry pantry staples ────────────────────────────────────────────────
'pasta': ['pasta', 'spaghetti', 'penne', 'macaroni', 'noodles', 'couscous', 'orzo'],
'rice': ['rice', 'brown rice', 'white rice', 'jasmine rice', 'basmati',
'spanish rice', 'rice mix'],
'cereal': ['cereal', 'granola', 'oatmeal'],
'chips': ['chips', 'crisps', 'tortilla chips', 'pretzel', 'popcorn'],
'cookies': ['cookies', 'biscuits', 'crackers', 'graham cracker', 'wafer'],
# ── Beverages ─────────────────────────────────────────────────────────
'juice': ['juice', 'orange juice', 'apple juice', 'lemonade'],
'soda': ['soda', 'cola', 'sprite', 'pepsi', 'coke', 'carbonated soft drink'],
}
def __init__(self) -> None:
self._router: Optional[LLMRouter] = None
try:
self._router = LLMRouter()
except FileNotFoundError:
logger.debug("LLM config not found — expiry LLM fallback disabled")
except Exception as e:
logger.warning("LLMRouter init failed (%s) — expiry LLM fallback disabled", e)
# ── Public API ────────────────────────────────────────────────────────────
def predict_expiration(
self,
category: Optional[str],
location: str,
purchase_date: Optional[date] = None,
product_name: Optional[str] = None,
tier: str = "free",
has_byok: bool = False,
) -> Optional[date]:
"""
Predict expiration date.
Fast path: deterministic lookup table.
Fallback: LLM query when table has no match, tier allows it, and a
backend is configured. Returns None rather than crashing if
inference fails.
"""
if not purchase_date:
purchase_date = date.today()
days = self._lookup_days(category, location)
if days is None and product_name and self._router and can_use("expiry_llm_matching", tier, has_byok):
days = self._llm_predict_days(product_name, category, location)
if days is None:
return None
return purchase_date + timedelta(days=days)
def get_category_from_product(
self,
product_name: str,
product_category: Optional[str] = None,
tags: Optional[List[str]] = None,
location: Optional[str] = None,
) -> Optional[str]:
"""Determine category from product name, existing category, and tags.
location is used as a last-resort hint: unknown items in the freezer
default to frozen_foods rather than dry_goods.
"""
if product_category:
cat = product_category.lower().strip()
if cat in self.SHELF_LIFE:
return cat
for key in self.SHELF_LIFE:
if key in cat or cat in key:
return key
if tags:
for tag in tags:
t = tag.lower().strip()
if t in self.SHELF_LIFE:
return t
name = product_name.lower().strip()
for category, keywords in self.CATEGORY_KEYWORDS.items():
if any(kw in name for kw in keywords):
return category
# Generic single-word fallbacks — checked after the keyword dict so
# multi-word phrases (e.g. "canned chicken") already matched above.
for words, fallback in [
(['frozen'], 'frozen_foods'),
(['canned', 'tinned'], 'canned_goods'),
# bare 'chicken' / 'sausage' / 'ham' kept here so raw-meat names
# that don't appear in the specific keyword lists still resolve.
(['chicken', 'turkey'], 'poultry'),
(['sausage', 'ham', 'bacon'], 'pork'),
(['beef', 'steak'], 'beef'),
(['meat', 'pork'], 'meat'),
(['vegetable', 'veggie', 'produce'], 'vegetables'),
(['fruit'], 'fruits'),
(['dairy'], 'dairy'),
]:
if any(w in name for w in words):
return fallback
# Location-aware final fallback: unknown item in a freezer → frozen_foods.
# This handles unlabelled frozen products (e.g. "Birthday Littles",
# "Pulled BBQ Crumbles") without requiring every brand name to be listed.
canon_loc = self._normalize_location(location or '')
if canon_loc == 'freezer':
return 'frozen_foods'
return 'dry_goods'
def get_shelf_life_info(self, category: str, location: str) -> Optional[int]:
"""Shelf life in days for a given category + location, or None."""
return self._lookup_days(category, location)
def list_categories(self) -> List[str]:
return list(self.SHELF_LIFE.keys())
def list_locations(self) -> List[str]:
locations: set[str] = set()
for shelf_life in self.SHELF_LIFE.values():
locations.update(shelf_life.keys())
return sorted(locations)
# ── Private helpers ───────────────────────────────────────────────────────
def _normalize_location(self, location: str) -> str:
"""Resolve location aliases to canonical names."""
loc = location.lower().strip()
return self.LOCATION_ALIASES.get(loc, loc)
def _lookup_days(self, category: Optional[str], location: str) -> Optional[int]:
"""Pure deterministic lookup — no I/O.
Normalises location aliases (e.g. garage_freezer -> freezer) and uses
a context-aware fallback order so pantry items don't accidentally get
fridge shelf-life and vice versa.
"""
if not category:
return None
cat = category.lower().strip()
if cat not in self.SHELF_LIFE:
for key in self.SHELF_LIFE:
if key in cat or cat in key:
cat = key
break
else:
return None
canon_loc = self._normalize_location(location)
shelf = self.SHELF_LIFE[cat]
# Try the canonical location first, then work through the
# context-aware fallback chain for that location type.
fallback_order = self.LOCATION_FALLBACK.get(
canon_loc, (canon_loc, 'pantry', 'fridge', 'cabinet', 'freezer')
)
for loc in fallback_order:
days = shelf.get(loc)
if days is not None:
return days
return None
def _llm_predict_days(
self,
product_name: str,
category: Optional[str],
location: str,
) -> Optional[int]:
"""
Ask the LLM how many days this product keeps in the given location.
Prompt design goals:
- Give the LLM enough context to reason about food safety
- Specify the output format clearly (a single integer, nothing else)
- Err conservative (shorter shelf life) when uncertain
- Stay concise: this fires on every unknown barcode scan
Parameters:
product_name: e.g. "Trader Joe's Organic Tempeh"
category: best guess from get_category_from_product(); may be None
location: "fridge" | "freezer" | "pantry" | "cabinet"
"""
assert self._router is not None
system = (
"You are a food safety expert. Given a food product name, an optional "
"category hint, and a storage location, respond with ONLY a single "
"integer: the number of days the product typically remains safe to eat "
"from purchase when stored as specified. No explanation, no units, no "
"punctuation — just the integer. When uncertain, give the conservative "
"(shorter) estimate."
)
parts = [f"Product: {product_name}"]
if category:
parts.append(f"Category: {category}")
parts.append(f"Storage location: {location}")
parts.append("Days until expiry from purchase:")
prompt = "\n".join(parts)
try:
raw = self._router.complete(prompt, system=system, max_tokens=16)
match = re.search(r'\b(\d+)\b', raw)
if match:
days = int(match.group(1))
# Sanity cap: >5 years is implausible for a perishable unknown to
# the deterministic table. If the LLM returns something absurd,
# fall back to None rather than storing a misleading date.
if days > 1825:
logger.warning(
"LLM returned implausible shelf life (%d days) for %r — discarding",
days, product_name,
)
return None
logger.debug(
"LLM shelf life for %r in %s: %d days", product_name, location, days
)
return days
except Exception as e:
logger.warning("LLM expiry prediction failed for %r: %s", product_name, e)
return None
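The declaration-order rule the CATEGORY_KEYWORDS comments describe can be seen in a minimal, self-contained sketch. The trimmed keyword dict and the `categorize` helper below are illustrative stand-ins for the class internals, not its actual API:

```python
# Dicts preserve declaration order, so the first matching keyword list
# wins: "canned chicken" hits canned_goods before any raw-chicken entry.
CATEGORY_KEYWORDS = {
    'canned_goods': ['canned', 'cream of '],
    'chicken': ['chicken breast', 'raw chicken'],
}

def categorize(name: str) -> str:
    name = name.lower().strip()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(kw in name for kw in keywords):
            return category
    return 'dry_goods'  # generic fallback, as in the class above

print(categorize("Canned Chicken Noodle Soup"))  # canned_goods
print(categorize("Raw Chicken Breast"))          # chicken
```

Reordering the two entries would silently flip the first result, which is why the canned/processed lists sit ahead of the raw-protein lists.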

@@ -0,0 +1 @@
# app/services/export/__init__.py

@@ -0,0 +1,325 @@
# app/services/export/spreadsheet_export.py
"""
Service for exporting receipt data to CSV and Excel formats.
This module provides functionality to convert receipt and quality assessment
data into spreadsheet formats for easy viewing and analysis.
"""
import pandas as pd
from datetime import datetime
from typing import List, Dict, Optional
from pathlib import Path
from app.models.schemas.receipt import ReceiptResponse
from app.models.schemas.quality import QualityAssessment
class SpreadsheetExporter:
"""
Service for exporting receipt data to CSV/Excel formats.
Provides methods to convert receipt and quality assessment data into
spreadsheet formats that can be opened in Excel, Google Sheets, or
LibreOffice Calc.
"""
def export_to_csv(
self,
receipts: List[ReceiptResponse],
quality_data: Dict[str, QualityAssessment],
ocr_data: Optional[Dict[str, Dict]] = None
) -> str:
"""
Export receipts to CSV format.
Args:
receipts: List of receipt responses
quality_data: Dict mapping receipt_id to quality assessment
ocr_data: Optional dict mapping receipt_id to OCR extracted data
Returns:
CSV string ready for download
"""
df = self._receipts_to_dataframe(receipts, quality_data, ocr_data)
return df.to_csv(index=False)
def export_to_excel(
self,
receipts: List[ReceiptResponse],
quality_data: Dict[str, QualityAssessment],
output_path: str,
ocr_data: Optional[Dict[str, Dict]] = None
) -> None:
"""
Export receipts to Excel format with multiple sheets.
Creates an Excel file with sheets:
- Receipts: Main receipt data with OCR results
- Line Items: Detailed items from all receipts (if OCR available)
- Quality Details: Detailed quality metrics
- Summary: Aggregated statistics
Args:
receipts: List of receipt responses
quality_data: Dict mapping receipt_id to quality assessment
output_path: Path to save Excel file
ocr_data: Optional dict mapping receipt_id to OCR extracted data
"""
with pd.ExcelWriter(output_path, engine='openpyxl') as writer:
# Sheet 1: Receipts with OCR data
receipts_df = self._receipts_to_dataframe(receipts, quality_data, ocr_data)
receipts_df.to_excel(writer, sheet_name='Receipts', index=False)
# Sheet 2: Line Items (if OCR data available)
if ocr_data:
items_df = self._items_to_dataframe(receipts, ocr_data)
if not items_df.empty:
items_df.to_excel(writer, sheet_name='Line Items', index=False)
# Sheet 3: Quality Details
if quality_data:
quality_df = self._quality_to_dataframe(quality_data)
quality_df.to_excel(writer, sheet_name='Quality Details', index=False)
# Sheet 4: Summary
summary_df = self._create_summary(receipts, quality_data, ocr_data)
summary_df.to_excel(writer, sheet_name='Summary', index=False)
def _receipts_to_dataframe(
self,
receipts: List[ReceiptResponse],
quality_data: Dict[str, QualityAssessment],
ocr_data: Optional[Dict[str, Dict]] = None
) -> pd.DataFrame:
"""
Convert receipts to pandas DataFrame.
Args:
receipts: List of receipt responses
quality_data: Dict mapping receipt_id to quality assessment
ocr_data: Optional dict mapping receipt_id to OCR extracted data
Returns:
DataFrame with receipt data
"""
data = []
for receipt in receipts:
quality = quality_data.get(receipt.id)
ocr = ocr_data.get(receipt.id) if ocr_data else None
# Base columns
row = {
'ID': receipt.id,
'Filename': receipt.filename,
'Status': receipt.status,
'Quality Score': quality.overall_score if quality else None,
}
# Add OCR data if available
if ocr:
merchant = ocr.get('merchant', {})
transaction = ocr.get('transaction', {})
totals = ocr.get('totals', {})
items = ocr.get('items', [])
row.update({
'Merchant': merchant.get('name', ''),
'Store Address': merchant.get('address', ''),
'Store Phone': merchant.get('phone', ''),
'Date': transaction.get('date', ''),
'Time': transaction.get('time', ''),
'Receipt Number': transaction.get('receipt_number', ''),
'Item Count': len(items),
'Subtotal': totals.get('subtotal', ''),
'Tax': totals.get('tax', ''),
'Total': totals.get('total', ''),
'Payment Method': totals.get('payment_method', ''),
'OCR Confidence': ocr.get('confidence', {}).get('overall', ''),
})
# Add items as text
items_text = '; '.join([
f"{item.get('name', 'Unknown')} (${item.get('total_price') or 0:.2f})"
for item in items[:10] # Limit to first 10 items for CSV
])
if len(items) > 10:
items_text += f'; ... and {len(items) - 10} more items'
row['Items'] = items_text
else:
# No OCR data - show image metadata instead
row.update({
'Merchant': 'N/A - No OCR',
'Date': '',
'Total': '',
'Item Count': 0,
'Width': receipt.metadata.get('width'),
'Height': receipt.metadata.get('height'),
'File Size (KB)': round(receipt.metadata.get('file_size_bytes', 0) / 1024, 2),
})
data.append(row)
return pd.DataFrame(data)
def _items_to_dataframe(
self,
receipts: List[ReceiptResponse],
ocr_data: Dict[str, Dict]
) -> pd.DataFrame:
"""
Convert line items from all receipts to DataFrame.
Args:
receipts: List of receipt responses
ocr_data: Dict mapping receipt_id to OCR extracted data
Returns:
DataFrame with all line items from all receipts
"""
data = []
for receipt in receipts:
ocr = ocr_data.get(receipt.id)
if not ocr:
continue
merchant = ocr.get('merchant', {}).get('name', 'Unknown')
date = ocr.get('transaction', {}).get('date', '')
items = ocr.get('items', [])
for item in items:
data.append({
'Receipt ID': receipt.id,
'Receipt File': receipt.filename,
'Merchant': merchant,
'Date': date,
'Item Name': item.get('name', 'Unknown'),
'Quantity': item.get('quantity', 1),
'Unit Price': item.get('unit_price', ''),
'Total Price': item.get('total_price', 0),
'Category': item.get('category', ''),
'Tax Code': item.get('tax_code', ''),
'Discount': item.get('discount', 0),
})
return pd.DataFrame(data)
def _quality_to_dataframe(
self,
quality_data: Dict[str, QualityAssessment]
) -> pd.DataFrame:
"""
Convert quality assessments to DataFrame.
Args:
quality_data: Dict mapping receipt_id to quality assessment
Returns:
DataFrame with quality metrics
"""
data = []
for receipt_id, quality in quality_data.items():
metrics = quality.metrics
row = {
'Receipt ID': receipt_id,
'Overall Score': round(quality.overall_score, 2),
'Acceptable': quality.is_acceptable,
'Blur Score': round(metrics.get('blur_score', 0), 2),
'Lighting Score': round(metrics.get('lighting_score', 0), 2),
'Contrast Score': round(metrics.get('contrast_score', 0), 2),
'Size Score': round(metrics.get('size_score', 0), 2),
'Fold Detected': metrics.get('fold_detected', False),
'Fold Severity': round(metrics.get('fold_severity', 0), 2),
'Suggestions': '; '.join(quality.suggestions) if quality.suggestions else 'None',
}
data.append(row)
return pd.DataFrame(data)
def _create_summary(
self,
receipts: List[ReceiptResponse],
quality_data: Dict[str, QualityAssessment],
ocr_data: Optional[Dict[str, Dict]] = None
) -> pd.DataFrame:
"""
Create summary statistics DataFrame.
Args:
receipts: List of receipt responses
quality_data: Dict mapping receipt_id to quality assessment
ocr_data: Optional dict mapping receipt_id to OCR extracted data
Returns:
DataFrame with summary statistics
"""
quality_scores = [q.overall_score for q in quality_data.values() if q]
# Count statuses
status_counts = {}
for receipt in receipts:
status_counts[receipt.status] = status_counts.get(receipt.status, 0) + 1
metrics = [
'Total Receipts',
'Processed',
'Processing',
'Uploaded',
'Failed',
'Average Quality Score',
'Best Quality Score',
'Worst Quality Score',
'Acceptable Quality Count',
'Unacceptable Quality Count',
]
values = [
len(receipts),
status_counts.get('processed', 0),
status_counts.get('processing', 0),
status_counts.get('uploaded', 0),
status_counts.get('error', 0),
f"{sum(quality_scores) / len(quality_scores):.2f}" if quality_scores else 'N/A',
f"{max(quality_scores):.2f}" if quality_scores else 'N/A',
f"{min(quality_scores):.2f}" if quality_scores else 'N/A',
len([q for q in quality_data.values() if q and q.is_acceptable]),
len([q for q in quality_data.values() if q and not q.is_acceptable]),
]
# Add OCR statistics if available
if ocr_data:
receipts_with_ocr = len([r for r in receipts if r.id in ocr_data])
total_items = sum(len(ocr.get('items', [])) for ocr in ocr_data.values())
total_spent = sum(
ocr.get('totals', {}).get('total', 0) or 0
for ocr in ocr_data.values()
)
avg_confidence = sum(
ocr.get('confidence', {}).get('overall', 0) or 0
for ocr in ocr_data.values()
) / len(ocr_data) if ocr_data else 0
metrics.extend([
'', # Blank row
'OCR Statistics',
'Receipts with OCR Data',
'Total Line Items Extracted',
'Total Amount Spent',
'Average OCR Confidence',
])
values.extend([
'',
'',
receipts_with_ocr,
total_items,
f"${total_spent:.2f}" if total_spent > 0 else 'N/A',
f"{avg_confidence:.2%}" if avg_confidence > 0 else 'N/A',
])
summary = {
'Metric': metrics,
'Value': values
}
return pd.DataFrame(summary)
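How the `Items` cell in `_receipts_to_dataframe` is assembled can be sketched standalone. The `items_cell` helper is hypothetical, extracted here with a None-safe price guard (`or 0`) so a missing or null `total_price` cannot crash the format call:

```python
# Flatten line items into one CSV cell: first 10 items, then a count
# suffix for the remainder.
def items_cell(items: list[dict]) -> str:
    text = '; '.join(
        f"{it.get('name', 'Unknown')} (${it.get('total_price') or 0:.2f})"
        for it in items[:10]
    )
    if len(items) > 10:
        text += f'; ... and {len(items) - 10} more items'
    return text

print(items_cell([{'name': 'Milk', 'total_price': 3.5}]))  # Milk ($3.50)
```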

@@ -0,0 +1,10 @@
# app/services/image_preprocessing/__init__.py
"""
Image preprocessing services for Kiwi.
Contains functions for image enhancement, format conversion, and perspective correction.
"""
from app.services.image_preprocessing.format_conversion import convert_to_standard_format, extract_metadata
from app.services.image_preprocessing.enhancement import enhance_image, correct_perspective
__all__ = ["convert_to_standard_format", "extract_metadata", "enhance_image", "correct_perspective"]

@@ -0,0 +1,172 @@
#!/usr/bin/env python
# app/services/image_preprocessing/
import cv2
import numpy as np
import logging
from pathlib import Path
from typing import Tuple, Optional
logger = logging.getLogger(__name__)
def enhance_image(
image_path: Path,
output_path: Optional[Path] = None,
adaptive_threshold: bool = True,
denoise: bool = True,
) -> Tuple[bool, str, Optional[Path]]:
"""
Enhance receipt image for better OCR.
Args:
image_path: Path to input image
output_path: Optional path to save enhanced image
adaptive_threshold: Whether to apply adaptive thresholding
denoise: Whether to apply denoising
Returns:
Tuple containing (success, message, output_path)
"""
try:
# Check if CUDA is available
use_cuda = cv2.cuda.getCudaEnabledDeviceCount() > 0
# Set output path if not provided
if output_path is None:
output_path = image_path.with_stem(f"{image_path.stem}_enhanced")
# Read image
img = cv2.imread(str(image_path))
if img is None:
return False, f"Failed to read image: {image_path}", None
# Convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Apply denoising if requested
if denoise:
if use_cuda:
# GPU accelerated denoising
gpu_img = cv2.cuda.GpuMat()
gpu_img.upload(gray)
gpu_result = cv2.cuda.fastNlMeansDenoising(gpu_img, 10)
denoised = gpu_result.download()
else:
# CPU denoising
denoised = cv2.fastNlMeansDenoising(gray, None, 10, 7, 21)
else:
denoised = gray
# Apply adaptive thresholding if requested
if adaptive_threshold:
# Adaptive thresholding works well for receipts with varying backgrounds
binary = cv2.adaptiveThreshold(
denoised,
255,
cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
cv2.THRESH_BINARY,
11,
2
)
processed = binary
else:
processed = denoised
# Write enhanced image
success = cv2.imwrite(str(output_path), processed)
if not success:
return False, f"Failed to write enhanced image to {output_path}", None
return True, "Image enhanced successfully", output_path
except Exception as e:
logger.exception(f"Error enhancing image: {e}")
return False, f"Error enhancing image: {str(e)}", None
def correct_perspective(
image_path: Path,
output_path: Optional[Path] = None,
) -> Tuple[bool, str, Optional[Path]]:
"""
Correct perspective distortion in receipt image.
Args:
image_path: Path to input image
output_path: Optional path to save corrected image
Returns:
Tuple containing (success, message, output_path)
"""
try:
# Set output path if not provided
if output_path is None:
output_path = image_path.with_stem(f"{image_path.stem}_perspective")
# Read image
img = cv2.imread(str(image_path))
if img is None:
return False, f"Failed to read image: {image_path}", None
# Convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Apply Gaussian blur to reduce noise
blur = cv2.GaussianBlur(gray, (5, 5), 0)
# Apply edge detection
edges = cv2.Canny(blur, 50, 150, apertureSize=3)
# Find contours
contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
# Find the largest contour by area which is likely the receipt
if not contours:
return False, "No contours found in image", None
largest_contour = max(contours, key=cv2.contourArea)
# Approximate the contour to get the corners
epsilon = 0.02 * cv2.arcLength(largest_contour, True)
approx = cv2.approxPolyDP(largest_contour, epsilon, True)
# If we have a quadrilateral, we can apply perspective transform
if len(approx) == 4:
# Order corners as top-left, top-right, bottom-right, bottom-left
# (top-left minimises x + y; top-right minimises y - x) so the warp
# doesn't flip or rotate the receipt
pts = approx.reshape(4, 2).astype(np.float32)
s = pts.sum(axis=1)
d = np.diff(pts, axis=1).ravel()
src_pts = np.array([
pts[np.argmin(s)],  # top-left
pts[np.argmin(d)],  # top-right
pts[np.argmax(s)],  # bottom-right
pts[np.argmax(d)],  # bottom-left
], dtype=np.float32)
# Get width and height for the destination image
width = int(max(
np.linalg.norm(src_pts[0] - src_pts[1]),
np.linalg.norm(src_pts[2] - src_pts[3])
))
height = int(max(
np.linalg.norm(src_pts[0] - src_pts[3]),
np.linalg.norm(src_pts[1] - src_pts[2])
))
# Define destination points
dst_pts = np.array([
[0, 0],
[width - 1, 0],
[width - 1, height - 1],
[0, height - 1]
], dtype=np.float32)
# Get perspective transform matrix
M = cv2.getPerspectiveTransform(src_pts, dst_pts)
# Apply perspective transform
warped = cv2.warpPerspective(img, M, (width, height))
# Write corrected image
success = cv2.imwrite(str(output_path), warped)
if not success:
return False, f"Failed to write perspective-corrected image to {output_path}", None
return True, "Perspective corrected successfully", output_path
else:
return False, "Receipt corners not clearly detected", None
except Exception as e:
logger.exception(f"Error correcting perspective: {e}")
return False, f"Error correcting perspective: {str(e)}", None
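The corner ordering that the perspective transform depends on (top-left, top-right, bottom-right, bottom-left) uses a standard sum/difference trick. A pure-Python sketch, runnable without OpenCV:

```python
# Order four corners for a perspective warp. The top-left corner
# minimises x + y and the bottom-right maximises it; the top-right
# minimises y - x and the bottom-left maximises it.
def order_corners(pts):
    s = [x + y for x, y in pts]
    d = [y - x for x, y in pts]
    return [
        pts[s.index(min(s))],  # top-left
        pts[d.index(min(d))],  # top-right
        pts[s.index(max(s))],  # bottom-right
        pts[d.index(max(d))],  # bottom-left
    ]

print(order_corners([(90, 10), (10, 90), (95, 95), (5, 5)]))
# [(5, 5), (90, 10), (95, 95), (10, 90)]
```

With corners in this fixed order, the destination rectangle `[(0, 0), (w-1, 0), (w-1, h-1), (0, h-1)]` lines up edge for edge and the warped receipt comes out upright.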

@@ -0,0 +1,89 @@
#!/usr/bin/env python
# app/services/image_preprocessing/format_conversion.py
import cv2
import numpy as np
import logging
from pathlib import Path
from typing import Tuple, Optional
logger = logging.getLogger(__name__)
def convert_to_standard_format(
image_path: Path,
output_path: Optional[Path] = None,
target_format: str = "png"
) -> Tuple[bool, str, Optional[Path]]:
"""
Convert image to standard internal format.
Args:
image_path: Path to input image
output_path: Optional path to save converted image
target_format: Target format (png, jpg)
Returns:
Tuple containing (success, message, output_path)
"""
try:
# CUDA availability is logged for diagnostics only; plain format
# conversion below runs on the CPU either way
if cv2.cuda.getCudaEnabledDeviceCount() > 0:
logger.info("CUDA available")
else:
logger.info("CUDA not available, using CPU processing")
# Reject PDFs before cv2.imread, which cannot decode them and would
# otherwise produce a generic "failed to read image" error
if image_path.suffix.lower() == '.pdf':
# Placeholder: a real implementation would rasterise the first
# page with a PDF library (e.g. pdf2image) before converting
return False, "PDF processing not implemented in Phase 1", None
# Read image
img = cv2.imread(str(image_path))
if img is None:
return False, f"Failed to read image: {image_path}", None
# Set output path if not provided
if output_path is None:
output_path = image_path.with_suffix(f".{target_format}")
# Write converted image
success = cv2.imwrite(str(output_path), img)
if not success:
return False, f"Failed to write converted image to {output_path}", None
return True, "Image converted successfully", output_path
except Exception as e:
logger.exception(f"Error converting image: {e}")
return False, f"Error converting image: {str(e)}", None
def extract_metadata(image_path: Path) -> dict:
"""
Extract metadata from image file.
Args:
image_path: Path to input image
Returns:
Dictionary containing metadata
"""
metadata = {
"filename": image_path.name,
"original_format": image_path.suffix.lstrip(".").lower(),
"file_size_bytes": image_path.stat().st_size,
}
try:
img = cv2.imread(str(image_path))
if img is not None:
metadata.update({
"width": img.shape[1],
"height": img.shape[0],
"channels": img.shape[2] if len(img.shape) > 2 else 1,
})
except Exception as e:
logger.exception(f"Error extracting image metadata: {e}")
return metadata
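The shape of the dict `extract_metadata` returns can be exercised without cv2; a throwaway temp file stands in for a real image, so only the filesystem-derived fields are shown:

```python
from pathlib import Path
import tempfile

# Create a 4-byte stand-in "image" file with an upper-case extension.
with tempfile.NamedTemporaryFile(suffix=".PNG", delete=False) as f:
    f.write(b"\x89PNG")
    p = Path(f.name)

# File is closed (and flushed) here, so stat() sees the final size.
metadata = {
    "filename": p.name,
    "original_format": p.suffix.lstrip(".").lower(),  # normalised: "png"
    "file_size_bytes": p.stat().st_size,
}
print(metadata["original_format"], metadata["file_size_bytes"])  # png 4
p.unlink()  # clean up the temp file
```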

@@ -0,0 +1,539 @@
"""
Inventory management service.
This service orchestrates:
- Barcode scanning
- Product lookups (OpenFoodFacts)
- Inventory CRUD operations
- Tag management
"""
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy import select, func, and_, or_
from sqlalchemy.orm import selectinload
from typing import List, Optional, Dict, Any
from datetime import date, datetime, timedelta
from pathlib import Path
from uuid import UUID
import uuid
import logging
from app.db.models import Product, InventoryItem, Tag, product_tags
from app.models.schemas.inventory import (
ProductCreate,
ProductUpdate,
ProductResponse,
InventoryItemCreate,
InventoryItemUpdate,
InventoryItemResponse,
TagCreate,
TagResponse,
InventoryStats,
)
from app.services.barcode_scanner import BarcodeScanner
from app.services.openfoodfacts import OpenFoodFactsService
from app.services.expiration_predictor import ExpirationPredictor
logger = logging.getLogger(__name__)
class InventoryService:
"""Service for managing inventory and products."""
def __init__(self):
self.barcode_scanner = BarcodeScanner()
self.openfoodfacts = OpenFoodFactsService()
self.expiration_predictor = ExpirationPredictor()
# ========== Barcode Scanning ==========
async def scan_barcode_image(
self,
image_path: Path,
db: AsyncSession,
auto_add: bool = True,
location: str = "pantry",
quantity: float = 1.0,
) -> Dict[str, Any]:
"""
Scan an image for barcodes and optionally add to inventory.
Args:
image_path: Path to image file
db: Database session
auto_add: Whether to auto-add to inventory
location: Default storage location
quantity: Default quantity
Returns:
Dictionary with scan results
"""
# Scan for barcodes
barcodes = self.barcode_scanner.scan_image(image_path)
if not barcodes:
return {
"success": False,
"barcodes_found": 0,
"results": [],
"message": "No barcodes detected in image",
}
results = []
for barcode_data in barcodes:
result = await self._process_barcode(
barcode_data, db, auto_add, location, quantity
)
results.append(result)
return {
"success": True,
"barcodes_found": len(barcodes),
"results": results,
"message": f"Found {len(barcodes)} barcode(s)",
}
async def _process_barcode(
self,
barcode_data: Dict[str, Any],
db: AsyncSession,
auto_add: bool,
location: str,
quantity: float,
) -> Dict[str, Any]:
"""Process a single barcode detection."""
barcode = barcode_data["data"]
barcode_type = barcode_data["type"]
# Check if product already exists
product = await self.get_product_by_barcode(db, barcode)
# If not found, lookup in OpenFoodFacts
if not product:
off_data = await self.openfoodfacts.lookup_product(barcode)
if off_data:
# Create product from OpenFoodFacts data
product_create = ProductCreate(
barcode=barcode,
name=off_data["name"],
brand=off_data.get("brand"),
category=off_data.get("category"),
description=off_data.get("description"),
image_url=off_data.get("image_url"),
nutrition_data=off_data.get("nutrition_data", {}),
source="openfoodfacts",
source_data=off_data.get("raw_data", {}),
)
product = await self.create_product(db, product_create)
source = "openfoodfacts"
else:
# Product not found in OpenFoodFacts
# Create a placeholder product
product_create = ProductCreate(
barcode=barcode,
name=f"Unknown Product ({barcode})",
source="manual",
)
product = await self.create_product(db, product_create)
source = "manual"
else:
source = product.source
# Auto-add to inventory if requested
inventory_item = None
predicted_expiration = None
if auto_add:
# Predict expiration date based on product category and location
category = self.expiration_predictor.get_category_from_product(
product.name,
product.category,
[tag.name for tag in product.tags] if product.tags else None
)
if category:
predicted_expiration = self.expiration_predictor.predict_expiration(
category,
location,
date.today()
)
item_create = InventoryItemCreate(
product_id=product.id,
quantity=quantity,
location=location,
purchase_date=date.today(),
expiration_date=predicted_expiration,
source="barcode_scan",
)
inventory_item = await self.create_inventory_item(db, item_create)
return {
"barcode": barcode,
"barcode_type": barcode_type,
"quality": barcode_data["quality"],
"product": ProductResponse.from_orm(product),
"inventory_item": (
InventoryItemResponse.from_orm(inventory_item) if inventory_item else None
),
"source": source,
"predicted_expiration": predicted_expiration.isoformat() if predicted_expiration else None,
"predicted_category": category if auto_add else None,
}
# ========== Product Management ==========
async def create_product(
self,
db: AsyncSession,
product: ProductCreate,
) -> Product:
"""Create a new product."""
# Create product
db_product = Product(
id=uuid.uuid4(),
barcode=product.barcode,
name=product.name,
brand=product.brand,
category=product.category,
description=product.description,
image_url=product.image_url,
nutrition_data=product.nutrition_data,
source=product.source,
source_data=product.source_data,
)
db.add(db_product)
await db.flush()
# Add tags if specified
if product.tag_ids:
for tag_id in product.tag_ids:
tag = await db.get(Tag, tag_id)
if tag:
db_product.tags.append(tag)
await db.commit()
await db.refresh(db_product, ["tags"])
return db_product
async def get_product(self, db: AsyncSession, product_id: UUID) -> Optional[Product]:
"""Get a product by ID."""
result = await db.execute(
select(Product).where(Product.id == product_id).options(selectinload(Product.tags))
)
return result.scalar_one_or_none()
async def get_product_by_barcode(
self, db: AsyncSession, barcode: str
) -> Optional[Product]:
"""Get a product by barcode."""
result = await db.execute(
select(Product).where(Product.barcode == barcode).options(selectinload(Product.tags))
)
return result.scalar_one_or_none()
async def list_products(
self,
db: AsyncSession,
skip: int = 0,
limit: int = 100,
category: Optional[str] = None,
) -> List[Product]:
"""List products with optional filtering."""
query = select(Product).options(selectinload(Product.tags))
if category:
query = query.where(Product.category == category)
query = query.offset(skip).limit(limit).order_by(Product.name)
result = await db.execute(query)
return list(result.scalars().all())
async def update_product(
self,
db: AsyncSession,
product_id: UUID,
product_update: ProductUpdate,
) -> Optional[Product]:
"""Update a product."""
product = await self.get_product(db, product_id)
if not product:
return None
# Update fields
for field, value in product_update.dict(exclude_unset=True).items():
if field == "tag_ids":
# Update tags
product.tags = []
for tag_id in value:
tag = await db.get(Tag, tag_id)
if tag:
product.tags.append(tag)
else:
setattr(product, field, value)
product.updated_at = datetime.utcnow()
await db.commit()
await db.refresh(product, ["tags"])
return product
async def delete_product(self, db: AsyncSession, product_id: UUID) -> bool:
"""Delete a product (will fail if inventory items exist)."""
product = await self.get_product(db, product_id)
if not product:
return False
await db.delete(product)
await db.commit()
return True
# ========== Inventory Item Management ==========
async def create_inventory_item(
self,
db: AsyncSession,
item: InventoryItemCreate,
) -> InventoryItem:
"""Create a new inventory item."""
db_item = InventoryItem(
id=uuid.uuid4(),
product_id=item.product_id,
quantity=item.quantity,
unit=item.unit,
location=item.location,
sublocation=item.sublocation,
purchase_date=item.purchase_date,
expiration_date=item.expiration_date,
notes=item.notes,
source=item.source,
status="available",
)
db.add(db_item)
await db.commit()
await db.refresh(db_item, ["product"])
return db_item
async def get_inventory_item(
self, db: AsyncSession, item_id: UUID
) -> Optional[InventoryItem]:
"""Get an inventory item by ID."""
result = await db.execute(
select(InventoryItem)
.where(InventoryItem.id == item_id)
.options(selectinload(InventoryItem.product).selectinload(Product.tags))
)
return result.scalar_one_or_none()
async def list_inventory_items(
self,
db: AsyncSession,
skip: int = 0,
limit: int = 100,
location: Optional[str] = None,
status: str = "available",
) -> List[InventoryItem]:
"""List inventory items with filtering."""
query = select(InventoryItem).options(
selectinload(InventoryItem.product).selectinload(Product.tags)
)
query = query.where(InventoryItem.status == status)
if location:
query = query.where(InventoryItem.location == location)
query = (
query.offset(skip)
.limit(limit)
.order_by(InventoryItem.expiration_date.asc().nullsfirst())
)
result = await db.execute(query)
return list(result.scalars().all())
async def update_inventory_item(
self,
db: AsyncSession,
item_id: UUID,
item_update: InventoryItemUpdate,
) -> Optional[InventoryItem]:
"""Update an inventory item."""
item = await self.get_inventory_item(db, item_id)
if not item:
return None
for field, value in item_update.dict(exclude_unset=True).items():
setattr(item, field, value)
item.updated_at = datetime.utcnow()
if item_update.status == "consumed" and not item.consumed_at:
item.consumed_at = datetime.utcnow()
await db.commit()
await db.refresh(item, ["product"])
return item
async def delete_inventory_item(self, db: AsyncSession, item_id: UUID) -> bool:
"""Delete an inventory item."""
item = await self.get_inventory_item(db, item_id)
if not item:
return False
await db.delete(item)
await db.commit()
return True
async def mark_as_consumed(
self, db: AsyncSession, item_id: UUID
) -> Optional[InventoryItem]:
"""Mark an inventory item as consumed."""
return await self.update_inventory_item(
db, item_id, InventoryItemUpdate(status="consumed")
)
# ========== Tag Management ==========
async def create_tag(self, db: AsyncSession, tag: TagCreate) -> Tag:
"""Create a new tag."""
db_tag = Tag(
id=uuid.uuid4(),
name=tag.name,
slug=tag.slug,
description=tag.description,
color=tag.color,
category=tag.category,
)
db.add(db_tag)
await db.commit()
await db.refresh(db_tag)
return db_tag
async def get_tag(self, db: AsyncSession, tag_id: UUID) -> Optional[Tag]:
"""Get a tag by ID."""
return await db.get(Tag, tag_id)
async def list_tags(
self, db: AsyncSession, category: Optional[str] = None
) -> List[Tag]:
"""List all tags, optionally filtered by category."""
query = select(Tag).order_by(Tag.name)
if category:
query = query.where(Tag.category == category)
result = await db.execute(query)
return list(result.scalars().all())
# ========== Statistics and Analytics ==========
async def get_inventory_stats(self, db: AsyncSession) -> InventoryStats:
"""Get inventory statistics."""
# Total items (available only)
total_result = await db.execute(
select(func.count(InventoryItem.id)).where(InventoryItem.status == "available")
)
total_items = total_result.scalar() or 0
# Total unique products
products_result = await db.execute(
select(func.count(func.distinct(InventoryItem.product_id))).where(
InventoryItem.status == "available"
)
)
total_products = products_result.scalar() or 0
# Items by location
location_result = await db.execute(
select(
InventoryItem.location,
func.count(InventoryItem.id).label("count"),
)
.where(InventoryItem.status == "available")
.group_by(InventoryItem.location)
)
items_by_location = {row[0]: row[1] for row in location_result}
# Items by status
status_result = await db.execute(
select(InventoryItem.status, func.count(InventoryItem.id).label("count")).group_by(
InventoryItem.status
)
)
items_by_status = {row[0]: row[1] for row in status_result}
# Expiring soon (next 7 days)
today = date.today()
week_from_now = today + timedelta(days=7)
expiring_result = await db.execute(
select(func.count(InventoryItem.id)).where(
and_(
InventoryItem.status == "available",
InventoryItem.expiration_date.isnot(None),
InventoryItem.expiration_date <= week_from_now,
InventoryItem.expiration_date >= today,
)
)
)
expiring_soon = expiring_result.scalar() or 0
# Expired
expired_result = await db.execute(
select(func.count(InventoryItem.id)).where(
and_(
InventoryItem.status == "available",
InventoryItem.expiration_date.isnot(None),
InventoryItem.expiration_date < today,
)
)
)
expired = expired_result.scalar() or 0
return InventoryStats(
total_items=total_items,
total_products=total_products,
items_by_location=items_by_location,
items_by_status=items_by_status,
expiring_soon=expiring_soon,
expired=expired,
)
async def get_expiring_items(
self, db: AsyncSession, days: int = 7
) -> List[Dict[str, Any]]:
"""Get items expiring within N days."""
today = date.today()
cutoff_date = today + timedelta(days=days)
result = await db.execute(
select(InventoryItem)
.where(
and_(
InventoryItem.status == "available",
InventoryItem.expiration_date.isnot(None),
InventoryItem.expiration_date <= cutoff_date,
InventoryItem.expiration_date >= today,
)
)
.options(selectinload(InventoryItem.product).selectinload(Product.tags))
.order_by(InventoryItem.expiration_date.asc())
)
items = result.scalars().all()
return [
{
"inventory_item": item,
"days_until_expiry": (item.expiration_date - today).days,
}
for item in items
]

@@ -0,0 +1,5 @@
"""OCR services for receipt text extraction."""
from .vl_model import VisionLanguageOCR
__all__ = ["VisionLanguageOCR"]

@@ -0,0 +1,60 @@
"""Thin HTTP client for the cf-docuvision document vision service."""
from __future__ import annotations
import base64
from dataclasses import dataclass
from pathlib import Path
import httpx
@dataclass
class DocuvisionResult:
text: str
confidence: float | None = None
raw: dict | None = None
class DocuvisionClient:
"""Thin client for the cf-docuvision service."""
def __init__(self, base_url: str) -> None:
self._base_url = base_url.rstrip("/")
def extract_text(self, image_path: str | Path) -> DocuvisionResult:
"""Send an image to docuvision and return extracted text."""
image_bytes = Path(image_path).read_bytes()
b64 = base64.b64encode(image_bytes).decode()
with httpx.Client(timeout=30.0) as client:
resp = client.post(
f"{self._base_url}/extract",
json={"image": b64},
)
resp.raise_for_status()
data = resp.json()
return DocuvisionResult(
text=data.get("text", ""),
confidence=data.get("confidence"),
raw=data,
)
async def extract_text_async(self, image_path: str | Path) -> DocuvisionResult:
"""Async version."""
image_bytes = Path(image_path).read_bytes()
b64 = base64.b64encode(image_bytes).decode()
async with httpx.AsyncClient(timeout=30.0) as client:
resp = await client.post(
f"{self._base_url}/extract",
json={"image": b64},
)
resp.raise_for_status()
data = resp.json()
return DocuvisionResult(
text=data.get("text", ""),
confidence=data.get("confidence"),
raw=data,
)
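The client above ships the image as a base64 string inside a JSON body. A minimal sketch of that payload construction and its server-side inverse (the `image` field name follows the client code; the service contract itself is an assumption of this sketch):

```python
import base64
import json

def build_extract_payload(image_bytes: bytes) -> str:
    """Encode raw image bytes the way DocuvisionClient does before POSTing."""
    return json.dumps({"image": base64.b64encode(image_bytes).decode()})

def decode_extract_payload(body: str) -> bytes:
    """Server-side inverse: recover the original image bytes."""
    return base64.b64decode(json.loads(body)["image"])
```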

@@ -0,0 +1,410 @@
#!/usr/bin/env python
"""
Vision-Language Model service for receipt OCR and structured data extraction.
Uses Qwen2-VL-7B-Instruct for intelligent receipt processing that combines
OCR with understanding of receipt structure to extract structured JSON data.
"""
import json
import logging
import os
import re
from pathlib import Path
from typing import Dict, Any, Optional, List
from datetime import datetime
from PIL import Image
import torch
from transformers import (
Qwen2VLForConditionalGeneration,
AutoProcessor,
BitsAndBytesConfig
)
from app.core.config import settings
logger = logging.getLogger(__name__)
def _try_docuvision(image_path: str | Path) -> str | None:
"""Try to extract text via cf-docuvision. Returns None if unavailable."""
cf_orch_url = os.environ.get("CF_ORCH_URL")
if not cf_orch_url:
return None
try:
from circuitforge_core.resources import CFOrchClient
from app.services.ocr.docuvision_client import DocuvisionClient
client = CFOrchClient(cf_orch_url)
with client.allocate(
service="cf-docuvision",
model_candidates=["cf-docuvision"],
ttl_s=60.0,
caller="kiwi-ocr",
) as alloc:
if alloc is None:
return None
doc_client = DocuvisionClient(alloc.url)
result = doc_client.extract_text(image_path)
return result.text if result.text else None
except Exception as exc:
logger.debug("cf-docuvision fast-path failed, falling back: %s", exc)
return None
class VisionLanguageOCR:
"""Vision-Language Model for receipt OCR and structured extraction."""
def __init__(self, use_quantization: bool = False):
"""
Initialize the VLM OCR service.
Args:
use_quantization: Use 8-bit quantization to reduce memory usage
"""
self.model = None
self.processor = None
self.device = "cuda" if torch.cuda.is_available() and settings.USE_GPU else "cpu"
self.use_quantization = use_quantization
self.model_name = "Qwen/Qwen2-VL-7B-Instruct"  # Checkpoint must match Qwen2VLForConditionalGeneration
logger.info(f"Initializing VisionLanguageOCR with device: {self.device}")
# Lazy loading - model will be loaded on first use
self._model_loaded = False
def _load_model(self):
"""Load the VLM model (lazy loading)."""
if self._model_loaded:
return
logger.info(f"Loading VLM model: {self.model_name}")
try:
if self.use_quantization and self.device == "cuda":
# Use 8-bit quantization for lower memory usage
quantization_config = BitsAndBytesConfig(
load_in_8bit=True,
llm_int8_threshold=6.0
)
self.model = Qwen2VLForConditionalGeneration.from_pretrained(
self.model_name,
quantization_config=quantization_config,
device_map="auto",
low_cpu_mem_usage=True
)
logger.info("Model loaded with 8-bit quantization")
else:
# Standard FP16 loading
self.model = Qwen2VLForConditionalGeneration.from_pretrained(
self.model_name,
torch_dtype=torch.float16 if self.device == "cuda" else torch.float32,
device_map="auto" if self.device == "cuda" else None,
low_cpu_mem_usage=True
)
if self.device == "cpu":
self.model = self.model.to("cpu")
logger.info(f"Model loaded in {'FP16' if self.device == 'cuda' else 'FP32'} mode")
self.processor = AutoProcessor.from_pretrained(self.model_name)
self.model.eval() # Set to evaluation mode
self._model_loaded = True
logger.info("VLM model loaded successfully")
except Exception as e:
logger.error(f"Failed to load VLM model: {e}")
raise RuntimeError(f"Could not load VLM model: {e}")
def extract_receipt_data(self, image_path: str) -> Dict[str, Any]:
"""
Extract structured data from receipt image.
Args:
image_path: Path to the receipt image
Returns:
Dictionary containing extracted receipt data with structure:
{
"merchant": {...},
"transaction": {...},
"items": [...],
"totals": {...},
"confidence": {...},
"raw_text": "...",
"warnings": [...]
}
"""
# Try docuvision fast path first (skips heavy local VLM if available)
docuvision_text = _try_docuvision(image_path)
if docuvision_text is not None:
parsed = self._parse_json_from_text(docuvision_text)
# Only accept the docuvision result if it yielded meaningful content;
# an empty-skeleton dict (no items, no merchant) means the text was
# garbled and we should fall through to the local VLM instead.
if parsed.get("items") or parsed.get("merchant"):
parsed["raw_text"] = docuvision_text
return self._validate_result(parsed)
# Parsed result has no meaningful content — fall through to local VLM
self._load_model()
try:
# Load image
image = Image.open(image_path)
# Convert to RGB if needed
if image.mode != 'RGB':
image = image.convert('RGB')
# Build extraction prompt
prompt = self._build_extraction_prompt()
# Process image and text
logger.info(f"Processing receipt image: {image_path}")
inputs = self.processor(
images=image,
text=prompt,
return_tensors="pt"
)
# Move tensors to the device. Keep integer tensors (input_ids,
# attention_mask) in their original dtype; casting them to float16
# breaks generation.
if self.device == "cuda":
inputs = {k: v.to("cuda") if isinstance(v, torch.Tensor) else v
for k, v in inputs.items()}
# Generate
with torch.no_grad():
output_ids = self.model.generate(
**inputs,
max_new_tokens=2048,
do_sample=False,  # Deterministic decoding; temperature is ignored when sampling is off
pad_token_id=self.processor.tokenizer.pad_token_id,
eos_token_id=self.processor.tokenizer.eos_token_id,
)
# Decode output
output_text = self.processor.decode(
output_ids[0],
skip_special_tokens=True
)
# Remove the prompt from output
output_text = output_text.replace(prompt, "").strip()
logger.info(f"VLM output length: {len(output_text)} characters")
# Parse JSON from output
result = self._parse_json_from_text(output_text)
# Add raw text for reference
result["raw_text"] = output_text
# Validate and enhance result
result = self._validate_result(result)
return result
except Exception as e:
logger.error(f"Error extracting receipt data: {e}", exc_info=True)
return {
"error": str(e),
"merchant": {},
"transaction": {},
"items": [],
"totals": {},
"confidence": {"overall": 0.0},
"warnings": [f"Extraction failed: {str(e)}"]
}
def _build_extraction_prompt(self) -> str:
"""Build the prompt for receipt data extraction."""
return """You are a receipt OCR specialist. Extract all information from this receipt image and return it in the exact JSON format specified below.
Return a JSON object with this exact structure:
{
"merchant": {
"name": "Store Name",
"address": "123 Main St, City, State ZIP",
"phone": "555-1234"
},
"transaction": {
"date": "2025-10-30",
"time": "14:30:00",
"receipt_number": "12345",
"register": "01",
"cashier": "Jane"
},
"items": [
{
"name": "Product name",
"quantity": 2,
"unit_price": 10.99,
"total_price": 21.98,
"category": "grocery",
"tax_code": "F",
"discount": 0.00
}
],
"totals": {
"subtotal": 21.98,
"tax": 1.98,
"discount": 0.00,
"total": 23.96,
"payment_method": "Credit Card",
"amount_paid": 23.96,
"change": 0.00
},
"confidence": {
"overall": 0.95,
"merchant": 0.98,
"items": 0.92,
"totals": 0.97
}
}
Important instructions:
1. Extract ALL items from the receipt, no matter how many there are
2. Use null for fields you cannot find
3. For dates, use YYYY-MM-DD format
4. For times, use HH:MM:SS format
5. For prices, use numeric values (not strings)
6. Estimate confidence scores (0.0-1.0) based on image quality and text clarity
7. Return ONLY the JSON object, no other text or explanation"""
def _parse_json_from_text(self, text: str) -> Dict[str, Any]:
"""
Extract and parse JSON from model output text.
Args:
text: Raw text output from the model
Returns:
Parsed JSON dictionary
"""
# Try to find JSON object in text
# Look for content between first { and last }
json_match = re.search(r'\{.*\}', text, re.DOTALL)
if json_match:
json_str = json_match.group(0)
try:
return json.loads(json_str)
except json.JSONDecodeError as e:
logger.warning(f"Failed to parse JSON: {e}")
# Try to fix common issues
json_str = self._fix_json(json_str)
try:
return json.loads(json_str)
except json.JSONDecodeError:
logger.error("Could not parse JSON even after fixes")
# Return empty structure if parsing fails
logger.warning("No valid JSON found in output, returning empty structure")
return {
"merchant": {},
"transaction": {},
"items": [],
"totals": {},
"confidence": {"overall": 0.1}
}
def _fix_json(self, json_str: str) -> str:
"""Attempt to fix common JSON formatting issues."""
# Remove trailing commas
json_str = re.sub(r',\s*}', '}', json_str)
json_str = re.sub(r',\s*]', ']', json_str)
# Fix single quotes to double quotes (best-effort: this also mangles
# apostrophes inside string values)
json_str = json_str.replace("'", '"')
return json_str
def _validate_result(self, result: Dict[str, Any]) -> Dict[str, Any]:
"""
Validate and enhance extracted data.
Args:
result: Extracted receipt data
Returns:
Validated and enhanced result with warnings
"""
warnings = []
# Ensure required fields exist
required_fields = ["merchant", "transaction", "items", "totals", "confidence"]
for field in required_fields:
if field not in result:
result[field] = {} if field != "items" else []
warnings.append(f"Missing required field: {field}")
# Validate items
if not result.get("items"):
warnings.append("No items found on receipt")
else:
# Validate item structure
for i, item in enumerate(result["items"]):
if "total_price" not in item and "unit_price" in item and "quantity" in item:
item["total_price"] = item["unit_price"] * item["quantity"]
# Validate totals
if result.get("items") and result.get("totals"):
calculated_subtotal = sum(
item.get("total_price", 0)
for item in result["items"]
)
reported_subtotal = result["totals"].get("subtotal", 0)
# Allow small variance (rounding errors)
if abs(calculated_subtotal - reported_subtotal) > 0.10:
warnings.append(
f"Total mismatch: calculated ${calculated_subtotal:.2f}, "
f"reported ${reported_subtotal:.2f}"
)
result["totals"]["calculated_subtotal"] = calculated_subtotal
# Validate date format
if result.get("transaction", {}).get("date"):
try:
datetime.strptime(result["transaction"]["date"], "%Y-%m-%d")
except ValueError:
warnings.append(f"Invalid date format: {result['transaction']['date']}")
# Add warnings to result
if warnings:
result["warnings"] = warnings
# Ensure confidence exists
if "confidence" not in result or not result["confidence"]:
result["confidence"] = {
"overall": 0.5,
"merchant": 0.5,
"items": 0.5,
"totals": 0.5
}
return result
def get_model_info(self) -> Dict[str, Any]:
"""Get information about the loaded model."""
return {
"model_name": self.model_name,
"device": self.device,
"quantization": self.use_quantization,
"loaded": self._model_loaded,
"gpu_available": torch.cuda.is_available(),
"gpu_memory_allocated": torch.cuda.memory_allocated() if torch.cuda.is_available() else 0,
"gpu_memory_reserved": torch.cuda.memory_reserved() if torch.cuda.is_available() else 0
}
def clear_cache(self):
"""Clear GPU memory cache."""
if torch.cuda.is_available():
torch.cuda.empty_cache()
logger.info("GPU cache cleared")

@@ -0,0 +1,234 @@
"""
OpenFoodFacts API integration service.
This module provides functionality to look up product information
from the OpenFoodFacts database using barcodes (UPC/EAN).
"""
import httpx
from typing import Optional, Dict, Any
from app.core.config import settings
import logging
logger = logging.getLogger(__name__)
class OpenFoodFactsService:
"""
Service for interacting with the OpenFoodFacts API.
OpenFoodFacts is a free, open database of food products with
ingredients, allergens, and nutrition facts.
"""
BASE_URL = "https://world.openfoodfacts.org/api/v2"
USER_AGENT = "Kiwi/0.1.0 (https://circuitforge.tech)"
async def lookup_product(self, barcode: str) -> Optional[Dict[str, Any]]:
"""
Look up a product by barcode in the OpenFoodFacts database.
Args:
barcode: UPC/EAN barcode (8-13 digits)
Returns:
Dictionary with product information, or None if not found
Example response:
{
"name": "Organic Milk",
"brand": "Horizon",
"categories": ["Dairy", "Milk"],
"image_url": "https://...",
"nutrition_data": {...},
"raw_data": {...} # Full API response
}
"""
try:
async with httpx.AsyncClient() as client:
url = f"{self.BASE_URL}/product/{barcode}.json"
response = await client.get(
url,
headers={"User-Agent": self.USER_AGENT},
timeout=10.0,
)
if response.status_code == 404:
logger.info(f"Product not found in OpenFoodFacts: {barcode}")
return None
response.raise_for_status()
data = response.json()
if data.get("status") != 1:
logger.info(f"Product not found in OpenFoodFacts: {barcode}")
return None
return self._parse_product_data(data, barcode)
except httpx.HTTPError as e:
logger.error(f"HTTP error looking up barcode {barcode}: {e}")
return None
except Exception as e:
logger.error(f"Error looking up barcode {barcode}: {e}")
return None
def _parse_product_data(self, data: Dict[str, Any], barcode: str) -> Dict[str, Any]:
"""
Parse OpenFoodFacts API response into our product format.
Args:
data: Raw API response
barcode: Original barcode
Returns:
Parsed product dictionary
"""
product = data.get("product", {})
# Extract basic info
name = (
product.get("product_name")
or product.get("product_name_en")
or f"Unknown Product ({barcode})"
)
brand = product.get("brands", "").split(",")[0].strip() if product.get("brands") else None
# Categories (comma-separated string to list)
categories_str = product.get("categories", "")
categories = [c.strip() for c in categories_str.split(",") if c.strip()]
category = categories[0] if categories else None
# Description
description = product.get("generic_name") or product.get("generic_name_en")
# Image
image_url = product.get("image_url") or product.get("image_front_url")
# Nutrition data
nutrition_data = self._extract_nutrition_data(product)
# Allergens and dietary info
allergens = product.get("allergens_tags", [])
labels = product.get("labels_tags", [])
return {
"name": name,
"brand": brand,
"category": category,
"categories": categories,
"description": description,
"image_url": image_url,
"nutrition_data": nutrition_data,
"allergens": allergens,
"labels": labels,
"raw_data": product, # Store full response for debugging
}
def _extract_nutrition_data(self, product: Dict[str, Any]) -> Dict[str, Any]:
"""
Extract nutrition facts from product data.
Args:
product: Product data from OpenFoodFacts
Returns:
Dictionary of nutrition facts
"""
nutriments = product.get("nutriments", {})
# Extract common nutrients (per 100g)
nutrition = {}
# Energy
if "energy-kcal_100g" in nutriments:
nutrition["calories"] = nutriments["energy-kcal_100g"]
elif "energy_100g" in nutriments:
# Convert kJ to kcal (1 kcal = 4.184 kJ)
nutrition["calories"] = round(nutriments["energy_100g"] / 4.184, 1)
# Macronutrients
if "fat_100g" in nutriments:
nutrition["fat_g"] = nutriments["fat_100g"]
if "saturated-fat_100g" in nutriments:
nutrition["saturated_fat_g"] = nutriments["saturated-fat_100g"]
if "carbohydrates_100g" in nutriments:
nutrition["carbohydrates_g"] = nutriments["carbohydrates_100g"]
if "sugars_100g" in nutriments:
nutrition["sugars_g"] = nutriments["sugars_100g"]
if "fiber_100g" in nutriments:
nutrition["fiber_g"] = nutriments["fiber_100g"]
if "proteins_100g" in nutriments:
nutrition["protein_g"] = nutriments["proteins_100g"]
# Minerals
if "salt_100g" in nutriments:
nutrition["salt_g"] = nutriments["salt_100g"]
elif "sodium_100g" in nutriments:
# Convert sodium to salt (1g sodium = 2.5g salt)
nutrition["salt_g"] = round(nutriments["sodium_100g"] * 2.5, 2)
# Serving size
if "serving_size" in product:
nutrition["serving_size"] = product["serving_size"]
return nutrition
async def search_products(
self,
query: str,
page: int = 1,
page_size: int = 20
) -> Dict[str, Any]:
"""
Search for products by name in OpenFoodFacts.
Args:
query: Search query
page: Page number (1-indexed)
page_size: Number of results per page
Returns:
Dictionary with search results and metadata
"""
try:
async with httpx.AsyncClient() as client:
url = f"{self.BASE_URL}/search"
response = await client.get(
url,
params={
"search_terms": query,
"page": page,
"page_size": page_size,
"json": 1,
},
headers={"User-Agent": self.USER_AGENT},
timeout=10.0,
)
response.raise_for_status()
data = response.json()
products = [
self._parse_product_data({"product": p}, p.get("code", ""))
for p in data.get("products", [])
]
return {
"products": products,
"count": data.get("count", 0),
"page": data.get("page", page),
"page_size": data.get("page_size", page_size),
}
except Exception as e:
logger.error(f"Error searching OpenFoodFacts: {e}")
return {
"products": [],
"count": 0,
"page": page,
"page_size": page_size,
}
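`_extract_nutrition_data` relies on two unit conversions: kilojoules to kilocalories (1 kcal = 4.184 kJ) and sodium mass to salt mass (salt is roughly 2.5x the sodium by weight). The same arithmetic as standalone helpers (function names are illustrative):

```python
def kj_to_kcal(energy_kj: float) -> float:
    """OpenFoodFacts reports energy_100g in kJ; 1 kcal = 4.184 kJ."""
    return round(energy_kj / 4.184, 1)

def sodium_to_salt(sodium_g: float) -> float:
    """Salt is sodium chloride; by mass, salt is about 2.5x the sodium."""
    return round(sodium_g * 2.5, 2)
```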

@@ -0,0 +1,9 @@
# app/services/quality/__init__.py
"""
Quality assessment services for Kiwi.
Contains functionality for evaluating receipt image quality.
"""
from app.services.quality.assessment import QualityAssessor
__all__ = ["QualityAssessor"]

@@ -0,0 +1,332 @@
#!/usr/bin/env python
# app/services/quality/assessment.py
import cv2
import numpy as np
import logging
from pathlib import Path
from typing import Dict, Any, Optional, Tuple
logger = logging.getLogger(__name__)
class QualityAssessor:
"""
Assesses the quality of receipt images for processing suitability.
"""
def __init__(self, min_quality_score: float = 50.0):
"""
Initialize the quality assessor.
Args:
min_quality_score: Minimum acceptable quality score (0-100)
"""
self.min_quality_score = min_quality_score
def assess_image(self, image_path: Path) -> Dict[str, Any]:
"""
Assess the quality of an image.
Args:
image_path: Path to the image
Returns:
Dictionary containing quality metrics
"""
try:
# Read image
img = cv2.imread(str(image_path))
if img is None:
return {
"success": False,
"error": f"Failed to read image: {image_path}",
"overall_score": 0.0,
}
# Convert to grayscale for some metrics
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Calculate various quality metrics
blur_score = self._calculate_blur_score(gray)
lighting_score = self._calculate_lighting_score(gray)
contrast_score = self._calculate_contrast_score(gray)
size_score = self._calculate_size_score(img.shape)
# Check for potential fold lines
fold_detected, fold_severity = self._detect_folds(gray)
# Calculate overall quality score
overall_score = self._calculate_overall_score({
"blur": blur_score,
"lighting": lighting_score,
"contrast": contrast_score,
"size": size_score,
"fold": 100.0 if not fold_detected else (100.0 - fold_severity * 20.0)
})
# Create assessment result
result = {
"success": True,
"metrics": {
"blur_score": blur_score,
"lighting_score": lighting_score,
"contrast_score": contrast_score,
"size_score": size_score,
"fold_detected": fold_detected,
"fold_severity": fold_severity if fold_detected else 0.0,
},
"overall_score": overall_score,
"is_acceptable": overall_score >= self.min_quality_score,
"improvement_suggestions": self._generate_suggestions({
"blur": blur_score,
"lighting": lighting_score,
"contrast": contrast_score,
"size": size_score,
"fold": fold_detected,
"fold_severity": fold_severity if fold_detected else 0.0,
}),
}
return result
except Exception as e:
logger.exception(f"Error assessing image quality: {e}")
return {
"success": False,
"error": f"Error assessing image quality: {str(e)}",
"overall_score": 0.0,
}
def _calculate_blur_score(self, gray_img: np.ndarray) -> float:
"""
Calculate blur score using Laplacian variance.
Higher variance = less blurry (higher score)
Args:
gray_img: Grayscale image
Returns:
Blur score (0-100)
"""
# Use Laplacian for edge detection
laplacian = cv2.Laplacian(gray_img, cv2.CV_64F)
# Calculate variance of Laplacian
variance = laplacian.var()
# Map variance to a 0-100 score
# These thresholds might need adjustment based on your specific requirements
if variance < 10:
return 0.0 # Very blurry
elif variance < 100:
return (variance - 10) / 90 * 50 # Map 10-100 to 0-50
elif variance < 1000:
return 50 + (variance - 100) / 900 * 50 # Map 100-1000 to 50-100
else:
return 100.0 # Very sharp
def _calculate_lighting_score(self, gray_img: np.ndarray) -> float:
"""
Calculate lighting score based on average brightness and std dev.
Args:
gray_img: Grayscale image
Returns:
Lighting score (0-100)
"""
# Calculate mean brightness
mean = gray_img.mean()
# Calculate standard deviation of brightness
std = gray_img.std()
# Ideal mean would be around 127 (middle of 0-255)
# Penalize if too dark or too bright
mean_score = 100 - abs(mean - 127) / 127 * 100
# Higher std dev generally means better contrast
# But we'll cap at 60 for reasonable balance
std_score = min(std / 60 * 100, 100)
# Combine scores (weighted)
return 0.6 * mean_score + 0.4 * std_score
def _calculate_contrast_score(self, gray_img: np.ndarray) -> float:
"""
Calculate contrast score.
Args:
gray_img: Grayscale image
Returns:
Contrast score (0-100)
"""
# Calculate histogram
hist = cv2.calcHist([gray_img], [0], None, [256], [0, 256])
# Calculate percentage of pixels in each brightness range
total_pixels = gray_img.shape[0] * gray_img.shape[1]
dark_pixels = np.sum(hist[:50]) / total_pixels
mid_pixels = np.sum(hist[50:200]) / total_pixels
bright_pixels = np.sum(hist[200:]) / total_pixels
# Ideal: good distribution across ranges with emphasis on mid-range
# This is a simplified model - real receipts may need different distributions
score = (
(0.2 * min(dark_pixels * 500, 100)) + # Want some dark pixels (text)
(0.6 * min(mid_pixels * 200, 100)) + # Want many mid pixels
(0.2 * min(bright_pixels * 500, 100)) # Want some bright pixels (background)
)
return score
def _calculate_size_score(self, shape: Tuple[int, int, int]) -> float:
"""
Calculate score based on image dimensions.
Args:
shape: Image shape (height, width, channels)
Returns:
Size score (0-100)
"""
height, width = shape[0], shape[1]
# Minimum recommended dimensions for good OCR
min_height, min_width = 800, 600
# Calculate size score
if height < min_height or width < min_width:
# Penalize if below minimum dimensions
return min(height / min_height, width / min_width) * 100
else:
# Full score if dimensions are adequate
return 100.0
def _detect_folds(self, gray_img: np.ndarray) -> Tuple[bool, float]:
"""
Detect potential fold lines in the image.
Args:
gray_img: Grayscale image
Returns:
Tuple of (fold_detected, fold_severity)
fold_severity is a value between 0 and 5, with 5 being the most severe
"""
# Apply edge detection
edges = cv2.Canny(gray_img, 50, 150, apertureSize=3)
# Apply Hough Line Transform to detect straight lines
lines = cv2.HoughLinesP(
edges,
rho=1,
theta=np.pi/180,
threshold=100,
minLineLength=gray_img.shape[1] // 3, # Look for lines at least 1/3 of image width
maxLineGap=10
)
if lines is None:
return False, 0.0
# Check for horizontal or vertical lines that could be folds
potential_folds = []
height, width = gray_img.shape
for line in lines:
x1, y1, x2, y2 = line[0]
length = np.sqrt((x2 - x1)**2 + (y2 - y1)**2)
angle = np.abs(np.arctan2(y2 - y1, x2 - x1) * 180 / np.pi)
# Check if horizontal (0±10°) or vertical (90±10°)
is_horizontal = angle < 10 or angle > 170
is_vertical = abs(angle - 90) < 10
# Check if length is significant
is_significant = (is_horizontal and length > width * 0.5) or \
(is_vertical and length > height * 0.5)
if (is_horizontal or is_vertical) and is_significant:
# Calculate intensity variance along the line
# This helps determine if it's a fold (sharp brightness change)
# Simplified implementation for Phase 1
potential_folds.append({
"length": length,
"is_horizontal": is_horizontal,
})
# Determine if folds are detected and their severity
if not potential_folds:
return False, 0.0
# Severity based on number and length of potential folds
# This is a simplified metric for Phase 1
total_len = sum(fold["length"] for fold in potential_folds)
horizontal_count = sum(1 for fold in potential_folds if fold["is_horizontal"])
# Normalize against the dominant fold orientation
if horizontal_count >= len(potential_folds) - horizontal_count:
severity = min(5.0, total_len / width * 2.5)
else:
severity = min(5.0, total_len / height * 2.5)
return True, severity
def _calculate_overall_score(self, scores: Dict[str, float]) -> float:
"""
Calculate overall quality score from individual metrics.
Args:
scores: Dictionary of individual quality scores
Returns:
Overall quality score (0-100)
"""
# Weights for different factors
weights = {
"blur": 0.30,
"lighting": 0.25,
"contrast": 0.25,
"size": 0.10,
"fold": 0.10,
}
# Calculate weighted average
overall = sum(weights[key] * scores[key] for key in weights)
return overall
def _generate_suggestions(self, metrics: Dict[str, Any]) -> list:
"""
Generate improvement suggestions based on metrics.
Args:
metrics: Dictionary of quality metrics
Returns:
List of improvement suggestions
"""
suggestions = []
# Blur suggestions
if metrics["blur"] < 60:
suggestions.append("Hold the camera steady and ensure the receipt is in focus.")
# Lighting suggestions
if metrics["lighting"] < 60:
suggestions.append("Improve lighting conditions and avoid shadows on the receipt.")
# Contrast suggestions
if metrics["contrast"] < 60:
suggestions.append("Ensure good contrast between text and background.")
# Size suggestions
if metrics["size"] < 60:
suggestions.append("Move the camera closer to the receipt for better resolution.")
# Fold suggestions
if metrics["fold"]:
if metrics["fold_severity"] > 3.0:
suggestions.append("The receipt has severe folds. Try to flatten it before capturing.")
else:
suggestions.append("Flatten the receipt to remove fold lines for better processing.")
return suggestions
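For reference, the piecewise variance-to-score mapping used by the blur check can be exercised standalone (a minimal sketch; the thresholds mirror the ones above and the `blur_score` name is illustrative):

```python
def blur_score(variance: float) -> float:
    """Map variance of the Laplacian to a 0-100 sharpness score.

    Same thresholds as the assessor above: <10 is fully blurry,
    10-100 maps linearly onto 0-50, 100-1000 onto 50-100.
    """
    if variance < 10:
        return 0.0  # very blurry
    elif variance < 100:
        return (variance - 10) / 90 * 50  # map 10-100 to 0-50
    elif variance < 1000:
        return 50 + (variance - 100) / 900 * 50  # map 100-1000 to 50-100
    return 100.0  # very sharp

# A variance of 55 sits halfway through the first band -> 25.0;
# 550 sits halfway through the second band -> 75.0.
print(blur_score(55), blur_score(550))
```

The mapping is continuous at both breakpoints, so small variance changes never cause score jumps.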

View file

@ -0,0 +1,126 @@
"""
Receipt processing service orchestrates the OCR pipeline.
Pipeline stages:
1. Preprocess: enhance image, convert to PNG
2. Quality: score image; abort to 'low_quality' if below threshold
3. OCR: VisionLanguageOCR extracts structured data
4. Persist: flatten result into receipt_data table
5. Stage: set status to 'staged'; items await human approval
Items are NOT added to inventory automatically. Use the
POST /receipts/{id}/ocr/approve endpoint to commit approved items.
"""
from __future__ import annotations
import logging
from pathlib import Path
from typing import Any
from app.db.store import Store
logger = logging.getLogger(__name__)
def _flatten_ocr_result(result: dict[str, Any]) -> dict[str, Any]:
"""Map nested VisionLanguageOCR output to the flat receipt_data schema."""
merchant = result.get("merchant") or {}
transaction = result.get("transaction") or {}
totals = result.get("totals") or {}
return {
"merchant_name": merchant.get("name"),
"merchant_address": merchant.get("address"),
"merchant_phone": merchant.get("phone"),
"transaction_date": transaction.get("date"),
"transaction_time": transaction.get("time"),
"receipt_number": transaction.get("receipt_number"),
"register_number": transaction.get("register"),
"cashier_name": transaction.get("cashier"),
"items": result.get("items") or [],
"subtotal": totals.get("subtotal"),
"tax": totals.get("tax"),
"discount": totals.get("discount"),
"total": totals.get("total"),
"payment_method": totals.get("payment_method"),
"amount_paid": totals.get("amount_paid"),
"change_given": totals.get("change"),
"raw_text": result.get("raw_text"),
"confidence_scores": result.get("confidence") or {},
"warnings": result.get("warnings") or [],
}
class ReceiptService:
def __init__(self, store: Store) -> None:
self.store = store
async def process(self, receipt_id: int, image_path: Path) -> None:
"""Run the full OCR pipeline for a receipt image.
Stages run synchronously inside asyncio.to_thread so SQLite and the
VLM (which uses torch) both stay off the async event loop.
"""
import asyncio
await asyncio.to_thread(self._run_pipeline, receipt_id, image_path)
def _run_pipeline(self, receipt_id: int, image_path: Path) -> None:
from app.core.config import settings
from app.services.image_preprocessing.enhancement import ImageEnhancer
from app.services.image_preprocessing.format_conversion import FormatConverter
from app.services.quality.assessment import QualityAssessor
# ── Stage 1: Preprocess ───────────────────────────────────────────────
enhancer = ImageEnhancer()
converter = FormatConverter()
enhanced = enhancer.enhance(image_path)
processed_path = converter.to_png(enhanced)
# ── Stage 2: Quality assessment ───────────────────────────────────────
assessor = QualityAssessor()
assessment = assessor.assess(processed_path)
self.store.upsert_quality_assessment(
receipt_id,
overall_score=assessment["overall_score"],
is_acceptable=assessment["is_acceptable"],
metrics=assessment.get("metrics", {}),
suggestions=assessment.get("suggestions", []),
)
if not assessment["is_acceptable"]:
self.store.update_receipt_status(receipt_id, "low_quality")
logger.warning(
"Receipt %s: quality too low for OCR (score=%.1f) — %s",
receipt_id, assessment["overall_score"],
"; ".join(assessment.get("suggestions", [])),
)
return
if not settings.ENABLE_OCR:
self.store.update_receipt_status(receipt_id, "processed")
logger.info("Receipt %s: quality OK but ENABLE_OCR=false — skipping OCR", receipt_id)
return
# ── Stage 3: OCR extraction ───────────────────────────────────────────
from app.services.ocr.vl_model import VisionLanguageOCR
ocr = VisionLanguageOCR()
result = ocr.extract_receipt_data(str(processed_path))
if result.get("error"):
self.store.update_receipt_status(receipt_id, "error", result["error"])
logger.error("Receipt %s: OCR failed — %s", receipt_id, result["error"])
return
# ── Stage 4: Persist extracted data ───────────────────────────────────
flat = _flatten_ocr_result(result)
self.store.upsert_receipt_data(receipt_id, flat)
item_count = len(flat.get("items") or [])
# ── Stage 5: Stage for human approval ────────────────────────────────
self.store.update_receipt_status(receipt_id, "staged")
logger.info(
"Receipt %s: OCR complete — %d item(s) staged for review "
"(confidence=%.2f)",
receipt_id, item_count,
(result.get("confidence") or {}).get("overall", 0.0),
)
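The flattening contract above can be checked with a tiny hand-built payload. This is a condensed sketch of `_flatten_ocr_result` covering a subset of fields; the sample merchant and totals values are illustrative:

```python
from typing import Any


def flatten_ocr_result(result: dict[str, Any]) -> dict[str, Any]:
    """Condensed version of _flatten_ocr_result above (subset of fields)."""
    merchant = result.get("merchant") or {}
    totals = result.get("totals") or {}
    return {
        "merchant_name": merchant.get("name"),
        "total": totals.get("total"),
        "items": result.get("items") or [],
        "warnings": result.get("warnings") or [],
    }


flat = flatten_ocr_result({
    "merchant": {"name": "Corner Market"},
    "totals": {"total": 10.80},
})
# Missing scalar fields flatten to None; missing collections default to empty.
assert flat["merchant_name"] == "Corner Market"
assert flat["items"] == []
```

The `or {}` / `or []` guards matter: the VLM may emit explicit nulls for absent sections, and `.get()` alone would propagate `None` where the schema expects a dict or list.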

View file

@ -0,0 +1,295 @@
#!/usr/bin/env python
# app/services/receipt_service.py
import os
import uuid
import shutil
import aiofiles
from pathlib import Path
from typing import Optional, List, Dict, Any
from fastapi import UploadFile, BackgroundTasks, HTTPException
import asyncio
import logging
import sys
from app.utils.progress import ProgressIndicator
from app.services.image_preprocessing.format_conversion import convert_to_standard_format, extract_metadata
from app.services.image_preprocessing.enhancement import enhance_image, correct_perspective
from app.services.quality.assessment import QualityAssessor
from app.models.schemas.receipt import ReceiptCreate, ReceiptResponse
from app.models.schemas.quality import QualityAssessment
from app.core.config import settings
logger = logging.getLogger(__name__)
class ReceiptService:
"""
Service for handling receipt processing.
"""
def __init__(self):
"""
Initialize the receipt service.
"""
self.quality_assessor = QualityAssessor()
self.upload_dir = Path(settings.UPLOAD_DIR)
self.processing_dir = Path(settings.PROCESSING_DIR)
# Create directories if they don't exist
self.upload_dir.mkdir(parents=True, exist_ok=True)
self.processing_dir.mkdir(parents=True, exist_ok=True)
# In-memory storage for Phase 1 (would be replaced by DB in production)
self.receipts = {}
self.quality_assessments = {}
async def process_receipt(
self,
file: UploadFile,
background_tasks: BackgroundTasks
) -> ReceiptResponse:
"""
Process a single receipt file.
Args:
file: Uploaded receipt file
background_tasks: FastAPI background tasks
Returns:
ReceiptResponse object
"""
# Generate unique ID for receipt
receipt_id = str(uuid.uuid4())
# Save uploaded file
upload_path = self.upload_dir / f"{receipt_id}_{file.filename}"
await self._save_upload_file(file, upload_path)
# Create receipt entry
receipt = {
"id": receipt_id,
"filename": file.filename,
"status": "uploaded",
"original_path": str(upload_path),
"processed_path": None,
"metadata": {},
}
self.receipts[receipt_id] = receipt
# Add background task for processing
background_tasks.add_task(
self._process_receipt_background,
receipt_id,
upload_path
)
return ReceiptResponse(
id=receipt_id,
filename=file.filename,
status="processing",
metadata={},
quality_score=None,
)
async def get_receipt(self, receipt_id: str) -> Optional[ReceiptResponse]:
"""
Get receipt by ID.
Args:
receipt_id: Receipt ID
Returns:
ReceiptResponse object or None if not found
"""
receipt = self.receipts.get(receipt_id)
if not receipt:
return None
quality = self.quality_assessments.get(receipt_id)
quality_score = quality.get("overall_score") if quality else None
return ReceiptResponse(
id=receipt["id"],
filename=receipt["filename"],
status=receipt["status"],
metadata=receipt["metadata"],
quality_score=quality_score,
)
async def get_receipt_quality(self, receipt_id: str) -> Optional[QualityAssessment]:
"""
Get quality assessment for a receipt.
Args:
receipt_id: Receipt ID
Returns:
QualityAssessment object or None if not found
"""
quality = self.quality_assessments.get(receipt_id)
if not quality:
return None
return QualityAssessment(
receipt_id=receipt_id,
overall_score=quality["overall_score"],
is_acceptable=quality["is_acceptable"],
metrics=quality["metrics"],
suggestions=quality["improvement_suggestions"],
)
def list_receipts(self) -> List[ReceiptResponse]:
"""
List all receipts.
Returns:
List of ReceiptResponse objects
"""
result = []
for receipt_id, receipt in self.receipts.items():
quality = self.quality_assessments.get(receipt_id)
quality_score = quality.get("overall_score") if quality else None
result.append(ReceiptResponse(
id=receipt["id"],
filename=receipt["filename"],
status=receipt["status"],
metadata=receipt["metadata"],
quality_score=quality_score,
))
return result
def get_quality_assessments(self) -> Dict[str, QualityAssessment]:
"""
Get all quality assessments.
Returns:
Dict mapping receipt_id to QualityAssessment object
"""
result = {}
for receipt_id, quality in self.quality_assessments.items():
result[receipt_id] = QualityAssessment(
receipt_id=receipt_id,
overall_score=quality["overall_score"],
is_acceptable=quality["is_acceptable"],
metrics=quality["metrics"],
suggestions=quality["improvement_suggestions"],
)
return result
async def _save_upload_file(self, file: UploadFile, destination: Path) -> None:
"""
Save an uploaded file to disk.
Args:
file: Uploaded file
destination: Destination path
"""
try:
async with aiofiles.open(destination, 'wb') as out_file:
# Read in chunks to handle large files
content = await file.read(1024 * 1024) # 1MB chunks
while content:
await out_file.write(content)
content = await file.read(1024 * 1024)
except Exception as e:
logger.exception(f"Error saving upload file: {e}")
raise HTTPException(status_code=500, detail=f"Error saving upload file: {str(e)}")
async def _process_receipt_background(self, receipt_id: str, upload_path: Path) -> None:
"""
Background task for processing a receipt with progress indicators.
Args:
receipt_id: Receipt ID
upload_path: Path to uploaded file
"""
try:
# Print a message to indicate start of processing
print(f"\nProcessing receipt {receipt_id}...")
# Update status
self.receipts[receipt_id]["status"] = "processing"
# Create processing directory for this receipt
receipt_dir = self.processing_dir / receipt_id
receipt_dir.mkdir(parents=True, exist_ok=True)
# Step 1: Convert to standard format
print(" Step 1/4: Converting to standard format...")
converted_path = receipt_dir / f"{receipt_id}_converted.png"
success, message, actual_converted_path = convert_to_standard_format(
upload_path,
converted_path
)
if not success:
print(f" ✗ Format conversion failed: {message}")
self.receipts[receipt_id]["status"] = "error"
self.receipts[receipt_id]["error"] = message
return
print(" ✓ Format conversion complete")
# Step 2: Correct perspective
print(" Step 2/4: Correcting perspective...")
perspective_path = receipt_dir / f"{receipt_id}_perspective.png"
success, message, actual_perspective_path = correct_perspective(
actual_converted_path,
perspective_path
)
# Use corrected image if available, otherwise use converted image
current_path = actual_perspective_path if success else actual_converted_path
if success:
print(" ✓ Perspective correction complete")
else:
print(f" ⚠ Perspective correction skipped: {message}")
# Step 3: Enhance image
print(" Step 3/4: Enhancing image...")
enhanced_path = receipt_dir / f"{receipt_id}_enhanced.png"
success, message, actual_enhanced_path = enhance_image(
current_path,
enhanced_path
)
if not success:
print(f" ⚠ Enhancement warning: {message}")
# Continue with current path
else:
current_path = actual_enhanced_path
print(" ✓ Image enhancement complete")
# Step 4: Assess quality
print(" Step 4/4: Assessing quality...")
quality_assessment = self.quality_assessor.assess_image(current_path)
self.quality_assessments[receipt_id] = quality_assessment
print(f" ✓ Quality assessment complete: score {quality_assessment['overall_score']:.1f}/100")
# Step 5: Extract metadata
print(" Extracting metadata...")
metadata = extract_metadata(upload_path)
if current_path != upload_path:
processed_metadata = extract_metadata(current_path)
metadata["processed"] = {
"width": processed_metadata.get("width"),
"height": processed_metadata.get("height"),
"format": processed_metadata.get("original_format"),
}
print(" ✓ Metadata extraction complete")
# Update receipt entry
self.receipts[receipt_id].update({
"status": "processed",
"processed_path": str(current_path),
"metadata": metadata,
})
print(f"✓ Receipt {receipt_id} processed successfully!")
except Exception as e:
print(f"✗ Error processing receipt {receipt_id}: {e}")
self.receipts[receipt_id]["status"] = "error"
self.receipts[receipt_id]["error"] = str(e)
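The whole-word vs. substring keyword rule used by the assembly matcher that follows can be exercised in isolation (a minimal sketch; `matches_role` and the sample keywords are illustrative, mirroring the logic of `_matches_role`):

```python
def matches_role(keywords: list[str], item: str) -> bool:
    """Single-word keywords must match a whole token; multi-word
    keywords match as substrings."""
    item_lower = item.lower()
    item_words = set(item_lower.split())
    for kw in keywords:
        if " " in kw:
            if kw in item_lower:  # multi-word: substring match
                return True
        elif kw in item_words:  # single-word: whole-word match only
            return True
    return False


# 'pea' must not fire inside 'peanut', but 'burger patt' may substring-match.
assert not matches_role(["pea"], "Peanut Butter")
assert matches_role(["pea"], "Frozen Pea Mix")
assert matches_role(["burger patt"], "Beef Burger Patties")
```

Tokenizing the pantry item once per call keeps the check O(words + keywords) and avoids the classic false positives ('egg' in 'eggnog', 'ham' in 'hamburger').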

View file

@ -0,0 +1,647 @@
"""
Assembly-dish template matcher for Level 1/2.
Assembly dishes (burritos, stir fry, fried rice, omelettes, sandwiches, etc.)
are defined by structural roles -- container + filler + sauce -- not by a fixed
ingredient list. The corpus can never fully cover them.
This module fires when the pantry covers all *required* roles of a template.
Results are injected at the top of the Level 1/2 suggestion list with negative
ids (client displays them identically to corpus recipes).
Templates define:
- required: list of role sets -- ALL must have at least one pantry match
- optional: role sets whose matched items are shown as extras
- directions: short cooking instructions
- notes: serving suggestions / variations
"""
from __future__ import annotations
import hashlib
from dataclasses import dataclass
from app.models.schemas.recipe import RecipeSuggestion
# IDs in range -100..-1 are reserved for assembly-generated suggestions
_ASSEMBLY_ID_START = -1
@dataclass
class AssemblyRole:
"""One role in a template (e.g. 'protein').
display: human-readable role label
keywords: substrings matched against pantry item (lowercased)
"""
display: str
keywords: list[str]
@dataclass
class AssemblyTemplate:
"""A template assembly dish."""
id: int
title: str
required: list[AssemblyRole]
optional: list[AssemblyRole]
directions: list[str]
notes: str = ""
def _matches_role(role: AssemblyRole, pantry_set: set[str]) -> list[str]:
"""Return pantry items that satisfy this role.
Single-word keywords use whole-word matching (word must appear as a
discrete token) so short words like 'pea', 'ham', 'egg' don't false-match
inside longer words like 'peanut', 'hamburger', 'eggnog'.
Multi-word keywords (e.g. 'burger patt') use substring matching.
"""
hits: list[str] = []
for item in pantry_set:
item_lower = item.lower()
item_words = set(item_lower.split())
for kw in role.keywords:
if " " in kw:
# Multi-word: substring match
if kw in item_lower:
hits.append(item)
break
else:
# Single-word: whole-word match only
if kw in item_words:
hits.append(item)
break
return hits
def _pick_one(items: list[str], seed: int) -> str:
"""Deterministically pick one item from a list using a seed."""
return sorted(items)[seed % len(items)]
def _pantry_hash(pantry_set: set[str]) -> int:
"""Stable integer derived from pantry contents — used for deterministic picks."""
key = ",".join(sorted(pantry_set))
return int(hashlib.md5(key.encode()).hexdigest(), 16) # noqa: S324 — non-crypto use
def _keyword_label(item: str, role: AssemblyRole) -> str:
"""Return a short, clean label derived from the keyword that matched.
Uses the longest matching keyword (most specific) as the base label,
then title-cases it. This avoids pasting full raw pantry names like
'Organic Extra Firm Tofu' into titles; the title gets just 'Tofu' instead.
"""
lower = item.lower()
best_kw = ""
for kw in role.keywords:
if kw in lower and len(kw) > len(best_kw):
best_kw = kw
label = (best_kw or item).strip().title()
# NOTE: trailing-'s' cleanup ("Beans" -> "Bean") is not implemented; the label is returned as-is
return label
def _personalized_title(tmpl: AssemblyTemplate, pantry_set: set[str], seed: int) -> str:
"""Build a specific title using actual pantry items, e.g. 'Chicken & Broccoli Burrito'.
Uses the matched keyword as the label (not the full pantry item name) so
'Organic Extra Firm Tofu Block' yields just 'Tofu' in the title.
Picks at most two roles; prefers protein then vegetable.
"""
priority_displays = ["protein", "vegetables", "sauce base", "cheese"]
picked: list[str] = []
for display in priority_displays:
if len(picked) >= 2:
break
for role in tmpl.optional:
if role.display != display:
continue
hits = _matches_role(role, pantry_set)
if hits:
item = _pick_one(hits, seed)
label = _keyword_label(item, role)
if label not in picked:
picked.append(label)
if len(picked) >= 2:
break
if not picked:
return tmpl.title
return f"{' & '.join(picked)} {tmpl.title}"
# ---------------------------------------------------------------------------
# Template definitions
# ---------------------------------------------------------------------------
ASSEMBLY_TEMPLATES: list[AssemblyTemplate] = [
AssemblyTemplate(
id=-1,
title="Burrito / Taco",
required=[
AssemblyRole("tortilla or wrap", [
"tortilla", "wrap", "taco shell", "flatbread", "pita",
]),
],
optional=[
AssemblyRole("protein", [
"chicken", "beef", "steak", "pork", "sausage", "hamburger",
"burger patt", "shrimp", "egg", "tofu", "beans", "bean",
]),
AssemblyRole("rice or starch", ["rice", "quinoa", "potato"]),
AssemblyRole("cheese", [
"cheese", "cheddar", "mozzarella", "monterey", "queso",
]),
AssemblyRole("salsa or sauce", [
"salsa", "hot sauce", "taco sauce", "enchilada", "guacamole",
]),
AssemblyRole("sour cream or yogurt", ["sour cream", "greek yogurt", "crema"]),
AssemblyRole("vegetables", [
"pepper", "onion", "tomato", "lettuce", "corn", "avocado",
"spinach", "broccoli", "zucchini",
]),
],
directions=[
"Warm the tortilla in a dry skillet or microwave for 20 seconds.",
"Heat any proteins or vegetables in a pan until cooked through.",
"Layer ingredients down the center: rice first, then protein, then vegetables.",
"Add cheese, salsa, and sour cream last so they stay cool.",
"Fold in the sides and roll tightly. Optionally toast seam-side down 1-2 minutes.",
],
notes="Works as a burrito (rolled), taco (folded), or quesadilla (cheese only, pressed flat).",
),
AssemblyTemplate(
id=-2,
title="Fried Rice",
required=[
AssemblyRole("cooked rice", [
"rice", "leftover rice", "instant rice", "microwavable rice",
]),
],
optional=[
AssemblyRole("protein", [
"chicken", "beef", "pork", "shrimp", "egg", "tofu",
"sausage", "ham", "spam",
]),
AssemblyRole("soy sauce or seasoning", [
"soy sauce", "tamari", "teriyaki", "oyster sauce", "fish sauce",
]),
AssemblyRole("oil", ["oil", "butter", "sesame"]),
AssemblyRole("egg", ["egg"]),
AssemblyRole("vegetables", [
"carrot", "peas", "corn", "onion", "scallion", "green onion",
"broccoli", "bok choy", "bean sprout", "zucchini", "spinach",
]),
AssemblyRole("garlic or ginger", ["garlic", "ginger"]),
],
directions=[
"Use day-old cold rice if available -- it fries better than fresh.",
"Heat oil in a large skillet or wok over high heat.",
"Add garlic/ginger and any raw vegetables; stir fry 2-3 minutes.",
"Push to the side, scramble eggs in the same pan if using.",
"Add protein (pre-cooked or raw) and cook through.",
"Add rice, breaking up clumps. Stir fry until heated and lightly toasted.",
"Season with soy sauce and any other sauces. Toss to combine.",
],
notes="Add a fried egg on top. A drizzle of sesame oil at the end adds a lot.",
),
AssemblyTemplate(
id=-3,
title="Omelette / Scramble",
required=[
AssemblyRole("eggs", ["egg"]),
],
optional=[
AssemblyRole("cheese", [
"cheese", "cheddar", "mozzarella", "feta", "parmesan",
]),
AssemblyRole("vegetables", [
"pepper", "onion", "tomato", "spinach", "mushroom",
"broccoli", "zucchini", "scallion", "avocado",
]),
AssemblyRole("protein", [
"ham", "bacon", "sausage", "chicken", "turkey",
"smoked salmon",
]),
AssemblyRole("herbs or seasoning", [
"herb", "basil", "chive", "parsley", "salt", "pepper",
"hot sauce", "salsa",
]),
],
directions=[
"Beat eggs with a splash of water or milk and a pinch of salt.",
"Saute any vegetables and proteins in butter or oil over medium heat until softened.",
"Pour eggs over fillings (scramble) or pour into a clean buttered pan (omelette).",
"For omelette: cook until nearly set, add fillings to one side, fold over.",
"For scramble: stir gently over medium-low heat until just set.",
"Season and serve immediately.",
],
notes="Works for breakfast, lunch, or a quick dinner. Any leftover vegetables work well.",
),
AssemblyTemplate(
id=-4,
title="Stir Fry",
required=[
AssemblyRole("vegetables", [
"pepper", "broccoli", "carrot", "snap pea", "bok choy",
"zucchini", "mushroom", "corn", "onion", "bean sprout",
"cabbage", "spinach", "asparagus",
]),
],
optional=[
AssemblyRole("protein", [
"chicken", "beef", "pork", "shrimp", "tofu", "egg",
]),
AssemblyRole("sauce", [
"soy sauce", "teriyaki", "oyster sauce", "hoisin",
"stir fry sauce", "sesame",
]),
AssemblyRole("starch base", ["rice", "noodle", "pasta", "ramen"]),
AssemblyRole("garlic or ginger", ["garlic", "ginger"]),
AssemblyRole("oil", ["oil", "sesame"]),
],
directions=[
"Cut all proteins and vegetables into similar-sized pieces for even cooking.",
"Heat oil in a wok or large skillet over the highest heat your stove allows.",
"Cook protein first until nearly done; remove and set aside.",
"Add dense vegetables (carrots, broccoli) first; quick-cooking veg last.",
"Return protein, add sauce, and toss everything together for 1-2 minutes.",
"Serve over rice or noodles.",
],
notes="High heat is the key. Do not crowd the pan -- cook in batches if needed.",
),
AssemblyTemplate(
id=-5,
title="Pasta with Whatever You Have",
required=[
AssemblyRole("pasta", [
"pasta", "spaghetti", "penne", "fettuccine", "rigatoni",
"linguine", "rotini", "farfalle", "macaroni", "noodle",
]),
],
optional=[
AssemblyRole("sauce base", [
"tomato", "marinara", "pasta sauce", "cream", "butter",
"olive oil", "pesto",
]),
AssemblyRole("protein", [
"chicken", "beef", "pork", "shrimp", "sausage", "bacon",
"ham", "tuna", "canned fish",
]),
AssemblyRole("cheese", [
"parmesan", "romano", "mozzarella", "ricotta", "feta",
]),
AssemblyRole("vegetables", [
"tomato", "spinach", "mushroom", "pepper", "zucchini",
"broccoli", "artichoke", "olive", "onion",
]),
AssemblyRole("garlic", ["garlic"]),
],
directions=[
"Cook pasta in well-salted boiling water until al dente. Reserve 1 cup pasta water.",
"While pasta cooks, saute garlic in olive oil over medium heat.",
"Add proteins and cook through; add vegetables until tender.",
"Add sauce base and simmer 5 minutes. Add pasta water to loosen if needed.",
"Toss cooked pasta with sauce. Finish with cheese if using.",
],
notes="Pasta water is the secret -- the starch thickens and binds any sauce.",
),
AssemblyTemplate(
id=-6,
title="Sandwich / Wrap",
required=[
AssemblyRole("bread or wrap", [
"bread", "roll", "bun", "wrap", "tortilla", "pita",
"bagel", "english muffin", "croissant", "flatbread",
]),
],
optional=[
AssemblyRole("protein", [
"chicken", "turkey", "ham", "roast beef", "tuna", "egg",
"bacon", "salami", "pepperoni", "tofu", "tempeh",
]),
AssemblyRole("cheese", [
"cheese", "cheddar", "swiss", "provolone", "mozzarella",
]),
AssemblyRole("condiment", [
"mayo", "mustard", "ketchup", "hot sauce", "ranch",
"hummus", "pesto", "aioli",
]),
AssemblyRole("vegetables", [
"lettuce", "tomato", "onion", "cucumber", "avocado",
"pepper", "sprout", "arugula",
]),
],
directions=[
"Toast bread if desired.",
"Spread condiments on both inner surfaces.",
"Layer protein first, then cheese, then vegetables.",
"Press together and cut diagonally.",
],
notes="Leftovers, deli meat, canned fish -- nearly anything works between bread.",
),
AssemblyTemplate(
id=-7,
title="Grain Bowl",
required=[
AssemblyRole("grain base", [
"rice", "quinoa", "farro", "barley", "couscous",
"bulgur", "freekeh", "polenta",
]),
],
optional=[
AssemblyRole("protein", [
"chicken", "beef", "pork", "tofu", "egg", "shrimp",
"beans", "bean", "lentil", "chickpea",
]),
AssemblyRole("vegetables", [
"roasted", "broccoli", "carrot", "kale", "spinach",
"cucumber", "tomato", "corn", "edamame", "avocado",
"beet", "sweet potato",
]),
AssemblyRole("dressing or sauce", [
"dressing", "tahini", "vinaigrette", "sauce",
"olive oil", "lemon", "soy sauce",
]),
AssemblyRole("toppings", [
"nut", "seed", "feta", "parmesan", "herb",
]),
],
directions=[
"Cook grain base according to package directions; season with salt.",
"Roast or saute vegetables with oil, salt, and pepper until tender.",
"Cook or slice protein.",
"Arrange grain in a bowl, top with protein and vegetables.",
"Drizzle with dressing and add toppings.",
],
notes="Great for meal prep -- cook grains and proteins in bulk, assemble bowls all week.",
),
AssemblyTemplate(
id=-8,
title="Soup / Stew",
required=[
AssemblyRole("broth or liquid base", [
"broth", "stock", "bouillon",
"tomato sauce", "coconut milk", "cream of",
]),
],
optional=[
AssemblyRole("protein", [
"chicken", "beef", "pork", "sausage", "shrimp",
"beans", "bean", "lentil", "tofu",
]),
AssemblyRole("vegetables", [
"carrot", "celery", "onion", "potato", "tomato",
"spinach", "kale", "corn", "pea", "zucchini",
]),
AssemblyRole("starch thickener", [
"potato", "pasta", "noodle", "rice", "barley",
"flour", "cornstarch",
]),
AssemblyRole("seasoning", [
"garlic", "herb", "bay leaf", "thyme", "rosemary",
"cumin", "paprika", "chili",
]),
],
directions=[
"Saute onion, celery, and garlic in oil until softened, about 5 minutes.",
"Add any raw proteins and cook until browned.",
"Add broth or liquid base and bring to a simmer.",
"Add dense vegetables (carrots, potatoes) first; quick-cooking veg in the last 10 minutes.",
"Add starches and cook until tender.",
"Season to taste and simmer at least 20 minutes for flavors to develop.",
],
notes="Soups and stews improve overnight in the fridge. Almost any combination works.",
),
AssemblyTemplate(
id=-9,
title="Casserole / Bake",
required=[
AssemblyRole("starch or base", [
"pasta", "rice", "potato", "noodle", "bread",
"tortilla", "polenta", "grits", "macaroni",
]),
AssemblyRole("binder or sauce", [
"cream of", "cheese", "cream cheese", "sour cream",
"soup mix", "gravy", "tomato sauce", "marinara",
"broth", "stock", "milk", "cream",
]),
],
optional=[
AssemblyRole("protein", [
"chicken", "beef", "pork", "tuna", "ham", "sausage",
"ground", "shrimp", "beans", "bean", "lentil",
]),
AssemblyRole("vegetables", [
"broccoli", "corn", "pea", "onion", "mushroom",
"spinach", "zucchini", "tomato", "pepper", "carrot",
]),
AssemblyRole("cheese topping", [
"cheddar", "mozzarella", "parmesan", "swiss",
"cheese", "breadcrumb",
]),
AssemblyRole("seasoning", [
"garlic", "herb", "thyme", "rosemary", "paprika",
"onion powder", "salt", "pepper",
]),
],
directions=[
"Preheat oven to 375 F (190 C). Grease a 9x13 baking dish.",
"Cook starch base (pasta, rice, potato) until just underdone -- it finishes in the oven.",
"Mix cooked starch with sauce/binder, protein, and vegetables in the dish.",
"Season generously -- casseroles need salt.",
"Top with cheese or breadcrumbs if using.",
"Bake covered 25 minutes, then uncovered 15 minutes until golden and bubbly.",
],
notes="Classic pantry dump dinner. Cream of anything soup is the universal binder.",
),
AssemblyTemplate(
id=-10,
title="Pancakes / Waffles / Quick Bread",
required=[
AssemblyRole("flour or baking mix", [
"flour", "bisquick", "pancake mix", "waffle mix",
"baking mix", "cornmeal", "oats",
]),
AssemblyRole("leavening or egg", [
"egg", "baking powder", "baking soda", "yeast",
]),
],
optional=[
AssemblyRole("liquid", [
"milk", "buttermilk", "water", "juice",
"almond milk", "oat milk", "sour cream",
]),
AssemblyRole("fat", [
"butter", "oil", "margarine",
]),
AssemblyRole("sweetener", [
"sugar", "honey", "maple syrup", "brown sugar",
]),
AssemblyRole("mix-ins", [
"blueberr", "banana", "apple", "chocolate chip",
"nut", "berry", "cinnamon", "vanilla",
]),
],
directions=[
"Whisk dry ingredients (flour, leavening, sugar, salt) together in a bowl.",
"Whisk wet ingredients (egg, milk, melted butter) in a separate bowl.",
"Fold wet into dry until just combined -- lumps are fine, do not overmix.",
"For pancakes: cook on a buttered griddle over medium heat, flip when bubbles form.",
"For waffles: pour into preheated waffle iron according to manufacturer instructions.",
"For muffins or quick bread: pour into greased pan, bake at 375 F until a toothpick comes out clean.",
],
notes="Overmixing develops gluten and makes pancakes tough. Stop when just combined.",
),
AssemblyTemplate(
id=-11,
title="Porridge / Oatmeal",
required=[
AssemblyRole("oats or grain porridge", [
"oat", "porridge", "grits", "semolina", "cream of wheat",
"polenta", "congee", "rice porridge",
]),
],
optional=[
AssemblyRole("liquid", ["milk", "water", "almond milk", "oat milk", "coconut milk"]),
AssemblyRole("sweetener", ["sugar", "honey", "maple syrup", "brown sugar", "agave"]),
AssemblyRole("fruit", ["banana", "berry", "apple", "raisin", "date", "mango"]),
AssemblyRole("toppings", ["nut", "seed", "granola", "coconut", "chocolate"]),
AssemblyRole("spice", ["cinnamon", "nutmeg", "vanilla", "cardamom"]),
],
directions=[
"Combine oats with liquid in a pot — typically 1 part oats to 2 parts liquid.",
"Bring to a gentle simmer over medium heat, stirring occasionally.",
"Cook 5 minutes (rolled oats) or 2 minutes (quick oats) until thickened to your liking.",
"Stir in sweetener and spices.",
"Top with fruit, nuts, or seeds and serve immediately.",
],
notes="Overnight oats: skip cooking — soak oats in cold milk overnight in the fridge.",
),
AssemblyTemplate(
id=-12,
title="Pie / Pot Pie",
required=[
AssemblyRole("pastry or crust", [
"pastry", "puff pastry", "pie crust", "shortcrust",
"pie shell", "phyllo", "filo", "biscuit",
]),
],
optional=[
AssemblyRole("protein filling", [
"chicken", "beef", "pork", "lamb", "turkey", "tofu",
"mushroom", "beans", "bean", "lentil", "tuna", "salmon",
]),
AssemblyRole("vegetables", [
"carrot", "pea", "corn", "potato", "onion", "leek",
"broccoli", "spinach", "mushroom", "parsnip", "swede",
]),
AssemblyRole("sauce or binder", [
"gravy", "cream of", "stock", "broth", "cream",
"white sauce", "bechamel", "cheese sauce",
]),
AssemblyRole("seasoning", [
"thyme", "rosemary", "sage", "garlic", "herb",
"mustard", "worcestershire",
]),
AssemblyRole("sweet filling", [
"apple", "berry", "cherry", "pear", "peach",
"rhubarb", "plum", "custard",
]),
],
directions=[
"For pot pie: make a sauce by combining stock or cream-of-something with cooked vegetables and protein.",
"Season generously — fillings need more salt than you think.",
"Pour filling into a baking dish and top with pastry, pressing edges to seal.",
"Cut a few slits in the top to release steam. Brush with egg wash or milk if available.",
"Bake at 400 F (200 C) for 25-35 minutes until pastry is golden brown.",
"For sweet pie: fill unbaked crust with fruit filling, top with second crust or crumble, bake similarly.",
],
notes="Puff pastry from the freezer is the shortcut to impressive pot pies. Thaw in the fridge overnight.",
),
AssemblyTemplate(
id=-13,
title="Pudding / Custard",
required=[
AssemblyRole("dairy or dairy-free milk", [
"milk", "cream", "oat milk", "almond milk",
"soy milk", "coconut milk",
]),
AssemblyRole("thickener or set", [
"egg", "cornstarch", "custard powder", "gelatin",
"agar", "tapioca", "arrowroot",
]),
],
optional=[
AssemblyRole("sweetener", ["sugar", "honey", "maple syrup", "condensed milk"]),
AssemblyRole("flavouring", [
"vanilla", "chocolate", "cocoa", "caramel",
"lemon", "orange", "cinnamon", "nutmeg",
]),
AssemblyRole("starchy base", [
"rice", "bread", "sponge", "cake", "biscuit",
]),
AssemblyRole("fruit", ["raisin", "sultana", "berry", "banana", "apple"]),
],
directions=[
"For stovetop custard: whisk eggs and sugar together, heat milk until steaming.",
"Slowly pour hot milk into egg mixture while whisking constantly (tempering).",
"Return to low heat and stir until mixture coats the back of a spoon.",
"For cornstarch pudding: whisk cornstarch into cold milk first, then heat while stirring.",
"Add flavourings (vanilla, cocoa) once off heat.",
"Pour into dishes and refrigerate at least 2 hours to set.",
],
notes="UK-style pudding is broad — bread pudding, rice pudding, spotted dick, treacle sponge all count.",
),
]
# ---------------------------------------------------------------------------
# Public API
# ---------------------------------------------------------------------------
def match_assembly_templates(
pantry_items: list[str],
pantry_set: set[str],
excluded_ids: list[int],
) -> list[RecipeSuggestion]:
"""Return assembly-dish suggestions whose required roles are all satisfied.
Titles are personalized with specific pantry items (deterministically chosen
from the pantry contents so the same pantry always produces the same title).
Skips templates whose id is in excluded_ids (dismiss/load-more support).
"""
excluded = set(excluded_ids)
seed = _pantry_hash(pantry_set)
results: list[RecipeSuggestion] = []
for tmpl in ASSEMBLY_TEMPLATES:
if tmpl.id in excluded:
continue
# All required roles must be satisfied
if any(not _matches_role(role, pantry_set) for role in tmpl.required):
continue
optional_hit_count = sum(
1 for role in tmpl.optional if _matches_role(role, pantry_set)
)
results.append(RecipeSuggestion(
id=tmpl.id,
title=_personalized_title(tmpl, pantry_set, seed + tmpl.id),
match_count=len(tmpl.required) + optional_hit_count,
element_coverage={},
swap_candidates=[],
missing_ingredients=[],
directions=tmpl.directions,
notes=tmpl.notes,
level=1,
is_wildcard=False,
nutrition=None,
))
# Sort by optional coverage descending — best-matched templates first
results.sort(key=lambda s: s.match_count, reverse=True)
return results
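The required/optional matching and match-count sort above can be sketched in isolation. This is a simplified stand-in (hypothetical `Role`/`Template` types instead of the real `AssemblyRole`/`AssemblyTemplate`, which also carry personalization and directions), assuming the same substring-keyword matching used elsewhere in this module:

```python
from dataclasses import dataclass, field

@dataclass
class Role:
    name: str
    keywords: list[str]

@dataclass
class Template:
    id: int
    title: str
    required: list[Role]
    optional: list[Role] = field(default_factory=list)

def matches(role: Role, pantry: set[str]) -> bool:
    # A role is satisfied when any keyword appears as a substring of a pantry item.
    return any(kw in item for kw in role.keywords for item in pantry)

def rank(templates: list[Template], pantry: set[str]) -> list[tuple[str, int]]:
    out: list[tuple[str, int]] = []
    for t in templates:
        # All required roles must be satisfied; optional hits boost the score.
        if all(matches(r, pantry) for r in t.required):
            hits = sum(1 for r in t.optional if matches(r, pantry))
            out.append((t.title, len(t.required) + hits))
    out.sort(key=lambda x: x[1], reverse=True)
    return out
```

A pantry of rolled oats and a banana satisfies the porridge template's required role plus one optional role, while a pie template with no crust match is dropped entirely.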


@@ -0,0 +1,135 @@
"""
ElementClassifier -- classify pantry items into culinary element tags.
Lookup order:
1. ingredient_profiles table (pre-computed from USDA FDC)
2. Keyword heuristic fallback (for unlisted ingredients)
"""
from __future__ import annotations
import json
from dataclasses import dataclass, field
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from app.db.store import Store
# All valid ingredient-level element labels (Method is recipe-level, not ingredient-level)
ELEMENTS = frozenset({
"Seasoning", "Richness", "Brightness", "Depth",
"Aroma", "Structure", "Texture",
})
_HEURISTIC: list[tuple[list[str], str]] = [
(["vinegar", "lemon", "lime", "citrus", "wine", "yogurt", "kefir",
"buttermilk", "tomato", "tamarind"], "Brightness"),
(["oil", "butter", "cream", "lard", "fat", "avocado", "coconut milk",
"ghee", "shortening", "crisco"], "Richness"),
(["salt", "soy", "miso", "tamari", "fish sauce", "worcestershire",
"anchov", "capers", "olive", "brine"], "Seasoning"),
(["mushroom", "parmesan", "miso", "nutritional yeast", "bouillon",
"broth", "umami", "anchov", "dried tomato", "soy"], "Depth"),
(["garlic", "onion", "shallot", "herb", "basil", "oregano", "thyme",
"rosemary", "spice", "cumin", "coriander", "paprika", "chili",
"ginger", "cinnamon", "pepper", "cilantro", "dill", "fennel",
"cardamom", "turmeric", "smoke"], "Aroma"),
(["flour", "starch", "cornstarch", "arrowroot", "egg", "gelatin",
"agar", "breadcrumb", "panko", "roux"], "Structure"),
(["nut", "seed", "cracker", "crisp", "wafer", "chip", "crouton",
"granola", "tofu", "tempeh"], "Texture"),
]
def _safe_json_list(val) -> list:
if isinstance(val, list):
return val
if isinstance(val, str):
try:
return json.loads(val)
except Exception:
return []
return []
@dataclass(frozen=True)
class IngredientProfile:
name: str
elements: list[str]
fat_pct: float = 0.0
fat_saturated_pct: float = 0.0
moisture_pct: float = 0.0
protein_pct: float = 0.0
starch_pct: float = 0.0
binding_score: int = 0
glutamate_mg: float = 0.0
ph_estimate: float | None = None
flavor_molecule_ids: list[str] = field(default_factory=list)
heat_stable: bool = True
add_timing: str = "any"
acid_type: str | None = None
sodium_mg_per_100g: float = 0.0
is_fermented: bool = False
texture_profile: str = "neutral"
smoke_point_c: float | None = None
is_emulsifier: bool = False
source: str = "heuristic"
class ElementClassifier:
def __init__(self, store: "Store") -> None:
self._store = store
def classify(self, ingredient_name: str) -> IngredientProfile:
"""Return element profile for a single ingredient name."""
name = ingredient_name.lower().strip()
if not name:
return IngredientProfile(name="", elements=[], source="heuristic")
row = self._store._fetch_one(
"SELECT * FROM ingredient_profiles WHERE name = ?", (name,)
)
if row:
return self._row_to_profile(row)
return self._heuristic_profile(name)
def classify_batch(self, names: list[str]) -> list[IngredientProfile]:
return [self.classify(n) for n in names]
def identify_gaps(self, profiles: list[IngredientProfile]) -> list[str]:
"""Return element names that have no coverage in the given profile list."""
covered = set()
for p in profiles:
covered.update(p.elements)
return sorted(ELEMENTS - covered)
def _row_to_profile(self, row: dict) -> IngredientProfile:
return IngredientProfile(
name=row["name"],
elements=_safe_json_list(row.get("elements")),
fat_pct=row.get("fat_pct") or 0.0,
fat_saturated_pct=row.get("fat_saturated_pct") or 0.0,
moisture_pct=row.get("moisture_pct") or 0.0,
protein_pct=row.get("protein_pct") or 0.0,
starch_pct=row.get("starch_pct") or 0.0,
binding_score=row.get("binding_score") or 0,
glutamate_mg=row.get("glutamate_mg") or 0.0,
ph_estimate=row.get("ph_estimate"),
flavor_molecule_ids=_safe_json_list(row.get("flavor_molecule_ids")),
heat_stable=bool(row.get("heat_stable", 1)),
add_timing=row.get("add_timing") or "any",
acid_type=row.get("acid_type"),
sodium_mg_per_100g=row.get("sodium_mg_per_100g") or 0.0,
is_fermented=bool(row.get("is_fermented", 0)),
texture_profile=row.get("texture_profile") or "neutral",
smoke_point_c=row.get("smoke_point_c"),
is_emulsifier=bool(row.get("is_emulsifier", 0)),
source="db",
)
def _heuristic_profile(self, name: str) -> IngredientProfile:
seen: set[str] = set()
elements: list[str] = []
for keywords, element in _HEURISTIC:
if element not in seen and any(kw in name for kw in keywords):
elements.append(element)
seen.add(element)
return IngredientProfile(name=name, elements=elements, source="heuristic")
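The fallback in `_heuristic_profile` is a first-match-per-element scan in table order: each element is assigned at most once, and row order decides which elements appear first. A trimmed sketch using only two rows of the table above:

```python
# Trimmed copy of the heuristic keyword table (two rows, not the full set).
HEURISTIC = [
    (["vinegar", "lemon", "lime"], "Brightness"),
    (["oil", "butter", "cream"], "Richness"),
]

def heuristic_elements(name: str) -> list[str]:
    name = name.lower()
    seen: set[str] = set()
    out: list[str] = []
    for keywords, element in HEURISTIC:
        # Substring match; dedupe so an element is only assigned once.
        if element not in seen and any(kw in name for kw in keywords):
            out.append(element)
            seen.add(element)
    return out
```

So "lemon butter sauce" picks up Brightness (lemon) and Richness (butter), while an ingredient matching no row yields an empty element list, exactly the `elements=[]` case the DB lookup would otherwise fill.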


@@ -0,0 +1,75 @@
"""
GroceryLinkBuilder -- affiliate deeplinks for missing-ingredient grocery lists.
Free tier: URL construction only (Amazon Fresh, Walmart, Instacart).
Paid+: live product search API (stubbed future task).
Config (env vars, all optional; a missing value disables that retailer):
AMAZON_AFFILIATE_TAG e.g. "circuitforge-20"
INSTACART_AFFILIATE_ID e.g. "circuitforge"
WALMART_AFFILIATE_ID e.g. "circuitforge" (Impact affiliate network)
"""
from __future__ import annotations
import os
from urllib.parse import quote_plus
from app.models.schemas.recipe import GroceryLink
def _amazon_link(ingredient: str, tag: str) -> GroceryLink:
q = quote_plus(ingredient)
url = f"https://www.amazon.com/s?k={q}&i=amazonfresh&tag={tag}"
return GroceryLink(ingredient=ingredient, retailer="Amazon Fresh", url=url)
def _walmart_link(ingredient: str, affiliate_id: str) -> GroceryLink:
# Walmart Impact affiliate deeplink pattern. Percent-encode the destination
# URL so its own ?q= query string survives the redirect.
dest = quote_plus(f"https://www.walmart.com/search?q={ingredient}")
url = f"https://goto.walmart.com/c/{affiliate_id}/walmart?u={dest}"
return GroceryLink(ingredient=ingredient, retailer="Walmart Grocery", url=url)
def _instacart_link(ingredient: str, affiliate_id: str) -> GroceryLink:
q = quote_plus(ingredient)
url = f"https://www.instacart.com/store/s?k={q}&aff={affiliate_id}"
return GroceryLink(ingredient=ingredient, retailer="Instacart", url=url)
class GroceryLinkBuilder:
def __init__(self, tier: str = "free", has_byok: bool = False) -> None:
self._tier = tier
self._has_byok = has_byok
self._amazon_tag = os.environ.get("AMAZON_AFFILIATE_TAG", "")
self._instacart_id = os.environ.get("INSTACART_AFFILIATE_ID", "")
self._walmart_id = os.environ.get("WALMART_AFFILIATE_ID", "")
def build_links(self, ingredient: str) -> list[GroceryLink]:
"""Build affiliate deeplinks for a single ingredient.
Free tier: URL construction only.
Paid+: would call live product search APIs (stubbed).
"""
if not ingredient.strip():
return []
links: list[GroceryLink] = []
if self._amazon_tag:
links.append(_amazon_link(ingredient, self._amazon_tag))
if self._walmart_id:
links.append(_walmart_link(ingredient, self._walmart_id))
if self._instacart_id:
links.append(_instacart_link(ingredient, self._instacart_id))
# Paid+: live API stub (future task)
# if self._tier in ("paid", "premium") and not self._has_byok:
# links.extend(self._search_kroger_api(ingredient))
return links
def build_all(self, ingredients: list[str]) -> list[GroceryLink]:
"""Build links for a list of ingredients."""
links: list[GroceryLink] = []
for ingredient in ingredients:
links.extend(self.build_links(ingredient))
return links


@@ -0,0 +1,313 @@
"""LLM-driven recipe generator for Levels 3 and 4."""
from __future__ import annotations
import logging
import os
import re
from contextlib import nullcontext
from typing import TYPE_CHECKING
from openai import OpenAI
if TYPE_CHECKING:
from app.db.store import Store
from app.models.schemas.recipe import RecipeRequest, RecipeResult, RecipeSuggestion
from app.services.recipe.element_classifier import IngredientProfile
from app.services.recipe.style_adapter import StyleAdapter
logger = logging.getLogger(__name__)
def _filter_allergies(pantry_items: list[str], allergies: list[str]) -> list[str]:
"""Return pantry items with allergy matches removed (bidirectional substring)."""
if not allergies:
return list(pantry_items)
return [
item for item in pantry_items
if not any(
allergy.lower() in item.lower() or item.lower() in allergy.lower()
for allergy in allergies
)
]
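Because the match is a bidirectional, case-insensitive substring test, a single allergy term like "peanut" removes multi-word pantry entries such as "Peanut Butter". The helper is pure, so it can be exercised standalone (same logic as `_filter_allergies` above):

```python
def filter_allergies(pantry_items: list[str], allergies: list[str]) -> list[str]:
    # Bidirectional substring match: drop an item if the allergy term appears
    # in it, or the item name appears inside the allergy term.
    if not allergies:
        return list(pantry_items)
    return [
        item for item in pantry_items
        if not any(
            allergy.lower() in item.lower() or item.lower() in allergy.lower()
            for allergy in allergies
        )
    ]
```

The bidirectional check also catches the inverse case where the pantry label is the shorter string (e.g. item "nut" vs allergy "nut allergy").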
class LLMRecipeGenerator:
def __init__(self, store: "Store") -> None:
self._store = store
self._style_adapter = StyleAdapter()
def build_level3_prompt(
self,
req: RecipeRequest,
profiles: list[IngredientProfile],
gaps: list[str],
) -> str:
"""Build a structured element-scaffold prompt for Level 3."""
allergy_list = req.allergies
safe_pantry = _filter_allergies(req.pantry_items, allergy_list)
covered_elements: list[str] = []
for profile in profiles:
for element in profile.elements:
if element not in covered_elements:
covered_elements.append(element)
lines: list[str] = [
"You are a creative chef. Generate a recipe using the ingredients below.",
"IMPORTANT: When you use a pantry item, list it in Ingredients using its exact name from the pantry list. Do not add adjectives, quantities, or cooking states (e.g. use 'butter', not 'unsalted butter' or '2 tbsp butter').",
"IMPORTANT: Only use pantry items that make culinary sense for the dish. Do NOT force flavoured/sweetened items (vanilla yoghurt, fruit yoghurt, jam, dessert sauces, flavoured syrups) into savoury dishes. Plain yoghurt, plain cream, and plain dairy are fine in savoury cooking.",
"IMPORTANT: Do not default to the same ingredient repeatedly across dishes. If a pantry item does not genuinely improve this specific dish, leave it out.",
"",
f"Pantry items: {', '.join(safe_pantry)}",
]
if req.constraints:
lines.append(f"Dietary constraints: {', '.join(req.constraints)}")
if allergy_list:
lines.append(f"IMPORTANT — must NOT contain: {', '.join(allergy_list)}")
lines.append("")
lines.append(f"Covered culinary elements: {', '.join(covered_elements) or 'none'}")
if gaps:
lines.append(
f"Missing elements to address: {', '.join(gaps)}. "
"Incorporate ingredients or techniques to fill these gaps."
)
if req.style_id:
template = self._style_adapter.get(req.style_id)
if template:
lines.append(f"Cuisine style: {template.name}")
if template.aromatics:
lines.append(f"Preferred aromatics: {', '.join(template.aromatics[:4])}")
lines += [
"",
"Reply using EXACTLY this plain-text format — no markdown, no bold, no extra commentary:",
"Title: <name of the dish>",
"Ingredients: <comma-separated list>",
"Directions:",
"1. <first step>",
"2. <second step>",
"3. <continue for each step>",
"Notes: <optional tips>",
]
return "\n".join(lines)
def build_level4_prompt(
self,
req: RecipeRequest,
) -> str:
"""Build a minimal wildcard prompt for Level 4."""
allergy_list = req.allergies
safe_pantry = _filter_allergies(req.pantry_items, allergy_list)
lines: list[str] = [
"Surprise me with a creative, unexpected recipe.",
"Only use ingredients that make culinary sense together. Do not force flavoured/sweetened items (vanilla yoghurt, flavoured syrups, jam) into savoury dishes.",
f"Ingredients available: {', '.join(safe_pantry)}",
]
if req.constraints:
lines.append(f"Constraints: {', '.join(req.constraints)}")
if allergy_list:
lines.append(f"Must NOT contain: {', '.join(allergy_list)}")
lines += [
"Treat any mystery ingredient as a wildcard — use your imagination.",
"Reply using EXACTLY this plain-text format — no markdown, no bold:",
"Title: <name of the dish>",
"Ingredients: <comma-separated list>",
"Directions:",
"1. <first step>",
"2. <second step>",
"Notes: <optional tips>",
]
return "\n".join(lines)
_MODEL_CANDIDATES: list[str] = ["Ouro-2.6B-Thinking", "Ouro-1.4B"]
def _get_llm_context(self):
"""Return a sync context manager that yields an Allocation or None.
When CF_ORCH_URL is set, uses CFOrchClient to acquire a vLLM allocation
(which handles service lifecycle and VRAM). Falls back to nullcontext(None)
when the env var is absent or CFOrchClient raises on construction.
"""
cf_orch_url = os.environ.get("CF_ORCH_URL")
if cf_orch_url:
try:
from circuitforge_core.resources import CFOrchClient
client = CFOrchClient(cf_orch_url)
return client.allocate(
service="vllm",
model_candidates=self._MODEL_CANDIDATES,
ttl_s=300.0,
caller="kiwi-recipe",
)
except Exception as exc:
logger.debug("CFOrchClient init failed, falling back to direct URL: %s", exc)
return nullcontext(None)
def _call_llm(self, prompt: str) -> str:
"""Call the LLM, using CFOrchClient allocation when CF_ORCH_URL is set.
With CF_ORCH_URL set: acquires a vLLM allocation via CFOrchClient and
calls the OpenAI-compatible API directly against the allocated service URL.
Without CF_ORCH_URL: falls back to LLMRouter using its configured backends.
"""
try:
with self._get_llm_context() as alloc:
if alloc is not None:
base_url = alloc.url.rstrip("/") + "/v1"
client = OpenAI(base_url=base_url, api_key="any")
model = alloc.model or "__auto__"
if model == "__auto__":
model = client.models.list().data[0].id
resp = client.chat.completions.create(
model=model,
messages=[{"role": "user", "content": prompt}],
)
return resp.choices[0].message.content or ""
else:
from circuitforge_core.llm.router import LLMRouter
router = LLMRouter()
return router.complete(prompt)
except Exception as exc:
logger.error("LLM call failed: %s", exc)
return ""
# Strips markdown bold/italic markers so "**Directions:**" parses like "Directions:"
_MD_BOLD = re.compile(r"\*{1,2}([^*]+)\*{1,2}")
def _strip_md(self, text: str) -> str:
return self._MD_BOLD.sub(r"\1", text).strip()
def _parse_response(self, response: str) -> dict[str, str | list[str]]:
"""Parse LLM response text into structured recipe fields.
Handles both plain-text and markdown-formatted responses. Directions are
preserved as newline-separated text so the caller can split on step numbers.
"""
result: dict[str, str | list[str]] = {
"title": "",
"ingredients": [],
"directions": "",
"notes": "",
}
current_key: str | None = None
buffer: list[str] = []
def _flush(key: str | None, buf: list[str]) -> None:
if key is None or not buf:
return
if key == "directions":
result["directions"] = "\n".join(buf)
elif key == "ingredients":
text = " ".join(buf)
result["ingredients"] = [i.strip() for i in text.split(",") if i.strip()]
else:
result[key] = " ".join(buf).strip()
for raw_line in response.splitlines():
line = self._strip_md(raw_line)
lower = line.lower()
if lower.startswith("title:"):
_flush(current_key, buffer)
current_key, buffer = "title", [line.split(":", 1)[1].strip()]
elif lower.startswith("ingredients:"):
_flush(current_key, buffer)
current_key, buffer = "ingredients", [line.split(":", 1)[1].strip()]
elif lower.startswith("directions:"):
_flush(current_key, buffer)
rest = line.split(":", 1)[1].strip()
current_key, buffer = "directions", ([rest] if rest else [])
elif lower.startswith("notes:"):
_flush(current_key, buffer)
current_key, buffer = "notes", [line.split(":", 1)[1].strip()]
elif current_key and line.strip():
buffer.append(line.strip())
elif current_key is None and line.strip() and ":" not in line:
# Before any section header: a 2-10 word colon-free line is the dish name
words = line.split()
if 2 <= len(words) <= 10 and not result["title"]:
result["title"] = line.strip()
_flush(current_key, buffer)
return result
def generate(
self,
req: RecipeRequest,
profiles: list[IngredientProfile],
gaps: list[str],
) -> RecipeResult:
"""Generate a recipe via LLM and return a RecipeResult."""
if req.level == 4:
prompt = self.build_level4_prompt(req)
else:
prompt = self.build_level3_prompt(req, profiles, gaps)
response = self._call_llm(prompt)
if not response:
return RecipeResult(suggestions=[], element_gaps=gaps)
parsed = self._parse_response(response)
raw_directions = parsed.get("directions", "")
if isinstance(raw_directions, str):
# Split on newlines; strip leading step numbers ("1.", "2.", "- ", "* ")
_step_prefix = re.compile(r"^\s*(?:\d+[.)]\s*|[-*]\s+)")
directions_list = [
_step_prefix.sub("", s).strip()
for s in raw_directions.splitlines()
if s.strip()
]
else:
directions_list = list(raw_directions)
raw_notes = parsed.get("notes", "")
notes_str: str = raw_notes if isinstance(raw_notes, str) else ""
all_ingredients: list[str] = list(parsed.get("ingredients", []))
pantry_set = {item.lower() for item in (req.pantry_items or [])}
# Strip leading quantities/units (e.g. "2 cups rice" → "rice") before
# checking against pantry, since LLMs return formatted ingredient strings.
_qty_re = re.compile(
r"^\s*[\d½¼¾⅓⅔]+[\s/\-]*" # leading digits or fractions
r"(?:cup|cups|tbsp|tsp|tablespoon|teaspoon|oz|lb|lbs|g|kg|"
r"can|cans|clove|cloves|bunch|package|pkg|slice|slices|"
r"piece|pieces|pinch|dash|handful|head|heads|large|small|medium"
r")s?\b[,\s]*",
re.IGNORECASE,
)
missing = []
for ing in all_ingredients:
bare = _qty_re.sub("", ing).strip().lower()
if bare not in pantry_set and ing.lower() not in pantry_set:
missing.append(bare or ing)
suggestion = RecipeSuggestion(
id=0,
title=parsed.get("title") or "LLM Recipe",
match_count=len(req.pantry_items),
element_coverage={},
missing_ingredients=missing,
directions=directions_list,
notes=notes_str,
level=req.level,
is_wildcard=(req.level == 4),
)
return RecipeResult(
suggestions=[suggestion],
element_gaps=gaps,
)
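The direction-splitting step inside `generate` is a per-line prefix regex. Isolated as a small helper, using the same pattern:

```python
import re

# Strips leading step numbers ("1.", "2)") and bullet markers ("- ", "* ").
_step_prefix = re.compile(r"^\s*(?:\d+[.)]\s*|[-*]\s+)")

def split_directions(raw: str) -> list[str]:
    # One step per non-empty line, with any numbering prefix removed.
    return [_step_prefix.sub("", s).strip() for s in raw.splitlines() if s.strip()]
```

This tolerates the LLM mixing numbered and bulleted steps in a single response.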


@@ -0,0 +1,583 @@
"""
RecipeEngine orchestrates the four creativity levels.
Level 1: corpus lookup ranked by ingredient match + expiry urgency
Level 2: Level 1 + deterministic substitution swaps
Level 3: element scaffold fed into a constrained LLM prompt (see llm_recipe.py)
Level 4: wildcard LLM (see llm_recipe.py)
Amendments:
- max_missing: filter to recipes missing at most N pantry items
- hard_day_mode: filter to easy-method recipes only
- grocery_list: aggregated missing ingredients across suggestions
"""
from __future__ import annotations
import json
import re
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from app.db.store import Store
from app.models.schemas.recipe import GroceryLink, NutritionPanel, RecipeRequest, RecipeResult, RecipeSuggestion, SwapCandidate
from app.services.recipe.assembly_recipes import match_assembly_templates
from app.services.recipe.element_classifier import ElementClassifier
from app.services.recipe.grocery_links import GroceryLinkBuilder
from app.services.recipe.substitution_engine import SubstitutionEngine
_LEFTOVER_DAILY_MAX_FREE = 5
# Words that carry no ingredient-identity signal — stripped before overlap scoring
_SWAP_STOPWORDS = frozenset({
"a", "an", "the", "of", "in", "for", "with", "and", "or",
"to", "from", "at", "by", "as", "on",
})
# Maps product-label substrings to recipe-corpus canonical terms.
# Kept in sync with Store._FTS_SYNONYMS — both must agree on canonical names.
# Used to expand pantry_set so single-word recipe ingredients can match
# multi-word product names (e.g. "hamburger" satisfied by "burger patties").
_PANTRY_LABEL_SYNONYMS: dict[str, str] = {
"burger patt": "hamburger",
"beef patt": "hamburger",
"ground beef": "hamburger",
"ground chuck": "hamburger",
"ground round": "hamburger",
"mince": "hamburger",
"veggie burger": "hamburger",
"beyond burger": "hamburger",
"impossible burger": "hamburger",
"plant burger": "hamburger",
"chicken patt": "chicken patty",
"kielbasa": "sausage",
"bratwurst": "sausage",
"frankfurter": "hotdog",
"wiener": "hotdog",
"chicken breast": "chicken",
"chicken thigh": "chicken",
"chicken drumstick": "chicken",
"chicken wing": "chicken",
"rotisserie chicken": "chicken",
"chicken tender": "chicken",
"chicken strip": "chicken",
"chicken piece": "chicken",
"fake chicken": "chicken",
"plant chicken": "chicken",
"vegan chicken": "chicken",
"daring": "chicken",
"gardein chick": "chicken",
"quorn chick": "chicken",
"chick'n": "chicken",
"chikn": "chicken",
"not-chicken": "chicken",
"no-chicken": "chicken",
# Plant-based beef subs → broad "beef" (strips ≠ ground; texture matters)
"not-beef": "beef",
"no-beef": "beef",
"plant beef": "beef",
"vegan beef": "beef",
# Plant-based pork subs
"not-pork": "pork",
"no-pork": "pork",
"plant pork": "pork",
"vegan pork": "pork",
"omnipork": "pork",
"omni pork": "pork",
# Generic alt-meat catch-alls → broad "beef"
"fake meat": "beef",
"plant meat": "beef",
"vegan meat": "beef",
"meat-free": "beef",
"meatless": "beef",
"pork chop": "pork",
"pork loin": "pork",
"pork tenderloin": "pork",
"marinara": "tomato sauce",
"pasta sauce": "tomato sauce",
"spaghetti sauce": "tomato sauce",
"pizza sauce": "tomato sauce",
"macaroni": "pasta",
"noodles": "pasta",
"spaghetti": "pasta",
"penne": "pasta",
"fettuccine": "pasta",
"rigatoni": "pasta",
"linguine": "pasta",
"rotini": "pasta",
"farfalle": "pasta",
"shredded cheese": "cheese",
"sliced cheese": "cheese",
"american cheese": "cheese",
"cheddar": "cheese",
"mozzarella": "cheese",
"heavy cream": "cream",
"whipping cream": "cream",
"half and half": "cream",
"burger bun": "buns",
"hamburger bun": "buns",
"hot dog bun": "buns",
"bread roll": "buns",
"dinner roll": "buns",
# Tortillas / wraps — assembly dishes (burritos, tacos, quesadillas)
"flour tortilla": "tortillas",
"corn tortilla": "tortillas",
"tortilla wrap": "tortillas",
"soft taco shell": "tortillas",
"taco shell": "taco shells",
"pita bread": "pita",
"flatbread": "flatbread",
# Canned beans — extremely interchangeable in assembly dishes
"black bean": "beans",
"pinto bean": "beans",
"kidney bean": "beans",
"refried bean": "beans",
"chickpea": "beans",
"garbanzo": "beans",
# Rice variants
"white rice": "rice",
"brown rice": "rice",
"jasmine rice": "rice",
"basmati rice": "rice",
"instant rice": "rice",
"microwavable rice": "rice",
# Salsa / hot sauce
"hot sauce": "salsa",
"taco sauce": "salsa",
"enchilada sauce": "salsa",
# Sour cream / Greek yogurt — functional substitutes
"greek yogurt": "sour cream",
# Frozen/prepackaged meal token extraction — handled by individual token
# fallback in _normalize_for_fts; these are the most common single-serve meal types
"lean cuisine": "casserole",
"stouffer": "casserole",
"healthy choice": "casserole",
"marie callender": "casserole",
}
# Matches leading quantity/unit prefixes in recipe ingredient strings,
# e.g. "2 cups flour" → "flour", "1/2 c. ketchup" → "ketchup",
# "3 oz. butter" → "butter"
_QUANTITY_PREFIX = re.compile(
r"^\s*(?:\d+(?:[./]\d+)?\s*)?" # optional leading number (1, 1/2, 2.5)
r"(?:to\s+\d+\s*)?" # optional "to N" range
r"(?:c\.|cup|cups|tbsp|tsp|oz|lb|lbs|g|kg|ml|l|"
r"can|cans|pkg|pkg\.|package|slice|slices|clove|cloves|"
r"small|medium|large|bunch|head|piece|pieces|"
r"pinch|dash|handful|sprig|sprigs)\s*\b",
re.IGNORECASE,
)
# Preparation-state words that modify an ingredient without changing what it is.
# Stripped from both ends so "melted butter", "butter, melted" both → "butter".
_PREP_STATES = re.compile(
r"\b(melted|softened|cold|warm|hot|room.temperature|"
r"diced|sliced|chopped|minced|grated|shredded|beaten|whipped|"
r"cooked|raw|frozen|canned|dried|dehydrated|marinated|seasoned|"
r"roasted|toasted|ground|crushed|pressed|peeled|seeded|pitted|"
r"boneless|skinless|trimmed|halved|quartered|julienned|"
r"thinly|finely|roughly|coarsely|freshly|lightly|"
r"packed|heaping|level|sifted|divided|optional)\b",
re.IGNORECASE,
)
# Trailing comma + optional prep state (e.g. "butter, melted")
_TRAILING_PREP = re.compile(r",\s*\w+$")
# Maps prep-state words to human-readable instruction templates.
# {ingredient} is replaced with the actual ingredient name.
# None means the state is passive (frozen, canned) — no note needed.
_PREP_INSTRUCTIONS: dict[str, str | None] = {
"melted": "Melt the {ingredient} before starting.",
"softened": "Let the {ingredient} soften to room temperature before using.",
"room temperature": "Bring the {ingredient} to room temperature before using.",
"beaten": "Beat the {ingredient} lightly before adding.",
"whipped": "Whip the {ingredient} until soft peaks form.",
"sifted": "Sift the {ingredient} before measuring.",
"toasted": "Toast the {ingredient} in a dry pan until fragrant.",
"roasted": "Roast the {ingredient} before using.",
"pressed": "Press the {ingredient} to remove excess moisture.",
"diced": "Dice the {ingredient} into small pieces.",
"sliced": "Slice the {ingredient} thinly.",
"chopped": "Chop the {ingredient} roughly.",
"minced": "Mince the {ingredient} finely.",
"grated": "Grate the {ingredient}.",
"shredded": "Shred the {ingredient}.",
"ground": "Grind the {ingredient}.",
"crushed": "Crush the {ingredient}.",
"peeled": "Peel the {ingredient} before use.",
"seeded": "Remove seeds from the {ingredient}.",
"pitted": "Pit the {ingredient} before use.",
"trimmed": "Trim any excess from the {ingredient}.",
"julienned": "Cut the {ingredient} into thin matchstick strips.",
"cooked": "Pre-cook the {ingredient} before adding.",
# Passive states — ingredient is used as-is, no prep note needed
"cold": None,
"warm": None,
"hot": None,
"raw": None,
"frozen": None,
"canned": None,
"dried": None,
"dehydrated": None,
"marinated": None,
"seasoned": None,
"boneless": None,
"skinless": None,
"divided": None,
"optional": None,
"fresh": None,
"freshly": None,
"thinly": None,
"finely": None,
"roughly": None,
"coarsely": None,
"lightly": None,
"packed": None,
"heaping": None,
"level": None,
}
# Finds the first actionable prep state in an ingredient string
_PREP_STATE_SEARCH = re.compile(
r"\b(" + "|".join(re.escape(k) for k in _PREP_INSTRUCTIONS) + r")\b",
re.IGNORECASE,
)
def _strip_quantity(ingredient: str) -> str:
"""Remove leading quantity/unit and preparation-state words from a recipe ingredient.
e.g. "2 tbsp melted butter" → "butter"
"butter, melted" → "butter"
"1/4 cup flour, sifted" → "flour"
"""
stripped = _QUANTITY_PREFIX.sub("", ingredient).strip()
# Strip any remaining leading number (e.g. "3 eggs" → "eggs")
stripped = re.sub(r"^\d+\s+", "", stripped)
# Strip trailing ", prep_state"
stripped = _TRAILING_PREP.sub("", stripped).strip()
# Strip prep-state words (may be leading or embedded)
stripped = _PREP_STATES.sub("", stripped).strip()
# Clean up any double spaces left behind
stripped = re.sub(r"\s{2,}", " ", stripped).strip()
return stripped or ingredient
def _prep_note_for(ingredient: str) -> str | None:
"""Return a human-readable prep instruction for this ingredient string, or None.
e.g. "2 tbsp melted butter" → "Melt the butter before starting."
"onion, diced" → "Dice the onion into small pieces."
"frozen peas" → None (passive state, no action needed)
"""
match = _PREP_STATE_SEARCH.search(ingredient)
if not match:
return None
state = match.group(1).lower()
template = _PREP_INSTRUCTIONS.get(state)
if not template:
return None
# Use the stripped ingredient name as the subject
ingredient_name = _strip_quantity(ingredient)
return template.format(ingredient=ingredient_name)
def _expand_pantry_set(pantry_items: list[str]) -> set[str]:
"""Return pantry_set expanded with canonical recipe-corpus synonyms.
For each pantry item, checks _PANTRY_LABEL_SYNONYMS for substring matches
and adds the canonical form. This lets single-word recipe ingredients
("hamburger", "chicken") match product-label pantry entries
("burger patties", "rotisserie chicken").
"""
expanded: set[str] = set()
for item in pantry_items:
lower = item.lower().strip()
expanded.add(lower)
for pattern, canonical in _PANTRY_LABEL_SYNONYMS.items():
if pattern in lower:
expanded.add(canonical)
return expanded
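A sketch of the expansion with a two-entry synonym table (a tiny subset of `_PANTRY_LABEL_SYNONYMS` above), showing how a product label gains its canonical recipe-corpus form:

```python
# Subset of the synonym table: product-label substring -> canonical term.
SYNONYMS = {
    "burger patt": "hamburger",
    "rotisserie chicken": "chicken",
}

def expand_pantry(items: list[str]) -> set[str]:
    expanded: set[str] = set()
    for item in items:
        lower = item.lower().strip()
        expanded.add(lower)
        # Substring match, so "burger patt" covers both "patty" and "patties".
        for pattern, canonical in SYNONYMS.items():
            if pattern in lower:
                expanded.add(canonical)
    return expanded
```

After expansion, a recipe calling for plain "hamburger" is satisfied by a pantry entry labelled "Burger Patties".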
def _ingredient_in_pantry(ingredient: str, pantry_set: set[str]) -> bool:
"""Return True if the recipe ingredient is satisfied by the pantry.
Checks three layers in order:
1. Exact match after quantity stripping
2. Synonym lookup: ingredient canonical in pantry_set
(handles "ground beef" matched by "burger patties" via shared canonical)
3. Token subset: all content tokens of the ingredient appear in pantry
(handles "diced onions" when "onions" is in pantry)
"""
clean = _strip_quantity(ingredient).lower()
if clean in pantry_set:
return True
# Check if this recipe ingredient maps to a canonical that's in pantry
for pattern, canonical in _PANTRY_LABEL_SYNONYMS.items():
if pattern in clean and canonical in pantry_set:
return True
# Single-token ingredient whose token appears in pantry (e.g. "ketchup" in "c. ketchup")
tokens = [t for t in clean.split() if t not in _SWAP_STOPWORDS and len(t) > 2]
if tokens and all(t in pantry_set for t in tokens):
return True
return False
def _content_tokens(text: str) -> frozenset[str]:
return frozenset(
w for w in text.lower().split()
if w not in _SWAP_STOPWORDS and len(w) > 1
)
def _pantry_creative_swap(required: str, pantry_items: set[str]) -> str | None:
"""Return a pantry item that's a plausible creative substitute, or None.
Requires at least 2 shared content tokens AND at least 50% bidirectional overlap so that
single-word differences (cream-of-mushroom vs cream-of-potato) qualify while
single-word ingredients (butter, flour) don't accidentally match supersets
(peanut butter, bread flour).
"""
req_tokens = _content_tokens(required)
if len(req_tokens) < 2:
return None # single-word ingredients must already be in pantry_set
best: str | None = None
best_score = 0.0
for item in pantry_items:
if item.lower() == required.lower():
continue
pan_tokens = _content_tokens(item)
if not pan_tokens:
continue
overlap = len(req_tokens & pan_tokens)
if overlap < 2:
continue
score = min(overlap / len(req_tokens), overlap / len(pan_tokens))
if score >= 0.5 and score > best_score:
best_score = score
best = item
return best
# Method complexity classification patterns
_EASY_METHODS = re.compile(
r"\b(microwave|mix|stir|blend|toast|assemble|heat)\b", re.IGNORECASE
)
_INVOLVED_METHODS = re.compile(
r"\b(braise|roast|knead|deep.?fry|fry|sauté|saute|bake|boil)\b", re.IGNORECASE
)
def _classify_method_complexity(
directions: list[str],
available_equipment: list[str] | None = None,
) -> str:
"""Classify recipe method complexity from direction strings.
Returns 'easy', 'moderate', or 'involved'.
available_equipment can expand the easy set (e.g. ['toaster', 'air fryer']).
"""
text = " ".join(directions).lower()
equipment_set = {e.lower() for e in (available_equipment or [])}
if _INVOLVED_METHODS.search(text):
return "involved"
if _EASY_METHODS.search(text):
return "easy"
# Check equipment-specific easy methods
for equip in equipment_set:
if equip in text:
return "easy"
return "moderate"
class RecipeEngine:
def __init__(self, store: "Store") -> None:
self._store = store
self._classifier = ElementClassifier(store)
self._substitution = SubstitutionEngine(store)
def suggest(
self,
req: RecipeRequest,
available_equipment: list[str] | None = None,
) -> RecipeResult:
# Load cooking equipment from user settings when hard_day_mode is active
if req.hard_day_mode and available_equipment is None:
equipment_json = self._store.get_setting("cooking_equipment")
if equipment_json:
try:
available_equipment = json.loads(equipment_json)
except (json.JSONDecodeError, TypeError):
available_equipment = []
else:
available_equipment = []
# Rate-limit leftover mode for free tier
if req.expiry_first and req.tier == "free":
allowed, count = self._store.check_and_increment_rate_limit(
"leftover_mode", _LEFTOVER_DAILY_MAX_FREE
)
if not allowed:
return RecipeResult(
suggestions=[], element_gaps=[], rate_limited=True, rate_limit_count=count
)
profiles = self._classifier.classify_batch(req.pantry_items)
gaps = self._classifier.identify_gaps(profiles)
pantry_set = _expand_pantry_set(req.pantry_items)
if req.level >= 3:
from app.services.recipe.llm_recipe import LLMRecipeGenerator
gen = LLMRecipeGenerator(self._store)
return gen.generate(req, profiles, gaps)
# Level 1 & 2: deterministic path
nf = req.nutrition_filters
rows = self._store.search_recipes_by_ingredients(
req.pantry_items,
limit=20,
category=req.category or None,
max_calories=nf.max_calories,
max_sugar_g=nf.max_sugar_g,
max_carbs_g=nf.max_carbs_g,
max_sodium_mg=nf.max_sodium_mg,
excluded_ids=req.excluded_ids or [],
)
suggestions = []
for row in rows:
ingredient_names: list[str] = row.get("ingredient_names") or []
if isinstance(ingredient_names, str):
try:
ingredient_names = json.loads(ingredient_names)
except Exception:
ingredient_names = []
# Compute missing ingredients, detecting pantry coverage first.
# When covered, collect any prep-state annotations (e.g. "melted butter"
# → note "Melt the butter before starting.") to surface separately.
swap_candidates: list[SwapCandidate] = []
missing: list[str] = []
prep_note_set: set[str] = set()
for n in ingredient_names:
if _ingredient_in_pantry(n, pantry_set):
note = _prep_note_for(n)
if note:
prep_note_set.add(note)
continue
swap_item = _pantry_creative_swap(n, pantry_set)
if swap_item:
swap_candidates.append(SwapCandidate(
original_name=n,
substitute_name=swap_item,
constraint_label="pantry_swap",
explanation=f"You have {swap_item} — use it in place of {n}.",
compensation_hints=[],
))
else:
missing.append(n)
# Filter by max_missing (pantry swaps don't count as missing)
if req.max_missing is not None and len(missing) > req.max_missing:
continue
# Filter by hard_day_mode
if req.hard_day_mode:
directions: list[str] = row.get("directions") or []
if isinstance(directions, str):
try:
directions = json.loads(directions)
except Exception:
directions = [directions]
complexity = _classify_method_complexity(directions, available_equipment)
if complexity == "involved":
continue
# Level 2: also add dietary constraint swaps from substitution_pairs
if req.level == 2 and req.constraints:
for ing in ingredient_names:
for constraint in req.constraints:
swaps = self._substitution.find_substitutes(ing, constraint)
for swap in swaps[:1]:
swap_candidates.append(SwapCandidate(
original_name=swap.original_name,
substitute_name=swap.substitute_name,
constraint_label=swap.constraint_label,
explanation=swap.explanation,
compensation_hints=swap.compensation_hints,
))
coverage_raw = row.get("element_coverage") or {}
if isinstance(coverage_raw, str):
try:
coverage_raw = json.loads(coverage_raw)
except Exception:
coverage_raw = {}
servings = row.get("servings") or None
nutrition = NutritionPanel(
calories=row.get("calories"),
fat_g=row.get("fat_g"),
protein_g=row.get("protein_g"),
carbs_g=row.get("carbs_g"),
fiber_g=row.get("fiber_g"),
sugar_g=row.get("sugar_g"),
sodium_mg=row.get("sodium_mg"),
servings=servings,
estimated=bool(row.get("nutrition_estimated", 0)),
)
has_nutrition = any(
v is not None
for v in (nutrition.calories, nutrition.sugar_g, nutrition.carbs_g)
)
suggestions.append(RecipeSuggestion(
id=row["id"],
title=row["title"],
match_count=int(row.get("match_count") or 0),
element_coverage=coverage_raw,
swap_candidates=swap_candidates,
missing_ingredients=missing,
prep_notes=sorted(prep_note_set),
level=req.level,
nutrition=nutrition if has_nutrition else None,
))
# Prepend assembly-dish templates (burrito, stir fry, omelette, etc.)
# These fire regardless of corpus coverage — any pantry can make a burrito.
assembly = match_assembly_templates(
pantry_items=req.pantry_items,
pantry_set=pantry_set,
excluded_ids=req.excluded_ids or [],
)
suggestions = assembly + suggestions
# Build grocery list — deduplicated union of all missing ingredients
seen: set[str] = set()
grocery_list: list[str] = []
for s in suggestions:
for item in s.missing_ingredients:
if item not in seen:
grocery_list.append(item)
seen.add(item)
# Build grocery links — affiliate deeplinks for each missing ingredient
link_builder = GroceryLinkBuilder(tier=req.tier, has_byok=req.has_byok)
grocery_links = link_builder.build_all(grocery_list)
return RecipeResult(
suggestions=suggestions,
element_gaps=gaps,
grocery_list=grocery_list,
grocery_links=grocery_links,
)
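The three-layer pantry match that drives the suggestion loop above can be sketched standalone. The synonym and stopword tables below are illustrative stand-ins, not the module's real `_PANTRY_LABEL_SYNONYMS` / `_SWAP_STOPWORDS` data, and `strip_quantity` is a crude demo version of `_strip_quantity`:

```python
# Standalone sketch of the three-layer pantry match, assuming made-up
# synonym/stopword tables. Not the module's real data.
import re

_PANTRY_LABEL_SYNONYMS = {"burger": "ground beef", "rotisserie chicken": "chicken"}
_SWAP_STOPWORDS = {"fresh", "large", "small", "chopped", "diced", "of", "the"}

def strip_quantity(ingredient: str) -> str:
    # Crude quantity stripper for the demo ("2 lbs burger patties" -> "burger patties")
    return re.sub(r"^[\d/.\s]+(cups?|tbsp|tsp|oz|lbs?)?\s*", "", ingredient).strip()

def ingredient_in_pantry(ingredient: str, pantry_set: set[str]) -> bool:
    clean = strip_quantity(ingredient).lower()
    if clean in pantry_set:  # layer 1: exact match after quantity stripping
        return True
    for pattern, canonical in _PANTRY_LABEL_SYNONYMS.items():
        if pattern in clean and canonical in pantry_set:  # layer 2: synonym lookup
            return True
    tokens = [t for t in clean.split() if t not in _SWAP_STOPWORDS and len(t) > 2]
    return bool(tokens) and all(t in pantry_set for t in tokens)  # layer 3: token subset

pantry = {"ground beef", "onions", "ketchup"}
print(ingredient_in_pantry("2 lbs burger patties", pantry))  # True via synonym layer
print(ingredient_in_pantry("diced onions", pantry))          # True via token subset
```

The layers are ordered cheapest-first: an exact hit short-circuits before any synonym or token scan runs.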


@ -0,0 +1,60 @@
"""
StapleLibrary -- bulk-preparable base component reference data.
Loaded from YAML files in app/staples/.
"""
from __future__ import annotations
from dataclasses import dataclass
from pathlib import Path
from typing import Any
import yaml
_STAPLES_DIR = Path(__file__).parents[2] / "staples"
@dataclass(frozen=True)
class StapleEntry:
slug: str
name: str
description: str
dietary_labels: list[str]
base_ingredients: list[str]
base_method: str
base_time_minutes: int
yield_formats: dict[str, Any]
compatible_styles: list[str]
class StapleLibrary:
def __init__(self, staples_dir: Path = _STAPLES_DIR) -> None:
self._staples: dict[str, StapleEntry] = {}
for yaml_path in sorted(staples_dir.glob("*.yaml")):
entry = self._load(yaml_path)
self._staples[entry.slug] = entry
def get(self, slug: str) -> StapleEntry | None:
return self._staples.get(slug)
def list_all(self) -> list[StapleEntry]:
return list(self._staples.values())
def filter_by_dietary(self, label: str) -> list[StapleEntry]:
return [s for s in self._staples.values() if label in s.dietary_labels]
def _load(self, path: Path) -> StapleEntry:
try:
data = yaml.safe_load(path.read_text())
return StapleEntry(
slug=data["slug"],
name=data["name"],
description=data.get("description", ""),
dietary_labels=data.get("dietary_labels", []),
base_ingredients=data.get("base_ingredients", []),
base_method=data.get("base_method", ""),
base_time_minutes=int(data.get("base_time_minutes", 0)),
yield_formats=data.get("yield_formats", {}),
compatible_styles=data.get("compatible_styles", []),
)
except (KeyError, yaml.YAMLError) as exc:
raise ValueError(f"Failed to load staple from {path}: {exc}") from exc


@ -0,0 +1,132 @@
"""
StyleAdapter -- cuisine-mode overlay that biases element dimensions.
YAML templates in app/styles/.
"""
from __future__ import annotations
from dataclasses import dataclass
from pathlib import Path
import yaml
_STYLES_DIR = Path(__file__).parents[2] / "styles"
@dataclass(frozen=True)
class StyleTemplate:
style_id: str
name: str
aromatics: tuple[str, ...]
depth_sources: tuple[str, ...]
brightness_sources: tuple[str, ...]
method_bias: dict[str, float]
structure_forms: tuple[str, ...]
seasoning_bias: str
finishing_fat_str: str
def bias_aroma_selection(self, pantry_items: list[str]) -> list[str]:
"""Return aromatics present in pantry (bidirectional substring match)."""
result = []
for aroma in self.aromatics:
for item in pantry_items:
if aroma.lower() in item.lower() or item.lower() in aroma.lower():
result.append(aroma)
break
return result
def preferred_depth_sources(self, pantry_items: list[str]) -> list[str]:
"""Return depth_sources present in pantry."""
result = []
for src in self.depth_sources:
for item in pantry_items:
if src.lower() in item.lower() or item.lower() in src.lower():
result.append(src)
break
return result
def preferred_structure_forms(self, pantry_items: list[str]) -> list[str]:
"""Return structure_forms present in pantry."""
result = []
for form in self.structure_forms:
for item in pantry_items:
if form.lower() in item.lower() or item.lower() in form.lower():
result.append(form)
break
return result
def method_weights(self) -> dict[str, float]:
"""Return method bias weights."""
return dict(self.method_bias)
def seasoning_vector(self) -> str:
"""Return seasoning bias."""
return self.seasoning_bias
def finishing_fat(self) -> str:
"""Return finishing fat."""
return self.finishing_fat_str
class StyleAdapter:
def __init__(self, styles_dir: Path = _STYLES_DIR) -> None:
self._styles: dict[str, StyleTemplate] = {}
for yaml_path in sorted(styles_dir.glob("*.yaml")):
try:
template = self._load(yaml_path)
self._styles[template.style_id] = template
except (KeyError, yaml.YAMLError, TypeError) as exc:
raise ValueError(f"Failed to load style from {yaml_path}: {exc}") from exc
@property
def styles(self) -> dict[str, StyleTemplate]:
return self._styles
def get(self, style_id: str) -> StyleTemplate | None:
return self._styles.get(style_id)
def list_all(self) -> list[StyleTemplate]:
return list(self._styles.values())
def bias_aroma_selection(self, style_id: str, pantry_items: list[str]) -> list[str]:
"""Return pantry items that match the style's preferred aromatics.
Falls back to all pantry items if no match is found."""
template = self._styles.get(style_id)
if not template:
return pantry_items
matched = [
item for item in pantry_items
if any(
aroma.lower() in item.lower() or item.lower() in aroma.lower()
for aroma in template.aromatics
)
]
return matched if matched else pantry_items
def apply(self, style_id: str, pantry_items: list[str]) -> dict:
"""Return style-biased ingredient guidance for each element dimension."""
template = self._styles.get(style_id)
if not template:
return {}
return {
"aroma_candidates": self.bias_aroma_selection(style_id, pantry_items),
"depth_suggestions": list(template.depth_sources),
"brightness_suggestions": list(template.brightness_sources),
"method_bias": template.method_bias,
"structure_forms": list(template.structure_forms),
"seasoning_bias": template.seasoning_bias,
"finishing_fat": template.finishing_fat_str,
}
def _load(self, path: Path) -> StyleTemplate:
data = yaml.safe_load(path.read_text())
return StyleTemplate(
style_id=data["style_id"],
name=data["name"],
aromatics=tuple(data.get("aromatics", [])),
depth_sources=tuple(data.get("depth_sources", [])),
brightness_sources=tuple(data.get("brightness_sources", [])),
method_bias=dict(data.get("method_bias", {})),
structure_forms=tuple(data.get("structure_forms", [])),
seasoning_bias=data.get("seasoning_bias", ""),
finishing_fat_str=data.get("finishing_fat", ""),
)
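The bidirectional substring match used by `bias_aroma_selection` (and its `preferred_*` siblings) can be sketched in isolation; the aromatics tuple here is a made-up Italian-style example, not a real template:

```python
# Sketch of the bidirectional substring match: an aromatic counts as present
# if either string contains the other, so both "garlic" vs "garlic cloves"
# and a terse pantry entry vs a longer aromatic name match.
aromatics = ("garlic", "onion", "basil")  # illustrative, not a real style template

def bias_aroma_selection(pantry_items: list[str]) -> list[str]:
    result = []
    for aroma in aromatics:
        for item in pantry_items:
            if aroma.lower() in item.lower() or item.lower() in aroma.lower():
                result.append(aroma)
                break  # one pantry hit is enough; move to the next aromatic
    return result

print(bias_aroma_selection(["garlic cloves", "celery", "dried basil"]))
# ['garlic', 'basil']
```

Because the inner loop breaks on the first hit, each aromatic appears at most once in the result, in template order.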


@ -0,0 +1,126 @@
"""
SubstitutionEngine -- deterministic ingredient swap candidates with compensation hints.
Powered by:
- substitution_pairs table (derived from lishuyang/recipepairs)
- ingredient_profiles functional metadata (USDA FDC)
"""
from __future__ import annotations
import json
from dataclasses import dataclass, field
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from app.db.store import Store
# Compensation threshold — if |delta| exceeds this, surface a hint
_FAT_THRESHOLD = 5.0 # grams per 100g
_GLUTAMATE_THRESHOLD = 1.0 # mg per 100g
_MOISTURE_THRESHOLD = 15.0 # grams per 100g
_RICHNESS_COMPENSATORS = ["olive oil", "coconut oil", "butter", "shortening", "full-fat coconut milk"]
_DEPTH_COMPENSATORS = ["nutritional yeast", "soy sauce", "miso", "mushroom powder",
"better than bouillon not-beef", "smoked paprika"]
_MOISTURE_BINDERS = ["cornstarch", "flour", "arrowroot", "breadcrumbs"]
@dataclass(frozen=True)
class CompensationHint:
ingredient: str
reason: str
element: str
@dataclass(frozen=True)
class SubstitutionSwap:
original_name: str
substitute_name: str
constraint_label: str
fat_delta: float
moisture_delta: float
glutamate_delta: float
protein_delta: float
occurrence_count: int
compensation_hints: list[dict] = field(default_factory=list)
explanation: str = ""
class SubstitutionEngine:
def __init__(self, store: "Store") -> None:
self._store = store
def find_substitutes(
self,
ingredient_name: str,
constraint: str,
) -> list[SubstitutionSwap]:
rows = self._store._fetch_all("""
SELECT substitute_name, constraint_label,
fat_delta, moisture_delta, glutamate_delta, protein_delta,
occurrence_count, compensation_hints
FROM substitution_pairs
WHERE original_name = ? AND constraint_label = ?
ORDER BY occurrence_count DESC
""", (ingredient_name.lower(), constraint))
return [self._row_to_swap(ingredient_name, row) for row in rows]
def _row_to_swap(self, original: str, row: dict) -> SubstitutionSwap:
hints = self._build_hints(row)
explanation = self._build_explanation(original, row, hints)
return SubstitutionSwap(
original_name=original,
substitute_name=row["substitute_name"],
constraint_label=row["constraint_label"],
fat_delta=row.get("fat_delta") or 0.0,
moisture_delta=row.get("moisture_delta") or 0.0,
glutamate_delta=row.get("glutamate_delta") or 0.0,
protein_delta=row.get("protein_delta") or 0.0,
occurrence_count=row.get("occurrence_count") or 1,
compensation_hints=[{"ingredient": h.ingredient, "reason": h.reason, "element": h.element} for h in hints],
explanation=explanation,
)
def _build_hints(self, row: dict) -> list[CompensationHint]:
hints = []
fat_delta = row.get("fat_delta") or 0.0
glutamate_delta = row.get("glutamate_delta") or 0.0
moisture_delta = row.get("moisture_delta") or 0.0
if fat_delta < -_FAT_THRESHOLD:
missing = abs(fat_delta)
sugg = _RICHNESS_COMPENSATORS[0]
hints.append(CompensationHint(
ingredient=sugg,
reason=f"substitute has ~{missing:.0f}g/100g less fat — add {sugg} to restore Richness",
element="Richness",
))
if glutamate_delta < -_GLUTAMATE_THRESHOLD:
sugg = _DEPTH_COMPENSATORS[0]
hints.append(CompensationHint(
ingredient=sugg,
reason=f"substitute is lower in umami — add {sugg} to restore Depth",
element="Depth",
))
if moisture_delta > _MOISTURE_THRESHOLD:
sugg = _MOISTURE_BINDERS[0]
hints.append(CompensationHint(
ingredient=sugg,
reason=f"substitute adds ~{moisture_delta:.0f}g/100g more moisture — add {sugg} to maintain Structure",
element="Structure",
))
return hints
def _build_explanation(
self, original: str, row: dict, hints: list[CompensationHint]
) -> str:
sub = row["substitute_name"]
count = row.get("occurrence_count") or 1
base = f"Replace {original} with {sub} (seen in {count} recipes)."
if hints:
base += " To compensate: " + "; ".join(h.reason for h in hints) + "."
return base

38
app/staples/seitan.yaml Normal file

@ -0,0 +1,38 @@
slug: seitan
name: Seitan (Wheat Meat)
description: High-protein wheat gluten that mimics the texture of meat. Can be made in bulk and stored in multiple formats.
dietary_labels: [vegan, high-protein]
base_ingredients:
- vital wheat gluten
- nutritional yeast
- soy sauce
- garlic powder
- vegetable broth
base_method: simmer
base_time_minutes: 45
yield_formats:
fresh:
elements: [Structure, Depth, Richness]
shelf_days: 5
storage: airtight container, refrigerated in broth
methods: [saute, braise, grill, stir-fry]
texture: chewy, meaty
frozen:
elements: [Structure, Depth]
shelf_days: 90
storage: vacuum-sealed freezer bag
methods: [thaw then any method]
texture: slightly softer after thaw
braised:
elements: [Structure, Depth, Seasoning]
shelf_days: 4
storage: covered in braising liquid, refrigerated
methods: [serve directly, slice for sandwiches]
texture: tender, falling-apart
grilled:
elements: [Structure, Aroma, Texture]
shelf_days: 3
storage: refrigerated, uncovered to maintain crust
methods: [slice cold, reheat in pan]
texture: crisp exterior, chewy interior
compatible_styles: [italian, latin, east_asian, eastern_european]

28
app/staples/tempeh.yaml Normal file

@ -0,0 +1,28 @@
slug: tempeh
name: Tempeh
description: Fermented soybean cake. Dense, nutty, high in protein. Excellent at absorbing marinades.
dietary_labels: [vegan, high-protein, fermented]
base_ingredients:
- tempeh block (store-bought or homemade from soybeans + starter)
base_method: steam then marinate
base_time_minutes: 20
yield_formats:
raw:
elements: [Structure, Depth, Richness]
shelf_days: 7
storage: refrigerated in original packaging or wrapped
methods: [steam, crumble, slice]
texture: dense, firm
marinated:
elements: [Structure, Depth, Seasoning, Aroma]
shelf_days: 5
storage: submerged in marinade, refrigerated
methods: [bake, pan-fry, grill]
texture: chewy, flavor-dense
crumbled:
elements: [Structure, Depth, Texture]
shelf_days: 3
storage: refrigerated, use quickly
methods: [saute as ground meat substitute, add to tacos or pasta]
texture: crumbly, browned bits
compatible_styles: [latin, east_asian, mediterranean]


@ -0,0 +1,34 @@
slug: tofu_firm
name: Firm Tofu
description: Pressed soybean curd. Neutral flavor, excellent at absorbing surrounding flavors. Freeze-thaw cycle creates meatier texture.
dietary_labels: [vegan, high-protein]
base_ingredients:
- firm or extra-firm tofu block
base_method: press (30 min) then prepare
base_time_minutes: 30
yield_formats:
pressed_raw:
elements: [Structure]
shelf_days: 5
storage: submerged in water, refrigerated, change water daily
methods: [cube, slice, crumble]
texture: dense, uniform
freeze_thawed:
elements: [Structure, Texture]
shelf_days: 5
storage: refrigerated after thawing
methods: [squeeze dry, saute, bake]
texture: chewy, porous, absorbs marinades deeply
baked:
elements: [Structure, Texture, Aroma]
shelf_days: 4
storage: refrigerated, uncovered
methods: [add to stir-fry, bowl, salad]
texture: crisp exterior, chewy interior
silken:
elements: [Richness, Structure]
shelf_days: 3
storage: refrigerated, use within days of opening
methods: [blend into sauces, custards, dressings]
texture: silky, smooth
compatible_styles: [east_asian, mediterranean]

926
app/static/index.html Normal file

@ -0,0 +1,926 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Project Thoth - Inventory & Receipt Manager</title>
<style>
* {
margin: 0;
padding: 0;
box-sizing: border-box;
}
body {
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
min-height: 100vh;
padding: 20px;
}
.container {
max-width: 1200px;
margin: 0 auto;
}
.header {
text-align: center;
color: white;
margin-bottom: 40px;
}
.header h1 {
font-size: 2.5em;
margin-bottom: 10px;
}
.header p {
font-size: 1.2em;
opacity: 0.9;
}
/* Tabs */
.tabs {
display: flex;
gap: 10px;
margin-bottom: 20px;
}
.tab {
background: rgba(255, 255, 255, 0.2);
color: white;
border: none;
padding: 15px 30px;
font-size: 16px;
border-radius: 8px;
cursor: pointer;
transition: all 0.3s;
}
.tab:hover {
background: rgba(255, 255, 255, 0.3);
}
.tab.active {
background: white;
color: #667eea;
font-weight: 600;
}
.tab-content {
display: none;
}
.tab-content.active {
display: block;
}
.card {
background: white;
border-radius: 12px;
padding: 30px;
box-shadow: 0 10px 40px rgba(0,0,0,0.2);
margin-bottom: 20px;
}
.upload-area {
border: 3px dashed #667eea;
border-radius: 8px;
padding: 40px;
text-align: center;
cursor: pointer;
transition: all 0.3s;
background: #f7f9fc;
}
.upload-area:hover {
border-color: #764ba2;
background: #eef2f7;
}
.upload-icon {
font-size: 48px;
margin-bottom: 20px;
}
.upload-text {
font-size: 18px;
color: #333;
margin-bottom: 10px;
}
.upload-hint {
font-size: 14px;
color: #666;
}
input[type="file"] {
display: none;
}
.button {
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
color: white;
border: none;
padding: 12px 30px;
font-size: 16px;
border-radius: 6px;
cursor: pointer;
transition: transform 0.2s;
margin-right: 10px;
}
.button:hover {
transform: translateY(-2px);
}
.button:disabled {
opacity: 0.5;
cursor: not-allowed;
transform: none;
}
.button-secondary {
background: #6c757d;
}
.button-small {
padding: 8px 16px;
font-size: 14px;
}
.loading {
text-align: center;
padding: 20px;
display: none;
}
.spinner {
border: 4px solid #f3f3f3;
border-top: 4px solid #667eea;
border-radius: 50%;
width: 40px;
height: 40px;
animation: spin 1s linear infinite;
margin: 0 auto 10px;
}
@keyframes spin {
0% { transform: rotate(0deg); }
100% { transform: rotate(360deg); }
}
.results {
margin-top: 20px;
display: none;
}
.result-item {
padding: 15px;
border-radius: 6px;
margin-bottom: 10px;
}
.result-success {
background: #d4edda;
color: #155724;
border: 1px solid #c3e6cb;
}
.result-error {
background: #f8d7da;
color: #721c24;
border: 1px solid #f5c6cb;
}
.result-info {
background: #d1ecf1;
color: #0c5460;
border: 1px solid #bee5eb;
}
/* Stats */
.stats-grid {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
gap: 15px;
margin-bottom: 20px;
}
.stat-card {
background: #f7f9fc;
padding: 20px;
border-radius: 8px;
text-align: center;
}
.stat-value {
font-size: 32px;
font-weight: bold;
color: #667eea;
margin-bottom: 5px;
}
.stat-label {
font-size: 14px;
color: #666;
}
/* Inventory List */
.inventory-list {
margin-top: 20px;
}
.inventory-item {
background: #f7f9fc;
padding: 15px;
border-radius: 6px;
margin-bottom: 10px;
display: flex;
justify-content: space-between;
align-items: center;
}
.item-info {
flex: 1;
}
.item-name {
font-weight: 600;
font-size: 16px;
margin-bottom: 5px;
}
.item-details {
font-size: 14px;
color: #666;
}
.item-tags {
display: flex;
gap: 5px;
margin-top: 5px;
flex-wrap: wrap;
}
.tag {
background: #667eea;
color: white;
padding: 2px 8px;
border-radius: 12px;
font-size: 12px;
}
.expiry-warning {
color: #ff6b6b;
font-weight: 600;
}
.expiry-soon {
color: #ffa500;
font-weight: 600;
}
/* Form */
.form-group {
margin-bottom: 15px;
}
.form-group label {
display: block;
margin-bottom: 5px;
font-weight: 600;
color: #333;
}
.form-group input,
.form-group select,
.form-group textarea {
width: 100%;
padding: 10px;
border: 1px solid #ddd;
border-radius: 6px;
font-size: 14px;
}
.form-row {
display: grid;
grid-template-columns: 1fr 1fr;
gap: 15px;
}
.hidden {
display: none !important;
}
</style>
</head>
<body>
<div class="container">
<div class="header">
<h1>📦 Project Thoth</h1>
<p>Smart Inventory & Receipt Management</p>
</div>
<!-- Tabs -->
<div class="tabs">
<button class="tab active" onclick="switchTab('inventory')">🏪 Inventory</button>
<button class="tab" onclick="switchTab('receipts')">🧾 Receipts</button>
</div>
<!-- Inventory Tab -->
<div id="inventoryTab" class="tab-content active">
<!-- Stats -->
<div class="card">
<h2>📊 Inventory Overview</h2>
<div class="stats-grid" id="inventoryStats">
<div class="stat-card">
<div class="stat-value" id="totalItems">0</div>
<div class="stat-label">Total Items</div>
</div>
<div class="stat-card">
<div class="stat-value" id="totalProducts">0</div>
<div class="stat-label">Unique Products</div>
</div>
<div class="stat-card">
<div class="stat-value expiry-soon" id="expiringSoon">0</div>
<div class="stat-label">Expiring Soon</div>
</div>
<div class="stat-card">
<div class="stat-value expiry-warning" id="expired">0</div>
<div class="stat-label">Expired</div>
</div>
</div>
</div>
<!-- Barcode Scanner Gun -->
<div class="card">
<h2>🔫 Scanner Gun</h2>
<p style="color: #666; margin-bottom: 15px;">Use your barcode scanner gun below. Scan will auto-submit when Enter is pressed.</p>
<div class="form-group">
<label for="scannerGunInput">Scan barcode here:</label>
<input
type="text"
id="scannerGunInput"
placeholder="Focus here and scan with barcode gun..."
style="font-size: 18px; font-family: monospace; background: #f0f8ff;"
autocomplete="off"
>
</div>
<div class="form-row">
<div class="form-group">
<label for="scannerLocation">Location</label>
<select id="scannerLocation">
<option value="fridge">Fridge</option>
<option value="freezer">Freezer</option>
<option value="pantry" selected>Pantry</option>
<option value="cabinet">Cabinet</option>
</select>
</div>
<div class="form-group">
<label for="scannerQuantity">Quantity</label>
<input type="number" id="scannerQuantity" value="1" min="0.1" step="0.1">
</div>
</div>
<div class="loading" id="scannerLoading">
<div class="spinner"></div>
<p>Processing barcode...</p>
</div>
<div class="results" id="scannerResults"></div>
</div>
<!-- Barcode Scan (Camera/Image) -->
<div class="card">
<h2>📷 Scan Barcode (Camera/Image)</h2>
<div class="upload-area" id="barcodeUploadArea">
<div class="upload-icon">📸</div>
<div class="upload-text">Click to scan barcode or drag and drop</div>
<div class="upload-hint">Take a photo of a product barcode (UPC/EAN)</div>
</div>
<input type="file" id="barcodeInput" accept="image/*" capture="environment">
<div class="form-row" style="margin-top: 20px;">
<div class="form-group">
<label for="barcodeLocation">Location</label>
<select id="barcodeLocation">
<option value="fridge">Fridge</option>
<option value="freezer">Freezer</option>
<option value="pantry" selected>Pantry</option>
<option value="cabinet">Cabinet</option>
</select>
</div>
<div class="form-group">
<label for="barcodeQuantity">Quantity</label>
<input type="number" id="barcodeQuantity" value="1" min="0.1" step="0.1">
</div>
</div>
<div class="loading" id="barcodeLoading">
<div class="spinner"></div>
<p>Scanning barcode...</p>
</div>
<div class="results" id="barcodeResults"></div>
</div>
<!-- Manual Add -->
<div class="card">
<h2>Add Item Manually</h2>
<div class="form-row">
<div class="form-group">
<label for="itemName">Product Name*</label>
<input type="text" id="itemName" placeholder="e.g., Organic Milk" required>
</div>
<div class="form-group">
<label for="itemBrand">Brand</label>
<input type="text" id="itemBrand" placeholder="e.g., Horizon">
</div>
</div>
<div class="form-row">
<div class="form-group">
<label for="itemQuantity">Quantity*</label>
<input type="number" id="itemQuantity" value="1" min="0.1" step="0.1" required>
</div>
<div class="form-group">
<label for="itemUnit">Unit</label>
<select id="itemUnit">
<option value="count">Count</option>
<option value="kg">Kilograms</option>
<option value="lbs">Pounds</option>
<option value="oz">Ounces</option>
<option value="liters">Liters</option>
</select>
</div>
</div>
<div class="form-row">
<div class="form-group">
<label for="itemLocation">Location*</label>
<select id="itemLocation" required>
<option value="fridge">Fridge</option>
<option value="freezer">Freezer</option>
<option value="pantry" selected>Pantry</option>
<option value="cabinet">Cabinet</option>
</select>
</div>
<div class="form-group">
<label for="itemExpiration">Expiration Date</label>
<input type="date" id="itemExpiration">
</div>
</div>
<button class="button" onclick="addManualItem()">Add to Inventory</button>
</div>
<!-- Inventory List -->
<div class="card">
<h2>📋 Current Inventory</h2>
<div class="inventory-list" id="inventoryList">
<p style="text-align: center; color: #666;">No items yet. Scan a barcode or add manually!</p>
</div>
</div>
<!-- Export -->
<div class="card">
<h2>📥 Export</h2>
<button class="button" onclick="exportInventoryCSV()">📊 Download CSV</button>
<button class="button" onclick="exportInventoryExcel()">📈 Download Excel</button>
</div>
</div>
<!-- Receipts Tab -->
<div id="receiptsTab" class="tab-content">
<div class="card">
<h2>📸 Upload Receipt</h2>
<div class="upload-area" id="receiptUploadArea">
<div class="upload-icon">🧾</div>
<div class="upload-text">Click to upload or drag and drop</div>
<div class="upload-hint">Supports JPG, PNG (max 10MB)</div>
</div>
<input type="file" id="receiptInput" accept="image/*">
<div class="loading" id="receiptLoading">
<div class="spinner"></div>
<p>Processing receipt...</p>
</div>
<div class="results" id="receiptResults"></div>
</div>
<div class="card">
<h2>📋 Recent Receipts</h2>
<div id="receiptStats">
<p style="text-align: center; color: #666;">No receipts yet. Upload one above!</p>
</div>
<div style="margin-top: 20px;">
<button class="button" onclick="exportReceiptCSV()">📊 Download CSV</button>
<button class="button" onclick="exportReceiptExcel()">📈 Download Excel</button>
</div>
</div>
</div>
</div>
<script>
const API_BASE = '/api/v1';
let currentInventory = [];
// Tab switching
function switchTab(tab) {
document.querySelectorAll('.tab').forEach(t => t.classList.remove('active'));
document.querySelectorAll('.tab-content').forEach(c => c.classList.remove('active'));
if (tab === 'inventory') {
document.querySelector('.tab:nth-child(1)').classList.add('active');
document.getElementById('inventoryTab').classList.add('active');
loadInventoryData();
// Auto-focus scanner gun input for quick scanning
setTimeout(() => {
document.getElementById('scannerGunInput').focus();
}, 100);
} else {
document.querySelector('.tab:nth-child(2)').classList.add('active');
document.getElementById('receiptsTab').classList.add('active');
loadReceiptData();
}
}
// Scanner gun (text input)
const scannerGunInput = document.getElementById('scannerGunInput');
// Auto-focus scanner gun input when inventory tab is active
scannerGunInput.addEventListener('keypress', async (e) => {
if (e.key === 'Enter') {
e.preventDefault();
const barcode = scannerGunInput.value.trim();
if (!barcode) return;
await handleScannerGunInput(barcode);
scannerGunInput.value = ''; // Clear for next scan
scannerGunInput.focus(); // Re-focus for next scan
}
});
async function handleScannerGunInput(barcode) {
const location = document.getElementById('scannerLocation').value;
const quantity = parseFloat(document.getElementById('scannerQuantity').value) || 1; // guard against empty/invalid input (NaN)
showLoading('scanner', true);
showResults('scanner', false);
try {
const response = await fetch(`${API_BASE}/inventory/scan/text`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
barcode,
location,
quantity,
auto_add_to_inventory: true
})
});
const data = await response.json();
if (data.success && data.barcodes_found > 0) {
const result = data.results[0];
showResult('scanner', 'success',
`✓ Added: ${result.product.name}${result.product.brand ? ' (' + result.product.brand + ')' : ''} to ${location}`
);
loadInventoryData();
// Beep sound (optional - browser may block)
try {
const audioContext = new (window.AudioContext || window.webkitAudioContext)();
const oscillator = audioContext.createOscillator();
const gainNode = audioContext.createGain();
oscillator.connect(gainNode);
gainNode.connect(audioContext.destination);
oscillator.frequency.value = 800;
oscillator.type = 'sine';
gainNode.gain.setValueAtTime(0.3, audioContext.currentTime);
oscillator.start(audioContext.currentTime);
oscillator.stop(audioContext.currentTime + 0.1);
} catch (e) {
// Audio failed, ignore
}
} else {
showResult('scanner', 'error', data.message || 'Barcode not found');
}
} catch (error) {
showResult('scanner', 'error', `Error: ${error.message}`);
} finally {
showLoading('scanner', false);
}
}
// Barcode scanning (image)
const barcodeUploadArea = document.getElementById('barcodeUploadArea');
const barcodeInput = document.getElementById('barcodeInput');
barcodeUploadArea.addEventListener('click', () => barcodeInput.click());
barcodeInput.addEventListener('change', handleBarcodeScan);
async function handleBarcodeScan(e) {
const file = e.target.files[0];
if (!file) return;
const location = document.getElementById('barcodeLocation').value;
const quantity = parseFloat(document.getElementById('barcodeQuantity').value) || 1; // guard against empty/invalid input (NaN)
showLoading('barcode', true);
showResults('barcode', false);
const formData = new FormData();
formData.append('file', file);
formData.append('location', location);
formData.append('quantity', quantity);
formData.append('auto_add_to_inventory', 'true');
try {
const response = await fetch(`${API_BASE}/inventory/scan`, {
method: 'POST',
body: formData
});
const data = await response.json();
if (data.success && data.barcodes_found > 0) {
const result = data.results[0];
showResult('barcode', 'success',
`✓ Found: ${result.product.name}${result.product.brand ? ' (' + result.product.brand + ')' : ''}`
);
loadInventoryData();
} else {
showResult('barcode', 'error', 'No barcode found in image');
}
} catch (error) {
showResult('barcode', 'error', `Error: ${error.message}`);
} finally {
showLoading('barcode', false);
barcodeInput.value = '';
}
}
// Manual add
async function addManualItem() {
const name = document.getElementById('itemName').value;
const brand = document.getElementById('itemBrand').value;
const quantity = parseFloat(document.getElementById('itemQuantity').value);
const unit = document.getElementById('itemUnit').value;
const location = document.getElementById('itemLocation').value;
const expiration = document.getElementById('itemExpiration').value;
if (!name || !quantity || !location) {
alert('Please fill in required fields');
return;
}
try {
// First, create product
const productResp = await fetch(`${API_BASE}/inventory/products`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
name,
brand: brand || null,
source: 'manual'
})
});
const product = await productResp.json();
// Then, add to inventory
const itemResp = await fetch(`${API_BASE}/inventory/items`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
product_id: product.id,
quantity,
unit,
location,
expiration_date: expiration || null,
source: 'manual'
})
});
if (itemResp.ok) {
alert('✓ Item added to inventory!');
// Clear form
document.getElementById('itemName').value = '';
document.getElementById('itemBrand').value = '';
document.getElementById('itemQuantity').value = '1';
document.getElementById('itemExpiration').value = '';
loadInventoryData();
} else {
alert('Failed to add item');
}
} catch (error) {
alert(`Error: ${error.message}`);
}
}
// Load inventory data
async function loadInventoryData() {
try {
// Load stats
const statsResp = await fetch(`${API_BASE}/inventory/stats`);
const stats = await statsResp.json();
document.getElementById('totalItems').textContent = stats.total_items;
document.getElementById('totalProducts').textContent = stats.total_products;
document.getElementById('expiringSoon').textContent = stats.expiring_soon;
document.getElementById('expired').textContent = stats.expired;
// Load inventory items
const itemsResp = await fetch(`${API_BASE}/inventory/items?limit=100`);
const items = await itemsResp.json();
currentInventory = items;
displayInventory(items);
} catch (error) {
console.error('Failed to load inventory:', error);
}
}
function displayInventory(items) {
const list = document.getElementById('inventoryList');
if (items.length === 0) {
list.innerHTML = '<p style="text-align: center; color: #666;">No items yet. Scan a barcode or add manually!</p>';
return;
}
list.innerHTML = items.map(item => {
const product = item.product;
let expiryInfo = '';
if (item.expiration_date) {
const expiry = new Date(item.expiration_date);
const today = new Date();
const daysUntil = Math.ceil((expiry - today) / (1000 * 60 * 60 * 24));
if (daysUntil < 0) {
expiryInfo = `<span class="expiry-warning">Expired ${Math.abs(daysUntil)} days ago</span>`;
} else if (daysUntil <= 7) {
expiryInfo = `<span class="expiry-soon">Expires in ${daysUntil} days</span>`;
} else {
expiryInfo = `Expires ${expiry.toLocaleDateString()}`;
}
}
const tags = product.tags ? product.tags.map(tag =>
`<span class="tag" style="background: ${tag.color || '#667eea'}">${tag.name}</span>`
).join('') : '';
return `
<div class="inventory-item">
<div class="item-info">
<div class="item-name">${product.name}${product.brand ? ` - ${product.brand}` : ''}</div>
<div class="item-details">
${item.quantity} ${item.unit} • ${item.location}${expiryInfo ? ' • ' + expiryInfo : ''}
</div>
${tags ? `<div class="item-tags">${tags}</div>` : ''}
</div>
<button class="button button-small" onclick="markAsConsumed('${item.id}')">✓ Consumed</button>
</div>
`;
}).join('');
}
async function markAsConsumed(itemId) {
if (!confirm('Mark this item as consumed?')) return;
try {
await fetch(`${API_BASE}/inventory/items/${itemId}/consume`, { method: 'POST' });
loadInventoryData();
} catch (error) {
alert(`Error: ${error.message}`);
}
}
// Receipt handling
const receiptUploadArea = document.getElementById('receiptUploadArea');
const receiptInput = document.getElementById('receiptInput');
receiptUploadArea.addEventListener('click', () => receiptInput.click());
receiptInput.addEventListener('change', handleReceiptUpload);
async function handleReceiptUpload(e) {
const file = e.target.files[0];
if (!file) return;
showLoading('receipt', true);
showResults('receipt', false);
const formData = new FormData();
formData.append('file', file);
try {
const response = await fetch(`${API_BASE}/receipts/`, {
method: 'POST',
body: formData
});
const data = await response.json();
if (response.ok) {
showResult('receipt', 'success', `Receipt uploaded! ID: ${data.id}`);
showResult('receipt', 'info', 'Processing in background...');
setTimeout(loadReceiptData, 3000);
} else {
showResult('receipt', 'error', `Upload failed: ${data.detail}`);
}
} catch (error) {
showResult('receipt', 'error', `Error: ${error.message}`);
} finally {
showLoading('receipt', false);
receiptInput.value = '';
}
}
async function loadReceiptData() {
try {
const response = await fetch(`${API_BASE}/export/stats`);
const stats = await response.json();
const statsDiv = document.getElementById('receiptStats');
if (stats.total_receipts === 0) {
statsDiv.innerHTML = '<p style="text-align: center; color: #666;">No receipts yet. Upload one above!</p>';
} else {
statsDiv.innerHTML = `
<div class="stats-grid">
<div class="stat-card">
<div class="stat-value">${stats.total_receipts}</div>
<div class="stat-label">Total Receipts</div>
</div>
<div class="stat-card">
<div class="stat-value">${stats.average_quality_score.toFixed(1)}</div>
<div class="stat-label">Avg Quality Score</div>
</div>
<div class="stat-card">
<div class="stat-value">${stats.acceptable_quality_count}</div>
<div class="stat-label">Good Quality</div>
</div>
</div>
`;
}
} catch (error) {
console.error('Failed to load receipt data:', error);
}
}
// Export functions
function exportInventoryCSV() {
window.open(`${API_BASE}/export/inventory/csv`, '_blank');
}
function exportInventoryExcel() {
window.open(`${API_BASE}/export/inventory/excel`, '_blank');
}
function exportReceiptCSV() {
window.open(`${API_BASE}/export/csv`, '_blank');
}
function exportReceiptExcel() {
window.open(`${API_BASE}/export/excel`, '_blank');
}
// Utility functions
function showLoading(type, show) {
document.getElementById(`${type}Loading`).style.display = show ? 'block' : 'none';
}
function showResults(type, show) {
const results = document.getElementById(`${type}Results`);
if (!show) {
results.innerHTML = '';
}
results.style.display = show ? 'block' : 'none';
}
function showResult(type, resultType, message) {
const results = document.getElementById(`${type}Results`);
results.style.display = 'block';
const div = document.createElement('div');
div.className = `result-item result-${resultType}`;
div.textContent = message;
results.appendChild(div);
setTimeout(() => div.remove(), 5000);
}
// Load initial data
loadInventoryData();
setInterval(loadInventoryData, 30000); // Refresh every 30 seconds
</script>
</body>
</html>

459
app/static/upload.html Normal file

@@ -0,0 +1,459 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Project Thoth - Receipt Upload</title>
<style>
* {
margin: 0;
padding: 0;
box-sizing: border-box;
}
body {
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
min-height: 100vh;
padding: 20px;
}
.container {
max-width: 800px;
margin: 0 auto;
}
.header {
text-align: center;
color: white;
margin-bottom: 40px;
}
.header h1 {
font-size: 2.5em;
margin-bottom: 10px;
}
.header p {
font-size: 1.2em;
opacity: 0.9;
}
.card {
background: white;
border-radius: 12px;
padding: 30px;
box-shadow: 0 10px 40px rgba(0,0,0,0.2);
margin-bottom: 20px;
}
.upload-area {
border: 3px dashed #667eea;
border-radius: 8px;
padding: 40px;
text-align: center;
cursor: pointer;
transition: all 0.3s;
background: #f7f9fc;
}
.upload-area:hover {
border-color: #764ba2;
background: #eef2f7;
}
.upload-area.dragover {
border-color: #764ba2;
background: #e0e7ff;
}
.upload-icon {
font-size: 48px;
margin-bottom: 20px;
}
.upload-text {
font-size: 18px;
color: #333;
margin-bottom: 10px;
}
.upload-hint {
font-size: 14px;
color: #666;
}
#fileInput {
display: none;
}
.preview-area {
margin-top: 20px;
display: none;
}
.preview-image {
max-width: 100%;
border-radius: 8px;
margin-bottom: 20px;
}
.button {
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
color: white;
border: none;
padding: 12px 30px;
font-size: 16px;
border-radius: 6px;
cursor: pointer;
transition: transform 0.2s;
margin-right: 10px;
}
.button:hover {
transform: translateY(-2px);
}
.button:disabled {
opacity: 0.5;
cursor: not-allowed;
transform: none;
}
.button-secondary {
background: #6c757d;
}
.results {
margin-top: 20px;
display: none;
}
.result-item {
padding: 15px;
border-radius: 6px;
margin-bottom: 10px;
}
.result-success {
background: #d4edda;
color: #155724;
border: 1px solid #c3e6cb;
}
.result-error {
background: #f8d7da;
color: #721c24;
border: 1px solid #f5c6cb;
}
.result-info {
background: #d1ecf1;
color: #0c5460;
border: 1px solid #bee5eb;
}
.loading {
text-align: center;
padding: 20px;
display: none;
}
.spinner {
border: 4px solid #f3f3f3;
border-top: 4px solid #667eea;
border-radius: 50%;
width: 40px;
height: 40px;
animation: spin 1s linear infinite;
margin: 0 auto 10px;
}
@keyframes spin {
0% { transform: rotate(0deg); }
100% { transform: rotate(360deg); }
}
.receipt-list {
margin-top: 20px;
}
.receipt-card {
background: #f7f9fc;
padding: 15px;
border-radius: 6px;
margin-bottom: 10px;
display: flex;
justify-content: space-between;
align-items: center;
}
.receipt-info {
flex: 1;
}
.receipt-id {
font-family: monospace;
font-size: 12px;
color: #666;
}
.receipt-status {
display: inline-block;
padding: 4px 12px;
border-radius: 12px;
font-size: 12px;
font-weight: 600;
margin-left: 10px;
}
.status-processing {
background: #fff3cd;
color: #856404;
}
.status-processed {
background: #d4edda;
color: #155724;
}
.status-error {
background: #f8d7da;
color: #721c24;
}
.quality-score {
font-size: 24px;
font-weight: bold;
color: #667eea;
}
.actions {
margin-top: 20px;
text-align: center;
}
.export-section {
margin-top: 30px;
text-align: center;
}
.export-section h3 {
margin-bottom: 15px;
color: #333;
}
</style>
</head>
<body>
<div class="container">
<div class="header">
<h1>📄 Project Thoth</h1>
<p>Receipt Processing System</p>
</div>
<div class="card">
<h2>Upload Receipt</h2>
<div class="upload-area" id="uploadArea">
<div class="upload-icon">📸</div>
<div class="upload-text">Click to upload or drag and drop</div>
<div class="upload-hint">Supports JPG, PNG (max 10MB)</div>
</div>
<input type="file" id="fileInput" accept="image/*">
<div class="preview-area" id="previewArea">
<img id="previewImage" class="preview-image">
<div>
<button class="button" id="uploadBtn">Upload Receipt</button>
<button class="button button-secondary" id="cancelBtn">Cancel</button>
</div>
</div>
<div class="loading" id="loading">
<div class="spinner"></div>
<p>Processing receipt...</p>
</div>
<div class="results" id="results"></div>
</div>
<div class="card">
<h2>Recent Receipts</h2>
<div class="receipt-list" id="receiptList">
<p style="text-align: center; color: #666;">No receipts yet. Upload one above!</p>
</div>
<div class="export-section">
<h3>Export Data</h3>
<button class="button" onclick="exportCSV()">📊 Download CSV</button>
<button class="button" onclick="exportExcel()">📈 Download Excel</button>
</div>
</div>
</div>
<script>
// Use relative URL so it works from any host (localhost or remote IP)
const API_BASE = '/api/v1';
const uploadArea = document.getElementById('uploadArea');
const fileInput = document.getElementById('fileInput');
const previewArea = document.getElementById('previewArea');
const previewImage = document.getElementById('previewImage');
const uploadBtn = document.getElementById('uploadBtn');
const cancelBtn = document.getElementById('cancelBtn');
const loading = document.getElementById('loading');
const results = document.getElementById('results');
const receiptList = document.getElementById('receiptList');
let selectedFile = null;
let receipts = [];
// Click to upload
uploadArea.addEventListener('click', () => fileInput.click());
// Drag and drop
uploadArea.addEventListener('dragover', (e) => {
e.preventDefault();
uploadArea.classList.add('dragover');
});
uploadArea.addEventListener('dragleave', () => {
uploadArea.classList.remove('dragover');
});
uploadArea.addEventListener('drop', (e) => {
e.preventDefault();
uploadArea.classList.remove('dragover');
const files = e.dataTransfer.files;
if (files.length > 0) {
handleFileSelect(files[0]);
}
});
// File input change
fileInput.addEventListener('change', (e) => {
if (e.target.files.length > 0) {
handleFileSelect(e.target.files[0]);
}
});
function handleFileSelect(file) {
if (!file.type.startsWith('image/')) {
showResult('error', 'Please select an image file');
return;
}
selectedFile = file;
const reader = new FileReader();
reader.onload = (e) => {
previewImage.src = e.target.result;
previewArea.style.display = 'block';
uploadArea.style.display = 'none';
};
reader.readAsDataURL(file);
}
cancelBtn.addEventListener('click', () => {
selectedFile = null;
previewArea.style.display = 'none';
uploadArea.style.display = 'block';
fileInput.value = '';
});
uploadBtn.addEventListener('click', uploadReceipt);
async function uploadReceipt() {
if (!selectedFile) return;
loading.style.display = 'block';
results.style.display = 'none';
uploadBtn.disabled = true;
const formData = new FormData();
formData.append('file', selectedFile);
try {
const response = await fetch(`${API_BASE}/receipts/`, {
method: 'POST',
body: formData
});
const data = await response.json();
if (response.ok) {
showResult('success', `Receipt uploaded! ID: ${data.id}`);
showResult('info', 'Processing in background... Refresh in a few seconds to see results.');
// Reset form
selectedFile = null;
previewArea.style.display = 'none';
uploadArea.style.display = 'block';
fileInput.value = '';
// Refresh list after a delay
setTimeout(loadReceipts, 3000);
} else {
showResult('error', `Upload failed: ${data.detail || 'Unknown error'}`);
}
} catch (error) {
showResult('error', `Network error: ${error.message}`);
} finally {
loading.style.display = 'none';
uploadBtn.disabled = false;
}
}
function showResult(type, message) {
results.style.display = 'block';
const div = document.createElement('div');
div.className = `result-item result-${type}`;
div.textContent = message;
results.appendChild(div);
// Auto-hide after 5 seconds
setTimeout(() => div.remove(), 5000);
}
async function loadReceipts() {
try {
const response = await fetch(`${API_BASE}/export/stats`);
const stats = await response.json();
if (stats.total_receipts === 0) {
receiptList.innerHTML = '<p style="text-align: center; color: #666;">No receipts yet. Upload one above!</p>';
return;
}
// For now, just show stats since we don't have a list endpoint
// In Phase 2, we'll add a proper list endpoint
receiptList.innerHTML = `
<div class="receipt-card">
<div class="receipt-info">
<strong>Total Receipts:</strong> ${stats.total_receipts}<br>
<strong>Average Quality:</strong> ${stats.average_quality_score}/100<br>
<strong>Acceptable Quality:</strong> ${stats.acceptable_quality_count}
</div>
</div>
<p style="text-align: center; color: #666; margin-top: 10px;">
Click "Download Excel" below to see all receipts with details!
</p>
`;
} catch (error) {
console.error('Failed to load receipts:', error);
}
}
async function exportCSV() {
window.open(`${API_BASE}/export/csv`, '_blank');
}
async function exportExcel() {
window.open(`${API_BASE}/export/excel`, '_blank');
}
// Load receipts on page load
loadReceipts();
// Auto-refresh every 10 seconds
setInterval(loadReceipts, 10000);
</script>
</body>
</html>


@@ -0,0 +1,13 @@
style_id: east_asian
name: East Asian
aromatics: [ginger, scallion, sesame, star anise, five spice, sichuan pepper, lemongrass]
depth_sources: [soy sauce, miso, oyster sauce, shiitake, fish sauce, bonito]
brightness_sources: [rice vinegar, mirin, citrus zest, ponzu]
method_bias:
stir_fry: 0.35
steam: 0.25
braise: 0.20
boil: 0.20
structure_forms: [dumpling wrapper, thin noodle, rice, bao]
seasoning_bias: soy sauce
finishing_fat: toasted sesame oil


@@ -0,0 +1,13 @@
style_id: eastern_european
name: Eastern European
aromatics: [dill, caraway, marjoram, parsley, horseradish, bay leaf]
depth_sources: [sour cream, smoked meats, bacon, dried mushrooms]
brightness_sources: [sauerkraut brine, apple cider vinegar, sour cream]
method_bias:
braise: 0.35
boil: 0.30
bake: 0.25
roast: 0.10
structure_forms: [dumpling wrapper, bread dough, stuffed cabbage]
seasoning_bias: kosher salt
finishing_fat: butter or lard

13
app/styles/italian.yaml Normal file

@@ -0,0 +1,13 @@
style_id: italian
name: Italian
aromatics: [basil, oregano, garlic, onion, fennel, rosemary, thyme, sage, marjoram]
depth_sources: [parmesan, pecorino, anchovies, canned tomato, porcini mushrooms]
brightness_sources: [lemon, white wine, tomato, red wine vinegar]
method_bias:
braise: 0.30
roast: 0.30
saute: 0.25
simmer: 0.15
structure_forms: [pasta, wrapped, layered, risotto]
seasoning_bias: sea salt
finishing_fat: olive oil

13
app/styles/latin.yaml Normal file

@@ -0,0 +1,13 @@
style_id: latin
name: Latin
aromatics: [cumin, chili, cilantro, epazote, mexican oregano, ancho, chipotle, smoked paprika]
depth_sources: [dried chilis, smoked peppers, chocolate, achiote]
brightness_sources: [lime, tomatillo, brined jalapeño, orange]
method_bias:
roast: 0.30
braise: 0.30
fry: 0.25
grill: 0.15
structure_forms: [wrapped in masa, pastry, stuffed, bowl]
seasoning_bias: kosher salt
finishing_fat: lard or neutral oil


@@ -0,0 +1,13 @@
style_id: mediterranean
name: Mediterranean
aromatics: [oregano, thyme, rosemary, mint, sumac, za'atar, preserved lemon]
depth_sources: [tahini, feta, halloumi, dried olives, harissa]
brightness_sources: [lemon, pomegranate molasses, yogurt, sumac]
method_bias:
roast: 0.35
grill: 0.30
braise: 0.25
saute: 0.10
structure_forms: [flatbread, stuffed vegetables, grain bowl, mezze plate]
seasoning_bias: sea salt
finishing_fat: olive oil

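All of the style files share the same schema, and each `method_bias` map is written to sum to 1.0. A sketch of a schema check for these profiles (hypothetical `validate_style` helper, not part of this diff; one profile inlined as a dict where the app would parse the YAML):

```python
REQUIRED_KEYS = {
    "style_id", "name", "aromatics", "depth_sources", "brightness_sources",
    "method_bias", "structure_forms", "seasoning_bias", "finishing_fat",
}

def validate_style(style: dict) -> list[str]:
    """Return a list of problems; an empty list means the profile looks well-formed."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - style.keys())]
    bias = style.get("method_bias", {})
    if abs(sum(bias.values()) - 1.0) > 1e-6:
        problems.append(f"method_bias sums to {sum(bias.values())}, expected 1.0")
    return problems

# Abbreviated copy of italian.yaml for illustration.
italian = {
    "style_id": "italian", "name": "Italian",
    "aromatics": ["basil", "oregano", "garlic"],
    "depth_sources": ["parmesan", "anchovies"],
    "brightness_sources": ["lemon", "white wine"],
    "method_bias": {"braise": 0.30, "roast": 0.30, "saute": 0.25, "simmer": 0.15},
    "structure_forms": ["pasta"], "seasoning_bias": "sea salt",
    "finishing_fat": "olive oil",
}
print(validate_style(italian))  # []
```

Running such a check in CI would catch a new style file whose bias weights drift away from 1.0.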
0
app/tasks/__init__.py Normal file

145
app/tasks/runner.py Normal file

@@ -0,0 +1,145 @@
# app/tasks/runner.py
"""Kiwi background task runner.
Implements the run_task_fn interface expected by circuitforge_core.tasks.scheduler.
Each kiwi LLM task type has its own handler below.
Public API:
LLM_TASK_TYPES frozenset of task type strings to route through the scheduler
VRAM_BUDGETS VRAM GB estimates per task type
insert_task() deduplicating task insertion
run_task() called by the scheduler batch worker
"""
from __future__ import annotations
import json
import logging
import sqlite3
from datetime import date, timedelta
from pathlib import Path
from app.services.expiration_predictor import ExpirationPredictor
log = logging.getLogger(__name__)
LLM_TASK_TYPES: frozenset[str] = frozenset({"expiry_llm_fallback"})
VRAM_BUDGETS: dict[str, float] = {
# ExpirationPredictor uses a small LLM (16 tokens out, single pass).
"expiry_llm_fallback": 2.0,
# Recipe LLM (levels 3-4): full recipe generation, ~200-500 tokens out.
# Budget assumes a quantized 7B-class model.
"recipe_llm": 4.0,
}
def insert_task(
db_path: Path,
task_type: str,
job_id: int,
*,
params: str | None = None,
) -> tuple[int, bool]:
"""Insert a background task if no identical task is already in-flight.
Returns (task_id, True) if a new task was created.
Returns (existing_id, False) if an identical task is already queued/running.
"""
conn = sqlite3.connect(db_path)
conn.row_factory = sqlite3.Row
existing = conn.execute(
"SELECT id FROM background_tasks "
"WHERE task_type=? AND job_id=? AND status IN ('queued','running')",
(task_type, job_id),
).fetchone()
if existing:
conn.close()
return existing["id"], False
cursor = conn.execute(
"INSERT INTO background_tasks (task_type, job_id, params) VALUES (?,?,?)",
(task_type, job_id, params),
)
conn.commit()
task_id = cursor.lastrowid
conn.close()
return task_id, True
def _update_task_status(
db_path: Path, task_id: int, status: str, *, error: str = ""
) -> None:
with sqlite3.connect(db_path) as conn:
conn.execute(
"UPDATE background_tasks "
"SET status=?, error=?, updated_at=CURRENT_TIMESTAMP WHERE id=?",
(status, error, task_id),
)
def run_task(
db_path: Path,
task_id: int,
task_type: str,
job_id: int,
params: str | None = None,
) -> None:
"""Execute one background task. Called by the scheduler's batch worker."""
_update_task_status(db_path, task_id, "running")
try:
if task_type == "expiry_llm_fallback":
_run_expiry_llm_fallback(db_path, job_id, params)
else:
raise ValueError(f"Unknown kiwi task type: {task_type!r}")
_update_task_status(db_path, task_id, "completed")
except Exception as exc:
log.exception("Task %d (%s) failed: %s", task_id, task_type, exc)
_update_task_status(db_path, task_id, "failed", error=str(exc))
def _run_expiry_llm_fallback(
db_path: Path,
item_id: int,
params: str | None,
) -> None:
"""Predict expiry date via LLM for an inventory item and write result to DB.
params JSON keys:
product_name (required) e.g. "Trader Joe's Organic Tempeh"
category (optional) category hint for the predictor
location (optional) "fridge" | "freezer" | "pantry" (default: "fridge")
"""
p = json.loads(params or "{}")
product_name = p.get("product_name", "")
category = p.get("category")
location = p.get("location", "fridge")
if not product_name:
raise ValueError("expiry_llm_fallback: 'product_name' is required in params")
predictor = ExpirationPredictor()
days = predictor._llm_predict_days(product_name, category, location)
if days is None:
log.warning(
"LLM expiry fallback returned None for item_id=%d product=%r"
"expiry_date will remain NULL",
item_id,
product_name,
)
return
expiry = (date.today() + timedelta(days=days)).isoformat()
with sqlite3.connect(db_path) as conn:
conn.execute(
"UPDATE inventory_items SET expiry_date=? WHERE id=?",
(expiry, item_id),
)
log.info(
"LLM expiry fallback: item_id=%d %r%s (%d days)",
item_id,
product_name,
expiry,
days,
)

26
app/tasks/scheduler.py Normal file
@@ -0,0 +1,26 @@
# app/tasks/scheduler.py
"""Kiwi LLM task scheduler — thin shim over circuitforge_core.tasks.scheduler."""
from __future__ import annotations
from pathlib import Path
from circuitforge_core.tasks.scheduler import (
TaskScheduler,
get_scheduler as _base_get_scheduler,
reset_scheduler, # re-export for tests
)
from app.core.config import settings
from app.tasks.runner import LLM_TASK_TYPES, VRAM_BUDGETS, run_task
def get_scheduler(db_path: Path) -> TaskScheduler:
"""Return the process-level TaskScheduler singleton for Kiwi."""
return _base_get_scheduler(
db_path=db_path,
run_task_fn=run_task,
task_types=LLM_TASK_TYPES,
vram_budgets=VRAM_BUDGETS,
coordinator_url=settings.COORDINATOR_URL,
service_name="kiwi",
)

75
app/tiers.py Normal file
@@ -0,0 +1,75 @@
"""
Kiwi tier gates.
Tiers: free < paid < premium
(Ultra is not used in Kiwi: there are no human-in-the-loop operations.)
Uses circuitforge-core can_use() with Kiwi's feature map.
"""
from __future__ import annotations
from circuitforge_core.tiers.tiers import can_use as _can_use
# Features that unlock when the user supplies their own LLM backend.
KIWI_BYOK_UNLOCKABLE: frozenset[str] = frozenset({
"recipe_suggestions",
"expiry_llm_matching",
"receipt_ocr",
})
# Feature → minimum tier required
KIWI_FEATURES: dict[str, str] = {
# Free tier
"inventory_crud": "free",
"barcode_scan": "free",
"receipt_upload": "free",
"expiry_alerts": "free",
"export_csv": "free",
"leftover_mode": "free", # Rate-limited at API layer, not tier-gated
"staple_library": "free",
# Paid tier
"receipt_ocr": "paid", # BYOK-unlockable
"recipe_suggestions": "paid", # BYOK-unlockable
"expiry_llm_matching": "paid", # BYOK-unlockable
"meal_planning": "paid",
"dietary_profiles": "paid",
"style_picker": "paid",
# Premium tier
"multi_household": "premium",
"background_monitoring": "premium",
}
def can_use(feature: str, tier: str, has_byok: bool = False) -> bool:
"""Return True if the given tier can access the feature.
    The 'local' tier is assigned to dev-bypass and non-cloud sessions;
    it has unrestricted access to all features.
"""
if tier == "local":
return True
return _can_use(
feature,
tier,
has_byok=has_byok,
_features=KIWI_FEATURES,
_byok_unlockable=KIWI_BYOK_UNLOCKABLE,
)
def require_feature(feature: str, tier: str, has_byok: bool = False) -> None:
"""Raise ValueError if the tier cannot access the feature."""
if not can_use(feature, tier, has_byok):
from circuitforge_core.tiers.tiers import tier_label
needed = tier_label(
feature,
has_byok=has_byok,
_features=KIWI_FEATURES,
_byok_unlockable=KIWI_BYOK_UNLOCKABLE,
)
raise ValueError(
f"Feature '{feature}' requires {needed} tier. "
f"Current tier: {tier}."
)
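The gate logic can be checked with a self-contained stand-in for circuitforge-core's `can_use`. The tier ordering and BYOK-unlock rule below are assumptions reconstructed from the comments above, not the library's actual implementation:

```python
# Assumed tier ordering: free < paid < premium (per the module docstring).
TIER_ORDER = {"free": 0, "paid": 1, "premium": 2}

FEATURES = {
    "inventory_crud": "free",
    "recipe_suggestions": "paid",
    "multi_household": "premium",
}
BYOK_UNLOCKABLE = frozenset({"recipe_suggestions"})

def can_use(feature: str, tier: str, has_byok: bool = False) -> bool:
    # 'local' (dev bypass / non-cloud) sees everything.
    if tier == "local":
        return True
    needed = FEATURES.get(feature)
    if needed is None:
        return False
    # BYOK-unlockable features open up when the user brings their own LLM.
    if has_byok and feature in BYOK_UNLOCKABLE:
        return True
    return TIER_ORDER.get(tier, -1) >= TIER_ORDER[needed]

print(can_use("inventory_crud", "free"))            # → True
print(can_use("recipe_suggestions", "free"))        # → False
print(can_use("recipe_suggestions", "free", True))  # → True  (BYOK unlock)
print(can_use("multi_household", "paid"))           # → False
print(can_use("multi_household", "local"))          # → True
```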

5
app/utils/__init__.py Normal file
@@ -0,0 +1,5 @@
# app/utils/__init__.py
"""
Utility functions for Kiwi.
Contains common helpers used throughout the application.
"""

248
app/utils/progress.py Normal file
@@ -0,0 +1,248 @@
# app/utils/progress.py
import sys
import time
import asyncio
from typing import Optional, Callable, Any
import threading
class ProgressIndicator:
"""
A simple progress indicator for long-running operations.
This class provides different styles of progress indicators:
- dots: Animated dots (. .. ... ....)
- spinner: Spinning cursor (|/-\)
- percentage: Progress percentage [#### ] 40%
"""
def __init__(self,
message: str = "Processing",
style: str = "dots",
total: Optional[int] = None):
"""
Initialize the progress indicator.
Args:
message: The message to display before the indicator
style: The indicator style ('dots', 'spinner', or 'percentage')
total: Total items for percentage style (required for percentage)
"""
self.message = message
self.style = style
self.total = total
self.current = 0
self.start_time = None
self._running = False
self._thread = None
self._task = None
# Validate style
if style not in ["dots", "spinner", "percentage"]:
raise ValueError("Style must be 'dots', 'spinner', or 'percentage'")
# Validate total for percentage style
if style == "percentage" and total is None:
raise ValueError("Total must be specified for percentage style")
def start(self):
"""Start the progress indicator in a separate thread."""
if self._running:
return
self._running = True
self.start_time = time.time()
# Start the appropriate indicator
if self.style == "dots":
self._thread = threading.Thread(target=self._dots_indicator)
elif self.style == "spinner":
self._thread = threading.Thread(target=self._spinner_indicator)
elif self.style == "percentage":
self._thread = threading.Thread(target=self._percentage_indicator)
self._thread.daemon = True
self._thread.start()
async def start_async(self):
"""Start the progress indicator as an asyncio task."""
if self._running:
return
self._running = True
self.start_time = time.time()
# Start the appropriate indicator
if self.style == "dots":
self._task = asyncio.create_task(self._dots_indicator_async())
elif self.style == "spinner":
self._task = asyncio.create_task(self._spinner_indicator_async())
elif self.style == "percentage":
self._task = asyncio.create_task(self._percentage_indicator_async())
def update(self, current: int):
"""Update the progress (for percentage style)."""
self.current = current
def stop(self):
"""Stop the progress indicator."""
if not self._running:
return
self._running = False
if self._thread:
self._thread.join(timeout=1.0)
        # Clear the progress line.
sys.stdout.write("\r" + " " * 80 + "\r")
sys.stdout.flush()
async def stop_async(self):
"""Stop the progress indicator (async version)."""
if not self._running:
return
self._running = False
if self._task:
self._task.cancel()
try:
await self._task
except asyncio.CancelledError:
pass
        # Clear the progress line.
sys.stdout.write("\r" + " " * 80 + "\r")
sys.stdout.flush()
def _dots_indicator(self):
"""Display an animated dots indicator."""
i = 0
while self._running:
dots = "." * (i % 4 + 1)
elapsed = time.time() - self.start_time
sys.stdout.write(f"\r{self.message}{dots:<4} ({elapsed:.1f}s)")
sys.stdout.flush()
time.sleep(0.5)
i += 1
async def _dots_indicator_async(self):
"""Display an animated dots indicator (async version)."""
i = 0
while self._running:
dots = "." * (i % 4 + 1)
elapsed = time.time() - self.start_time
sys.stdout.write(f"\r{self.message}{dots:<4} ({elapsed:.1f}s)")
sys.stdout.flush()
await asyncio.sleep(0.5)
i += 1
def _spinner_indicator(self):
"""Display a spinning cursor indicator."""
chars = "|/-\\"
i = 0
while self._running:
char = chars[i % len(chars)]
elapsed = time.time() - self.start_time
sys.stdout.write(f"\r{self.message} {char} ({elapsed:.1f}s)")
sys.stdout.flush()
time.sleep(0.1)
i += 1
async def _spinner_indicator_async(self):
"""Display a spinning cursor indicator (async version)."""
chars = "|/-\\"
i = 0
while self._running:
char = chars[i % len(chars)]
elapsed = time.time() - self.start_time
sys.stdout.write(f"\r{self.message} {char} ({elapsed:.1f}s)")
sys.stdout.flush()
await asyncio.sleep(0.1)
i += 1
def _percentage_indicator(self):
"""Display a percentage progress bar."""
while self._running:
percentage = min(100, int((self.current / self.total) * 100))
bar_length = 20
filled_length = int(bar_length * percentage // 100)
bar = '#' * filled_length + ' ' * (bar_length - filled_length)
elapsed = time.time() - self.start_time
# Estimate time remaining if we have progress
if percentage > 0:
remaining = elapsed * (100 - percentage) / percentage
sys.stdout.write(f"\r{self.message} [{bar}] {percentage}% ({elapsed:.1f}s elapsed, ~{remaining:.1f}s remaining)")
else:
sys.stdout.write(f"\r{self.message} [{bar}] {percentage}% ({elapsed:.1f}s elapsed)")
sys.stdout.flush()
time.sleep(0.2)
async def _percentage_indicator_async(self):
"""Display a percentage progress bar (async version)."""
while self._running:
percentage = min(100, int((self.current / self.total) * 100))
bar_length = 20
filled_length = int(bar_length * percentage // 100)
bar = '#' * filled_length + ' ' * (bar_length - filled_length)
elapsed = time.time() - self.start_time
# Estimate time remaining if we have progress
if percentage > 0:
remaining = elapsed * (100 - percentage) / percentage
sys.stdout.write(f"\r{self.message} [{bar}] {percentage}% ({elapsed:.1f}s elapsed, ~{remaining:.1f}s remaining)")
else:
sys.stdout.write(f"\r{self.message} [{bar}] {percentage}% ({elapsed:.1f}s elapsed)")
sys.stdout.flush()
await asyncio.sleep(0.2)
# Convenience function for running a task with progress indicator
def with_progress(func: Callable, *args, message: str = "Processing", style: str = "dots", **kwargs) -> Any:
"""
Run a function with a progress indicator.
Args:
func: Function to run
*args: Arguments to pass to the function
message: Message to display
style: Progress indicator style
**kwargs: Keyword arguments to pass to the function
Returns:
The result of the function
"""
progress = ProgressIndicator(message=message, style=style)
progress.start()
try:
result = func(*args, **kwargs)
return result
finally:
progress.stop()
# Async version of with_progress
async def with_progress_async(func: Callable, *args, message: str = "Processing", style: str = "dots", **kwargs) -> Any:
"""
Run an async function with a progress indicator.
Args:
func: Async function to run
*args: Arguments to pass to the function
message: Message to display
style: Progress indicator style
**kwargs: Keyword arguments to pass to the function
Returns:
The result of the function
"""
progress = ProgressIndicator(message=message, style=style)
await progress.start_async()
try:
result = await func(*args, **kwargs)
return result
finally:
await progress.stop_async()
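The bar arithmetic in `_percentage_indicator` can be checked in isolation. This is a standalone restatement of the same formula (integer percentage, `#`-filled fixed-width bar), not an import of the class:

```python
def render_bar(current: int, total: int, bar_length: int = 20) -> str:
    # Mirrors _percentage_indicator: clamp to 100%, fill proportionally.
    percentage = min(100, int((current / total) * 100))
    filled = int(bar_length * percentage // 100)
    return f"[{'#' * filled}{' ' * (bar_length - filled)}] {percentage}%"

print(render_bar(40, 100))   # → [########            ] 40%
print(render_bar(150, 100))  # → [####################] 100%
```

Note that `min(100, ...)` means an `update()` past `total` clamps cleanly rather than overflowing the bar.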

185
app/utils/units.py Normal file
@@ -0,0 +1,185 @@
"""
Unit normalization and conversion for Kiwi inventory.
Source of truth: metric.
- Mass → grams (g)
- Volume → milliliters (ml)
- Count → each (dimensionless)
All inventory quantities are stored in the base metric unit.
Conversion to display units happens at the API/frontend boundary.
Usage:
from app.utils.units import normalize_to_metric, convert_from_metric
# Normalise OCR input
qty, unit = normalize_to_metric(2.0, "lb") # → (907.18, "g")
qty, unit = normalize_to_metric(1.0, "gal") # → (3785.41, "ml")
qty, unit = normalize_to_metric(3.0, "each") # → (3.0, "each")
# Convert for display
    display_qty, display_unit = convert_from_metric(907.184, "g", preferred="imperial")
# → (2.0, "lb")
"""
from __future__ import annotations
# ── Unit categories ───────────────────────────────────────────────────────────
MASS_UNITS: frozenset[str] = frozenset({"g", "kg", "mg", "lb", "lbs", "oz"})
VOLUME_UNITS: frozenset[str] = frozenset({
"ml", "l",
"fl oz", "floz", "fluid oz", "fluid ounce", "fluid ounces",
"cup", "cups", "pt", "pint", "pints",
"qt", "quart", "quarts", "gal", "gallon", "gallons",
})
COUNT_UNITS: frozenset[str] = frozenset({
"each", "ea", "pc", "pcs", "piece", "pieces",
"ct", "count", "item", "items",
"pk", "pack", "packs", "bag", "bags",
"bunch", "bunches", "head", "heads",
"can", "cans", "bottle", "bottles", "box", "boxes",
"jar", "jars", "tube", "tubes", "roll", "rolls",
"loaf", "loaves", "dozen",
})
# ── Conversion factors to base metric unit ────────────────────────────────────
# All values are: 1 <unit> = N <base_unit>
# Mass → grams
_TO_GRAMS: dict[str, float] = {
"g": 1.0,
"mg": 0.001,
"kg": 1_000.0,
"oz": 28.3495,
"lb": 453.592,
"lbs": 453.592,
}
# Volume → millilitres
_TO_ML: dict[str, float] = {
"ml": 1.0,
"l": 1_000.0,
"fl oz": 29.5735,
"floz": 29.5735,
"fluid oz": 29.5735,
"fluid ounce": 29.5735,
"fluid ounces": 29.5735,
"cup": 236.588,
"cups": 236.588,
"pt": 473.176,
"pint": 473.176,
"pints": 473.176,
"qt": 946.353,
"quart": 946.353,
"quarts": 946.353,
"gal": 3_785.41,
"gallon": 3_785.41,
"gallons": 3_785.41,
}
# ── Imperial display preferences ─────────────────────────────────────────────
# For convert_from_metric — which metric threshold triggers the next
# larger imperial unit. Keeps display numbers human-readable.
_IMPERIAL_MASS_THRESHOLDS: list[tuple[float, str, float]] = [
# (min grams, display unit, grams-per-unit)
(453.592, "lb", 453.592), # ≥ 1 lb → show in lb
(0.0, "oz", 28.3495), # otherwise → oz
]
_METRIC_MASS_THRESHOLDS: list[tuple[float, str, float]] = [
(1_000.0, "kg", 1_000.0),
(0.0, "g", 1.0),
]
_IMPERIAL_VOLUME_THRESHOLDS: list[tuple[float, str, float]] = [
(3_785.41, "gal", 3_785.41),
(946.353, "qt", 946.353),
(473.176, "pt", 473.176),
(236.588, "cup", 236.588),
(0.0, "fl oz", 29.5735),
]
_METRIC_VOLUME_THRESHOLDS: list[tuple[float, str, float]] = [
(1_000.0, "l", 1_000.0),
(0.0, "ml", 1.0),
]
# ── Public API ────────────────────────────────────────────────────────────────
def normalize_unit(raw: str) -> str:
"""Canonicalize a raw unit string (lowercase, stripped)."""
return raw.strip().lower()
def classify_unit(unit: str) -> str:
"""Return 'mass', 'volume', or 'count' for a canonical unit string."""
u = normalize_unit(unit)
if u in MASS_UNITS:
return "mass"
if u in VOLUME_UNITS:
return "volume"
return "count"
def normalize_to_metric(quantity: float, unit: str) -> tuple[float, str]:
"""Convert quantity + unit to the canonical metric base unit.
Returns (metric_quantity, base_unit) where base_unit is one of:
'g' grams (for all mass units)
'ml' millilitres (for all volume units)
'each' countable items (for everything else)
Unknown or ambiguous units (e.g. 'bag', 'bunch') are treated as count.
"""
u = normalize_unit(unit)
if u in _TO_GRAMS:
return round(quantity * _TO_GRAMS[u], 4), "g"
if u in _TO_ML:
return round(quantity * _TO_ML[u], 4), "ml"
# Count / ambiguous — store as-is
return quantity, "each"
def convert_from_metric(
quantity: float,
base_unit: str,
preferred: str = "metric",
) -> tuple[float, str]:
"""Convert a stored metric quantity to a display unit.
Args:
quantity: stored metric quantity
base_unit: 'g', 'ml', or 'each'
preferred: 'metric' or 'imperial'
Returns (display_quantity, display_unit).
Rounds to 2 decimal places.
"""
if base_unit == "each":
return quantity, "each"
thresholds: list[tuple[float, str, float]]
if base_unit == "g":
thresholds = (
_IMPERIAL_MASS_THRESHOLDS if preferred == "imperial"
else _METRIC_MASS_THRESHOLDS
)
elif base_unit == "ml":
thresholds = (
_IMPERIAL_VOLUME_THRESHOLDS if preferred == "imperial"
else _METRIC_VOLUME_THRESHOLDS
)
else:
return quantity, base_unit
for min_qty, display_unit, factor in thresholds:
if quantity >= min_qty:
return round(quantity / factor, 2), display_unit
return round(quantity, 2), base_unit
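The round-trip behaviour described in the module docstring can be verified with a trimmed copy of the factor tables (only the lb/oz and gal entries are reproduced here; the factors match those above):

```python
# Trimmed factor tables, copied from the module above.
_TO_GRAMS = {"g": 1.0, "oz": 28.3495, "lb": 453.592, "lbs": 453.592}
_TO_ML = {"ml": 1.0, "gal": 3785.41}

def normalize_to_metric(quantity: float, unit: str) -> tuple[float, str]:
    u = unit.strip().lower()
    if u in _TO_GRAMS:
        return round(quantity * _TO_GRAMS[u], 4), "g"
    if u in _TO_ML:
        return round(quantity * _TO_ML[u], 4), "ml"
    return quantity, "each"  # count / ambiguous units stored as-is

print(normalize_to_metric(2.0, "lb"))    # → (907.184, 'g')
print(normalize_to_metric(1.0, "gal"))   # → (3785.41, 'ml')
print(normalize_to_metric(3.0, "each"))  # → (3.0, 'each')

# Imperial display: ≥ 1 lb renders in lb, per _IMPERIAL_MASS_THRESHOLDS.
grams, _ = normalize_to_metric(2.0, "lb")
print(round(grams / 453.592, 2), "lb")   # → 2.0 lb
```

Storing metric and converting only at the display boundary means two receipts scanned in different unit systems still aggregate into one inventory quantity.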

46
compose.cloud.yml Normal file
@@ -0,0 +1,46 @@
# Kiwi — cloud managed instance
# Project: kiwi-cloud (docker compose -f compose.cloud.yml -p kiwi-cloud ...)
# Web: http://127.0.0.1:8515 → menagerie.circuitforge.tech/kiwi (via Caddy + JWT auth)
# API: internal only on kiwi-cloud-net (nginx proxies /api/ → api:8512)
services:
api:
build:
context: ..
dockerfile: kiwi/Dockerfile
restart: unless-stopped
env_file: .env
environment:
CLOUD_MODE: "true"
CLOUD_DATA_ROOT: /devl/kiwi-cloud-data
# DIRECTUS_JWT_SECRET, HEIMDALL_URL, HEIMDALL_ADMIN_TOKEN — set in .env
# DEV ONLY: comma-separated IPs that bypass JWT auth (LAN testing without Caddy).
# Production deployments must NOT set this. Leave blank or omit entirely.
CLOUD_AUTH_BYPASS_IPS: ${CLOUD_AUTH_BYPASS_IPS:-}
volumes:
- /devl/kiwi-cloud-data:/devl/kiwi-cloud-data
# LLM config — shared with other CF products; read-only in container
- ${HOME}/.config/circuitforge:/root/.config/circuitforge:ro
networks:
- kiwi-cloud-net
web:
build:
context: .
dockerfile: docker/web/Dockerfile
args:
VITE_BASE_URL: /kiwi
VITE_API_BASE: /kiwi
restart: unless-stopped
ports:
- "8515:80"
volumes:
- ./docker/web/nginx.cloud.conf:/etc/nginx/conf.d/default.conf:ro
networks:
- kiwi-cloud-net
depends_on:
- api
networks:
kiwi-cloud-net:
driver: bridge

24
compose.override.yml Normal file
@@ -0,0 +1,24 @@
# compose.override.yml — local dev additions (auto-merged by docker compose)
# Not used in cloud or demo stacks (those use compose.cloud.yml / compose.demo.yml directly).
services:
# cf-orch agent sidecar: registers kiwi as a GPU node with the coordinator.
# The API scheduler uses COORDINATOR_URL to lease VRAM cooperatively; this
# agent makes kiwi's VRAM usage visible on the orchestrator dashboard.
cf-orch-agent:
image: kiwi-api # reuse local api image — cf-core already installed there
network_mode: host
env_file: .env
environment:
# Override coordinator URL here or via .env
COORDINATOR_URL: ${COORDINATOR_URL:-http://10.1.10.71:7700}
command: >
conda run -n kiwi cf-orch agent
--coordinator ${COORDINATOR_URL:-http://10.1.10.71:7700}
--node-id kiwi
--host 0.0.0.0
--port 7702
--advertise-host ${CF_ORCH_ADVERTISE_HOST:-10.1.10.71}
restart: unless-stopped
depends_on:
- api

21
compose.yml Normal file
@@ -0,0 +1,21 @@
services:
api:
build:
context: ..
dockerfile: kiwi/Dockerfile
network_mode: host
env_file: .env
volumes:
- ./data:/app/kiwi/data
- ${HOME}/.config/circuitforge:/root/.config/circuitforge:ro
restart: unless-stopped
web:
build:
context: .
dockerfile: docker/web/Dockerfile
ports:
- "8511:80"
restart: unless-stopped
depends_on:
- api

22
docker/web/Dockerfile Normal file
@@ -0,0 +1,22 @@
# Stage 1: build
FROM node:20-alpine AS build
WORKDIR /app
COPY frontend/package*.json ./
RUN npm ci --prefer-offline
COPY frontend/ ./
# Build-time env vars — Vite bakes these as static strings into the bundle.
# VITE_BASE_URL: URL prefix the app is served under (/ for dev, /kiwi for cloud)
# VITE_API_BASE: prefix for all /api/* fetch calls (empty for dev, /kiwi for cloud)
ARG VITE_BASE_URL=/
ARG VITE_API_BASE=
ENV VITE_BASE_URL=$VITE_BASE_URL
ENV VITE_API_BASE=$VITE_API_BASE
RUN npm run build
# Stage 2: serve
FROM nginx:alpine
COPY docker/web/nginx.conf /etc/nginx/conf.d/default.conf
COPY --from=build /app/dist /usr/share/nginx/html
EXPOSE 80

43
docker/web/nginx.cloud.conf Normal file
@@ -0,0 +1,43 @@
server {
listen 80;
server_name _;
root /usr/share/nginx/html;
index index.html;
# Proxy API requests to the FastAPI container via Docker bridge network.
location /api/ {
proxy_pass http://api:8512;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $http_x_forwarded_proto;
# Forward the session header injected by Caddy from cf_session cookie.
proxy_set_header X-CF-Session $http_x_cf_session;
# Allow image uploads (barcode/receipt photos from phone cameras).
client_max_body_size 20m;
}
# When accessed directly (localhost:8515) instead of via Caddy (/kiwi path-strip),
# Vite's /kiwi base URL means assets are requested at /kiwi/assets/... but stored
# at /assets/... in nginx's root. Alias /kiwi/ → root so direct port access works.
# ^~ prevents regex locations from overriding this prefix match for /kiwi/ paths.
location ^~ /kiwi/ {
alias /usr/share/nginx/html/;
try_files $uri $uri/ /index.html;
}
location = /index.html {
add_header Cache-Control "no-cache, no-store, must-revalidate";
try_files $uri /index.html;
}
location / {
try_files $uri $uri/ /index.html;
}
location ~* \.(js|css|png|jpg|jpeg|gif|ico|svg|woff2?)$ {
expires 1y;
add_header Cache-Control "public, immutable";
}
}

29
docker/web/nginx.conf Normal file
@@ -0,0 +1,29 @@
server {
listen 80;
server_name _;
root /usr/share/nginx/html;
index index.html;
location /api/ {
proxy_pass http://172.17.0.1:8512;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
# Allow image uploads (barcode/receipt photos from phone cameras).
client_max_body_size 20m;
}
location = /index.html {
add_header Cache-Control "no-cache, no-store, must-revalidate";
try_files $uri /index.html;
}
location / {
try_files $uri $uri/ /index.html;
}
location ~* \.(js|css|png|jpg|jpeg|gif|ico|svg|woff2?)$ {
expires 1y;
add_header Cache-Control "public, immutable";
}
}

27
environment.yml Normal file
@@ -0,0 +1,27 @@
name: kiwi
channels:
- conda-forge
- defaults
dependencies:
- python=3.11
- pip
- pip:
- fastapi>=0.110
- uvicorn[standard]>=0.27
- python-multipart>=0.0.9
- aiofiles>=23.0
- opencv-python>=4.8
- numpy>=1.25
- pyzbar>=0.1.9
- httpx>=0.27
- psutil>=5.9
- pydantic>=2.5
- PyJWT>=2.8
- datasets
- huggingface_hub
- transformers
- sentence-transformers
- torch
- pyyaml
- pandas
- pyarrow

Some files were not shown because too many files have changed in this diff.