feat: initial cf-vision scaffold — ImageFrame API, stub inference modules
- cf_vision/models.py: ImageFrame + ImageElement + BoundingBox (MIT). Full Dolphin-v2 element taxonomy (21 types), convenience accessors (text_blocks, barcodes, tables, full_text)
- cf_vision/router.py: VisionRouter — mock + real paths, task routing (document, barcode, receipt, general)
- cf_vision/barcode.py: BarcodeScanner — pyzbar wrapper, CPU-only, MIT
- cf_vision/ocr.py: DolphinOCR — ByteDance/Dolphin-v2 async stub (BSL 1.1)
- cf_vision/receipt.py: ReceiptParser stub — Kiwi Phase 2 target (BSL 1.1)
- cf_vision/camera.py: CameraCapture — OpenCV single-frame capture (MIT)
- pyproject.toml: inference / barcode / camera optional extras
- .env.example: HF_TOKEN, CF_VISION_DEVICE, CF_VISION_MOCK
- README: module map, ImageFrame API reference, consumer roadmap
- tests: 6 passing (ImageFrame accessors, VisionRouter mock/real)

Extracted from circuitforge_core.vision per cf-core#36.
commit 353525c1f4

13 changed files with 764 additions and 0 deletions
20  .env.example  Normal file

@@ -0,0 +1,20 @@
# cf-vision environment — copy to .env and fill in values
# cf-vision does not auto-load .env; consumers load it in their own startup.

# ── Dolphin-v2 document parser ────────────────────────────────────────────────
# HuggingFace model: ByteDance/Dolphin-v2
# Requires ~8GB VRAM. Downloaded and cached automatically on first use.
# Get a token at https://huggingface.co/settings/tokens
HF_TOKEN=

# ── Compute ───────────────────────────────────────────────────────────────────
# auto (detect GPU), cuda, cpu
CF_VISION_DEVICE=auto

# ── Mock mode ─────────────────────────────────────────────────────────────────
# Set to 1 to use synthetic ImageFrame responses — no GPU or camera required.
CF_VISION_MOCK=

# ── OCR confidence threshold ──────────────────────────────────────────────────
# Results below this are marked low-confidence in the ImageFrame output.
CF_VISION_CONFIDENCE_THRESHOLD=0.7
4  .gitignore  vendored  Normal file

@@ -0,0 +1,4 @@
.env
__pycache__/
*.egg-info/
.pytest_cache/
100  README.md  Normal file

@@ -0,0 +1,100 @@
# cf-vision

CircuitForge vision pipeline. Produces `ImageFrame` objects from image sources (documents, barcodes, receipts, camera captures) using Dolphin-v2 (local, Free tier) or Claude vision (cloud, Paid tier).

**Status:** Stub. The `VisionRouter` and `BarcodeScanner` API surface is locked; real Dolphin-v2 inference lands with Kiwi Phase 2.

## Install

```bash
# Stub / mock mode (no GPU required)
pip install -e ../cf-vision

# Real document OCR (Dolphin-v2, ~8GB VRAM)
pip install -e "../cf-vision[inference]"

# Barcode / QR scanning (CPU, no GPU)
pip install -e "../cf-vision[barcode]"

# Camera capture
pip install -e "../cf-vision[camera]"
```

Copy `.env.example` to `.env` and fill in `HF_TOKEN` for Dolphin-v2.
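Since cf-vision does not auto-load `.env` (as `.env.example` notes, consumers load it in their own startup), a consumer needs a loader of its own. A minimal sketch with no extra dependency — in practice `python-dotenv`, already listed in the `[inference]` extras, does the same job:

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Set variables from a KEY=VALUE file; already-set env vars win."""
    env_file = Path(path)
    if not env_file.exists():
        return
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())


load_env_file()
device = os.environ.get("CF_VISION_DEVICE", "auto")
```

`os.environ.setdefault` means values exported in the shell take precedence over the file, which is the usual dotenv convention.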

## Quick start

```python
from cf_vision.router import VisionRouter

# Mock mode (no hardware needed)
router = VisionRouter(mock=True)
frame = router.analyze(image_bytes, task="document")
for element in frame.elements:
    print(element.element_type, element.text)

# Real mode (requires [inference] extras)
router = VisionRouter.from_env()  # reads CF_VISION_MOCK, CF_VISION_DEVICE
```

---

## ImageFrame

The primary output type. MIT licensed.

```python
@dataclass
class ImageFrame:
    source: Literal["camera", "upload", "url", "mock"]
    image_bytes: bytes | None
    elements: list[ImageElement]
    width_px: int
    height_px: int
    model: str  # "dolphin-v2", "pyzbar", "claude", "mock"

    def text_blocks(self) -> list[ImageElement]
    def barcodes(self) -> list[ImageElement]
    def tables(self) -> list[ImageElement]
    def full_text(self, separator="\n") -> str
```

### ImageElement

```python
@dataclass
class ImageElement:
    element_type: ElementType   # one of 21 Dolphin-v2 types
    text: str
    confidence: float           # 0.0–1.0
    bbox: BoundingBox | None
    metadata: dict              # e.g. {"format": "EAN13"} for barcodes
```

### ElementType (21 types from Dolphin-v2)

`title` · `plain_text` · `text_block` · `header` · `footer` · `table` · `table_caption` · `table_footnote` · `figure` · `figure_caption` · `isolate_formula` · `formula_caption` · `inline_formula` · `page_number` · `seal` · `handwriting` · `barcode` · `qr_code` · `signature` · `watermark` · `abandon`
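The accessors cover the common groupings; any other of the 21 types can be filtered straight off `frame.elements`. A hypothetical illustration (using a stand-in dataclass with the same field shape, so it runs without cf-vision installed):

```python
from dataclasses import dataclass, field


@dataclass
class ImageElement:  # stand-in mirroring the cf_vision.models.ImageElement shape
    element_type: str
    text: str
    confidence: float
    metadata: dict = field(default_factory=dict)


elements = [
    ImageElement("title", "Invoice", 0.99),
    ImageElement("handwriting", "approved", 0.71),
    ImageElement("signature", "", 0.88),
    ImageElement("plain_text", "Total: 12.50", 0.95),
]

# No built-in accessor groups annotation-style elements, so filter directly:
annotations = [e for e in elements if e.element_type in ("handwriting", "signature")]

# The same pattern implements a confidence cut-off (cf. CF_VISION_CONFIDENCE_THRESHOLD):
low_conf = [e for e in elements if e.confidence < 0.9]

print([e.element_type for e in annotations])  # ['handwriting', 'signature']
```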

---

## Module structure

| Module | License | Purpose |
|--------|---------|---------|
| `cf_vision.models` | MIT | `ImageFrame`, `ImageElement`, `BoundingBox` |
| `cf_vision.router` | BSL 1.1* | `VisionRouter` — routes to local or cloud model |
| `cf_vision.barcode` | MIT | `BarcodeScanner` — pyzbar wrapper, no GPU |
| `cf_vision.ocr` | BSL 1.1 | `DolphinOCR` — Dolphin-v2 async wrapper |
| `cf_vision.receipt` | BSL 1.1 | `ReceiptParser` — line-item extraction (stub) |
| `cf_vision.camera` | MIT | `CameraCapture` — OpenCV frame capture |

\*BSL 1.1 applies to the inference modules; `models`, `barcode`, and `camera` are MIT.

---

## Consumed by

- `Circuit-Forge/kiwi` — barcode scan + receipt OCR (Phase 2, primary consumer)
- `Circuit-Forge/peregrine` — resume document parsing
- `Circuit-Forge/falcon` (planned) — government form scanning
- `Circuit-Forge/godwit` (planned) — emergency identity document capture
7  cf_vision/__init__.py  Normal file

@@ -0,0 +1,7 @@
"""
cf-vision — CircuitForge vision pipeline.

Primary API surface:
    from cf_vision.models import ImageFrame
    from cf_vision.router import VisionRouter
"""
85  cf_vision/barcode.py  Normal file

@@ -0,0 +1,85 @@
# cf_vision/barcode.py — barcode and QR code scanning
#
# MIT licensed. Uses pyzbar (libzbar wrapper) — no GPU required.
# Requires [barcode] extras: pip install cf-vision[barcode]
#
# Primary consumer: Kiwi (pantry item lookup by UPC/EAN barcode scan)
from __future__ import annotations

import io
import logging
from typing import Literal

from cf_vision.models import BoundingBox, ImageElement, ImageFrame

logger = logging.getLogger(__name__)

BarcodeFormat = Literal[
    "EAN13", "EAN8", "UPCA", "UPCE", "CODE128", "CODE39",
    "QR_CODE", "PDF417", "DATAMATRIX", "ITF", "CODABAR",
]


class BarcodeScanner:
    """
    Lightweight barcode and QR code scanner using pyzbar.

    No GPU required. Works on CPU at roughly 5 ms per image.

    Usage
    -----
        scanner = BarcodeScanner()
        frame = scanner.scan(image_bytes)
        for b in frame.barcodes():
            print(b.text, b.metadata["format"])

    Requires: pip install cf-vision[barcode]
    """

    def scan(self, image_bytes: bytes) -> ImageFrame:
        """
        Scan image_bytes for barcodes and QR codes.

        Returns an ImageFrame with element_type "barcode" or "qr_code" for
        each detected code. Elements include decoded text and bounding box.
        """
        try:
            from pyzbar.pyzbar import decode as pyzbar_decode
            from PIL import Image
        except ImportError as exc:
            raise ImportError(
                "pyzbar and Pillow are required for barcode scanning. "
                "Install with: pip install cf-vision[barcode]"
            ) from exc

        img = Image.open(io.BytesIO(image_bytes)).convert("RGB")
        w, h = img.size
        decoded = pyzbar_decode(img)

        elements: list[ImageElement] = []
        for symbol in decoded:
            fmt = symbol.type.upper()
            el_type = "qr_code" if fmt == "QRCODE" else "barcode"
            rect = symbol.rect
            bbox = BoundingBox(
                x=rect.left / w,
                y=rect.top / h,
                width=rect.width / w,
                height=rect.height / h,
            )
            elements.append(ImageElement(
                element_type=el_type,
                text=symbol.data.decode("utf-8", errors="replace"),
                confidence=1.0,  # pyzbar doesn't give confidence scores
                bbox=bbox,
                metadata={"format": fmt},
            ))

        return ImageFrame(
            source="upload",
            image_bytes=image_bytes,
            elements=elements,
            width_px=w,
            height_px=h,
            model="pyzbar",
        )
79  cf_vision/camera.py  Normal file

@@ -0,0 +1,79 @@
# cf_vision/camera.py — camera capture and preprocessing
#
# MIT licensed. Uses OpenCV for capture; no GPU required for capture itself.
# Requires [camera] extras: pip install cf-vision[camera]
#
# Planned consumers:
#   Kiwi   — live barcode scan from phone camera
#   Godwit — fingerprint/ID document capture for emergency identity recovery
from __future__ import annotations

import asyncio
import logging

logger = logging.getLogger(__name__)


class CameraCapture:
    """
    Single-frame camera capture with preprocessing.

    Captures one frame from a camera device and normalises it to JPEG bytes
    suitable for VisionRouter.analyze() or BarcodeScanner.scan().

    Usage
    -----
        capture = CameraCapture(device_index=0)
        jpeg_bytes = await capture.capture_async()
        frame = router.analyze(jpeg_bytes, task="barcode")

    Requires: pip install cf-vision[camera]
    """

    def __init__(
        self,
        device_index: int = 0,
        width: int = 1280,
        height: int = 720,
        jpeg_quality: int = 92,
    ) -> None:
        self._device_index = device_index
        self._width = width
        self._height = height
        self._jpeg_quality = jpeg_quality

    def capture(self) -> bytes:
        """Capture one frame and return JPEG bytes."""
        try:
            import cv2
        except ImportError as exc:
            raise ImportError(
                "OpenCV is required for camera capture. "
                "Install with: pip install cf-vision[camera]"
            ) from exc

        cap = cv2.VideoCapture(self._device_index)
        try:
            cap.set(cv2.CAP_PROP_FRAME_WIDTH, self._width)
            cap.set(cv2.CAP_PROP_FRAME_HEIGHT, self._height)
            ok, frame = cap.read()
            if not ok or frame is None:
                raise RuntimeError(
                    f"Camera device {self._device_index} did not return a frame."
                )
            ok, buf = cv2.imencode(
                ".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, self._jpeg_quality]
            )
            if not ok:
                raise RuntimeError("JPEG encoding failed")
            return bytes(buf)
        finally:
            cap.release()

    async def capture_async(self) -> bytes:
        """capture() without blocking the event loop."""
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, self.capture)
90  cf_vision/models.py  Normal file

@@ -0,0 +1,90 @@
# cf_vision/models.py — ImageFrame API contract
#
# MIT licensed. All consumers (Kiwi, Peregrine, Falcon, Godwit) import
# ImageFrame from here so the shape is consistent across the stack.
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Literal

ElementType = Literal[
    # Dolphin-v2 element taxonomy (21 types)
    "title", "plain_text", "abandon", "figure", "figure_caption",
    "table", "table_caption", "table_footnote", "isolate_formula",
    "formula_caption", "text_block", "inline_formula", "header",
    "footer", "page_number", "seal", "handwriting", "barcode",
    "qr_code", "signature", "watermark",
]


@dataclass
class BoundingBox:
    """Position of a detected element, normalised to the source image by default."""
    x: float       # left edge, 0.0–1.0 (normalised) or pixels if absolute=True
    y: float       # top edge
    width: float
    height: float
    absolute: bool = False  # True when coordinates are in pixels


@dataclass
class ImageElement:
    """
    A single structured element extracted from an image.

    Produced by cf_vision.ocr (Dolphin-v2) or cf_vision.barcode.
    Consumers iterate over ImageFrame.elements to reconstruct document structure.
    """
    element_type: ElementType
    text: str                        # extracted text, or empty for non-text types
    confidence: float                # 0.0–1.0
    bbox: BoundingBox | None = None  # None when position is unknown
    metadata: dict = field(default_factory=dict)


@dataclass
class ImageFrame:
    """
    A fully analysed image from cf-vision.

    Produced by VisionRouter.analyze() and consumed by products that need
    structured content from image sources (receipts, barcodes, documents,
    camera captures).

    Fields
    ------
    source       How the image arrived: "camera" | "upload" | "url" | "mock"
    image_bytes  Original image bytes (JPEG/PNG). None when source="mock".
    elements     Ordered list of extracted elements (top-to-bottom, left-to-right).
    width_px     Source image width in pixels, or 0 if unknown.
    height_px    Source image height in pixels, or 0 if unknown.
    model        Model that produced the elements, e.g. "dolphin-v2", "mock".
    """
    source: Literal["camera", "upload", "url", "mock"]
    image_bytes: bytes | None
    elements: list[ImageElement] = field(default_factory=list)
    width_px: int = 0
    height_px: int = 0
    model: str = "stub"

    # ── Convenience accessors ─────────────────────────────────────────────────

    def text_blocks(self) -> list[ImageElement]:
        """All elements that carry text, in document order."""
        text_types = {
            "title", "plain_text", "text_block", "header", "footer",
            "table_caption", "figure_caption", "handwriting",
        }
        return [e for e in self.elements if e.element_type in text_types]

    def barcodes(self) -> list[ImageElement]:
        """All barcode and QR code elements."""
        return [e for e in self.elements if e.element_type in ("barcode", "qr_code")]

    def tables(self) -> list[ImageElement]:
        """All table elements."""
        return [e for e in self.elements if e.element_type == "table"]

    def full_text(self, separator: str = "\n") -> str:
        """Concatenated text from all text-bearing elements."""
        return separator.join(e.text for e in self.text_blocks() if e.text)
109  cf_vision/ocr.py  Normal file

@@ -0,0 +1,109 @@
# cf_vision/ocr.py — Dolphin-v2 document parser
#
# BSL 1.1: real inference. Requires [inference] extras + ~8GB VRAM.
# Stub: raises NotImplementedError until Kiwi Phase 2 wires in the model.
#
# Model: ByteDance/Dolphin-v2
# Handles 21 element types: title, plain_text, table, figure, barcode,
# handwriting, formula, signature, watermark, and more.
# Reference: https://huggingface.co/ByteDance/Dolphin-v2
from __future__ import annotations

import asyncio
import logging
import os

from cf_vision.models import ImageFrame

logger = logging.getLogger(__name__)

_DOLPHIN_MODEL_ID = "ByteDance/Dolphin"  # HuggingFace model ID


class DolphinOCR:
    """
    Async wrapper around Dolphin-v2 for structured document parsing.

    Loads the model lazily on first call. Runs in a thread pool executor
    so it never blocks the asyncio event loop (~200ms–2s per page on A100).

    Usage
    -----
        ocr = DolphinOCR.from_env()
        frame = await ocr.parse_async(image_bytes)
        for element in frame.elements:
            print(element.element_type, element.text[:80])

    Navigation note: Dolphin-v2 returns elements in reading order
    (top-to-bottom, left-to-right). Use ImageFrame.full_text() for a
    plain concatenation, or iterate elements for structured access.

    Consumer roadmap:
        Kiwi Phase 2 — receipt line-item extraction
        Peregrine    — resume document parsing
        Falcon       — government form scanning
        Godwit       — identity document recovery
    """

    def __init__(self, device: str = "auto") -> None:
        self._device = device
        self._model = None
        self._processor = None

    @classmethod
    def from_env(cls) -> "DolphinOCR":
        return cls(device=os.environ.get("CF_VISION_DEVICE", "auto"))

    def _load(self) -> None:
        if self._model is not None:
            return
        try:
            from transformers import AutoModelForCausalLM, AutoProcessor
            import torch  # noqa: F401 — required by the model backend
        except ImportError as exc:
            raise ImportError(
                "Dolphin-v2 requires [inference] extras: "
                "pip install cf-vision[inference]"
            ) from exc

        device = self._device
        if device == "auto":
            device = "cuda" if _cuda_available() else "cpu"

        hf_token = os.environ.get("HF_TOKEN") or None
        logger.info("Loading Dolphin-v2 on %s", device)
        self._processor = AutoProcessor.from_pretrained(
            _DOLPHIN_MODEL_ID, token=hf_token
        )
        self._model = AutoModelForCausalLM.from_pretrained(
            _DOLPHIN_MODEL_ID,
            token=hf_token,
            torch_dtype="auto",
            device_map=device,
        )

    def parse(self, image_bytes: bytes) -> ImageFrame:
        """
        Parse document image bytes into a structured ImageFrame.

        Stub: raises NotImplementedError. Real implementation coming in Kiwi Phase 2.
        """
        self._load()
        raise NotImplementedError(
            "DolphinOCR.parse() is not yet implemented. "
            "Tracking: Kiwi Phase 2 / cf-vision#TBD"
        )

    async def parse_async(self, image_bytes: bytes) -> ImageFrame:
        """parse() without blocking the event loop."""
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, self.parse, image_bytes)


def _cuda_available() -> bool:
    try:
        import torch
        return torch.cuda.is_available()
    except ImportError:
        return False
44  cf_vision/receipt.py  Normal file

@@ -0,0 +1,44 @@
# cf_vision/receipt.py — receipt line-item extraction
#
# BSL 1.1: real inference. Dolphin-v2 + post-processing.
# Stub: raises NotImplementedError until Kiwi Phase 2.
#
# Planned pipeline:
#   DolphinOCR.parse(image_bytes)  → ImageFrame with table/text elements
#   ReceiptParser.extract(frame)   → list[LineItem]
#   ProductResolver.resolve(items) → matched pantry items (Kiwi-specific)
from __future__ import annotations

from dataclasses import dataclass
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from cf_vision.models import ImageFrame


@dataclass
class LineItem:
    """A single line item extracted from a receipt."""
    name: str
    quantity: float = 1.0
    unit: str = ""             # "g", "ml", "oz", "each", etc.
    price: float | None = None
    barcode: str | None = None
    confidence: float = 0.0


class ReceiptParser:
    """
    Extract line items from a receipt ImageFrame.

    Stub: raises NotImplementedError until Kiwi Phase 2.
    Consumer: Kiwi Phase 2 pantry auto-population from receipt photos.

    Real pipeline:
      1. DolphinOCR produces an ImageFrame with table rows and text blocks
      2. ReceiptParser identifies the items section (skip header/footer/totals)
      3. Per-row NLP extracts name, quantity, unit, price
      4. Optional: barcode lookup if any barcode elements are present
    """

    def extract(self, frame: "ImageFrame") -> list[LineItem]:
        raise NotImplementedError(
            "ReceiptParser.extract() is not yet implemented. "
            "Tracking: Kiwi Phase 2 / cf-vision#TBD"
        )
107  cf_vision/router.py  Normal file

@@ -0,0 +1,107 @@
# cf_vision/router.py — VisionRouter, the primary consumer API
#
# BSL 1.1 when real inference models are integrated (Dolphin-v2, Claude vision).
# Currently a stub: analyze() raises NotImplementedError unless mock=True.
from __future__ import annotations

import os
from typing import Literal

from cf_vision.models import BoundingBox, ImageElement, ImageFrame

_MOCK_ELEMENTS = [
    ImageElement(
        element_type="title",
        text="[Mock document title]",
        confidence=0.99,
        bbox=BoundingBox(x=0.05, y=0.02, width=0.9, height=0.06),
    ),
    ImageElement(
        element_type="plain_text",
        text="[Mock paragraph — real content requires cf-vision[inference] and a vision model.]",
        confidence=0.95,
        bbox=BoundingBox(x=0.05, y=0.12, width=0.9, height=0.08),
    ),
]


class VisionRouter:
    """
    Routes image analysis requests to local or cloud vision models.

    Local models (Free tier):
      - Dolphin-v2 (ByteDance) — universal document parser, 21 element types
      - pyzbar — barcode / QR code scanning (no GPU required)

    Cloud fallback (Paid tier):
      - Claude vision API — general-purpose image understanding

    Usage
    -----
        router = VisionRouter.from_env()
        frame = router.analyze(image_bytes, task="document")
        for element in frame.elements:
            print(element.element_type, element.text)
    """

    def __init__(
        self,
        mock: bool = False,
        device: str = "auto",
    ) -> None:
        self._mock = mock
        self._device = device

    @classmethod
    def from_env(cls) -> "VisionRouter":
        """Construct from CF_VISION_MOCK and CF_VISION_DEVICE env vars."""
        mock = os.environ.get("CF_VISION_MOCK", "") == "1"
        device = os.environ.get("CF_VISION_DEVICE", "auto")
        return cls(mock=mock, device=device)

    def analyze(
        self,
        image_bytes: bytes,
        task: Literal["document", "barcode", "receipt", "general"] = "document",
        prompt: str = "",
    ) -> ImageFrame:
        """
        Analyse image_bytes and return a structured ImageFrame.

        task:
          "document" — full document parsing via Dolphin-v2 (all 21 element types)
          "barcode"  — barcode / QR code extraction via pyzbar (lightweight)
          "receipt"  — receipt line-item extraction (Dolphin-v2 + post-processing)
          "general"  — general image understanding via Claude vision (cloud, Paid tier)

        Stub: raises NotImplementedError unless CF_VISION_MOCK=1 or mock=True.
        Real implementation lands with Kiwi Phase 2 (cf_vision.ocr, cf_vision.barcode).
        """
        if self._mock:
            return self._mock_frame(image_bytes, task)

        raise NotImplementedError(
            "VisionRouter real inference is not yet implemented. "
            "Set CF_VISION_MOCK=1 or mock=True to use synthetic frames. "
            "Real analysis requires: pip install cf-vision[inference]"
        )

    def _mock_frame(self, image_bytes: bytes, task: str) -> ImageFrame:
        if task == "barcode":
            elements = [
                ImageElement(
                    element_type="barcode",
                    text="0123456789012",
                    confidence=0.99,
                    metadata={"format": "EAN13"},
                )
            ]
        else:
            elements = list(_MOCK_ELEMENTS)
        return ImageFrame(
            source="mock",
            image_bytes=None,
            elements=elements,
            model="mock",
        )
48  pyproject.toml  Normal file

@@ -0,0 +1,48 @@
[build-system]
requires = ["setuptools>=68"]
build-backend = "setuptools.build_meta"

[project]
name = "cf-vision"
version = "0.1.0"
description = "CircuitForge vision pipeline — ImageFrame API, OCR, barcode, receipt extraction"
readme = "README.md"
requires-python = ">=3.11"
license = {text = "MIT"}
dependencies = [
    "pydantic>=2.0",
]

[project.optional-dependencies]
# Real inference backends — not required for stub/mock mode
inference = [
    "torch>=2.0",
    "torchvision>=0.15",
    "numpy>=1.24",
    "Pillow>=10.0",
    "transformers>=4.40",
    "python-dotenv>=1.0",
]
# Barcode / QR scanning
barcode = [
    "pyzbar>=0.1.9",
    "Pillow>=10.0",
]
# Camera capture
camera = [
    "opencv-python>=4.8",
]
dev = [
    "pytest>=8.0",
    "pytest-asyncio>=0.23",
    "Pillow>=10.0",
    "numpy>=1.24",
]

[tool.setuptools.packages.find]
where = ["."]
include = ["cf_vision*"]

[tool.pytest.ini_options]
testpaths = ["tests"]
asyncio_mode = "auto"
0  tests/__init__.py  Normal file
71  tests/test_models.py  Normal file

@@ -0,0 +1,71 @@
"""Tests for the ImageFrame API contract."""
import pytest

from cf_vision.models import ImageElement, ImageFrame


def test_imageframe_text_blocks():
    frame = ImageFrame(
        source="mock",
        image_bytes=None,
        elements=[
            ImageElement(element_type="title", text="My Doc", confidence=0.99),
            ImageElement(element_type="barcode", text="123456", confidence=1.0),
            ImageElement(element_type="plain_text", text="Body text.", confidence=0.9),
        ],
        model="mock",
    )
    blocks = frame.text_blocks()
    assert len(blocks) == 2
    assert all(b.element_type in ("title", "plain_text") for b in blocks)


def test_imageframe_barcodes():
    frame = ImageFrame(
        source="mock",
        image_bytes=None,
        elements=[
            ImageElement(element_type="barcode", text="0123456789012", confidence=1.0,
                         metadata={"format": "EAN13"}),
        ],
        model="mock",
    )
    barcodes = frame.barcodes()
    assert len(barcodes) == 1
    assert barcodes[0].text == "0123456789012"


def test_imageframe_full_text():
    frame = ImageFrame(
        source="mock",
        image_bytes=None,
        elements=[
            ImageElement(element_type="title", text="Title", confidence=0.99),
            ImageElement(element_type="plain_text", text="Paragraph.", confidence=0.9),
        ],
        model="mock",
    )
    assert frame.full_text() == "Title\nParagraph."


def test_visionrouter_mock():
    from cf_vision.router import VisionRouter
    router = VisionRouter(mock=True)
    frame = router.analyze(b"fake_image_bytes", task="document")
    assert frame.source == "mock"
    assert len(frame.elements) > 0


def test_visionrouter_mock_barcode():
    from cf_vision.router import VisionRouter
    router = VisionRouter(mock=True)
    frame = router.analyze(b"fake", task="barcode")
    barcodes = frame.barcodes()
    assert len(barcodes) == 1
    assert barcodes[0].element_type == "barcode"


def test_visionrouter_real_raises():
    from cf_vision.router import VisionRouter
    router = VisionRouter(mock=False)
    with pytest.raises(NotImplementedError):
        router.analyze(b"fake")