# cf-vision

CircuitForge vision pipeline. Produces `ImageFrame` objects from image sources -- documents, barcodes, receipts, camera captures -- using Dolphin-v2 (local, Free tier) or Claude vision (cloud, Paid tier).

**Status:** Stub. The `VisionRouter` and `BarcodeScanner` API surface is locked; real Dolphin-v2 inference lands with Kiwi Phase 2.

## Install

```bash
# Stub / mock mode (no GPU required)
pip install -e ../cf-vision

# Real document OCR (Dolphin-v2, ~8 GB VRAM)
pip install -e "../cf-vision[inference]"

# Barcode / QR scanning (CPU, no GPU)
pip install -e "../cf-vision[barcode]"

# Camera capture
pip install -e "../cf-vision[camera]"
```

Copy `.env.example` to `.env` and fill in `HF_TOKEN` for Dolphin-v2.

## Quick start

```python
from cf_vision.router import VisionRouter

# Mock mode (no hardware needed)
router = VisionRouter(mock=True)
frame = router.analyze(image_bytes, task="document")
for element in frame.elements:
    print(element.element_type, element.text)

# Real mode (requires [inference] extras)
router = VisionRouter.from_env()  # reads CF_VISION_MOCK, CF_VISION_DEVICE
```

---

## ImageFrame

The primary output type. MIT licensed.

```python
@dataclass
class ImageFrame:
    source: Literal["camera", "upload", "url", "mock"]
    image_bytes: bytes | None
    elements: list[ImageElement]
    width_px: int
    height_px: int
    model: str  # "dolphin-v2", "pyzbar", "claude", "mock"

    def text_blocks(self) -> list[ImageElement]: ...
    def barcodes(self) -> list[ImageElement]: ...
    def tables(self) -> list[ImageElement]: ...
    def full_text(self, separator: str = "\n") -> str: ...
```

### ImageElement

```python
@dataclass
class ImageElement:
    element_type: ElementType  # one of 21 Dolphin-v2 types
    text: str
    confidence: float  # 0.0–1.0
    bbox: BoundingBox | None
    metadata: dict  # e.g. {"format": "EAN13"} for barcodes
```

### ElementType (21 types from Dolphin-v2)

`title` · `plain_text` · `text_block` · `header` · `footer` · `table` · `table_caption` · `table_footnote` · `figure` · `figure_caption` · `isolate_formula` · `formula_caption` · `inline_formula` · `page_number` · `seal` · `handwriting` · `barcode` · `qr_code` · `signature` · `watermark` · `abandon`

---

## Module structure

| Module | License | Purpose |
|--------|---------|---------|
| `cf_vision.models` | MIT | `ImageFrame`, `ImageElement`, `BoundingBox` |
| `cf_vision.router` | BSL 1.1* | `VisionRouter` -- routes to local or cloud model |
| `cf_vision.barcode` | MIT | `BarcodeScanner` -- pyzbar wrapper, no GPU |
| `cf_vision.ocr` | BSL 1.1 | `DolphinOCR` -- Dolphin-v2 async wrapper |
| `cf_vision.receipt` | BSL 1.1 | `ReceiptParser` -- line-item extraction (stub) |
| `cf_vision.camera` | MIT | `CameraCapture` -- OpenCV frame capture |

*BSL applies to inference modules. Models + barcode + camera = MIT.

---

## Consumed by

- `Circuit-Forge/kiwi` -- barcode scan + receipt OCR (Phase 2, primary consumer)
- `Circuit-Forge/peregrine` -- resume document parsing
- `Circuit-Forge/falcon` (planned) -- government form scanning
- `Circuit-Forge/godwit` (planned) -- emergency identity document capture
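
## Sketch: ImageFrame helper methods

The `ImageFrame` section lists `text_blocks()`, `barcodes()`, `tables()`, and `full_text()` without showing their behavior. The following is a minimal, self-contained sketch of how such helpers might filter `elements` by `element_type`. It does not import `cf_vision`; the two dataclasses are simplified stand-ins, and the `TEXT_TYPES` and `BARCODE_TYPES` groupings are assumptions made for illustration, not the library's actual definitions.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumed groupings of element types; the real cf_vision.models may differ.
TEXT_TYPES = {"title", "plain_text", "text_block", "header", "footer"}
BARCODE_TYPES = {"barcode", "qr_code"}


@dataclass
class ImageElement:
    """Simplified stand-in: bbox is omitted for brevity."""
    element_type: str
    text: str
    confidence: float = 1.0
    metadata: dict = field(default_factory=dict)


@dataclass
class ImageFrame:
    """Simplified stand-in: image bytes and dimensions are omitted."""
    source: str
    elements: list[ImageElement]
    model: str = "mock"

    def text_blocks(self) -> list[ImageElement]:
        # Keep only elements whose type is in the assumed text group.
        return [e for e in self.elements if e.element_type in TEXT_TYPES]

    def barcodes(self) -> list[ImageElement]:
        return [e for e in self.elements if e.element_type in BARCODE_TYPES]

    def tables(self) -> list[ImageElement]:
        return [e for e in self.elements if e.element_type == "table"]

    def full_text(self, separator: str = "\n") -> str:
        # Join non-empty text blocks in their original order.
        return separator.join(e.text for e in self.text_blocks() if e.text)


frame = ImageFrame(
    source="mock",
    elements=[
        ImageElement("title", "Grocery receipt"),
        ImageElement("plain_text", "Milk 2.49"),
        ImageElement("barcode", "4006381333931", metadata={"format": "EAN13"}),
    ],
)

print(frame.full_text())
print([b.metadata["format"] for b in frame.barcodes()])
```

Filtering on `element_type` keeps the helpers independent of which backend produced the frame, so the same calls would apply whether `model` is `"dolphin-v2"`, `"pyzbar"`, `"claude"`, or `"mock"`.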