# cf-vision

CircuitForge vision pipeline. Produces `ImageFrame` objects from image sources (documents, barcodes, receipts, camera captures) using Dolphin-v2 (local, Free tier) or Claude vision (cloud, Paid tier).

**Status:** Stub. The `VisionRouter` and `BarcodeScanner` API surface is locked; real Dolphin-v2 inference lands with Kiwi Phase 2.

## Install

```bash
# Stub / mock mode (no GPU required)
pip install -e ../cf-vision

# Real document OCR (Dolphin-v2, ~8GB VRAM)
pip install -e "../cf-vision[inference]"

# Barcode / QR scanning (CPU, no GPU)
pip install -e "../cf-vision[barcode]"

# Camera capture
pip install -e "../cf-vision[camera]"
```

Copy `.env.example` to `.env` and fill in `HF_TOKEN` for Dolphin-v2.

## Quick start

```python
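from pathlib import Path

from cf_vision.router import VisionRouter

# Encoded image as raw bytes; "sample.png" is a placeholder path
image_bytes = Path("sample.png").read_bytes()

# Mock mode (no hardware needed)
router = VisionRouter(mock=True)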
frame = router.analyze(image_bytes, task="document")
for element in frame.elements:
    print(element.element_type, element.text)

# Real mode (requires [inference] extras)
router = VisionRouter.from_env()  # reads CF_VISION_MOCK, CF_VISION_DEVICE
```
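The quick start notes that `from_env()` reads `CF_VISION_MOCK` and `CF_VISION_DEVICE`. As a rough sketch of how such environment-driven settings could be parsed, the snippet below uses a hypothetical `VisionConfig` holder and `load_config` helper. Neither is part of cf-vision, and the accepted truthy strings and the `"cpu"` default are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VisionConfig:
    """Hypothetical settings holder; not part of the cf-vision API."""
    mock: bool
    device: str


def load_config(env: Optional[Mapping[str, str]] = None) -> VisionConfig:
    """Build a config from environment-style key/value pairs."""
    env = os.environ if env is None else env
    # Assumed convention: a handful of common truthy strings enable mock mode.
    mock = env.get("CF_VISION_MOCK", "").strip().lower() in {"1", "true", "yes", "on"}
    # Assumed default device when CF_VISION_DEVICE is unset.
    device = env.get("CF_VISION_DEVICE", "cpu")
    return VisionConfig(mock=mock, device=device)


config = load_config({"CF_VISION_MOCK": "1", "CF_VISION_DEVICE": "cuda:0"})
print(config)
```

Passing a plain dict instead of relying on `os.environ` keeps the parsing easy to test without touching the real environment.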

---

## ImageFrame

The primary output type. MIT licensed.

```python
@dataclass
class ImageFrame:
    source: Literal["camera", "upload", "url", "mock"]
    image_bytes: bytes | None
    elements: list[ImageElement]
    width_px: int
    height_px: int
    model: str              # "dolphin-v2", "pyzbar", "claude", "mock"

    # Convenience accessors (signatures only)
    def text_blocks(self) -> list[ImageElement]: ...
    def barcodes(self) -> list[ImageElement]: ...
    def tables(self) -> list[ImageElement]: ...
    def full_text(self, separator: str = "\n") -> str: ...
```
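The accessor methods above are listed as signatures only. A minimal sketch of how they might behave, using trimmed-down stand-ins for `ImageFrame` and `ImageElement`, is shown below. Which element types count as "text blocks" or "barcodes", and the choice to build `full_text()` from text blocks only, are assumptions rather than documented behavior.

```python
from dataclasses import dataclass, field


@dataclass
class ImageElement:
    # Reduced stand-in: only the fields the accessors need.
    element_type: str
    text: str


@dataclass
class ImageFrame:
    source: str
    elements: list = field(default_factory=list)

    # Assumed groupings of element types; the real library may group differently.
    _TEXT_TYPES = ("title", "plain_text", "text_block", "header", "footer")
    _BARCODE_TYPES = ("barcode", "qr_code")

    def _of_type(self, types):
        """Return elements whose type is in `types`, preserving order."""
        return [e for e in self.elements if e.element_type in types]

    def text_blocks(self):
        return self._of_type(self._TEXT_TYPES)

    def barcodes(self):
        return self._of_type(self._BARCODE_TYPES)

    def tables(self):
        return self._of_type(("table",))

    def full_text(self, separator="\n"):
        # Assumption: only text-like elements contribute to the joined text.
        return separator.join(e.text for e in self.text_blocks())


frame = ImageFrame(
    source="mock",
    elements=[
        ImageElement("title", "Receipt"),
        ImageElement("plain_text", "Milk 2.49"),
        ImageElement("barcode", "4006381333931"),
    ],
)
print(frame.full_text())
print([e.text for e in frame.barcodes()])
```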

### ImageElement

```python
@dataclass
class ImageElement:
    element_type: ElementType   # one of 21 Dolphin-v2 types
    text: str
    confidence: float           # 0.0–1.0
    bbox: BoundingBox | None
    metadata: dict              # e.g. {"format": "EAN13"} for barcodes
```
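To illustrate how the `confidence`, `bbox`, and `metadata` fields might be used together, here is a small sketch that builds two elements and drops the low-confidence one. The `BoundingBox` fields (`x`, `y`, `width`, `height`), the `confident` helper, and the 0.8 threshold are all hypothetical; the source does not specify them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BoundingBox:
    # Assumed pixel-space layout; the real BoundingBox may differ.
    x: int
    y: int
    width: int
    height: int


@dataclass
class ImageElement:
    element_type: str
    text: str
    confidence: float                      # 0.0 to 1.0
    bbox: Optional[BoundingBox] = None
    metadata: dict = field(default_factory=dict)


def confident(elements, threshold=0.8):
    """Hypothetical helper: keep elements at or above the confidence threshold."""
    return [e for e in elements if e.confidence >= threshold]


elements = [
    ImageElement("barcode", "4006381333931", 0.99,
                 BoundingBox(40, 300, 220, 80), {"format": "EAN13"}),
    ImageElement("handwriting", "mi1k?", 0.41),  # no bbox, empty metadata
]

kept = confident(elements)
for e in kept:
    print(e.element_type, e.text, e.metadata.get("format"))
```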

### ElementType (21 types from Dolphin-v2)

`title` · `plain_text` · `text_block` · `header` · `footer` · `table` · `table_caption` · `table_footnote` · `figure` · `figure_caption` · `isolate_formula` · `formula_caption` · `inline_formula` · `page_number` · `seal` · `handwriting` · `barcode` · `qr_code` · `signature` · `watermark` · `abandon`
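One way to make the 21 type names checkable at runtime is a string-valued enum. The sketch below is an assumption about representation, since the source does not say whether `ElementType` is an `Enum`, a `Literal`, or something else; only the names themselves come from the list above.

```python
from enum import Enum

# The 21 names listed above, in the same order.
_NAMES = (
    "title", "plain_text", "text_block", "header", "footer",
    "table", "table_caption", "table_footnote",
    "figure", "figure_caption",
    "isolate_formula", "formula_caption", "inline_formula",
    "page_number", "seal", "handwriting",
    "barcode", "qr_code", "signature", "watermark", "abandon",
)

# Functional Enum API: members are upper-cased names with string values.
ElementType = Enum("ElementType", {n.upper(): n for n in _NAMES}, type=str)

print(len(ElementType))
print(ElementType("qr_code"))

# Unknown names raise ValueError, which surfaces typos early.
try:
    ElementType("logo")
except ValueError as exc:
    print("rejected:", exc)
```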

---

## Module structure

| Module | License | Purpose |
|--------|---------|---------|
| `cf_vision.models` | MIT | `ImageFrame`, `ImageElement`, `BoundingBox` |
| `cf_vision.router` | BSL 1.1* | `VisionRouter` — routes to local or cloud model |
| `cf_vision.barcode` | MIT | `BarcodeScanner` — pyzbar wrapper, no GPU |
| `cf_vision.ocr` | BSL 1.1 | `DolphinOCR` — Dolphin-v2 async wrapper |
| `cf_vision.receipt` | BSL 1.1 | `ReceiptParser` — line-item extraction (stub) |
| `cf_vision.camera` | MIT | `CameraCapture` — OpenCV frame capture |

*BSL 1.1 applies to the inference modules (`router`, `ocr`, `receipt`). `models`, `barcode`, and `camera` are MIT.

---

## Consumed by

- `Circuit-Forge/kiwi` — barcode scan + receipt OCR (Phase 2, primary consumer)
- `Circuit-Forge/peregrine` — resume document parsing
- `Circuit-Forge/falcon` (planned) — government form scanning
- `Circuit-Forge/godwit` (planned) — emergency identity document capture