cf-vision/README.md
pyr0ball 353525c1f4 feat: initial cf-vision scaffold — ImageFrame API, stub inference modules
- cf_vision/models.py: ImageFrame + ImageElement + BoundingBox (MIT)
  Full Dolphin-v2 element taxonomy (21 types), convenience accessors
  (text_blocks, barcodes, tables, full_text)
- cf_vision/router.py: VisionRouter — mock + real paths, task routing
  (document, barcode, receipt, general)
- cf_vision/barcode.py: BarcodeScanner — pyzbar wrapper, CPU-only, MIT
- cf_vision/ocr.py: DolphinOCR — ByteDance/Dolphin-v2 async stub (BSL 1.1)
- cf_vision/receipt.py: ReceiptParser stub — Kiwi Phase 2 target (BSL 1.1)
- cf_vision/camera.py: CameraCapture — OpenCV single-frame capture (MIT)
- pyproject.toml: inference / barcode / camera optional extras
- .env.example: HF_TOKEN, CF_VISION_DEVICE, CF_VISION_MOCK
- README: module map, ImageFrame API reference, consumer roadmap
- tests: 6 passing (ImageFrame accessors, VisionRouter mock/real)

Extracted from circuitforge_core.vision per cf-core#36.
2026-04-06 17:59:00 -07:00

100 lines
3.1 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# cf-vision
CircuitForge vision pipeline. Produces `ImageFrame` objects from image sources -- documents, barcodes, receipts, camera captures -- using Dolphin-v2 (local, Free tier) or Claude vision (cloud, Paid tier).
**Status:** Stub. `VisionRouter` and `BarcodeScanner` API surface locked; real Dolphin-v2 inference lands with Kiwi Phase 2.
## Install
```bash
# Stub / mock mode (no GPU required)
pip install -e ../cf-vision
# Real document OCR (Dolphin-v2, ~8GB VRAM)
pip install -e "../cf-vision[inference]"
# Barcode / QR scanning (CPU, no GPU)
pip install -e "../cf-vision[barcode]"
# Camera capture
pip install -e "../cf-vision[camera]"
```
Copy `.env.example` to `.env` and fill in `HF_TOKEN` for Dolphin-v2.
## Quick start
```python
from cf_vision.router import VisionRouter
# Mock mode (no hardware needed)
router = VisionRouter(mock=True)
frame = router.analyze(image_bytes, task="document")
for element in frame.elements:
print(element.element_type, element.text)
# Real mode (requires [inference] extras)
router = VisionRouter.from_env() # reads CF_VISION_MOCK, CF_VISION_DEVICE
```
---
## ImageFrame
The primary output type. MIT licensed.
```python
@dataclass
class ImageFrame:
source: Literal["camera", "upload", "url", "mock"]
image_bytes: bytes | None
elements: list[ImageElement]
width_px: int
height_px: int
model: str # "dolphin-v2", "pyzbar", "claude", "mock"
def text_blocks() -> list[ImageElement]
def barcodes() -> list[ImageElement]
def tables() -> list[ImageElement]
def full_text(separator="\n") -> str
```
### ImageElement
```python
@dataclass
class ImageElement:
element_type: ElementType # one of 21 Dolphin-v2 types
text: str
confidence: float # 0.01.0
bbox: BoundingBox | None
metadata: dict # e.g. {"format": "EAN13"} for barcodes
```
### ElementType (21 types from Dolphin-v2)
`title` · `plain_text` · `text_block` · `header` · `footer` · `table` · `table_caption` · `table_footnote` · `figure` · `figure_caption` · `isolate_formula` · `formula_caption` · `inline_formula` · `page_number` · `seal` · `handwriting` · `barcode` · `qr_code` · `signature` · `watermark` · `abandon`
---
## Module structure
| Module | License | Purpose |
|--------|---------|---------|
| `cf_vision.models` | MIT | `ImageFrame`, `ImageElement`, `BoundingBox` |
| `cf_vision.router` | BSL 1.1* | `VisionRouter` — routes to local or cloud model |
| `cf_vision.barcode` | MIT | `BarcodeScanner` — pyzbar wrapper, no GPU |
| `cf_vision.ocr` | BSL 1.1 | `DolphinOCR` — Dolphin-v2 async wrapper |
| `cf_vision.receipt` | BSL 1.1 | `ReceiptParser` — line-item extraction (stub) |
| `cf_vision.camera` | MIT | `CameraCapture` — OpenCV frame capture |
*BSL applies to inference modules. Models + barcode + camera = MIT.
---
## Consumed by
- `Circuit-Forge/kiwi` — barcode scan + receipt OCR (Phase 2, primary consumer)
- `Circuit-Forge/peregrine` — resume document parsing
- `Circuit-Forge/falcon` (planned) — government form scanning
- `Circuit-Forge/godwit` (planned) — emergency identity document capture