# cf-vision

CircuitForge vision pipeline. Produces `ImageFrame` objects from image sources (documents, barcodes, receipts, camera captures) using Dolphin-v2 (local, Free tier) or Claude vision (cloud, Paid tier).

**Status:** Stub. The `VisionRouter` and `BarcodeScanner` API surfaces are locked; real Dolphin-v2 inference lands with Kiwi Phase 2.

## Install

```bash
# Stub / mock mode (no GPU required)
pip install -e ../cf-vision

# Real document OCR (Dolphin-v2, ~8GB VRAM)
pip install -e "../cf-vision[inference]"

# Barcode / QR scanning (CPU, no GPU)
pip install -e "../cf-vision[barcode]"

# Camera capture
pip install -e "../cf-vision[camera]"
```

Copy `.env.example` to `.env` and fill in `HF_TOKEN` for Dolphin-v2.
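
To check which optional extras are importable in the current environment, a small probe built on `importlib.util.find_spec` is enough. The module names per extra are assumptions: `pyzbar` for `[barcode]` and `cv2` (OpenCV) for `[camera]` follow from the module descriptions in this README, while `torch` and `transformers` for `[inference]` are only a guess at what a Dolphin-v2 install pulls in. `available_extras` is a hypothetical helper, not part of cf-vision.

```python
from importlib.util import find_spec

# Assumed mapping from cf-vision extras to the top-level modules they provide.
EXTRA_MODULES = {
    "inference": ("torch", "transformers"),  # assumption, not confirmed by pyproject.toml
    "barcode": ("pyzbar",),
    "camera": ("cv2",),
}


def available_extras() -> dict[str, bool]:
    """Report which extras look installed (hypothetical helper)."""
    # find_spec locates a top-level module without importing it, so a missing
    # native dependency (e.g. the zbar shared library) is not detected here.
    return {
        extra: all(find_spec(mod) is not None for mod in modules)
        for extra, modules in EXTRA_MODULES.items()
    }


print(available_extras())
```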

## Quick start

```python
from pathlib import Path

from cf_vision.router import VisionRouter

# Any image file; "scan.jpg" is an example path.
image_bytes = Path("scan.jpg").read_bytes()

# Mock mode (no hardware needed)
router = VisionRouter(mock=True)
frame = router.analyze(image_bytes, task="document")
for element in frame.elements:
    print(element.element_type, element.text)

# Real mode (requires [inference] extras)
router = VisionRouter.from_env()  # reads CF_VISION_MOCK, CF_VISION_DEVICE
```
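
`VisionRouter.from_env()` reads `CF_VISION_MOCK` and `CF_VISION_DEVICE`, but the parsing rules are not documented here. The sketch below shows one plausible interpretation, assuming a truthy-string convention for `CF_VISION_MOCK` and a `"cpu"` fallback for `CF_VISION_DEVICE`. `VisionEnvConfig` and `load_vision_env` are hypothetical names, not cf-vision API.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class VisionEnvConfig:
    """Hypothetical settings holder; the real from_env() may differ."""
    mock: bool
    device: str


_TRUTHY = {"1", "true", "yes", "on"}


def load_vision_env(env: Mapping[str, str] | None = None) -> VisionEnvConfig:
    """Parse CF_VISION_MOCK / CF_VISION_DEVICE under assumed semantics."""
    env = os.environ if env is None else env
    # Assumption: 1/true/yes/on (case-insensitive) enables mock mode.
    mock = env.get("CF_VISION_MOCK", "").strip().lower() in _TRUTHY
    # Assumption: fall back to CPU when no device is configured.
    device = env.get("CF_VISION_DEVICE", "").strip() or "cpu"
    return VisionEnvConfig(mock=mock, device=device)


print(load_vision_env({"CF_VISION_MOCK": "true", "CF_VISION_DEVICE": "cuda:0"}))
```

Passing an explicit mapping keeps the parser testable without touching the process environment.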

---

## ImageFrame

The primary output type. MIT licensed.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal

@dataclass
class ImageFrame:
    source: Literal["camera", "upload", "url", "mock"]
    image_bytes: bytes | None
    elements: list[ImageElement]
    width_px: int
    height_px: int
    model: str  # "dolphin-v2", "pyzbar", "claude", "mock"

    def text_blocks(self) -> list[ImageElement]: ...
    def barcodes(self) -> list[ImageElement]: ...
    def tables(self) -> list[ImageElement]: ...
    def full_text(self, separator: str = "\n") -> str: ...
```
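
The accessors filter `elements` by `element_type`. Exactly which types each accessor matches is not spelled out here, so the stand-in below assumes a grouping: `barcodes()` returns `barcode` and `qr_code`, `tables()` returns `table`, `text_blocks()` returns prose-like types, and `full_text()` joins the text of `text_blocks()` in list order. `Element` and `Frame` are hypothetical stand-ins, not the real `cf_vision.models` classes.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumed groupings of Dolphin-v2 element types; the real accessors may differ.
TEXT_TYPES = {"title", "plain_text", "text_block", "handwriting"}
BARCODE_TYPES = {"barcode", "qr_code"}


@dataclass
class Element:  # hypothetical stand-in for ImageElement
    element_type: str
    text: str


@dataclass
class Frame:  # hypothetical stand-in for ImageFrame
    elements: list[Element] = field(default_factory=list)

    def text_blocks(self) -> list[Element]:
        return [e for e in self.elements if e.element_type in TEXT_TYPES]

    def barcodes(self) -> list[Element]:
        return [e for e in self.elements if e.element_type in BARCODE_TYPES]

    def tables(self) -> list[Element]:
        return [e for e in self.elements if e.element_type == "table"]

    def full_text(self, separator: str = "\n") -> str:
        # Join non-empty text from text blocks, preserving element order.
        return separator.join(e.text for e in self.text_blocks() if e.text)


frame = Frame([
    Element("title", "Invoice"),
    Element("barcode", "4006381333931"),
    Element("plain_text", "Total: 12.50"),
])
print(frame.full_text())  # prints "Invoice" and "Total: 12.50" on two lines
```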

### ImageElement

```python
@dataclass
class ImageElement:
    element_type: ElementType  # one of 21 Dolphin-v2 types
    text: str
    confidence: float  # 0.0–1.0
    bbox: BoundingBox | None
    metadata: dict  # e.g. {"format": "EAN13"} for barcodes
```
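
`BarcodeScanner` wraps pyzbar, whose `decode()` returns records with `data` (bytes), `type` (a symbology name such as `"EAN13"` or `"QRCODE"`) and `rect` (`left`, `top`, `width`, `height`). The sketch below shows one way such a record could map onto the `ImageElement` fields. `decoded_to_element` is hypothetical, the fixed `confidence=1.0` is an assumption (pyzbar reports no probability), and `bbox` is a plain tuple because the fields of `BoundingBox` are not documented here.

```python
from __future__ import annotations

from typing import Any


def decoded_to_element(decoded: Any) -> dict[str, Any]:
    """Map one pyzbar Decoded record onto ImageElement-style fields (sketch)."""
    # pyzbar names QR codes "QRCODE"; all other symbologies become "barcode".
    element_type = "qr_code" if decoded.type == "QRCODE" else "barcode"
    rect = decoded.rect  # Rect(left, top, width, height) in pixels
    return {
        "element_type": element_type,
        "text": decoded.data.decode("utf-8", errors="replace"),
        "confidence": 1.0,  # assumption: no score available from pyzbar
        "bbox": (rect.left, rect.top, rect.width, rect.height),  # hypothetical layout
        "metadata": {"format": decoded.type},  # e.g. {"format": "EAN13"}
    }
```

With pyzbar installed, `from pyzbar.pyzbar import decode` followed by `[decoded_to_element(d) for d in decode(pil_image)]` would convert every symbol found in a PIL image. Keeping the mapping free of a pyzbar import lets it be exercised without the zbar system library.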

### ElementType (21 types from Dolphin-v2)

`title` · `plain_text` · `text_block` · `header` · `footer` · `table` · `table_caption` · `table_footnote` · `figure` · `figure_caption` · `isolate_formula` · `formula_caption` · `inline_formula` · `page_number` · `seal` · `handwriting` · `barcode` · `qr_code` · `signature` · `watermark` · `abandon`
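
One natural way to model the 21 names above in code is a string-valued enum, so values round-trip through JSON unchanged. This is an assumption about representation: the real `cf_vision.models.ElementType` may be a `Literal` or use different member names.

```python
from enum import Enum


class ElementType(str, Enum):
    """The 21 Dolphin-v2 element types (sketch; member names are assumptions)."""

    TITLE = "title"
    PLAIN_TEXT = "plain_text"
    TEXT_BLOCK = "text_block"
    HEADER = "header"
    FOOTER = "footer"
    TABLE = "table"
    TABLE_CAPTION = "table_caption"
    TABLE_FOOTNOTE = "table_footnote"
    FIGURE = "figure"
    FIGURE_CAPTION = "figure_caption"
    ISOLATE_FORMULA = "isolate_formula"
    FORMULA_CAPTION = "formula_caption"
    INLINE_FORMULA = "inline_formula"
    PAGE_NUMBER = "page_number"
    SEAL = "seal"
    HANDWRITING = "handwriting"
    BARCODE = "barcode"
    QR_CODE = "qr_code"
    SIGNATURE = "signature"
    WATERMARK = "watermark"
    ABANDON = "abandon"
```

Because of the `str` mixin, `ElementType.TABLE == "table"` is `True`, and `ElementType("qr_code")` looks a member up by its value.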

---

## Module structure

| Module | License | Purpose |
|--------|---------|---------|
| `cf_vision.models` | MIT | `ImageFrame`, `ImageElement`, `BoundingBox` |
| `cf_vision.router` | BSL 1.1* | `VisionRouter`: routes to a local or cloud model |
| `cf_vision.barcode` | MIT | `BarcodeScanner`: pyzbar wrapper, no GPU |
| `cf_vision.ocr` | BSL 1.1 | `DolphinOCR`: async Dolphin-v2 wrapper |
| `cf_vision.receipt` | BSL 1.1 | `ReceiptParser`: line-item extraction (stub) |
| `cf_vision.camera` | MIT | `CameraCapture`: OpenCV frame capture |

*BSL 1.1 covers the inference path (`router`, `ocr`, `receipt`); `models`, `barcode`, and `camera` are MIT.
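
`VisionRouter` picks a backend per request. The exact rules are not documented; the sketch below encodes one reading of this README, assuming the task names `document`, `barcode`, `receipt` and `general`: mock mode always yields `"mock"`, `barcode` goes to CPU-only pyzbar, and the remaining tasks use local Dolphin-v2 on the Free tier or Claude vision on the Paid tier. `select_model` and its `tier` parameter are hypothetical; the returned strings are the `ImageFrame.model` values listed above.

```python
from typing import Literal

Task = Literal["document", "barcode", "receipt", "general"]
Tier = Literal["free", "paid"]

_MODEL_TASKS = {"document", "receipt", "general"}


def select_model(task: Task, tier: Tier = "free", mock: bool = False) -> str:
    """Return the ImageFrame.model value a request would be routed to (sketch)."""
    if tier not in ("free", "paid"):
        raise ValueError(f"unknown tier: {tier!r}")
    if mock:
        return "mock"  # mock mode needs no GPU, network, or native libraries
    if task == "barcode":
        return "pyzbar"  # CPU-only barcode path, same on both tiers
    if task not in _MODEL_TASKS:
        raise ValueError(f"unknown task: {task!r}")
    # Assumption: Free tier runs Dolphin-v2 locally; Paid tier calls Claude vision.
    return "dolphin-v2" if tier == "free" else "claude"


print(select_model("receipt", tier="paid"))  # prints "claude"
```

Returning the model name rather than a backend object keeps the routing decision separate from model loading, which matters when Dolphin-v2 weights are not present.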

---

## Consumed by

- `Circuit-Forge/kiwi`: barcode scan + receipt OCR (Phase 2, primary consumer)
- `Circuit-Forge/peregrine`: resume document parsing
- `Circuit-Forge/falcon` (planned): government form scanning
- `Circuit-Forge/godwit` (planned): emergency identity document capture