CircuitForge vision pipeline — ImageFrame API, OCR, barcode, receipt extraction
- cf_vision/models.py: ImageFrame + ImageElement + BoundingBox (MIT). Full Dolphin-v2 element taxonomy (21 types), convenience accessors (text_blocks, barcodes, tables, full_text)
- cf_vision/router.py: VisionRouter, with mock + real paths and task routing (document, barcode, receipt, general)
- cf_vision/barcode.py: BarcodeScanner, a pyzbar wrapper, CPU-only, MIT
- cf_vision/ocr.py: DolphinOCR, a ByteDance/Dolphin-v2 async stub (BSL 1.1)
- cf_vision/receipt.py: ReceiptParser stub, the Kiwi Phase 2 target (BSL 1.1)
- cf_vision/camera.py: CameraCapture, OpenCV single-frame capture (MIT)
- pyproject.toml: inference / barcode / camera optional extras
- .env.example: HF_TOKEN, CF_VISION_DEVICE, CF_VISION_MOCK
- README: module map, ImageFrame API reference, consumer roadmap
- tests: 6 passing (ImageFrame accessors, VisionRouter mock/real)

Extracted from circuitforge_core.vision per cf-core#36.
cf-vision
CircuitForge vision pipeline. Produces ImageFrame objects from image sources -- documents, barcodes, receipts, camera captures -- using Dolphin-v2 (local, Free tier) or Claude vision (cloud, Paid tier).
Status: Stub. VisionRouter and BarcodeScanner API surface locked; real Dolphin-v2 inference lands with Kiwi Phase 2.
Install
# Stub / mock mode (no GPU required)
pip install -e ../cf-vision
# Real document OCR (Dolphin-v2, ~8GB VRAM)
pip install -e "../cf-vision[inference]"
# Barcode / QR scanning (CPU, no GPU)
pip install -e "../cf-vision[barcode]"
# Camera capture
pip install -e "../cf-vision[camera]"
Copy .env.example to .env and fill in HF_TOKEN for Dolphin-v2.
Quick start
from cf_vision.router import VisionRouter
# Mock mode (no hardware needed)
router = VisionRouter(mock=True)
frame = router.analyze(image_bytes, task="document")
for element in frame.elements:
    print(element.element_type, element.text)
# Real mode (requires [inference] extras)
router = VisionRouter.from_env() # reads CF_VISION_MOCK, CF_VISION_DEVICE
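How `VisionRouter.from_env()` interprets its two variables is not documented here, so the following is only a sketch of one plausible approach. `VisionSettings` and `load_settings` are hypothetical names, and both the accepted truthy strings and the `"cpu"` fallback are assumptions rather than cf-vision behaviour.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class VisionSettings:
    """Hypothetical settings holder; not part of cf_vision."""
    mock: bool
    device: str


def load_settings(env=None) -> VisionSettings:
    """Read CF_VISION_MOCK and CF_VISION_DEVICE from a mapping (default: os.environ)."""
    env = os.environ if env is None else env
    # Assumption: these strings enable mock mode; anything else disables it.
    mock = env.get("CF_VISION_MOCK", "").strip().lower() in {"1", "true", "yes", "on"}
    # Assumption: fall back to "cpu" when no device is configured.
    device = env.get("CF_VISION_DEVICE", "").strip() or "cpu"
    return VisionSettings(mock=mock, device=device)
```

Passing the mapping explicitly keeps the parsing testable without touching the real process environment.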
ImageFrame
The primary output type. MIT licensed.
@dataclass
class ImageFrame:
    source: Literal["camera", "upload", "url", "mock"]
    image_bytes: bytes | None
    elements: list[ImageElement]
    width_px: int
    height_px: int
    model: str  # "dolphin-v2", "pyzbar", "claude", "mock"

    def text_blocks(self) -> list[ImageElement]: ...
    def barcodes(self) -> list[ImageElement]: ...
    def tables(self) -> list[ImageElement]: ...
    def full_text(self, separator: str = "\n") -> str: ...
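The accessors are listed above without bodies. As a rough illustration, they could be simple filters over `elements` by element type. The sketch below uses stand-in classes (`Element`, `Frame`) rather than the real cf_vision types, and the grouping of types into `TEXT_TYPES` and `BARCODE_TYPES` is a guess, not the library's actual rule.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Stand-in for ImageElement with only the fields this sketch needs."""
    element_type: str
    text: str


# Assumption: which element types count as "text" or "barcode" is a guess.
TEXT_TYPES = {"title", "plain_text", "text_block", "header", "footer"}
BARCODE_TYPES = {"barcode", "qr_code"}


@dataclass
class Frame:
    """Stand-in for ImageFrame."""
    elements: list[Element] = field(default_factory=list)

    def _of_types(self, types):
        # Preserve the original element order while filtering.
        return [e for e in self.elements if e.element_type in types]

    def text_blocks(self):
        return self._of_types(TEXT_TYPES)

    def barcodes(self):
        return self._of_types(BARCODE_TYPES)

    def tables(self):
        return self._of_types({"table"})

    def full_text(self, separator="\n"):
        # Join only the text-bearing elements, in document order.
        return separator.join(e.text for e in self.text_blocks())
```

With a title, a barcode, and a plain-text element in that order, `full_text()` joins only the title and the plain text, skipping the barcode.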
ImageElement
@dataclass
class ImageElement:
    element_type: ElementType  # one of 21 Dolphin-v2 types
    text: str
    confidence: float  # 0.0–1.0
    bbox: BoundingBox | None
    metadata: dict  # e.g. {"format": "EAN13"} for barcodes
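Because `confidence` is a score between 0.0 and 1.0, a consumer may want to drop weak detections before using them. The helper below is a hypothetical example of that. `Element` is a stand-in for `ImageElement`, and the default threshold of 0.8 is arbitrary.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Stand-in for ImageElement; bbox is omitted for brevity."""
    element_type: str
    text: str
    confidence: float
    metadata: dict = field(default_factory=dict)


def confident(elements, threshold=0.8):
    """Return the elements whose confidence is at least `threshold`."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return [e for e in elements if e.confidence >= threshold]
```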
ElementType (21 types from Dolphin-v2)
title · plain_text · text_block · header · footer · table · table_caption · table_footnote · figure · figure_caption · isolate_formula · formula_caption · inline_formula · page_number · seal · handwriting · barcode · qr_code · signature · watermark · abandon
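One way to express this taxonomy in code is a `typing.Literal` alias, which lets a type checker reject unknown type names. Whether cf_vision defines `ElementType` this way is an assumption. The sketch only mirrors the 21 names listed above, and `is_element_type` is a hypothetical helper.

```python
from typing import Literal, get_args

# Assumption: modelled as a Literal alias; the real definition may differ.
ElementType = Literal[
    "title", "plain_text", "text_block", "header", "footer",
    "table", "table_caption", "table_footnote",
    "figure", "figure_caption",
    "isolate_formula", "formula_caption", "inline_formula",
    "page_number", "seal", "handwriting",
    "barcode", "qr_code", "signature", "watermark", "abandon",
]

# Runtime view of the same names, for validating untyped input.
ELEMENT_TYPES = frozenset(get_args(ElementType))


def is_element_type(value: str) -> bool:
    """Return True if `value` is one of the listed element type names."""
    return value in ELEMENT_TYPES
```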
Module structure
| Module | License | Purpose |
|---|---|---|
| cf_vision.models | MIT | ImageFrame, ImageElement, BoundingBox |
| cf_vision.router | BSL 1.1* | VisionRouter: routes to local or cloud model |
| cf_vision.barcode | MIT | BarcodeScanner: pyzbar wrapper, no GPU |
| cf_vision.ocr | BSL 1.1 | DolphinOCR: Dolphin-v2 async wrapper |
| cf_vision.receipt | BSL 1.1 | ReceiptParser: line-item extraction (stub) |
| cf_vision.camera | MIT | CameraCapture: OpenCV frame capture |
*BSL applies to inference modules. Models + barcode + camera = MIT.
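To give a feel for what a pyzbar wrapper such as BarcodeScanner might do, the sketch below maps pyzbar-style decode results onto the ImageElement fields. It is not the actual BarcodeScanner code. `decoded_to_records` is a hypothetical name, the record layout and the `bbox` tuple shape are assumptions, and `Rect` / `Decoded` are local stand-ins mirroring the `data`, `type`, and `rect` fields that pyzbar results expose, so the example runs without zbar installed.

```python
from collections import namedtuple

# Stand-ins mirroring the fields of pyzbar's decode results.
Rect = namedtuple("Rect", "left top width height")
Decoded = namedtuple("Decoded", "data type rect")


def decoded_to_records(decoded):
    """Convert pyzbar-style results into plain dicts shaped like ImageElement."""
    records = []
    for item in decoded:
        left, top, width, height = item.rect
        records.append({
            # pyzbar reports QR codes with the type string "QRCODE".
            "element_type": "qr_code" if item.type == "QRCODE" else "barcode",
            # Payloads arrive as bytes; replace undecodable bytes instead of failing.
            "text": item.data.decode("utf-8", errors="replace"),
            # Assumption: bbox as (left, top, width, height); BoundingBox may differ.
            "bbox": (left, top, width, height),
            "metadata": {"format": item.type},
        })
    return records


# With the [barcode] extra installed, real input would come from:
#   from pyzbar.pyzbar import decode
#   from PIL import Image
#   records = decoded_to_records(decode(Image.open("label.png")))
```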
Consumed by
- Circuit-Forge/kiwi: barcode scan + receipt OCR (Phase 2, primary consumer)
- Circuit-Forge/peregrine: resume document parsing
- Circuit-Forge/falcon (planned): government form scanning
- Circuit-Forge/godwit (planned): emergency identity document capture