cf-vision/README.md
pyr0ball 353525c1f4 feat: initial cf-vision scaffold — ImageFrame API, stub inference modules
- cf_vision/models.py: ImageFrame + ImageElement + BoundingBox (MIT)
  Full Dolphin-v2 element taxonomy (21 types), convenience accessors
  (text_blocks, barcodes, tables, full_text)
- cf_vision/router.py: VisionRouter — mock + real paths, task routing
  (document, barcode, receipt, general)
- cf_vision/barcode.py: BarcodeScanner — pyzbar wrapper, CPU-only, MIT
- cf_vision/ocr.py: DolphinOCR — ByteDance/Dolphin-v2 async stub (BSL 1.1)
- cf_vision/receipt.py: ReceiptParser stub — Kiwi Phase 2 target (BSL 1.1)
- cf_vision/camera.py: CameraCapture — OpenCV single-frame capture (MIT)
- pyproject.toml: inference / barcode / camera optional extras
- .env.example: HF_TOKEN, CF_VISION_DEVICE, CF_VISION_MOCK
- README: module map, ImageFrame API reference, consumer roadmap
- tests: 6 passing (ImageFrame accessors, VisionRouter mock/real)

Extracted from circuitforge_core.vision per cf-core#36.
2026-04-06 17:59:00 -07:00


# cf-vision

The CircuitForge vision pipeline. It produces `ImageFrame` objects from image sources -- documents, barcodes, receipts, and camera captures -- using Dolphin-v2 (local, Free tier) or Claude vision (cloud, Paid tier).

**Status:** Stub. The `VisionRouter` and `BarcodeScanner` API surface is locked; real Dolphin-v2 inference lands with Kiwi Phase 2.

## Install

```shell
# Stub / mock mode (no GPU required)
pip install -e ../cf-vision

# Real document OCR (Dolphin-v2, ~8GB VRAM)
pip install -e "../cf-vision[inference]"

# Barcode / QR scanning (CPU, no GPU)
pip install -e "../cf-vision[barcode]"

# Camera capture
pip install -e "../cf-vision[camera]"
```

Copy `.env.example` to `.env` and fill in `HF_TOKEN` for Dolphin-v2.
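A minimal `.env` sketch. The variable names come from `.env.example`; the values shown here are illustrative assumptions, not documented defaults:

```shell
# Hugging Face token used to pull ByteDance/Dolphin-v2 weights
HF_TOKEN=hf_xxxxxxxxxxxxxxxx
# Inference device (assumed values, e.g. "cuda" or "cpu")
CF_VISION_DEVICE=cuda
# Set to 1 to force mock mode (no model downloads)
CF_VISION_MOCK=0
```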

## Quick start

```python
from cf_vision.router import VisionRouter

# Mock mode (no hardware needed)
router = VisionRouter(mock=True)
frame = router.analyze(image_bytes, task="document")
for element in frame.elements:
    print(element.element_type, element.text)

# Real mode (requires [inference] extras)
router = VisionRouter.from_env()  # reads CF_VISION_MOCK, CF_VISION_DEVICE
```

## ImageFrame

The primary output type. MIT licensed.

```python
@dataclass
class ImageFrame:
    source: Literal["camera", "upload", "url", "mock"]
    image_bytes: bytes | None
    elements: list[ImageElement]
    width_px: int
    height_px: int
    model: str              # "dolphin-v2", "pyzbar", "claude", "mock"

    def text_blocks(self) -> list[ImageElement]: ...
    def barcodes(self) -> list[ImageElement]: ...
    def tables(self) -> list[ImageElement]: ...
    def full_text(self, separator: str = "\n") -> str: ...
```
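A self-contained sketch of how the convenience accessors might behave, assuming they filter `elements` by `element_type` and `full_text` joins the text of text-bearing elements. The stub dataclasses below are simplified for illustration and the exact type sets per accessor are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class ImageElement:
    element_type: str
    text: str = ""
    confidence: float = 1.0

@dataclass
class ImageFrame:
    elements: list[ImageElement] = field(default_factory=list)

    def _by_type(self, *types: str) -> list[ImageElement]:
        # Preserve document order while filtering by element type
        return [e for e in self.elements if e.element_type in types]

    def text_blocks(self) -> list[ImageElement]:
        return self._by_type("title", "plain_text", "text_block")

    def barcodes(self) -> list[ImageElement]:
        return self._by_type("barcode", "qr_code")

    def full_text(self, separator: str = "\n") -> str:
        return separator.join(e.text for e in self.text_blocks() if e.text)

frame = ImageFrame([
    ImageElement("title", "Invoice #42"),
    ImageElement("barcode", "0123456789012"),
    ImageElement("plain_text", "Total: $19.99"),
])
```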

## ImageElement

```python
@dataclass
class ImageElement:
    element_type: ElementType   # one of 21 Dolphin-v2 types
    text: str
    confidence: float           # 0.0–1.0
    bbox: BoundingBox | None
    metadata: dict              # e.g. {"format": "EAN13"} for barcodes
```

## ElementType (21 types from Dolphin-v2)

title · plain_text · text_block · header · footer · table · table_caption · table_footnote · figure · figure_caption · isolate_formula · formula_caption · inline_formula · page_number · seal · handwriting · barcode · qr_code · signature · watermark · abandon
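One way `ElementType` could be declared is as a string-valued `Enum` over the 21 names listed above. This is a sketch; the actual cf_vision declaration may differ in form:

```python
from enum import Enum

class ElementType(str, Enum):
    """The 21 Dolphin-v2 element types (names from the list above)."""
    TITLE = "title"
    PLAIN_TEXT = "plain_text"
    TEXT_BLOCK = "text_block"
    HEADER = "header"
    FOOTER = "footer"
    TABLE = "table"
    TABLE_CAPTION = "table_caption"
    TABLE_FOOTNOTE = "table_footnote"
    FIGURE = "figure"
    FIGURE_CAPTION = "figure_caption"
    ISOLATE_FORMULA = "isolate_formula"
    FORMULA_CAPTION = "formula_caption"
    INLINE_FORMULA = "inline_formula"
    PAGE_NUMBER = "page_number"
    SEAL = "seal"
    HANDWRITING = "handwriting"
    BARCODE = "barcode"
    QR_CODE = "qr_code"
    SIGNATURE = "signature"
    WATERMARK = "watermark"
    ABANDON = "abandon"
```

A `str` base lets raw model output round-trip through the enum, e.g. `ElementType("qr_code") is ElementType.QR_CODE`.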


## Module structure

| Module | License | Purpose |
| --- | --- | --- |
| `cf_vision.models` | MIT | `ImageFrame`, `ImageElement`, `BoundingBox` |
| `cf_vision.router` | BSL 1.1* | `VisionRouter` — routes to local or cloud model |
| `cf_vision.barcode` | MIT | `BarcodeScanner` — pyzbar wrapper, no GPU |
| `cf_vision.ocr` | BSL 1.1 | `DolphinOCR` — Dolphin-v2 async wrapper |
| `cf_vision.receipt` | BSL 1.1 | `ReceiptParser` — line-item extraction (stub) |
| `cf_vision.camera` | MIT | `CameraCapture` — OpenCV frame capture |

\*BSL 1.1 applies to the inference modules; `models`, `barcode`, and `camera` are MIT.


## Consumed by

- Circuit-Forge/kiwi — barcode scan + receipt OCR (Phase 2, primary consumer)
- Circuit-Forge/peregrine — resume document parsing
- Circuit-Forge/falcon (planned) — government form scanning
- Circuit-Forge/godwit (planned) — emergency identity document capture