Implement cf-docuvision managed HTTP service (ByteDance/Dolphin-v2) #8

Closed
opened 2026-04-01 22:59:54 -07:00 by pyr0ball · 0 comments
Owner

Purpose

cf-docuvision is the document understanding inference service for the CF menagerie. It wraps ByteDance/Dolphin-v2 (Qwen2.5-VL-3B backbone, purpose-built document parser) behind an OpenAI-compatible or custom HTTP API so product apps (Kiwi, Falcon, Harrier, Ibis, Godwit) can call it without loading the model in-process.

Why Dolphin-v2

  • Trained specifically for document structure parsing (21 element types, layout analysis)
  • Strong table/line-item extraction (TEDS 87%, OCR edit distance 0.054)
  • 3B params — fits within cf-docuvision VRAM slots in all GPU profiles
  • Dolphin-v2 > generic Qwen2-VL for documents; cf-vision stays available for general multimodal tasks

VRAM slots (already in profiles)

Profile max_mb max_concurrent
6gb 3072 1
8gb 4096 2
16gb 6144 3
24gb 8192 4

Tasks

  • Build cf-docuvision Docker image or process spec
    • Option A: FastAPI wrapper around transformers Dolphin-v2 inference
    • Option B: vLLM serving ByteDance/Dolphin-v2 (if vLLM supports Qwen2.5-VL chat template)
  • Define HTTP API: POST /extract → accepts {image_path|image_b64} → returns structured receipt/document JSON
  • Add managed spec to cf-docuvision in all GPU profiles
  • Wire into cf-orch agent ServiceManager (DockerSpec or ProcessSpec)
  • Document expected output schema (compatible with Kiwi receipt_data table)

Downstream consumers

  • Kiwi (Circuit-Forge/kiwi) — receipt line-item extraction
  • Falcon — form/PDF field extraction
  • Harrier — EOB and insurance document parsing
  • Ibis — medical record and referral document extraction
  • Godwit — identity document parsing

Notes

  • cf-vision is intentionally separate — reserved for general multimodal tasks (classification, tracking, visual search)
  • Model not yet on disk; will download to /Library/Assets/LLM/ (path TBD — suggest docuvision/models/dolphin-v2)
  • Kiwi currently using in-process Qwen2.5-VL-7B-Instruct as interim until this ships
## Purpose `cf-docuvision` is the document understanding inference service for the CF menagerie. It wraps **ByteDance/Dolphin-v2** (Qwen2.5-VL-3B backbone, purpose-built document parser) behind an OpenAI-compatible or custom HTTP API so product apps (Kiwi, Falcon, Harrier, Ibis, Godwit) can call it without loading the model in-process. ## Why Dolphin-v2 - Trained specifically for document structure parsing (21 element types, layout analysis) - Strong table/line-item extraction (TEDS 87%, OCR edit distance 0.054) - 3B params — fits within `cf-docuvision` VRAM slots in all GPU profiles - Dolphin-v2 > generic Qwen2-VL for documents; cf-vision stays available for general multimodal tasks ## VRAM slots (already in profiles) | Profile | max_mb | max_concurrent | |---------|--------|----------------| | 6gb | 3072 | 1 | | 8gb | 4096 | 2 | | 16gb | 6144 | 3 | | 24gb | 8192 | 4 | ## Tasks - [ ] Build `cf-docuvision` Docker image or process spec - Option A: FastAPI wrapper around `transformers` Dolphin-v2 inference - Option B: vLLM serving `ByteDance/Dolphin-v2` (if vLLM supports Qwen2.5-VL chat template) - [ ] Define HTTP API: `POST /extract` → accepts `{image_path|image_b64}` → returns structured receipt/document JSON - [ ] Add `managed` spec to `cf-docuvision` in all GPU profiles - [ ] Wire into cf-orch agent `ServiceManager` (DockerSpec or ProcessSpec) - [ ] Document expected output schema (compatible with Kiwi `receipt_data` table) ## Downstream consumers - **Kiwi** (Circuit-Forge/kiwi) — receipt line-item extraction - **Falcon** — form/PDF field extraction - **Harrier** — EOB and insurance document parsing - **Ibis** — medical record and referral document extraction - **Godwit** — identity document parsing ## Notes - `cf-vision` is intentionally separate — reserved for general multimodal tasks (classification, tracking, visual search) - Model not yet on disk; will download to `/Library/Assets/LLM/` (path TBD — suggest `docuvision/models/dolphin-v2`) - Kiwi currently using in-process `Qwen2.5-VL-7B-Instruct` as interim until this ships
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: Circuit-Forge/circuitforge-core#8
No description provided.