Dolphin-v2 document parsing microservice for CircuitForge products


pyr0ball cf0e2fa649 docs: add README and MIT LICENSE Covers hardware requirements, Docker Compose quickstart, /extract API reference, CF_DOCUVISION_URL wiring, and kiwi#150 callout for the self-hosted CF_ORCH_URL code gap.		2026-06-05 11:59:25 -07:00
app	feat: initial cf-docuvision service — Dolphin-v2 document parsing	2026-06-05 10:25:18 -07:00
tests	feat: initial cf-docuvision service — Dolphin-v2 document parsing	2026-06-05 10:25:18 -07:00
.env.example	feat: initial cf-docuvision service — Dolphin-v2 document parsing	2026-06-05 10:25:18 -07:00
.gitignore	feat: initial cf-docuvision service — Dolphin-v2 document parsing	2026-06-05 10:25:18 -07:00
compose.yml	feat: initial cf-docuvision service — Dolphin-v2 document parsing	2026-06-05 10:25:18 -07:00
Dockerfile	feat: initial cf-docuvision service — Dolphin-v2 document parsing	2026-06-05 10:25:18 -07:00
LICENSE	docs: add README and MIT LICENSE	2026-06-05 11:59:25 -07:00
README.md	docs: add README and MIT LICENSE	2026-06-05 11:59:25 -07:00
requirements.txt	feat: initial cf-docuvision service — Dolphin-v2 document parsing	2026-06-05 10:25:18 -07:00


cf-docuvision

Document parsing service for CircuitForge products. Parses scanned documents, PDFs, forms, and receipts into structured elements (headings, paragraphs, tables, figures) using Dolphin-v2 (ByteDance, Apache 2.0).

Status: v0.1.0 — production-ready for single-page documents.

Prerequisites

Hardware

GPU VRAM	Result
16GB+	Recommended — fast single-page parsing (1–3 seconds)
8GB	Minimum — works for most documents
Under 8GB	Likely CUDA out-of-memory on model load
CPU only	Works — expect 60–120 seconds per page

If you have no GPU or less than 8GB of VRAM, set CF_DOCUVISION_DEVICE=cpu before starting. The service logs a warning and continues; CPU fallback is slow (60–120 seconds per page) but functional.

Model download

First startup downloads approximately 5–8 GB from HuggingFace. Subsequent runs use the local cache. No HuggingFace account required (model is Apache 2.0, not gated).

To speed up large downloads:

pip install hf-transfer
export HF_HUB_ENABLE_HF_TRANSFER=1

Quick start (Docker Compose)

git clone https://git.opensourcesolarpunk.com/Circuit-Forge/cf-docuvision.git
cd cf-docuvision
cp .env.example .env    # edit if needed
docker compose up -d

Watch model load progress:

docker compose logs -f cf-docuvision

The service is ready when logs show cf-docuvision: ready. Confirm:

curl http://localhost:8003/health
# {"status": "ok", "model": "ByteDance/Dolphin-v2"}

Direct Python run

pip install -r requirements.txt
CF_DOCUVISION_DEVICE=cuda uvicorn app.main:app --host 0.0.0.0 --port 8003

CPU fallback:

CF_DOCUVISION_DEVICE=cpu uvicorn app.main:app --host 0.0.0.0 --port 8003

Configuration

Variable	Default	Description
`CF_DOCUVISION_MODEL`	`ByteDance/Dolphin-v2`	HuggingFace model ID or local path
`CF_DOCUVISION_DEVICE`	`auto`	`cuda`, `cpu`, or `auto` (GPU if available)
`CF_DOCUVISION_PORT`	`8003`	Service port (Docker Compose only)
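
As an illustration of how the three variables and their defaults fit together, here is a minimal sketch of reading them from the environment. It is not the service's actual code: `load_settings` and `resolve_device` are hypothetical names, and treating the device value as case-insensitive is an assumption. In a real service the `cuda_available` flag would typically come from `torch.cuda.is_available()`.

```python
import os

# Defaults copied from the configuration table above.
DEFAULTS = {
    "CF_DOCUVISION_MODEL": "ByteDance/Dolphin-v2",
    "CF_DOCUVISION_DEVICE": "auto",
    "CF_DOCUVISION_PORT": "8003",
}
VALID_DEVICES = {"cuda", "cpu", "auto"}


def load_settings(env=None):
    """Return the documented settings, falling back to the documented defaults."""
    env = os.environ if env is None else env
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    # Assumption: accept the device value in any letter case.
    device = settings["CF_DOCUVISION_DEVICE"].lower()
    if device not in VALID_DEVICES:
        raise ValueError(
            f"CF_DOCUVISION_DEVICE must be one of {sorted(VALID_DEVICES)}, got {device!r}"
        )
    settings["CF_DOCUVISION_DEVICE"] = device
    settings["CF_DOCUVISION_PORT"] = int(settings["CF_DOCUVISION_PORT"])
    return settings


def resolve_device(setting, cuda_available):
    """Map 'auto' to a concrete device: GPU if available, otherwise CPU."""
    if setting == "auto":
        return "cuda" if cuda_available else "cpu"
    return setting
```

Passing a plain dict instead of `os.environ` makes the function easy to exercise without touching the real environment, for example `load_settings({"CF_DOCUVISION_DEVICE": "cpu"})`.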

To skip the HuggingFace download, set CF_DOCUVISION_MODEL to a local directory that already contains the model files:

# Optional: uncomment the volume mount in compose.yml
# - /Library/Assets/LLM/dolphin-v2:/models/dolphin-v2:ro
CF_DOCUVISION_MODEL=/models/dolphin-v2

Connecting from a product

Set CF_DOCUVISION_URL in the product's .env:

CF_DOCUVISION_URL=http://localhost:8003

Products using cf-core's DocuvisionClient pick this up automatically.

Kiwi note: Kiwi v0.10.x gates the docuvision call on CF_ORCH_URL and does not yet read CF_DOCUVISION_URL directly. The fix is tracked at kiwi#150. Once that ships, set CF_DOCUVISION_URL in Kiwi's .env and leave CF_ORCH_URL unset.

API reference

`GET /health`

Returns 200 when the model is loaded and ready.

{"status": "ok", "model": "ByteDance/Dolphin-v2"}

Returns 503 while the model is still loading at startup.
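
Because `/health` answers 503 until the model has loaded, a client may want to poll it before sending work. The sketch below shows one way to do that with only the standard library. The function names, the retry count, and the delay are assumptions chosen for illustration (60 attempts at 2 seconds is roughly a 120-second window); they are not part of the service.

```python
import time
import urllib.error
import urllib.request


def health_status(base_url, timeout=5.0):
    """Return the HTTP status of GET /health, or None if the service is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 503 while the model is still loading
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, or timeout


def wait_until_ready(base_url, probe=health_status, attempts=60, delay=2.0,
                     sleep=time.sleep):
    """Poll until the probe returns 200. Return True if ready, False otherwise."""
    for attempt in range(attempts):
        if probe(base_url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)  # no sleep after the final attempt
    return False
```

A caller could gate its first request on `wait_until_ready("http://localhost:8003")`. The `probe` and `sleep` parameters exist only so the loop can be tested without a running service.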

`POST /extract`

Parse a document image into structured elements.

Request:

{
  "image_b64": "<base64-encoded image bytes (JPEG, PNG, TIFF)>",
  "hint": "auto"
}

hint controls extraction focus:

Value	Behaviour
`auto`	General parsing — balanced detection of all element types (default)
`table`	Prioritise HTML table rendering
`text`	Prioritise text content and heading hierarchy
`form`	Prioritise form fields and key-value pairs
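
To make the request format concrete, here is a hedged sketch of a client that base64-encodes an image and posts it to `/extract`. It assumes standard (not URL-safe) base64 without a data-URI prefix and a JSON body sent with `Content-Type: application/json`; the function names and the 180-second timeout (picked to leave room for CPU mode) are illustrative, not part of the service.

```python
import base64
import json
import urllib.request

# Hint values copied from the table above.
VALID_HINTS = {"auto", "table", "text", "form"}


def build_extract_payload(image_bytes, hint="auto"):
    """Build the JSON-serialisable request body for POST /extract."""
    if hint not in VALID_HINTS:
        raise ValueError(f"hint must be one of {sorted(VALID_HINTS)}, got {hint!r}")
    return {
        "image_b64": base64.b64encode(image_bytes).decode("ascii"),
        "hint": hint,
    }


def post_extract(base_url, image_bytes, hint="auto", timeout=180.0):
    """Send the image to the service and return the decoded JSON response."""
    body = json.dumps(build_extract_payload(image_bytes, hint)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/extract",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Usage would look like `post_extract("http://localhost:8003", open("invoice.png", "rb").read(), hint="table")`, where `invoice.png` is a placeholder file name.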

Response:

{
  "elements": [
    {"type": "heading", "text": "Invoice", "bbox": [0.05, 0.02, 0.9, 0.08]},
    {"type": "paragraph", "text": "Due date: 2026-07-01", "bbox": [0.05, 0.10, 0.6, 0.14]}
  ],
  "tables": [
    {"html": "<table>...</table>", "bbox": [0.05, 0.20, 0.95, 0.60]}
  ],
  "raw_text": "Invoice\nDue date: 2026-07-01\n...",
  "metadata": {
    "source": "cf-docuvision",
    "model": "ByteDance/Dolphin-v2",
    "hint": "auto",
    "elapsed_ms": 1240
  }
}

Element types: heading, paragraph, list, table, figure, formula, code.

bbox values are normalised to [0, 1] relative to the image dimensions.
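
Since the coordinates are normalised, a consumer has to scale them by the image size before drawing or cropping. The sketch below assumes the four values alternate horizontal, vertical, horizontal, vertical, as the sample response suggests. The README does not say whether the last two values are the far corner or a width and height, but the scaling is the same in both cases. `bbox_to_pixels` is a hypothetical helper name.

```python
def bbox_to_pixels(bbox, width, height):
    """Scale a normalised 4-value bbox to integer pixel coordinates."""
    if len(bbox) != 4:
        raise ValueError(f"expected 4 bbox values, got {len(bbox)}")
    if any(not 0.0 <= value <= 1.0 for value in bbox):
        raise ValueError(f"bbox values must lie in [0, 1], got {bbox!r}")
    h0, v0, h1, v1 = bbox
    # Horizontal values scale with the width, vertical values with the height.
    return (
        round(h0 * width),
        round(v0 * height),
        round(h1 * width),
        round(v1 * height),
    )


# The heading bbox from the sample response, on a 1000x2000 pixel image.
print(bbox_to_pixels([0.05, 0.02, 0.9, 0.08], 1000, 2000))  # (50, 40, 900, 160)
```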

Troubleshooting

CUDA out of memory at startup
Dolphin-v2 requires ~8GB VRAM. Set CF_DOCUVISION_DEVICE=cpu to use CPU mode instead.

503 Model not loaded on first request
The model is still loading. Watch logs for cf-docuvision: ready before sending requests. The Docker healthcheck waits up to 120 seconds.

Very slow processing
CPU mode is expected to take 60–120 seconds per page. This is normal. If you need speed, a GPU is required.

trust_remote_code=True warning
Dolphin-v2 requires trust_remote_code=True for its custom architecture. The model is Apache 2.0 and auditable at huggingface.co/ByteDance/Dolphin-v2.

License

cf-docuvision service: MIT — CircuitForge LLC
Dolphin-v2 model: Apache 2.0 — ByteDance
