cf-docuvision
Document parsing service for CircuitForge products. Parses scanned documents, PDFs, forms, and receipts into structured elements (headings, paragraphs, tables, figures) using Dolphin-v2 (ByteDance, Apache 2.0).
Status: v0.1.0 — production-ready for single-page documents.
Prerequisites
Hardware
| GPU VRAM | Result |
|---|---|
| 16GB+ | Recommended — fast single-page parsing (1–3 seconds) |
| 8GB | Minimum — works for most documents |
| Under 8GB | Likely CUDA out-of-memory on model load |
| CPU only | Works — expect 60–120 seconds per page |
If you are on CPU or have limited VRAM, set CF_DOCUVISION_DEVICE=cpu before starting. The service logs a warning and continues — CPU fallback is slow but functional.
Model download
First startup downloads approximately 5–8 GB from HuggingFace. Subsequent runs use the local cache. No HuggingFace account required (model is Apache 2.0, not gated).
To speed up large downloads:
pip install hf-transfer
export HF_HUB_ENABLE_HF_TRANSFER=1
Quick start (Docker Compose)
git clone https://git.opensourcesolarpunk.com/Circuit-Forge/cf-docuvision.git
cd cf-docuvision
cp .env.example .env # edit if needed
docker compose up -d
Watch model load progress:
docker compose logs -f cf-docuvision
The service is ready when logs show cf-docuvision: ready. Confirm:
curl http://localhost:8003/health
# {"status": "ok", "model": "ByteDance/Dolphin-v2"}
Direct Python run
pip install -r requirements.txt
CF_DOCUVISION_DEVICE=cuda uvicorn app.main:app --host 0.0.0.0 --port 8003
CPU fallback:
CF_DOCUVISION_DEVICE=cpu uvicorn app.main:app --host 0.0.0.0 --port 8003
Configuration
| Variable | Default | Description |
|---|---|---|
| CF_DOCUVISION_MODEL | ByteDance/Dolphin-v2 | HuggingFace model ID or local path |
| CF_DOCUVISION_DEVICE | auto | cuda, cpu, or auto (GPU if available) |
| CF_DOCUVISION_PORT | 8003 | Service port (Docker Compose only) |
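The three CF_DOCUVISION_DEVICE values can be read as the small decision sketched below. This is an illustration of the table, not the service's actual startup code: resolve_device and detect_cuda are hypothetical names, and using torch.cuda.is_available() as the GPU check is an assumption.

```python
import logging

log = logging.getLogger("cf-docuvision-example")


def detect_cuda() -> bool:
    """Return True if PyTorch is installed and reports a usable CUDA device."""
    try:
        import torch  # assumption: the service runs the model through PyTorch
    except ImportError:
        return False
    return torch.cuda.is_available()


def resolve_device(setting: str, cuda_available: bool) -> str:
    """Map a CF_DOCUVISION_DEVICE value to 'cuda' or 'cpu' (hypothetical helper)."""
    setting = setting.strip().lower()
    if setting not in {"auto", "cuda", "cpu"}:
        raise ValueError(f"unsupported device setting: {setting!r}")
    if setting == "auto":
        # auto prefers the GPU and falls back to CPU when none is visible
        setting = "cuda" if cuda_available else "cpu"
    if setting == "cpu":
        log.warning("running on CPU; expect 60-120 seconds per page")
    return setting
```

Calling resolve_device(os.environ.get("CF_DOCUVISION_DEVICE", "auto"), detect_cuda()) would mirror the default behaviour described in the table.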
To skip the HuggingFace download, set CF_DOCUVISION_MODEL to a local directory:
# Optional: uncomment the volume mount in compose.yml
# - /Library/Assets/LLM/dolphin-v2:/models/dolphin-v2:ro
CF_DOCUVISION_MODEL=/models/dolphin-v2
Connecting from a product
Set CF_DOCUVISION_URL in the product's .env:
CF_DOCUVISION_URL=http://localhost:8003
Products using cf-core's DocuvisionClient pick this up automatically.
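For a product that does not use DocuvisionClient, reading the variable by hand might look like the sketch below. docuvision_base_url is a hypothetical helper, not part of cf-core, and treating an unset or blank value as "docuvision disabled" is an assumption.

```python
import os
from typing import Mapping, Optional


def docuvision_base_url(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the configured base URL without a trailing slash, or None if unset."""
    value = env.get("CF_DOCUVISION_URL", "").strip()
    # Strip a trailing slash so callers can append "/health" or "/extract" safely.
    return value.rstrip("/") or None
```

A caller can then skip document parsing entirely when the function returns None.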
Kiwi note: Kiwi v0.10.x gates the docuvision call on CF_ORCH_URL; CF_DOCUVISION_URL is not yet read directly. The fix is tracked at kiwi#150. Once that ships, set CF_DOCUVISION_URL in Kiwi's .env and leave CF_ORCH_URL unset.
API reference
GET /health
Returns 200 when the model is loaded and ready.
{"status": "ok", "model": "ByteDance/Dolphin-v2"}
Returns 503 while the model is still loading at startup.
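Because /health distinguishes "loading" (503) from "ready" (200), a client can poll it before sending work. The following is a minimal sketch using only the standard library; health_status and wait_until_ready are hypothetical names, and the 120-second default timeout and 2-second interval are assumptions.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def health_status(base_url: str) -> int:
    """Return the HTTP status of GET /health, or 0 if the service is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # e.g. 503 while the model is still loading
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, DNS failure, or timeout


def wait_until_ready(
    base_url: str,
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    probe: Callable[[str], int] = health_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until probe() returns 200 or timeout_s elapses; True means ready."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(base_url) == 200:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

The probe and sleep parameters exist so the loop can be exercised without a running service; in normal use, wait_until_ready("http://localhost:8003") is enough.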
POST /extract
Parse a document image into structured elements.
Request:
{
"image_b64": "<base64-encoded image bytes (JPEG, PNG, TIFF)>",
"hint": "auto"
}
hint controls extraction focus:
| Value | Behaviour |
|---|---|
| auto | General parsing — balanced detection of all element types (default) |
| table | Prioritise HTML table rendering |
| text | Prioritise text content and heading hierarchy |
| form | Prioritise form fields and key-value pairs |
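Putting the request body and the hint values together, a client call could be sketched as follows with the standard library. build_extract_payload and extract are hypothetical helper names; the JSON Content-Type header and the 180-second timeout (chosen to leave room for CPU mode) are assumptions rather than documented requirements.

```python
import base64
import json
import urllib.request

VALID_HINTS = {"auto", "table", "text", "form"}


def build_extract_payload(image_bytes: bytes, hint: str = "auto") -> dict:
    """Build the JSON body for POST /extract from raw image bytes."""
    if hint not in VALID_HINTS:
        raise ValueError(f"hint must be one of {sorted(VALID_HINTS)}")
    return {
        "image_b64": base64.b64encode(image_bytes).decode("ascii"),
        "hint": hint,
    }


def extract(base_url: str, image_bytes: bytes, hint: str = "auto",
            timeout_s: float = 180.0) -> dict:
    """POST the image to /extract and return the decoded JSON response."""
    body = json.dumps(build_extract_payload(image_bytes, hint)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/extract",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.load(resp)
```

Typical use would read the file in binary mode and pass the bytes straight through, for example extract("http://localhost:8003", open("receipt.png", "rb").read(), hint="table").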
Response:
{
"elements": [
{"type": "heading", "text": "Invoice", "bbox": [0.05, 0.02, 0.9, 0.08]},
{"type": "paragraph", "text": "Due date: 2026-07-01", "bbox": [0.05, 0.10, 0.6, 0.14]}
],
"tables": [
{"html": "<table>...</table>", "bbox": [0.05, 0.20, 0.95, 0.60]}
],
"raw_text": "Invoice\nDue date: 2026-07-01\n...",
"metadata": {
"source": "cf-docuvision",
"model": "ByteDance/Dolphin-v2",
"hint": "auto",
"elapsed_ms": 1240
}
}
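Each entry in tables carries its content as an HTML string, so a consumer usually needs to turn that into rows. The sketch below does this with the standard library's html.parser. It assumes the markup is a plain table made of tr, th, and td tags; TableRows and table_rows are hypothetical names, and merged cells (rowspan/colspan) and nested tables are not handled.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._cell: Optional[List[str]] = None  # text fragments of the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def table_rows(table_html: str) -> List[List[str]]:
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableRows()
    parser.feed(table_html)
    parser.close()
    return parser.rows
```

Given a parsed response, [table_rows(t["html"]) for t in response["tables"]] yields one list of rows per detected table.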
Element types: heading, paragraph, list, table, figure, formula, code.
bbox values are normalised to [0, 1] relative to the image dimensions.
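To crop or highlight an element, the normalised values have to be scaled back to pixels. The sketch below assumes the four numbers are ordered [x0, y0, x1, y1] (left, top, right, bottom); that ordering is inferred from the sample response, not stated by the API, and bbox_to_pixels is a hypothetical helper.

```python
from typing import Sequence, Tuple


def bbox_to_pixels(bbox: Sequence[float], width: int, height: int) -> Tuple[int, int, int, int]:
    """Scale a normalised bbox to integer pixel coordinates.

    Assumes the order is [x0, y0, x1, y1] (left, top, right, bottom).
    """
    x0, y0, x1, y1 = bbox
    # x values scale with the image width, y values with the image height.
    return (round(x0 * width), round(y0 * height), round(x1 * width), round(y1 * height))
```

For the heading in the sample response on a 1000×2000 pixel image, this gives (50, 40, 900, 160). Under the same ordering assumption, the tuple matches the (left, upper, right, lower) box that Pillow's Image.crop expects.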
Troubleshooting
CUDA out of memory at startup
Dolphin-v2 requires ~8GB VRAM. Set CF_DOCUVISION_DEVICE=cpu to use CPU mode instead.
503 Model not loaded on first request
The model is still loading. Watch logs for cf-docuvision: ready before sending requests. The Docker healthcheck waits up to 120 seconds.
Very slow processing
CPU mode is expected to take 60–120 seconds per page. This is normal. If you need speed, a GPU is required.
trust_remote_code=True warning
Dolphin-v2 requires trust_remote_code=True for its custom architecture. The model is Apache 2.0 and auditable at huggingface.co/ByteDance/Dolphin-v2.
License
- cf-docuvision service: MIT — CircuitForge LLC
- Dolphin-v2 model: Apache 2.0 — ByteDance