Dolphin-v2 document parsing microservice for CircuitForge products

cf-docuvision

Document parsing service for CircuitForge products. Parses scanned documents, PDFs, forms, and receipts into structured elements (headings, paragraphs, tables, figures) using Dolphin-v2 (ByteDance, Apache 2.0).

Status: v0.1.0 — production-ready for single-page documents.


Prerequisites

Hardware

| GPU VRAM | Result |
|----------|--------|
| 16GB+ | Recommended: fast single-page parsing (1-3 seconds) |
| 8GB | Minimum: works for most documents |
| Under 8GB | Likely CUDA out-of-memory on model load |
| CPU only | Works, but expect 60-120 seconds per page |

If you are on CPU or have limited VRAM, set CF_DOCUVISION_DEVICE=cpu before starting. The service logs a warning and continues — CPU fallback is slow but functional.

Model download

First startup downloads approximately 5-8 GB from HuggingFace. Subsequent runs use the local cache. No HuggingFace account is required (the model is Apache 2.0 and not gated).

To speed up large downloads:

pip install hf-transfer
export HF_HUB_ENABLE_HF_TRANSFER=1

Quick start (Docker Compose)

git clone https://git.opensourcesolarpunk.com/Circuit-Forge/cf-docuvision.git
cd cf-docuvision
cp .env.example .env    # edit if needed
docker compose up -d

Watch model load progress:

docker compose logs -f cf-docuvision

The service is ready when logs show cf-docuvision: ready. Confirm:

curl http://localhost:8003/health
# {"status": "ok", "model": "ByteDance/Dolphin-v2"}

Direct Python run

pip install -r requirements.txt
CF_DOCUVISION_DEVICE=cuda uvicorn app.main:app --host 0.0.0.0 --port 8003

CPU fallback:

CF_DOCUVISION_DEVICE=cpu uvicorn app.main:app --host 0.0.0.0 --port 8003

Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| CF_DOCUVISION_MODEL | ByteDance/Dolphin-v2 | HuggingFace model ID or local path |
| CF_DOCUVISION_DEVICE | auto | cuda, cpu, or auto (GPU if available) |
| CF_DOCUVISION_PORT | 8003 | Service port (Docker Compose only) |

To skip HuggingFace download, set CF_DOCUVISION_MODEL to a local directory:

# Optional: uncomment the volume mount in compose.yml
# - /Library/Assets/LLM/dolphin-v2:/models/dolphin-v2:ro
CF_DOCUVISION_MODEL=/models/dolphin-v2
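
As an illustration of how these variables and their defaults fit together, here is a minimal Python sketch that resolves them from an environment mapping. The helper name `load_settings`, the returned keys, and the choice to reject unknown device values are all hypothetical; only the variable names and defaults come from the configuration table, and the service's actual startup code may differ.

```python
import os

# Defaults copied from the configuration table in this README.
DEFAULTS = {
    "CF_DOCUVISION_MODEL": "ByteDance/Dolphin-v2",
    "CF_DOCUVISION_DEVICE": "auto",
    "CF_DOCUVISION_PORT": "8003",
}
VALID_DEVICES = {"auto", "cuda", "cpu"}


def load_settings(env=None):
    """Hypothetical helper: resolve the three settings from an env mapping."""
    env = os.environ if env is None else env
    device = env.get("CF_DOCUVISION_DEVICE", DEFAULTS["CF_DOCUVISION_DEVICE"]).lower()
    if device not in VALID_DEVICES:
        # Assumption: treat anything other than the documented values as an error.
        raise ValueError(f"CF_DOCUVISION_DEVICE must be one of {sorted(VALID_DEVICES)}")
    return {
        "model": env.get("CF_DOCUVISION_MODEL", DEFAULTS["CF_DOCUVISION_MODEL"]),
        "device": device,
        "port": int(env.get("CF_DOCUVISION_PORT", DEFAULTS["CF_DOCUVISION_PORT"])),
    }
```

Passing an explicit dictionary instead of relying on `os.environ` keeps the sketch easy to check in isolation.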

Connecting from a product

Set CF_DOCUVISION_URL in the product's .env:

CF_DOCUVISION_URL=http://localhost:8003

Products using cf-core's DocuvisionClient pick this up automatically.

Kiwi note: Kiwi v0.10.x gates the docuvision call on CF_ORCH_URL; CF_DOCUVISION_URL is not yet read directly. The fix is tracked at kiwi#150. Once that ships, set CF_DOCUVISION_URL in Kiwi's .env and leave CF_ORCH_URL unset.


API reference

GET /health

Returns 200 when the model is loaded and ready.

{"status": "ok", "model": "ByteDance/Dolphin-v2"}

Returns 503 while the model is still loading at startup.
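
Because /health returns 503 until the model has loaded, a client can poll it before sending work. The following is a minimal sketch using only the Python standard library. The function names, the 2-second interval, and the 120-second default timeout are assumptions made for illustration; they are not part of cf-docuvision or cf-core.

```python
import time
import urllib.error
import urllib.request


def probe_health(base_url):
    """Return the HTTP status code of GET /health, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # e.g. 503 while the model is still loading
    except (urllib.error.URLError, OSError):
        return None  # service is not accepting connections yet


def wait_until_ready(base_url, timeout_s=120.0, interval_s=2.0,
                     probe=probe_health, sleep=time.sleep, clock=time.monotonic):
    """Poll until /health returns 200; return False if timeout_s elapses first."""
    deadline = clock() + timeout_s
    while True:
        if probe(base_url) == 200:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

For example, `wait_until_ready("http://localhost:8003")` blocks until the service reports ready or the timeout passes. The `probe`, `sleep`, and `clock` parameters exist only so the loop can be exercised without a running service.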

POST /extract

Parse a document image into structured elements.

Request:

{
  "image_b64": "<base64-encoded image bytes (JPEG, PNG, TIFF)>",
  "hint": "auto"
}

hint controls extraction focus:

| Value | Behaviour |
|-------|-----------|
| auto | General parsing: balanced detection of all element types (default) |
| table | Prioritise HTML table rendering |
| text | Prioritise text content and heading hierarchy |
| form | Prioritise form fields and key-value pairs |
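
A request body can be assembled from raw image bytes as in the sketch below, which uses only the Python standard library. The helper names `build_extract_payload` and `extract`, the client-side hint check, and the 180-second timeout are assumptions for illustration; the field names `image_b64` and `hint` and the four hint values are taken from this README.

```python
import base64
import json
import urllib.request

VALID_HINTS = ("auto", "table", "text", "form")


def build_extract_payload(image_bytes, hint="auto"):
    """Build the JSON body for POST /extract from raw image bytes."""
    if hint not in VALID_HINTS:
        raise ValueError(f"hint must be one of {VALID_HINTS}")
    return {
        "image_b64": base64.b64encode(image_bytes).decode("ascii"),
        "hint": hint,
    }


def extract(base_url, image_bytes, hint="auto", timeout_s=180.0):
    """POST the payload to /extract and return the decoded JSON response."""
    body = json.dumps(build_extract_payload(image_bytes, hint)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/extract",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.load(resp)
```

A call might look like `extract("http://localhost:8003", image_bytes, hint="table")`, where `image_bytes` holds the contents of a JPEG, PNG, or TIFF file. The generous timeout is a guess intended to leave room for CPU mode, which can take over a minute per page.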

Response:

{
  "elements": [
    {"type": "heading", "text": "Invoice", "bbox": [0.05, 0.02, 0.9, 0.08]},
    {"type": "paragraph", "text": "Due date: 2026-07-01", "bbox": [0.05, 0.10, 0.6, 0.14]}
  ],
  "tables": [
    {"html": "<table>...</table>", "bbox": [0.05, 0.20, 0.95, 0.60]}
  ],
  "raw_text": "Invoice\nDue date: 2026-07-01\n...",
  "metadata": {
    "source": "cf-docuvision",
    "model": "ByteDance/Dolphin-v2",
    "hint": "auto",
    "elapsed_ms": 1240
  }
}

Element types: heading, paragraph, list, table, figure, formula, code.
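
To show how a caller might consume this response shape, here is a small Python sketch that counts elements by type and pulls out heading text. The helper names `summarize` and `headings` are hypothetical, and the sample dictionary is a trimmed copy of the example response above.

```python
from collections import Counter


def summarize(response):
    """Count elements by type and report how many tables were returned."""
    counts = Counter(el["type"] for el in response.get("elements", []))
    return {
        "by_type": dict(counts),
        "table_count": len(response.get("tables", [])),
        "elapsed_ms": response.get("metadata", {}).get("elapsed_ms"),
    }


def headings(response):
    """Return the text of every heading element, in the order returned."""
    return [el["text"] for el in response.get("elements", []) if el["type"] == "heading"]


# Trimmed copy of the example response from this README.
sample = {
    "elements": [
        {"type": "heading", "text": "Invoice", "bbox": [0.05, 0.02, 0.9, 0.08]},
        {"type": "paragraph", "text": "Due date: 2026-07-01", "bbox": [0.05, 0.10, 0.6, 0.14]},
    ],
    "tables": [{"html": "<table>...</table>", "bbox": [0.05, 0.20, 0.95, 0.60]}],
    "metadata": {"elapsed_ms": 1240},
}
```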

bbox values are normalised to [0, 1] relative to the image dimensions.
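
Mapping a normalised bbox back onto the source image is a matter of scaling by the image width and height. The sketch below assumes the four values are ordered [x0, y0, x1, y1] (left, top, right, bottom). That ordering is inferred from the sample response and is not stated in this README, so verify it against real output before relying on it. The function name `bbox_to_pixels` is hypothetical.

```python
def bbox_to_pixels(bbox, width, height):
    """Scale a normalised bbox to integer pixel coordinates.

    Assumes the order [x0, y0, x1, y1] (left, top, right, bottom); this
    ordering is a guess from the sample response, not documented behaviour.
    """
    x0, y0, x1, y1 = bbox
    for value in bbox:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"bbox value {value} is outside [0, 1]")
    return (round(x0 * width), round(y0 * height),
            round(x1 * width), round(y1 * height))
```

Under that assumption, the heading bbox from the example response on a 1000 x 2000 pixel image maps to `(50, 40, 900, 160)`.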


Troubleshooting

CUDA out of memory at startup: Dolphin-v2 requires ~8GB VRAM. Set CF_DOCUVISION_DEVICE=cpu to use CPU mode instead.

503 Model not loaded on first request: The model is still loading. Watch logs for cf-docuvision: ready before sending requests. The Docker healthcheck waits up to 120 seconds.

Very slow processing: CPU mode is expected to take 60-120 seconds per page. This is normal. If you need speed, a GPU is required.

trust_remote_code=True warning: Dolphin-v2 requires trust_remote_code=True for its custom architecture. The model is Apache 2.0 and auditable at huggingface.co/ByteDance/Dolphin-v2.


License

  • cf-docuvision service: MIT — CircuitForge LLC
  • Dolphin-v2 model: Apache 2.0 — ByteDance