# cf-docuvision

Document parsing service for CircuitForge products. Parses scanned documents, PDFs, forms, and receipts into structured elements (headings, paragraphs, tables, figures) using [Dolphin-v2](https://huggingface.co/ByteDance/Dolphin-v2) (ByteDance, Apache 2.0).

**Status:** v0.1.0, production-ready for single-page documents.

---

## Prerequisites

### Hardware

| GPU VRAM | Result |
|----------|--------|
| 16 GB+ | Recommended: fast single-page parsing (1–3 seconds) |
| 8 GB | Minimum: works for most documents |
| Under 8 GB | Likely CUDA out-of-memory on model load |
| CPU only | Works, but expect 60–120 seconds per page |

If you are on CPU or have limited VRAM, set `CF_DOCUVISION_DEVICE=cpu` before starting. The service logs a warning and continues; CPU fallback is slow but functional.

### Model download

First startup downloads approximately **5–8 GB** from HuggingFace. Subsequent runs use the local cache. No HuggingFace account is required (the model is Apache 2.0, not gated).

To speed up large downloads:

```bash
pip install hf-transfer
export HF_HUB_ENABLE_HF_TRANSFER=1
```

---

## Quick start (Docker Compose)

```bash
git clone https://git.opensourcesolarpunk.com/Circuit-Forge/cf-docuvision.git
cd cf-docuvision
cp .env.example .env   # edit if needed
docker compose up -d
```

Watch model load progress:

```bash
docker compose logs -f cf-docuvision
```

The service is ready when logs show `cf-docuvision: ready`.
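
If you would rather script the readiness check than watch logs, one option is to poll `GET /health` until it answers 200. The sketch below is an illustration and not part of cf-docuvision: the function names `probe_health` and `wait_until_ready`, the 600-second timeout, and the 5-second interval are assumptions. Only the `/health` path, port 8003, and the `"status": "ok"` body come from this README.

```python
import json
import time
import urllib.error
import urllib.request


def probe_health(base_url: str) -> bool:
    """Return True if GET {base_url}/health answers 200 with status "ok"."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            body = json.loads(resp.read().decode("utf-8"))
            return resp.status == 200 and body.get("status") == "ok"
    except (urllib.error.URLError, OSError, ValueError):
        # Covers connection refused, a 503 while the model loads
        # (urlopen raises HTTPError for it), and a non-JSON body.
        return False


def wait_until_ready(base_url, timeout_s=600.0, interval_s=5.0,
                     probe=probe_health, sleep=time.sleep):
    """Poll until the probe succeeds or timeout_s elapses.

    timeout_s and interval_s are illustrative defaults (assumptions);
    first startup may take longer because of the model download.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(base_url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Example call (performs real HTTP requests against a running service):
#   ready = wait_until_ready("http://localhost:8003")
```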
Confirm:

```bash
curl http://localhost:8003/health
# {"status": "ok", "model": "ByteDance/Dolphin-v2"}
```

---

## Direct Python run

```bash
pip install -r requirements.txt
CF_DOCUVISION_DEVICE=cuda uvicorn app.main:app --host 0.0.0.0 --port 8003
```

CPU fallback:

```bash
CF_DOCUVISION_DEVICE=cpu uvicorn app.main:app --host 0.0.0.0 --port 8003
```

---

## Configuration

| Variable | Default | Description |
|---|---|---|
| `CF_DOCUVISION_MODEL` | `ByteDance/Dolphin-v2` | HuggingFace model ID or local path |
| `CF_DOCUVISION_DEVICE` | `auto` | `cuda`, `cpu`, or `auto` (GPU if available) |
| `CF_DOCUVISION_PORT` | `8003` | Service port (Docker Compose only) |

To skip the HuggingFace download, set `CF_DOCUVISION_MODEL` to a local directory:

```bash
# Optional: uncomment the volume mount in compose.yml
#   - /Library/Assets/LLM/dolphin-v2:/models/dolphin-v2:ro
CF_DOCUVISION_MODEL=/models/dolphin-v2
```

---

## Connecting from a product

Set `CF_DOCUVISION_URL` in the product's `.env`:

```bash
CF_DOCUVISION_URL=http://localhost:8003
```

Products using cf-core's `DocuvisionClient` pick this up automatically.

> **Kiwi note:** Kiwi v0.10.x gates the docuvision call on `CF_ORCH_URL`; `CF_DOCUVISION_URL` is not yet read directly. The fix is tracked at [kiwi#150](https://git.opensourcesolarpunk.com/Circuit-Forge/kiwi/issues/150). Once that ships, set `CF_DOCUVISION_URL` in Kiwi's `.env` and leave `CF_ORCH_URL` unset.

---

## API reference

### `GET /health`

Returns 200 when the model is loaded and ready.

```json
{"status": "ok", "model": "ByteDance/Dolphin-v2"}
```

Returns 503 while the model is still loading at startup.

### `POST /extract`

Parse a document image into structured elements.
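
As a rough illustration of calling this endpoint from Python using only the standard library, the sketch below builds the JSON body and posts it. The field names `image_b64` and `hint` and the four hint values follow the request shape documented in this section. Everything else is an assumption: the helper names `build_extract_payload` and `extract`, the 180-second timeout, and the guess that `image_b64` carries plain base64 of the raw image file bytes with no data-URI prefix.

```python
import base64
import json
import urllib.request

# Hint values listed in this README's hint table.
VALID_HINTS = {"auto", "table", "text", "form"}


def build_extract_payload(image_bytes: bytes, hint: str = "auto") -> dict:
    """Build the JSON body for POST /extract.

    Assumption: image_b64 is standard base64 of the image file bytes.
    """
    if hint not in VALID_HINTS:
        raise ValueError(f"unknown hint: {hint!r}")
    return {
        "image_b64": base64.b64encode(image_bytes).decode("ascii"),
        "hint": hint,
    }


def extract(base_url: str, image_bytes: bytes, hint: str = "auto",
            timeout_s: float = 180.0) -> dict:
    """POST the image to {base_url}/extract and return the parsed JSON.

    timeout_s is an illustrative value chosen to leave room for CPU mode.
    """
    body = json.dumps(build_extract_payload(image_bytes, hint)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/extract",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (needs a running service and a real image file):
#   with open("invoice.png", "rb") as f:
#       result = extract("http://localhost:8003", f.read(), hint="table")
#   for element in result["elements"]:
#       print(element["type"], element["text"])
```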
**Request:**

```json
{
  "image_b64": "",
  "hint": "auto"
}
```

`hint` controls extraction focus:

| Value | Behaviour |
|---|---|
| `auto` | General parsing: balanced detection of all element types (default) |
| `table` | Prioritise HTML table rendering |
| `text` | Prioritise text content and heading hierarchy |
| `form` | Prioritise form fields and key-value pairs |

**Response:**

```json
{
  "elements": [
    {"type": "heading", "text": "Invoice", "bbox": [0.05, 0.02, 0.9, 0.08]},
    {"type": "paragraph", "text": "Due date: 2026-07-01", "bbox": [0.05, 0.10, 0.6, 0.14]}
  ],
  "tables": [
    {"html": "...", "bbox": [0.05, 0.20, 0.95, 0.60]}
  ],
  "raw_text": "Invoice\nDue date: 2026-07-01\n...",
  "metadata": {
    "source": "cf-docuvision",
    "model": "ByteDance/Dolphin-v2",
    "hint": "auto",
    "elapsed_ms": 1240
  }
}
```

Element types: `heading`, `paragraph`, `list`, `table`, `figure`, `formula`, `code`.

`bbox` values are normalised to [0, 1] relative to the image dimensions.

---

## Troubleshooting

**`CUDA out of memory` at startup**

Dolphin-v2 requires ~8 GB VRAM. Set `CF_DOCUVISION_DEVICE=cpu` to use CPU mode instead.

**`503 Model not loaded` on first request**

The model is still loading. Watch logs for `cf-docuvision: ready` before sending requests. The Docker healthcheck waits up to 120 seconds.

**Very slow processing**

CPU mode is expected to take 60–120 seconds per page. This is normal. If you need speed, a GPU is required.

**`trust_remote_code=True` warning**

Dolphin-v2 requires `trust_remote_code=True` for its custom architecture. The model is Apache 2.0 and auditable at [huggingface.co/ByteDance/Dolphin-v2](https://huggingface.co/ByteDance/Dolphin-v2).

---

## License

- cf-docuvision service: [MIT](LICENSE), CircuitForge LLC
- Dolphin-v2 model: [Apache 2.0](https://huggingface.co/ByteDance/Dolphin-v2), ByteDance