# cf-docuvision

Document parsing service for CircuitForge products. Parses scanned documents, PDFs, forms, and receipts into structured elements (headings, paragraphs, tables, figures) using [Dolphin-v2](https://huggingface.co/ByteDance/Dolphin-v2) (ByteDance, Apache 2.0).

**Status:** v0.1.0, production-ready for single-page documents.

---

## Prerequisites

### Hardware

| GPU VRAM | Result |
|----------|--------|
| 16 GB+ | Recommended: fast single-page parsing (1–3 seconds) |
| 8 GB | Minimum: works for most documents |
| Under 8 GB | Likely CUDA out-of-memory on model load |
| CPU only | Works, but expect 60–120 seconds per page |

If you are on CPU or have limited VRAM, set `CF_DOCUVISION_DEVICE=cpu` before starting. The service logs a warning and continues; CPU fallback is slow but functional.

### Model download

First startup downloads approximately **5–8 GB** from HuggingFace. Subsequent runs use the local cache. No HuggingFace account is required (the model is Apache 2.0 and not gated).

To speed up large downloads:

```bash
pip install hf-transfer
export HF_HUB_ENABLE_HF_TRANSFER=1
```

---

## Quick start (Docker Compose)

```bash
git clone https://git.opensourcesolarpunk.com/Circuit-Forge/cf-docuvision.git
cd cf-docuvision
cp .env.example .env   # edit if needed
docker compose up -d
```

Watch model load progress:

```bash
docker compose logs -f cf-docuvision
```

The service is ready when logs show `cf-docuvision: ready`. Confirm:

```bash
curl http://localhost:8003/health
# {"status": "ok", "model": "ByteDance/Dolphin-v2"}
```
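For scripted setups, the readiness check above can be automated. The sketch below polls `/health` until it reports `"status": "ok"`, using only the Python standard library. The helper names `fetch_health` and `wait_until_ready` and the 2-second polling interval are illustrative assumptions, not part of cf-docuvision; the 120-second default mirrors the Docker healthcheck window mentioned under Troubleshooting.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def fetch_health(base_url: str) -> Optional[dict]:
    """Return the parsed /health body, or None if the service is not ready."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Covers connection refused, HTTP 503 during model load, and bad JSON.
        return None


def wait_until_ready(
    base_url: str,
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    fetch: Callable[[str], Optional[dict]] = fetch_health,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll /health until status is "ok"; return False once the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        body = fetch(base_url)
        if body is not None and body.get("status") == "ok":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Example (requires a running service):
#   ready = wait_until_ready("http://localhost:8003")
```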
---

## Direct Python run

```bash
pip install -r requirements.txt
CF_DOCUVISION_DEVICE=cuda uvicorn app.main:app --host 0.0.0.0 --port 8003
```

CPU fallback:

```bash
CF_DOCUVISION_DEVICE=cpu uvicorn app.main:app --host 0.0.0.0 --port 8003
```

---

## Configuration

| Variable | Default | Description |
|---|---|---|
| `CF_DOCUVISION_MODEL` | `ByteDance/Dolphin-v2` | HuggingFace model ID or local path |
| `CF_DOCUVISION_DEVICE` | `auto` | `cuda`, `cpu`, or `auto` (GPU if available) |
| `CF_DOCUVISION_PORT` | `8003` | Service port (Docker Compose only) |

To skip the HuggingFace download, set `CF_DOCUVISION_MODEL` to a local directory:

```bash
# Optional: uncomment the volume mount in compose.yml so the container
# can see the local model directory:
#   - /Library/Assets/LLM/dolphin-v2:/models/dolphin-v2:ro
# Then point the service at the mounted path:
CF_DOCUVISION_MODEL=/models/dolphin-v2
```

---

## Connecting from a product

Set `CF_DOCUVISION_URL` in the product's `.env`:

```bash
CF_DOCUVISION_URL=http://localhost:8003
```

Products using cf-core's `DocuvisionClient` pick this up automatically.

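For a product that does not go through cf-core, the same wiring might look like the sketch below. This is not `DocuvisionClient`'s actual implementation: `resolve_docuvision_url` is a hypothetical helper that reads `CF_DOCUVISION_URL` and returns `None` when the variable is unset, so the caller can skip document parsing.

```python
import os
from typing import Mapping, Optional


def resolve_docuvision_url(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the configured cf-docuvision base URL, or None if unset.

    Hypothetical helper for illustration; not part of cf-core.
    """
    source = os.environ if env is None else env
    url = source.get("CF_DOCUVISION_URL", "").strip()
    # Drop a trailing slash so callers can append "/health" or "/extract".
    return url.rstrip("/") or None
```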
> **Kiwi note:** Kiwi v0.10.x gates the docuvision call on `CF_ORCH_URL`; `CF_DOCUVISION_URL` is not yet read directly. The fix is tracked at [kiwi#150](https://git.opensourcesolarpunk.com/Circuit-Forge/kiwi/issues/150). Once that ships, set `CF_DOCUVISION_URL` in Kiwi's `.env` and leave `CF_ORCH_URL` unset.
---

## API reference

### `GET /health`

Returns 200 when the model is loaded and ready.

```json
{"status": "ok", "model": "ByteDance/Dolphin-v2"}
```

Returns 503 while the model is still loading at startup.

### `POST /extract`

Parse a document image into structured elements.

**Request:**

```json
{
  "image_b64": "<base64-encoded image bytes (JPEG, PNG, TIFF)>",
  "hint": "auto"
}
```

`hint` controls extraction focus:

| Value | Behaviour |
|---|---|
| `auto` | General parsing: balanced detection of all element types (default) |
| `table` | Prioritise HTML table rendering |
| `text` | Prioritise text content and heading hierarchy |
| `form` | Prioritise form fields and key-value pairs |

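A request like the one above can be built and sent with the Python standard library alone. The sketch below is illustrative: `build_extract_payload` and `extract` are hypothetical helper names, the client-side hint check and the `Content-Type: application/json` header are assumptions, and the 180-second timeout is chosen only to leave room for the CPU-mode timings quoted in this README.

```python
import base64
import json
import urllib.request

# Hint values listed in the table above.
VALID_HINTS = ("auto", "table", "text", "form")


def build_extract_payload(image_bytes: bytes, hint: str = "auto") -> dict:
    """Build the JSON body for POST /extract from raw image bytes."""
    if hint not in VALID_HINTS:
        # Client-side check only; how the service handles unknown hints
        # is not documented here.
        raise ValueError(f"unknown hint {hint!r}; expected one of {VALID_HINTS}")
    return {
        "image_b64": base64.b64encode(image_bytes).decode("ascii"),
        "hint": hint,
    }


def extract(base_url: str, image_bytes: bytes, hint: str = "auto",
            timeout_s: float = 180.0) -> dict:
    """POST an image to /extract and return the parsed JSON response."""
    body = json.dumps(build_extract_payload(image_bytes, hint)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/extract",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires a running service; "invoice.png" is a placeholder file):
#   with open("invoice.png", "rb") as fh:
#       result = extract("http://localhost:8003", fh.read(), hint="table")
```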
**Response:**

```json
{
  "elements": [
    {"type": "heading", "text": "Invoice", "bbox": [0.05, 0.02, 0.9, 0.08]},
    {"type": "paragraph", "text": "Due date: 2026-07-01", "bbox": [0.05, 0.10, 0.6, 0.14]}
  ],
  "tables": [
    {"html": "<table>...</table>", "bbox": [0.05, 0.20, 0.95, 0.60]}
  ],
  "raw_text": "Invoice\nDue date: 2026-07-01\n...",
  "metadata": {
    "source": "cf-docuvision",
    "model": "ByteDance/Dolphin-v2",
    "hint": "auto",
    "elapsed_ms": 1240
  }
}
```
Element types: `heading`, `paragraph`, `list`, `table`, `figure`, `formula`, `code`.

`bbox` values are normalised to [0, 1] relative to the image dimensions.

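To crop or highlight an element, the normalised box has to be scaled back to pixels. The sketch below assumes the four values are ordered `[x0, y0, x1, y1]` (left, top, right, bottom); that ordering is not stated in this README, so verify it against real output before relying on it. `bbox_to_pixels` is a hypothetical helper name.

```python
from typing import Sequence, Tuple


def bbox_to_pixels(bbox: Sequence[float], width: int, height: int) -> Tuple[int, int, int, int]:
    """Scale a normalised bbox to integer pixel coordinates.

    Assumes the order [x0, y0, x1, y1] (left, top, right, bottom).
    """
    x0, y0, x1, y1 = bbox
    return (round(x0 * width), round(y0 * height), round(x1 * width), round(y1 * height))


# The heading bbox from the example response, on a 1000x2000 pixel image.
print(bbox_to_pixels([0.05, 0.02, 0.9, 0.08], 1000, 2000))  # (50, 40, 900, 160)
```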
---

## Troubleshooting

**`CUDA out of memory` at startup**
Dolphin-v2 requires about 8 GB of VRAM. Set `CF_DOCUVISION_DEVICE=cpu` to run in CPU mode instead.

**`503 Model not loaded` on first request**
The model is still loading. Wait for `cf-docuvision: ready` in the logs before sending requests. The Docker healthcheck waits up to 120 seconds.

**Very slow processing**
CPU mode takes 60–120 seconds per page; this is expected. For faster parsing, run on a GPU.

**`trust_remote_code=True` warning**
Dolphin-v2 requires `trust_remote_code=True` to load its custom architecture. The model is Apache 2.0 licensed and can be audited at [huggingface.co/ByteDance/Dolphin-v2](https://huggingface.co/ByteDance/Dolphin-v2).

---

## License

- cf-docuvision service: [MIT](LICENSE), CircuitForge LLC
- Dolphin-v2 model: [Apache 2.0](https://huggingface.co/ByteDance/Dolphin-v2), ByteDance