cf-docuvision/README.md
pyr0ball cf0e2fa649 docs: add README and MIT LICENSE
Covers hardware requirements, Docker Compose quickstart, /extract API
reference, CF_DOCUVISION_URL wiring, and kiwi#150 callout for the
self-hosted CF_ORCH_URL code gap.
2026-06-05 11:59:25 -07:00

# cf-docuvision
Document parsing service for CircuitForge products. Parses scanned documents, PDFs, forms, and receipts into structured elements (headings, paragraphs, tables, figures) using [Dolphin-v2](https://huggingface.co/ByteDance/Dolphin-v2) (ByteDance, Apache 2.0).
**Status:** v0.1.0 — production-ready for single-page documents.
---
## Prerequisites
### Hardware
| GPU VRAM | Result |
|----------|--------|
| 16GB+ | Recommended: fast single-page parsing (1 to 3 seconds) |
| 8GB | Minimum: works for most documents |
| Under 8GB | Likely CUDA out-of-memory on model load |
| CPU only | Works, but expect 60 to 120 seconds per page |
If you are on CPU or have limited VRAM, set `CF_DOCUVISION_DEVICE=cpu` before starting. The service logs a warning and continues — CPU fallback is slow but functional.
### Model download
First startup downloads approximately **5 to 8 GB** from HuggingFace. Subsequent runs use the local cache. No HuggingFace account is required (the model is Apache 2.0, not gated).
To speed up large downloads:
```bash
pip install hf-transfer
export HF_HUB_ENABLE_HF_TRANSFER=1
```
---
## Quick start (Docker Compose)
```bash
git clone https://git.opensourcesolarpunk.com/Circuit-Forge/cf-docuvision.git
cd cf-docuvision
cp .env.example .env # edit if needed
docker compose up -d
```
Watch model load progress:
```bash
docker compose logs -f cf-docuvision
```
The service is ready when logs show `cf-docuvision: ready`. Confirm:
```bash
curl http://localhost:8003/health
# {"status": "ok", "model": "ByteDance/Dolphin-v2"}
```
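For scripted deployments, the same readiness check can be polled from Python. This is a minimal sketch using only the standard library. The function names `fetch_health` and `wait_until_ready` are illustrative and not part of cf-docuvision, and the 120-second default simply mirrors the Docker healthcheck window mentioned under Troubleshooting; a first run that still has to download the model may need longer.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_health(base_url):
    """Return (status_code, parsed_body) for GET /health, or (None, None) if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # A 503 while the model is still loading surfaces as an HTTPError.
        return err.code, None
    except (urllib.error.URLError, OSError, ValueError):
        # Service not accepting connections yet, or the body was not JSON.
        return None, None


def wait_until_ready(base_url, timeout_s=120.0, interval_s=2.0,
                     fetch=fetch_health, sleep=time.sleep):
    """Poll /health until it reports status "ok"; return False once timeout_s has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        status, body = fetch(base_url)
        if status == 200 and isinstance(body, dict) and body.get("status") == "ok":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Example (requires a running service):
#   wait_until_ready("http://localhost:8003")
```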
---
## Direct Python run
```bash
pip install -r requirements.txt
CF_DOCUVISION_DEVICE=cuda uvicorn app.main:app --host 0.0.0.0 --port 8003
```
CPU fallback:
```bash
CF_DOCUVISION_DEVICE=cpu uvicorn app.main:app --host 0.0.0.0 --port 8003
```
---
## Configuration
| Variable | Default | Description |
|---|---|---|
| `CF_DOCUVISION_MODEL` | `ByteDance/Dolphin-v2` | HuggingFace model ID or local path |
| `CF_DOCUVISION_DEVICE` | `auto` | `cuda`, `cpu`, or `auto` (GPU if available) |
| `CF_DOCUVISION_PORT` | `8003` | Service port (Docker Compose only) |
To skip HuggingFace download, set `CF_DOCUVISION_MODEL` to a local directory:
```bash
# Optional: uncomment the volume mount in compose.yml
# - /Library/Assets/LLM/dolphin-v2:/models/dolphin-v2:ro
CF_DOCUVISION_MODEL=/models/dolphin-v2
```
---
## Connecting from a product
Set `CF_DOCUVISION_URL` in the product's `.env`:
```bash
CF_DOCUVISION_URL=http://localhost:8003
```
Products using cf-core's `DocuvisionClient` pick this up automatically.
> **Kiwi note:** Kiwi v0.10.x gates the docuvision call on `CF_ORCH_URL` — `CF_DOCUVISION_URL` is not yet read directly. Fix is tracked at [kiwi#150](https://git.opensourcesolarpunk.com/Circuit-Forge/kiwi/issues/150). Once that ships, set `CF_DOCUVISION_URL` in Kiwi's `.env` and leave `CF_ORCH_URL` unset.
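
For a product that does not use cf-core, the variable can be read by hand. The sketch below is an assumption about how a caller might do this and does not reproduce what `DocuvisionClient` does internally; `docuvision_base_url` is a hypothetical helper name.

```python
import os


def docuvision_base_url(env=None):
    """Return CF_DOCUVISION_URL without a trailing slash, or None when it is unset or blank."""
    env = os.environ if env is None else env
    raw = (env.get("CF_DOCUVISION_URL") or "").strip()
    return raw.rstrip("/") or None


# Example: treat a missing URL as "docuvision disabled" rather than guessing a host.
#   base_url = docuvision_base_url()
#   if base_url is None:
#       ...skip document parsing...
```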
---
## API reference
### `GET /health`
Returns 200 when the model is loaded and ready.
```json
{"status": "ok", "model": "ByteDance/Dolphin-v2"}
```
Returns 503 while the model is still loading at startup.
### `POST /extract`
Parse a document image into structured elements.
**Request:**
```json
{
"image_b64": "<base64-encoded image bytes (JPEG, PNG, TIFF)>",
"hint": "auto"
}
```
`hint` controls extraction focus:
| Value | Behaviour |
|---|---|
| `auto` | General parsing — balanced detection of all element types (default) |
| `table` | Prioritise HTML table rendering |
| `text` | Prioritise text content and heading hierarchy |
| `form` | Prioritise form fields and key-value pairs |
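
A request like the one above could be built and sent from Python roughly as follows. This sketch makes several assumptions: `image_b64` is taken to be plain standard base64 with no `data:` prefix, the service is assumed to accept a JSON body with `Content-Type: application/json`, and the helper names `build_extract_payload` and `extract` are hypothetical. The long default timeout leaves room for CPU mode.

```python
import base64
import json
import urllib.request

# Hint values listed in the table above.
VALID_HINTS = {"auto", "table", "text", "form"}


def build_extract_payload(image_bytes, hint="auto"):
    """Build the JSON-serialisable body for POST /extract."""
    if hint not in VALID_HINTS:
        raise ValueError(f"unknown hint {hint!r}; expected one of {sorted(VALID_HINTS)}")
    return {
        "image_b64": base64.b64encode(image_bytes).decode("ascii"),
        "hint": hint,
    }


def extract(base_url, image_bytes, hint="auto", timeout_s=300):
    """POST the image to /extract and return the parsed JSON response."""
    body = json.dumps(build_extract_payload(image_bytes, hint)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/extract",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (requires a running service and a local image file):
#   with open("invoice.png", "rb") as fh:
#       result = extract("http://localhost:8003", fh.read(), hint="table")
```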
**Response:**
```json
{
"elements": [
{"type": "heading", "text": "Invoice", "bbox": [0.05, 0.02, 0.9, 0.08]},
{"type": "paragraph", "text": "Due date: 2026-07-01", "bbox": [0.05, 0.10, 0.6, 0.14]}
],
"tables": [
{"html": "<table>...</table>", "bbox": [0.05, 0.20, 0.95, 0.60]}
],
"raw_text": "Invoice\nDue date: 2026-07-01\n...",
"metadata": {
"source": "cf-docuvision",
"model": "ByteDance/Dolphin-v2",
"hint": "auto",
"elapsed_ms": 1240
}
}
```
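Each entry in `tables` carries its content as an HTML string. One way to turn that into rows of cell text is the standard-library `html.parser`, sketched below. The sketch assumes the HTML uses plain `<tr>`, `<th>`, and `<td>` tags, and it ignores `colspan`, `rowspan`, and nested tables; `table_to_rows` is an illustrative name, not part of the service.

```python
from html.parser import HTMLParser


class _TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_rows(html):
    """Return a list of rows, each a list of stripped cell strings."""
    parser = _TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Example:
#   for table in response["tables"]:
#       rows = table_to_rows(table["html"])
```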
Element types: `heading`, `paragraph`, `list`, `table`, `figure`, `formula`, `code`.
`bbox` values are normalised to [0, 1] relative to the image dimensions.
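
To draw or crop a region, the normalised values need scaling back to pixels. The sketch below assumes the four numbers are ordered `[x0, y0, x1, y1]` (left, top, right, bottom). That ordering is not stated here, and the sample values would also fit an `[x, y, width, height]` layout, so verify it against real output before relying on it. `bbox_to_pixels` is a hypothetical helper.

```python
def bbox_to_pixels(bbox, width, height):
    """Scale a normalised bbox to integer pixel coordinates.

    Assumes bbox is [x0, y0, x1, y1] with every value in [0, 1].
    """
    x0, y0, x1, y1 = bbox
    return (
        round(x0 * width),
        round(y0 * height),
        round(x1 * width),
        round(y1 * height),
    )


# Example: the "Invoice" heading from the sample response on a 1000x2000 px image.
#   bbox_to_pixels([0.05, 0.02, 0.9, 0.08], 1000, 2000)
```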
---
## Troubleshooting
**`CUDA out of memory` at startup**
Dolphin-v2 requires ~8GB VRAM. Set `CF_DOCUVISION_DEVICE=cpu` to use CPU mode instead.
**`503 Model not loaded` on first request**
The model is still loading. Watch logs for `cf-docuvision: ready` before sending requests. The Docker healthcheck waits up to 120 seconds.
**Very slow processing**
CPU mode takes roughly 60 to 120 seconds per page, which is normal. Faster parsing requires a GPU.
**`trust_remote_code=True` warning**
Dolphin-v2 requires `trust_remote_code=True` for its custom architecture. The model is Apache 2.0 and auditable at [huggingface.co/ByteDance/Dolphin-v2](https://huggingface.co/ByteDance/Dolphin-v2).
---
## License
- cf-docuvision service: [MIT](LICENSE) — CircuitForge LLC
- Dolphin-v2 model: [Apache 2.0](https://huggingface.co/ByteDance/Dolphin-v2) — ByteDance