diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..15622c4
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2024 CircuitForge LLC
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..e0863ed
--- /dev/null
+++ b/README.md
@@ -0,0 +1,186 @@
+# cf-docuvision
+
+Document parsing service for CircuitForge products. Parses scanned documents, PDFs, forms, and receipts into structured elements (headings, paragraphs, tables, figures) using [Dolphin-v2](https://huggingface.co/ByteDance/Dolphin-v2) (ByteDance, Apache 2.0).
+
+**Status:** v0.1.0 — production-ready for single-page documents.
+
+---
+
+## Prerequisites
+
+### Hardware
+
+| GPU VRAM | Result |
+|----------|--------|
+| 16GB+ | Recommended — fast single-page parsing (1–3 seconds) |
+| 8GB | Minimum — works for most documents |
+| Under 8GB | Likely CUDA out-of-memory on model load |
+| CPU only | Works — expect 60–120 seconds per page |
+
+If you are on CPU or have limited VRAM, set `CF_DOCUVISION_DEVICE=cpu` before starting. The service logs a warning and continues — CPU fallback is slow but functional.
+
+### Model download
+
+First startup downloads approximately **5–8 GB** from HuggingFace. Subsequent runs use the local cache. No HuggingFace account required (model is Apache 2.0, not gated).
+
+To speed up large downloads:
+
+```bash
+pip install hf-transfer
+export HF_HUB_ENABLE_HF_TRANSFER=1
+```
+
+---
+
+## Quick start (Docker Compose)
+
+```bash
+git clone https://git.opensourcesolarpunk.com/Circuit-Forge/cf-docuvision.git
+cd cf-docuvision
+cp .env.example .env # edit if needed
+docker compose up -d
+```
+
+Watch model load progress:
+
+```bash
+docker compose logs -f cf-docuvision
+```
+
+The service is ready when logs show `cf-docuvision: ready`. Confirm:
+
+```bash
+curl http://localhost:8003/health
+# {"status": "ok", "model": "ByteDance/Dolphin-v2"}
+```
+
+---
+
+## Direct Python run
+
+```bash
+pip install -r requirements.txt
+CF_DOCUVISION_DEVICE=cuda uvicorn app.main:app --host 0.0.0.0 --port 8003
+```
+
+CPU fallback:
+
+```bash
+CF_DOCUVISION_DEVICE=cpu uvicorn app.main:app --host 0.0.0.0 --port 8003
+```
+
+---
+
+## Configuration
+
+| Variable | Default | Description |
+|---|---|---|
+| `CF_DOCUVISION_MODEL` | `ByteDance/Dolphin-v2` | HuggingFace model ID or local path |
+| `CF_DOCUVISION_DEVICE` | `auto` | `cuda`, `cpu`, or `auto` (GPU if available) |
+| `CF_DOCUVISION_PORT` | `8003` | Service port (Docker Compose only) |
+
+To skip HuggingFace download, set `CF_DOCUVISION_MODEL` to a local directory:
+
+```bash
+# Optional: uncomment the volume mount in compose.yml
+# - /Library/Assets/LLM/dolphin-v2:/models/dolphin-v2:ro
+CF_DOCUVISION_MODEL=/models/dolphin-v2
+```
+
+---
+
+## Connecting from a product
+
+Set `CF_DOCUVISION_URL` in the product's `.env`:
+
+```bash
+CF_DOCUVISION_URL=http://localhost:8003
+```
+
+Products using cf-core's `DocuvisionClient` pick this up automatically.
+
+> **Kiwi note:** Kiwi v0.10.x gates the docuvision call on `CF_ORCH_URL` — `CF_DOCUVISION_URL` is not yet read directly. Fix is tracked at [kiwi#150](https://git.opensourcesolarpunk.com/Circuit-Forge/kiwi/issues/150). Once that ships, set `CF_DOCUVISION_URL` in Kiwi's `.env` and leave `CF_ORCH_URL` unset.
+
+---
+
+## API reference
+
+### `GET /health`
+
+Returns 200 when the model is loaded and ready.
+
+```json
+{"status": "ok", "model": "ByteDance/Dolphin-v2"}
+```
+
+Returns 503 while the model is still loading at startup.
+
+### `POST /extract`
+
+Parse a document image into structured elements.
+
+**Request:**
+
+```json
+{
+  "image_b64": "",
+  "hint": "auto"
+}
+```
+
+`hint` controls extraction focus:
+
+| Value | Behaviour |
+|---|---|
+| `auto` | General parsing — balanced detection of all element types (default) |
+| `table` | Prioritise HTML table rendering |
+| `text` | Prioritise text content and heading hierarchy |
+| `form` | Prioritise form fields and key-value pairs |
+
+**Response:**
+
+```json
+{
+  "elements": [
+    {"type": "heading", "text": "Invoice", "bbox": [0.05, 0.02, 0.9, 0.08]},
+    {"type": "paragraph", "text": "Due date: 2026-07-01", "bbox": [0.05, 0.10, 0.6, 0.14]}
+  ],
+  "tables": [
+    {"html": "...", "bbox": [0.05, 0.20, 0.95, 0.60]}
+  ],
+  "raw_text": "Invoice\nDue date: 2026-07-01\n...",
+  "metadata": {
+    "source": "cf-docuvision",
+    "model": "ByteDance/Dolphin-v2",
+    "hint": "auto",
+    "elapsed_ms": 1240
+  }
+}
+```
+
+Element types: `heading`, `paragraph`, `list`, `table`, `figure`, `formula`, `code`.
+
+`bbox` values are normalised to [0, 1] relative to the image dimensions.
+
+---
+
+## Troubleshooting
+
+**`CUDA out of memory` at startup**
+Dolphin-v2 requires ~8GB VRAM. Set `CF_DOCUVISION_DEVICE=cpu` to use CPU mode instead.
+
+**`503 Model not loaded` on first request**
+The model is still loading. Watch logs for `cf-docuvision: ready` before sending requests. The Docker healthcheck waits up to 120 seconds.
+
+**Very slow processing**
+CPU mode is expected to take 60–120 seconds per page. This is normal. If you need speed, a GPU is required.
+
+**`trust_remote_code=True` warning**
+Dolphin-v2 requires `trust_remote_code=True` for its custom architecture. The model is Apache 2.0 and auditable at [huggingface.co/ByteDance/Dolphin-v2](https://huggingface.co/ByteDance/Dolphin-v2).
+
+---
+
+## License
+
+- cf-docuvision service: [MIT](LICENSE) — CircuitForge LLC
+- Dolphin-v2 model: [Apache 2.0](https://huggingface.co/ByteDance/Dolphin-v2) — ByteDance