docs: add README and MIT LICENSE
Covers hardware requirements, Docker Compose quickstart, /extract API reference, CF_DOCUVISION_URL wiring, and kiwi#150 callout for the self-hosted CF_ORCH_URL code gap.
parent 47d4dfc786
commit cf0e2fa649
2 changed files with 207 additions and 0 deletions
21
LICENSE
Normal file
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 CircuitForge LLC

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
186
README.md
Normal file
@@ -0,0 +1,186 @@
# cf-docuvision

Document parsing service for CircuitForge products. Parses scanned documents, PDFs, forms, and receipts into structured elements (headings, paragraphs, tables, figures) using [Dolphin-v2](https://huggingface.co/ByteDance/Dolphin-v2) (ByteDance, Apache 2.0).

**Status:** v0.1.0, production-ready for single-page documents.

---

## Prerequisites

### Hardware

| GPU VRAM | Result |
|----------|--------|
| 16 GB+ | Recommended: fast single-page parsing (1–3 seconds) |
| 8 GB | Minimum: works for most documents |
| Under 8 GB | Likely CUDA out-of-memory on model load |
| CPU only | Works, but expect 60–120 seconds per page |

If you are on CPU or have limited VRAM, set `CF_DOCUVISION_DEVICE=cpu` before starting. The service logs a warning and continues — CPU fallback is slow but functional.
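
The `auto`, `cuda`, and `cpu` values of `CF_DOCUVISION_DEVICE` can be pictured as a small resolution step. The following is a minimal sketch of that idea, not the service's actual code: `resolve_device` and `VALID_DEVICES` are hypothetical names, and the CUDA check is passed in as a plain boolean so the example runs without PyTorch installed.

```python
VALID_DEVICES = ("auto", "cuda", "cpu")  # values listed for CF_DOCUVISION_DEVICE


def resolve_device(requested: str, cuda_available: bool) -> str:
    """Map a CF_DOCUVISION_DEVICE value to a concrete device name.

    Hypothetical helper: "auto" picks the GPU when one is available,
    explicit values are passed through unchanged.
    """
    requested = requested.strip().lower()
    if requested not in VALID_DEVICES:
        raise ValueError(
            f"unsupported device {requested!r}; expected one of {VALID_DEVICES}"
        )
    if requested == "auto":
        return "cuda" if cuda_available else "cpu"
    return requested
```

In a real process the two arguments would typically come from `os.environ.get("CF_DOCUVISION_DEVICE", "auto")` and `torch.cuda.is_available()`.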

### Model download

First startup downloads approximately **5–8 GB** from HuggingFace. Subsequent runs use the local cache. No HuggingFace account required (model is Apache 2.0, not gated).

To speed up large downloads:

```bash
pip install hf-transfer
export HF_HUB_ENABLE_HF_TRANSFER=1
```

---

## Quick start (Docker Compose)

```bash
git clone https://git.opensourcesolarpunk.com/Circuit-Forge/cf-docuvision.git
cd cf-docuvision
cp .env.example .env  # edit if needed
docker compose up -d
```

Watch model load progress:

```bash
docker compose logs -f cf-docuvision
```

The service is ready when logs show `cf-docuvision: ready`. Confirm:

```bash
curl http://localhost:8003/health
# {"status": "ok", "model": "ByteDance/Dolphin-v2"}
```
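
For scripted setups, the same readiness check can be automated by polling `/health` until it answers 200. This is a minimal sketch using only the standard library; `probe_health` and `wait_until_ready` are hypothetical helper names, and the 120-second default simply mirrors the Docker healthcheck window mentioned under Troubleshooting.

```python
import json
import time
import urllib.request
from typing import Callable


def probe_health(base_url: str = "http://localhost:8003") -> bool:
    """Return True once GET /health answers 200 with status "ok"."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            body = json.loads(resp.read().decode("utf-8"))
            return resp.status == 200 and body.get("status") == "ok"
    except (OSError, ValueError):
        # OSError covers urllib's URLError/HTTPError (e.g. 503 while the
        # model loads, or connection refused); ValueError covers bad JSON.
        return False


def wait_until_ready(
    probe: Callable[[], bool] = probe_health,
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` repeatedly until it succeeds or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Calling `wait_until_ready()` with no arguments polls the default local address; the `probe` and `sleep` parameters exist so the loop can be exercised without a running service.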

---

## Direct Python run

```bash
pip install -r requirements.txt
CF_DOCUVISION_DEVICE=cuda uvicorn app.main:app --host 0.0.0.0 --port 8003
```

CPU fallback:

```bash
CF_DOCUVISION_DEVICE=cpu uvicorn app.main:app --host 0.0.0.0 --port 8003
```

---

## Configuration

| Variable | Default | Description |
|---|---|---|
| `CF_DOCUVISION_MODEL` | `ByteDance/Dolphin-v2` | HuggingFace model ID or local path |
| `CF_DOCUVISION_DEVICE` | `auto` | `cuda`, `cpu`, or `auto` (GPU if available) |
| `CF_DOCUVISION_PORT` | `8003` | Service port (Docker Compose only) |

To skip the HuggingFace download, set `CF_DOCUVISION_MODEL` to a local directory:

```bash
# Optional: uncomment the volume mount in compose.yml
#   - /Library/Assets/LLM/dolphin-v2:/models/dolphin-v2:ro
CF_DOCUVISION_MODEL=/models/dolphin-v2
```

---

## Connecting from a product

Set `CF_DOCUVISION_URL` in the product's `.env`:

```bash
CF_DOCUVISION_URL=http://localhost:8003
```

Products using cf-core's `DocuvisionClient` pick this up automatically.

> **Kiwi note:** Kiwi v0.10.x gates the docuvision call on `CF_ORCH_URL` — `CF_DOCUVISION_URL` is not yet read directly. Fix is tracked at [kiwi#150](https://git.opensourcesolarpunk.com/Circuit-Forge/kiwi/issues/150). Once that ships, set `CF_DOCUVISION_URL` in Kiwi's `.env` and leave `CF_ORCH_URL` unset.
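
For a product that does not use cf-core, reading `CF_DOCUVISION_URL` by hand could look like the sketch below. This is not how `DocuvisionClient` is implemented; `docuvision_endpoint` and `DEFAULT_URL` are hypothetical names, and falling back to `http://localhost:8003` when the variable is unset is an assumption made for illustration.

```python
import os
from typing import Mapping, Optional

DEFAULT_URL = "http://localhost:8003"  # assumed fallback, not documented behaviour


def docuvision_endpoint(path: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Build a full URL for `path` from CF_DOCUVISION_URL (hypothetical helper)."""
    env = os.environ if env is None else env
    base = env.get("CF_DOCUVISION_URL", "").strip() or DEFAULT_URL
    # Normalise slashes so both "http://host:8003" and "http://host:8003/" work.
    return base.rstrip("/") + "/" + path.lstrip("/")
```

For example, `docuvision_endpoint("/extract")` returns `http://localhost:8003/extract` when the variable is unset.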

---

## API reference

### `GET /health`

Returns 200 when the model is loaded and ready.

```json
{"status": "ok", "model": "ByteDance/Dolphin-v2"}
```

Returns 503 while the model is still loading at startup.

### `POST /extract`

Parse a document image into structured elements.

**Request:**

```json
{
  "image_b64": "<base64-encoded image bytes (JPEG, PNG, TIFF)>",
  "hint": "auto"
}
```

`hint` controls extraction focus:

| Value | Behaviour |
|---|---|
| `auto` | General parsing: balanced detection of all element types (default) |
| `table` | Prioritise HTML table rendering |
| `text` | Prioritise text content and heading hierarchy |
| `form` | Prioritise form fields and key-value pairs |
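
Building the request body from raw image bytes can be sketched as follows. The field names and `hint` values come from the tables above; the helper names `build_extract_request` and `post_extract`, the `Content-Type: application/json` header, and the 300-second timeout are assumptions for illustration, not part of the documented API.

```python
import base64
import json
import urllib.request

VALID_HINTS = ("auto", "table", "text", "form")


def build_extract_request(image_bytes: bytes, hint: str = "auto") -> bytes:
    """Encode image bytes and a hint as the JSON body for POST /extract."""
    if hint not in VALID_HINTS:
        raise ValueError(f"unknown hint {hint!r}; expected one of {VALID_HINTS}")
    payload = {
        "image_b64": base64.b64encode(image_bytes).decode("ascii"),
        "hint": hint,
    }
    return json.dumps(payload).encode("utf-8")


def post_extract(
    image_bytes: bytes,
    hint: str = "auto",
    base_url: str = "http://localhost:8003",
) -> dict:
    """Send the request and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{base_url}/extract",
        data=build_extract_request(image_bytes, hint),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )
    # Generous timeout: CPU mode may take 60-120 seconds per page.
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A call such as `post_extract(open("invoice.png", "rb").read(), hint="table")` would then return the response structure shown in this section, assuming the service is running locally.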

**Response:**

```json
{
  "elements": [
    {"type": "heading", "text": "Invoice", "bbox": [0.05, 0.02, 0.9, 0.08]},
    {"type": "paragraph", "text": "Due date: 2026-07-01", "bbox": [0.05, 0.10, 0.6, 0.14]}
  ],
  "tables": [
    {"html": "<table>...</table>", "bbox": [0.05, 0.20, 0.95, 0.60]}
  ],
  "raw_text": "Invoice\nDue date: 2026-07-01\n...",
  "metadata": {
    "source": "cf-docuvision",
    "model": "ByteDance/Dolphin-v2",
    "hint": "auto",
    "elapsed_ms": 1240
  }
}
```

Element types: `heading`, `paragraph`, `list`, `table`, `figure`, `formula`, `code`.

`bbox` values are normalised to [0, 1] relative to the image dimensions.
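
Converting a normalised `bbox` back to pixel coordinates is a matter of scaling by the image size. The sketch below assumes the four values are ordered `[x0, y0, x1, y1]` (left, top, right, bottom); the README does not state the ordering, so verify it against real output before relying on this. `bbox_to_pixels` is a hypothetical helper name.

```python
from typing import Sequence, Tuple


def bbox_to_pixels(
    bbox: Sequence[float], width: int, height: int
) -> Tuple[int, int, int, int]:
    """Scale a normalised bbox to pixels, assuming [x0, y0, x1, y1] order."""

    def clamp(value: float) -> float:
        # Guard against values slightly outside [0, 1].
        return min(1.0, max(0.0, value))

    x0, y0, x1, y1 = (clamp(v) for v in bbox)
    return (
        round(x0 * width),
        round(y0 * height),
        round(x1 * width),
        round(y1 * height),
    )
```

With the heading from the example response on a 1000×2000 pixel image, `bbox_to_pixels([0.05, 0.02, 0.9, 0.08], 1000, 2000)` gives `(50, 40, 900, 160)`.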

---

## Troubleshooting

**`CUDA out of memory` at startup**
Dolphin-v2 requires about 8 GB of VRAM. Set `CF_DOCUVISION_DEVICE=cpu` to use CPU mode instead.

**`503 Model not loaded` on first request**
The model is still loading. Watch logs for `cf-docuvision: ready` before sending requests. The Docker healthcheck waits up to 120 seconds.

**Very slow processing**
CPU mode takes 60–120 seconds per page; this is expected. For faster processing, run on a GPU.

**`trust_remote_code=True` warning**
Dolphin-v2 requires `trust_remote_code=True` for its custom architecture. The model is Apache 2.0 and auditable at [huggingface.co/ByteDance/Dolphin-v2](https://huggingface.co/ByteDance/Dolphin-v2).

---

## License

- cf-docuvision service: [MIT](LICENSE), CircuitForge LLC
- Dolphin-v2 model: [Apache 2.0](https://huggingface.co/ByteDance/Dolphin-v2), ByteDance