Wire receipt OCR to cf-docuvision managed service #10


Closed

opened 2026-04-01 22:59:37 -07:00 by pyr0ball · 1 comment

pyr0ball commented

2026-04-01 22:59:37 -07:00

Owner

Background

Kiwi currently loads Qwen2.5-VL-7B-Instruct in-process inside app/services/ocr/vl_model.py. The model holds its VRAM for as long as the process is alive, and that in-process instance cannot be shared with other CF products on the same GPU.

cf-orch already has a cf-docuvision service slot in all GPU profiles (6gb/8gb/16gb/24gb), backed by ByteDance/Dolphin-v2 (Qwen2.5-VL-3B, purpose-built document parser). Once cf-docuvision is implemented as a managed HTTP service, Kiwi should call it via HTTP rather than loading the model itself.

Tasks

Implement cf-docuvision HTTP service in circuitforge-core (tracked in circuitforge-core)
Add DOCUVISION_URL config key to app/core/config.py (default http://localhost:<cf-docuvision-port>/extract)
Create app/services/ocr/docuvision_client.py — thin HTTP client matching the same extract_receipt_data() interface as VisionLanguageOCR
Make ReceiptService select backend based on config: if DOCUVISION_URL is set and reachable, use HTTP client; else fall back to in-process VisionLanguageOCR
Remove in-process model loading once cf-docuvision is stable
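
A minimal sketch of the client and backend-selection tasks above, under stated assumptions. The cf-docuvision request and response format is not specified in this issue, so the raw-bytes POST and JSON reply are assumptions, as is the image_path parameter of extract_receipt_data(). The helpers is_reachable() and select_ocr_backend() are hypothetical names introduced for illustration.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Callable, Optional


class DocuvisionClient:
    """Thin HTTP client exposing extract_receipt_data(), like VisionLanguageOCR."""

    def __init__(self, url: str, timeout: float = 30.0) -> None:
        self.url = url
        self.timeout = timeout

    def is_reachable(self) -> bool:
        """Return True if anything answers HTTP at the configured URL."""
        try:
            urllib.request.urlopen(self.url, timeout=2.0).close()
            return True
        except urllib.error.HTTPError:
            # An HTTP error status still proves a server is listening.
            return True
        except (OSError, ValueError):
            # Connection refused, timeout, DNS failure, or malformed URL.
            return False

    def extract_receipt_data(self, image_path: str) -> dict:
        # ASSUMPTION: the service accepts raw image bytes and replies with JSON.
        with open(image_path, "rb") as fh:
            body = fh.read()
        req = urllib.request.Request(
            self.url,
            data=body,
            headers={"Content-Type": "application/octet-stream"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=self.timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))


def select_ocr_backend(
    docuvision_url: Optional[str],
    local_factory: Callable[[], Any],
) -> Any:
    """Use the HTTP client if DOCUVISION_URL is set and reachable, else local."""
    if docuvision_url:
        client = DocuvisionClient(docuvision_url)
        if client.is_reachable():
            return client
    # local_factory is called lazily so the in-process model is only
    # loaded when the managed service is not used.
    return local_factory()
```

In ReceiptService this might be called as select_ocr_backend(settings.DOCUVISION_URL, VisionLanguageOCR), where settings is a hypothetical name for the config object. Passing the class rather than an instance defers model loading until the fallback is actually needed.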

Interim state

Model upgraded from Qwen2-VL-2B-Instruct → Qwen2.5-VL-7B-Instruct in vl_model.py:43 — same API, better model, no pipeline changes. This runs until cf-docuvision HTTP service is live.

Related

circuitforge-core: implement cf-docuvision managed service


pyr0ball closed this issue

2026-04-02 22:13:45 -07:00

pyr0ball commented

2026-04-02 22:14:01 -07:00

Author

Owner

Closing — DocuvisionClient added with fast-path OCR in app/api/endpoints/ocr.py. Receipt upload triggers cf-docuvision when available, falls back to local pipeline. Shipped in commit 7aebe96.
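
The fallback behaviour described in this comment could look roughly like the sketch below. It is not the code from commit 7aebe96. The function name run_receipt_ocr and both callables are hypothetical, and the choice to treat OSError and ValueError as "service unavailable" is an assumption.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def run_receipt_ocr(
    image_path: str,
    fast_path: Optional[Callable[[str], dict]],
    local_pipeline: Callable[[str], dict],
) -> dict:
    """Try the cf-docuvision fast path first, then fall back to local OCR."""
    if fast_path is not None:
        try:
            return fast_path(image_path)
        except (OSError, ValueError) as exc:
            # OSError covers network failures; ValueError covers bad JSON.
            logger.warning("fast path failed, using local pipeline: %s", exc)
    return local_pipeline(image_path)
```

Passing fast_path=None models the case where cf-docuvision is not configured, so the local pipeline runs directly.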
