pipeline: multimodal chunked pipeline — cf-docuvision page chunks → cf-text streaming #42

Closed

opened 2026-04-08 22:18:29 -07:00 by pyr0ball · 0 comments

Problem

For multi-page documents, the current ingest() → generate() pattern processes the entire document before any text is generated. For a 10-page resume or government form, that is a long wait before the UI shows anything.

Desired behaviour

Page 1 extracted → text streamed to UI
Page 2 extracted → text streamed to UI
...

Design sketch

The pipeline module (currently a staging stub in cf-core) needs a MultimodalPipeline that:

1. Accepts an image source (bytes or iterable of page bytes)
2. Calls cf-docuvision per page (or per chunk)
3. Feeds each StructuredDocument.raw_text chunk into cf-text generate_stream()
4. Yields (page_idx, token) tuples to the caller for progressive UI rendering
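
A minimal sketch of the steps above, to make the generator shape concrete. Only `MultimodalPipeline`, `StructuredDocument.raw_text`, `generate_stream()` and the `(page_idx, token)` tuple come from this issue. The constructor arguments, the `run()` method name and the callable signatures are assumptions for illustration, not the actual cf-docuvision or cf-text APIs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple, Union


@dataclass
class StructuredDocument:
    # Stand-in for the cf-docuvision result; only raw_text is named in the issue.
    raw_text: str


# Hypothetical callable shapes; the real service clients may look different.
ExtractFn = Callable[[bytes], StructuredDocument]
GenerateStreamFn = Callable[[str], Iterable[str]]


class MultimodalPipeline:
    def __init__(self, extract: ExtractFn, generate_stream: GenerateStreamFn) -> None:
        self._extract = extract
        self._generate_stream = generate_stream

    def run(self, source: Union[bytes, Iterable[bytes]]) -> Iterator[Tuple[int, str]]:
        # A single bytes object is treated as a one-page document.
        pages = [source] if isinstance(source, (bytes, bytearray)) else source
        for page_idx, page in enumerate(pages):
            # Extraction of page N+1 does not start until the caller has
            # consumed every token generated for page N.
            doc = self._extract(page)
            for token in self._generate_stream(doc.raw_text):
                yield page_idx, token
```

Because `run()` is a generator, a caller can render each token as it arrives, for example `for page_idx, token in pipeline.run(pages): ui.append(page_idx, token)`, where `ui.append` is a placeholder for whatever the consuming product uses.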

VRAM considerations

8GB GPU: Dolphin-v2 must offload before cf-text loads. cf-orch manages this via the VRAM budget — the pipeline pauses between steps and signals cf-orch to swap.
16GB+ GPU: both services can be resident; pipeline runs without serialisation.
cf-orch service registry entry must include an offload_between_steps: true flag for nodes below 16GB.
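
One way the swap behaviour could look from the pipeline's side, as a rough sketch. The `offload_between_steps` flag and the 16GB threshold come from this issue. The `registry_entry()` helper, the `vram_gb` key, the `request_swap` callback and the service-name strings are hypothetical, and the real cf-orch interface may differ.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

FULL_RESIDENCY_VRAM_GB = 16  # threshold taken from the issue text


def registry_entry(node_vram_gb: float) -> Dict[str, object]:
    # Hypothetical registry shape; only offload_between_steps is named in the issue.
    return {
        "vram_gb": node_vram_gb,
        "offload_between_steps": node_vram_gb < FULL_RESIDENCY_VRAM_GB,
    }


def run_with_swaps(
    pages: Iterable[bytes],
    extract: Callable[[bytes], str],
    generate_stream: Callable[[str], Iterable[str]],
    request_swap: Callable[[str], None],
    offload_between_steps: bool,
) -> Iterator[Tuple[int, str]]:
    for page_idx, page in enumerate(pages):
        if offload_between_steps:
            # Assumed to block until the vision service is resident.
            request_swap("cf-docuvision")
        text = extract(page)
        if offload_between_steps:
            # Assumed to block until the text service is resident.
            request_swap("cf-text")
        for token in generate_stream(text):
            yield page_idx, token
```

With the flag set, this swaps twice per page, which may be costly on small GPUs. Processing several pages per chunk, as the design sketch allows, would be one way to amortise that.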

Consumers

falcon — government forms (multi-page PDFs)
peregrine — resume analysis + cover letter generation in one pipeline
godwit — identity document bundle (multiple document types)

Related

circuitforge-core#41 (cf-text module — closed)
Circuit-Forge/cf-docuvision (Dolphin-v2 service — scaffolded)
circuitforge-core pipeline stub
