pipeline: multimodal chunked pipeline — cf-docuvision page chunks → cf-text streaming #42

Closed
opened 2026-04-08 22:18:29 -07:00 by pyr0ball · 0 comments
Owner

Problem

For multi-page documents, the current ingest() → generate() pattern processes the entire document before any text is generated. For a 10-page resume or government form, that is a long wait before the UI shows anything.

Desired behaviour

Page 1 extracted → text streamed to UI
Page 2 extracted → text streamed to UI
...
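The change in ordering can be illustrated with a toy sketch. Nothing here calls cf-docuvision or cf-text; the two functions are hypothetical stand-ins that only log events, so the interleaving is visible.

```python
from typing import List, Sequence


def batch(pages: Sequence[bytes], log: List[str]) -> None:
    """Current pattern: every page is extracted before anything is streamed."""
    texts = []
    for i, _page in enumerate(pages):
        log.append(f"extract {i}")
        texts.append(f"text {i}")
    for i, _text in enumerate(texts):
        log.append(f"stream {i}")


def chunked(pages: Sequence[bytes], log: List[str]) -> None:
    """Desired pattern: each page is streamed as soon as it is extracted."""
    for i, _page in enumerate(pages):
        log.append(f"extract {i}")
        log.append(f"stream {i}")


pages = [b"p0", b"p1", b"p2"]
batch_log: List[str] = []
chunked_log: List[str] = []
batch(pages, batch_log)
chunked(pages, chunked_log)

print(batch_log)
print(chunked_log)
```

In the batch log the first "stream" event comes after all three extractions; in the chunked log it comes immediately after the first one.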

Design sketch

The pipeline module (currently a staging stub in cf-core) needs a MultimodalPipeline that:

  1. Accepts an image source (bytes or iterable of page bytes)
  2. Calls cf-docuvision per page (or per chunk)
  3. Feeds each StructuredDocument.raw_text chunk into cf-text generate_stream()
  4. Yields (page_idx, token) tuples to the caller for progressive UI rendering
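The four steps above could be sketched roughly as follows. This is a minimal illustration, not the cf-core implementation: the `run` method name, the injected `extract` and `generate_stream` callables, and the fake backends are all assumptions made for the example. The real cf-docuvision client returns a StructuredDocument (whose raw_text would be passed on), and the real cf-text generate_stream() signature may differ.

```python
from typing import Callable, Iterable, Iterator, Tuple, Union

# Hypothetical callable shapes; the real service clients may differ.
ExtractFn = Callable[[bytes], str]          # page bytes -> raw_text
StreamFn = Callable[[str], Iterator[str]]   # raw_text -> token stream


class MultimodalPipeline:
    """Sketch: extract one page, stream its tokens, then move to the next."""

    def __init__(self, extract: ExtractFn, generate_stream: StreamFn) -> None:
        self._extract = extract
        self._generate_stream = generate_stream

    def run(
        self, source: Union[bytes, Iterable[bytes]]
    ) -> Iterator[Tuple[int, str]]:
        # A bare bytes object is treated as a one-page document; iterating
        # it directly would yield integers rather than pages.
        if isinstance(source, (bytes, bytearray)):
            pages: Iterable[bytes] = [bytes(source)]
        else:
            pages = source
        for page_idx, page in enumerate(pages):
            raw_text = self._extract(page)
            for token in self._generate_stream(raw_text):
                yield page_idx, token


# Fake backends so the sketch runs without either service.
def fake_extract(page: bytes) -> str:
    return page.decode("utf-8")


def fake_generate_stream(text: str) -> Iterator[str]:
    yield from text.split()


pipeline = MultimodalPipeline(fake_extract, fake_generate_stream)
tokens = list(pipeline.run([b"hello world", b"second page"]))
single = list(pipeline.run(b"one page"))
print(tokens)
print(single)
```

Because `run` is a generator, extraction of page N+1 does not start until the caller has consumed the tokens for page N, which is what allows the UI to render progressively.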

VRAM considerations

  • 8GB GPU: Dolphin-v2 must offload before cf-text loads. cf-orch manages this via the VRAM budget — the pipeline pauses between steps and signals cf-orch to swap.
  • 16GB+ GPU: both services can be resident; pipeline runs without serialisation.
  • cf-orch service registry entry must include an offload_between_steps: true flag for nodes below 16GB.
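The rule in the bullets above can be expressed as a small sketch. Only the `offload_between_steps` flag and the 16GB threshold come from this issue; the `NodeEntry` shape, the `make_entry` and `step_plan` helpers, and the step descriptions are hypothetical and do not reflect the actual cf-orch registry schema.

```python
from dataclasses import dataclass
from typing import List

# Threshold taken from the bullets above: below 16GB the services must swap.
RESIDENT_THRESHOLD_GB = 16


@dataclass(frozen=True)
class NodeEntry:
    """Hypothetical registry entry; the real cf-orch schema may differ."""

    name: str
    vram_gb: float
    offload_between_steps: bool


def make_entry(name: str, vram_gb: float) -> NodeEntry:
    # Nodes below the threshold cannot keep both models resident.
    return NodeEntry(
        name=name,
        vram_gb=vram_gb,
        offload_between_steps=vram_gb < RESIDENT_THRESHOLD_GB,
    )


def step_plan(entry: NodeEntry) -> List[str]:
    """List the steps the pipeline would take for one page on this node."""
    plan = ["cf-docuvision: extract page"]
    if entry.offload_between_steps:
        plan.append("cf-orch: offload Dolphin-v2, load cf-text")
    plan.append("cf-text: generate_stream")
    return plan


small = make_entry("node-8gb", 8)
large = make_entry("node-16gb", 16)
print(step_plan(small))
print(step_plan(large))
```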

Consumers

  • falcon — government forms (multi-page PDFs)
  • peregrine — resume analysis + cover letter generation in one pipeline
  • godwit — identity document bundle (multiple document types)

Related

  • circuitforge-core#41 (cf-text module — closed)
  • Circuit-Forge/cf-docuvision (Dolphin-v2 service — scaffolded)
  • circuitforge-core pipeline stub
Reference: Circuit-Forge/circuitforge-core#42