documents module: StructuredDocument interface + ingest() client (Dolphin-v2 runtime in cf-orch) #7
Reference: Circuit-Forge/circuitforge-core#7
Scope (interface layer only)
This issue covers the client-side abstraction in `circuitforge-core`. The Dolphin-v2 inference service runtime lives in `circuitforge-orch` (see Circuit-Forge/circuitforge-orch#TBD). Products import from `circuitforge_core.documents`; they never talk to cf-orch directly.

Module API
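A minimal sketch of the data model as Python dataclasses, using the `StructuredDocument` field names given in this section. Everything else is an assumption: the `Element` fields (`type`, `text`, `page`, `bbox`), the `ParsedTable` fields, and the `of_type()` helper are hypothetical, and `ElementType` lists only the seven element types named in this issue, not the full set of 21 Dolphin-v2 types.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class ElementType(str, Enum):
    # Hypothetical partial enum: only the types named in this issue.
    # The full Dolphin-v2 set (21 types) is not enumerated here.
    HEADING = "heading"
    PARAGRAPH = "paragraph"
    LIST = "list"
    TABLE = "table"
    FIGURE = "figure"
    FORMULA = "formula"
    CODE = "code"


@dataclass(frozen=True)
class Element:
    type: ElementType
    text: str  # hypothetical: textual content of the element
    page: int = 0  # hypothetical: zero-based page index
    bbox: tuple[float, float, float, float] | None = None  # hypothetical: (x0, y0, x1, y1)


@dataclass(frozen=True)
class ParsedTable:
    html: str  # the table rendered as HTML, per the issue
    page: int = 0  # hypothetical


@dataclass
class StructuredDocument:
    elements: list[Element] = field(default_factory=list)  # typed, in reading order
    raw_text: str = ""  # full extracted text
    tables: list[ParsedTable] = field(default_factory=list)
    metadata: dict[str, Any] = field(default_factory=dict)  # page dimensions, source type, confidence

    def of_type(self, kind: ElementType) -> list[Element]:
        # Hypothetical helper: filter elements by type, preserving reading order.
        return [e for e in self.elements if e.type is kind]
```

Freezing `Element` keeps parsed output immutable once it is handed to a product; whether the real module does this is not stated in the issue.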
`StructuredDocument` contains:

- `elements: list[Element]`: typed, ordered elements (heading, paragraph, list, table, figure, formula, code…)
- `raw_text: str`: full extracted text
- `tables: list[ParsedTable]`: tables as HTML
- `metadata: dict`: page dimensions, source type, confidence

Routing
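A sketch of the backend selection described in this section, under stated assumptions. `Backend`, `Router`, `BackendCall`, the prompt text, and the `complete(prompt, images=...)` signature are hypothetical stand-ins for `LLMRouter` so the example runs on its own, and `ingest()` here returns the raw backend output rather than a parsed `StructuredDocument`. What it illustrates is the contract: `ingest()` only talks to the router, and the router picks the backend.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical callable shape for a backend: (prompt, images) -> model output.
BackendCall = Callable[[str, Sequence[bytes]], str]


@dataclass
class Backend:
    # Hypothetical backend description; field names are illustrative only.
    name: str
    kind: str  # e.g. "vision_service", "openai_compat", "anthropic"
    call: BackendCall
    supports_images: bool = False
    reachable: bool = True


class Router:
    """Hypothetical stand-in for LLMRouter; models backend selection only."""

    def __init__(self, backends: Sequence[Backend], fallback: Callable[[str], str]) -> None:
        self.backends = list(backends)
        # Stands in for fallback.py. In this sketch it receives only the prompt;
        # how the real fallback gets text out of the images is not specified here.
        self.fallback = fallback
        self.last_route = ""  # name of the backend that served the last call

    def select(self) -> Backend | None:
        # 1. A configured, reachable vision_service backend (cf-orch Dolphin-v2 agent).
        for b in self.backends:
            if b.kind == "vision_service" and b.reachable:
                return b
        # 2. An openai_compat or anthropic backend that accepts images.
        for b in self.backends:
            if b.kind in ("openai_compat", "anthropic") and b.supports_images:
                return b
        # 3. Nothing suitable: the caller degrades to the fallback path.
        return None

    def complete(self, prompt: str, images: Sequence[bytes] = ()) -> str:
        backend = self.select()
        if backend is None:
            self.last_route = "fallback"
            return self.fallback(prompt)
        self.last_route = backend.name
        return backend.call(prompt, images)


def ingest(images: Sequence[bytes], router: Router) -> str:
    # Products call ingest(); they never choose a backend or contact cf-orch directly.
    # The prompt string is a hypothetical placeholder.
    return router.complete("Extract the document structure.", images=list(images))
```

Because selection lives in the router, a consumer such as `kiwi` gets the Dolphin-v2 path when cf-orch is reachable and degrades to the fallback path when it is not, without changing its own code.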
`ingest()` delegates to `LLMRouter.complete()` with `images=[...]`. The router handles backend selection:

1. A `vision_service` backend is configured and reachable (the cf-orch Dolphin-v2 agent) → use it.
2. Otherwise, an `openai_compat` or `anthropic` backend with `supports_images=true`.
3. Otherwise, `fallback.py`: LLM-only parsing (lower fidelity, no layout awareness).

There is no direct cf-orch dependency; routing goes through `LLMRouter` as normal.

Consumers
- `kiwi`: recipe card / receipt scanning
- `falcon`: government form field identification
- `peregrine`: resume image parsing
- `godwit`: identity document bundle parsing

Acceptance criteria
- `StructuredDocument` + `Element` dataclasses covering all 21 Dolphin-v2 element types
- `ingest()` routes through `LLMRouter` transparently
- `fallback.py` handles graceful degradation when no vision backend is available

Title changed from "documents module: shared Dolphin-v2 ingestion pipeline (images → StructuredDocument)" to "documents module: StructuredDocument interface + ingest() client (Dolphin-v2 runtime in cf-orch)".