Vision module: screenshot OCR for account portal parsing #17


Open

opened 2026-04-05 21:36:57 -07:00 by pyr0ball · 0 comments


Use Dolphin-v2 (ByteDance universal document parser, earmarked in memory) to extract structured data from account portal screenshots.

## Use cases

- Parse a billing-statement screenshot → extract charges, account number, and disputable items
- Read an IVR hold-confirmation screen → extract the case/ticket number
- OCR a chat-transcript image → extract the conversation text for the case record
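Each use case still needs a post-processing step that turns the parser's output into fields. The sketch below assumes that output has already been flattened to plain text with one statement line per line. The regexes, the `DISPUTE_HINTS` keywords, and the `BillingExtract`, `parse_billing_text`, and `extract_ticket_number` names are all hypothetical placeholders. They are not part of cf-core or Dolphin-v2.

```python
import re
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional, Tuple

# Hypothetical patterns: real portal layouts differ per provider and would need tuning.
# The lookahead (?=[A-Z0-9-]*\d) requires at least one digit, so words like "number"
# are never captured as identifiers under IGNORECASE.
ACCOUNT_RE = re.compile(
    r"account\s*(?:number|no\.?|#)\s*[:#]?\s*((?=[A-Z0-9-]*\d)[A-Z0-9-]{4,})",
    re.IGNORECASE,
)
# A charge line: description, spaces/tabs, optional "$", amount with two decimals.
CHARGE_RE = re.compile(
    r"^(?P<desc>[^\n]+?)[ \t]+\$?(?P<amount>-?\d[\d,]*\.\d{2})[ \t]*$",
    re.MULTILINE,
)
TICKET_RE = re.compile(
    r"(?:case|ticket|confirmation)\s*(?:number|no\.?|#|id)?\s*(?:is|:|#)?\s*"
    r"((?=[A-Z0-9-]*\d)[A-Z0-9-]{5,})",
    re.IGNORECASE,
)
# Placeholder heuristic for charges worth a closer look; not a dispute rule.
DISPUTE_HINTS = ("late fee", "surcharge", "service fee", "adjustment")


@dataclass
class BillingExtract:
    account_number: Optional[str] = None
    charges: List[Tuple[str, Decimal]] = field(default_factory=list)
    disputable: List[Tuple[str, Decimal]] = field(default_factory=list)


def parse_billing_text(text: str) -> BillingExtract:
    """Pull the account number and individual charge lines out of parsed statement text."""
    result = BillingExtract()
    if m := ACCOUNT_RE.search(text):
        result.account_number = m.group(1)
    for m in CHARGE_RE.finditer(text):
        desc = m.group("desc").strip()
        if desc.lower().startswith("total"):
            continue  # summary lines are not individual charges
        item = (desc, Decimal(m.group("amount").replace(",", "")))
        result.charges.append(item)
        if any(hint in desc.lower() for hint in DISPUTE_HINTS):
            result.disputable.append(item)
    return result


def extract_ticket_number(text: str) -> Optional[str]:
    """Return the first case/ticket identifier found in IVR or chat text, if any."""
    m = TICKET_RE.search(text)
    return m.group(1) if m else None
```

Regex rules like these are only a starting point. Anything flagged as disputable would still need a human check before it goes into a case record.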

## Notes

- Dolphin-v2 needs ~8 GB VRAM and is already earmarked for the cf-core document-ingestion pipeline
- The model is shared with kiwi (receipt OCR) and peregrine (document parsing)
- Implement via the cf-core `vision` module (currently a stub as of v0.7.0)
- BSL feature: Paid tier required
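The notes imply three constraints on the cf-core `vision` module: one ~8 GB model shared across products, a Paid-tier gate, and a pluggable backend behind the current stub. Below is a minimal sketch of how those could fit together. It assumes a hypothetical `VisionBackend` protocol, a `VisionGate` check, and a `get_shared_backend` loader. None of these names come from cf-core. The actual Dolphin-v2 loading code would live in the backend factory, and sharing only applies to callers within one process.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Protocol

# Approximate figure from the issue notes; a planning number, not a measurement.
DOLPHIN_V2_VRAM_GB = 8.0


class Tier(Enum):
    FREE = "free"
    PAID = "paid"


class VisionBackend(Protocol):
    """Hypothetical interface the cf-core `vision` stub could expose."""

    def parse(self, image_bytes: bytes) -> str: ...


class VisionUnavailable(RuntimeError):
    """Raised when screenshot OCR cannot run for this caller or machine."""


@dataclass
class VisionGate:
    tier: Tier
    free_vram_gb: float

    def check(self) -> None:
        # Tier check first: free users should not be told about VRAM at all.
        if self.tier is not Tier.PAID:
            raise VisionUnavailable("Screenshot OCR is a BSL feature and requires the Paid tier")
        if self.free_vram_gb < DOLPHIN_V2_VRAM_GB:
            raise VisionUnavailable(
                f"Need ~{DOLPHIN_V2_VRAM_GB:g} GB free VRAM, found {self.free_vram_gb:g} GB"
            )


_shared_backend: Optional[VisionBackend] = None


def get_shared_backend(factory: Callable[[], VisionBackend]) -> VisionBackend:
    """Load the model at most once per process so every caller reuses the same instance."""
    global _shared_backend
    if _shared_backend is None:
        _shared_backend = factory()
    return _shared_backend


def parse_screenshot(backend: VisionBackend, gate: VisionGate, image_bytes: bytes) -> str:
    gate.check()  # fail fast before any model work
    return backend.parse(image_bytes)
```

Keeping the gate separate from the backend means kiwi and peregrine could apply their own tier rules while still reusing one loaded model.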


pyr0ball added this to the Public Launch milestone 2026-04-05 21:36:57 -07:00


Reference: Circuit-Forge/osprey#17
