Vision module: screenshot OCR for account portal parsing #17

Open
opened 2026-04-05 21:36:57 -07:00 by pyr0ball · 0 comments
Owner

Use Dolphin-v2 (ByteDance universal document parser, earmarked in memory) to extract structured data from account portal screenshots.

Use cases

  • Parse billing statement screenshot → extract charges, account number, disputable items
  • Read IVR hold confirmation screen → extract case/ticket number
  • OCR chat transcript image → extract conversation text for case record
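
A minimal sketch of the post-processing side of these use cases, assuming the vision model hands back plain text for a screenshot. It does not call Dolphin-v2 (its API is not described here). The names `BillingStatement`, `Charge`, `parse_billing_text`, `extract_case_number` and the regex patterns are all hypothetical and would need per-provider tuning.

```python
import re
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional


@dataclass
class Charge:
    description: str
    amount: Decimal


@dataclass
class BillingStatement:
    account_number: Optional[str]
    charges: List[Charge] = field(default_factory=list)


# Hypothetical patterns: real portals will differ and need tuning.
ACCOUNT_RE = re.compile(
    r"\bAccount\s*(?:Number|No\.?)?\s*[:#]+\s*([A-Z0-9-]{6,})", re.I
)
# One charge per line: free-text description, then a dollar amount.
CHARGE_RE = re.compile(
    r"^(?P<desc>.+?)[ \t]+\$(?P<amt>\d[\d,]*\.\d{2})[ \t]*$", re.M
)
CASE_RE = re.compile(
    r"\b(?:Case|Ticket|Reference)\s*(?:Number|No\.?|ID)?\s*[:#]+\s*([A-Z0-9-]{5,})",
    re.I,
)


def parse_billing_text(text: str) -> BillingStatement:
    """Pull the account number and line-item charges out of OCR text."""
    account = ACCOUNT_RE.search(text)
    charges = [
        Charge(m.group("desc").strip(), Decimal(m.group("amt").replace(",", "")))
        for m in CHARGE_RE.finditer(text)
        # Skip summary rows so totals are not counted as charges.
        if not m.group("desc").strip().lower().startswith("total")
    ]
    return BillingStatement(account.group(1) if account else None, charges)


def extract_case_number(text: str) -> Optional[str]:
    """Return the first case/ticket/reference identifier, if any."""
    m = CASE_RE.search(text)
    return m.group(1) if m else None
```

Keeping the parsing separate from the model call means the extraction rules can be unit-tested against saved OCR text without loading the model. Flagging disputable items is left out here because the criteria are not specified in this issue.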

Notes

  • Dolphin-v2: ~8 GB VRAM, earmarked for cf-core doc ingestion pipeline
  • Shared with kiwi (receipt OCR) and peregrine (document parsing)
  • Implement via the cf-core vision module (currently a stub in v0.7.0)
  • BSL feature — Paid tier required
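
One possible shape for the vision module, sketched under assumptions: the model is loaded lazily and only once (relevant given the ~8 GB VRAM footprint and the sharing with kiwi and peregrine), and the Paid-tier check happens before any load. `VisionModule`, `VisionBackend`, `Tier`, and `TierError` are hypothetical names, not the existing cf-core API, and the loader that would actually construct Dolphin-v2 is left to the caller.

```python
from enum import Enum
from typing import Callable, Optional, Protocol


class Tier(Enum):
    FREE = "free"
    PAID = "paid"


class VisionBackend(Protocol):
    """Anything that can turn image bytes into extracted text."""

    def parse_image(self, image_bytes: bytes) -> str: ...


class TierError(PermissionError):
    """Raised when a gated feature is used without the required tier."""


class VisionModule:
    def __init__(self, loader: Callable[[], VisionBackend], tier: Tier) -> None:
        self._loader = loader  # hypothetical factory that builds the model
        self._tier = tier
        self._backend: Optional[VisionBackend] = None

    def parse(self, image_bytes: bytes) -> str:
        # Check the tier first so free-tier callers never trigger a model load.
        if self._tier is not Tier.PAID:
            raise TierError("vision parsing requires the Paid tier")
        # Load the backend on first use and reuse it afterwards.
        if self._backend is None:
            self._backend = self._loader()
        return self._backend.parse_image(image_bytes)
```

Passing the loader in as a callable keeps the gate and caching logic testable with a fake backend, and leaves open whether the three consumers share one process-wide instance or one per service.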
pyr0ball added this to the Public Launch milestone 2026-04-05 21:36:57 -07:00
Reference: Circuit-Forge/osprey#17