Add tesseract.js for client-side receipt OCR (privacy-first, local image processing) #148
Reference: Circuit-Forge/kiwi#148
Summary
tesseract.js (https://github.com/naptha/tesseract.js/) is a WebAssembly port of the Tesseract OCR engine. It runs entirely in the browser, so OCR requires no server round-trip. It is licensed under Apache 2.0, has 38.1k GitHub stars, and is a mature project.
Privacy angle
With WASM-based OCR, the user's receipt image never leaves their device. Only the extracted text string is sent to the backend. This is a meaningful privacy improvement over any server-side OCR pipeline and aligns directly with CF's local-inference-first principle.
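A minimal sketch of the "text only" boundary described above, assuming the OCR engine is injected as a function. The names `ReceiptTextPayload`, `Recognize`, `toPayload`, and `extractLocally` are hypothetical and not part of tesseract.js or kiwi. With tesseract.js, a worker's `recognize()` result (its `data.text` and `data.confidence` fields) could be adapted to fit the `Recognize` type.

```typescript
// Hypothetical payload shape: only the extracted text leaves the device.
interface ReceiptTextPayload {
  text: string;
}

// Result shape the sketch expects from whatever OCR engine is plugged in.
interface OcrResult {
  text: string;
  confidence: number; // assumed 0-100 scale
}

// Engine is injected so this sketch does not depend on a specific library.
type Recognize = (image: Uint8Array) => Promise<OcrResult>;

// Collapse runs of spaces/tabs, trim each line, and drop empty lines.
function toPayload(result: OcrResult): ReceiptTextPayload {
  const text = result.text
    .replace(/[ \t]+/g, " ")
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .join("\n");
  // Deliberately no image bytes and no other fields in the payload.
  return { text };
}

// Runs OCR locally and returns the only object that would be sent upstream.
async function extractLocally(
  image: Uint8Array,
  recognize: Recognize
): Promise<ReceiptTextPayload> {
  const result = await recognize(image);
  return toPayload(result);
}
```

Keeping the payload type to a single `text` field makes the privacy property easy to audit: anything the client sends to the backend has to pass through `toPayload`.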
Proposed integration
Free tier (local):
Paid tier (cloud):
Key limitations to plan around
cf-core documents module
tesseract.js serves as the frontend complement to whatever backend OCR cf-core provides. The pattern: client extracts text locally (free tier), backend refines if needed (paid tier).
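The free/paid pattern could be sketched as a small routing decision on the client. This is an illustration only: the issue does not define what "if needed" means, so the confidence threshold, the `Tier` type, and the function name `needsBackendRefinement` are all assumptions.

```typescript
type Tier = "free" | "paid";

// Local OCR outcome; the confidence scale (0-100) is an assumption.
interface LocalOcr {
  text: string;
  confidence: number;
}

// Hypothetical cutoff below which a paid-tier result is sent for refinement.
const REFINE_BELOW = 70;

// Free tier never calls the backend OCR; paid tier does so only when the
// local result is empty or its confidence falls below the threshold.
function needsBackendRefinement(
  tier: Tier,
  local: LocalOcr,
  threshold: number = REFINE_BELOW
): boolean {
  if (tier !== "paid") return false;
  return local.text.trim().length === 0 || local.confidence < threshold;
}
```

Under these assumptions, a free-tier user always keeps the local result, while a paid-tier user only triggers the backend when local extraction looks unreliable.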
Also relevant
References