circuitforge-core

pyr0ball bbb146b361	feat(documents): add PDFExtractor text-layer extraction and PageChunk	2026-05-04 08:33:10 -07:00

Adds circuitforge_core/documents/pdf.py with:
- PageChunk frozen dataclass (page_number, text, source, word_count)
- PDFExtractor.chunk_pages(): pdfplumber text-layer extraction per page, with OCR fallback via pytesseract for sparse pages
- Module-level graceful ImportError guard on pdfplumber (patchable, follows cf-core optional-extra pattern)
- pdf and pdf-ocr optional extras declared in pyproject.toml

3 tests, all passing.

__init__.py	feat: hardware detection, cf-docuvision service, documents ingestion pipeline	2026-04-02 18:53:25 -07:00
test_client.py	feat: hardware detection, cf-docuvision service, documents ingestion pipeline	2026-04-02 18:53:25 -07:00
test_ingest.py	feat: hardware detection, cf-docuvision service, documents ingestion pipeline	2026-04-02 18:53:25 -07:00
test_models.py	feat: hardware detection, cf-docuvision service, documents ingestion pipeline	2026-04-02 18:53:25 -07:00
test_pdf.py	feat(documents): add PDFExtractor text-layer extraction and PageChunk	2026-05-04 08:33:10 -07:00
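
The commit message for test_pdf.py describes a frozen PageChunk dataclass, a guarded pdfplumber import, and per-page extraction with an OCR fallback for sparse pages. The following is a minimal sketch of that shape, not the actual contents of circuitforge_core/documents/pdf.py. Only the PageChunk field names come from the commit message. The helper build_chunks, the module-level chunk_pages function (the real one is a PDFExtractor method), the min_words threshold, the "text_layer" / "ocr" values for source, and 1-based page numbering are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

try:
    import pdfplumber  # optional dependency (the "pdf" extra)
except ImportError:
    # Keep the module importable without the extra; tests can patch this name.
    pdfplumber = None


@dataclass(frozen=True)
class PageChunk:
    page_number: int  # assumed to be 1-based
    text: str
    source: str       # assumed meaning: "text_layer" or "ocr"
    word_count: int


def build_chunks(
    page_texts: Sequence[Optional[str]],
    ocr_page: Optional[Callable[[int], str]] = None,
    min_words: int = 5,  # hypothetical sparseness threshold
) -> List[PageChunk]:
    """Turn per-page text into chunks, calling ocr_page(index) for sparse pages."""
    chunks = []
    for index, raw in enumerate(page_texts):
        text = (raw or "").strip()
        source = "text_layer"
        if len(text.split()) < min_words and ocr_page is not None:
            ocr_text = (ocr_page(index) or "").strip()
            # Only switch to OCR output when it recovers more words.
            if len(ocr_text.split()) > len(text.split()):
                text, source = ocr_text, "ocr"
        chunks.append(PageChunk(index + 1, text, source, len(text.split())))
    return chunks


def chunk_pages(path: str, min_words: int = 5) -> List[PageChunk]:
    """Extract the text layer of each page, falling back to OCR when sparse."""
    if pdfplumber is None:
        raise ImportError("pdfplumber is not installed; install the 'pdf' extra")
    with pdfplumber.open(path) as pdf:
        pages = pdf.pages
        texts = [page.extract_text() for page in pages]

        def ocr_page(index: int) -> str:
            try:
                import pytesseract  # optional dependency (the "pdf-ocr" extra)
            except ImportError:
                return ""  # no OCR available; keep the sparse text layer
            image = pages[index].to_image(resolution=300).original
            return pytesseract.image_to_string(image)

        return build_chunks(texts, ocr_page, min_words)
```

Separating build_chunks from the pdfplumber call keeps the fallback logic testable with plain strings, so a test can exercise the sparse-page path without a real PDF or a Tesseract install.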