Adds `circuitforge_core/documents/pdf.py` with:

- `PageChunk` frozen dataclass (`page_number`, `text`, `source`, `word_count`)
- `PDFExtractor.chunk_pages()`: extracts the pdfplumber text layer per page, with an OCR fallback via pytesseract for sparse pages
- Module-level graceful `ImportError` guard on pdfplumber (patchable; follows the cf-core optional-extra pattern)
- `pdf` and `pdf-ocr` optional extras declared in `pyproject.toml`

3 tests, all passing.
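For reviewers, here is a minimal sketch of how the pieces above can fit together. It is not the code in this PR. The `min_words` sparseness threshold, the injectable `ocr` callable, the `chunks_from_pages()` helper and `_tesseract_ocr()` are all hypothetical names, introduced so the per-page logic can be exercised without pdfplumber or a Tesseract binary installed. The pdfplumber calls used (`pdfplumber.open`, `Page.extract_text`, `Page.to_image(resolution=...)`, `PageImage.original`) and `pytesseract.image_to_string` are the libraries' public APIs.

```python
"""Hypothetical sketch of a per-page PDF chunker with an OCR fallback."""
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional

# Module-level optional-import guards: when the `pdf` / `pdf-ocr` extras are
# missing, the names are None (and tests can patch them on the module).
try:
    import pdfplumber
except ImportError:
    pdfplumber = None

try:
    import pytesseract
except ImportError:
    pytesseract = None


@dataclass(frozen=True)
class PageChunk:
    page_number: int  # 1-indexed, like pdfplumber's Page.page_number
    text: str
    source: str
    word_count: int


def _tesseract_ocr(page, resolution: int = 300) -> str:
    """Hypothetical default OCR: rasterise a pdfplumber page and OCR it.

    Returns "" when pytesseract is not installed. (pytesseract still needs
    the tesseract binary on PATH when it *is* installed.)
    """
    if pytesseract is None:
        return ""
    image = page.to_image(resolution=resolution).original  # PIL image
    return pytesseract.image_to_string(image)


class PDFExtractor:
    def __init__(self, min_words: int = 20,
                 ocr: Optional[Callable[[object], str]] = None) -> None:
        # Assumption: a page whose text layer has fewer than `min_words`
        # words is "sparse" and is worth an OCR attempt.
        self.min_words = min_words
        self.ocr = ocr if ocr is not None else _tesseract_ocr

    def chunks_from_pages(self, pages: Iterable, source: str) -> Iterator[PageChunk]:
        """Yield one PageChunk per page object exposing .extract_text()."""
        for number, page in enumerate(pages, start=1):
            text = page.extract_text() or ""  # extract_text() may return None
            if len(text.split()) < self.min_words:
                ocr_text = self.ocr(page)
                # Keep whichever source produced more words.
                if len(ocr_text.split()) > len(text.split()):
                    text = ocr_text
            text = text.strip()
            yield PageChunk(number, text, source, len(text.split()))

    def chunk_pages(self, path: str) -> list[PageChunk]:
        """Open `path` with pdfplumber and chunk every page."""
        if pdfplumber is None:
            raise ImportError("pdfplumber is not installed; install the 'pdf' extra")
        with pdfplumber.open(path) as pdf:
            return list(self.chunks_from_pages(pdf.pages, source=path))
```

Typical use would be `PDFExtractor().chunk_pages("report.pdf")` (the file name is illustrative). Injecting `ocr` keeps unit tests independent of Tesseract: a test can pass stub page objects and a stub OCR function directly to `chunks_from_pages()`.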