# Library
The library is the home screen. It shows all indexed documents and lets you add new ones.
## Adding documents
**Upload**: click **Upload PDF / EPUB** and select a file. Files up to 200 MB are accepted. The document is saved to `data/uploads/` and queued for indexing immediately.
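
The upload rules above can be sketched as a small validation check. This is only an illustration, not Pagepiper's actual code: the `check_upload` helper and its messages are hypothetical, and reading "200 MB" as 200 × 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path

# Assumption: "200 MB" is interpreted as 200 MiB; the real limit may be decimal.
MAX_UPLOAD_BYTES = 200 * 1024 * 1024
ALLOWED_SUFFIXES = {".pdf", ".epub"}


def check_upload(filename, size_bytes):
    """Return None if the file looks acceptable, else a short reason string.

    Hypothetical helper for illustration only.
    """
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        return f"unsupported file type: {suffix or '(none)'}"
    if size_bytes > MAX_UPLOAD_BYTES:
        return "file exceeds the 200 MB limit"
    return None
```
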

**Scan**: in your `.env`, set `PAGEPIPER_WATCH_DIR` to the directory you want scanned, then click **Scan for PDFs**. Any PDF or EPUB in that directory that is not already in the library is queued. Re-scanning is safe; already-indexed documents are skipped.
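
To make the "re-scanning is safe" behaviour concrete, here is a minimal sketch of a scan that skips known files. It is not the project's implementation: `find_new_documents` is a hypothetical name, and both the recursive search and the comparison by resolved path are assumptions.

```python
from pathlib import Path

SUPPORTED_SUFFIXES = {".pdf", ".epub"}


def find_new_documents(watch_dir, known_paths):
    """Return supported files under watch_dir that are not in known_paths.

    Assumptions (not confirmed by the docs): the scan is recursive and
    documents are identified by their resolved file path.
    """
    known = {Path(p).resolve() for p in known_paths}
    found = []
    for path in sorted(Path(watch_dir).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower() not in SUPPORTED_SUFFIXES:
            continue
        if path.resolve() not in known:
            found.append(path)
    return found
```

Running the function a second time with the newly found paths added to `known_paths` returns an empty list, which is what makes repeated scans harmless.
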
## Document states
| Badge | Meaning |
|-------|---------|
| PROCESSING | Text extraction or embedding in progress |
| READY | Fully indexed and searchable |
| ERROR | Indexing failed — see the error message on the card |
## Ingestion progress
While a document is processing, its card shows a live progress bar:
- Animated sliding bar while text is being extracted (before page count is known)
- "Embedding N / M pages (X%)" once vectors are being written

The card refreshes automatically, and the library reloads once indexing completes.
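
The two progress states can be sketched as a single formatting function. This is a hypothetical helper, not Pagepiper's code: the name `progress_text`, the use of `None` for the "page count unknown" phase, and rounding the percentage down are all assumptions.

```python
def progress_text(pages_done, pages_total):
    """Return the progress label for a document card.

    Returns None while the page count is unknown (the card would show the
    animated bar instead). Rounding down is an assumption.
    """
    if pages_total is None:
        return None
    if pages_total <= 0:
        raise ValueError("pages_total must be positive")
    # Clamp so a late or duplicate update never shows more than 100%.
    done = max(0, min(pages_done, pages_total))
    percent = done * 100 // pages_total
    return f"Embedding {done} / {pages_total} pages ({percent}%)"
```
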
## Re-indexing
Click **Re-index** on any document card to re-run the full ingest pipeline. This is useful after:
- Changing `PAGEPIPER_EMBED_MODEL` (a dimension mismatch is detected automatically at startup, but you can also trigger a re-index manually)
- A failed ingest you want to retry
- Updating to a new version of Pagepiper with an improved extractor
## Removing a document
Click **Remove** to delete the document's metadata, page chunks, and vectors. The source file on disk is not deleted.
## Storage
All data lives in the directory set by `PAGEPIPER_DATA_DIR` (default: `data/`):
| File | Contents |
|------|---------|
| `pagepiper.db` | Document metadata, page chunks, chat feedback |
| `pagepiper_vecs.db` | sqlite-vec vector store |
| `uploads/` | Files added via browser upload |
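
As a rough illustration of the layout in the table above, the following sketch resolves the three locations from `PAGEPIPER_DATA_DIR`. The `storage_paths` function and its dictionary keys are hypothetical; only the environment variable, its default, and the file names come from this page.

```python
import os
from pathlib import Path


def storage_paths(env=None):
    """Resolve Pagepiper's storage locations from an environment mapping.

    Hypothetical helper; falls back to the documented default of `data/`.
    """
    env = os.environ if env is None else env
    data_dir = Path(env.get("PAGEPIPER_DATA_DIR", "data"))
    return {
        "metadata_db": data_dir / "pagepiper.db",     # metadata, chunks, feedback
        "vector_db": data_dir / "pagepiper_vecs.db",  # sqlite-vec vector store
        "uploads": data_dir / "uploads",              # browser uploads
    }
```
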