# Library

The library is the home screen. It shows all indexed documents and lets you add new ones.

## Adding documents

**Upload**: click **Upload PDF / EPUB** and select a file. Files up to 200 MB are accepted. The document is saved to the `uploads/` folder of the data directory (`data/uploads/` by default) and is immediately queued for indexing.
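If you are preparing a batch of files, a local pre-check can catch rejections before you upload. The sketch below is not part of Pagepiper; `check_upload` is a hypothetical helper, and it assumes "200 MB" means 200 × 1024 × 1024 bytes, which may differ from the limit the server actually enforces.

```python
from pathlib import Path

# Assumption: "200 MB" means 200 * 1024 * 1024 bytes; the real limit may be
# defined in decimal megabytes instead.
MAX_UPLOAD_BYTES = 200 * 1024 * 1024
ALLOWED_SUFFIXES = {".pdf", ".epub"}


def check_upload(filename, size_bytes):
    """Return None if the upload looks acceptable, else a reason string.

    Hypothetical helper for illustration; not a Pagepiper API.
    """
    if Path(filename).suffix.lower() not in ALLOWED_SUFFIXES:
        return "unsupported file type"
    if size_bytes > MAX_UPLOAD_BYTES:
        return "file exceeds 200 MB"
    return None
```

For a file on disk, you could call it as `check_upload(path.name, path.stat().st_size)`.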

**Scan**: in your `.env`, set `PAGEPIPER_WATCH_DIR` to a directory, then click **Scan for PDFs**. Any PDF or EPUB in that directory that is not already in the library is queued. Re-scanning is safe; already-indexed documents are skipped.
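The reason re-scanning is safe can be pictured with a short sketch. This is an illustration, not Pagepiper's actual code: `find_new_documents` and `known_paths` are hypothetical names, and scanning only the top level of the directory is an assumption.

```python
from pathlib import Path

# Extensions the scan picks up, per the description above.
SUPPORTED_SUFFIXES = {".pdf", ".epub"}


def find_new_documents(watch_dir, known_paths):
    """Return supported files in watch_dir that are not in known_paths.

    Hypothetical helper: known_paths stands in for whatever record the
    library keeps of already-indexed files.
    """
    known = {Path(p).resolve() for p in known_paths}
    new_files = []
    for path in sorted(Path(watch_dir).iterdir()):
        # Assumption: only the top level is scanned, and the extension
        # match is case-insensitive.
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES:
            if path.resolve() not in known:
                new_files.append(path)
    return new_files
```

Running the function a second time with the newly found files added to `known_paths` returns an empty list, which mirrors the "already-indexed documents are skipped" behaviour.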

## Document states

| Badge | Meaning |
|-------|---------|
| PROCESSING | Text extraction or embedding in progress |
| READY | Fully indexed and searchable |
| ERROR | Indexing failed — see the error message on the card |

## Ingestion progress

While a document is processing, its card shows a live progress bar:

- Animated sliding bar while text is being extracted (before page count is known)
- "Embedding N / M pages (X%)" once vectors are being written

The card refreshes automatically, and the library view reloads when indexing completes.
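As a rough illustration of how the embedding label could be derived from the page counts, here is a minimal sketch. `progress_label` is a hypothetical function, and rounding the percentage down to a whole number is an assumption; the actual UI may round differently.

```python
def progress_label(pages_done, pages_total):
    """Format the embedding progress text shown on a document card.

    Hypothetical helper for illustration; not a Pagepiper API.
    """
    if pages_total <= 0:
        # Page count not known yet: the card shows the animated bar instead.
        return None
    # Assumption: the percentage is rounded down to a whole number.
    percent = pages_done * 100 // pages_total
    return f"Embedding {pages_done} / {pages_total} pages ({percent}%)"


print(progress_label(30, 120))  # Embedding 30 / 120 pages (25%)
```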

## Re-indexing

Click **Re-index** on any document card to re-run the full ingest pipeline. This is useful after:

- Changing `PAGEPIPER_EMBED_MODEL` (a dimension mismatch is detected automatically at startup, but you can also trigger a re-index manually)
- A failed ingest you want to retry
- Updating to a new version of Pagepiper with an improved extractor

## Removing a document

Click **Remove** to delete the document's metadata, page chunks, and vectors. The source file on disk is not deleted.

## Storage

All data lives in the directory set by `PAGEPIPER_DATA_DIR` (default: `data/`):

| Path | Contents |
|------|----------|
| `pagepiper.db` | Document metadata, page chunks, chat feedback |
| `pagepiper_vecs.db` | sqlite-vec vector store |
| `uploads/` | Files added via browser upload |
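Both `.db` files are SQLite databases, so you can look inside them with standard tools. The sketch below uses Python's built-in `sqlite3` module to list table names without assuming anything about Pagepiper's schema. `list_tables` is a hypothetical helper, and the file is opened read-only so the inspection cannot modify your data.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the table names in a SQLite file, opened read-only.

    Hypothetical helper for illustration; not a Pagepiper API.
    """
    # A file: URI with mode=ro asks SQLite to open the database read-only.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

For example, `list_tables("data/pagepiper.db")` lists the tables in the metadata database, assuming the default data directory. Querying the vector tables in `pagepiper_vecs.db` themselves would presumably require the sqlite-vec extension to be loaded first.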