Library

The library is the home screen. It shows all indexed documents and lets you add new ones.

Adding documents

Upload: click Upload PDF / EPUB and select a file. Files up to 200 MB are accepted. The document is saved to uploads/ inside the data directory (data/uploads/ by default) and queued for indexing immediately.
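To make the upload rules concrete, here is a minimal sketch of a pre-upload check you could run on your own side. The helper name `check_upload`, the error strings, and the reading of "200 MB" as 200 × 1024 × 1024 bytes are assumptions for illustration; this is not Pagepiper's actual validation code.

```python
from pathlib import PurePath

# Assumption: "200 MB" is interpreted as 200 MiB; the real limit may be decimal.
MAX_UPLOAD_BYTES = 200 * 1024 * 1024
ALLOWED_SUFFIXES = {".pdf", ".epub"}


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return reasons the file would probably be rejected (empty list if none)."""
    problems = []
    suffix = PurePath(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported file type: {suffix or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file exceeds the 200 MB limit")
    return problems
```

An empty result means the file matches the documented constraints (PDF or EPUB, at most 200 MB); it does not guarantee that indexing will succeed.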

Scan: set PAGEPIPER_WATCH_DIR in your .env to the directory you want scanned, then click Scan for PDFs. Any PDF or EPUB in that directory that is not already in the library is queued. Re-scanning is safe; already-indexed documents are skipped.
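The skip-what-is-already-known behaviour can be sketched as follows. This is an illustration under stated assumptions, not the actual scanner: the function name `find_new_documents` is hypothetical, the scan is assumed to be recursive, and documents are assumed to be identified by file path.

```python
import os
from pathlib import Path

SUPPORTED_SUFFIXES = {".pdf", ".epub"}


def find_new_documents(watch_dir: Path, known_paths: set[Path]) -> list[Path]:
    """List PDF/EPUB files under watch_dir that are not in known_paths."""
    candidates = (
        p for p in watch_dir.rglob("*")  # assumption: subdirectories are included
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )
    # Files that are already known are skipped, so repeating a scan is harmless.
    return sorted(p for p in candidates if p not in known_paths)


if __name__ == "__main__":
    watch_dir = os.environ.get("PAGEPIPER_WATCH_DIR")
    if watch_dir:
        for path in find_new_documents(Path(watch_dir), known_paths=set()):
            print(path)
```

Passing the result of one scan as `known_paths` to the next returns an empty list, which mirrors why re-scanning is safe.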

Document states

| Badge | Meaning |
| --- | --- |
| PROCESSING | Text extraction or embedding in progress |
| READY | Fully indexed and searchable |
| ERROR | Indexing failed; see the error message on the card |

Ingestion progress

While a document is processing, its card shows a live progress bar:

  • Animated sliding bar while text is being extracted (before page count is known)
  • "Embedding N / M pages (X%)" once vectors are being written

The card refreshes automatically and triggers a library reload when indexing completes.
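The two progress phases can be expressed as a small function. This is a sketch only: `progress_label` is a hypothetical name, and rounding the percentage down is an assumption, since the guide does not say how X is computed.

```python
from typing import Optional


def progress_label(pages_done: int, pages_total: Optional[int]) -> Optional[str]:
    """Return the card label, or None while the page count is still unknown."""
    if not pages_total:
        # Extraction phase: no page count yet, so the UI shows the sliding bar.
        return None
    percent = 100 * pages_done // pages_total  # assumption: rounded down
    return f"Embedding {pages_done} / {pages_total} pages ({percent}%)"
```

A `None` result corresponds to the animated sliding bar; any string result corresponds to the embedding phase.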

Re-indexing

Click Re-index on any document card to re-run the full ingest pipeline. This is useful after:

  • Changing PAGEPIPER_EMBED_MODEL (a dimension mismatch is detected automatically at startup, but you can also trigger re-indexing manually)
  • A failed ingest you want to retry
  • Updating to a new version of Pagepiper with an improved extractor

Removing a document

Click Remove to delete the document's metadata, page chunks, and vectors. The source file on disk is not deleted.

Storage

All data lives in the directory set by PAGEPIPER_DATA_DIR (default: data/):

| File | Contents |
| --- | --- |
| pagepiper.db | Document metadata, page chunks, chat feedback |
| pagepiper_vecs.db | sqlite-vec vector store |
| uploads/ | Files added via browser upload |
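If you want to locate or inspect these files from a script, the following sketch resolves the paths from PAGEPIPER_DATA_DIR and lists the tables in a SQLite file without assuming anything about the schema. The helper names `data_paths` and `list_tables` are hypothetical, and the database is opened read-only so that inspection cannot modify it.

```python
import os
import sqlite3
from pathlib import Path
from typing import Mapping


def data_paths(env: Mapping[str, str] = os.environ) -> dict[str, Path]:
    """Resolve the storage locations, falling back to the default data/ directory."""
    data_dir = Path(env.get("PAGEPIPER_DATA_DIR", "data"))
    return {
        "metadata_db": data_dir / "pagepiper.db",
        "vector_db": data_dir / "pagepiper_vecs.db",
        "uploads": data_dir / "uploads",
    }


def list_tables(db_path: Path) -> list[str]:
    """Return table names from a SQLite file, opened in read-only mode."""
    uri = db_path.resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    paths = data_paths()
    if paths["metadata_db"].exists():
        print(list_tables(paths["metadata_db"]))
```

Because the connection uses `mode=ro`, running this while Pagepiper is indexing should not write to the database, though it is still safest to inspect a copy.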