
Pagepiper

Search your document library. Get answers with exact page citations.


Self-hosted PDF and EPUB search with BM25 (Best Match 25) full-text indexing and LLM (large language model) synthesis. Drop your documents in, ask a question, get an answer that tells you exactly which page to turn to.
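To make "BM25 full-text indexing with page citations" concrete, here is a minimal sketch of Okapi BM25 scoring in which each indexed unit is one page.

It is not Pagepiper's custom index. The following are all illustrative assumptions:
- page-level granularity,
- the naive tokenizer,
- the commonly used defaults k1=1.5 and b=0.75,
- the names `Page` and `bm25_search`,
- the sample documents.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    """Hypothetical indexed unit: the text of one page of one document."""
    doc: str
    page: int
    text: str


def tokenize(text: str) -> list[str]:
    # Naive tokenizer: lowercase alphanumeric runs, no stemming or stop words.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_search(pages: list[Page], query: str, k1: float = 1.5,
                b: float = 0.75, top_k: int = 3) -> list[tuple[str, int, float]]:
    """Score every page against the query with Okapi BM25; return (doc, page, score)."""
    term_freqs = [Counter(tokenize(p.text)) for p in pages]
    lengths = [sum(tf.values()) for tf in term_freqs]
    n = len(pages)
    avgdl = sum(lengths) / n if n else 0.0
    # Document frequency: the number of pages each term appears on.
    df = Counter(term for tf in term_freqs for term in tf)

    results = []
    for page, tf, dl in zip(pages, term_freqs, lengths):
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            # Smoothed IDF (the +1 inside the log keeps it positive).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            f = tf[term]
            # Term-frequency saturation (k1) and page-length normalization (b).
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))
        if score > 0:
            results.append((page.doc, page.page, score))
    results.sort(key=lambda r: r[2], reverse=True)
    return results[:top_k]


# Sample data (invented for illustration).
pages = [
    Page("Core Rules.pdf", 212, "Grappling: make an Athletics check contested by the target."),
    Page("Core Rules.pdf", 213, "A grappled creature's speed becomes 0."),
    Page("Bestiary.pdf", 9, "Legendary actions are taken at the end of another creature's turn."),
]
for doc, page, score in bm25_search(pages, "grappled speed"):
    print(f"{doc}, p. {page} (score {score:.2f})")
```

Only page 213 comes back. To this naive tokenizer, "grappling" and "grappled" are unrelated tokens. That is the kind of near miss the optional semantic search is described as addressing.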

Built for TTRPG (tabletop roleplaying game) players who are tired of hitting Ctrl+F through six-hundred-page rulebooks. It works equally well for legal research, technical manuals, academic papers, or any personal document library you want to query in plain language.

No cloud required. Your files stay on your machine.


Screenshots

Library

[Screenshot: Library view, with documents listed alongside their ingest status and page counts]

Chat with citations

[Screenshot: Chat view, showing an answer with the source document and page number for every claim]


Why Pagepiper?

  • Your library, not ours. Documents are indexed and stored locally. Nothing is sent to a third-party service unless you explicitly configure a cloud LLM.
  • Works without an LLM. BM25 full-text search runs entirely inside the Docker container. No Ollama, no API key, no GPU required for keyword search.
  • Answers cite their sources. Every LLM response includes the document name and page number it drew from. You can verify or dispute every answer.
  • Hybrid search when you want it. Connect a local Ollama instance to unlock semantic (vector) search that finds relevant passages even when your question doesn't use the exact words in the text.
  • Open ingest pipeline. The ingest pipeline, BM25 full-text index, and library management are MIT-licensed. Add support for new formats, improve the PDF parser, or contribute fixes, and the whole community benefits directly.
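The README does not say how BM25 and vector results are combined in hybrid mode. One common approach is reciprocal rank fusion (RRF). The sketch below ranks pages by cosine similarity of their embeddings, then fuses that ranking with a BM25 keyword ranking.

Assumptions, not Pagepiper's implementation:
- RRF itself and the constant k=60,
- the page IDs,
- the two-dimensional toy vectors.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_rank(query_vec: list[float], page_vecs: dict[str, list[float]]) -> list[str]:
    """Page IDs sorted from most to least similar to the query embedding."""
    return sorted(page_vecs, key=lambda pid: cosine(query_vec, page_vecs[pid]), reverse=True)


def rrf(*rankings: list[str], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: each list adds 1 / (k + rank) to a page's score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, pid in enumerate(ranking, start=1):
            scores[pid] = scores.get(pid, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: 2-dimensional "embeddings" stand in for real model output.
keyword_hits = ["rules.pdf#p213"]  # BM25 found only the page with the exact words
embeddings = {
    "rules.pdf#p212": [0.9, 0.1],
    "rules.pdf#p213": [0.7, 0.3],
    "bestiary.pdf#p9": [0.1, 0.9],
}
semantic_hits = vector_rank([1.0, 0.0], embeddings)
print(rrf(keyword_hits, semantic_hits))  # ['rules.pdf#p213', 'rules.pdf#p212', 'bestiary.pdf#p9']
```

RRF only looks at rank positions, so it never has to reconcile BM25 scores with cosine similarities, which live on different scales. A page found by both methods rises to the top. A page found only semantically, such as p212 here, is still surfaced.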

Quick Start

Prerequisites: Docker and Docker Compose. Ollama is optional; you need it only for LLM-synthesized answers and semantic search.

git clone https://git.opensourcesolarpunk.com/Circuit-Forge/pagepiper
cd pagepiper
cp .env.example .env
./manage.sh start

Open http://localhost:8521.

Configure

Open .env and set your paths:

# Where Pagepiper stores its SQLite index and uploaded files
PAGEPIPER_DATA_DIR=./data

# Directory to scan for existing PDFs/EPUBs (used by the Scan button)
PAGEPIPER_BOOKS_DIR=/path/to/your/documents

To unlock LLM synthesis and semantic search, add your Ollama endpoint and the chat and embedding models to use:

PAGEPIPER_OLLAMA_URL=http://localhost:11434
PAGEPIPER_CHAT_MODEL=mistral:7b
PAGEPIPER_EMBED_MODEL=nomic-embed-text
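Before restarting, you can confirm that the endpoint answers and that both models are already pulled. This sketch calls Ollama's GET /api/tags endpoint, which lists locally installed models.

Things to keep in mind:
- The helper names are hypothetical and not part of Pagepiper.
- Run it on the host with the .env values exported.
- It checks reachability from the host, which is not necessarily what the Pagepiper container sees.

```python
import json
import os
import urllib.request


def normalize(name: str) -> str:
    # Ollama treats a model name without an explicit tag as ":latest".
    return name if ":" in name else f"{name}:latest"


def missing_models(tags: dict, wanted: list[str]) -> list[str]:
    """Configured model names that do not appear in an /api/tags response."""
    installed = {normalize(m["name"]) for m in tags.get("models", [])}
    return [w for w in wanted if normalize(w) not in installed]


def check_ollama() -> list[str]:
    """Ask the server in PAGEPIPER_OLLAMA_URL which of the configured models are missing."""
    base = os.environ["PAGEPIPER_OLLAMA_URL"].rstrip("/")
    with urllib.request.urlopen(f"{base}/api/tags", timeout=5) as resp:
        tags = json.load(resp)
    wanted = [os.environ.get(var) for var in ("PAGEPIPER_CHAT_MODEL", "PAGEPIPER_EMBED_MODEL")]
    return missing_models(tags, [w for w in wanted if w])
```

Any name that check_ollama() returns can be fetched with: ollama pull <name>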

Add documents

Upload via browser: click Upload in the Library view. Files are saved to data/uploads/ (the uploads/ folder inside PAGEPIPER_DATA_DIR) and indexed automatically.

Scan a directory: set PAGEPIPER_BOOKS_DIR in .env, then click Scan. Pagepiper searches that directory recursively for PDF and EPUB files and queues them for indexing.
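To preview which files a scan will queue, you can approximate the recursive walk on the host with pathlib. The extension matching below is an assumption. The README says only that Scan looks for PDFs and EPUBs, not how it recognizes them.

```python
import os
from pathlib import Path

# Assumption: Scan selects files by extension, case-insensitively.
SUFFIXES = {".pdf", ".epub"}


def preview_scan(books_dir: str) -> list[Path]:
    """Every PDF/EPUB file under books_dir, found recursively, in sorted order."""
    root = Path(books_dir).expanduser()
    return sorted(p for p in root.rglob("*") if p.is_file() and p.suffix.lower() in SUFFIXES)


if "PAGEPIPER_BOOKS_DIR" in os.environ:
    for path in preview_scan(os.environ["PAGEPIPER_BOOKS_DIR"]):
        print(path)
```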


Supported Formats

| Format | Ingest | Page-level citations   |
|--------|--------|------------------------|
| PDF    | Yes    | Yes                    |
| EPUB   | Yes    | Yes (chapter/location) |
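PDF citations point at a page, while EPUB citations point at a chapter or location, since reflowable EPUBs generally have no fixed page numbers. A hypothetical citation record covering both cases might look like the sketch below. The class and field names are illustrative, not Pagepiper's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Citation:
    """Hypothetical citation: a page for PDFs, a chapter/location for EPUBs."""
    document: str
    page: Optional[int] = None      # set for PDFs
    chapter: Optional[str] = None   # set for EPUBs
    location: Optional[str] = None  # optional finer-grained EPUB position

    def label(self) -> str:
        if self.page is not None:
            return f"{self.document}, p. {self.page}"
        parts = [self.document]
        if self.chapter:
            parts.append(self.chapter)
        if self.location:
            parts.append(f"loc. {self.location}")
        return ", ".join(parts)


print(Citation("Core Rules.pdf", page=213).label())              # Core Rules.pdf, p. 213
print(Citation("Setting Guide.epub", chapter="Chapter 4").label())  # Setting Guide.epub, Chapter 4
```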

Stack

| Layer            | Technology                                 |
|------------------|--------------------------------------------|
| Backend API      | FastAPI + SQLite                           |
| Full-text search | BM25 (custom index, no external service)   |
| Vector search    | sqlite-vec + Ollama embeddings (optional)  |
| LLM synthesis    | Ollama (local, any model)                  |
| Frontend         | Vue 3 SPA served by nginx                  |
| Deployment       | Docker Compose                             |

Default ports: Web UI 8521, API 8540.
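After ./manage.sh start, you can confirm that both default ports accept connections with a plain TCP check from the host. This sketch assumes the default ports above. It only shows that something is listening, not that the API is healthy; ./manage.sh status and ./manage.sh logs remain the better diagnostic tools.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Default ports from the README; adjust if you changed them.
for name, port in {"Web UI": 8521, "API": 8540}.items():
    state = "listening" if port_open("localhost", port) else "not reachable"
    print(f"{name} on port {port}: {state}")
```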


Management

./manage.sh start          # Build and start
./manage.sh stop           # Stop
./manage.sh restart        # Restart
./manage.sh status         # Show container status
./manage.sh logs [svc]     # Tail logs (pass 'api' or 'web' to filter)
./manage.sh open           # Open UI in browser
./manage.sh build          # Rebuild images
./manage.sh test           # Run test suite

Tiers

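| Feature                           | Free | Paid (BYOK)        |
|-----------------------------------|------|--------------------|
| PDF and EPUB upload               | Yes  | Yes                |
| Directory scan                    | Yes  | Yes                |
| BM25 full-text search             | Yes  | Yes                |
| Unlimited local ingestion         | Yes  | Yes                |
| Hybrid BM25 + vector search       | No   | Yes (local Ollama) |
| LLM synthesis with page citations | No   | Yes (local Ollama) |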

BYOK means you supply your own Ollama instance. No cloud API keys, no usage metering.


Forgejo-primary

Pagepiper is developed and hosted at git.opensourcesolarpunk.com/Circuit-Forge/pagepiper. GitHub mirrors exist for discoverability only. File issues and submit pull requests on Forgejo.


License

Pagepiper uses a split license:

  • MIT: Document ingest pipeline, BM25 full-text index, library management, and EPUB support. This is the core discovery and retrieval layer.
  • BSL 1.1 (Business Source License): Hybrid vector search, LLM synthesis, and the RAG (retrieval-augmented generation) chat interface. Free for personal, non-commercial self-hosting; commercial use or SaaS re-hosting requires a license. Converts to MIT after four years.

A Circuit Forge LLC product. Privacy · Safety · Accessibility — co-equal, non-negotiable.