Pagepiper
Search your document library. Get answers with exact page citations.
Self-hosted PDF and EPUB search with BM25 (Best Match 25) full-text indexing and LLM (large language model) synthesis. Drop your documents in, ask a question, get an answer that tells you exactly which page to turn to.
Built for TTRPG (tabletop roleplaying game) players who are tired of ctrl-F'ing through six-hundred-page rulebooks. Works equally well for legal research, technical manuals, academic papers, or any personal document library you want to query in plain language.
No cloud required. Your files stay on your machine.
Screenshots
[Screenshot: Library]
[Screenshot: Chat with citations]
Why Pagepiper?
- Your library, not ours. Documents are indexed and stored locally. Nothing is sent to a third-party service unless you explicitly configure a cloud LLM.
- Works without an LLM. BM25 full-text search runs entirely inside the Docker container. No Ollama, no API key, no GPU required for keyword search.
- Answers cite their sources. Every LLM response includes the document name and page number it drew from. You can verify or dispute every answer.
- Hybrid search when you want it. Connect a local Ollama instance to unlock semantic (vector) search that finds relevant passages even when your question doesn't use the exact words in the text.
- Open ingest pipeline. The indexing and search layer is MIT-licensed. Add support for new formats, improve the PDF parser, contribute — the community benefits directly.
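To make the keyword-search claim concrete, here is a minimal sketch of Okapi BM25 scoring over a handful of pages. It is not Pagepiper's actual index: the `bm25_scores` function, the whitespace tokenizer, the `k1` and `b` defaults, and the sample page texts are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, pages, k1=1.5, b=0.75):
    """Score each page (a plain string) against the query with Okapi BM25.

    Hypothetical helper for illustration; tokenization is a naive
    lowercase whitespace split.
    """
    docs = [page.lower().split() for page in pages]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: how many pages contain each term at least once.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long pages need more hits to score well.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Made-up sample pages, one string per page.
pages = [
    "grappling rules a creature can grapple a target within reach",
    "spellcasting rules for preparing and casting spells",
    "equipment tables for armor and weapons",
]
scores = bm25_scores("grapple rules", pages)
best_page = max(range(len(pages)), key=scores.__getitem__) + 1  # 1-based
```

Because each score is tied to a page, the top hit doubles as the citation: here the first page matches both query terms and ranks highest, while the third page matches neither and scores zero.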
Quick Start
Prerequisites: Docker and Docker Compose. Optionally, Ollama for semantic search and LLM-synthesized answers.
git clone https://git.opensourcesolarpunk.com/Circuit-Forge/pagepiper
cd pagepiper
cp .env.example .env
./manage.sh start
Open http://localhost:8521.
Configure
Open .env and set your paths:
# Where Pagepiper stores its SQLite index and uploaded files
PAGEPIPER_DATA_DIR=./data
# Directory to scan for existing PDFs/EPUBs (used by the Scan button)
PAGEPIPER_BOOKS_DIR=/path/to/your/documents
To unlock LLM synthesis and semantic search, add your Ollama endpoint:
PAGEPIPER_OLLAMA_URL=http://localhost:11434
PAGEPIPER_CHAT_MODEL=mistral:7b
PAGEPIPER_EMBED_MODEL=nomic-embed-text
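The variables above suggest a simple rule: keyword search always works, and the Ollama-backed features switch on only when an endpoint and model are configured. The sketch below shows one way such settings could be read and checked. The `Settings` class, `load_settings` function, and the fallback value for the data directory are hypothetical and are not taken from Pagepiper's source.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the PAGEPIPER_* variables shown above."""

    data_dir: str
    books_dir: Optional[str]
    ollama_url: Optional[str]
    chat_model: Optional[str]
    embed_model: Optional[str]

    @property
    def semantic_search_enabled(self) -> bool:
        # Vector search needs both an Ollama endpoint and an embedding model.
        return bool(self.ollama_url and self.embed_model)

    @property
    def synthesis_enabled(self) -> bool:
        # LLM answers need both an Ollama endpoint and a chat model.
        return bool(self.ollama_url and self.chat_model)


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    return Settings(
        data_dir=env.get("PAGEPIPER_DATA_DIR", "./data"),  # assumed fallback
        books_dir=env.get("PAGEPIPER_BOOKS_DIR") or None,
        ollama_url=env.get("PAGEPIPER_OLLAMA_URL") or None,
        chat_model=env.get("PAGEPIPER_CHAT_MODEL") or None,
        embed_model=env.get("PAGEPIPER_EMBED_MODEL") or None,
    )


keyword_only = load_settings({"PAGEPIPER_DATA_DIR": "./data"})
with_ollama = load_settings({
    "PAGEPIPER_OLLAMA_URL": "http://localhost:11434",
    "PAGEPIPER_CHAT_MODEL": "mistral:7b",
    "PAGEPIPER_EMBED_MODEL": "nomic-embed-text",
})
```

In this sketch `keyword_only` reports both optional features as disabled, while `with_ollama` enables both.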
Add documents
Upload via browser: click Upload in the Library view. Files are saved to data/uploads/ and indexed automatically.
Scan a directory: set PAGEPIPER_BOOKS_DIR in .env, then click Scan. Pagepiper finds all PDF and EPUB files recursively and queues them for indexing.
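A recursive scan of this kind can be sketched in a few lines of standard-library Python. The `find_documents` function and `SUPPORTED_SUFFIXES` set are hypothetical names. Pagepiper's real scanner may filter, deduplicate, or queue files differently.

```python
import tempfile
from pathlib import Path

# Assumed from the Supported Formats table: PDF and EPUB only.
SUPPORTED_SUFFIXES = {".pdf", ".epub"}


def find_documents(books_dir):
    """Return every PDF/EPUB under books_dir, searching subdirectories too."""
    root = Path(books_dir)
    return sorted(
        path
        for path in root.rglob("*")
        # Compare suffixes case-insensitively so "RULEBOOK.PDF" is found.
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES
    )


# Demonstration against a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "core").mkdir()
    (Path(tmp) / "core" / "rulebook.PDF").write_bytes(b"")
    (Path(tmp) / "bestiary.epub").write_bytes(b"")
    (Path(tmp) / "notes.txt").write_bytes(b"")
    found = [p.relative_to(tmp).as_posix() for p in find_documents(tmp)]
```

The text file is skipped, and the nested PDF is picked up despite its uppercase extension.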
Supported Formats
| Format | Ingest | Page-level citations |
|---|---|---|
| PDF | Yes | Yes |
| EPUB | Yes | Yes (chapter/location) |
Stack
| Layer | Technology |
|---|---|
| Backend API | FastAPI + SQLite |
| Full-text search | BM25 (custom index, no external service) |
| Vector search | sqlite-vec + Ollama embeddings (optional) |
| LLM synthesis | Ollama (local, any model) |
| Frontend | Vue 3 SPA served by nginx |
| Deployment | Docker Compose |
Default ports: Web UI 8521, API 8540.
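Hybrid search has to merge two ranked lists, one from BM25 and one from vector similarity. This README does not say how Pagepiper combines them. As one common option, the sketch below uses reciprocal rank fusion (RRF), which relies only on rank positions and so avoids comparing BM25 scores with embedding distances. The function name, the constant `k=60`, and the sample `(document, page)` hits are illustrative assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several best-first rankings into one list using RRF.

    Each item earns 1 / (k + rank) per list it appears in; items that rank
    well in more than one list rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by the item's string form.
    return sorted(scores, key=lambda item: (-scores[item], str(item)))


# Made-up hits as (document, page) pairs, best match first.
bm25_ranking = [("rules.pdf", 195), ("rules.pdf", 74), ("gear.pdf", 12)]
vector_ranking = [("rules.pdf", 74), ("lore.pdf", 30), ("rules.pdf", 195)]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
```

Page 74 of `rules.pdf` ranks first and second across the two lists, so it edges out page 195, which ranks first and third. Hits found by only one method fall to the bottom.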
Management
./manage.sh start # Build and start
./manage.sh stop # Stop
./manage.sh restart # Restart
./manage.sh status # Show container status
./manage.sh logs [svc] # Tail logs (pass 'api' or 'web' to filter)
./manage.sh open # Open UI in browser
./manage.sh build # Rebuild images
./manage.sh test # Run test suite
Tiers
| Feature | Free | Paid (BYOK) |
|---|---|---|
| PDF and EPUB upload | Yes | Yes |
| Directory scan | Yes | Yes |
| BM25 full-text search | Yes | Yes |
| Unlimited local ingestion | Yes | Yes |
| Hybrid BM25 + vector search | — | Yes (local Ollama) |
| LLM synthesis with page citations | — | Yes (local Ollama) |
BYOK (bring your own key) here means you supply your own Ollama instance. No cloud API keys, no usage metering.
Forgejo-primary
Pagepiper is developed and hosted at git.opensourcesolarpunk.com/Circuit-Forge/pagepiper. GitHub mirrors exist for discoverability only. File issues and submit pull requests on Forgejo.
License
Pagepiper uses a split license:
- MIT: Document ingest pipeline, BM25 full-text index, library management, EPUB support — the core discovery and retrieval layer.
- BSL 1.1 (Business Source License): Hybrid vector search, LLM synthesis, RAG (retrieval-augmented generation) chat interface — free for personal non-commercial self-hosting; commercial use or SaaS re-hosting requires a license. Converts to MIT after four years.
A Circuit Forge LLC product. Privacy · Safety · Accessibility — co-equal, non-negotiable.

