# Pagepiper

**Search your document library. Get answers with exact page citations.**

[Source](https://git.opensourcesolarpunk.com/Circuit-Forge/pagepiper) · [License](LICENSE) · [Releases](https://git.opensourcesolarpunk.com/Circuit-Forge/pagepiper/releases)

Self-hosted PDF and EPUB search with BM25 (Best Match 25) full-text indexing and LLM (large language model) synthesis. Drop your documents in, ask a question, get an answer that tells you exactly which page to turn to.

Built for TTRPG (tabletop roleplaying game) players who are tired of ctrl-F'ing through six-hundred-page rulebooks. Works equally well for legal research, technical manuals, academic papers, or any personal document library you want to query in plain language.

No cloud required. Your files stay on your machine.

---
## Screenshots

### Library

*(Screenshot of the Library view.)*

### Chat with citations

*(Screenshot of the chat view with citations.)*

---
## Why Pagepiper?

- **Your library, not ours.** Documents are indexed and stored locally. Nothing is sent to a third-party service unless you explicitly configure a cloud LLM.
- **Works without an LLM.** BM25 full-text search runs entirely inside the Docker container. Keyword search needs no Ollama, no API key, and no GPU.
- **Answers cite their sources.** Every LLM response includes the document name and page number it drew from. You can verify or dispute every answer.
- **Hybrid search when you want it.** Connect a local Ollama instance to unlock semantic (vector) search that finds relevant passages even when your question doesn't use the exact words in the text.
- **Open ingest pipeline.** The ingest pipeline and BM25 search layer are MIT-licensed. Add support for new formats, improve the PDF parser, or contribute in other ways; the community benefits directly.
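
For readers new to BM25, the keyword ranking mentioned in the list above can be sketched in a few lines of Python. This is a minimal illustration of the standard Okapi BM25 formula, not Pagepiper's actual index: the function name `bm25_scores`, the whitespace tokenizer, the sample pages, and the defaults `k1=1.5` and `b=0.75` are all assumptions made for the example.

```python
import math
from collections import Counter


def bm25_scores(query, pages, k1=1.5, b=0.75):
    """Score each page against the query with Okapi BM25 (illustrative only)."""
    if not pages:
        return []
    # Assumption: naive lowercase + whitespace tokenization.
    tokenized = [page.lower().split() for page in pages]
    n_pages = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_pages
    # Document frequency: number of pages containing each term at least once.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            df = doc_freq[term]
            # Smoothed IDF; the +1 inside the log keeps it non-negative.
            idf = math.log(1 + (n_pages - df + 0.5) / (df + 0.5))
            tf = term_freq[term]
            # Longer-than-average pages are penalized in proportion to b.
            length_norm = 1 - b + b * len(toks) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical page texts, one string per page.
pages = [
    "grapple rules: make an athletics check against the target",
    "spell slots are regained after a long rest",
    "a long rest takes eight hours",
]
print(bm25_scores("grapple check", pages))
```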

---

## Quick Start

**Prerequisites:** [Docker](https://docs.docker.com/get-docker/) and Docker Compose. Optionally [Ollama](https://ollama.com) for semantic search and LLM-synthesized answers.

```bash
git clone https://git.opensourcesolarpunk.com/Circuit-Forge/pagepiper
cd pagepiper
cp .env.example .env
./manage.sh start
```

Open [http://localhost:8521](http://localhost:8521).

### Configure

Open `.env` and set your paths:

```dotenv
# Where Pagepiper stores its SQLite index and uploaded files
PAGEPIPER_DATA_DIR=./data

# Directory to scan for existing PDFs/EPUBs (used by the Scan button)
PAGEPIPER_BOOKS_DIR=/path/to/your/documents
```

To unlock LLM synthesis and semantic search, add your Ollama endpoint:

```dotenv
PAGEPIPER_OLLAMA_URL=http://localhost:11434
PAGEPIPER_CHAT_MODEL=mistral:7b
PAGEPIPER_EMBED_MODEL=nomic-embed-text
```
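
Before starting the stack, it can help to confirm that the models named in `.env` are actually pulled in Ollama. The sketch below is one possible check, not part of Pagepiper: `parse_env`, `missing_models`, and `installed_models` are hypothetical helpers. The only external call is Ollama's `GET /api/tags` endpoint, which lists local models; the demo at the bottom uses a hard-coded set instead of a live server so it runs anywhere.

```python
import json
import urllib.request


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def missing_models(settings, installed):
    """Return configured model names that are absent from `installed`."""
    def with_tag(name):
        # Ollama lists untagged models as "<name>:latest".
        return name if ":" in name else name + ":latest"

    keys = ("PAGEPIPER_CHAT_MODEL", "PAGEPIPER_EMBED_MODEL")
    wanted = [settings[key] for key in keys if key in settings]
    return [name for name in wanted if with_tag(name) not in installed]


def installed_models(base_url):
    """Ask a running Ollama instance which models are pulled (GET /api/tags)."""
    url = base_url.rstrip("/") + "/api/tags"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return {model["name"] for model in json.load(resp)["models"]}


sample_env = """
PAGEPIPER_OLLAMA_URL=http://localhost:11434
PAGEPIPER_CHAT_MODEL=mistral:7b
PAGEPIPER_EMBED_MODEL=nomic-embed-text
"""
settings = parse_env(sample_env)
# Pretend only the chat model has been pulled so far.
print(missing_models(settings, {"mistral:7b"}))
```

Against a live instance, pass `installed_models(settings["PAGEPIPER_OLLAMA_URL"])` as the second argument. Any model reported as missing can be fetched with `ollama pull <name>`.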
### Add documents

**Upload via browser:** click **Upload** in the Library view. Files are saved to `data/uploads/` and indexed automatically.

**Scan a directory:** set `PAGEPIPER_BOOKS_DIR` in `.env`, then click **Scan**. Pagepiper finds all PDF and EPUB files recursively and queues them for indexing.
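
To preview roughly what a recursive scan would pick up, a short Python sketch along these lines can help. It is an illustration only: `find_documents` and `SUPPORTED_SUFFIXES` are hypothetical names, and Pagepiper's own scanner may apply rules this sketch does not (for example, skipping files that are already indexed).

```python
import tempfile
from pathlib import Path

# Assumption: matching is by file extension, case-insensitively.
SUPPORTED_SUFFIXES = {".pdf", ".epub"}


def find_documents(books_dir):
    """Return every PDF/EPUB under books_dir, including nested folders, sorted."""
    root = Path(books_dir)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES
    )


# Demo on a throwaway directory tree so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "rules" / "core").mkdir(parents=True)
    (Path(tmp) / "rules" / "core" / "Handbook.PDF").touch()
    (Path(tmp) / "rules" / "notes.txt").touch()  # ignored: unsupported type
    (Path(tmp) / "novel.epub").touch()
    for path in find_documents(tmp):
        print(path.relative_to(tmp))
```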

---

## Supported Formats

| Format | Ingest | Page-level citations |
|--------|--------|----------------------|
| PDF | Yes | Yes |
| EPUB | Yes | Yes (chapter/location) |

---
## Stack

| Layer | Technology |
|-------|-----------|
| Backend API | FastAPI + SQLite |
| Full-text search | BM25 (custom index, no external service) |
| Vector search | sqlite-vec + Ollama embeddings (optional) |
| LLM synthesis | Ollama (local, any model) |
| Frontend | Vue 3 SPA served by nginx |
| Deployment | Docker Compose |

Default ports: Web UI `8521`, API `8540`.
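
The table lists BM25 and vector search as separate layers; how Pagepiper merges their results in hybrid mode is not documented here. One common way to combine two ranked lists is reciprocal rank fusion (RRF), sketched below purely as an illustration. The function name `reciprocal_rank_fusion`, the `(document, page)` tuples, the sample hits, and the constant `k=60` are assumptions for the example, not Pagepiper's code.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each hit adds 1 / (k + rank) to its item's score."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            fused[item] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by item for stable output.
    return sorted(fused, key=lambda item: (-fused[item], item))


# Hypothetical hits as (document, page) pairs, best match first.
keyword_hits = [("Core Rules", 195), ("Core Rules", 74), ("GM Guide", 12)]
semantic_hits = [("Core Rules", 74), ("Bestiary", 301), ("Core Rules", 195)]

for doc, page in reciprocal_rank_fusion([keyword_hits, semantic_hits]):
    print(f"{doc}, p. {page}")
```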

---

## Management

```bash
./manage.sh start        # Build and start
./manage.sh stop         # Stop
./manage.sh restart      # Restart
./manage.sh status       # Show container status
./manage.sh logs [svc]   # Tail logs (pass 'api' or 'web' to filter)
./manage.sh open         # Open UI in browser
./manage.sh build        # Rebuild images
./manage.sh test         # Run test suite
```

---
## Tiers

| Feature | Free | Paid (BYOK) |
|---------|------|-------------|
| PDF and EPUB upload | Yes | Yes |
| Directory scan | Yes | Yes |
| BM25 full-text search | Yes | Yes |
| Unlimited local ingestion | Yes | Yes |
| Hybrid BM25 + vector search | No | Yes (local Ollama) |
| LLM synthesis with page citations | No | Yes (local Ollama) |

BYOK means you supply your own Ollama instance. No cloud API keys, no usage metering.

---
## Forgejo-primary

Pagepiper is developed and hosted at [git.opensourcesolarpunk.com/Circuit-Forge/pagepiper](https://git.opensourcesolarpunk.com/Circuit-Forge/pagepiper). GitHub mirrors exist for discoverability only. File issues and submit pull requests on Forgejo.

---
## License

Pagepiper uses a split license:

- **MIT:** Document ingest pipeline, BM25 full-text index, library management, and EPUB support (the core discovery and retrieval layer).
- **BSL 1.1 (Business Source License):** Hybrid vector search, LLM synthesis, and the RAG (retrieval-augmented generation) chat interface. Free for personal, non-commercial self-hosting; commercial use or SaaS re-hosting requires a license. Converts to MIT after four years.

---

*A [Circuit Forge LLC](https://circuitforge.tech) product. Privacy · Safety · Accessibility: co-equal, non-negotiable.*