docs(readme): landing page rewrite — screenshots, quick start, formats table, tiers, Forgejo-primary, split license

2026-05-06 08:51:38 -07:00
commit 895d0b6129
parent b105a0fc14
1 changed file with 71 additions and 120 deletions
--- a/README.md
+++ b/README.md
@@ -1,75 +1,66 @@
 # Pagepiper
-**v0.1.0** | Self-hosted PDF and EPUB search for your personal library
+**Search your document library. Get answers with exact page citations.**
-Pagepiper lets you drop PDFs and EPUBs into a library, index them, and search across the full text. With [Ollama](https://ollama.com) configured, you also get hybrid vector search and an LLM (large language model) chat interface that cites specific page numbers when it answers.
+[![License: MIT / BSL 1.1](https://img.shields.io/badge/license-MIT%20%2F%20BSL%201.1-blue)](LICENSE)
 [![Version](https://img.shields.io/badge/version-v0.1.0-orange)](https://git.opensourcesolarpunk.com/Circuit-Forge/pagepiper/releases)
-Built for TTRPG (tabletop roleplaying game) players tired of ctrl-F'ing through Pathfinder core rulebooks. Works equally well for fan fiction EPUB collections, AO3 exports, and any personal document library.
+Self-hosted PDF and EPUB search with BM25 (Best Match 25) full-text indexing and LLM (large language model) synthesis. Drop your documents in, ask a question, get an answer that tells you exactly which page to turn to.
-Try it: [pagepiper.circuitforge.tech](https://pagepiper.circuitforge.tech)
+Built for TTRPG (tabletop roleplaying game) players who are tired of ctrl-F'ing through six-hundred-page rulebooks. Works equally well for legal research, technical manuals, academic papers, or any personal document library you want to query in plain language.
 No cloud required. Your files stay on your machine.
 ---
-## Features
+## Screenshots
-| Feature | Free tier | Paid (BYOK) |
+### Library
 |---------|-----------|-------------|
 | PDF and EPUB upload via browser drag-and-drop | Yes | Yes |
 | Directory scan for existing files | Yes | Yes |
 | BM25 full-text search (no LLM required) | Yes | Yes |
 | Unlimited local ingestion | Yes | Yes |
 | Hybrid BM25 + k-NN vector search | No | Yes (local Ollama) |
 | LLM chat with page-level citations | No | Yes (local Ollama) |
 | Thumbs up / down feedback on answers | No | Yes |
-BYOK (bring your own key) means you supply your own Ollama instance. No cloud API keys, no usage billing.
+![Library view — documents listed with ingest status and page counts](docs/screenshots/01-library.png)
-**BM25** (Best Match 25) is a keyword ranking algorithm. It works without any LLM and runs entirely inside the Docker container. **k-NN** (k-nearest neighbor) vector search uses embeddings to find passages that are semantically similar to your question, even when the exact words don't match.
+### Chat with citations
 ![Chat view — answer with source document and page number for every claim](docs/screenshots/02-chat.png)
 ---
-## Tech Stack
+## Why Pagepiper?
- **Backend:** FastAPI + SQLite (BM25 via custom BM25Index, vectors via sqlite-vec)
+- **Your library, not ours.** Documents are indexed and stored locally. Nothing is sent to a third-party service unless you explicitly configure a cloud LLM.
- **Frontend:** Vue 3 SPA served by nginx
+- **Works without an LLM.** BM25 full-text search runs entirely inside the Docker container. No Ollama, no API key, no GPU required for keyword search.
- **Embedding model:** `nomic-embed-text` via Ollama (1024-dim, optional)
+- **Answers cite their sources.** Every LLM response includes the document name and page number it drew from. You can verify or dispute every answer.
- **Chat LLM:** `mistral:7b` via Ollama (optional, any Ollama model works)
+- **Hybrid search when you want it.** Connect a local Ollama instance to unlock semantic (vector) search that finds relevant passages even when your question doesn't use the exact words in the text.
- **Deployment:** Docker Compose
+- **Open ingest pipeline.** The indexing and search layer is MIT-licensed. Add support for new formats, improve the PDF parser, contribute — the community benefits directly.
 ---
-## Quick Start (Self-Hosting)
+## Quick Start
-### Prerequisites
+**Prerequisites:** [Docker](https://docs.docker.com/get-docker/) and Docker Compose. Optionally [Ollama](https://ollama.com) for LLM-synthesized answers.
 - [Docker](https://docs.docker.com/get-docker/) and Docker Compose
 - PDFs or EPUBs you want to search
 - Optional: [Ollama](https://ollama.com) for semantic search and RAG (retrieval-augmented generation) chat
 ### 1. Clone the repo
 ```bash
 git clone https://git.opensourcesolarpunk.com/Circuit-Forge/pagepiper
 cd pagepiper
 ```
 ### 2. Configure
 ```bash
 cp .env.example .env
 ./manage.sh start
 ```
 Open [http://localhost:8521](http://localhost:8521).
 ### Configure
 Open `.env` and set your paths:
 ```dotenv
 # Directory to scan for PDFs/EPUBs (used by the "Scan" button in the UI)
 PAGEPIPER_BOOKS_DIR=/path/to/your/pdfs
 # Where Pagepiper stores its SQLite index and uploaded files
-PAGEPIPER_DATA_DIR=data
+PAGEPIPER_DATA_DIR=./data
 # Directory to scan for existing PDFs/EPUBs (used by the Scan button)
 PAGEPIPER_BOOKS_DIR=/path/to/your/documents
 ```
-To unlock hybrid search and LLM chat, uncomment and set the Ollama block:
+To unlock LLM synthesis and semantic search, add your Ollama endpoint:
 ```dotenv
 PAGEPIPER_OLLAMA_URL=http://localhost:11434
@@ -77,121 +68,81 @@ PAGEPIPER_CHAT_MODEL=mistral:7b
 PAGEPIPER_EMBED_MODEL=nomic-embed-text
 ```
-### 3. Start
+### Add documents
-```bash
+**Upload via browser** — click **Upload** in the Library view. Files save to `data/uploads/` and index automatically.
 ./manage.sh start
 ```
-Open [http://localhost:8521](http://localhost:8521).
+**Scan a directory** — set `PAGEPIPER_BOOKS_DIR` in `.env`, then click **Scan**. Pagepiper finds all files recursively and queues them.
 ### 4. Add documents
 Two ways to add files:
 **Upload via browser** (easiest for small collections): Click **Upload** in the Library view and select a PDF or EPUB. The file saves to `data/uploads/` and begins indexing automatically.
 **Scan a directory** (best for large collections): Set `PAGEPIPER_BOOKS_DIR` in your `.env` to a folder of PDFs/EPUBs, then click **Scan** in the Library view. Pagepiper finds all files recursively and queues them for indexing.
 ### 5. Search and chat
 Switch to the **Chat** tab and ask questions. On the free tier, BM25 keyword search returns matching passages. With Ollama configured, you get semantic search and an LLM-generated answer with page-number citations.
 ---
-## Ollama Setup (optional)
+## Supported Formats
-Install Ollama from [ollama.com](https://ollama.com), then pull the models:
+| Format | Ingest | Page-level citations |
-
+|--------|--------|----------------------|
-```bash
+| PDF    | Yes    | Yes                  |
-ollama pull mistral:7b
+| EPUB   | Yes    | Yes (chapter/location) |
 ollama pull nomic-embed-text
 ```
 On a headless Linux server, make Ollama listen on all interfaces so the Docker container can reach it:
 ```bash
 OLLAMA_HOST=0.0.0.0 ollama serve
 ```
 On Docker Desktop (Linux or Mac), `host.docker.internal` resolves automatically. No extra network config needed.
 ---
-## Environment Variables
+## Stack
-| Variable | Default | Description |
+| Layer | Technology |
-|----------|---------|-------------|
+|-------|-----------|
-| `PAGEPIPER_BOOKS_DIR` | `./books` | Host directory to scan for PDFs and EPUBs |
+| Backend API | FastAPI + SQLite |
-| `PAGEPIPER_DATA_DIR` | `./data` | SQLite index and uploaded files live here |
+| Full-text search | BM25 (custom index, no external service) |
-| `PAGEPIPER_OLLAMA_URL` | *(unset)* | Ollama base URL; leave blank for BM25-only mode |
+| Vector search | sqlite-vec + Ollama embeddings (optional) |
-| `PAGEPIPER_EMBED_MODEL` | `nomic-embed-text` | Ollama embedding model (1024-dim default) |
+| LLM synthesis | Ollama (local, any model) |
-| `PAGEPIPER_EMBED_DIMS` | `1024` | Must match the embedding model's output dimensions |
+| Frontend | Vue 3 SPA served by nginx |
-| `PAGEPIPER_CHAT_MODEL` | `mistral:7b` | Ollama chat model; any Ollama model name works |
+| Deployment | Docker Compose |
-| `PAGEPIPER_CHAT_FEEDBACK` | *(unset)* | Set to `true` to enable thumbs up/down on chat answers |
+
 Default ports: Web UI `8521`, API `8540`.
 ---
 ## Management
 ```bash
-./manage.sh start          # Build and start (dev)
+./manage.sh start          # Build and start
 ./manage.sh stop           # Stop
 ./manage.sh restart        # Restart
 ./manage.sh status         # Show container status
-./manage.sh logs [svc]     # Tail logs (default: all services; pass 'api' or 'web' to filter)
+./manage.sh logs [svc]     # Tail logs (pass 'api' or 'web' to filter)
-./manage.sh open           # Open the UI in your browser
+./manage.sh open           # Open UI in browser
-./manage.sh build          # Rebuild images without cache
+./manage.sh build          # Rebuild images
-
+./manage.sh test           # Run test suite
 ./manage.sh cloud:start    # Start the cloud managed instance (port 8533)
 ./manage.sh cloud:stop
 ./manage.sh cloud:restart
 ./manage.sh cloud:status
 ./manage.sh cloud:logs [svc]
 ./manage.sh cloud:build
 ```
 ---
-## Cloud Managed Instance
+## Tiers
-The cloud deployment runs at [pagepiper.circuitforge.tech](https://pagepiper.circuitforge.tech) and at `menagerie.circuitforge.tech/pagepiper`. It uses `compose.cloud.yml` with LLM inference routed through the cf-orch coordinator.
+| Feature | Free | Paid (BYOK) |
 |---------|------|-------------|
 | PDF and EPUB upload | Yes | Yes |
 | Directory scan | Yes | Yes |
 | BM25 full-text search | Yes | Yes |
 | Unlimited local ingestion | Yes | Yes |
 | Hybrid BM25 + vector search | — | Yes (local Ollama) |
 | LLM synthesis with page citations | — | Yes (local Ollama) |
-To run your own cloud-style deployment:
+BYOK means you supply your own Ollama instance. No cloud API keys, no usage metering.
 ```bash
 cp .env.cloud.example .env
 # Edit .env: set PAGEPIPER_OLLAMA_URL and data paths
 ./manage.sh cloud:start
 ```
 Cloud instance listens on port 8533. The API is internal-only; nginx proxies `/api/` to the backend.
 ---
-## Data and Backups
+## Forgejo-primary
-The `data/` directory contains the SQLite index database and all uploaded files. Back it up to preserve your index. Pagepiper indexes documents at ingest time. If you modify or replace a source file, use the re-index button on the document card to rebuild its entry.
+Pagepiper is developed and hosted at [git.opensourcesolarpunk.com/Circuit-Forge/pagepiper](https://git.opensourcesolarpunk.com/Circuit-Forge/pagepiper). GitHub mirrors exist for discoverability only. File issues and submit pull requests on Forgejo.
 Large PDFs (hundreds of pages) can take a few minutes to index. The status badge on the document card updates as indexing progresses.
 ---
-## Licensing
+## License
 Pagepiper uses a split license:
- **MIT:** BM25 full-text search, document library management, ingest pipeline, EPUB support
+- **MIT:** Document ingest pipeline, BM25 full-text index, library management, EPUB support — the core discovery and retrieval layer.
- **BSL 1.1:** Hybrid vector search (embedding + k-NN), RAG chat, LLM integration
+- **BSL 1.1 (Business Source License):** Hybrid vector search, LLM synthesis, RAG (retrieval-augmented generation) chat interface — free for personal non-commercial self-hosting; commercial use or SaaS re-hosting requires a license. Converts to MIT after four years.
 BSL 1.1 is free for personal non-commercial self-hosting. SaaS re-hosting or commercial redistribution requires a license from CircuitForge. BSL 1.1 converts to MIT after four years.
 License keys: [circuitforge.tech](https://circuitforge.tech)
 ---
-## Contributing
+*A [Circuit Forge LLC](https://circuitforge.tech) product. Privacy · Safety · Accessibility — co-equal, non-negotiable.*
 Issues and PRs welcome at [git.opensourcesolarpunk.com/Circuit-Forge/pagepiper](https://git.opensourcesolarpunk.com/Circuit-Forge/pagepiper).
 The ingest pipeline and BM25 index are MIT-licensed. If you build a better PDF parser or add support for additional formats (CBZ, MOBI, etc.), the community benefits directly.
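
For readers of this diff who want a concrete picture of what "BM25 full-text search (no LLM required)" involves, here is a minimal Python sketch of textbook Okapi BM25 scoring. It is not Pagepiper's `BM25Index`; the function name `bm25_scores`, the whitespace tokenization, the sample pages, and the parameters `k1=1.5` and `b=0.75` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query (Okapi BM25).

    Hypothetical helper for illustration only; k1 and b are common defaults,
    not values taken from Pagepiper.
    """
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)  # term frequency within this document
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rarer terms get a higher inverse document frequency weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long documents are penalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Assumed sample data: each "page" is a whitespace-tokenized string.
pages = [
    "a fireball deals fire damage in a burst".split(),
    "grapple checks use your athletics modifier".split(),
    "fire resistance reduces fire damage taken".split(),
]
scores = bm25_scores("fire damage".split(), pages)
best_page = max(range(len(pages)), key=lambda i: scores[i])
```

Because the scoring is plain term counting and arithmetic, it needs no embedding model or GPU, which is consistent with the README's claim that keyword search works without Ollama. A real index would also need proper tokenization and a stored mapping from each passage back to its page number.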