docs(readme): landing page rewrite — screenshots, quick start, formats table, tiers, Forgejo-primary, split license

2026-05-06 08:51:38 -07:00
commit 895d0b6129
parent b105a0fc14
1 changed file with 71 additions and 120 deletions
--- a/README.md
+++ b/README.md
@@ -1,75 +1,66 @@
 # Pagepiper

-**v0.1.0** | Self-hosted PDF and EPUB search for your personal library
+**Search your document library. Get answers with exact page citations.**

-Pagepiper lets you drop PDFs and EPUBs into a library, index them, and search across the full text. With [Ollama](https://ollama.com) configured, you also get hybrid vector search and an LLM (large language model) chat interface that cites specific page numbers when it answers.
+[![License: MIT / BSL 1.1](https://img.shields.io/badge/license-MIT%20%2F%20BSL%201.1-blue)](LICENSE)
+[![Version](https://img.shields.io/badge/version-v0.1.0-orange)](https://git.opensourcesolarpunk.com/Circuit-Forge/pagepiper/releases)

-Built for TTRPG (tabletop roleplaying game) players tired of ctrl-F'ing through Pathfinder core rulebooks. Works equally well for fan fiction EPUB collections, AO3 exports, and any personal document library.
+Self-hosted PDF and EPUB search with BM25 (Best Match 25) full-text indexing and LLM (large language model) synthesis. Drop your documents in, ask a question, get an answer that tells you exactly which page to turn to.

-Try it: [pagepiper.circuitforge.tech](https://pagepiper.circuitforge.tech)
+Built for TTRPG (tabletop roleplaying game) players who are tired of ctrl-F'ing through six-hundred-page rulebooks. Works equally well for legal research, technical manuals, academic papers, or any personal document library you want to query in plain language.
+
+No cloud required. Your files stay on your machine.

 ---

-## Features
+## Screenshots

-| Feature | Free tier | Paid (BYOK) |
-|---------|-----------|-------------|
-| PDF and EPUB upload via browser drag-and-drop | Yes | Yes |
-| Directory scan for existing files | Yes | Yes |
-| BM25 full-text search (no LLM required) | Yes | Yes |
-| Unlimited local ingestion | Yes | Yes |
-| Hybrid BM25 + k-NN vector search | No | Yes (local Ollama) |
-| LLM chat with page-level citations | No | Yes (local Ollama) |
-| Thumbs up / down feedback on answers | No | Yes |
+### Library

-BYOK (bring your own key) means you supply your own Ollama instance. No cloud API keys, no usage billing.
+![Library view — documents listed with ingest status and page counts](docs/screenshots/01-library.png)

-**BM25** (Best Match 25) is a keyword ranking algorithm. It works without any LLM and runs entirely inside the Docker container. **k-NN** (k-nearest neighbor) vector search uses embeddings to find passages that are semantically similar to your question, even when the exact words don't match.
+### Chat with citations
+
+![Chat view — answer with source document and page number for every claim](docs/screenshots/02-chat.png)

 ---

-## Tech Stack
+## Why Pagepiper?

- **Backend:** FastAPI + SQLite (BM25 via custom BM25Index, vectors via sqlite-vec)
- **Frontend:** Vue 3 SPA served by nginx
- **Embedding model:** `nomic-embed-text` via Ollama (1024-dim, optional)
- **Chat LLM:** `mistral:7b` via Ollama (optional, any Ollama model works)
- **Deployment:** Docker Compose
+- **Your library, not ours.** Documents are indexed and stored locally. Nothing is sent to a third-party service unless you explicitly configure a cloud LLM.
+- **Works without an LLM.** BM25 full-text search runs entirely inside the Docker container. No Ollama, no API key, no GPU required for keyword search.
+- **Answers cite their sources.** Every LLM response includes the document name and page number it drew from. You can verify or dispute every answer.
+- **Hybrid search when you want it.** Connect a local Ollama instance to unlock semantic (vector) search that finds relevant passages even when your question doesn't use the exact words in the text.
+- **Open ingest pipeline.** The indexing and search layer is MIT-licensed. Add support for new formats, improve the PDF parser, contribute — the community benefits directly.

 ---

-## Quick Start (Self-Hosting)
+## Quick Start

-### Prerequisites
-
- [Docker](https://docs.docker.com/get-docker/) and Docker Compose
- PDFs or EPUBs you want to search
- Optional: [Ollama](https://ollama.com) for semantic search and RAG (retrieval-augmented generation) chat
-
-### 1. Clone the repo
+**Prerequisites:** [Docker](https://docs.docker.com/get-docker/) and Docker Compose. Optionally [Ollama](https://ollama.com) for LLM-synthesized answers.

 ```bash
 git clone https://git.opensourcesolarpunk.com/Circuit-Forge/pagepiper
 cd pagepiper
-```
-
-### 2. Configure
-
-```bash
 cp .env.example .env
+./manage.sh start
 ```

+Open [http://localhost:8521](http://localhost:8521).
+
+### Configure
+
 Open `.env` and set your paths:

 ```dotenv
-# Directory to scan for PDFs/EPUBs (used by the "Scan" button in the UI)
-PAGEPIPER_BOOKS_DIR=/path/to/your/pdfs
-
 # Where Pagepiper stores its SQLite index and uploaded files
-PAGEPIPER_DATA_DIR=data
+PAGEPIPER_DATA_DIR=./data
+
+# Directory to scan for existing PDFs/EPUBs (used by the Scan button)
+PAGEPIPER_BOOKS_DIR=/path/to/your/documents
 ```

-To unlock hybrid search and LLM chat, uncomment and set the Ollama block:
+To unlock LLM synthesis and semantic search, add your Ollama endpoint:

 ```dotenv
 PAGEPIPER_OLLAMA_URL=http://localhost:11434
@@ -77,121 +68,81 @@ PAGEPIPER_CHAT_MODEL=mistral:7b
 PAGEPIPER_EMBED_MODEL=nomic-embed-text
 ```

-### 3. Start
+### Add documents

-```bash
-./manage.sh start
-```
+**Upload via browser** — click **Upload** in the Library view. Files save to `data/uploads/` and index automatically.

-Open [http://localhost:8521](http://localhost:8521).
-
-### 4. Add documents
-
-Two ways to add files:
-
-**Upload via browser** (easiest for small collections): Click **Upload** in the Library view and select a PDF or EPUB. The file saves to `data/uploads/` and begins indexing automatically.
-
-**Scan a directory** (best for large collections): Set `PAGEPIPER_BOOKS_DIR` in your `.env` to a folder of PDFs/EPUBs, then click **Scan** in the Library view. Pagepiper finds all files recursively and queues them for indexing.
-
-### 5. Search and chat
-
-Switch to the **Chat** tab and ask questions. On the free tier, BM25 keyword search returns matching passages. With Ollama configured, you get semantic search and an LLM-generated answer with page-number citations.
+**Scan a directory** — set `PAGEPIPER_BOOKS_DIR` in `.env`, then click **Scan**. Pagepiper finds all files recursively and queues them.

 ---

-## Ollama Setup (optional)
+## Supported Formats

-Install Ollama from [ollama.com](https://ollama.com), then pull the models:
-
-```bash
-ollama pull mistral:7b
-ollama pull nomic-embed-text
-```
-
-On a headless Linux server, make Ollama listen on all interfaces so the Docker container can reach it:
-
-```bash
-OLLAMA_HOST=0.0.0.0 ollama serve
-```
-
-On Docker Desktop (Linux or Mac), `host.docker.internal` resolves automatically. No extra network config needed.
+| Format | Ingest | Page-level citations |
+|--------|--------|----------------------|
+| PDF    | Yes    | Yes                  |
+| EPUB   | Yes    | Yes (chapter/location) |

 ---

-## Environment Variables
+## Stack

-| Variable | Default | Description |
-|----------|---------|-------------|
-| `PAGEPIPER_BOOKS_DIR` | `./books` | Host directory to scan for PDFs and EPUBs |
-| `PAGEPIPER_DATA_DIR` | `./data` | SQLite index and uploaded files live here |
-| `PAGEPIPER_OLLAMA_URL` | *(unset)* | Ollama base URL; leave blank for BM25-only mode |
-| `PAGEPIPER_EMBED_MODEL` | `nomic-embed-text` | Ollama embedding model (1024-dim default) |
-| `PAGEPIPER_EMBED_DIMS` | `1024` | Must match the embedding model's output dimensions |
-| `PAGEPIPER_CHAT_MODEL` | `mistral:7b` | Ollama chat model; any Ollama model name works |
-| `PAGEPIPER_CHAT_FEEDBACK` | *(unset)* | Set to `true` to enable thumbs up/down on chat answers |
+| Layer | Technology |
+|-------|-----------|
+| Backend API | FastAPI + SQLite |
+| Full-text search | BM25 (custom index, no external service) |
+| Vector search | sqlite-vec + Ollama embeddings (optional) |
+| LLM synthesis | Ollama (local, any model) |
+| Frontend | Vue 3 SPA served by nginx |
+| Deployment | Docker Compose |
+
+Default ports: Web UI `8521`, API `8540`.

 ---

 ## Management

 ```bash
-./manage.sh start          # Build and start (dev)
+./manage.sh start          # Build and start
 ./manage.sh stop           # Stop
 ./manage.sh restart        # Restart
 ./manage.sh status         # Show container status
-./manage.sh logs [svc]     # Tail logs (default: all services; pass 'api' or 'web' to filter)
-./manage.sh open           # Open the UI in your browser
-./manage.sh build          # Rebuild images without cache
-
-./manage.sh cloud:start    # Start the cloud managed instance (port 8533)
-./manage.sh cloud:stop
-./manage.sh cloud:restart
-./manage.sh cloud:status
-./manage.sh cloud:logs [svc]
-./manage.sh cloud:build
+./manage.sh logs [svc]     # Tail logs (pass 'api' or 'web' to filter)
+./manage.sh open           # Open UI in browser
+./manage.sh build          # Rebuild images
+./manage.sh test           # Run test suite
 ```

 ---

-## Cloud Managed Instance
+## Tiers

-The cloud deployment runs at [pagepiper.circuitforge.tech](https://pagepiper.circuitforge.tech) and at `menagerie.circuitforge.tech/pagepiper`. It uses `compose.cloud.yml` with LLM inference routed through the cf-orch coordinator.
+| Feature | Free | Paid (BYOK) |
+|---------|------|-------------|
+| PDF and EPUB upload | Yes | Yes |
+| Directory scan | Yes | Yes |
+| BM25 full-text search | Yes | Yes |
+| Unlimited local ingestion | Yes | Yes |
+| Hybrid BM25 + vector search | — | Yes (local Ollama) |
+| LLM synthesis with page citations | — | Yes (local Ollama) |

-To run your own cloud-style deployment:
-
-```bash
-cp .env.cloud.example .env
-# Edit .env: set PAGEPIPER_OLLAMA_URL and data paths
-./manage.sh cloud:start
-```
-
-Cloud instance listens on port 8533. The API is internal-only; nginx proxies `/api/` to the backend.
+BYOK (bring your own key) means you supply your own Ollama instance. No cloud API keys, no usage metering.

 ---

-## Data and Backups
+## Forgejo-primary

-The `data/` directory contains the SQLite index database and all uploaded files. Back it up to preserve your index. Pagepiper indexes documents at ingest time. If you modify or replace a source file, use the re-index button on the document card to rebuild its entry.
-
-Large PDFs (hundreds of pages) can take a few minutes to index. The status badge on the document card updates as indexing progresses.
+Pagepiper is developed and hosted at [git.opensourcesolarpunk.com/Circuit-Forge/pagepiper](https://git.opensourcesolarpunk.com/Circuit-Forge/pagepiper). GitHub mirrors exist for discoverability only. File issues and submit pull requests on Forgejo.

 ---

-## Licensing
+## License

 Pagepiper uses a split license:

- **MIT:** BM25 full-text search, document library management, ingest pipeline, EPUB support
- **BSL 1.1:** Hybrid vector search (embedding + k-NN), RAG chat, LLM integration
-
-BSL 1.1 is free for personal non-commercial self-hosting. SaaS re-hosting or commercial redistribution requires a license from CircuitForge. BSL 1.1 converts to MIT after four years.
-
-License keys: [circuitforge.tech](https://circuitforge.tech)
+- **MIT:** Document ingest pipeline, BM25 full-text index, library management, EPUB support — the core discovery and retrieval layer.
+- **BSL 1.1 (Business Source License):** Hybrid vector search, LLM synthesis, RAG (retrieval-augmented generation) chat interface — free for personal non-commercial self-hosting; commercial use or SaaS re-hosting requires a license. Converts to MIT after four years.

 ---

-## Contributing
-
-Issues and PRs welcome at [git.opensourcesolarpunk.com/Circuit-Forge/pagepiper](https://git.opensourcesolarpunk.com/Circuit-Forge/pagepiper).
-
-The ingest pipeline and BM25 index are MIT-licensed. If you build a better PDF parser or add support for additional formats (CBZ, MOBI, etc.), the community benefits directly.
+*A [Circuit Forge LLC](https://circuitforge.tech) product. Privacy · Safety · Accessibility — co-equal, non-negotiable.*
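
Reviewer note on the Configure section in this diff: the README implies that Pagepiper stays in keyword-only (BM25) mode until `PAGEPIPER_OLLAMA_URL` is set. Below is a minimal sketch of a pre-flight check for that. The helper names `dotenv_get` and `pagepiper_mode` are hypothetical and not part of `manage.sh` or Pagepiper. The assumption that a non-empty URL alone selects hybrid mode is inferred from the README text, not from the code.

```bash
#!/bin/sh
# Hypothetical helpers (not part of Pagepiper): inspect a dotenv-style file
# and report which search mode the README suggests it would select.

# Print the value of KEY from FILE. Commented-out lines are ignored and the
# last uncommented assignment wins.
dotenv_get() {
    # $1 = file, $2 = key
    sed -n "s/^[[:space:]]*$2=//p" "$1" | tail -n 1
}

# Print "hybrid" if PAGEPIPER_OLLAMA_URL has a non-empty value,
# "bm25-only" if it is unset, blank, or commented out,
# and "missing" (with a non-zero exit status) if the file does not exist.
pagepiper_mode() {
    env_file="${1:-.env}"
    if [ ! -f "$env_file" ]; then
        echo "missing"
        return 1
    fi
    url="$(dotenv_get "$env_file" PAGEPIPER_OLLAMA_URL)"
    if [ -n "$url" ]; then
        echo "hybrid"
    else
        echo "bm25-only"
    fi
}
```

After sourcing the snippet, `pagepiper_mode .env` prints one of the three words above. It only reads the file. It does not check that Ollama is reachable or that the chat and embedding models have been pulled.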