# Pagepiper

Self-hosted document search with BM25 full-text indexing and (with local Ollama) hybrid vector search and LLM-powered chat. Supports PDF and EPUB files.

## Demo

Try it: [pagepiper.circuitforge.tech](https://pagepiper.circuitforge.tech)

## Screenshots

### Library

![Library view](screenshots/01-library.png)

Scan your PDF directory to index documents, or upload individual PDFs directly. Each document shows page count and ingest status.

### Chat

![Chat view](screenshots/02-chat.png)

Ask questions across your indexed documents. Results cite the source document and page number.

## Tiers

| Feature | Free | Paid (BYOK) |
|---------|------|-------------|
| BM25 full-text search | Yes | Yes |
| PDF and EPUB upload via browser | Yes | Yes |
| Unlimited local ingestion | Yes | Yes |
| Hybrid vector search | No | Yes (local Ollama) |
| LLM chat over documents | No | Yes (local Ollama) |

BYOK (Bring Your Own Key) means you supply your own Ollama instance. No cloud API keys required.

---

## Self-Hosting Guide

### Prerequisites

- [Docker](https://docs.docker.com/get-docker/) and Docker Compose
- PDFs you want to search
- Optional: [Ollama](https://ollama.com) running locally for semantic search and LLM chat

### Step 1: Get the code

```bash
git clone https://git.opensourcesolarpunk.com/Circuit-Forge/pagepiper
cd pagepiper
```

### Step 2: Configure

```bash
cp .env.example .env
```

Open `.env` and set your directories:

```dotenv
# Where pagepiper stores its index database
PAGEPIPER_DATA_DIR=./data

# Directory to scan for PDFs (used by the "Scan for PDFs" button)
# You can also upload individual PDFs via the web UI without setting this
PAGEPIPER_BOOKS_DIR=/path/to/your/pdfs
```

To unlock hybrid vector search and LLM chat, add your Ollama endpoint:

```dotenv
PAGEPIPER_OLLAMA_URL=http://localhost:11434
PAGEPIPER_CHAT_MODEL=mistral:7b
PAGEPIPER_EMBED_MODEL=nomic-embed-text
```

### Step 3: Start

```bash
docker compose up -d --build
```

Open [http://localhost:8521](http://localhost:8521) in your browser.

### Step 4: Add your PDFs

Two ways to add documents:

**Option A — Upload via browser** (easiest for small collections):

Click the **Upload PDF** button in the Library view and select a file. It saves to `data/uploads/` and begins indexing automatically.

**Option B — Mount a directory** (best for large collections):

Set `PAGEPIPER_BOOKS_DIR` in your `.env` to point at a folder of PDFs, then click **Scan for PDFs**. Pagepiper finds all `.pdf` files recursively and queues them for indexing.

### Step 5: Search

Switch to the **Chat** tab and ask questions about your documents. The Free tier uses BM25 keyword matching. With Ollama configured, you get semantic (vector) search and LLM-generated answers with page-level citations.

---

## Ollama Setup (optional)

Install Ollama from [ollama.com](https://ollama.com), then pull the models:

```bash
ollama pull mistral:7b
ollama pull nomic-embed-text
```

Pagepiper's Docker container reaches Ollama at `host.docker.internal` — no extra network config needed on Linux/Mac with Docker Desktop. On a headless Linux server, make sure Ollama binds to `0.0.0.0`:

```bash
OLLAMA_HOST=0.0.0.0 ollama serve
```

---

## Managing the instance

```bash
# Check status
docker compose ps

# View API logs
docker compose logs -f api

# Stop
docker compose down

# Rebuild after updates
docker compose up -d --build
```

---

## Notes

- Pagepiper indexes PDFs at ingest time. Changes to the source file require a re-index (use the re-index button on the document card).
- The `data/` directory contains the SQLite index database and any uploaded files. Back it up to preserve your index.
- Large PDFs (hundreds of pages) can take a few minutes to index. Watch the status badge on the document card.