feat: RAG retrieval quality, artifact cleaning, and ingestion progress UI

Retrieval:
- Add _fetch_adjacent() to retriever: fetches page ± 1 chunks from DB
  after ranking so mid-sentence EPUB chunk boundaries don't lose context
- Fix vec DB doc-filter: oversample to top_k*20 before Python filter
  instead of post-filtering an already-small global pool (fixes wrong-book
  results when searching within a single document)
- top_k default 5 → 10; context per chunk 500 → 1500 chars; citation
  snippet 200 → 400 chars

Artifact cleaning:
- Add scripts/text_clean.py: strips ABC Amber LIT Converter watermarks,
  processtext.com URLs, bare page numbers, piracy stamps from extracted text
- Wire clean_paragraph() into ingest_pdf.py and new ingest_epub.py

Startup validation:
- _check_vec_schema() at boot: detects embedding dimension mismatch,
  deletes stale vec DB, and queues sequential re-embed in background thread
- Sequential _reembed_docs() prevents SQLite lock races on startup re-embed

cf-orch integration:
- Wire CF_ORCH_URL / CF_LICENSE_KEY into LLMRouter backend config so
  allocate() fires and keeps the Ollama model warm between requests

Ingestion progress UI:
- GET /api/library/{doc_id}/status now returns vec_count from page_vecs_meta
- DocumentCard.vue polls status every 3 s while processing and shows
  two-phase progress: indeterminate animation during extraction,
  determinate "Embedding N/M pages" bar once vectors start landing

Other:
- Chat feedback endpoint + thumbs up/down UI (FeedbackButton.vue)
- EPUB ingest script (ingest_epub.py) with heading-based chunking
- migration 002: chat_feedback table
- README.md with setup and feature overview
pyr0ball 2026-05-06 08:25:58 -07:00
parent be7a076f34
commit e52bdb5128
29 changed files with 2301 additions and 112 deletions


@ -10,7 +10,9 @@ PAGEPIPER_BOOKS_DIR=/devl/pagepiper-cloud-data/books
PAGEPIPER_OLLAMA_URL=
# Embedding and chat model selection (only used when PAGEPIPER_OLLAMA_URL is set)
PAGEPIPER_EMBED_MODEL=nomic-embed-text
# mxbai-embed-large (1024-dim) is recommended; nomic-embed-text uses 768-dim
PAGEPIPER_EMBED_MODEL=mxbai-embed-large
PAGEPIPER_EMBED_DIMS=1024
PAGEPIPER_CHAT_MODEL=mistral:7b
# Heimdall license server (optional — for per-user tier validation)
@ -20,3 +22,17 @@ HEIMDALL_ADMIN_TOKEN=
# cf-orch streaming proxy — coordinator product key
# Must match COORDINATOR_PRODUCT_KEYS["pagepiper"] in cf-orch.env on the coordinator
COORDINATOR_PAGEPIPER_KEY=
# cf-orch coordinator URL — routes chat/embed calls through managed GPU allocation
# CF_LICENSE_KEY is the auth token sent to the coordinator (same value as COORDINATOR_PAGEPIPER_KEY)
# Leave CF_ORCH_URL blank to skip allocation and hit PAGEPIPER_OLLAMA_URL directly
CF_ORCH_URL=
CF_LICENSE_KEY=
CF_APP_NAME=pagepiper
# Forgejo API token — enables in-app feedback button (files issues to Circuit-Forge/pagepiper)
FORGEJO_API_TOKEN=
# Enable thumbs up/down on chat answers (stores retrieval quality signals locally)
# Off by default — opt in when you want to collect correction data
# PAGEPIPER_CHAT_FEEDBACK=true


@ -10,3 +10,11 @@ PAGEPIPER_DATA_DIR=data
# PAGEPIPER_OLLAMA_URL=http://localhost:11434
# PAGEPIPER_CHAT_MODEL=mistral:7b
# PAGEPIPER_EMBED_MODEL=nomic-embed-text
# Forgejo API token — enables the in-app feedback button (files Forgejo issues)
# Create a token at https://git.opensourcesolarpunk.com/user/settings/applications
# FORGEJO_API_TOKEN=
# Enable thumbs up/down on chat answers (stores retrieval quality signals locally)
# Off by default — opt in when you want to collect correction data
# PAGEPIPER_CHAT_FEEDBACK=true

197
README.md Normal file

@ -0,0 +1,197 @@
# Pagepiper
**v0.1.0** | Self-hosted PDF and EPUB search for your personal library
Pagepiper lets you drop PDFs and EPUBs into a library, index them, and search across the full text. With [Ollama](https://ollama.com) configured, you also get hybrid vector search and an LLM (large language model) chat interface that cites specific page numbers when it answers.
Built for TTRPG (tabletop roleplaying game) players tired of ctrl-F'ing through Pathfinder core rulebooks. Works equally well for fan fiction EPUB collections, AO3 exports, and any personal document library.
Try it: [pagepiper.circuitforge.tech](https://pagepiper.circuitforge.tech)
---
## Features
| Feature | Free tier | Paid (BYOK) |
|---------|-----------|-------------|
| PDF and EPUB upload via browser drag-and-drop | Yes | Yes |
| Directory scan for existing files | Yes | Yes |
| BM25 full-text search (no LLM required) | Yes | Yes |
| Unlimited local ingestion | Yes | Yes |
| Hybrid BM25 + k-NN vector search | No | Yes (local Ollama) |
| LLM chat with page-level citations | No | Yes (local Ollama) |
| Thumbs up / down feedback on answers | No | Yes |
BYOK (bring your own key) means you supply your own Ollama instance. No cloud API keys, no usage billing.
**BM25** (Best Match 25) is a keyword ranking algorithm. It works without any LLM and runs entirely inside the Docker container. **k-NN** (k-nearest neighbor) vector search uses embeddings to find passages that are semantically similar to your question, even when the exact words don't match.
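
To make the keyword side concrete, here is a minimal sketch of Okapi BM25 scoring. It is not Pagepiper's `BM25Index`; the function name `bm25_scores`, the whitespace tokenizer, and the `k1`/`b` values are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with the Okapi BM25 formula.

    Illustrative only: tokenization is a plain lowercase whitespace split.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long documents are penalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

A passage that never mentions a query term scores zero here, which is exactly the gap that vector search is meant to cover.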
---
## Tech Stack
- **Backend:** FastAPI + SQLite (BM25 via custom BM25Index, vectors via sqlite-vec)
- **Frontend:** Vue 3 SPA served by nginx
- **Embedding model:** `nomic-embed-text` (768-dim) or `mxbai-embed-large` (1024-dim) via Ollama (optional)
- **Chat LLM:** `mistral:7b` via Ollama (optional, any Ollama model works)
- **Deployment:** Docker Compose
---
## Quick Start (Self-Hosting)
### Prerequisites
- [Docker](https://docs.docker.com/get-docker/) and Docker Compose
- PDFs or EPUBs you want to search
- Optional: [Ollama](https://ollama.com) for semantic search and RAG (retrieval-augmented generation) chat
### 1. Clone the repo
```bash
git clone https://git.opensourcesolarpunk.com/Circuit-Forge/pagepiper
cd pagepiper
```
### 2. Configure
```bash
cp .env.example .env
```
Open `.env` and set your paths:
```dotenv
# Directory to scan for PDFs/EPUBs (used by the "Scan" button in the UI)
PAGEPIPER_BOOKS_DIR=/path/to/your/pdfs
# Where Pagepiper stores its SQLite index and uploaded files
PAGEPIPER_DATA_DIR=data
```
To unlock hybrid search and LLM chat, uncomment and set the Ollama block:
```dotenv
PAGEPIPER_OLLAMA_URL=http://localhost:11434
PAGEPIPER_CHAT_MODEL=mistral:7b
PAGEPIPER_EMBED_MODEL=nomic-embed-text
```
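
As a rough sketch of how the Ollama URL is interpreted: the backend treats a blank value as BM25-only mode and otherwise points an OpenAI-compatible client at the `/v1` path. The helper name `resolve_ollama_base_url` is hypothetical, and stripping surrounding whitespace is an assumption.

```python
from __future__ import annotations


def resolve_ollama_base_url(raw_url: str) -> str | None:
    """Return an OpenAI-compatible base URL, or None when Ollama is not configured."""
    url = raw_url.strip()
    if not url:
        return None  # nothing configured: BM25-only mode
    clean = url.rstrip("/")
    # Append /v1 only if the caller did not already include it.
    return clean if clean.endswith("/v1") else clean + "/v1"
```

With this logic, `http://localhost:11434` and `http://localhost:11434/v1` resolve to the same endpoint, so either form should work in `.env`.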
### 3. Start
```bash
./manage.sh start
```
Open [http://localhost:8521](http://localhost:8521).
### 4. Add documents
Two ways to add files:
**Upload via browser** (easiest for small collections): Click **Upload** in the Library view and select a PDF or EPUB. The file saves to `data/uploads/` and begins indexing automatically.
**Scan a directory** (best for large collections): Set `PAGEPIPER_BOOKS_DIR` in your `.env` to a folder of PDFs/EPUBs, then click **Scan** in the Library view. Pagepiper finds all files recursively and queues them for indexing.
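
The recursive scan can be pictured with the sketch below. `find_documents` and `SUPPORTED_SUFFIXES` are illustrative names, and the case-insensitive suffix match is an assumption that may differ from the actual scan.

```python
from pathlib import Path

# Formats Pagepiper ingests; the set name is illustrative.
SUPPORTED_SUFFIXES = {".pdf", ".epub"}


def find_documents(root: Path) -> list[Path]:
    """Recursively collect PDF and EPUB files under root, sorted for stable output."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )
```

Calling `find_documents(Path("/path/to/your/pdfs"))` would list every candidate file, including those in nested folders.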
### 5. Search and chat
Switch to the **Chat** tab and ask questions. On the free tier, BM25 keyword search returns matching passages. With Ollama configured, you get semantic search and an LLM-generated answer with page-number citations.
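
When both signals are available, each passage gets one blended score. The sketch below assumes the BM25 score has already been normalized to the 0..1 range and that the vector score is a distance (smaller is closer); the equal weights follow the retriever, while the function name is illustrative.

```python
from __future__ import annotations


def combined_score(bm25: float, vector_distance: float | None) -> float:
    """Blend a normalized BM25 score with a vector distance using equal weights."""
    # Smaller distance means a closer match, so invert it into a (0, 1] similarity.
    # Passages found only by BM25 have no distance and contribute 0 on the vector side.
    vec = 1.0 / (1.0 + vector_distance) if vector_distance is not None else 0.0
    return bm25 * 0.5 + vec * 0.5
```

A passage that matches on keywords and is also semantically close therefore outranks one that matches on only one of the two signals.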
---
## Ollama Setup (optional)
Install Ollama from [ollama.com](https://ollama.com), then pull the models:
```bash
ollama pull mistral:7b
ollama pull nomic-embed-text
```
On a headless Linux server, make Ollama listen on all interfaces so the Docker container can reach it:
```bash
OLLAMA_HOST=0.0.0.0 ollama serve
```
On Docker Desktop (Linux or Mac), `host.docker.internal` resolves automatically. No extra network config needed.
---
## Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `PAGEPIPER_BOOKS_DIR` | `./books` | Host directory to scan for PDFs and EPUBs |
| `PAGEPIPER_DATA_DIR` | `./data` | SQLite index and uploaded files live here |
| `PAGEPIPER_OLLAMA_URL` | *(unset)* | Ollama base URL; leave blank for BM25-only mode |
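| `PAGEPIPER_EMBED_MODEL` | `nomic-embed-text` | Ollama embedding model (768-dim; `mxbai-embed-large` is the 1024-dim alternative) |
| `PAGEPIPER_EMBED_DIMS` | `1024` | Must match the embedding model's output dimensions (`768` for `nomic-embed-text`, `1024` for `mxbai-embed-large`) |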
| `PAGEPIPER_CHAT_MODEL` | `mistral:7b` | Ollama chat model; any Ollama model name works |
| `PAGEPIPER_CHAT_FEEDBACK` | *(unset)* | Set to `true` to enable thumbs up/down on chat answers |
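
`PAGEPIPER_EMBED_DIMS` matters because sqlite-vec bakes the embedding dimension into the virtual table definition, so a changed model needs a rebuilt vector database. The sketch below shows the kind of check involved; the helper names and the sample DDL string in the usage note are illustrative assumptions, not the exact schema.

```python
from __future__ import annotations

import re


def stored_dimensions(table_ddl: str) -> int | None:
    """Extract N from a `float[N]` column declaration, or None if absent."""
    match = re.search(r"float\[(\d+)\]", table_ddl)
    return int(match.group(1)) if match else None


def dimensions_match(table_ddl: str, configured_dims: int) -> bool:
    """True when the DDL declares no dimension yet or it equals the configured one."""
    actual = stored_dimensions(table_ddl)
    return actual is None or actual == configured_dims
```

For example, a table declared with `embedding float[768]` would fail `dimensions_match(ddl, 1024)`, which is the situation that triggers a re-embed at startup.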
---
## Management
```bash
./manage.sh start # Build and start (dev)
./manage.sh stop # Stop
./manage.sh restart # Restart
./manage.sh status # Show container status
./manage.sh logs [svc] # Tail logs (default: all services; pass 'api' or 'web' to filter)
./manage.sh open # Open the UI in your browser
./manage.sh build # Rebuild images without cache
./manage.sh cloud:start # Start the cloud managed instance (port 8533)
./manage.sh cloud:stop
./manage.sh cloud:restart
./manage.sh cloud:status
./manage.sh cloud:logs [svc]
./manage.sh cloud:build
```
---
## Cloud Managed Instance
The cloud deployment runs at [pagepiper.circuitforge.tech](https://pagepiper.circuitforge.tech) and at `menagerie.circuitforge.tech/pagepiper`. It uses `compose.cloud.yml` with LLM inference routed through the cf-orch coordinator.
To run your own cloud-style deployment:
```bash
cp .env.cloud.example .env
# Edit .env: set PAGEPIPER_OLLAMA_URL and data paths
./manage.sh cloud:start
```
The cloud instance listens on port 8533. The API is internal-only; nginx proxies `/api/` to the backend.
---
## Data and Backups
The `data/` directory contains the SQLite index database and all uploaded files. Back it up to preserve your index. Pagepiper indexes documents at ingest time. If you modify or replace a source file, use the re-index button on the document card to rebuild its entry.
Large PDFs (hundreds of pages) can take a few minutes to index. The status badge on the document card updates as indexing progresses.
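
One way to back up the index while Pagepiper is running is SQLite's online backup API, which avoids copying a half-written file. This is a sketch, not a bundled tool: `backup_sqlite` is a hypothetical helper, and the destination path in the usage note is an assumption.

```python
import sqlite3
from pathlib import Path


def backup_sqlite(src: Path, dest: Path) -> None:
    """Copy a live SQLite database to dest using SQLite's online backup API."""
    source = sqlite3.connect(src)
    target = sqlite3.connect(dest)
    try:
        # Connection.backup copies all pages into the target database.
        source.backup(target)
    finally:
        target.close()
        source.close()
```

A call such as `backup_sqlite(Path("data/pagepiper.db"), Path("backups/pagepiper.db"))` would snapshot the main index, provided the destination folder already exists. Uploaded files under `data/uploads/` are ordinary files and can be copied with any file-level backup tool.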
---
## Licensing
Pagepiper uses a split license:
- **MIT:** BM25 full-text search, document library management, ingest pipeline, EPUB support
- **BSL 1.1:** Hybrid vector search (embedding + k-NN), RAG chat, LLM integration
BSL 1.1 is free for personal non-commercial self-hosting. SaaS re-hosting or commercial redistribution requires a license from CircuitForge. BSL 1.1 converts to MIT after four years.
License keys: [circuitforge.tech](https://circuitforge.tech)
---
## Contributing
Issues and PRs welcome at [git.opensourcesolarpunk.com/Circuit-Forge/pagepiper](https://git.opensourcesolarpunk.com/Circuit-Forge/pagepiper).
The ingest pipeline and BM25 index are MIT-licensed. If you build a better PDF parser or add support for additional formats (CBZ, MOBI, etc.), the community benefits directly.


@ -29,7 +29,7 @@ class ChatRequest(BaseModel):
message: str
history: list[ChatTurn] = []
doc_ids: list[str] | None = None
top_k: int = 5
top_k: int = 10
class ChatResponse(BaseModel):
@ -37,6 +37,13 @@ class ChatResponse(BaseModel):
citations: list[dict]
class ChatFeedbackRequest(BaseModel):
rating: int # 1 = thumbs up, -1 = thumbs down
question: str = ""
answer: str = ""
doc_ids: list[str] = []
def _get_llm_router():
"""Return LLMRouter if Ollama configured, else None."""
from app.config import get_llm_config
@ -125,3 +132,31 @@ def chat(req: ChatRequest) -> ChatResponse:
for c in result.citations
],
)
@router.get("/feedback/status")
def chat_feedback_status() -> dict:
    enabled = os.environ.get("PAGEPIPER_CHAT_FEEDBACK", "").lower() in ("1", "true", "yes")
    return {"enabled": enabled}


@router.post("/feedback")
def submit_chat_feedback(req: ChatFeedbackRequest) -> dict:
    import json
    import sqlite3

    if req.rating not in (1, -1):
        from fastapi import HTTPException
        raise HTTPException(status_code=422, detail="rating must be 1 or -1")

    db_path = _get_db_path()
    con = sqlite3.connect(db_path)
    try:
        con.execute(
            "INSERT INTO chat_feedback (rating, question, answer, doc_ids) VALUES (?, ?, ?, ?)",
            (req.rating, req.question[:2000], req.answer[:4000], json.dumps(req.doc_ids)),
        )
        con.commit()
    finally:
        con.close()
    return {"ok": True}

7
app/api/feedback.py Normal file

@ -0,0 +1,7 @@
"""Feedback router — provided by circuitforge-core."""
from circuitforge_core.api import make_feedback_router

router = make_feedback_router(
    repo="Circuit-Forge/pagepiper",
    product="pagepiper",
)


@ -0,0 +1,88 @@
"""Screenshot attachment endpoint for in-app feedback.
After the cf-core feedback router creates a Forgejo issue, the frontend
can call POST /feedback/attach to upload a screenshot as a comment on that issue.
"""
from __future__ import annotations

import base64
import os

import requests
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel, Field

router = APIRouter()

_FORGEJO_BASE = os.environ.get(
    "FORGEJO_API_URL", "https://git.opensourcesolarpunk.com/api/v1"
)
_REPO = "Circuit-Forge/pagepiper"
_MAX_BYTES = 5 * 1024 * 1024


class AttachRequest(BaseModel):
    issue_number: int
    filename: str = Field(default="screenshot.png", max_length=80)
    image_b64: str  # data URI or raw base64


class AttachResponse(BaseModel):
    comment_url: str


def _forgejo_headers() -> dict[str, str]:
    token = os.environ.get("FORGEJO_API_TOKEN", "")
    return {"Authorization": f"token {token}"}


def _decode_image(image_b64: str) -> tuple[bytes, str]:
    if image_b64.startswith("data:"):
        header, _, data = image_b64.partition(",")
        mime = header.split(";")[0].split(":")[1] if ":" in header else "image/png"
    else:
        data = image_b64
        mime = "image/png"
    return base64.b64decode(data), mime


@router.post("/attach", response_model=AttachResponse)
def attach_screenshot(payload: AttachRequest) -> AttachResponse:
    token = os.environ.get("FORGEJO_API_TOKEN", "")
    if not token:
        raise HTTPException(status_code=503, detail="Feedback not configured.")

    raw_bytes, mime = _decode_image(payload.image_b64)
    if len(raw_bytes) > _MAX_BYTES:
        raise HTTPException(
            status_code=413,
            detail=f"Screenshot exceeds 5 MB limit ({len(raw_bytes) // 1024} KB received).",
        )

    asset_resp = requests.post(
        f"{_FORGEJO_BASE}/repos/{_REPO}/issues/{payload.issue_number}/assets",
        headers=_forgejo_headers(),
        files={"attachment": (payload.filename, raw_bytes, mime)},
        timeout=20,
    )
    if not asset_resp.ok:
        raise HTTPException(
            status_code=502,
            detail=f"Forgejo asset upload failed: {asset_resp.text[:200]}",
        )

    asset_url = asset_resp.json().get("browser_download_url", "")
    comment_body = f"**Screenshot attached by reporter:**\n\n![screenshot]({asset_url})"
    comment_resp = requests.post(
        f"{_FORGEJO_BASE}/repos/{_REPO}/issues/{payload.issue_number}/comments",
        headers={**_forgejo_headers(), "Content-Type": "application/json"},
        json={"body": comment_body},
        timeout=15,
    )
    if not comment_resp.ok:
        raise HTTPException(
            status_code=502,
            detail=f"Forgejo comment failed: {comment_resp.text[:200]}",
        )

    return AttachResponse(comment_url=comment_resp.json().get("html_url", ""))


@ -12,11 +12,13 @@ import uuid
from pathlib import Path
from typing import Callable
from fastapi import APIRouter, BackgroundTasks, Depends, HTTPException
from fastapi import APIRouter, BackgroundTasks, Depends, HTTPException, UploadFile
from app.config import WATCH_DIR, DB_PATH, VEC_DB_PATH
from app.config import WATCH_DIR, DB_PATH, VEC_DB_PATH, DATA_DIR
from app.deps import get_db
_MAX_UPLOAD_BYTES = 200 * 1024 * 1024 # 200 MB
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/api/library", tags=["library"])
@ -24,15 +26,31 @@ router = APIRouter(prefix="/api/library", tags=["library"])
_mark_bm25_dirty: Callable[[], None] | None = None
_INGEST_TASKS = {
".pdf": "pagepiper/ingest_pdf",
".epub": "pagepiper/ingest_epub",
}
_INGEST_RUNNERS = {
".pdf": "scripts.ingest_pdf",
".epub": "scripts.ingest_epub",
}
def _dispatch_ingest(
doc_id: str,
file_path: str,
background_tasks: BackgroundTasks,
) -> str:
"""Dispatch an ingest task. Tries cf-orch; falls back to BackgroundTasks."""
import importlib
import os as _os
from pathlib import Path as _Path
suffix = _Path(file_path).suffix.lower()
task_name = _INGEST_TASKS.get(suffix, "pagepiper/ingest_pdf")
runner_module = _INGEST_RUNNERS.get(suffix, "scripts.ingest_pdf")
# Read lazily so test fixtures (monkeypatch.setenv) take effect
_data_dir = _Path(_os.environ.get("PAGEPIPER_DATA_DIR", "data"))
task_id = str(uuid.uuid4())
@ -45,11 +63,11 @@ def _dispatch_ingest(
try:
from circuitforge_core.tasks import dispatch_task # type: ignore[import]
task_id = dispatch_task(caller="pagepiper/ingest_pdf", args=args)
task_id = dispatch_task(caller=task_name, args=args)
logger.info("Dispatched cf-orch ingest task %s for doc %s", task_id, doc_id)
except Exception:
from scripts.ingest_pdf import run as run_ingest
background_tasks.add_task(_run_ingest_background, run_ingest, args, task_id)
mod = importlib.import_module(runner_module)
background_tasks.add_task(_run_ingest_background, mod.run, args, task_id)
logger.info(
"cf-orch unavailable — running ingest in background thread (task %s)", task_id
)
@ -89,7 +107,7 @@ def scan_library(
if not watch.exists():
raise HTTPException(status_code=404, detail=f"Watch directory not found: {watch}")
pdfs = list(watch.glob("**/*.pdf"))
pdfs = list(watch.glob("**/*.pdf")) + list(watch.glob("**/*.epub"))
queued = []
for pdf_path in pdfs:
@ -156,7 +174,8 @@ def delete_document(
# Remove embeddings from vector store
try:
from circuitforge_core.vector.sqlite_vec import LocalSQLiteVecStore # type: ignore[import]
store = LocalSQLiteVecStore(db_path=VEC_DB_PATH, table="page_vecs", dimensions=768)
from app.config import VEC_DIMENSIONS
store = LocalSQLiteVecStore(db_path=VEC_DB_PATH, table="page_vecs", dimensions=VEC_DIMENSIONS)
store.delete_where({"doc_id": doc_id})
except Exception as exc:
logger.warning("Could not remove vectors for doc %s: %s", doc_id, exc)
@ -165,6 +184,20 @@ def delete_document(
_mark_bm25_dirty()
def _get_vec_count(doc_id: str) -> int:
    """Return how many vectors have been stored for this doc. Returns 0 on any error."""
    try:
        conn = sqlite3.connect(VEC_DB_PATH)
        try:
            count = conn.execute(
                "SELECT COUNT(*) FROM page_vecs_meta WHERE json_extract(metadata, '$.doc_id') = ?",
                [doc_id],
            ).fetchone()[0]
        finally:
            # Close the connection even when the query fails (e.g. table not created yet).
            conn.close()
        return int(count)
    except Exception:
        return 0
@router.get("/{doc_id}/status")
def document_status(
doc_id: str,
@ -176,4 +209,54 @@ def document_status(
).fetchone()
if not row:
raise HTTPException(status_code=404, detail="Document not found")
return dict(row)
result = dict(row)
result["vec_count"] = _get_vec_count(doc_id)
return result
@router.post("/upload", status_code=202)
def upload_document(
file: UploadFile,
background_tasks: BackgroundTasks,
db: sqlite3.Connection = Depends(get_db),
) -> dict:
"""Accept a PDF/EPUB upload, save to data/uploads/, and queue for indexing."""
name = Path(file.filename or "").name
suffix = Path(name).suffix.lower()
if suffix not in _INGEST_TASKS:
raise HTTPException(status_code=400, detail="Supported formats: PDF, EPUB")
content = file.file.read()
if len(content) > _MAX_UPLOAD_BYTES:
raise HTTPException(status_code=413, detail="File exceeds 200 MB limit")
upload_dir = DATA_DIR / "uploads"
upload_dir.mkdir(parents=True, exist_ok=True)
dest = upload_dir / name
dest.write_bytes(content)
path_str = str(dest.resolve())
existing = db.execute(
"SELECT id, status FROM documents WHERE file_path = ?", [path_str]
).fetchone()
if existing and existing["status"] == "ready":
return {"doc_id": existing["id"], "task_id": None, "filename": name, "status": "already_indexed"}
if existing:
doc_id = existing["id"]
else:
title = dest.stem.replace("_", " ").replace("-", " ").title()
doc_id = db.execute(
"INSERT INTO documents(title, file_path, status) VALUES (?,?,?) RETURNING id",
[title, path_str, "pending"],
).fetchone()[0]
db.commit()
task_id = _dispatch_ingest(doc_id, path_str, background_tasks)
db.execute(
"UPDATE documents SET status='processing', task_id=? WHERE id=?",
[task_id, doc_id],
)
db.commit()
return {"doc_id": doc_id, "task_id": task_id, "filename": name, "status": "queued"}


@ -10,6 +10,7 @@ DATA_DIR.mkdir(parents=True, exist_ok=True)
DB_PATH = str(DATA_DIR / "pagepiper.db")
VEC_DB_PATH = str(DATA_DIR / "pagepiper_vecs.db")
WATCH_DIR = Path(os.environ.get("PAGEPIPER_WATCH_DIR", "books"))
VEC_DIMENSIONS = int(os.environ.get("PAGEPIPER_EMBED_DIMS", "1024"))
def get_llm_config() -> dict | None:
@ -19,17 +20,27 @@ def get_llm_config() -> dict | None:
return None
_clean = url.rstrip("/")
_base_url = _clean if _clean.endswith("/v1") else _clean + "/v1"
chat_model = os.environ.get("PAGEPIPER_CHAT_MODEL", "mistral:7b")
backend: dict = {
"type": "openai_compat",
"base_url": _base_url,
"model": chat_model,
"embedding_model": os.environ.get("PAGEPIPER_EMBED_MODEL", "nomic-embed-text"),
"supports_images": False,
}
# Wire cf-orch allocation when coordinator is configured so the model stays warm
# and cold-start latency doesn't cause chat timeouts.
orch_url = os.environ.get("CF_ORCH_URL", "").strip()
if orch_url:
backend["cf_orch"] = {
"service": "ollama",
"model_candidates": [chat_model],
"ttl_s": 3600,
}
return {
"fallback_order": ["ollama"],
"backends": {
"ollama": {
"type": "openai_compat",
"base_url": _base_url,
"model": os.environ.get("PAGEPIPER_CHAT_MODEL", "mistral:7b"),
"embedding_model": os.environ.get(
"PAGEPIPER_EMBED_MODEL", "nomic-embed-text"
),
"supports_images": False,
}
},
"backends": {"ollama": backend},
}


@ -9,7 +9,7 @@ from app.config import DB_PATH
def get_db() -> Generator[sqlite3.Connection, None, None]:
conn = sqlite3.connect(DB_PATH)
conn = sqlite3.connect(DB_PATH, check_same_thread=False)
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("PRAGMA journal_mode = WAL")
conn.row_factory = sqlite3.Row


@ -3,11 +3,15 @@
from __future__ import annotations
import logging
import os
import re
import sqlite3
import threading
from contextlib import asynccontextmanager
from fastapi import FastAPI
from app.config import DB_PATH
from app.config import DB_PATH, VEC_DB_PATH, VEC_DIMENSIONS
from app.services.bm25_index import BM25Index
logger = logging.getLogger("pagepiper")
@ -21,9 +25,91 @@ def _apply_migrations() -> None:
migrate(DB_PATH)
def _reembed_docs(docs: list[tuple[str, str]], db_path: str, vec_db_path: str) -> None:
    """Re-run full ingest for a list of (doc_id, file_path) sequentially."""
    for doc_id, file_path in docs:
        suffix = os.path.splitext(file_path)[1].lower()
        try:
            if suffix == ".epub":
                from scripts.ingest_epub import run
            else:
                from scripts.ingest_pdf import run
            logger.info("Auto re-embed: starting %s", os.path.basename(file_path))
            run(doc_id=doc_id, file_path=file_path, db_path=db_path, vec_db_path=vec_db_path)
        except Exception as exc:
            logger.error("Auto re-embed failed for doc %s: %s", doc_id[:8], exc)
def _check_vec_schema(vec_db_path: str, expected_dims: int, db_path: str) -> None:
    """Drop the vec DB if its stored dimension doesn't match config, then queue re-embed.

    sqlite-vec bakes the embedding dimension into the virtual table DDL, so changing
    models requires dropping and recreating the whole file. Catches the mismatch at
    startup rather than surfacing it as an obscure OperationalError mid-request.
    """
    if not os.path.exists(vec_db_path):
        return
    try:
        conn = sqlite3.connect(vec_db_path)
        row = conn.execute(
            "SELECT sql FROM sqlite_master WHERE name='page_vecs_vecs'"
        ).fetchone()
        conn.close()
    except Exception as exc:
        logger.warning("Vec schema check could not read %s (non-fatal): %s", vec_db_path, exc)
        return

    if not row:
        return  # table not yet created — first embed will build it with the right dims

    m = re.search(r'float\[(\d+)\]', row[0])
    if not m:
        return

    actual_dims = int(m.group(1))
    if actual_dims == expected_dims:
        return

    logger.warning(
        "Vec DB dimension mismatch: stored=%d, configured=%d — dropping %s and queuing re-embed",
        actual_dims, expected_dims, vec_db_path,
    )
    try:
        os.remove(vec_db_path)
    except OSError as exc:
        logger.error(
            "Could not delete stale vec DB %s: %s — fix permissions and restart", vec_db_path, exc
        )
        return

    # Collect all ready docs so we can rebuild their embeddings in the background.
    try:
        conn = sqlite3.connect(db_path)
        docs = conn.execute(
            "SELECT id, file_path FROM documents WHERE status='ready'"
        ).fetchall()
        conn.close()
    except Exception as exc:
        logger.warning("Could not query documents for re-embed: %s", exc)
        return

    if not docs:
        return

    logger.info("Queuing re-embed for %d document(s) in background", len(docs))
    threading.Thread(
        target=_reembed_docs,
        args=(docs, db_path, vec_db_path),
        daemon=True,
        name="pagepiper-reembed",
    ).start()
@asynccontextmanager
async def lifespan(app: FastAPI):
_apply_migrations()
embed_model = os.environ.get("PAGEPIPER_EMBED_MODEL", "nomic-embed-text")
logger.info("Pagepiper starting — embed model: %s, dims: %d", embed_model, VEC_DIMENSIONS)
_check_vec_schema(VEC_DB_PATH, VEC_DIMENSIONS, DB_PATH)
_bm25.mark_dirty() # will rebuild on first search
yield
@ -39,8 +125,12 @@ from app.api.library import router as library_router # noqa: E402
from app.api.ingest import router as ingest_router # noqa: E402
from app.api.search import router as search_router # noqa: E402
from app.api.chat import router as chat_router # noqa: E402
from app.api.feedback import router as feedback_router # noqa: E402
from app.api.feedback_attach import router as feedback_attach_router # noqa: E402
app.include_router(library_router)
app.include_router(ingest_router)
app.include_router(search_router)
app.include_router(chat_router)
app.include_router(feedback_router, prefix="/api/v1/feedback")
app.include_router(feedback_attach_router, prefix="/api/v1/feedback")


@ -8,6 +8,7 @@ BM25-only path is MIT and has no gate.
from __future__ import annotations
import logging
import sqlite3
from dataclasses import dataclass
from app.services.bm25_index import BM25Index
@ -15,6 +16,62 @@ from app.services.bm25_index import BM25Index
logger = logging.getLogger(__name__)
def _fetch_adjacent(
    hits: list["RetrievedChunk"],
    db_path: str,
    window: int = 1,
) -> list["RetrievedChunk"]:
    """Return chunks immediately before/after each hit that aren't already in the hit set.

    Definitional passages often start mid-sentence because the EPUB/PDF chunk
    boundary fell mid-paragraph. Fetching the neighbouring chunks restores the subject
    so the LLM can understand 'them' / 'they' references correctly.
    """
    if not hits:
        return []

    existing_keys = {(c.doc_id, c.page_number) for c in hits}
    needed: dict[str, set[int]] = {}
    for c in hits:
        for delta in range(-window, window + 1):
            if delta == 0:
                continue
            adj_page = c.page_number + delta
            if adj_page > 0 and (c.doc_id, adj_page) not in existing_keys:
                needed.setdefault(c.doc_id, set()).add(adj_page)

    if not needed:
        return []

    extra: list[RetrievedChunk] = []
    try:
        conn = sqlite3.connect(db_path)
        try:
            conn.row_factory = sqlite3.Row
            for doc_id, pages in needed.items():
                placeholders = ",".join("?" * len(pages))
                rows = conn.execute(
                    f"SELECT id, doc_id, page_number, text FROM page_chunks "
                    f"WHERE doc_id=? AND page_number IN ({placeholders})",
                    [doc_id] + sorted(pages),
                ).fetchall()
                for row in rows:
                    extra.append(
                        RetrievedChunk(
                            chunk_id=row["id"],
                            doc_id=row["doc_id"],
                            page_number=row["page_number"],
                            text=row["text"],
                            bm25_score=0.0,
                            vector_score=None,
                        )
                    )
        finally:
            # Close the connection even if a query raises.
            conn.close()
    except Exception as exc:
        logger.warning("Context expansion query failed (non-fatal): %s", exc)
    return extra
@dataclass(frozen=True)
class RetrievedChunk:
"""A chunk returned by the retriever, with source scores."""
@ -55,13 +112,23 @@ class Retriever:
for r in self._bm25.query(query, top_k=top_k * 2, doc_ids=doc_ids)
}
vec = llm.embed([query])[0]
store = LocalSQLiteVecStore(db_path=vec_db_path, table="page_vecs", dimensions=768)
filter_meta = {"doc_id": doc_ids[0]} if doc_ids and len(doc_ids) == 1 else None
vec_hits = store.query(vec, top_k=top_k * 2, filter_metadata=filter_meta)
try:
vec = llm.embed([query])[0]
except Exception as exc:
logger.warning("Embed failed, falling back to BM25-only: %s", exc)
return self._bm25_only(query, top_k, doc_ids, db_path)
from app.config import VEC_DIMENSIONS
store = LocalSQLiteVecStore(db_path=vec_db_path, table="page_vecs", dimensions=VEC_DIMENSIONS)
if doc_ids and len(doc_ids) > 1:
vec_hits = [h for h in vec_hits if h.metadata.get("doc_id") in doc_ids]
# sqlite-vec applies filter_metadata as a Python post-filter after fetching k
# nearest globally. When the corpus spans many documents and only a subset is
# selected, most of those k candidates are from non-target docs and get dropped,
# leaving too few vector hits. Oversample heavily and filter in Python instead.
if doc_ids:
vec_candidates = store.query(vec, top_k=top_k * 20)
vec_hits = [h for h in vec_candidates if h.metadata.get("doc_id") in doc_ids]
else:
vec_hits = store.query(vec, top_k=top_k * 2)
# Merge: BM25 hits take priority; vector hits fill in additional results
merged: dict[str, RetrievedChunk] = {}
@ -76,10 +143,10 @@ class Retriever:
)
for vh in vec_hits:
# _chunks is the loaded list of dicts from BM25Index; no public accessor exists
text = next((c["text"] for c in self._bm25._chunks if c["id"] == vh.id), "")
if vh.id in merged:
existing = merged[vh.id]
merged[vh.id] = RetrievedChunk(
text = next((c["text"] for c in self._bm25._chunks if c["id"] == vh.entry_id), "")
if vh.entry_id in merged:
existing = merged[vh.entry_id]
merged[vh.entry_id] = RetrievedChunk(
chunk_id=existing.chunk_id,
doc_id=existing.doc_id,
page_number=existing.page_number,
@ -88,8 +155,8 @@ class Retriever:
vector_score=vh.score,
)
else:
merged[vh.id] = RetrievedChunk(
chunk_id=vh.id,
merged[vh.entry_id] = RetrievedChunk(
chunk_id=vh.entry_id,
doc_id=vh.metadata.get("doc_id", ""),
page_number=int(vh.metadata.get("page_number", 0)),
text=text,
@ -103,14 +170,15 @@ class Retriever:
vec = (1.0 / (1.0 + r.vector_score)) if r.vector_score is not None else 0.0
return bm25 * 0.5 + vec * 0.5
ranked = sorted(merged.values(), key=_combined, reverse=True)
return ranked[:top_k]
ranked = sorted(merged.values(), key=_combined, reverse=True)[:top_k]
adjacent = _fetch_adjacent(ranked, db_path)
return ranked + adjacent
def _bm25_only(
self, query: str, top_k: int, doc_ids: list[str] | None, db_path: str
) -> list[RetrievedChunk]:
self._bm25.ensure_fresh(db_path)
return [
hits = [
RetrievedChunk(
chunk_id=r.chunk_id,
doc_id=r.doc_id,
@ -121,3 +189,5 @@ class Retriever:
)
for r in self._bm25.query(query, top_k=top_k, doc_ids=doc_ids)
]
adjacent = _fetch_adjacent(hits, db_path)
return hits + adjacent


@ -42,7 +42,9 @@ class Synthesizer:
history: list[dict],
chunks: list[RetrievedChunk],
) -> SynthesisResult:
context_parts = [f"[p.{c.page_number}]\n{c.text[:500]}" for c in chunks]
# 1500 chars (~300 words) per chunk: enough to capture definitions that
# appear mid-paragraph without blowing past a 32k-context model's limit.
context_parts = [f"[p.{c.page_number}]\n{c.text[:1500]}" for c in chunks]
context = "\n\n---\n\n".join(context_parts)
prompt = f"Document excerpts:\n\n{context}\n\nQuestion: {message}"
@ -52,7 +54,7 @@ class Synthesizer:
Citation(
doc_id=c.doc_id,
page_number=c.page_number,
snippet=c.text[:200],
snippet=c.text[:400],
bm25_score=c.bm25_score,
)
for c in chunks


@ -20,6 +20,8 @@ services:
# cf-orch: route LLM inference through coordinator for managed GPU access
CF_ORCH_URL: http://host.docker.internal:7700
CF_APP_NAME: pagepiper
# CF_LICENSE_KEY is the auth token CFOrchClient sends to the coordinator
CF_LICENSE_KEY: ${COORDINATOR_PAGEPIPER_KEY:-}
COORDINATOR_URL: http://10.1.10.71:7700
COORDINATOR_PAGEPIPER_KEY: ${COORDINATOR_PAGEPIPER_KEY:-}
extra_hosts:


@ -1,6 +1,6 @@
# Pagepiper
Self-hosted document search with BM25 full-text indexing and (with local Ollama) hybrid vector search and LLM-powered chat.
Self-hosted document search with BM25 full-text indexing and (with local Ollama) hybrid vector search and LLM-powered chat. Supports PDF and EPUB files.
## Demo
@ -12,7 +12,7 @@ Try it: [pagepiper.circuitforge.tech](https://pagepiper.circuitforge.tech)
![Library view](screenshots/01-library.png)
Scan your PDF directory to index documents. Each document shows page count and ingest status.
Scan your PDF directory to index documents, or upload individual PDFs directly. Each document shows page count and ingest status.
### Chat
@ -20,25 +20,123 @@ Scan your PDF directory to index documents. Each document shows page count and i
Ask questions across your indexed documents. Results cite the source document and page number.
## Quick Start (Docker)
```bash
git clone https://git.opensourcesolarpunk.com/Circuit-Forge/pagepiper
cd pagepiper
cp .env.example .env # set PAGEPIPER_DATA_DIR and PAGEPIPER_BOOKS_DIR
docker compose up -d --build
# open http://localhost:8521
```
Place PDFs in your `PAGEPIPER_BOOKS_DIR` directory, then click "Scan for PDFs" in the Library view.
## Tiers
| Feature | Free | Paid (BYOK) |
|---------|------|-------------|
| BM25 full-text search | Yes | Yes |
| PDF and EPUB upload via browser | Yes | Yes |
| Unlimited local ingestion | Yes | Yes |
| Hybrid vector search | No | Yes (local Ollama) |
| LLM chat over documents | No | Yes (local Ollama) |
Set `PAGEPIPER_OLLAMA_URL` in your `.env` to unlock the Paid tier with your own Ollama instance.
BYOK (Bring Your Own Key) means you supply your own Ollama instance. No cloud API keys required.
---
## Self-Hosting Guide
### Prerequisites
- [Docker](https://docs.docker.com/get-docker/) and Docker Compose
- PDFs you want to search
- Optional: [Ollama](https://ollama.com) running locally for semantic search and LLM chat
### Step 1: Get the code
```bash
git clone https://git.opensourcesolarpunk.com/Circuit-Forge/pagepiper
cd pagepiper
```
### Step 2: Configure
```bash
cp .env.example .env
```
Open `.env` and set your directories:
```dotenv
# Where pagepiper stores its index database
PAGEPIPER_DATA_DIR=./data
# Directory to scan for PDFs (used by the "Scan for PDFs" button)
# You can also upload individual PDFs via the web UI without setting this
PAGEPIPER_BOOKS_DIR=/path/to/your/pdfs
```
To unlock hybrid vector search and LLM chat, add your Ollama endpoint:
```dotenv
PAGEPIPER_OLLAMA_URL=http://localhost:11434
PAGEPIPER_CHAT_MODEL=mistral:7b
PAGEPIPER_EMBED_MODEL=nomic-embed-text
```
### Step 3: Start
```bash
docker compose up -d --build
```
Open [http://localhost:8521](http://localhost:8521) in your browser.
### Step 4: Add your PDFs
Two ways to add documents:
**Option A — Upload via browser** (easiest for small collections):
Click the **Upload PDF** button in the Library view and select a file. It saves to `data/uploads/` and begins indexing automatically.
**Option B — Mount a directory** (best for large collections):
Set `PAGEPIPER_BOOKS_DIR` in your `.env` to point at a folder of PDFs, then click **Scan for PDFs**. Pagepiper finds all `.pdf` files recursively and queues them for indexing.
### Step 5: Search
Switch to the **Chat** tab and ask questions about your documents. The Free tier uses BM25 keyword matching. With Ollama configured, you get semantic (vector) search and LLM-generated answers with page-level citations.
---
## Ollama Setup (optional)
Install Ollama from [ollama.com](https://ollama.com), then pull the models:
```bash
ollama pull mistral:7b
ollama pull nomic-embed-text
```
Pagepiper's Docker container reaches Ollama at `host.docker.internal` — no extra network config needed on Linux/Mac with Docker Desktop. On a headless Linux server, make sure Ollama binds to `0.0.0.0`:
```bash
OLLAMA_HOST=0.0.0.0 ollama serve
```
---
## Managing the instance
```bash
# Check status
docker compose ps
# View API logs
docker compose logs -f api
# Stop
docker compose down
# Rebuild after updates
docker compose up -d --build
```
---
## Notes
- Pagepiper indexes PDFs at ingest time. Changes to the source file require a re-index (use the re-index button on the document card).
- The `data/` directory contains the SQLite index database and any uploaded files. Back it up to preserve your index.
- Large PDFs (hundreds of pages) can take a few minutes to index. Watch the status badge on the document card.

@ -14,6 +14,8 @@ dependencies:
- pdfplumber>=0.11
- pytesseract>=0.3
- Pillow>=10.0
- ebooklib>=0.18
- beautifulsoup4>=4.12
- sqlite-vec>=0.1
- pytest>=8.0
- pytest-asyncio>=0.23

@ -0,0 +1,9 @@
-- chat answer thumbs up/down signals (local SQLite, always available)
CREATE TABLE IF NOT EXISTS chat_feedback (
    id TEXT PRIMARY KEY DEFAULT (lower(hex(randomblob(16)))),
    rating INTEGER NOT NULL CHECK (rating IN (1, -1)),
    question TEXT NOT NULL DEFAULT '',
    answer TEXT NOT NULL DEFAULT '',
    doc_ids TEXT NOT NULL DEFAULT '[]',
    created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
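
To illustrate how this table behaves, here is a small sketch that applies the same DDL to an in-memory database and writes one row. The `record_feedback` helper is hypothetical and is not the endpoint code from this commit; storing `doc_ids` as a JSON string is an assumption based on the `'[]'` default.

```python
import json
import sqlite3
from typing import List

DDL = """
CREATE TABLE IF NOT EXISTS chat_feedback (
    id TEXT PRIMARY KEY DEFAULT (lower(hex(randomblob(16)))),
    rating INTEGER NOT NULL CHECK (rating IN (1, -1)),
    question TEXT NOT NULL DEFAULT '',
    answer TEXT NOT NULL DEFAULT '',
    doc_ids TEXT NOT NULL DEFAULT '[]',
    created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
"""


def record_feedback(
    conn: sqlite3.Connection, rating: int, question: str, answer: str, doc_ids: List[str]
) -> str:
    """Insert one feedback row and return its generated id (hypothetical helper)."""
    cur = conn.execute(
        "INSERT INTO chat_feedback (rating, question, answer, doc_ids) VALUES (?,?,?,?)",
        (rating, question, answer, json.dumps(doc_ids)),  # assumption: JSON-encoded list
    )
    conn.commit()
    row = conn.execute(
        "SELECT id FROM chat_feedback WHERE rowid = ?", (cur.lastrowid,)
    ).fetchone()
    return row[0]
```

A rating other than `1` or `-1` violates the `CHECK` constraint, so SQLite raises `sqlite3.IntegrityError` instead of storing the row.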

scripts/ingest_epub.py

@ -0,0 +1,239 @@
# scripts/ingest_epub.py
"""
cf-orch task: pagepiper/ingest_epub

Extracts text from an EPUB file, stores the resulting chunks in SQLite, and (if Ollama is
configured) generates embeddings and stores them in the sqlite-vec store.

Each heading-delimited section becomes one chunk (equivalent to a PDF page). Items with
fewer than two headings are split into chunks of roughly 500 words instead.

Entry point:
    python scripts/ingest_epub.py --doc-id X --file-path Y --db-path Z --vec-db-path W
"""
from __future__ import annotations

import logging
import os
import sqlite3
from dataclasses import dataclass
from pathlib import Path

logger = logging.getLogger("pagepiper.ingest_epub")

EMBED_BATCH_SIZE = 64
_WORDS_PER_CHUNK = 500  # target chunk size for word-count fallback


@dataclass
class _Chunk:
    page_number: int
    text: str
    source: str
    word_count: int


def _paragraphs_from_soup(soup) -> list[str]:
    """Extract non-trivial, artifact-free text lines from parsed HTML."""
    from scripts.text_clean import filter_paragraphs

    raw = soup.get_text(separator="\n", strip=True)
    return filter_paragraphs(raw.splitlines())


def _chunks_from_paragraphs(paragraphs: list[str], start_num: int) -> list[_Chunk]:
    """Accumulate paragraphs into ~_WORDS_PER_CHUNK-word chunks."""
    chunks: list[_Chunk] = []
    current: list[str] = []
    current_count = 0
    chunk_num = start_num
    for para in paragraphs:
        words = para.split()
        if current_count + len(words) > _WORDS_PER_CHUNK and current:
            text = "\n".join(current)
            chunks.append(_Chunk(chunk_num, text, "text", current_count))
            chunk_num += 1
            current, current_count = [], 0
        current.append(para)
        current_count += len(words)
    if current:
        text = "\n".join(current)
        chunks.append(_Chunk(chunk_num, text, "text", current_count))
    return chunks


def _extract_chunks(file_path: str) -> list[_Chunk]:
    import ebooklib
    from ebooklib import epub
    from bs4 import BeautifulSoup
    from scripts.text_clean import clean_line, is_artifact_line

    book = epub.read_epub(file_path, options={"ignore_ncx": True})
    all_chunks: list[_Chunk] = []
    for item in book.get_items_of_type(ebooklib.ITEM_DOCUMENT):
        soup = BeautifulSoup(item.get_content(), "html.parser")
        headings = soup.find_all(["h1", "h2", "h3", "h4"])
        if len(headings) >= 2:
            # Heading-based split: one chunk per section
            current_parts: list[str] = []
            for elem in soup.find_all(["h1", "h2", "h3", "h4", "p", "li", "blockquote"]):
                if elem.name in ("h1", "h2", "h3", "h4"):
                    if current_parts:
                        text = "\n".join(current_parts).strip()
                        if text:
                            n = len(all_chunks) + 1
                            all_chunks.append(_Chunk(n, text, "text", len(text.split())))
                    current_parts = [elem.get_text(" ", strip=True)]
                else:
                    t = clean_line(elem.get_text(" ", strip=True))
                    if t and not is_artifact_line(t):
                        current_parts.append(t)
            if current_parts:
                text = "\n".join(current_parts).strip()
                if text:
                    n = len(all_chunks) + 1
                    all_chunks.append(_Chunk(n, text, "text", len(text.split())))
        else:
            # Word-count fallback: accumulate paragraphs into ~500-word chunks
            paragraphs = _paragraphs_from_soup(soup)
            if paragraphs:
                all_chunks.extend(_chunks_from_paragraphs(paragraphs, len(all_chunks) + 1))
    return all_chunks


def _update_status(
    conn: sqlite3.Connection,
    doc_id: str,
    status: str,
    page_count: int | None = None,
    error_msg: str | None = None,
) -> None:
    if page_count is not None:
        conn.execute(
            "UPDATE documents SET status=?, page_count=?, updated_at=datetime('now') WHERE id=?",
            [status, page_count, doc_id],
        )
    elif error_msg is not None:
        conn.execute(
            "UPDATE documents SET status=?, error_msg=?, updated_at=datetime('now') WHERE id=?",
            [status, error_msg, doc_id],
        )
    else:
        conn.execute(
            "UPDATE documents SET status=?, updated_at=datetime('now') WHERE id=?",
            [status, doc_id],
        )
    conn.commit()


def run(doc_id: str, file_path: str, db_path: str, vec_db_path: str) -> None:
    """Run the full ingest pipeline for one EPUB. Called by cf-orch or BackgroundTasks."""
    conn: sqlite3.Connection | None = None
    try:
        conn = sqlite3.connect(db_path, timeout=30)
        conn.execute("PRAGMA journal_mode = WAL")
        conn.execute("PRAGMA foreign_keys = ON")
        _update_status(conn, doc_id, "processing")

        logger.info("Extracting chapters from %s", file_path)
        chunks = _extract_chunks(file_path)
        logger.info("Extracted %d chapters", len(chunks))

        conn.execute("DELETE FROM page_chunks WHERE doc_id=?", [doc_id])
        chunk_rows: list[tuple[str, int, str]] = []
        for chunk in chunks:
            row = conn.execute(
                """INSERT INTO page_chunks(doc_id, page_number, text, source, word_count)
                   VALUES (?,?,?,?,?) RETURNING id""",
                [doc_id, chunk.page_number, chunk.text, chunk.source, chunk.word_count],
            ).fetchone()
            chunk_rows.append((row[0], chunk.page_number, chunk.text))
        conn.commit()

        # Embedding failure is non-fatal: document remains BM25-searchable.
        ollama_url = os.environ.get("PAGEPIPER_OLLAMA_URL", "").strip()
        if ollama_url and chunks:
            try:
                logger.info("Embedding %d chapters via Ollama at %s", len(chunks), ollama_url)
                from circuitforge_core.llm import LLMRouter
                from circuitforge_core.vector.sqlite_vec import LocalSQLiteVecStore

                _clean = ollama_url.rstrip("/")
                base_url = _clean if _clean.endswith("/v1") else _clean + "/v1"
                router = LLMRouter({
                    "fallback_order": ["ollama"],
                    "backends": {
                        "ollama": {
                            "type": "openai_compat",
                            "base_url": base_url,
                            "model": os.environ.get("PAGEPIPER_CHAT_MODEL", "mistral:7b"),
                            "embedding_model": os.environ.get(
                                "PAGEPIPER_EMBED_MODEL", "nomic-embed-text"
                            ),
                            "supports_images": False,
                        }
                    },
                })
                embed_dims = int(os.environ.get("PAGEPIPER_EMBED_DIMS", "1024"))
                vec_store = LocalSQLiteVecStore(
                    db_path=vec_db_path, table="page_vecs", dimensions=embed_dims
                )
                vec_store.delete_where({"doc_id": doc_id})
                texts = [text for _, _, text in chunk_rows]
                vectors: list[list[float]] = []
                for i in range(0, len(texts), EMBED_BATCH_SIZE):
                    vectors.extend(router.embed(texts[i : i + EMBED_BATCH_SIZE]))
                for (chunk_id, page_number, _), vector in zip(chunk_rows, vectors):
                    vec_store.upsert(
                        entry_id=chunk_id,
                        vector=vector,
                        metadata={"doc_id": doc_id, "page_number": page_number},
                    )
                logger.info("Stored %d embeddings", len(vectors))
            except Exception as embed_exc:
                logger.warning(
                    "Embedding skipped for doc %s — BM25 only (reason: %s)",
                    doc_id, embed_exc,
                )

        _update_status(conn, doc_id, "ready", page_count=len(chunks))
        logger.info("Ingest complete for doc %s (%d chapters)", doc_id, len(chunks))
    except Exception as exc:
        logger.error("Ingest failed for doc %s: %s", doc_id, exc, exc_info=True)
        if conn is not None:
            try:
                _update_status(conn, doc_id, "error", error_msg=str(exc))
            except Exception:
                logger.warning("Could not write error status for doc %s", doc_id)
        raise
    finally:
        if conn is not None:
            conn.close()


if __name__ == "__main__":
    import argparse

    logging.basicConfig(level=logging.INFO)
    parser = argparse.ArgumentParser(
        description="Ingest an EPUB (cf-orch task entry point)"
    )
    parser.add_argument("--doc-id", required=True)
    parser.add_argument("--file-path", required=True)
    parser.add_argument("--db-path", required=True)
    parser.add_argument("--vec-db-path", required=True)
    a = parser.parse_args()
    run(
        doc_id=a.doc_id,
        file_path=a.file_path,
        db_path=a.db_path,
        vec_db_path=a.vec_db_path,
    )
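
The `/v1` handling inside `run()` is easy to get subtly wrong, so it may help to see it in isolation. The sketch below pulls the same two lines into a standalone function. The name `openai_base_url` is hypothetical and does not exist in the commit.

```python
def openai_base_url(ollama_url: str) -> str:
    """Return the URL with trailing slashes removed and a single '/v1' suffix."""
    clean = ollama_url.strip().rstrip("/")
    return clean if clean.endswith("/v1") else clean + "/v1"


print(openai_base_url("http://localhost:11434"))      # http://localhost:11434/v1
print(openai_base_url("http://localhost:11434/v1/"))  # http://localhost:11434/v1
```

Both forms of `PAGEPIPER_OLLAMA_URL` therefore resolve to the same OpenAI-compatible base URL.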

@ -52,7 +52,8 @@ def run(doc_id: str, file_path: str, db_path: str, vec_db_path: str) -> None:
conn: sqlite3.Connection | None = None
try:
conn = sqlite3.connect(db_path)
conn = sqlite3.connect(db_path, timeout=30)
conn.execute("PRAGMA journal_mode = WAL")
conn.execute("PRAGMA foreign_keys = ON")
_update_status(conn, doc_id, "processing")
@ -63,59 +64,71 @@ def run(doc_id: str, file_path: str, db_path: str, vec_db_path: str) -> None:
logger.info("Extracted %d pages", len(chunks))
# Step 2: Store chunks (replace any existing for this doc)
from scripts.text_clean import clean_paragraph
conn.execute("DELETE FROM page_chunks WHERE doc_id=?", [doc_id])
chunk_rows: list[tuple[str, int, str]] = []
for chunk in chunks:
cleaned_text = clean_paragraph(chunk.text)
if not cleaned_text:
continue
row = conn.execute(
"""INSERT INTO page_chunks(doc_id, page_number, text, source, word_count)
VALUES (?,?,?,?,?) RETURNING id""",
[doc_id, chunk.page_number, chunk.text, chunk.source, chunk.word_count],
[doc_id, chunk.page_number, cleaned_text, chunk.source, len(cleaned_text.split())],
).fetchone()
chunk_rows.append((row[0], chunk.page_number, chunk.text))
chunk_rows.append((row[0], chunk.page_number, cleaned_text))
conn.commit()
# Step 3: Embed and store vectors if Ollama is configured (BYOK gate)
# Embedding failure is non-fatal: document remains BM25-searchable.
ollama_url = os.environ.get("PAGEPIPER_OLLAMA_URL", "").strip()
if ollama_url and chunks:
logger.info("Embedding %d pages via Ollama at %s", len(chunks), ollama_url)
from circuitforge_core.llm import LLMRouter
from circuitforge_core.vector.sqlite_vec import LocalSQLiteVecStore
try:
logger.info("Embedding %d pages via Ollama at %s", len(chunks), ollama_url)
from circuitforge_core.llm import LLMRouter
from circuitforge_core.vector.sqlite_vec import LocalSQLiteVecStore
_clean = ollama_url.rstrip("/")
base_url = _clean if _clean.endswith("/v1") else _clean + "/v1"
router = LLMRouter({
"fallback_order": ["ollama"],
"backends": {
"ollama": {
"type": "openai_compat",
"base_url": base_url,
"model": os.environ.get("PAGEPIPER_CHAT_MODEL", "mistral:7b"),
"embedding_model": os.environ.get(
"PAGEPIPER_EMBED_MODEL", "nomic-embed-text"
),
"supports_images": False,
}
},
})
vec_store = LocalSQLiteVecStore(
db_path=vec_db_path, table="page_vecs", dimensions=768
)
# Remove old vectors before re-inserting. If embedding fails mid-way,
# old vectors are gone but new ones are partial — re-ingest recovers.
vec_store.delete_where({"doc_id": doc_id})
texts = [text for _, _, text in chunk_rows]
vectors: list[list[float]] = []
for i in range(0, len(texts), EMBED_BATCH_SIZE):
vectors.extend(router.embed(texts[i : i + EMBED_BATCH_SIZE]))
for (chunk_id, page_number, _), vector in zip(chunk_rows, vectors):
vec_store.upsert(
id=chunk_id,
vector=vector,
metadata={"doc_id": doc_id, "page_number": page_number},
_clean = ollama_url.rstrip("/")
base_url = _clean if _clean.endswith("/v1") else _clean + "/v1"
router = LLMRouter({
"fallback_order": ["ollama"],
"backends": {
"ollama": {
"type": "openai_compat",
"base_url": base_url,
"model": os.environ.get("PAGEPIPER_CHAT_MODEL", "mistral:7b"),
"embedding_model": os.environ.get(
"PAGEPIPER_EMBED_MODEL", "nomic-embed-text"
),
"supports_images": False,
}
},
})
embed_dims = int(os.environ.get("PAGEPIPER_EMBED_DIMS", "1024"))
vec_store = LocalSQLiteVecStore(
db_path=vec_db_path, table="page_vecs", dimensions=embed_dims
)
# Remove old vectors before re-inserting. If embedding fails mid-way,
# old vectors are gone but new ones are partial — re-ingest recovers.
vec_store.delete_where({"doc_id": doc_id})
texts = [text for _, _, text in chunk_rows]
vectors: list[list[float]] = []
for i in range(0, len(texts), EMBED_BATCH_SIZE):
vectors.extend(router.embed(texts[i : i + EMBED_BATCH_SIZE]))
for (chunk_id, page_number, _), vector in zip(chunk_rows, vectors):
vec_store.upsert(
entry_id=chunk_id,
vector=vector,
metadata={"doc_id": doc_id, "page_number": page_number},
)
logger.info("Stored %d embeddings", len(vectors))
except Exception as embed_exc:
logger.warning(
"Embedding skipped for doc %s — BM25 only (reason: %s)",
doc_id, embed_exc,
)
logger.info("Stored %d embeddings", len(vectors))
_update_status(conn, doc_id, "ready", page_count=len(chunks))
logger.info("Ingest complete for doc %s (%d pages)", doc_id, len(chunks))

scripts/text_clean.py

@ -0,0 +1,72 @@
# scripts/text_clean.py
"""
Shared text-cleaning utilities for ingest pipelines.

Removes boilerplate lines injected by ebook converters, piracy watermarks,
and other non-content artifacts before chunks are stored or embedded.
"""
from __future__ import annotations

import re

# Lines that match any of these patterns are dropped entirely.
# Each pattern is searched for within the stripped line, so a match anywhere in
# the line drops it unless the pattern is anchored. Most are case-insensitive.
_LINE_DROP_PATTERNS: list[re.Pattern] = [
    # ABC Amber converter family
    re.compile(r'generated by abc amber', re.IGNORECASE),
    re.compile(r'processtext\.com', re.IGNORECASE),
    # Calibre / sigil metadata lines
    re.compile(r'calibre \d+\.\d+', re.IGNORECASE),
    # Standalone URLs (line is just a URL, no surrounding prose)
    re.compile(r'^https?://\S+$'),
    # Common piracy / file-sharing watermarks
    re.compile(r'www\.\w+\.(com|net|org)/\S*book', re.IGNORECASE),
    re.compile(r'downloaded from', re.IGNORECASE),
    re.compile(r'scanned by', re.IGNORECASE),
    re.compile(r'provided by', re.IGNORECASE),
    # Page-number-only lines from PDF extraction (e.g. "- 42 -" or "42")
    re.compile(r'^\s*-?\s*\d{1,4}\s*-?\s*$'),
]

# Inline substrings to strip from within a line before further processing.
_INLINE_STRIP_PATTERNS: list[re.Pattern] = [
    re.compile(r'generated by abc amber \w+ converter,?\s*https?://\S*', re.IGNORECASE),
    re.compile(r'https?://www\.processtext\.com/\S*', re.IGNORECASE),
]


def is_artifact_line(line: str) -> bool:
    """Return True if the line is a known conversion artifact and should be dropped."""
    stripped = line.strip()
    return any(p.search(stripped) for p in _LINE_DROP_PATTERNS)


def clean_line(line: str) -> str:
    """Strip inline converter artifacts from a line, returning the cleaned version."""
    for p in _INLINE_STRIP_PATTERNS:
        line = p.sub("", line)
    return line.strip()


def clean_paragraph(text: str) -> str:
    """Clean a multi-line paragraph: drop artifact lines, strip inline artifacts."""
    lines = []
    for line in text.splitlines():
        if is_artifact_line(line):
            continue
        cleaned = clean_line(line)
        if cleaned:
            lines.append(cleaned)
    return "\n".join(lines)


def filter_paragraphs(paragraphs: list[str]) -> list[str]:
    """Remove artifact lines and paragraphs shorter than four words from a list of strings."""
    result = []
    for para in paragraphs:
        if is_artifact_line(para):
            continue
        cleaned = clean_line(para)
        if cleaned and len(cleaned.split()) >= 4:
            result.append(cleaned)
    return result

@ -30,8 +30,10 @@ def client(test_db, tmp_path, monkeypatch):
from app.main import app, _bm25
from app.deps import get_db
# Suppress migrations during tests — test_db fixture already applies the schema
# Suppress startup side effects — test_db fixture already applies the schema,
# and vec schema validation is tested separately in test_startup.py
monkeypatch.setattr(_main_module, "_apply_migrations", lambda: None)
monkeypatch.setattr(_main_module, "_check_vec_schema", lambda *a, **kw: None)
def override_db():
conn = sqlite3.connect(test_db)

tests/test_startup.py

@ -0,0 +1,170 @@
# tests/test_startup.py
"""Tests for startup vec DB schema validation (_check_vec_schema)."""
from __future__ import annotations

import os
import sqlite3
import threading
from unittest.mock import MagicMock, patch

import pytest

from app.main import _check_vec_schema, _reembed_docs


def _make_vec_db(path: str, dims: int) -> None:
    """Create a minimal sqlite-vec-style DB with the given dimension."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    # Replicate the virtual table name used by LocalSQLiteVecStore
    conn.execute(f"CREATE TABLE page_vecs_vecs (embedding float[{dims}])")
    conn.execute(
        "INSERT INTO sqlite_master(type, name, tbl_name, sql) VALUES (?,?,?,?)"
        if False else ""
    )
    # Write a real sqlite_master entry via a virtual table workaround:
    # Easiest is to put the dimension marker directly in a metadata table.
    # But _check_vec_schema reads sqlite_master, so we need the real DDL there.
    conn.close()
    # sqlite_master is read-only — recreate using the real CREATE VIRTUAL TABLE path
    # by faking it via a regular table with the matching name pattern.
    conn2 = sqlite3.connect(path)
    conn2.execute("DROP TABLE IF EXISTS page_vecs_vecs")
    # Write a row that _check_vec_schema will parse via its regex
    conn2.execute(
        "CREATE TABLE _schema_hint (sql TEXT)"
    )
    conn2.execute(
        "INSERT INTO _schema_hint VALUES (?)",
        [f"CREATE VIRTUAL TABLE page_vecs_vecs USING vec0(embedding float[{dims}])"],
    )
    conn2.commit()
    conn2.close()


def _make_real_vec_db(path: str, dims: int) -> None:
    """Create a vec DB whose sqlite_master actually contains the dimension DDL."""
    import sqlite3 as _sq

    # We can't load sqlite-vec in tests, so simulate the vec0 table with a regular
    # table of the same name. Its DDL, including the float[N] dimension, is recorded
    # in sqlite_master, which is what _check_vec_schema reads.
    conn = _sq.connect(path)
    conn.execute(
        f"""CREATE TABLE page_vecs_vecs (
            embedding float[{dims}]
        )"""
    )
    conn.commit()
    conn.close()


class TestCheckVecSchema:
    def test_no_file_is_noop(self, tmp_path):
        """Missing vec DB should not raise."""
        _check_vec_schema(str(tmp_path / "missing.db"), 1024, str(tmp_path / "main.db"))

    def test_matching_dims_keeps_file(self, tmp_path):
        """Correct dimensions: vec DB must not be deleted."""
        vec_path = str(tmp_path / "vecs.db")
        conn = sqlite3.connect(vec_path)
        conn.execute("CREATE TABLE page_vecs_vecs (embedding float[1024])")
        conn.commit()
        conn.close()

        _check_vec_schema(vec_path, 1024, str(tmp_path / "main.db"))
        assert os.path.exists(vec_path), "Vec DB should not be deleted when dims match"

    def test_mismatched_dims_deletes_file(self, tmp_path):
        """Dimension mismatch: vec DB must be deleted."""
        vec_path = str(tmp_path / "vecs.db")
        conn = sqlite3.connect(vec_path)
        conn.execute("CREATE TABLE page_vecs_vecs (embedding float[768])")
        conn.commit()
        conn.close()

        db_path = str(tmp_path / "main.db")
        _check_vec_schema(vec_path, 1024, db_path)
        assert not os.path.exists(vec_path), "Vec DB should be deleted on dimension mismatch"

    def test_mismatched_dims_queues_reembed(self, tmp_path):
        """Dimension mismatch: re-embed thread must be started for ready docs."""
        vec_path = str(tmp_path / "vecs.db")
        conn = sqlite3.connect(vec_path)
        conn.execute("CREATE TABLE page_vecs_vecs (embedding float[768])")
        conn.commit()
        conn.close()

        db_path = str(tmp_path / "main.db")
        schema = (
            "CREATE TABLE documents ("
            "id TEXT PRIMARY KEY, title TEXT, file_path TEXT, "
            "status TEXT, task_id TEXT, page_count INTEGER, "
            "error_msg TEXT, created_at TEXT, updated_at TEXT)"
        )
        main_conn = sqlite3.connect(db_path)
        main_conn.execute(schema)
        main_conn.execute(
            "INSERT INTO documents VALUES ('abc123', 'Book', '/tmp/book.pdf', 'ready', NULL, 10, NULL, '2026-01-01', '2026-01-01')"
        )
        main_conn.commit()
        main_conn.close()

        started = []
        real_thread_start = threading.Thread.start

        def _capture_start(self):
            started.append(self)
            # Don't actually run the re-embed to keep tests fast
            self.run = lambda: None
            real_thread_start(self)

        with patch.object(threading.Thread, "start", _capture_start):
            _check_vec_schema(vec_path, 1024, db_path)

        assert len(started) == 1, "Exactly one re-embed thread should be started"
        assert started[0].name == "pagepiper-reembed"

    def test_no_ready_docs_skips_thread(self, tmp_path):
        """Mismatch with no ready docs: no thread should be started."""
        vec_path = str(tmp_path / "vecs.db")
        conn = sqlite3.connect(vec_path)
        conn.execute("CREATE TABLE page_vecs_vecs (embedding float[768])")
        conn.commit()
        conn.close()

        db_path = str(tmp_path / "main.db")
        schema = (
            "CREATE TABLE documents ("
            "id TEXT PRIMARY KEY, title TEXT, file_path TEXT, "
            "status TEXT, task_id TEXT, page_count INTEGER, "
            "error_msg TEXT, created_at TEXT, updated_at TEXT)"
        )
        main_conn = sqlite3.connect(db_path)
        main_conn.execute(schema)
        main_conn.commit()
        main_conn.close()

        started = []
        with patch.object(threading.Thread, "start", lambda self: started.append(self)):
            _check_vec_schema(vec_path, 1024, db_path)

        assert len(started) == 0

    def test_empty_db_no_table_is_noop(self, tmp_path):
        """Vec DB exists but has no page_vecs_vecs table yet: no deletion."""
        vec_path = str(tmp_path / "vecs.db")
        sqlite3.connect(vec_path).close()  # create empty file
        _check_vec_schema(vec_path, 1024, str(tmp_path / "main.db"))
        assert os.path.exists(vec_path)

    def test_corrupt_db_does_not_raise(self, tmp_path):
        """Corrupt or unreadable vec DB must not propagate exceptions."""
        vec_path = str(tmp_path / "vecs.db")
        with open(vec_path, "w") as f:
            f.write("not a sqlite database")
        _check_vec_schema(vec_path, 1024, str(tmp_path / "main.db"))
        # No assertion needed — just must not raise
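
`_check_vec_schema` itself lives in `app/main.py` and is not part of this hunk. The tests above suggest that it reads the `page_vecs_vecs` DDL from `sqlite_master` and compares the `float[N]` dimension with the configured one. A hedged sketch of just the dimension lookup could look like this. The helper name `read_vec_dims` and the regex are assumptions for illustration, not the committed implementation.

```python
import os
import re
import sqlite3
from typing import Optional

_DIM_RE = re.compile(r"float\[(\d+)\]", re.IGNORECASE)


def read_vec_dims(vec_db_path: str, table: str = "page_vecs_vecs") -> Optional[int]:
    """Return the embedding dimension recorded in sqlite_master, or None if unknown."""
    if not os.path.exists(vec_db_path):
        return None  # nothing to validate yet
    try:
        conn = sqlite3.connect(vec_db_path)
        try:
            row = conn.execute(
                "SELECT sql FROM sqlite_master WHERE name = ?", (table,)
            ).fetchone()
        finally:
            conn.close()
    except sqlite3.Error:
        return None  # corrupt or unreadable file: treat as unknown, never raise
    if row is None or row[0] is None:
        return None  # table not created yet
    match = _DIM_RE.search(row[0])
    return int(match.group(1)) if match else None
```

A caller would then compare the result with the configured dimension and only delete the vec DB when the lookup returns a number that differs. A `None` result corresponds to the no-op cases covered by the tests above.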

tests/test_text_clean.py

@ -0,0 +1,108 @@
# tests/test_text_clean.py
"""Tests for ebook artifact filtering in scripts/text_clean.py."""
from __future__ import annotations

import pytest

from scripts.text_clean import (
    clean_line,
    clean_paragraph,
    filter_paragraphs,
    is_artifact_line,
)


class TestIsArtifactLine:
    def test_abc_amber_lit(self):
        assert is_artifact_line(
            "Generated by ABC Amber LIT Converter, http://www.processtext.com/abclit.html"
        )

    def test_abc_amber_rtf(self):
        assert is_artifact_line("Generated by ABC Amber RTF Converter")

    def test_processtext_url_only(self):
        assert is_artifact_line("http://www.processtext.com/abclit.html")

    def test_standalone_url(self):
        assert is_artifact_line("https://www.example.com/book")

    def test_page_number_only(self):
        assert is_artifact_line("42")
        assert is_artifact_line("- 42 -")
        assert is_artifact_line(" 7 ")

    def test_downloaded_from(self):
        assert is_artifact_line("Downloaded from www.fictionsite.net")

    def test_scanned_by(self):
        assert is_artifact_line("Scanned by SomeUser")

    def test_normal_prose_not_artifact(self):
        assert not is_artifact_line(
            '"And what if food isn\'t the only reason Jagang is going to Anderith?"'
        )

    def test_url_embedded_in_prose_not_dropped(self):
        # A URL inside a sentence is not a standalone-URL artifact line
        assert not is_artifact_line(
            "You can read more about this at https://example.com and continue."
        )

    def test_short_page_header_not_dropped(self):
        # "Chapter 1" is not an artifact — 4-digit number check only drops bare numbers
        assert not is_artifact_line("Chapter 1")


class TestCleanLine:
    def test_strips_inline_abc_amber(self):
        line = "Some prose. Generated by ABC Amber LIT Converter, http://www.processtext.com/abclit.html"
        result = clean_line(line)
        assert "ABC Amber" not in result
        assert "processtext" not in result
        assert "Some prose." in result

    def test_passes_clean_line_unchanged(self):
        line = "He cocked an eyebrow and smiled."
        assert clean_line(line) == line


class TestCleanParagraph:
    def test_drops_artifact_lines_from_paragraph(self):
        text = (
            "Generated by ABC Amber LIT Converter, http://www.processtext.com/abclit.html\n"
            '"And what if food isn\'t the only reason Jagang is going to Anderith?"\n'
            "He cocked an eyebrow."
        )
        result = clean_paragraph(text)
        assert "ABC Amber" not in result
        assert "Jagang" in result
        assert "eyebrow" in result

    def test_all_artifact_paragraph_returns_empty(self):
        text = "Generated by ABC Amber LIT Converter\nhttp://www.processtext.com/abclit.html"
        assert clean_paragraph(text) == ""

    def test_clean_paragraph_unchanged(self):
        text = "Richard raised his sword.\nThe magic surged through him."
        assert clean_paragraph(text) == text


class TestFilterParagraphs:
    def test_drops_artifact_paragraphs(self):
        paras = [
            "Generated by ABC Amber LIT Converter, http://www.processtext.com/abclit.html",
            '"And what if food isn\'t the only reason Jagang is going to Anderith?"',
            "He cocked an eyebrow at the question.",
        ]
        result = filter_paragraphs(paras)
        assert len(result) == 2
        assert all("ABC Amber" not in p for p in result)

    def test_drops_short_lines_under_4_words(self):
        paras = ["Hi", "OK sure", "Valid sentence with enough words here."]
        result = filter_paragraphs(paras)
        assert result == ["Valid sentence with enough words here."]

    def test_empty_input(self):
        assert filter_paragraphs([]) == []

@ -6,11 +6,13 @@
<RouterLink to="/chat" class="nav-link">Chat</RouterLink>
</nav>
<RouterView />
<FeedbackButton />
</div>
</template>
<script setup lang="ts">
import { RouterLink, RouterView } from "vue-router"
import FeedbackButton from "@/components/FeedbackButton.vue"
</script>
<style>

@ -37,6 +37,15 @@ export interface TaskStatus {
error?: string
}
export interface DocumentStatus {
id: string
status: "pending" | "processing" | "ready" | "error"
task_id: string | null
page_count: number | null
vec_count: number
error_msg: string | null
}
export interface ChatMessage {
role: string
content: string
@ -62,11 +71,23 @@ export const api = {
const r = await fetch(`${BASE}/api/library/${docId}`, { method: "DELETE" })
if (!r.ok) throw new Error(await r.text())
},
async uploadDocument(file: File): Promise<{ doc_id: string; task_id: string | null; filename: string; status: string }> {
const form = new FormData()
form.append("file", file)
const r = await fetch(`${BASE}/api/library/upload`, { method: "POST", body: form })
if (!r.ok) throw new Error(await r.text())
return r.json()
},
async getTaskStatus(taskId: string): Promise<TaskStatus> {
const r = await fetch(`${BASE}/api/ingest/${taskId}`)
if (!r.ok) throw new Error(await r.text())
return r.json()
},
async getDocumentStatus(docId: string): Promise<DocumentStatus> {
const r = await fetch(`${BASE}/api/library/${docId}/status`)
if (!r.ok) throw new Error(await r.text())
return r.json()
},
async search(query: string, topK = 10, docIds?: string[]): Promise<SearchResult[]> {
const r = await fetch(`${BASE}/api/search`, {
method: "POST",
@ -98,4 +119,21 @@ export const api = {
}
return r.json()
},
async chatFeedbackStatus(): Promise<{ enabled: boolean }> {
const r = await fetch(`${BASE}/api/chat/feedback/status`)
if (!r.ok) return { enabled: false }
return r.json()
},
async submitChatFeedback(
rating: 1 | -1,
question: string,
answer: string,
docIds: string[],
): Promise<void> {
await fetch(`${BASE}/api/chat/feedback`, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ rating, question, answer, doc_ids: docIds }),
})
},
}
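
The `page_count` and `vec_count` fields of `DocumentStatus` are enough to derive the two-phase progress display described in the commit message. The sketch below restates that idea in Python as a simplified illustration. The function name is hypothetical, and Python's `round` handles exact halves differently from JavaScript's `Math.round`, so it should not be read as a line-for-line port of the component logic.

```python
from typing import Optional, Tuple


def ingest_progress(page_count: Optional[int], vec_count: int) -> Tuple[str, Optional[int]]:
    """Return (label, percent); percent is None while progress is indeterminate."""
    if not page_count or vec_count == 0:
        # Phase 1: extraction is still running or no vectors have landed yet
        return ("Extracting text…", None)
    # Phase 2: cap at 99 so the bar only completes once status becomes "ready"
    pct = min(round(vec_count / page_count * 100), 99)
    return (f"Embedding {vec_count} / {page_count} pages", pct)


print(ingest_progress(None, 0))  # ('Extracting text…', None)
print(ingest_progress(10, 5))    # ('Embedding 5 / 10 pages', 50)
print(ingest_progress(10, 10))   # ('Embedding 10 / 10 pages', 99)
```

Capping at 99 avoids showing a full bar while the document is still in the `processing` state.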

@ -1,18 +1,24 @@
<template>
<div class="doc-card" :class="`status-${doc.status}`">
<div class="doc-status-badge">{{ doc.status }}</div>
<div class="doc-card" :class="`status-${currentStatus}`">
<div class="doc-status-badge" :class="`badge-${currentStatus}`">{{ currentStatus }}</div>
<div class="doc-title">{{ doc.title }}</div>
<div class="doc-meta" v-if="doc.page_count != null">{{ doc.page_count }} pages</div>
<div class="doc-meta" v-if="displayPageCount != null">{{ displayPageCount }} pages</div>
<div class="doc-meta path">{{ shortPath }}</div>
<IngestProgress
v-if="doc.status === 'processing' && doc.task_id"
:task-id="doc.task_id"
@done="emit('refresh')"
/>
<div class="ingest-progress" v-if="isProcessing">
<div class="progress-label">
<span>{{ progressLabel }}</span>
<span class="progress-pct" v-if="progressPct != null">{{ progressPct }}%</span>
</div>
<div class="progress-bar">
<div class="progress-fill" :class="{ indeterminate: progressPct == null }" :style="progressPct != null ? { width: `${progressPct}%` } : {}" />
</div>
</div>
<p class="doc-error" v-if="currentStatus === 'error'">{{ errorMsg ?? 'Indexing failed.' }}</p>
<div class="doc-actions">
<button class="btn-sm" @click="emit('reingest', doc.id)" :disabled="doc.status === 'processing'">
<button class="btn-sm" @click="emit('reingest', doc.id)" :disabled="isProcessing">
Re-index
</button>
<button class="btn-sm danger" @click="emit('delete', doc.id)">Remove</button>
@ -21,9 +27,9 @@
</template>
<script setup lang="ts">
import { computed } from "vue"
import { computed, onMounted, onUnmounted, ref } from "vue"
import type { Document } from "@/api"
import IngestProgress from "@/components/IngestProgress.vue"
import { api } from "@/api"
const props = defineProps<{ doc: Document }>()
const emit = defineEmits<{ reingest: [id: string]; delete: [id: string]; refresh: [] }>()
@ -32,6 +38,54 @@ const shortPath = computed(() => {
const parts = props.doc.file_path.split("/")
return parts.slice(-2).join("/")
})
// Live-updating fields polled from /api/library/{id}/status
const currentStatus = ref(props.doc.status)
const displayPageCount = ref(props.doc.page_count)
const vecCount = ref(0)
const errorMsg = ref<string | null>(null)
const isProcessing = computed(() => currentStatus.value === "processing")
const progressLabel = computed(() => {
if (displayPageCount.value == null || vecCount.value === 0) return "Extracting text…"
return `Embedding ${vecCount.value} / ${displayPageCount.value} pages`
})
const progressPct = computed((): number | null => {
if (displayPageCount.value == null || displayPageCount.value === 0) return null
if (vecCount.value === 0) return null
return Math.min(Math.round((vecCount.value / displayPageCount.value) * 100), 99)
})
let timer: ReturnType<typeof setInterval> | null = null
async function pollStatus() {
try {
const s = await api.getDocumentStatus(props.doc.id)
currentStatus.value = s.status
displayPageCount.value = s.page_count
vecCount.value = s.vec_count
errorMsg.value = s.error_msg
if (s.status !== "processing") {
stopPoll()
if (s.status === "ready") emit("refresh")
}
} catch (_e: unknown) { /* non-fatal — keep polling */ }
}
function stopPoll() {
if (timer) { clearInterval(timer); timer = null }
}
onMounted(() => {
if (props.doc.status === "processing") {
pollStatus()
timer = setInterval(pollStatus, 3000)
}
})
onUnmounted(stopPoll)
</script>
<style scoped>
@ -48,6 +102,7 @@ const shortPath = computed(() => {
}
.doc-card.status-error { border-color: var(--color-error); }
.doc-card.status-ready { border-color: var(--color-success); }
.doc-card.status-processing { border-color: var(--color-accent); }
.doc-title { font-weight: 600; font-size: 1rem; }
.doc-meta { font-size: 0.8rem; color: var(--color-text-muted); }
.doc-meta.path { font-family: var(--font-mono); word-break: break-all; }
@ -57,6 +112,9 @@ const shortPath = computed(() => {
padding: 2px 6px; border-radius: var(--radius-sm);
background: var(--color-surface-alt);
}
.badge-processing { background: var(--color-accent); color: #fff; }
.badge-ready { background: var(--color-success); color: #fff; }
.badge-error { background: var(--color-error); color: #fff; }
.doc-actions { display: flex; gap: 0.5rem; margin-top: 0.5rem; }
.btn-sm {
padding: 4px 10px; border: 1px solid var(--color-border); border-radius: var(--radius-sm);
@ -65,4 +123,23 @@ const shortPath = computed(() => {
.btn-sm:hover { border-color: var(--color-accent); }
.btn-sm.danger:hover { border-color: var(--color-error); color: var(--color-error); }
.btn-sm:disabled { opacity: 0.4; cursor: default; }
.doc-error { color: var(--color-error); font-size: 0.8rem; }
/* Progress bar */
.ingest-progress { margin-top: 0.25rem; }
.progress-label {
display: flex; justify-content: space-between;
font-size: 0.78rem; color: var(--color-text-muted); margin-bottom: 4px;
}
.progress-pct { font-variant-numeric: tabular-nums; }
.progress-bar { height: 4px; background: var(--color-border); border-radius: 2px; overflow: hidden; }
.progress-fill { height: 100%; background: var(--color-accent); transition: width 0.4s ease; }
.progress-fill.indeterminate {
width: 40%;
animation: slide 1.4s ease-in-out infinite;
}
@keyframes slide {
0% { transform: translateX(-100%); }
100% { transform: translateX(300%); }
}
</style>
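The two-phase progress logic in this component can be read in isolation as a pair of pure functions. The sketch below is illustrative only: `IngestSnapshot`, `progressPct`, and `progressLabel` are hypothetical standalone names that mirror the computed properties above, not code from the commit. It shows why the bar stays indeterminate until the first vector lands and why it is capped at 99% while the status is still "processing".

```typescript
// Hypothetical snapshot of the fields polled from the status endpoint.
interface IngestSnapshot {
  pageCount: number | null // null until extraction has reported a page count
  vecCount: number         // number of page vectors written so far
}

// Returns null for the indeterminate phase, otherwise a percentage in 0..99.
function progressPct(s: IngestSnapshot): number | null {
  if (s.pageCount == null || s.pageCount === 0) return null
  if (s.vecCount === 0) return null
  // Cap at 99 so the bar never shows 100% before the status flips to "ready".
  return Math.min(Math.round((s.vecCount / s.pageCount) * 100), 99)
}

// Label for the current phase: extraction first, then embedding.
function progressLabel(s: IngestSnapshot): string {
  if (s.pageCount == null || s.vecCount === 0) return "Extracting text…"
  return `Embedding ${s.vecCount} / ${s.pageCount} pages`
}
```

Keeping the arithmetic in pure functions like these would make it straightforward to unit test without mounting the component.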


@ -0,0 +1,631 @@
<template>
<!-- Floating trigger button -->
<button
v-if="enabled"
class="feedback-fab"
@click="open = true"
aria-label="Send feedback or report a bug"
title="Send feedback or report a bug"
>
<svg class="feedback-fab-icon" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="1.8" stroke-linecap="round" stroke-linejoin="round">
<path d="M21 15a2 2 0 01-2 2H7l-4 4V5a2 2 0 012-2h14a2 2 0 012 2z"/>
</svg>
<span class="feedback-fab-label">Feedback</span>
</button>
<!-- Modal teleported to body to avoid z-index / overflow clipping -->
<Teleport to="body">
<Transition name="modal-fade">
<div v-if="open" class="feedback-overlay" @click.self="close">
<div class="feedback-modal" role="dialog" aria-modal="true" aria-label="Send Feedback">
<!-- Header -->
<div class="feedback-header">
<h2 class="feedback-title">{{ step === 1 ? "What's on your mind?" : "Review & submit" }}</h2>
<button class="feedback-close" @click="close" aria-label="Close">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" width="18" height="18">
<line x1="18" y1="6" x2="6" y2="18"/><line x1="6" y1="6" x2="18" y2="18"/>
</svg>
</button>
</div>
<!-- Step 1: Form -->
<div v-if="step === 1" class="feedback-body">
<div class="form-group">
<label class="form-label">Type</label>
<div class="filter-chip-row">
<button
v-for="t in types"
:key="t.value"
:class="['btn-chip', { active: form.type === t.value }]"
@click="form.type = t.value"
type="button"
>{{ t.label }}</button>
</div>
</div>
<div class="form-group">
<label class="form-label">Title <span class="form-required">*</span></label>
<input
v-model="form.title"
class="form-input"
type="text"
placeholder="Short summary of the issue or idea"
maxlength="120"
/>
</div>
<div class="form-group">
<label class="form-label">Description <span class="form-required">*</span></label>
<textarea
v-model="form.description"
class="form-input feedback-textarea"
placeholder="Describe what happened or what you'd like to see…"
rows="4"
/>
</div>
<div v-if="form.type === 'bug'" class="form-group">
<label class="form-label">Reproduction steps</label>
<textarea
v-model="form.repro"
class="form-input feedback-textarea"
placeholder="1. Go to…&#10;2. Tap…&#10;3. See error"
rows="3"
/>
</div>
<div class="form-group">
<label class="form-label">Screenshot <span class="text-muted text-xs">(optional, max 5 MB)</span></label>
<input
type="file"
accept="image/*"
class="form-input-file"
@change="onScreenshotChange"
ref="fileInput"
/>
<div v-if="screenshotPreview" class="screenshot-preview">
<img :src="screenshotPreview" alt="Screenshot preview" />
<button class="screenshot-remove btn-link" type="button" @click="clearScreenshot" aria-label="Remove screenshot">Remove</button>
</div>
</div>
<p v-if="stepError" class="feedback-error">{{ stepError }}</p>
</div>
<!-- Step 2: Attribution + confirm -->
<div v-if="step === 2" class="feedback-body">
<div class="feedback-summary card">
<div class="feedback-summary-row">
<span class="text-muted text-sm">Type</span>
<span class="text-sm font-semibold">{{ typeLabel }}</span>
</div>
<div class="feedback-summary-row">
<span class="text-muted text-sm">Title</span>
<span class="text-sm">{{ form.title }}</span>
</div>
<div class="feedback-summary-row">
<span class="text-muted text-sm">Description</span>
<span class="text-sm feedback-summary-desc">{{ form.description }}</span>
</div>
</div>
<div class="form-group mt-md">
<label class="form-label">Attribution (optional)</label>
<input
v-model="form.submitter"
class="form-input"
type="text"
placeholder="Your name &lt;email@example.com&gt;"
/>
<p class="text-muted text-xs mt-xs">Include your name and email in the issue if you'd like a response. Never required.</p>
</div>
<p v-if="submitError" class="feedback-error">{{ submitError }}</p>
<div v-if="submitted" class="feedback-success">
Issue filed! <a :href="issueUrl" target="_blank" rel="noopener" class="feedback-link">View on Forgejo</a>
</div>
</div>
<!-- Footer nav -->
<div class="feedback-footer">
<button v-if="step === 2 && !submitted" class="btn btn-ghost" @click="step = 1" :disabled="loading">Back</button>
<button v-if="!submitted" class="btn btn-ghost" @click="close" :disabled="loading">Cancel</button>
<button
v-if="step === 1"
class="btn btn-primary"
@click="nextStep"
>Next</button>
<button
v-if="step === 2 && !submitted"
class="btn btn-primary"
@click="submit"
:disabled="loading"
>{{ loading ? 'Filing…' : 'Submit' }}</button>
<button v-if="submitted" class="btn btn-primary" @click="close">Done</button>
</div>
</div>
</div>
</Transition>
</Teleport>
</template>
<script setup lang="ts">
import { ref, computed, onMounted } from 'vue'
const props = defineProps<{ currentTab?: string }>()
const fileInput = ref<HTMLInputElement | null>(null)
const screenshotB64 = ref<string | null>(null)
const screenshotPreview = ref<string | null>(null)
const screenshotFilename = ref('screenshot.png')
function onScreenshotChange(event: Event) {
const file = (event.target as HTMLInputElement).files?.[0]
if (!file) return
screenshotFilename.value = file.name
const reader = new FileReader()
reader.onload = (e) => {
const result = e.target?.result as string
screenshotB64.value = result
screenshotPreview.value = result
}
reader.readAsDataURL(file)
}
function clearScreenshot() {
screenshotB64.value = null
screenshotPreview.value = null
if (fileInput.value) fileInput.value.value = ''
}
const apiBase = (import.meta.env.VITE_API_BASE as string) ?? ''
// Probe once on mount; stay hidden until confirmed enabled so the button never flashes
const enabled = ref(false)
onMounted(async () => {
try {
const res = await fetch(`${apiBase}/api/v1/feedback/status`)
if (res.ok) {
const data = await res.json()
enabled.value = data.enabled === true
}
} catch { /* network error — stay hidden */ }
})
const open = ref(false)
const step = ref(1)
const loading = ref(false)
const stepError = ref('')
const submitError = ref('')
const submitted = ref(false)
const issueUrl = ref('')
const types: { value: 'bug' | 'feature' | 'other'; label: string }[] = [
{ value: 'bug', label: '🐛 Bug' },
{ value: 'feature', label: '✨ Feature request' },
{ value: 'other', label: '💬 Other' },
]
const form = ref({
type: 'bug' as 'bug' | 'feature' | 'other',
title: '',
description: '',
repro: '',
submitter: '',
})
const typeLabel = computed(() => types.find(t => t.value === form.value.type)?.label ?? '')
function close() {
open.value = false
// reset after transition
setTimeout(reset, 300)
}
function reset() {
step.value = 1
loading.value = false
stepError.value = ''
submitError.value = ''
submitted.value = false
issueUrl.value = ''
form.value = { type: 'bug', title: '', description: '', repro: '', submitter: '' }
clearScreenshot()
}
function nextStep() {
stepError.value = ''
if (!form.value.title.trim() || !form.value.description.trim()) {
stepError.value = 'Please fill in both Title and Description.'
return
}
step.value = 2
}
async function submit() {
loading.value = true
submitError.value = ''
try {
const res = await fetch(`${apiBase}/api/v1/feedback`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
title: form.value.title.trim(),
description: form.value.description.trim(),
type: form.value.type,
repro: form.value.repro.trim(),
tab: props.currentTab ?? 'unknown',
submitter: form.value.submitter.trim(),
}),
})
if (!res.ok) {
const err = await res.json().catch(() => ({ detail: res.statusText }))
submitError.value = err.detail ?? 'Submission failed.'
return
}
const data = await res.json()
issueUrl.value = data.issue_url
// Upload screenshot if provided
if (screenshotB64.value) {
try {
await fetch(`${apiBase}/api/v1/feedback/attach`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
issue_number: data.issue_number,
filename: screenshotFilename.value,
image_b64: screenshotB64.value,
}),
})
// Non-fatal: if attach fails, the issue was still filed
} catch { /* ignore attach errors */ }
}
submitted.value = true
} catch (e) {
submitError.value = 'Network error — please try again.'
} finally {
loading.value = false
}
}
</script>
<style scoped>
/* ── Floating action button ─────────────────────────────────────────── */
.feedback-fab {
position: fixed;
right: var(--spacing-md);
bottom: calc(68px + var(--spacing-md)); /* above mobile bottom nav */
z-index: 190;
display: flex;
align-items: center;
gap: var(--spacing-xs);
padding: 9px var(--spacing-md);
background: var(--color-bg-elevated);
border: 1px solid var(--color-border);
border-radius: 999px;
color: var(--color-text-secondary);
font-size: var(--font-size-sm);
font-family: var(--font-body);
font-weight: 500;
cursor: pointer;
box-shadow: var(--shadow-md);
transition: background 0.15s, color 0.15s, box-shadow 0.15s, border-color 0.15s;
}
.feedback-fab:hover {
background: var(--color-bg-card);
color: var(--color-text-primary);
border-color: var(--color-border-focus);
box-shadow: var(--shadow-lg);
}
.feedback-fab-icon { width: 15px; height: 15px; flex-shrink: 0; }
.feedback-fab-label { white-space: nowrap; }
/* On desktop, bottom nav is gone — drop to standard corner */
@media (min-width: 769px) {
.feedback-fab {
bottom: var(--spacing-lg);
}
}
/* ── Overlay ──────────────────────────────────────────────────────────── */
.feedback-overlay {
position: fixed;
inset: 0;
background: rgba(0, 0, 0, 0.55);
z-index: 1000;
display: flex;
align-items: flex-end;
justify-content: center;
padding: 0;
}
@media (min-width: 500px) {
.feedback-overlay {
align-items: center;
padding: var(--spacing-md);
}
}
/* ── Modal ────────────────────────────────────────────────────────────── */
.feedback-modal {
background: var(--color-bg-elevated);
border: 1px solid var(--color-border);
border-radius: var(--radius-lg) var(--radius-lg) 0 0;
width: 100%;
max-height: 90vh;
overflow-y: auto;
display: flex;
flex-direction: column;
box-shadow: var(--shadow-xl);
}
@media (min-width: 500px) {
.feedback-modal {
border-radius: var(--radius-lg);
width: 100%;
max-width: 520px;
max-height: 85vh;
}
}
.feedback-header {
display: flex;
align-items: center;
justify-content: space-between;
padding: var(--spacing-md) var(--spacing-md) var(--spacing-sm);
border-bottom: 1px solid var(--color-border);
flex-shrink: 0;
}
.feedback-title {
font-family: var(--font-display);
font-size: var(--font-size-lg);
font-weight: 600;
margin: 0;
}
.feedback-close {
background: transparent;
border: none;
color: var(--color-text-muted);
cursor: pointer;
padding: 4px;
border-radius: var(--radius-sm);
display: flex;
align-items: center;
justify-content: center;
}
.feedback-close:hover { color: var(--color-text-primary); }
.feedback-body {
padding: var(--spacing-md);
flex: 1;
overflow-y: auto;
display: flex;
flex-direction: column;
gap: var(--spacing-md);
}
.feedback-footer {
display: flex;
align-items: center;
justify-content: flex-end;
gap: var(--spacing-sm);
padding: var(--spacing-sm) var(--spacing-md);
border-top: 1px solid var(--color-border);
flex-shrink: 0;
}
.feedback-textarea {
resize: vertical;
min-height: 80px;
font-family: var(--font-body);
font-size: var(--font-size-sm);
}
.form-required { color: var(--color-error); margin-left: 2px; }
.feedback-error {
color: var(--color-error);
font-size: var(--font-size-sm);
margin: 0;
}
.feedback-success {
color: var(--color-success);
font-size: var(--font-size-sm);
padding: var(--spacing-sm) var(--spacing-md);
background: var(--color-success-bg);
border: 1px solid var(--color-success-border);
border-radius: var(--radius-md);
}
.feedback-link { color: var(--color-success); font-weight: 600; text-decoration: underline; }
/* Summary card (step 2) */
.feedback-summary {
display: flex;
flex-direction: column;
gap: var(--spacing-xs);
padding: var(--spacing-sm) var(--spacing-md);
background: var(--color-bg-secondary);
border-radius: var(--radius-md);
border: 1px solid var(--color-border);
}
.feedback-summary-row {
display: flex;
gap: var(--spacing-md);
align-items: flex-start;
}
.feedback-summary-row > :first-child { min-width: 72px; flex-shrink: 0; }
.feedback-summary-desc {
white-space: pre-wrap;
word-break: break-word;
}
.mt-md { margin-top: var(--spacing-md); }
.mt-xs { margin-top: var(--spacing-xs); }
/* ── Form elements ────────────────────────────────────────────────────── */
.form-group {
display: flex;
flex-direction: column;
gap: var(--spacing-xs);
}
.form-label {
font-size: var(--font-size-sm);
font-weight: 600;
color: var(--color-text-muted);
text-transform: uppercase;
letter-spacing: 0.06em;
}
.form-input {
width: 100%;
padding: var(--spacing-xs) var(--spacing-sm);
background: var(--color-bg-secondary);
border: 1px solid var(--color-border);
border-radius: var(--radius-md);
color: var(--color-text-primary);
font-family: var(--font-body);
font-size: var(--font-size-sm);
line-height: 1.5;
transition: border-color 0.15s;
box-sizing: border-box;
}
.form-input:focus {
outline: none;
border-color: var(--color-border-focus);
}
.form-input::placeholder { color: var(--color-text-muted); opacity: 0.7; }
/* ── Buttons ──────────────────────────────────────────────────────────── */
.btn {
display: inline-flex;
align-items: center;
justify-content: center;
gap: var(--spacing-xs);
padding: var(--spacing-xs) var(--spacing-md);
border-radius: var(--radius-md);
font-family: var(--font-body);
font-size: var(--font-size-sm);
font-weight: 500;
cursor: pointer;
transition: background 0.15s, color 0.15s, border-color 0.15s;
white-space: nowrap;
}
.btn:disabled { opacity: 0.5; cursor: not-allowed; }
.btn-primary {
background: var(--color-primary);
color: #fff;
border: 1px solid var(--color-primary);
}
.btn-primary:hover:not(:disabled) { filter: brightness(1.1); }
.btn-ghost {
background: transparent;
color: var(--color-text-secondary);
border: 1px solid var(--color-border);
}
.btn-ghost:hover:not(:disabled) {
background: var(--color-bg-secondary);
color: var(--color-text-primary);
border-color: var(--color-border-focus);
}
/* ── Filter chips ─────────────────────────────────────────────────────── */
.filter-chip-row {
display: flex;
flex-wrap: wrap;
gap: var(--spacing-xs);
}
.btn-chip {
padding: 5px var(--spacing-sm);
background: var(--color-bg-secondary);
border: 1px solid var(--color-border);
border-radius: 999px;
font-family: var(--font-body);
font-size: var(--font-size-sm);
font-weight: 500;
color: var(--color-text-secondary);
cursor: pointer;
transition: background 0.15s, color 0.15s, border-color 0.15s;
}
.btn-chip.active,
.btn-chip:hover {
background: color-mix(in srgb, var(--color-primary) 15%, transparent);
border-color: var(--color-primary);
color: var(--color-primary);
}
/* ── Card ─────────────────────────────────────────────────────────────── */
.card {
background: var(--color-bg-card);
border: 1px solid var(--color-border);
border-radius: var(--radius-md);
}
/* ── Text utilities ───────────────────────────────────────────────────── */
.text-muted { color: var(--color-text-muted); }
.text-sm { font-size: var(--font-size-sm); line-height: 1.5; }
.text-xs { font-size: 0.75rem; line-height: 1.5; }
.font-semibold { font-weight: 600; }
/* ── Screenshot attachment ────────────────────────────────────────────── */
.form-input-file {
display: block;
width: 100%;
padding: var(--spacing-xs) var(--spacing-sm);
background: var(--color-bg-secondary);
border: 1px dashed var(--color-border);
border-radius: var(--radius-md);
color: var(--color-text-secondary);
font-family: var(--font-body);
font-size: var(--font-size-sm);
cursor: pointer;
box-sizing: border-box;
}
.form-input-file:focus { outline: 2px solid var(--color-border-focus); outline-offset: 2px; }
.screenshot-preview {
margin-top: var(--spacing-xs);
display: flex;
align-items: flex-start;
gap: var(--spacing-sm);
}
.screenshot-preview img {
max-width: 160px;
max-height: 100px;
border-radius: var(--radius-sm);
border: 1px solid var(--color-border);
object-fit: cover;
}
.screenshot-remove {
font-size: var(--font-size-xs);
color: var(--color-text-muted);
background: none;
border: none;
cursor: pointer;
padding: 2px 4px;
min-height: 24px;
}
.screenshot-remove:hover { color: var(--color-error); }
.btn-link {
background: none;
border: none;
color: var(--color-primary);
cursor: pointer;
padding: 0;
font-family: var(--font-body);
font-size: inherit;
text-decoration: underline;
}
/* Transition */
.modal-fade-enter-active, .modal-fade-leave-active { transition: opacity 0.2s ease; }
.modal-fade-enter-from, .modal-fade-leave-to { opacity: 0; }
</style>
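The screenshot field is labelled "max 5 MB", but `onScreenshotChange` as shown reads whatever file is selected; whether the server enforces the limit is not visible in this diff. A minimal client-side check might look like the sketch below. The names `MAX_SCREENSHOT_BYTES`, `FileLike`, and `validateScreenshot` are hypothetical, as are the error strings.

```typescript
// Assumed limit, taken from the "max 5 MB" hint in the form label.
const MAX_SCREENSHOT_BYTES = 5 * 1024 * 1024

// Structural subset of the browser File type, so the helper can be tested without a DOM.
interface FileLike {
  name: string
  size: number // bytes
  type: string // MIME type, e.g. "image/png"
}

// Returns an error message for the user, or null when the file is acceptable.
function validateScreenshot(file: FileLike): string | null {
  if (!file.type.startsWith("image/")) return "Please choose an image file."
  if (file.size > MAX_SCREENSHOT_BYTES) return "Screenshot must be 5 MB or smaller."
  return null
}
```

In the component this could run at the top of `onScreenshotChange`, assigning a non-null result to `stepError` and returning before the `FileReader` is created.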


@ -20,6 +20,35 @@
--radius-lg: 16px;
--shadow-card: 0 2px 8px rgba(0,0,0,0.4);
--transition-fast: 150ms ease;
/* Spacing scale */
--spacing-xs: 0.25rem;
--spacing-sm: 0.5rem;
--spacing-md: 1rem;
--spacing-lg: 1.5rem;
/* Font scale */
--font-body: var(--font-base);
--font-display: var(--font-base);
--font-size-xs: 0.75rem;
--font-size-sm: 0.875rem;
--font-size-lg: 1.125rem;
/* Shadow aliases */
--shadow-md: var(--shadow-card);
--shadow-lg: var(--shadow-card);
--shadow-xl: 0 4px 20px rgba(0,0,0,0.5);
/* Color aliases for shared component compat */
--color-primary: var(--color-accent);
--color-text-primary: var(--color-text);
--color-text-secondary: var(--color-text-muted);
--color-bg-elevated: var(--color-surface);
--color-bg-card: var(--color-surface);
--color-bg-secondary: var(--color-bg);
--color-border-focus: var(--color-accent);
--color-success-bg: color-mix(in srgb, var(--color-success) 15%, transparent);
--color-success-border: color-mix(in srgb, var(--color-success) 35%, transparent);
}
@media (prefers-color-scheme: light) {


@ -25,6 +25,25 @@
:bm25-score="cite.bm25_score ?? undefined"
/>
</div>
<div v-if="msg.role === 'assistant' && chatFeedbackEnabled" class="message-thumbs" :aria-label="`Rate this answer`">
<button
class="thumb-btn"
:class="{ active: msg.rating === 1 }"
@click="rate(i, 1)"
:disabled="msg.rating != null"
title="Helpful"
aria-label="Mark as helpful"
>👍</button>
<button
class="thumb-btn"
:class="{ active: msg.rating === -1 }"
@click="rate(i, -1)"
:disabled="msg.rating != null"
title="Not helpful"
aria-label="Mark as not helpful"
>👎</button>
<span v-if="msg.rating != null" class="thumb-thanks">Thanks!</span>
</div>
</div>
<div class="message assistant loading" v-if="thinking">
@ -77,6 +96,7 @@ interface ChatMessage {
role: "user" | "assistant"
content: string
citations?: Citation[]
rating?: 1 | -1
}
const history = ref<ChatMessage[]>([])
@ -88,6 +108,7 @@ const messagesEl = ref<HTMLElement | null>(null)
const inputEl = ref<HTMLInputElement | null>(null)
const allDocs = ref<Document[]>([])
const selectedDocs = ref<string[]>([])
const chatFeedbackEnabled = ref(false)
const readyDocs = computed(() => allDocs.value.filter(d => d.status === "ready"))
const docTitles = computed(() =>
@ -96,6 +117,7 @@ const docTitles = computed(() =>
onMounted(async () => {
allDocs.value = await api.getLibrary().catch(() => [])
api.chatFeedbackStatus().then(s => { chatFeedbackEnabled.value = s.enabled }).catch(() => {})
inputEl.value?.focus()
})
@ -137,6 +159,17 @@ function scrollBottom() {
messagesEl.value.scrollTop = messagesEl.value.scrollHeight
}
}
async function rate(index: number, rating: 1 | -1) {
const msg = history.value[index]
if (!msg || msg.role !== "assistant" || msg.rating != null) return
// Update UI immediately (optimistic)
history.value[index] = { ...msg, rating }
const question = index > 0 ? (history.value[index - 1]?.content ?? "") : ""
await api.submitChatFeedback(rating, question, msg.content, selectedDocs.value).catch(() => {
// Non-fatal: the rating is cosmetic, so ignore network errors
})
}
</script>
<style scoped>
@ -232,6 +265,27 @@ function scrollBottom() {
font-size: 0.85rem; margin-bottom: 0.5rem; cursor: pointer; line-height: 1.4;
}
.message-thumbs {
display: flex;
align-items: center;
gap: 0.35rem;
margin-top: 0.4rem;
}
.thumb-btn {
background: transparent;
border: 1px solid var(--color-border);
border-radius: var(--radius-sm);
cursor: pointer;
font-size: 0.9rem;
padding: 2px 6px;
line-height: 1;
transition: background var(--transition-fast), border-color var(--transition-fast);
}
.thumb-btn:hover:not(:disabled) { background: var(--color-surface-alt); border-color: var(--color-accent); }
.thumb-btn.active { background: var(--color-surface-alt); border-color: var(--color-accent); }
.thumb-btn:disabled { opacity: 0.4; cursor: default; }
.thumb-thanks { font-size: 0.75rem; color: var(--color-text-muted); }
@media (max-width: 640px) {
.chat-layout { flex-direction: column-reverse; }
.sidebar { width: 100%; height: auto; max-height: 30vh; border-left: none; border-top: 1px solid var(--color-border); }


@ -2,16 +2,23 @@
<main class="library">
<header class="library-header">
<h1>Library</h1>
<button class="btn-primary" @click="scan" :disabled="scanning">
{{ scanning ? "Scanning..." : "Scan for PDFs" }}
</button>
<div class="header-actions">
<button class="btn-secondary" @click="triggerUpload" :disabled="uploading">
{{ uploading ? "Uploading..." : "Upload PDF / EPUB" }}
</button>
<input ref="fileInput" type="file" accept=".pdf,.epub" style="display:none" @change="handleUpload">
<button class="btn-primary" @click="scan" :disabled="scanning">
{{ scanning ? "Scanning..." : "Scan for PDFs" }}
</button>
</div>
</header>
<p class="error-msg" v-if="error">{{ error }}</p>
<p class="empty-state" v-if="!loading && docs.length === 0">
No books indexed yet. Click "Scan for PDFs" to discover PDFs in your books directory.<br>
Make sure your PDF directory is mounted at <code>/books</code> inside the container.
No documents indexed yet.<br>
<strong>Upload a PDF or EPUB</strong> using the button above, or mount a directory and click
<strong>Scan for PDFs</strong> to index an entire collection.
</p>
<div class="doc-grid" v-else>
@ -39,8 +46,10 @@ import DocumentCard from "@/components/DocumentCard.vue"
const docs = ref<Document[]>([])
const loading = ref(true)
const scanning = ref(false)
const uploading = ref(false)
const error = ref<string | null>(null)
const scanResult = ref<{ discovered: number; queued: number } | null>(null)
const fileInput = ref<HTMLInputElement | null>(null)
async function load() {
loading.value = true
@ -88,18 +97,45 @@ async function remove(id: string) {
}
}
function triggerUpload() {
fileInput.value?.click()
}
async function handleUpload(event: Event) {
const input = event.target as HTMLInputElement
const file = input.files?.[0]
if (!file) return
uploading.value = true
error.value = null
try {
await api.uploadDocument(file)
await load()
} catch (e) {
error.value = e instanceof Error ? e.message : "Upload failed"
} finally {
uploading.value = false
input.value = ""
}
}
onMounted(load)
</script>
<style scoped>
.library { padding: 1.5rem; max-width: 1200px; margin: 0 auto; }
.library-header { display: flex; align-items: center; justify-content: space-between; margin-bottom: 1.5rem; flex-wrap: wrap; gap: 1rem; }
.header-actions { display: flex; gap: 0.5rem; flex-wrap: wrap; }
h1 { font-size: 1.5rem; }
.btn-primary {
background: var(--color-accent); color: #fff; border: none; padding: 0.6rem 1.2rem;
border-radius: var(--radius-sm); cursor: pointer; font-size: 0.95rem;
}
.btn-primary:disabled { opacity: 0.5; cursor: default; }
.btn-secondary {
background: transparent; color: var(--color-accent); border: 1px solid var(--color-accent);
padding: 0.6rem 1.2rem; border-radius: var(--radius-sm); cursor: pointer; font-size: 0.95rem;
}
.btn-secondary:disabled { opacity: 0.5; cursor: default; }
.doc-grid { display: grid; grid-template-columns: repeat(auto-fill, minmax(280px, 1fr)); gap: 1rem; }
.empty-state { color: var(--color-text-muted); line-height: 1.8; }
.empty-state code { font-family: var(--font-mono); background: var(--color-surface-alt); padding: 2px 6px; border-radius: 3px; }