feat: RAG retrieval quality, artifact cleaning, and ingestion progress UI

Retrieval:
- Add _fetch_adjacent() to retriever: fetches page ± 1 chunks from DB
  after ranking so mid-sentence EPUB chunk boundaries don't lose context
- Fix vec DB doc-filter: oversample to top_k*20 before Python filter
  instead of post-filtering an already-small global pool (fixes wrong-book
  results when searching within a single document)
- top_k default 5 → 10; context per chunk 500 → 1500 chars; citation
  snippet 200 → 400 chars

Artifact cleaning:
- Add scripts/text_clean.py: strips ABC Amber LIT Converter watermarks,
  processtext.com URLs, bare page numbers, piracy stamps from extracted text
- Wire clean_paragraph() into ingest_pdf.py and new ingest_epub.py

Startup validation:
- _check_vec_schema() at boot: detects embedding dimension mismatch,
  deletes stale vec DB, and queues sequential re-embed in background thread
- Sequential _reembed_docs() prevents SQLite lock races on startup re-embed

cf-orch integration:
- Wire CF_ORCH_URL / CF_LICENSE_KEY into LLMRouter backend config so
  allocate() fires and keeps the Ollama model warm between requests

Ingestion progress UI:
- GET /api/library/{doc_id}/status now returns vec_count from page_vecs_meta
- DocumentCard.vue polls status every 3 s while processing and shows
  two-phase progress: indeterminate animation during extraction,
  determinate "Embedding N/M pages" bar once vectors start landing

Other:
- Chat feedback endpoint + thumbs up/down UI (FeedbackButton.vue)
- EPUB ingest script (ingest_epub.py) with heading-based chunking
- migration 002: chat_feedback table
- README.md with setup and feature overview
parent be7a076f34
commit e52bdb5128
29 changed files with 2301 additions and 112 deletions
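The doc-filter fix described in the commit message (oversample the global nearest-neighbour pool, then filter by document in Python) can be sketched as below. This is a minimal illustration, not the retriever's code: `Hit`, `filter_hits_by_doc`, `search_within_docs`, and the `query_fn` callback are hypothetical names, and the multiplier of 20 is taken from the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    """Hypothetical vector-search hit; not the real store's result type."""
    chunk_id: str
    distance: float
    metadata: dict = field(default_factory=dict)


def filter_hits_by_doc(ranked_hits, doc_ids, top_k):
    """Keep the top_k nearest hits that belong to the requested documents."""
    allowed = set(doc_ids)
    kept = [h for h in ranked_hits if h.metadata.get("doc_id") in allowed]
    return kept[:top_k]


def search_within_docs(query_fn, doc_ids, top_k, oversample=20):
    """Fetch an oversampled global pool, then filter it down by document.

    query_fn(k) is an assumed stand-in for a global k-NN query that returns
    the k nearest hits sorted by ascending distance.
    """
    pool = query_fn(top_k * oversample)
    return filter_hits_by_doc(pool, doc_ids, top_k)
```

With a small pool, a document whose best chunks rank below the global cut-off contributes nothing after filtering; a larger pool makes it more likely that enough of its chunks survive.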
@ -10,7 +10,9 @@ PAGEPIPER_BOOKS_DIR=/devl/pagepiper-cloud-data/books
|
|||
PAGEPIPER_OLLAMA_URL=
|
||||
|
||||
# Embedding and chat model selection (only used when PAGEPIPER_OLLAMA_URL is set)
|
||||
PAGEPIPER_EMBED_MODEL=nomic-embed-text
|
||||
# mxbai-embed-large (1024-dim) is recommended; nomic-embed-text uses 768-dim
|
||||
PAGEPIPER_EMBED_MODEL=mxbai-embed-large
|
||||
PAGEPIPER_EMBED_DIMS=1024
|
||||
PAGEPIPER_CHAT_MODEL=mistral:7b
|
||||
|
||||
# Heimdall license server (optional — for per-user tier validation)
|
||||
|
|
@ -20,3 +22,17 @@ HEIMDALL_ADMIN_TOKEN=
|
|||
# cf-orch streaming proxy — coordinator product key
|
||||
# Must match COORDINATOR_PRODUCT_KEYS["pagepiper"] in cf-orch.env on the coordinator
|
||||
COORDINATOR_PAGEPIPER_KEY=
|
||||
|
||||
# cf-orch coordinator URL — routes chat/embed calls through managed GPU allocation
|
||||
# CF_LICENSE_KEY is the auth token sent to the coordinator (same value as COORDINATOR_PAGEPIPER_KEY)
|
||||
# Leave CF_ORCH_URL blank to skip allocation and hit PAGEPIPER_OLLAMA_URL directly
|
||||
CF_ORCH_URL=
|
||||
CF_LICENSE_KEY=
|
||||
CF_APP_NAME=pagepiper
|
||||
|
||||
# Forgejo API token — enables in-app feedback button (files issues to Circuit-Forge/pagepiper)
|
||||
FORGEJO_API_TOKEN=
|
||||
|
||||
# Enable thumbs up/down on chat answers (stores retrieval quality signals locally)
|
||||
# Off by default — opt in when you want to collect correction data
|
||||
# PAGEPIPER_CHAT_FEEDBACK=true
|
||||
|
|
|
|||
|
|
@ -10,3 +10,11 @@ PAGEPIPER_DATA_DIR=data
|
|||
# PAGEPIPER_OLLAMA_URL=http://localhost:11434
|
||||
# PAGEPIPER_CHAT_MODEL=mistral:7b
|
||||
# PAGEPIPER_EMBED_MODEL=nomic-embed-text
|
||||
|
||||
# Forgejo API token — enables the in-app feedback button (files Forgejo issues)
|
||||
# Create a token at https://git.opensourcesolarpunk.com/user/settings/applications
|
||||
# FORGEJO_API_TOKEN=
|
||||
|
||||
# Enable thumbs up/down on chat answers (stores retrieval quality signals locally)
|
||||
# Off by default — opt in when you want to collect correction data
|
||||
# PAGEPIPER_CHAT_FEEDBACK=true
|
||||
|
|
|
|||
197
README.md
Normal file
|
|
@ -0,0 +1,197 @@
# Pagepiper

**v0.1.0** | Self-hosted PDF and EPUB search for your personal library

Pagepiper lets you drop PDFs and EPUBs into a library, index them, and search across the full text. With [Ollama](https://ollama.com) configured, you also get hybrid vector search and an LLM (large language model) chat interface that cites specific page numbers when it answers.

Built for TTRPG (tabletop roleplaying game) players tired of ctrl-F'ing through Pathfinder core rulebooks. Works equally well for fan fiction EPUB collections, AO3 exports, and any personal document library.

Try it: [pagepiper.circuitforge.tech](https://pagepiper.circuitforge.tech)

---

## Features

| Feature | Free tier | Paid (BYOK) |
|---------|-----------|-------------|
| PDF and EPUB upload via browser drag-and-drop | Yes | Yes |
| Directory scan for existing files | Yes | Yes |
| BM25 full-text search (no LLM required) | Yes | Yes |
| Unlimited local ingestion | Yes | Yes |
| Hybrid BM25 + k-NN vector search | No | Yes (local Ollama) |
| LLM chat with page-level citations | No | Yes (local Ollama) |
| Thumbs up / down feedback on answers | No | Yes |

BYOK (bring your own key) means you supply your own Ollama instance. No cloud API keys, no usage billing.

**BM25** (Best Match 25) is a keyword ranking algorithm. It works without any LLM and runs entirely inside the Docker container. **k-NN** (k-nearest neighbor) vector search uses embeddings to find passages that are semantically similar to your question, even when the exact words don't match.

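To make the BM25 description concrete, here is a minimal sketch of the standard Okapi BM25 scoring formula. It is a generic textbook version, not Pagepiper's `BM25Index`; the function name `bm25_scores` and the parameter defaults `k1=1.5` and `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenised document in `docs` against the `query` tokens.

    `docs` must be a non-empty list of token lists.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            # Rarer terms get a larger inverse document frequency weight.
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            # Term frequency saturates and is normalised by document length.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores
```

A passage that repeats the query term scores higher than one that mentions it once, and a passage without the term scores zero, which is why keyword search needs no embedding model.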
---

## Tech Stack

- **Backend:** FastAPI + SQLite (BM25 via custom BM25Index, vectors via sqlite-vec)
- **Frontend:** Vue 3 SPA served by nginx
- **Embedding model:** `nomic-embed-text` (768-dim) or `mxbai-embed-large` (1024-dim) via Ollama (optional)
- **Chat LLM:** `mistral:7b` via Ollama (optional, any Ollama model works)
- **Deployment:** Docker Compose

---

## Quick Start (Self-Hosting)

### Prerequisites

- [Docker](https://docs.docker.com/get-docker/) and Docker Compose
- PDFs or EPUBs you want to search
- Optional: [Ollama](https://ollama.com) for semantic search and RAG (retrieval-augmented generation) chat

### 1. Clone the repo

```bash
git clone https://git.opensourcesolarpunk.com/Circuit-Forge/pagepiper
cd pagepiper
```

### 2. Configure

```bash
cp .env.example .env
```

Open `.env` and set your paths:

```dotenv
# Directory to scan for PDFs/EPUBs (used by the "Scan" button in the UI)
PAGEPIPER_BOOKS_DIR=/path/to/your/pdfs

# Where Pagepiper stores its SQLite index and uploaded files
PAGEPIPER_DATA_DIR=data
```

To unlock hybrid search and LLM chat, uncomment and set the Ollama block:

```dotenv
PAGEPIPER_OLLAMA_URL=http://localhost:11434
PAGEPIPER_CHAT_MODEL=mistral:7b
PAGEPIPER_EMBED_MODEL=nomic-embed-text
```

### 3. Start

```bash
./manage.sh start
```

Open [http://localhost:8521](http://localhost:8521).

### 4. Add documents

Two ways to add files:

**Upload via browser** (easiest for small collections): Click **Upload** in the Library view and select a PDF or EPUB. The file saves to `data/uploads/` and begins indexing automatically.

**Scan a directory** (best for large collections): Set `PAGEPIPER_BOOKS_DIR` in your `.env` to a folder of PDFs/EPUBs, then click **Scan** in the Library view. Pagepiper finds all files recursively and queues them for indexing.

### 5. Search and chat

Switch to the **Chat** tab and ask questions. On the free tier, BM25 keyword search returns matching passages. With Ollama configured, you get semantic search and an LLM-generated answer with page-number citations.

---

## Ollama Setup (optional)

Install Ollama from [ollama.com](https://ollama.com), then pull the models:

```bash
ollama pull mistral:7b
ollama pull nomic-embed-text
```

On a headless Linux server, make Ollama listen on all interfaces so the Docker container can reach it:

```bash
OLLAMA_HOST=0.0.0.0 ollama serve
```

On Docker Desktop (Linux or Mac), `host.docker.internal` resolves automatically. No extra network config needed.

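Before starting Pagepiper it can help to confirm that Ollama is reachable and that both models are pulled. The sketch below queries Ollama's model-listing endpoint (`GET /api/tags`); the helper names `tags_url`, `fetch_tags`, and `missing_models` are hypothetical and not part of Pagepiper.

```python
import json
import urllib.request


def tags_url(base_url):
    """Build the model-listing URL from an Ollama base URL."""
    return base_url.rstrip("/") + "/api/tags"


def fetch_tags(base_url, timeout=5.0):
    """Return the parsed JSON payload listing locally available models."""
    with urllib.request.urlopen(tags_url(base_url), timeout=timeout) as resp:
        return json.load(resp)


def missing_models(tags_payload, wanted):
    """Return the wanted model names that do not appear in the payload.

    A name without an explicit tag is treated as "<name>:latest".
    """
    installed = {m.get("name", "") for m in tags_payload.get("models", [])}
    missing = []
    for name in wanted:
        full = name if ":" in name else name + ":latest"
        if full not in installed:
            missing.append(name)
    return missing
```

For example, `missing_models(fetch_tags("http://localhost:11434"), ["mistral:7b", "nomic-embed-text"])` returns an empty list once both pulls have finished.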
---

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `PAGEPIPER_BOOKS_DIR` | `./books` | Host directory to scan for PDFs and EPUBs |
| `PAGEPIPER_DATA_DIR` | `./data` | SQLite index and uploaded files live here |
| `PAGEPIPER_OLLAMA_URL` | *(unset)* | Ollama base URL; leave blank for BM25-only mode |
| `PAGEPIPER_EMBED_MODEL` | `nomic-embed-text` | Ollama embedding model (`nomic-embed-text` outputs 768-dim vectors) |
| `PAGEPIPER_EMBED_DIMS` | `1024` | Must match the embedding model's output dimensions (1024 for `mxbai-embed-large`, 768 for `nomic-embed-text`) |
| `PAGEPIPER_CHAT_MODEL` | `mistral:7b` | Ollama chat model; any Ollama model name works |
| `PAGEPIPER_CHAT_FEEDBACK` | *(unset)* | Set to `true` to enable thumbs up/down on chat answers |

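As a rough sketch of how these variables resolve, the snippet below reads them with the same defaults and the same truthy values (`1`, `true`, `yes`) that the code in this commit uses. The function `load_settings` and its dictionary keys are hypothetical; the application reads the variables in several modules rather than in one helper.

```python
import os


def load_settings(env=None):
    """Resolve Pagepiper settings from an environment mapping."""
    env = os.environ if env is None else env
    ollama_url = env.get("PAGEPIPER_OLLAMA_URL", "").strip()
    return {
        # An empty or missing URL means BM25-only mode.
        "ollama_url": ollama_url or None,
        "embed_model": env.get("PAGEPIPER_EMBED_MODEL", "nomic-embed-text"),
        "embed_dims": int(env.get("PAGEPIPER_EMBED_DIMS", "1024")),
        "chat_model": env.get("PAGEPIPER_CHAT_MODEL", "mistral:7b"),
        "chat_feedback": env.get("PAGEPIPER_CHAT_FEEDBACK", "").lower()
        in ("1", "true", "yes"),
    }
```

Passing a plain dictionary instead of `os.environ` makes it easy to check a configuration before writing it to `.env`.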
---

## Management

```bash
./manage.sh start          # Build and start (dev)
./manage.sh stop           # Stop
./manage.sh restart        # Restart
./manage.sh status         # Show container status
./manage.sh logs [svc]     # Tail logs (default: all services; pass 'api' or 'web' to filter)
./manage.sh open           # Open the UI in your browser
./manage.sh build          # Rebuild images without cache

./manage.sh cloud:start    # Start the cloud managed instance (port 8533)
./manage.sh cloud:stop
./manage.sh cloud:restart
./manage.sh cloud:status
./manage.sh cloud:logs [svc]
./manage.sh cloud:build
```

---

## Cloud Managed Instance

The cloud deployment runs at [pagepiper.circuitforge.tech](https://pagepiper.circuitforge.tech) and at `menagerie.circuitforge.tech/pagepiper`. It uses `compose.cloud.yml` with LLM inference routed through the cf-orch coordinator.

To run your own cloud-style deployment:

```bash
cp .env.cloud.example .env
# Edit .env: set PAGEPIPER_OLLAMA_URL and data paths
./manage.sh cloud:start
```

Cloud instance listens on port 8533. The API is internal-only; nginx proxies `/api/` to the backend.

---

## Data and Backups

The `data/` directory contains the SQLite index database and all uploaded files. Back it up to preserve your index. Pagepiper indexes documents at ingest time. If you modify or replace a source file, use the re-index button on the document card to rebuild its entry.

Large PDFs (hundreds of pages) can take a few minutes to index. The status badge on the document card updates as indexing progresses.

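One way to back up the index while Pagepiper is running is SQLite's online backup API, which Python exposes as `sqlite3.Connection.backup`. The sketch below is an assumption about how you might script it, not a tool shipped with Pagepiper; `backup_sqlite` is a hypothetical helper, and the file names `pagepiper.db` and `pagepiper_vecs.db` come from `app/config.py` in this commit.

```python
import sqlite3


def backup_sqlite(src_path, dest_path):
    """Copy a live SQLite database to dest_path using the online backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # Copies every page of the source into the destination database.
        src.backup(dest)
    finally:
        dest.close()
        src.close()
```

Calling `backup_sqlite("data/pagepiper.db", "backups/pagepiper.db")` and repeating it for `data/pagepiper_vecs.db` covers both databases; the files under `data/uploads/` still need an ordinary file copy.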
---

## Licensing

Pagepiper uses a split license:

- **MIT:** BM25 full-text search, document library management, ingest pipeline, EPUB support
- **BSL 1.1:** Hybrid vector search (embedding + k-NN), RAG chat, LLM integration

BSL 1.1 is free for personal non-commercial self-hosting. SaaS re-hosting or commercial redistribution requires a license from CircuitForge. BSL 1.1 converts to MIT after four years.

License keys: [circuitforge.tech](https://circuitforge.tech)

---

## Contributing

Issues and PRs welcome at [git.opensourcesolarpunk.com/Circuit-Forge/pagepiper](https://git.opensourcesolarpunk.com/Circuit-Forge/pagepiper).

The ingest pipeline and BM25 index are MIT-licensed. If you build a better PDF parser or add support for additional formats (CBZ, MOBI, etc.), the community benefits directly.
|
|
@ -29,7 +29,7 @@ class ChatRequest(BaseModel):
|
|||
message: str
|
||||
history: list[ChatTurn] = []
|
||||
doc_ids: list[str] | None = None
|
||||
top_k: int = 5
|
||||
top_k: int = 10
|
||||
|
||||
|
||||
class ChatResponse(BaseModel):
|
||||
|
|
@ -37,6 +37,13 @@ class ChatResponse(BaseModel):
    citations: list[dict]


class ChatFeedbackRequest(BaseModel):
    rating: int  # 1 = thumbs up, -1 = thumbs down
    question: str = ""
    answer: str = ""
    doc_ids: list[str] = []


def _get_llm_router():
    """Return LLMRouter if Ollama configured, else None."""
    from app.config import get_llm_config
@ -125,3 +132,31 @@ def chat(req: ChatRequest) -> ChatResponse:
|
|||
for c in result.citations
|
||||
],
|
||||
)


@router.get("/feedback/status")
def chat_feedback_status() -> dict:
    enabled = os.environ.get("PAGEPIPER_CHAT_FEEDBACK", "").lower() in ("1", "true", "yes")
    return {"enabled": enabled}


@router.post("/feedback")
def submit_chat_feedback(req: ChatFeedbackRequest) -> dict:
    import json
    import sqlite3

    if req.rating not in (1, -1):
        from fastapi import HTTPException
        raise HTTPException(status_code=422, detail="rating must be 1 or -1")

    db_path = _get_db_path()
    con = sqlite3.connect(db_path)
    try:
        con.execute(
            "INSERT INTO chat_feedback (rating, question, answer, doc_ids) VALUES (?, ?, ?, ?)",
            (req.rating, req.question[:2000], req.answer[:4000], json.dumps(req.doc_ids)),
        )
        con.commit()
    finally:
        con.close()
    return {"ok": True}
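The feedback write path above can be exercised in isolation with an in-memory database. In this sketch the `ASSUMED_SCHEMA` table definition is a guess based only on the columns named in the INSERT statement (migration 002 is not shown here), and `record_feedback` is a hypothetical helper rather than a function in the codebase.

```python
import json
import sqlite3

# Assumed shape of the chat_feedback table; the real migration may differ.
ASSUMED_SCHEMA = """
CREATE TABLE chat_feedback (
    id INTEGER PRIMARY KEY,
    rating INTEGER NOT NULL,
    question TEXT,
    answer TEXT,
    doc_ids TEXT
)
"""


def record_feedback(con, rating, question, answer, doc_ids):
    """Validate and store one thumbs up/down rating."""
    if rating not in (1, -1):
        raise ValueError("rating must be 1 or -1")
    con.execute(
        "INSERT INTO chat_feedback (rating, question, answer, doc_ids) VALUES (?, ?, ?, ?)",
        # Same truncation limits as the endpoint: 2000 and 4000 characters.
        (rating, question[:2000], answer[:4000], json.dumps(doc_ids)),
    )
    con.commit()
```

Storing `doc_ids` as a JSON string keeps the table flat, at the cost of needing `json.loads` (or SQLite's JSON functions) when analysing the ratings later.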
7
app/api/feedback.py
Normal file
|
|
@ -0,0 +1,7 @@
"""Feedback router — provided by circuitforge-core."""
from circuitforge_core.api import make_feedback_router

router = make_feedback_router(
    repo="Circuit-Forge/pagepiper",
    product="pagepiper",
)
88
app/api/feedback_attach.py
Normal file
|
|
@ -0,0 +1,88 @@
"""Screenshot attachment endpoint for in-app feedback.

After the cf-core feedback router creates a Forgejo issue, the frontend
can call POST /feedback/attach to upload a screenshot as a comment on that issue.
"""
from __future__ import annotations

import base64
import os

import requests
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel, Field

router = APIRouter()

_FORGEJO_BASE = os.environ.get(
    "FORGEJO_API_URL", "https://git.opensourcesolarpunk.com/api/v1"
)
_REPO = "Circuit-Forge/pagepiper"
_MAX_BYTES = 5 * 1024 * 1024


class AttachRequest(BaseModel):
    issue_number: int
    filename: str = Field(default="screenshot.png", max_length=80)
    image_b64: str  # data URI or raw base64


class AttachResponse(BaseModel):
    comment_url: str


def _forgejo_headers() -> dict[str, str]:
    token = os.environ.get("FORGEJO_API_TOKEN", "")
    return {"Authorization": f"token {token}"}


def _decode_image(image_b64: str) -> tuple[bytes, str]:
    if image_b64.startswith("data:"):
        header, _, data = image_b64.partition(",")
        mime = header.split(";")[0].split(":")[1] if ":" in header else "image/png"
    else:
        data = image_b64
        mime = "image/png"
    return base64.b64decode(data), mime


@router.post("/attach", response_model=AttachResponse)
def attach_screenshot(payload: AttachRequest) -> AttachResponse:
    token = os.environ.get("FORGEJO_API_TOKEN", "")
    if not token:
        raise HTTPException(status_code=503, detail="Feedback not configured.")

    raw_bytes, mime = _decode_image(payload.image_b64)
    if len(raw_bytes) > _MAX_BYTES:
        raise HTTPException(
            status_code=413,
            detail=f"Screenshot exceeds 5 MB limit ({len(raw_bytes) // 1024} KB received).",
        )

    asset_resp = requests.post(
        f"{_FORGEJO_BASE}/repos/{_REPO}/issues/{payload.issue_number}/assets",
        headers=_forgejo_headers(),
        files={"attachment": (payload.filename, raw_bytes, mime)},
        timeout=20,
    )
    if not asset_resp.ok:
        raise HTTPException(
            status_code=502,
            detail=f"Forgejo asset upload failed: {asset_resp.text[:200]}",
        )

    asset_url = asset_resp.json().get("browser_download_url", "")
    comment_body = f"**Screenshot attached by reporter:**\n\n![screenshot]({asset_url})"
    comment_resp = requests.post(
        f"{_FORGEJO_BASE}/repos/{_REPO}/issues/{payload.issue_number}/comments",
        headers={**_forgejo_headers(), "Content-Type": "application/json"},
        json={"body": comment_body},
        timeout=15,
    )
    if not comment_resp.ok:
        raise HTTPException(
            status_code=502,
            detail=f"Forgejo comment failed: {comment_resp.text[:200]}",
        )

    return AttachResponse(comment_url=comment_resp.json().get("html_url", ""))
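From the caller's side, the request body for the attach endpoint can be assembled as in the sketch below. The field names follow `AttachRequest` and the size limit mirrors `_MAX_BYTES`; the helper `build_attach_payload` itself is hypothetical and only illustrates the data-URI form that `_decode_image` accepts.

```python
import base64

MAX_BYTES = 5 * 1024 * 1024  # mirrors _MAX_BYTES in the endpoint


def build_attach_payload(issue_number, image_bytes, filename="screenshot.png", mime="image/png"):
    """Build a JSON-serialisable body matching the AttachRequest fields."""
    if len(image_bytes) > MAX_BYTES:
        raise ValueError("screenshot exceeds the 5 MB limit")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "issue_number": issue_number,
        # AttachRequest caps filename at 80 characters.
        "filename": filename[:80],
        # Data-URI form, so the server can read the MIME type from the header.
        "image_b64": f"data:{mime};base64,{encoded}",
    }
```

Checking the size before encoding avoids sending a request that the server would reject with a 413 response.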
|
|
@ -12,11 +12,13 @@ import uuid
|
|||
from pathlib import Path
|
||||
from typing import Callable
|
||||
|
||||
from fastapi import APIRouter, BackgroundTasks, Depends, HTTPException
|
||||
from fastapi import APIRouter, BackgroundTasks, Depends, HTTPException, UploadFile
|
||||
|
||||
from app.config import WATCH_DIR, DB_PATH, VEC_DB_PATH
|
||||
from app.config import WATCH_DIR, DB_PATH, VEC_DB_PATH, DATA_DIR
|
||||
from app.deps import get_db
|
||||
|
||||
_MAX_UPLOAD_BYTES = 200 * 1024 * 1024 # 200 MB
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
router = APIRouter(prefix="/api/library", tags=["library"])
|
||||
|
||||
|
|
@ -24,15 +26,31 @@ router = APIRouter(prefix="/api/library", tags=["library"])
|
|||
_mark_bm25_dirty: Callable[[], None] | None = None
|
||||
|
||||
|
||||
_INGEST_TASKS = {
|
||||
".pdf": "pagepiper/ingest_pdf",
|
||||
".epub": "pagepiper/ingest_epub",
|
||||
}
|
||||
|
||||
_INGEST_RUNNERS = {
|
||||
".pdf": "scripts.ingest_pdf",
|
||||
".epub": "scripts.ingest_epub",
|
||||
}
|
||||
|
||||
|
||||
def _dispatch_ingest(
|
||||
doc_id: str,
|
||||
file_path: str,
|
||||
background_tasks: BackgroundTasks,
|
||||
) -> str:
|
||||
"""Dispatch an ingest task. Tries cf-orch; falls back to BackgroundTasks."""
|
||||
import importlib
|
||||
import os as _os
|
||||
from pathlib import Path as _Path
|
||||
|
||||
suffix = _Path(file_path).suffix.lower()
|
||||
task_name = _INGEST_TASKS.get(suffix, "pagepiper/ingest_pdf")
|
||||
runner_module = _INGEST_RUNNERS.get(suffix, "scripts.ingest_pdf")
|
||||
|
||||
# Read lazily so test fixtures (monkeypatch.setenv) take effect
|
||||
_data_dir = _Path(_os.environ.get("PAGEPIPER_DATA_DIR", "data"))
|
||||
task_id = str(uuid.uuid4())
|
||||
|
|
@ -45,11 +63,11 @@ def _dispatch_ingest(
|
|||
|
||||
try:
|
||||
from circuitforge_core.tasks import dispatch_task # type: ignore[import]
|
||||
task_id = dispatch_task(caller="pagepiper/ingest_pdf", args=args)
|
||||
task_id = dispatch_task(caller=task_name, args=args)
|
||||
logger.info("Dispatched cf-orch ingest task %s for doc %s", task_id, doc_id)
|
||||
except Exception:
|
||||
from scripts.ingest_pdf import run as run_ingest
|
||||
background_tasks.add_task(_run_ingest_background, run_ingest, args, task_id)
|
||||
mod = importlib.import_module(runner_module)
|
||||
background_tasks.add_task(_run_ingest_background, mod.run, args, task_id)
|
||||
logger.info(
|
||||
"cf-orch unavailable — running ingest in background thread (task %s)", task_id
|
||||
)
|
||||
|
|
@ -89,7 +107,7 @@ def scan_library(
|
|||
if not watch.exists():
|
||||
raise HTTPException(status_code=404, detail=f"Watch directory not found: {watch}")
|
||||
|
||||
pdfs = list(watch.glob("**/*.pdf"))
|
||||
pdfs = list(watch.glob("**/*.pdf")) + list(watch.glob("**/*.epub"))
|
||||
queued = []
|
||||
|
||||
for pdf_path in pdfs:
|
||||
|
|
@ -156,7 +174,8 @@ def delete_document(
|
|||
# Remove embeddings from vector store
|
||||
try:
|
||||
from circuitforge_core.vector.sqlite_vec import LocalSQLiteVecStore # type: ignore[import]
|
||||
store = LocalSQLiteVecStore(db_path=VEC_DB_PATH, table="page_vecs", dimensions=768)
|
||||
from app.config import VEC_DIMENSIONS
|
||||
store = LocalSQLiteVecStore(db_path=VEC_DB_PATH, table="page_vecs", dimensions=VEC_DIMENSIONS)
|
||||
store.delete_where({"doc_id": doc_id})
|
||||
except Exception as exc:
|
||||
logger.warning("Could not remove vectors for doc %s: %s", doc_id, exc)
|
||||
|
|
@ -165,6 +184,20 @@ def delete_document(
|
|||
_mark_bm25_dirty()
|
||||
|
||||
|
||||
def _get_vec_count(doc_id: str) -> int:
|
||||
"""Return how many vectors have been stored for this doc. Returns 0 on any error."""
|
||||
try:
|
||||
conn = sqlite3.connect(VEC_DB_PATH)
|
||||
count = conn.execute(
|
||||
"SELECT COUNT(*) FROM page_vecs_meta WHERE json_extract(metadata, '$.doc_id') = ?",
|
||||
[doc_id],
|
||||
).fetchone()[0]
|
||||
conn.close()
|
||||
return int(count)
|
||||
except Exception:
|
||||
return 0
|
||||
|
||||
|
||||
@router.get("/{doc_id}/status")
|
||||
def document_status(
|
||||
doc_id: str,
|
||||
|
|
@ -176,4 +209,54 @@ def document_status(
|
|||
).fetchone()
|
||||
if not row:
|
||||
raise HTTPException(status_code=404, detail="Document not found")
|
||||
return dict(row)
|
||||
result = dict(row)
|
||||
result["vec_count"] = _get_vec_count(doc_id)
|
||||
return result
|
||||
|
||||
|
||||
@router.post("/upload", status_code=202)
|
||||
def upload_document(
|
||||
file: UploadFile,
|
||||
background_tasks: BackgroundTasks,
|
||||
db: sqlite3.Connection = Depends(get_db),
|
||||
) -> dict:
|
||||
"""Accept a PDF/EPUB upload, save to data/uploads/, and queue for indexing."""
|
||||
name = Path(file.filename or "").name
|
||||
suffix = Path(name).suffix.lower()
|
||||
if suffix not in _INGEST_TASKS:
|
||||
raise HTTPException(status_code=400, detail="Supported formats: PDF, EPUB")
|
||||
|
||||
content = file.file.read()
|
||||
if len(content) > _MAX_UPLOAD_BYTES:
|
||||
raise HTTPException(status_code=413, detail="File exceeds 200 MB limit")
|
||||
|
||||
upload_dir = DATA_DIR / "uploads"
|
||||
upload_dir.mkdir(parents=True, exist_ok=True)
|
||||
dest = upload_dir / name
|
||||
dest.write_bytes(content)
|
||||
|
||||
path_str = str(dest.resolve())
|
||||
existing = db.execute(
|
||||
"SELECT id, status FROM documents WHERE file_path = ?", [path_str]
|
||||
).fetchone()
|
||||
|
||||
if existing and existing["status"] == "ready":
|
||||
return {"doc_id": existing["id"], "task_id": None, "filename": name, "status": "already_indexed"}
|
||||
|
||||
if existing:
|
||||
doc_id = existing["id"]
|
||||
else:
|
||||
title = dest.stem.replace("_", " ").replace("-", " ").title()
|
||||
doc_id = db.execute(
|
||||
"INSERT INTO documents(title, file_path, status) VALUES (?,?,?) RETURNING id",
|
||||
[title, path_str, "pending"],
|
||||
).fetchone()[0]
|
||||
db.commit()
|
||||
|
||||
task_id = _dispatch_ingest(doc_id, path_str, background_tasks)
|
||||
db.execute(
|
||||
"UPDATE documents SET status='processing', task_id=? WHERE id=?",
|
||||
[task_id, doc_id],
|
||||
)
|
||||
db.commit()
|
||||
return {"doc_id": doc_id, "task_id": task_id, "filename": name, "status": "queued"}
|
||||
|
|
|
|||
|
|
@ -10,6 +10,7 @@ DATA_DIR.mkdir(parents=True, exist_ok=True)
|
|||
DB_PATH = str(DATA_DIR / "pagepiper.db")
|
||||
VEC_DB_PATH = str(DATA_DIR / "pagepiper_vecs.db")
|
||||
WATCH_DIR = Path(os.environ.get("PAGEPIPER_WATCH_DIR", "books"))
|
||||
VEC_DIMENSIONS = int(os.environ.get("PAGEPIPER_EMBED_DIMS", "1024"))
|
||||
|
||||
|
||||
def get_llm_config() -> dict | None:
|
||||
|
|
@ -19,17 +20,27 @@ def get_llm_config() -> dict | None:
|
|||
return None
|
||||
_clean = url.rstrip("/")
|
||||
_base_url = _clean if _clean.endswith("/v1") else _clean + "/v1"
|
||||
return {
|
||||
"fallback_order": ["ollama"],
|
||||
"backends": {
|
||||
"ollama": {
|
||||
chat_model = os.environ.get("PAGEPIPER_CHAT_MODEL", "mistral:7b")
|
||||
|
||||
backend: dict = {
|
||||
"type": "openai_compat",
|
||||
"base_url": _base_url,
|
||||
"model": os.environ.get("PAGEPIPER_CHAT_MODEL", "mistral:7b"),
|
||||
"embedding_model": os.environ.get(
|
||||
"PAGEPIPER_EMBED_MODEL", "nomic-embed-text"
|
||||
),
|
||||
"model": chat_model,
|
||||
"embedding_model": os.environ.get("PAGEPIPER_EMBED_MODEL", "nomic-embed-text"),
|
||||
"supports_images": False,
|
||||
}
|
||||
},
|
||||
|
||||
# Wire cf-orch allocation when coordinator is configured so the model stays warm
|
||||
# and cold-start latency doesn't cause chat timeouts.
|
||||
orch_url = os.environ.get("CF_ORCH_URL", "").strip()
|
||||
if orch_url:
|
||||
backend["cf_orch"] = {
|
||||
"service": "ollama",
|
||||
"model_candidates": [chat_model],
|
||||
"ttl_s": 3600,
|
||||
}
|
||||
|
||||
return {
|
||||
"fallback_order": ["ollama"],
|
||||
"backends": {"ollama": backend},
|
||||
}
|
||||
|
|
|
|||
|
|
@ -9,7 +9,7 @@ from app.config import DB_PATH
|
|||
|
||||
|
||||
def get_db() -> Generator[sqlite3.Connection, None, None]:
|
||||
conn = sqlite3.connect(DB_PATH)
|
||||
conn = sqlite3.connect(DB_PATH, check_same_thread=False)
|
||||
conn.execute("PRAGMA foreign_keys = ON")
|
||||
conn.execute("PRAGMA journal_mode = WAL")
|
||||
conn.row_factory = sqlite3.Row
|
||||
|
|
|
|||
92
app/main.py
|
|
@ -3,11 +3,15 @@
|
|||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import os
|
||||
import re
|
||||
import sqlite3
|
||||
import threading
|
||||
from contextlib import asynccontextmanager
|
||||
|
||||
from fastapi import FastAPI
|
||||
|
||||
from app.config import DB_PATH
|
||||
from app.config import DB_PATH, VEC_DB_PATH, VEC_DIMENSIONS
|
||||
from app.services.bm25_index import BM25Index
|
||||
|
||||
logger = logging.getLogger("pagepiper")
|
||||
|
|
@ -21,9 +25,91 @@ def _apply_migrations() -> None:
|
|||
migrate(DB_PATH)


def _reembed_docs(docs: list[tuple[str, str]], db_path: str, vec_db_path: str) -> None:
    """Re-run full ingest for a list of (doc_id, file_path) sequentially."""
    for doc_id, file_path in docs:
        suffix = os.path.splitext(file_path)[1].lower()
        try:
            if suffix == ".epub":
                from scripts.ingest_epub import run
            else:
                from scripts.ingest_pdf import run
            logger.info("Auto re-embed: starting %s", os.path.basename(file_path))
            run(doc_id=doc_id, file_path=file_path, db_path=db_path, vec_db_path=vec_db_path)
        except Exception as exc:
            logger.error("Auto re-embed failed for doc %s: %s", doc_id[:8], exc)


def _check_vec_schema(vec_db_path: str, expected_dims: int, db_path: str) -> None:
    """Drop the vec DB if its stored dimension doesn't match config, then queue re-embed.

    sqlite-vec bakes the embedding dimension into the virtual table DDL, so changing
    models requires dropping and recreating the whole file. Catches the mismatch at
    startup rather than surfacing it as an obscure OperationalError mid-request.
    """
    if not os.path.exists(vec_db_path):
        return
    try:
        conn = sqlite3.connect(vec_db_path)
        row = conn.execute(
            "SELECT sql FROM sqlite_master WHERE name='page_vecs_vecs'"
        ).fetchone()
        conn.close()
    except Exception as exc:
        logger.warning("Vec schema check could not read %s (non-fatal): %s", vec_db_path, exc)
        return

    if not row:
        return  # table not yet created — first embed will build it with the right dims

    m = re.search(r'float\[(\d+)\]', row[0])
    if not m:
        return
    actual_dims = int(m.group(1))
    if actual_dims == expected_dims:
        return

    logger.warning(
        "Vec DB dimension mismatch: stored=%d, configured=%d — dropping %s and queuing re-embed",
        actual_dims, expected_dims, vec_db_path,
    )
    try:
        os.remove(vec_db_path)
    except OSError as exc:
        logger.error(
            "Could not delete stale vec DB %s: %s — fix permissions and restart", vec_db_path, exc
        )
        return

    # Collect all ready docs so we can rebuild their embeddings in the background.
    try:
        conn = sqlite3.connect(db_path)
        docs = conn.execute(
            "SELECT id, file_path FROM documents WHERE status='ready'"
        ).fetchall()
        conn.close()
    except Exception as exc:
        logger.warning("Could not query documents for re-embed: %s", exc)
        return

    if not docs:
        return

    logger.info("Queuing re-embed for %d document(s) in background", len(docs))
    threading.Thread(
        target=_reembed_docs,
        args=(docs, db_path, vec_db_path),
        daemon=True,
        name="pagepiper-reembed",
    ).start()


@asynccontextmanager
async def lifespan(app: FastAPI):
    _apply_migrations()
    embed_model = os.environ.get("PAGEPIPER_EMBED_MODEL", "nomic-embed-text")
    logger.info("Pagepiper starting — embed model: %s, dims: %d", embed_model, VEC_DIMENSIONS)
    _check_vec_schema(VEC_DB_PATH, VEC_DIMENSIONS, DB_PATH)
    _bm25.mark_dirty()  # will rebuild on first search
    yield

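The dimension check at the heart of `_check_vec_schema` can be tried on its own. The sketch below reuses the same `float[N]` regular expression; the helper names `declared_dims` and `needs_rebuild` are hypothetical, and the DDL string in the usage note is an illustrative example rather than the exact statement the vector store generates.

```python
import re


def declared_dims(create_sql):
    """Extract the embedding dimension from a CREATE statement, or None."""
    m = re.search(r"float\[(\d+)\]", create_sql)
    return int(m.group(1)) if m else None


def needs_rebuild(create_sql, expected_dims):
    """True only when a dimension is declared and differs from the config."""
    actual = declared_dims(create_sql)
    return actual is not None and actual != expected_dims
```

For a table declared as `CREATE VIRTUAL TABLE page_vecs_vecs USING vec0(embedding float[768])`, `needs_rebuild(ddl, 1024)` is true, which is the situation after switching from a 768-dim model to a 1024-dim one.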
|
|
@ -39,8 +125,12 @@ from app.api.library import router as library_router # noqa: E402
|
|||
from app.api.ingest import router as ingest_router # noqa: E402
|
||||
from app.api.search import router as search_router # noqa: E402
|
||||
from app.api.chat import router as chat_router # noqa: E402
|
||||
from app.api.feedback import router as feedback_router # noqa: E402
|
||||
from app.api.feedback_attach import router as feedback_attach_router # noqa: E402
|
||||
|
||||
app.include_router(library_router)
|
||||
app.include_router(ingest_router)
|
||||
app.include_router(search_router)
|
||||
app.include_router(chat_router)
|
||||
app.include_router(feedback_router, prefix="/api/v1/feedback")
|
||||
app.include_router(feedback_attach_router, prefix="/api/v1/feedback")
|
||||
|
|
|
|||
|
|
@ -8,6 +8,7 @@ BM25-only path is MIT and has no gate.
|
|||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import sqlite3
|
||||
from dataclasses import dataclass
|
||||
|
||||
from app.services.bm25_index import BM25Index
|
||||
|
|
@ -15,6 +16,62 @@ from app.services.bm25_index import BM25Index
|
|||
logger = logging.getLogger(__name__)


def _fetch_adjacent(
    hits: list["RetrievedChunk"],
    db_path: str,
    window: int = 1,
) -> list["RetrievedChunk"]:
    """Return chunks immediately before/after each hit that aren't already in the hit set.

    Definitional passages often start mid-sentence because the EPUB/PDF chunk
    boundary fell mid-paragraph. Fetching the preceding chunk restores the subject
    so the LLM can understand 'them' / 'they' references correctly.
    """
    if not hits:
        return []

    existing_keys = {(c.doc_id, c.page_number) for c in hits}
    needed: dict[str, set[int]] = {}
    for c in hits:
        for delta in range(-window, window + 1):
            if delta == 0:
                continue
            adj_page = c.page_number + delta
            if adj_page > 0 and (c.doc_id, adj_page) not in existing_keys:
                needed.setdefault(c.doc_id, set()).add(adj_page)

    if not needed:
        return []

    extra: list[RetrievedChunk] = []
    try:
        conn = sqlite3.connect(db_path)
        conn.row_factory = sqlite3.Row
        for doc_id, pages in needed.items():
            placeholders = ",".join("?" * len(pages))
            rows = conn.execute(
                f"SELECT id, doc_id, page_number, text FROM page_chunks "
                f"WHERE doc_id=? AND page_number IN ({placeholders})",
                [doc_id] + sorted(pages),
            ).fetchall()
            for row in rows:
                extra.append(
                    RetrievedChunk(
                        chunk_id=row["id"],
                        doc_id=row["doc_id"],
                        page_number=row["page_number"],
                        text=row["text"],
                        bm25_score=0.0,
                        vector_score=None,
                    )
                )
        conn.close()
    except Exception as exc:
        logger.warning("Context expansion query failed (non-fatal): %s", exc)

    return extra


@dataclass(frozen=True)
class RetrievedChunk:
    """A chunk returned by the retriever, with source scores."""
|
|
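The neighbour selection inside `_fetch_adjacent` can be tried in isolation. The sketch below is a simplified restatement, not the committed code: it works on plain `(doc_id, page_number)` tuples instead of `RetrievedChunk` objects, and the name `neighbour_pages` is hypothetical.

```python
from __future__ import annotations


def neighbour_pages(
    hits: list[tuple[str, int]], window: int = 1
) -> dict[str, set[int]]:
    """Map doc_id -> page numbers adjacent to a hit that are not hits themselves."""
    existing = set(hits)
    needed: dict[str, set[int]] = {}
    for doc_id, page in hits:
        for delta in range(-window, window + 1):
            adj = page + delta
            # Skip the hit itself, non-positive pages, and pages already retrieved.
            if delta == 0 or adj <= 0 or (doc_id, adj) in existing:
                continue
            needed.setdefault(doc_id, set()).add(adj)
    return needed


# Pages 1 and 2 of book-a are both hits, so only page 3 is still missing there.
example = neighbour_pages([("book-a", 1), ("book-a", 2), ("book-b", 7)])
```

Grouping the missing pages per document is what lets the real function issue one `IN (...)` query per `doc_id` rather than one query per page.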
@ -55,13 +112,23 @@ class Retriever:
|
|||
for r in self._bm25.query(query, top_k=top_k * 2, doc_ids=doc_ids)
|
||||
}
|
||||
|
||||
try:
|
||||
vec = llm.embed([query])[0]
|
||||
store = LocalSQLiteVecStore(db_path=vec_db_path, table="page_vecs", dimensions=768)
|
||||
filter_meta = {"doc_id": doc_ids[0]} if doc_ids and len(doc_ids) == 1 else None
|
||||
vec_hits = store.query(vec, top_k=top_k * 2, filter_metadata=filter_meta)
|
||||
except Exception as exc:
|
||||
logger.warning("Embed failed, falling back to BM25-only: %s", exc)
|
||||
return self._bm25_only(query, top_k, doc_ids, db_path)
|
||||
from app.config import VEC_DIMENSIONS
|
||||
store = LocalSQLiteVecStore(db_path=vec_db_path, table="page_vecs", dimensions=VEC_DIMENSIONS)
|
||||
|
||||
if doc_ids and len(doc_ids) > 1:
|
||||
vec_hits = [h for h in vec_hits if h.metadata.get("doc_id") in doc_ids]
|
||||
# sqlite-vec applies filter_metadata as a Python post-filter after fetching k
|
||||
# nearest globally. When the corpus spans many documents and only a subset is
|
||||
# selected, most of those k candidates are from non-target docs and get dropped,
|
||||
# leaving too few vector hits. Oversample heavily and filter in Python instead.
|
||||
if doc_ids:
|
||||
vec_candidates = store.query(vec, top_k=top_k * 20)
|
||||
vec_hits = [h for h in vec_candidates if h.metadata.get("doc_id") in doc_ids]
|
||||
else:
|
||||
vec_hits = store.query(vec, top_k=top_k * 2)
|
||||
|
||||
# Merge: BM25 hits take priority; vector hits fill in additional results
|
||||
merged: dict[str, RetrievedChunk] = {}
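The effect of the doc-filter change can be reproduced with a toy corpus. This is a minimal sketch under stated assumptions: `Hit` and `nearest` are hypothetical stand-ins for the vector store's result type and its global k-nearest query, and scores are treated as distances (lower is closer).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Hit:
    """Hypothetical stand-in for a vector-store result."""
    entry_id: str
    score: float  # distance: lower is closer
    metadata: dict = field(default_factory=dict)


def nearest(corpus: list[Hit], top_k: int) -> list[Hit]:
    """Toy global query: the top_k lowest-distance hits across all documents."""
    return sorted(corpus, key=lambda h: h.score)[:top_k]


def query_within_docs(
    corpus: list[Hit], doc_ids: list[str], top_k: int, oversample: int = 20
) -> list[Hit]:
    """Oversample globally, then keep only hits from the selected documents."""
    wanted = set(doc_ids)
    candidates = nearest(corpus, top_k * oversample)
    return [h for h in candidates if h.metadata.get("doc_id") in wanted]


# 30 close hits from another book, 3 more distant hits from the target book.
corpus = [Hit(f"o{i}", 0.01 * (i + 1), {"doc_id": "other"}) for i in range(30)]
corpus += [Hit(f"t{i}", 0.6 + 0.1 * i, {"doc_id": "target"}) for i in range(3)]

# Old behaviour: filter a pool of only top_k * 2 global candidates.
naive = [h for h in nearest(corpus, 2 * 2) if h.metadata.get("doc_id") == "target"]
oversampled = query_within_docs(corpus, ["target"], top_k=2)
```

With the small pool every candidate belongs to the other book and the filter leaves nothing. A fixed multiplier is still a heuristic: on a corpus much larger than `top_k * 20` chunks, the target document's hits can again fall outside the candidate pool.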
|
||||
|
|
@ -76,10 +143,10 @@ class Retriever:
|
|||
)
|
||||
for vh in vec_hits:
|
||||
# _chunks is the loaded list of dicts from BM25Index; no public accessor exists
|
||||
text = next((c["text"] for c in self._bm25._chunks if c["id"] == vh.id), "")
|
||||
if vh.id in merged:
|
||||
existing = merged[vh.id]
|
||||
merged[vh.id] = RetrievedChunk(
|
||||
text = next((c["text"] for c in self._bm25._chunks if c["id"] == vh.entry_id), "")
|
||||
if vh.entry_id in merged:
|
||||
existing = merged[vh.entry_id]
|
||||
merged[vh.entry_id] = RetrievedChunk(
|
||||
chunk_id=existing.chunk_id,
|
||||
doc_id=existing.doc_id,
|
||||
page_number=existing.page_number,
|
||||
|
|
@ -88,8 +155,8 @@ class Retriever:
|
|||
vector_score=vh.score,
|
||||
)
|
||||
else:
|
||||
merged[vh.id] = RetrievedChunk(
|
||||
chunk_id=vh.id,
|
||||
merged[vh.entry_id] = RetrievedChunk(
|
||||
chunk_id=vh.entry_id,
|
||||
doc_id=vh.metadata.get("doc_id", ""),
|
||||
page_number=int(vh.metadata.get("page_number", 0)),
|
||||
text=text,
|
||||
|
|
@ -103,14 +170,15 @@ class Retriever:
|
|||
vec = (1.0 / (1.0 + r.vector_score)) if r.vector_score is not None else 0.0
|
||||
return bm25 * 0.5 + vec * 0.5
|
||||
|
||||
ranked = sorted(merged.values(), key=_combined, reverse=True)
|
||||
return ranked[:top_k]
|
||||
ranked = sorted(merged.values(), key=_combined, reverse=True)[:top_k]
|
||||
adjacent = _fetch_adjacent(ranked, db_path)
|
||||
return ranked + adjacent
|
||||
|
||||
def _bm25_only(
|
||||
self, query: str, top_k: int, doc_ids: list[str] | None, db_path: str
|
||||
) -> list[RetrievedChunk]:
|
||||
self._bm25.ensure_fresh(db_path)
|
||||
return [
|
||||
hits = [
|
||||
RetrievedChunk(
|
||||
chunk_id=r.chunk_id,
|
||||
doc_id=r.doc_id,
|
||||
|
|
@ -121,3 +189,5 @@ class Retriever:
|
|||
)
|
||||
for r in self._bm25.query(query, top_k=top_k, doc_ids=doc_ids)
|
||||
]
|
||||
adjacent = _fetch_adjacent(hits, db_path)
|
||||
return hits + adjacent
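The ranking formula in the hybrid path combines a BM25 term with an inverted vector distance at equal weight. The sketch below restates it as a standalone function; how the BM25 term is normalised is not visible in this diff, so `bm25_norm` is assumed to already lie in a comparable range, and the function name is hypothetical.

```python
from __future__ import annotations


def combined_score(bm25_norm: float, vector_distance: float | None) -> float:
    """Equal-weight blend of a BM25 term and an inverted vector distance."""
    # A distance of 0 maps to 1.0; larger distances decay towards 0.
    vec = 1.0 / (1.0 + vector_distance) if vector_distance is not None else 0.0
    return bm25_norm * 0.5 + vec * 0.5


perfect = combined_score(1.0, 0.0)      # strong keyword match, identical vector
vector_only = combined_score(0.0, 1.0)  # no keyword match, distance 1.0
bm25_only = combined_score(0.5, None)   # chunk never seen by the vector store
```

Adjacent chunks are appended after ranking with `bm25_score=0.0` and `vector_score=None`, so they never pass through this formula, and both retrieval paths can return more than `top_k` chunks.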
|
||||
|
|
|
|||
|
|
@ -42,7 +42,9 @@ class Synthesizer:
|
|||
history: list[dict],
|
||||
chunks: list[RetrievedChunk],
|
||||
) -> SynthesisResult:
|
||||
context_parts = [f"[p.{c.page_number}]\n{c.text[:500]}" for c in chunks]
|
||||
# 1500 chars (~300 words) per chunk: enough to capture definitions that
|
||||
# appear mid-paragraph without blowing past a 32k-context model's limit.
|
||||
context_parts = [f"[p.{c.page_number}]\n{c.text[:1500]}" for c in chunks]
|
||||
context = "\n\n---\n\n".join(context_parts)
|
||||
prompt = f"Document excerpts:\n\n{context}\n\nQuestion: {message}"
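The prompt assembly and its size budget can be checked with a small standalone sketch. `Chunk` is a hypothetical stand-in carrying only the two fields the prompt uses, and the worst-case figure assumes the new `top_k` default of 10 with one neighbour on each side of every hit.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in with only the fields the prompt needs."""
    page_number: int
    text: str


def build_prompt(message: str, chunks: list[Chunk], max_chars: int = 1500) -> str:
    """Join truncated, page-labelled excerpts and append the question."""
    parts = [f"[p.{c.page_number}]\n{c.text[:max_chars]}" for c in chunks]
    context = "\n\n---\n\n".join(parts)
    return f"Document excerpts:\n\n{context}\n\nQuestion: {message}"


long_text = "A widget is a small part. " * 100  # 2600 characters before truncation
prompt = build_prompt(
    "What is a widget?", [Chunk(3, long_text), Chunk(4, "See also gadgets.")]
)

# Worst case: 10 ranked chunks plus up to 2 neighbours each, 1500 chars apiece.
worst_case_chars = (10 + 10 * 2) * 1500
```

That worst case is 45,000 characters of context. At a rough four characters per token (a rule of thumb, not a measurement) this is on the order of 11k tokens, which leaves room for history and the answer in a 32k-context model.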
|
||||
|
||||
|
|
@ -52,7 +54,7 @@ class Synthesizer:
|
|||
Citation(
|
||||
doc_id=c.doc_id,
|
||||
page_number=c.page_number,
|
||||
snippet=c.text[:200],
|
||||
snippet=c.text[:400],
|
||||
bm25_score=c.bm25_score,
|
||||
)
|
||||
for c in chunks
|
||||
|
|
|
|||
|
|
@ -20,6 +20,8 @@ services:
|
|||
# cf-orch: route LLM inference through coordinator for managed GPU access
|
||||
CF_ORCH_URL: http://host.docker.internal:7700
|
||||
CF_APP_NAME: pagepiper
|
||||
# CF_LICENSE_KEY is the auth token CFOrchClient sends to the coordinator
|
||||
CF_LICENSE_KEY: ${COORDINATOR_PAGEPIPER_KEY:-}
|
||||
COORDINATOR_URL: http://10.1.10.71:7700
|
||||
COORDINATOR_PAGEPIPER_KEY: ${COORDINATOR_PAGEPIPER_KEY:-}
|
||||
extra_hosts:
|
||||
|
|
|
|||
128
docs/index.md
|
|
@ -1,6 +1,6 @@
|
|||
# Pagepiper
|
||||
|
||||
Self-hosted document search with BM25 full-text indexing and (with local Ollama) hybrid vector search and LLM-powered chat.
|
||||
Self-hosted document search with BM25 full-text indexing and (with local Ollama) hybrid vector search and LLM-powered chat. Supports PDF and EPUB files.
|
||||
|
||||
## Demo
|
||||
|
||||
|
|
@ -12,7 +12,7 @@ Try it: [pagepiper.circuitforge.tech](https://pagepiper.circuitforge.tech)
|
|||
|
||||

|
||||
|
||||
Scan your PDF directory to index documents. Each document shows page count and ingest status.
|
||||
Scan your PDF directory to index documents, or upload individual PDFs directly. Each document shows page count and ingest status.
|
||||
|
||||
### Chat
|
||||
|
||||
|
|
@ -20,25 +20,123 @@ Scan your PDF directory to index documents. Each document shows page count and i
|
|||
|
||||
Ask questions across your indexed documents. Results cite the source document and page number.
|
||||
|
||||
## Quick Start (Docker)
|
||||
|
||||
```bash
|
||||
git clone https://git.opensourcesolarpunk.com/Circuit-Forge/pagepiper
|
||||
cd pagepiper
|
||||
cp .env.example .env # set PAGEPIPER_DATA_DIR and PAGEPIPER_BOOKS_DIR
|
||||
docker compose up -d --build
|
||||
# open http://localhost:8521
|
||||
```
|
||||
|
||||
Place PDFs in your `PAGEPIPER_BOOKS_DIR` directory, then click "Scan for PDFs" in the Library view.
|
||||
|
||||
## Tiers
|
||||
|
||||
| Feature | Free | Paid (BYOK) |
|
||||
|---------|------|-------------|
|
||||
| BM25 full-text search | Yes | Yes |
|
||||
| PDF and EPUB upload via browser | Yes | Yes |
|
||||
| Unlimited local ingestion | Yes | Yes |
|
||||
| Hybrid vector search | No | Yes (local Ollama) |
|
||||
| LLM chat over documents | No | Yes (local Ollama) |
|
||||
|
||||
Set `PAGEPIPER_OLLAMA_URL` in your `.env` to unlock the Paid tier with your own Ollama instance.
|
||||
BYOK (Bring Your Own Key) means you supply your own Ollama instance. No cloud API keys required.
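The tier switch comes down to whether `PAGEPIPER_OLLAMA_URL` is set to a non-blank value, which is how the ingest scripts gate embedding. A minimal sketch of that check follows; the function name and the `"free"`/`"paid"` labels are illustrative assumptions, not Pagepiper's API.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def active_tier(env: Mapping[str, str] | None = None) -> str:
    """Return "paid" when an Ollama URL is configured, otherwise "free"."""
    env = os.environ if env is None else env
    # A variable that is present but blank counts as unset.
    return "paid" if env.get("PAGEPIPER_OLLAMA_URL", "").strip() else "free"


unset = active_tier({})
blank = active_tier({"PAGEPIPER_OLLAMA_URL": "   "})
configured = active_tier({"PAGEPIPER_OLLAMA_URL": "http://localhost:11434"})
```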
|
||||
|
||||
---
|
||||
|
||||
## Self-Hosting Guide
|
||||
|
||||
### Prerequisites
|
||||
|
||||
- [Docker](https://docs.docker.com/get-docker/) and Docker Compose
|
||||
- PDFs you want to search
|
||||
- Optional: [Ollama](https://ollama.com) running locally for semantic search and LLM chat
|
||||
|
||||
### Step 1: Get the code
|
||||
|
||||
```bash
|
||||
git clone https://git.opensourcesolarpunk.com/Circuit-Forge/pagepiper
|
||||
cd pagepiper
|
||||
```
|
||||
|
||||
### Step 2: Configure
|
||||
|
||||
```bash
|
||||
cp .env.example .env
|
||||
```
|
||||
|
||||
Open `.env` and set your directories:
|
||||
|
||||
```dotenv
|
||||
# Where pagepiper stores its index database
|
||||
PAGEPIPER_DATA_DIR=./data
|
||||
|
||||
# Directory to scan for PDFs (used by the "Scan for PDFs" button)
|
||||
# You can also upload individual PDFs via the web UI without setting this
|
||||
PAGEPIPER_BOOKS_DIR=/path/to/your/pdfs
|
||||
```
|
||||
|
||||
To unlock hybrid vector search and LLM chat, add your Ollama endpoint:
|
||||
|
||||
```dotenv
|
||||
PAGEPIPER_OLLAMA_URL=http://localhost:11434
|
||||
PAGEPIPER_CHAT_MODEL=mistral:7b
|
||||
PAGEPIPER_EMBED_MODEL=nomic-embed-text
|
||||
```
|
||||
|
||||
### Step 3: Start
|
||||
|
||||
```bash
|
||||
docker compose up -d --build
|
||||
```
|
||||
|
||||
Open [http://localhost:8521](http://localhost:8521) in your browser.
|
||||
|
||||
### Step 4: Add your PDFs
|
||||
|
||||
Two ways to add documents:
|
||||
|
||||
**Option A — Upload via browser** (easiest for small collections):
|
||||
|
||||
Click the **Upload PDF** button in the Library view and select a file. It saves to `data/uploads/` and begins indexing automatically.
|
||||
|
||||
**Option B — Mount a directory** (best for large collections):
|
||||
|
||||
Set `PAGEPIPER_BOOKS_DIR` in your `.env` to point at a folder of PDFs, then click **Scan for PDFs**. Pagepiper finds all `.pdf` files recursively and queues them for indexing.
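A recursive scan of this kind can be sketched with `pathlib`. This illustrates the idea only and is not Pagepiper's scanner: the function name is hypothetical, and matching the extension case-insensitively is an assumption.

```python
from __future__ import annotations

from pathlib import Path


def find_pdfs(books_dir: str) -> list[Path]:
    """Return every PDF under books_dir, searching subdirectories as well."""
    root = Path(books_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

Calling `find_pdfs("/path/to/your/pdfs")` returns the files in a stable, sorted order, which keeps repeated scans predictable.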
|
||||
|
||||
### Step 5: Search
|
||||
|
||||
Switch to the **Chat** tab and ask questions about your documents. The Free tier uses BM25 keyword matching. With Ollama configured, you get semantic (vector) search and LLM-generated answers with page-level citations.
|
||||
|
||||
---
|
||||
|
||||
## Ollama Setup (optional)
|
||||
|
||||
Install Ollama from [ollama.com](https://ollama.com), then pull the models:
|
||||
|
||||
```bash
|
||||
ollama pull mistral:7b
|
||||
ollama pull nomic-embed-text
|
||||
```
|
||||
|
||||
Pagepiper's Docker container reaches Ollama at `host.docker.internal` — no extra network config needed on Linux/Mac with Docker Desktop. On a headless Linux server, make sure Ollama binds to `0.0.0.0`:
|
||||
|
||||
```bash
|
||||
OLLAMA_HOST=0.0.0.0 ollama serve
|
||||
```
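Before starting Pagepiper it can help to confirm that both models are actually installed. The sketch below assumes Ollama's `/api/tags` endpoint returns a JSON object with a `models` list whose entries carry a `name` field, and that untagged pulls are listed as `<name>:latest`; the helper names are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request


def installed_models(tags_payload: dict) -> set[str]:
    """Collect model names from an /api/tags response body."""
    return {m["name"] for m in tags_payload.get("models", [])}


def missing_models(required: list[str], installed: set[str]) -> list[str]:
    """Return the required models that are not installed."""
    def present(name: str) -> bool:
        # Accept "name" when the server lists it as "name:latest".
        return name in installed or (":" not in name and f"{name}:latest" in installed)

    return [name for name in required if not present(name)]


def fetch_tags(base_url: str, timeout: float = 5.0) -> dict:
    """Fetch the model list from a running Ollama instance (network call)."""
    url = f"{base_url.rstrip('/')}/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

A typical check would be `missing_models(["mistral:7b", "nomic-embed-text"], installed_models(fetch_tags("http://localhost:11434")))`, where an empty list means both models are available.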
|
||||
|
||||
---
|
||||
|
||||
## Managing the instance
|
||||
|
||||
```bash
|
||||
# Check status
|
||||
docker compose ps
|
||||
|
||||
# View API logs
|
||||
docker compose logs -f api
|
||||
|
||||
# Stop
|
||||
docker compose down
|
||||
|
||||
# Rebuild after updates
|
||||
docker compose up -d --build
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Notes
|
||||
|
||||
- Pagepiper indexes PDFs at ingest time. Changes to the source file require a re-index (use the re-index button on the document card).
|
||||
- The `data/` directory contains the SQLite index database and any uploaded files. Back it up to preserve your index.
|
||||
- Large PDFs (hundreds of pages) can take a few minutes to index. Watch the status badge on the document card.
|
||||
|
|
|
|||
|
|
@ -14,6 +14,8 @@ dependencies:
|
|||
- pdfplumber>=0.11
|
||||
- pytesseract>=0.3
|
||||
- Pillow>=10.0
|
||||
- ebooklib>=0.18
|
||||
- beautifulsoup4>=4.12
|
||||
- sqlite-vec>=0.1
|
||||
- pytest>=8.0
|
||||
- pytest-asyncio>=0.23
|
||||
|
|
|
|||
9
migrations/002_chat_feedback.sql
Normal file
|
|
@ -0,0 +1,9 @@
|
|||
-- chat answer thumbs up/down signals (local SQLite, always available)
|
||||
CREATE TABLE IF NOT EXISTS chat_feedback (
|
||||
id TEXT PRIMARY KEY DEFAULT (lower(hex(randomblob(16)))),
|
||||
rating INTEGER NOT NULL CHECK (rating IN (1, -1)),
|
||||
question TEXT NOT NULL DEFAULT '',
|
||||
answer TEXT NOT NULL DEFAULT '',
|
||||
doc_ids TEXT NOT NULL DEFAULT '[]',
|
||||
created_at TEXT NOT NULL DEFAULT (datetime('now'))
|
||||
);
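The table's defaults and `CHECK` constraint can be exercised against an in-memory database. This is a minimal sketch: the schema is copied from the migration above, while `record_feedback` is a hypothetical helper and not the endpoint added in this commit.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS chat_feedback (
    id TEXT PRIMARY KEY DEFAULT (lower(hex(randomblob(16)))),
    rating INTEGER NOT NULL CHECK (rating IN (1, -1)),
    question TEXT NOT NULL DEFAULT '',
    answer TEXT NOT NULL DEFAULT '',
    doc_ids TEXT NOT NULL DEFAULT '[]',
    created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
"""


def record_feedback(conn, rating, question, answer, doc_ids):
    """Insert one feedback row and return its generated id."""
    cur = conn.execute(
        "INSERT INTO chat_feedback(rating, question, answer, doc_ids) VALUES (?,?,?,?)",
        [rating, question, answer, json.dumps(doc_ids)],
    )
    conn.commit()
    # The id is filled in by the column default, so read it back by rowid.
    row = conn.execute(
        "SELECT id FROM chat_feedback WHERE rowid = ?", [cur.lastrowid]
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
feedback_id = record_feedback(conn, 1, "What is a widget?", "A small part.", ["doc-1"])
```

Any rating other than `1` or `-1` is rejected by the `CHECK` constraint with `sqlite3.IntegrityError`, so validation does not depend on the caller alone.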
|
||||
239
scripts/ingest_epub.py
Normal file
|
|
@ -0,0 +1,239 @@
|
|||
# scripts/ingest_epub.py
|
||||
"""
|
||||
cf-orch task: pagepiper/ingest_epub
|
||||
|
||||
Extracts text from an EPUB file, stores chapter chunks in SQLite, and (if Ollama is
|
||||
configured) generates embeddings and stores them in the sqlite-vec store.
|
||||
|
||||
Each heading-delimited section becomes one chunk (equivalent to a PDF page);
EPUB content documents with fewer than two headings fall back to ~500-word chunks.
|
||||
|
||||
Entry point:
|
||||
python scripts/ingest_epub.py --doc-id X --file-path Y --db-path Z --vec-db-path W
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import os
|
||||
import sqlite3
|
||||
from dataclasses import dataclass
|
||||
from pathlib import Path
|
||||
|
||||
logger = logging.getLogger("pagepiper.ingest_epub")
|
||||
|
||||
EMBED_BATCH_SIZE = 64
|
||||
_WORDS_PER_CHUNK = 500 # target chunk size for word-count fallback
|
||||
|
||||
|
||||
@dataclass
|
||||
class _Chunk:
|
||||
page_number: int
|
||||
text: str
|
||||
source: str
|
||||
word_count: int
|
||||
|
||||
|
||||
def _paragraphs_from_soup(soup) -> list[str]:
|
||||
"""Extract non-trivial, artifact-free text lines from parsed HTML."""
|
||||
from scripts.text_clean import filter_paragraphs
|
||||
raw = soup.get_text(separator="\n", strip=True)
|
||||
return filter_paragraphs(raw.splitlines())
|
||||
|
||||
|
||||
def _chunks_from_paragraphs(paragraphs: list[str], start_num: int) -> list[_Chunk]:
|
||||
"""Accumulate paragraphs into ~_WORDS_PER_CHUNK-word chunks."""
|
||||
chunks: list[_Chunk] = []
|
||||
current: list[str] = []
|
||||
current_count = 0
|
||||
chunk_num = start_num
|
||||
|
||||
for para in paragraphs:
|
||||
words = para.split()
|
||||
if current_count + len(words) > _WORDS_PER_CHUNK and current:
|
||||
text = "\n".join(current)
|
||||
chunks.append(_Chunk(chunk_num, text, "text", current_count))
|
||||
chunk_num += 1
|
||||
current, current_count = [], 0
|
||||
current.append(para)
|
||||
current_count += len(words)
|
||||
|
||||
if current:
|
||||
text = "\n".join(current)
|
||||
chunks.append(_Chunk(chunk_num, text, "text", current_count))
|
||||
|
||||
return chunks
|
||||
|
||||
|
||||
def _extract_chunks(file_path: str) -> list[_Chunk]:
|
||||
import ebooklib
|
||||
from ebooklib import epub
|
||||
from bs4 import BeautifulSoup
|
||||
from scripts.text_clean import clean_line, is_artifact_line
|
||||
|
||||
book = epub.read_epub(file_path, options={"ignore_ncx": True})
|
||||
all_chunks: list[_Chunk] = []
|
||||
|
||||
for item in book.get_items_of_type(ebooklib.ITEM_DOCUMENT):
|
||||
soup = BeautifulSoup(item.get_content(), "html.parser")
|
||||
headings = soup.find_all(["h1", "h2", "h3", "h4"])
|
||||
|
||||
if len(headings) >= 2:
|
||||
# Heading-based split: one chunk per section
|
||||
current_parts: list[str] = []
|
||||
for elem in soup.find_all(["h1", "h2", "h3", "h4", "p", "li", "blockquote"]):
|
||||
if elem.name in ("h1", "h2", "h3", "h4"):
|
||||
if current_parts:
|
||||
text = "\n".join(current_parts).strip()
|
||||
if text:
|
||||
n = len(all_chunks) + 1
|
||||
all_chunks.append(_Chunk(n, text, "text", len(text.split())))
|
||||
current_parts = [elem.get_text(" ", strip=True)]
|
||||
else:
|
||||
t = clean_line(elem.get_text(" ", strip=True))
|
||||
if t and not is_artifact_line(t):
|
||||
current_parts.append(t)
|
||||
if current_parts:
|
||||
text = "\n".join(current_parts).strip()
|
||||
if text:
|
||||
n = len(all_chunks) + 1
|
||||
all_chunks.append(_Chunk(n, text, "text", len(text.split())))
|
||||
else:
|
||||
# Word-count fallback: accumulate paragraphs into ~500-word chunks
|
||||
paragraphs = _paragraphs_from_soup(soup)
|
||||
if paragraphs:
|
||||
all_chunks.extend(_chunks_from_paragraphs(paragraphs, len(all_chunks) + 1))
|
||||
|
||||
return all_chunks
|
||||
|
||||
|
||||
def _update_status(
|
||||
conn: sqlite3.Connection,
|
||||
doc_id: str,
|
||||
status: str,
|
||||
page_count: int | None = None,
|
||||
error_msg: str | None = None,
|
||||
) -> None:
|
||||
if page_count is not None:
|
||||
conn.execute(
|
||||
"UPDATE documents SET status=?, page_count=?, updated_at=datetime('now') WHERE id=?",
|
||||
[status, page_count, doc_id],
|
||||
)
|
||||
elif error_msg is not None:
|
||||
conn.execute(
|
||||
"UPDATE documents SET status=?, error_msg=?, updated_at=datetime('now') WHERE id=?",
|
||||
[status, error_msg, doc_id],
|
||||
)
|
||||
else:
|
||||
conn.execute(
|
||||
"UPDATE documents SET status=?, updated_at=datetime('now') WHERE id=?",
|
||||
[status, doc_id],
|
||||
)
|
||||
conn.commit()
|
||||
|
||||
|
||||
def run(doc_id: str, file_path: str, db_path: str, vec_db_path: str) -> None:
|
||||
"""Run the full ingest pipeline for one EPUB. Called by cf-orch or BackgroundTasks."""
|
||||
conn: sqlite3.Connection | None = None
|
||||
try:
|
||||
conn = sqlite3.connect(db_path, timeout=30)
|
||||
conn.execute("PRAGMA journal_mode = WAL")
|
||||
conn.execute("PRAGMA foreign_keys = ON")
|
||||
_update_status(conn, doc_id, "processing")
|
||||
|
||||
logger.info("Extracting chapters from %s", file_path)
|
||||
chunks = _extract_chunks(file_path)
|
||||
logger.info("Extracted %d chapters", len(chunks))
|
||||
|
||||
conn.execute("DELETE FROM page_chunks WHERE doc_id=?", [doc_id])
|
||||
chunk_rows: list[tuple[str, int, str]] = []
|
||||
for chunk in chunks:
|
||||
row = conn.execute(
|
||||
"""INSERT INTO page_chunks(doc_id, page_number, text, source, word_count)
|
||||
VALUES (?,?,?,?,?) RETURNING id""",
|
||||
[doc_id, chunk.page_number, chunk.text, chunk.source, chunk.word_count],
|
||||
).fetchone()
|
||||
chunk_rows.append((row[0], chunk.page_number, chunk.text))
|
||||
conn.commit()
|
||||
|
||||
# Embedding failure is non-fatal: document remains BM25-searchable.
|
||||
ollama_url = os.environ.get("PAGEPIPER_OLLAMA_URL", "").strip()
|
||||
if ollama_url and chunks:
|
||||
try:
|
||||
logger.info("Embedding %d chapters via Ollama at %s", len(chunks), ollama_url)
|
||||
from circuitforge_core.llm import LLMRouter
|
||||
from circuitforge_core.vector.sqlite_vec import LocalSQLiteVecStore
|
||||
|
||||
_clean = ollama_url.rstrip("/")
|
||||
base_url = _clean if _clean.endswith("/v1") else _clean + "/v1"
|
||||
router = LLMRouter({
|
||||
"fallback_order": ["ollama"],
|
||||
"backends": {
|
||||
"ollama": {
|
||||
"type": "openai_compat",
|
||||
"base_url": base_url,
|
||||
"model": os.environ.get("PAGEPIPER_CHAT_MODEL", "mistral:7b"),
|
||||
"embedding_model": os.environ.get(
|
||||
"PAGEPIPER_EMBED_MODEL", "nomic-embed-text"
|
||||
),
|
||||
"supports_images": False,
|
||||
}
|
||||
},
|
||||
})
|
||||
embed_dims = int(os.environ.get("PAGEPIPER_EMBED_DIMS", "1024"))
|
||||
vec_store = LocalSQLiteVecStore(
|
||||
db_path=vec_db_path, table="page_vecs", dimensions=embed_dims
|
||||
)
|
||||
vec_store.delete_where({"doc_id": doc_id})
|
||||
|
||||
texts = [text for _, _, text in chunk_rows]
|
||||
vectors: list[list[float]] = []
|
||||
for i in range(0, len(texts), EMBED_BATCH_SIZE):
|
||||
vectors.extend(router.embed(texts[i : i + EMBED_BATCH_SIZE]))
|
||||
|
||||
for (chunk_id, page_number, _), vector in zip(chunk_rows, vectors):
|
||||
vec_store.upsert(
|
||||
entry_id=chunk_id,
|
||||
vector=vector,
|
||||
metadata={"doc_id": doc_id, "page_number": page_number},
|
||||
)
|
||||
logger.info("Stored %d embeddings", len(vectors))
|
||||
except Exception as embed_exc:
|
||||
logger.warning(
|
||||
"Embedding skipped for doc %s — BM25 only (reason: %s)",
|
||||
doc_id, embed_exc,
|
||||
)
|
||||
|
||||
_update_status(conn, doc_id, "ready", page_count=len(chunks))
|
||||
logger.info("Ingest complete for doc %s (%d chapters)", doc_id, len(chunks))
|
||||
|
||||
except Exception as exc:
|
||||
logger.error("Ingest failed for doc %s: %s", doc_id, exc, exc_info=True)
|
||||
if conn is not None:
|
||||
try:
|
||||
_update_status(conn, doc_id, "error", error_msg=str(exc))
|
||||
except Exception:
|
||||
logger.warning("Could not write error status for doc %s", doc_id)
|
||||
raise
|
||||
finally:
|
||||
if conn is not None:
|
||||
conn.close()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
import argparse
|
||||
|
||||
logging.basicConfig(level=logging.INFO)
|
||||
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Ingest an EPUB (cf-orch task entry point)"
|
||||
)
|
||||
parser.add_argument("--doc-id", required=True)
|
||||
parser.add_argument("--file-path", required=True)
|
||||
parser.add_argument("--db-path", required=True)
|
||||
parser.add_argument("--vec-db-path", required=True)
|
||||
a = parser.parse_args()
|
||||
run(
|
||||
doc_id=a.doc_id,
|
||||
file_path=a.file_path,
|
||||
db_path=a.db_path,
|
||||
vec_db_path=a.vec_db_path,
|
||||
)
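The word-count fallback in `_chunks_from_paragraphs` is easiest to follow on a small input. The sketch below restates the accumulation loop on plain strings, without the `_Chunk` bookkeeping; the name `chunk_by_words` is hypothetical.

```python
from __future__ import annotations


def chunk_by_words(paragraphs: list[str], target: int = 500) -> list[str]:
    """Group paragraphs into chunks of roughly `target` words."""
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for para in paragraphs:
        n = len(para.split())
        # Flush before adding a paragraph that would push the chunk over target.
        if current and count + n > target:
            chunks.append("\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n".join(current))
    return chunks


paras = ["w " * 300, "w " * 300, "w " * 100]
sizes = [len(c.split()) for c in chunk_by_words(paras)]
```

Paragraphs are never split, so the target is a soft limit: a single paragraph longer than `target` words becomes one oversized chunk.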
|
||||
|
|
@ -52,7 +52,8 @@ def run(doc_id: str, file_path: str, db_path: str, vec_db_path: str) -> None:
|
|||
|
||||
conn: sqlite3.Connection | None = None
|
||||
try:
|
||||
conn = sqlite3.connect(db_path)
|
||||
conn = sqlite3.connect(db_path, timeout=30)
|
||||
conn.execute("PRAGMA journal_mode = WAL")
|
||||
conn.execute("PRAGMA foreign_keys = ON")
|
||||
_update_status(conn, doc_id, "processing")
|
||||
|
||||
|
|
@ -63,20 +64,26 @@ def run(doc_id: str, file_path: str, db_path: str, vec_db_path: str) -> None:
|
|||
logger.info("Extracted %d pages", len(chunks))
|
||||
|
||||
# Step 2: Store chunks (replace any existing for this doc)
|
||||
from scripts.text_clean import clean_paragraph
|
||||
conn.execute("DELETE FROM page_chunks WHERE doc_id=?", [doc_id])
|
||||
chunk_rows: list[tuple[str, int, str]] = []
|
||||
for chunk in chunks:
|
||||
cleaned_text = clean_paragraph(chunk.text)
|
||||
if not cleaned_text:
|
||||
continue
|
||||
row = conn.execute(
|
||||
"""INSERT INTO page_chunks(doc_id, page_number, text, source, word_count)
|
||||
VALUES (?,?,?,?,?) RETURNING id""",
|
||||
[doc_id, chunk.page_number, chunk.text, chunk.source, chunk.word_count],
|
||||
[doc_id, chunk.page_number, cleaned_text, chunk.source, len(cleaned_text.split())],
|
||||
).fetchone()
|
||||
chunk_rows.append((row[0], chunk.page_number, chunk.text))
|
||||
chunk_rows.append((row[0], chunk.page_number, cleaned_text))
|
||||
conn.commit()
|
||||
|
||||
# Step 3: Embed and store vectors if Ollama is configured (BYOK gate)
|
||||
# Embedding failure is non-fatal: document remains BM25-searchable.
|
||||
ollama_url = os.environ.get("PAGEPIPER_OLLAMA_URL", "").strip()
|
||||
if ollama_url and chunks:
|
||||
try:
|
||||
logger.info("Embedding %d pages via Ollama at %s", len(chunks), ollama_url)
|
||||
from circuitforge_core.llm import LLMRouter
|
||||
from circuitforge_core.vector.sqlite_vec import LocalSQLiteVecStore
|
||||
|
|
@ -97,8 +104,9 @@ def run(doc_id: str, file_path: str, db_path: str, vec_db_path: str) -> None:
|
|||
}
|
||||
},
|
||||
})
|
||||
embed_dims = int(os.environ.get("PAGEPIPER_EMBED_DIMS", "1024"))
|
||||
vec_store = LocalSQLiteVecStore(
|
||||
db_path=vec_db_path, table="page_vecs", dimensions=768
|
||||
db_path=vec_db_path, table="page_vecs", dimensions=embed_dims
|
||||
)
|
||||
# Remove old vectors before re-inserting. If embedding fails mid-way,
|
||||
# old vectors are gone but new ones are partial — re-ingest recovers.
|
||||
|
|
@ -111,11 +119,16 @@ def run(doc_id: str, file_path: str, db_path: str, vec_db_path: str) -> None:
|
|||
|
||||
for (chunk_id, page_number, _), vector in zip(chunk_rows, vectors):
|
||||
vec_store.upsert(
|
||||
id=chunk_id,
|
||||
entry_id=chunk_id,
|
||||
vector=vector,
|
||||
metadata={"doc_id": doc_id, "page_number": page_number},
|
||||
)
|
||||
logger.info("Stored %d embeddings", len(vectors))
|
||||
except Exception as embed_exc:
|
||||
logger.warning(
|
||||
"Embedding skipped for doc %s — BM25 only (reason: %s)",
|
||||
doc_id, embed_exc,
|
||||
)
|
||||
|
||||
_update_status(conn, doc_id, "ready", page_count=len(chunks))
|
||||
logger.info("Ingest complete for doc %s (%d pages)", doc_id, len(chunks))
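Both ingest scripts embed in fixed-size batches and then pair each vector with its chunk row. The batching step can be sketched as below; `fake_embed` is a hypothetical stand-in for `router.embed`, used here only to record the batch sizes.

```python
from __future__ import annotations

from typing import Callable

Embedder = Callable[[list[str]], list[list[float]]]


def embed_in_batches(
    texts: list[str], embed: Embedder, batch_size: int = 64
) -> list[list[float]]:
    """Embed texts in slices of batch_size, preserving input order."""
    vectors: list[list[float]] = []
    for i in range(0, len(texts), batch_size):
        vectors.extend(embed(texts[i : i + batch_size]))
    return vectors


calls: list[int] = []


def fake_embed(batch: list[str]) -> list[list[float]]:
    """Stand-in embedder: one 1-dimensional vector per input text."""
    calls.append(len(batch))
    return [[float(len(t))] for t in batch]


vectors = embed_in_batches([f"page {n}" for n in range(150)], fake_embed)
```

Order preservation matters because the vectors are later matched to chunk rows with `zip`, which stops silently at the shorter sequence if an embedder returns fewer vectors than it was given.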
|
||||
|
|
|
|||
72
scripts/text_clean.py
Normal file
|
|
@ -0,0 +1,72 @@
|
|||
# scripts/text_clean.py
|
||||
"""
|
||||
Shared text-cleaning utilities for ingest pipelines.
|
||||
|
||||
Removes boilerplate lines injected by ebook converters, piracy watermarks,
|
||||
and other non-content artifacts before chunks are stored or embedded.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import re
|
||||
|
||||
# Lines that match any of these patterns are dropped entirely.
|
||||
# Patterns are applied with .search() to the stripped line, so unanchored ones
# match anywhere in it; those compiled with re.IGNORECASE ignore case.
|
||||
_LINE_DROP_PATTERNS: list[re.Pattern] = [
|
||||
# ABC Amber converter family
|
||||
re.compile(r'generated by abc amber', re.IGNORECASE),
|
||||
re.compile(r'processtext\.com', re.IGNORECASE),
|
||||
# Calibre / sigil metadata lines
|
||||
re.compile(r'calibre \d+\.\d+', re.IGNORECASE),
|
||||
# Standalone URLs (line is just a URL, no surrounding prose)
|
||||
re.compile(r'^https?://\S+$'),
|
||||
# Common piracy / file-sharing watermarks
|
||||
re.compile(r'www\.\w+\.(com|net|org)/\S*book', re.IGNORECASE),
|
||||
re.compile(r'downloaded from', re.IGNORECASE),
|
||||
re.compile(r'scanned by', re.IGNORECASE),
|
||||
re.compile(r'provided by', re.IGNORECASE),
|
||||
# Page-number-only lines from PDF extraction (e.g. "- 42 -" or "42")
|
||||
re.compile(r'^\s*-?\s*\d{1,4}\s*-?\s*$'),
|
||||
]
|
||||
|
||||
# Inline substrings to strip from within a line before further processing.
|
||||
_INLINE_STRIP_PATTERNS: list[re.Pattern] = [
|
||||
re.compile(r'generated by abc amber \w+ converter,?\s*https?://\S*', re.IGNORECASE),
|
||||
re.compile(r'https?://www\.processtext\.com/\S*', re.IGNORECASE),
|
||||
]
|
||||
|
||||
|
||||
def is_artifact_line(line: str) -> bool:
|
||||
"""Return True if the line is a known conversion artifact and should be dropped."""
|
||||
stripped = line.strip()
|
||||
return any(p.search(stripped) for p in _LINE_DROP_PATTERNS)
|
||||
|
||||
|
||||
def clean_line(line: str) -> str:
|
||||
"""Strip inline converter artifacts from a line, returning the cleaned version."""
|
||||
for p in _INLINE_STRIP_PATTERNS:
|
||||
line = p.sub("", line)
|
||||
return line.strip()
|
||||
|
||||
|
||||
def clean_paragraph(text: str) -> str:
|
||||
"""Clean a multi-line paragraph: drop artifact lines, strip inline artifacts."""
|
||||
lines = []
|
||||
for line in text.splitlines():
|
||||
if is_artifact_line(line):
|
||||
continue
|
||||
cleaned = clean_line(line)
|
||||
if cleaned:
|
||||
lines.append(cleaned)
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
def filter_paragraphs(paragraphs: list[str]) -> list[str]:
    """Remove artifact lines and fragments shorter than four words from a list of paragraph strings."""
|
||||
result = []
|
||||
for para in paragraphs:
|
||||
if is_artifact_line(para):
|
||||
continue
|
||||
cleaned = clean_line(para)
|
||||
if cleaned and len(cleaned.split()) >= 4:
|
||||
result.append(cleaned)
|
||||
return result
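The order in which the two pattern lists are applied changes the result. The sketch below uses a subset of the patterns above, copied verbatim, and a hypothetical `clean` helper: `strip_first=False` follows the `clean_paragraph` order (drop check, then inline strip), and `strip_first=True` follows the order used in the EPUB heading path (inline strip, then drop check).

```python
import re

DROP = [
    re.compile(r'processtext\.com', re.IGNORECASE),
    re.compile(r'provided by', re.IGNORECASE),
    re.compile(r'^\s*-?\s*\d{1,4}\s*-?\s*$'),
]
STRIP = [re.compile(r'https?://www\.processtext\.com/\S*', re.IGNORECASE)]


def is_artifact(line: str) -> bool:
    return any(p.search(line) for p in DROP)


def strip_inline(line: str) -> str:
    for p in STRIP:
        line = p.sub("", line)
    return line.strip()


def clean(text: str, strip_first: bool) -> str:
    """Drop artifact lines; optionally strip inline artifacts before the drop check."""
    kept = []
    for line in text.splitlines():
        candidate = line.strip()
        if strip_first:
            candidate = strip_inline(candidate)
        if not candidate or is_artifact(candidate):
            continue
        candidate = strip_inline(candidate)
        if candidate:
            kept.append(candidate)
    return "\n".join(kept)


sample = "\n".join([
    "Chapter One",
    "- 42 -",
    "The ship sailed at dawn. http://www.processtext.com/abclit.html",
    "Breakfast was provided by the innkeeper.",
])
drop_first = clean(sample, strip_first=False)
strip_then_drop = clean(sample, strip_first=True)
```

With the drop check first, the sentence carrying a trailing watermark URL is lost along with the URL. In both orders the unanchored `provided by` pattern also removes an ordinary sentence, so the broad text patterns can produce false positives on prose.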
|
||||
|
|
@ -30,8 +30,10 @@ def client(test_db, tmp_path, monkeypatch):
|
|||
from app.main import app, _bm25
|
||||
from app.deps import get_db
|
||||
|
||||
# Suppress migrations during tests — test_db fixture already applies the schema
|
||||
# Suppress startup side effects — test_db fixture already applies the schema,
|
||||
# and vec schema validation is tested separately in test_startup.py
|
||||
monkeypatch.setattr(_main_module, "_apply_migrations", lambda: None)
|
||||
monkeypatch.setattr(_main_module, "_check_vec_schema", lambda *a, **kw: None)
|
||||
|
||||
def override_db():
|
||||
conn = sqlite3.connect(test_db)
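The body of `_check_vec_schema` is not part of this diff. Based only on the tests, which build a `page_vecs_vecs` table declared as `float[N]` and mention that the function reads `sqlite_master`, the dimension lookup can be sketched as follows. The function name and regex are assumptions made for illustration.

```python
from __future__ import annotations

import re
import sqlite3

_DIM_RE = re.compile(r"float\[(\d+)\]")


def stored_dimensions(
    conn: sqlite3.Connection, table: str = "page_vecs_vecs"
) -> int | None:
    """Read the embedding dimension from the table's DDL, or None if absent."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = ?", [table]
    ).fetchone()
    if row is None or row[0] is None:
        return None
    match = _DIM_RE.search(row[0])
    return int(match.group(1)) if match else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_vecs_vecs (embedding float[768])")
dims = stored_dimensions(conn)
```

Comparing `dims` with the configured dimension is enough to decide whether the vector database is stale. A `None` result covers a database that has no vector table yet, which the tests treat as a no-op.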
|
||||
|
|
|
|||
170
tests/test_startup.py
Normal file
|
|
@ -0,0 +1,170 @@
|
|||
# tests/test_startup.py
|
||||
"""Tests for startup vec DB schema validation (_check_vec_schema)."""
|
||||
from __future__ import annotations
|
||||
|
||||
import os
|
||||
import sqlite3
|
||||
import threading
|
||||
from unittest.mock import MagicMock, patch
|
||||
|
||||
import pytest
|
||||
|
||||
from app.main import _check_vec_schema, _reembed_docs
|
||||
|
||||
|
||||
def _make_vec_db(path: str, dims: int) -> None:
|
||||
"""Create a minimal sqlite-vec-style DB with the given dimension."""
|
||||
conn = sqlite3.connect(path)
|
||||
conn.execute("PRAGMA journal_mode=WAL")
|
||||
# Replicate the virtual table name used by LocalSQLiteVecStore
|
||||
conn.execute(f"CREATE TABLE page_vecs_vecs (embedding float[{dims}])")
|
||||
    conn.close()
    # sqlite_master is read-only, so the virtual-table DDL cannot be inserted into it
    # directly. Drop the placeholder table and store the DDL string in a hint table.
|
||||
conn2 = sqlite3.connect(path)
|
||||
conn2.execute("DROP TABLE IF EXISTS page_vecs_vecs")
|
||||
# Write a row that _check_vec_schema will parse via its regex
|
||||
conn2.execute(
|
||||
"CREATE TABLE _schema_hint (sql TEXT)"
|
||||
)
|
||||
conn2.execute(
|
||||
"INSERT INTO _schema_hint VALUES (?)",
|
||||
[f"CREATE VIRTUAL TABLE page_vecs_vecs USING vec0(embedding float[{dims}])"],
|
||||
)
|
||||
conn2.commit()
|
||||
conn2.close()
|
||||
|
||||
|
||||
def _make_real_vec_db(path: str, dims: int) -> None:
|
||||
"""Create a vec DB whose sqlite_master actually contains the dimension DDL."""
|
||||
import sqlite3 as _sq
|
||||
    # sqlite-vec is not loaded in tests, so stand in for the virtual table with a
    # regular table of the same name; its DDL, including the dimension, is recorded
    # in sqlite_master, which _check_vec_schema reads.
|
||||
conn = _sq.connect(path)
|
||||
conn.execute(
|
||||
f"""CREATE TABLE page_vecs_vecs (
|
||||
embedding float[{dims}]
|
||||
)"""
|
||||
)
|
||||
conn.commit()
|
||||
conn.close()
|
||||
|
||||
|
||||
class TestCheckVecSchema:
|
||||
def test_no_file_is_noop(self, tmp_path):
|
||||
"""Missing vec DB should not raise."""
|
||||
_check_vec_schema(str(tmp_path / "missing.db"), 1024, str(tmp_path / "main.db"))
|
||||
|
||||
def test_matching_dims_keeps_file(self, tmp_path):
|
||||
"""Correct dimensions: vec DB must not be deleted."""
|
||||
vec_path = str(tmp_path / "vecs.db")
|
||||
conn = sqlite3.connect(vec_path)
|
||||
conn.execute("CREATE TABLE page_vecs_vecs (embedding float[1024])")
|
||||
conn.commit()
|
||||
conn.close()
|
||||
|
||||
_check_vec_schema(vec_path, 1024, str(tmp_path / "main.db"))
|
||||
|
||||
assert os.path.exists(vec_path), "Vec DB should not be deleted when dims match"
|
||||
|
||||
def test_mismatched_dims_deletes_file(self, tmp_path):
|
||||
"""Dimension mismatch: vec DB must be deleted."""
|
||||
vec_path = str(tmp_path / "vecs.db")
|
||||
conn = sqlite3.connect(vec_path)
|
||||
conn.execute("CREATE TABLE page_vecs_vecs (embedding float[768])")
|
||||
conn.commit()
|
||||
conn.close()
|
||||
|
||||
db_path = str(tmp_path / "main.db")
|
||||
_check_vec_schema(vec_path, 1024, db_path)
|
||||
|
||||
assert not os.path.exists(vec_path), "Vec DB should be deleted on dimension mismatch"
|
||||
|
||||
def test_mismatched_dims_queues_reembed(self, tmp_path):
|
||||
"""Dimension mismatch: re-embed thread must be started for ready docs."""
|
||||
vec_path = str(tmp_path / "vecs.db")
|
||||
conn = sqlite3.connect(vec_path)
|
||||
conn.execute("CREATE TABLE page_vecs_vecs (embedding float[768])")
|
||||
conn.commit()
|
||||
conn.close()
|
||||
|
||||
db_path = str(tmp_path / "main.db")
|
||||
schema = (
|
||||
"CREATE TABLE documents ("
|
||||
"id TEXT PRIMARY KEY, title TEXT, file_path TEXT, "
|
||||
"status TEXT, task_id TEXT, page_count INTEGER, "
|
||||
"error_msg TEXT, created_at TEXT, updated_at TEXT)"
|
||||
)
|
||||
main_conn = sqlite3.connect(db_path)
|
||||
main_conn.execute(schema)
|
||||
main_conn.execute(
|
||||
"INSERT INTO documents VALUES ('abc123', 'Book', '/tmp/book.pdf', 'ready', NULL, 10, NULL, '2026-01-01', '2026-01-01')"
|
||||
)
|
||||
main_conn.commit()
|
||||
main_conn.close()
|
||||
|
||||
started = []
|
||||
real_thread_start = threading.Thread.start
|
||||
|
||||
def _capture_start(self):
|
||||
started.append(self)
|
||||
# Don't actually run the re-embed to keep tests fast
|
||||
self.run = lambda: None
|
||||
real_thread_start(self)
|
||||
|
||||
with patch.object(threading.Thread, "start", _capture_start):
|
||||
_check_vec_schema(vec_path, 1024, db_path)
|
||||
|
||||
assert len(started) == 1, "Exactly one re-embed thread should be started"
|
||||
assert started[0].name == "pagepiper-reembed"
|
||||
|
||||
def test_no_ready_docs_skips_thread(self, tmp_path):
|
||||
"""Mismatch with no ready docs: no thread should be started."""
|
||||
vec_path = str(tmp_path / "vecs.db")
|
||||
conn = sqlite3.connect(vec_path)
|
||||
conn.execute("CREATE TABLE page_vecs_vecs (embedding float[768])")
|
||||
conn.commit()
|
||||
conn.close()
|
||||
|
||||
db_path = str(tmp_path / "main.db")
|
||||
schema = (
|
||||
"CREATE TABLE documents ("
|
||||
"id TEXT PRIMARY KEY, title TEXT, file_path TEXT, "
|
||||
"status TEXT, task_id TEXT, page_count INTEGER, "
|
||||
"error_msg TEXT, created_at TEXT, updated_at TEXT)"
|
||||
)
|
||||
main_conn = sqlite3.connect(db_path)
|
||||
main_conn.execute(schema)
|
||||
main_conn.commit()
|
||||
main_conn.close()
|
||||
|
||||
started = []
|
||||
with patch.object(threading.Thread, "start", lambda self: started.append(self)):
|
||||
_check_vec_schema(vec_path, 1024, db_path)
|
||||
|
||||
assert len(started) == 0
|
||||
|
||||
def test_empty_db_no_table_is_noop(self, tmp_path):
|
||||
"""Vec DB exists but has no page_vecs_vecs table yet: no deletion."""
|
||||
vec_path = str(tmp_path / "vecs.db")
|
||||
sqlite3.connect(vec_path).close() # create empty file
|
||||
|
||||
_check_vec_schema(vec_path, 1024, str(tmp_path / "main.db"))
|
||||
|
||||
assert os.path.exists(vec_path)
|
||||
|
||||
def test_corrupt_db_does_not_raise(self, tmp_path):
|
||||
"""Corrupt or unreadable vec DB must not propagate exceptions."""
|
||||
vec_path = str(tmp_path / "vecs.db")
|
||||
with open(vec_path, "w") as f:
|
||||
f.write("not a sqlite database")
|
||||
|
||||
_check_vec_schema(vec_path, 1024, str(tmp_path / "main.db"))
|
||||
# No assertion needed — just must not raise
|
||||
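The tests above exercise `_check_vec_schema()` without showing how the stored dimension is read. One way this could work, as a minimal sketch, is to pull the `CREATE TABLE` text for `page_vecs_vecs` out of `sqlite_master` and parse the `float[N]` width from it. The helper name `stored_embedding_dims` and the regex are assumptions made for illustration; the real function in this commit may detect the mismatch differently.

```python
import os
import re
import sqlite3
from typing import Optional

# Hypothetical pattern: matches the "float[768]" style column type used in the tests.
_DIM_RE = re.compile(r"float\[(\d+)\]", re.IGNORECASE)


def stored_embedding_dims(vec_path: str) -> Optional[int]:
    """Return the embedding width declared for page_vecs_vecs, or None if unknown.

    None covers every "nothing to compare" case the tests describe:
    missing file, empty DB without the table, and corrupt or unreadable files.
    """
    if not os.path.exists(vec_path):
        return None
    try:
        conn = sqlite3.connect(vec_path)
        try:
            row = conn.execute(
                "SELECT sql FROM sqlite_master WHERE name = 'page_vecs_vecs'"
            ).fetchone()
        finally:
            conn.close()
    except sqlite3.DatabaseError:
        # Corrupt or non-SQLite file: treat as unknown rather than raising.
        return None
    if row is None or row[0] is None:
        return None
    match = _DIM_RE.search(row[0])
    return int(match.group(1)) if match else None
```

A caller could then compare the result against the configured dimension and only delete the vec DB and queue a re-embed when the value is not `None` and differs, which matches the no-op behaviour the tests expect for missing, empty, and corrupt files.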
108
tests/test_text_clean.py
Normal file
|
|
@ -0,0 +1,108 @@
|
|||
# tests/test_text_clean.py
|
||||
"""Tests for ebook artifact filtering in scripts/text_clean.py."""
|
||||
from __future__ import annotations
|
||||
|
||||
import pytest
|
||||
|
||||
from scripts.text_clean import (
|
||||
clean_line,
|
||||
clean_paragraph,
|
||||
filter_paragraphs,
|
||||
is_artifact_line,
|
||||
)
|
||||
|
||||
|
||||
class TestIsArtifactLine:
|
||||
def test_abc_amber_lit(self):
|
||||
assert is_artifact_line(
|
||||
"Generated by ABC Amber LIT Converter, http://www.processtext.com/abclit.html"
|
||||
)
|
||||
|
||||
def test_abc_amber_rtf(self):
|
||||
assert is_artifact_line("Generated by ABC Amber RTF Converter")
|
||||
|
||||
def test_processtext_url_only(self):
|
||||
assert is_artifact_line("http://www.processtext.com/abclit.html")
|
||||
|
||||
def test_standalone_url(self):
|
||||
assert is_artifact_line("https://www.example.com/book")
|
||||
|
||||
def test_page_number_only(self):
|
||||
assert is_artifact_line("42")
|
||||
assert is_artifact_line("- 42 -")
|
||||
assert is_artifact_line(" 7 ")
|
||||
|
||||
def test_downloaded_from(self):
|
||||
assert is_artifact_line("Downloaded from www.fictionsite.net")
|
||||
|
||||
def test_scanned_by(self):
|
||||
assert is_artifact_line("Scanned by SomeUser")
|
||||
|
||||
def test_normal_prose_not_artifact(self):
|
||||
assert not is_artifact_line(
|
||||
'"And what if food isn\'t the only reason Jagang is going to Anderith?"'
|
||||
)
|
||||
|
||||
def test_url_embedded_in_prose_not_dropped(self):
|
||||
# A URL inside a sentence is not a standalone-URL artifact line
|
||||
assert not is_artifact_line(
|
||||
"You can read more about this at https://example.com and continue."
|
||||
)
|
||||
|
||||
def test_short_page_header_not_dropped(self):
|
||||
# "Chapter 1" is not an artifact — 4-digit number check only drops bare numbers
|
||||
assert not is_artifact_line("Chapter 1")
|
||||
|
||||
|
||||
class TestCleanLine:
|
||||
def test_strips_inline_abc_amber(self):
|
||||
line = "Some prose. Generated by ABC Amber LIT Converter, http://www.processtext.com/abclit.html"
|
||||
result = clean_line(line)
|
||||
assert "ABC Amber" not in result
|
||||
assert "processtext" not in result
|
||||
assert "Some prose." in result
|
||||
|
||||
def test_passes_clean_line_unchanged(self):
|
||||
line = "He cocked an eyebrow and smiled."
|
||||
assert clean_line(line) == line
|
||||
|
||||
|
||||
class TestCleanParagraph:
|
||||
def test_drops_artifact_lines_from_paragraph(self):
|
||||
text = (
|
||||
"Generated by ABC Amber LIT Converter, http://www.processtext.com/abclit.html\n"
|
||||
'"And what if food isn\'t the only reason Jagang is going to Anderith?"\n'
|
||||
"He cocked an eyebrow."
|
||||
)
|
||||
result = clean_paragraph(text)
|
||||
assert "ABC Amber" not in result
|
||||
assert "Jagang" in result
|
||||
assert "eyebrow" in result
|
||||
|
||||
def test_all_artifact_paragraph_returns_empty(self):
|
||||
text = "Generated by ABC Amber LIT Converter\nhttp://www.processtext.com/abclit.html"
|
||||
assert clean_paragraph(text) == ""
|
||||
|
||||
def test_clean_paragraph_unchanged(self):
|
||||
text = "Richard raised his sword.\nThe magic surged through him."
|
||||
assert clean_paragraph(text) == text
|
||||
|
||||
|
||||
class TestFilterParagraphs:
|
||||
def test_drops_artifact_paragraphs(self):
|
||||
paras = [
|
||||
"Generated by ABC Amber LIT Converter, http://www.processtext.com/abclit.html",
|
||||
'"And what if food isn\'t the only reason Jagang is going to Anderith?"',
|
||||
"He cocked an eyebrow at the question.",
|
||||
]
|
||||
result = filter_paragraphs(paras)
|
||||
assert len(result) == 2
|
||||
assert all("ABC Amber" not in p for p in result)
|
||||
|
||||
def test_drops_short_lines_under_4_words(self):
|
||||
paras = ["Hi", "OK sure", "Valid sentence with enough words here."]
|
||||
result = filter_paragraphs(paras)
|
||||
assert result == ["Valid sentence with enough words here."]
|
||||
|
||||
def test_empty_input(self):
|
||||
assert filter_paragraphs([]) == []
|
||||
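The `TestIsArtifactLine` cases above pin down the expected behaviour fairly tightly, so a pattern-based check along the following lines would satisfy them. This is only a sketch: the function name `looks_like_artifact`, the pattern list, and the choice to treat blank lines as non-artifacts are all assumptions, not the contents of `scripts/text_clean.py`.

```python
import re

# Hypothetical patterns derived from the test cases, not from text_clean.py itself.
_ARTIFACT_PATTERNS = [
    re.compile(r"ABC Amber \w+ Converter", re.IGNORECASE),   # converter watermark
    re.compile(r"processtext\.com", re.IGNORECASE),          # watermark URL
    re.compile(r"^https?://\S+$", re.IGNORECASE),            # line is nothing but a URL
    re.compile(r"^-?\s*\d{1,4}\s*-?$"),                      # bare page number, e.g. "- 42 -"
    re.compile(r"^(downloaded from|scanned by)\b", re.IGNORECASE),  # piracy stamps
]


def looks_like_artifact(line: str) -> bool:
    """Return True when the whole line appears to be an ebook conversion artifact."""
    stripped = line.strip()
    if not stripped:
        # Assumption: blank lines are left for the caller to handle.
        return False
    return any(p.search(stripped) for p in _ARTIFACT_PATTERNS)
```

Anchoring the URL and page-number patterns with `^` and `$` is what keeps a URL inside a sentence, or a heading such as "Chapter 1", from being dropped.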
|
|
@ -6,11 +6,13 @@
|
|||
<RouterLink to="/chat" class="nav-link">Chat</RouterLink>
|
||||
</nav>
|
||||
<RouterView />
|
||||
<FeedbackButton />
|
||||
</div>
|
||||
</template>
|
||||
|
||||
<script setup lang="ts">
|
||||
import { RouterLink, RouterView } from "vue-router"
|
||||
import FeedbackButton from "@/components/FeedbackButton.vue"
|
||||
</script>
|
||||
|
||||
<style>
|
||||
|
|
|
|||
|
|
@ -37,6 +37,15 @@ export interface TaskStatus {
|
|||
error?: string
|
||||
}
|
||||
|
||||
export interface DocumentStatus {
|
||||
id: string
|
||||
status: "pending" | "processing" | "ready" | "error"
|
||||
task_id: string | null
|
||||
page_count: number | null
|
||||
vec_count: number
|
||||
error_msg: string | null
|
||||
}
|
||||
|
||||
export interface ChatMessage {
|
||||
role: string
|
||||
content: string
|
||||
|
|
@ -62,11 +71,23 @@ export const api = {
|
|||
const r = await fetch(`${BASE}/api/library/${docId}`, { method: "DELETE" })
|
||||
if (!r.ok) throw new Error(await r.text())
|
||||
},
|
||||
async uploadDocument(file: File): Promise<{ doc_id: string; task_id: string | null; filename: string; status: string }> {
|
||||
const form = new FormData()
|
||||
form.append("file", file)
|
||||
const r = await fetch(`${BASE}/api/library/upload`, { method: "POST", body: form })
|
||||
if (!r.ok) throw new Error(await r.text())
|
||||
return r.json()
|
||||
},
|
||||
async getTaskStatus(taskId: string): Promise<TaskStatus> {
|
||||
const r = await fetch(`${BASE}/api/ingest/${taskId}`)
|
||||
if (!r.ok) throw new Error(await r.text())
|
||||
return r.json()
|
||||
},
|
||||
async getDocumentStatus(docId: string): Promise<DocumentStatus> {
|
||||
const r = await fetch(`${BASE}/api/library/${docId}/status`)
|
||||
if (!r.ok) throw new Error(await r.text())
|
||||
return r.json()
|
||||
},
|
||||
async search(query: string, topK = 10, docIds?: string[]): Promise<SearchResult[]> {
|
||||
const r = await fetch(`${BASE}/api/search`, {
|
||||
method: "POST",
|
||||
|
|
@ -98,4 +119,21 @@ export const api = {
|
|||
}
|
||||
return r.json()
|
||||
},
|
||||
async chatFeedbackStatus(): Promise<{ enabled: boolean }> {
|
||||
const r = await fetch(`${BASE}/api/chat/feedback/status`)
|
||||
if (!r.ok) return { enabled: false }
|
||||
return r.json()
|
||||
},
|
||||
async submitChatFeedback(
|
||||
rating: 1 | -1,
|
||||
question: string,
|
||||
answer: string,
|
||||
docIds: string[],
|
||||
): Promise<void> {
|
||||
await fetch(`${BASE}/api/chat/feedback`, {
|
||||
method: "POST",
|
||||
headers: { "Content-Type": "application/json" },
|
||||
body: JSON.stringify({ rating, question, answer, doc_ids: docIds }),
|
||||
})
|
||||
},
|
||||
}
|
||||
|
|
|
|||
|
|
@ -1,18 +1,24 @@
|
|||
<template>
|
||||
<div class="doc-card" :class="`status-${doc.status}`">
|
||||
<div class="doc-status-badge">{{ doc.status }}</div>
|
||||
<div class="doc-card" :class="`status-${currentStatus}`">
|
||||
<div class="doc-status-badge" :class="`badge-${currentStatus}`">{{ currentStatus }}</div>
|
||||
<div class="doc-title">{{ doc.title }}</div>
|
||||
<div class="doc-meta" v-if="doc.page_count != null">{{ doc.page_count }} pages</div>
|
||||
<div class="doc-meta" v-if="displayPageCount != null">{{ displayPageCount }} pages</div>
|
||||
<div class="doc-meta path">{{ shortPath }}</div>
|
||||
|
||||
<IngestProgress
|
||||
v-if="doc.status === 'processing' && doc.task_id"
|
||||
:task-id="doc.task_id"
|
||||
@done="emit('refresh')"
|
||||
/>
|
||||
<div class="ingest-progress" v-if="isProcessing">
|
||||
<div class="progress-label">
|
||||
<span>{{ progressLabel }}</span>
|
||||
<span class="progress-pct" v-if="progressPct != null">{{ progressPct }}%</span>
|
||||
</div>
|
||||
<div class="progress-bar">
|
||||
<div class="progress-fill" :class="{ indeterminate: progressPct == null }" :style="progressPct != null ? { width: `${progressPct}%` } : {}" />
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<p class="doc-error" v-if="currentStatus === 'error'">{{ errorMsg ?? 'Indexing failed.' }}</p>
|
||||
|
||||
<div class="doc-actions">
|
||||
<button class="btn-sm" @click="emit('reingest', doc.id)" :disabled="doc.status === 'processing'">
|
||||
<button class="btn-sm" @click="emit('reingest', doc.id)" :disabled="isProcessing">
|
||||
Re-index
|
||||
</button>
|
||||
<button class="btn-sm danger" @click="emit('delete', doc.id)">Remove</button>
|
||||
|
|
@ -21,9 +27,9 @@
|
|||
</template>
|
||||
|
||||
<script setup lang="ts">
|
||||
import { computed } from "vue"
|
||||
import { computed, onMounted, onUnmounted, ref } from "vue"
|
||||
import type { Document } from "@/api"
|
||||
import IngestProgress from "@/components/IngestProgress.vue"
|
||||
import { api } from "@/api"
|
||||
|
||||
const props = defineProps<{ doc: Document }>()
|
||||
const emit = defineEmits<{ reingest: [id: string]; delete: [id: string]; refresh: [] }>()
|
||||
|
|
@ -32,6 +38,54 @@ const shortPath = computed(() => {
|
|||
const parts = props.doc.file_path.split("/")
|
||||
return parts.slice(-2).join("/")
|
||||
})
|
||||
|
||||
// Live-updating fields polled from /api/library/{id}/status
|
||||
const currentStatus = ref(props.doc.status)
|
||||
const displayPageCount = ref(props.doc.page_count)
|
||||
const vecCount = ref(0)
|
||||
const errorMsg = ref<string | null>(null)
|
||||
|
||||
const isProcessing = computed(() => currentStatus.value === "processing")
|
||||
|
||||
const progressLabel = computed(() => {
|
||||
if (displayPageCount.value == null || vecCount.value === 0) return "Extracting text…"
|
||||
return `Embedding ${vecCount.value} / ${displayPageCount.value} pages`
|
||||
})
|
||||
|
||||
const progressPct = computed((): number | null => {
|
||||
if (displayPageCount.value == null || displayPageCount.value === 0) return null
|
||||
if (vecCount.value === 0) return null
|
||||
return Math.min(Math.round((vecCount.value / displayPageCount.value) * 100), 99)
|
||||
})
|
||||
|
||||
let timer: ReturnType<typeof setInterval> | null = null
|
||||
|
||||
async function pollStatus() {
|
||||
try {
|
||||
const s = await api.getDocumentStatus(props.doc.id)
|
||||
currentStatus.value = s.status
|
||||
displayPageCount.value = s.page_count
|
||||
vecCount.value = s.vec_count
|
||||
errorMsg.value = s.error_msg
|
||||
if (s.status !== "processing") {
|
||||
stopPoll()
|
||||
if (s.status === "ready") emit("refresh")
|
||||
}
|
||||
} catch (_e: unknown) { /* non-fatal — keep polling */ }
|
||||
}
|
||||
|
||||
function stopPoll() {
|
||||
if (timer) { clearInterval(timer); timer = null }
|
||||
}
|
||||
|
||||
onMounted(() => {
|
||||
if (props.doc.status === "processing") {
|
||||
pollStatus()
|
||||
timer = setInterval(pollStatus, 3000)
|
||||
}
|
||||
})
|
||||
|
||||
onUnmounted(stopPoll)
|
||||
</script>
|
||||
|
||||
<style scoped>
|
||||
|
|
@ -48,6 +102,7 @@ const shortPath = computed(() => {
|
|||
}
|
||||
.doc-card.status-error { border-color: var(--color-error); }
|
||||
.doc-card.status-ready { border-color: var(--color-success); }
|
||||
.doc-card.status-processing { border-color: var(--color-accent); }
|
||||
.doc-title { font-weight: 600; font-size: 1rem; }
|
||||
.doc-meta { font-size: 0.8rem; color: var(--color-text-muted); }
|
||||
.doc-meta.path { font-family: var(--font-mono); word-break: break-all; }
|
||||
|
|
@ -57,6 +112,9 @@ const shortPath = computed(() => {
|
|||
padding: 2px 6px; border-radius: var(--radius-sm);
|
||||
background: var(--color-surface-alt);
|
||||
}
|
||||
.badge-processing { background: var(--color-accent); color: #fff; }
|
||||
.badge-ready { background: var(--color-success); color: #fff; }
|
||||
.badge-error { background: var(--color-error); color: #fff; }
|
||||
.doc-actions { display: flex; gap: 0.5rem; margin-top: 0.5rem; }
|
||||
.btn-sm {
|
||||
padding: 4px 10px; border: 1px solid var(--color-border); border-radius: var(--radius-sm);
|
||||
|
|
@ -65,4 +123,23 @@ const shortPath = computed(() => {
|
|||
.btn-sm:hover { border-color: var(--color-accent); }
|
||||
.btn-sm.danger:hover { border-color: var(--color-error); color: var(--color-error); }
|
||||
.btn-sm:disabled { opacity: 0.4; cursor: default; }
|
||||
.doc-error { color: var(--color-error); font-size: 0.8rem; }
|
||||
|
||||
/* Progress bar */
|
||||
.ingest-progress { margin-top: 0.25rem; }
|
||||
.progress-label {
|
||||
display: flex; justify-content: space-between;
|
||||
font-size: 0.78rem; color: var(--color-text-muted); margin-bottom: 4px;
|
||||
}
|
||||
.progress-pct { font-variant-numeric: tabular-nums; }
|
||||
.progress-bar { height: 4px; background: var(--color-border); border-radius: 2px; overflow: hidden; }
|
||||
.progress-fill { height: 100%; background: var(--color-accent); transition: width 0.4s ease; }
|
||||
.progress-fill.indeterminate {
|
||||
width: 40%;
|
||||
animation: slide 1.4s ease-in-out infinite;
|
||||
}
|
||||
@keyframes slide {
|
||||
0% { transform: translateX(-100%); }
|
||||
100% { transform: translateX(300%); }
|
||||
}
|
||||
</style>
|
||||
|
|
|
|||
631
web/src/components/FeedbackButton.vue
Normal file
|
|
@ -0,0 +1,631 @@
|
|||
<template>
|
||||
<!-- Floating trigger button -->
|
||||
<button
|
||||
v-if="enabled"
|
||||
class="feedback-fab"
|
||||
@click="open = true"
|
||||
aria-label="Send feedback or report a bug"
|
||||
title="Send feedback or report a bug"
|
||||
>
|
||||
<svg class="feedback-fab-icon" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="1.8" stroke-linecap="round" stroke-linejoin="round">
|
||||
<path d="M21 15a2 2 0 01-2 2H7l-4 4V5a2 2 0 012-2h14a2 2 0 012 2z"/>
|
||||
</svg>
|
||||
<span class="feedback-fab-label">Feedback</span>
|
||||
</button>
|
||||
|
||||
<!-- Modal — teleported to body to avoid z-index / overflow clipping -->
|
||||
<Teleport to="body">
|
||||
<Transition name="modal-fade">
|
||||
<div v-if="open" class="feedback-overlay" @click.self="close">
|
||||
<div class="feedback-modal" role="dialog" aria-modal="true" aria-label="Send Feedback">
|
||||
|
||||
<!-- Header -->
|
||||
<div class="feedback-header">
|
||||
<h2 class="feedback-title">{{ step === 1 ? "What's on your mind?" : "Review & submit" }}</h2>
|
||||
<button class="feedback-close" @click="close" aria-label="Close">
|
||||
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" width="18" height="18">
|
||||
<line x1="18" y1="6" x2="6" y2="18"/><line x1="6" y1="6" x2="18" y2="18"/>
|
||||
</svg>
|
||||
</button>
|
||||
</div>
|
||||
|
||||
<!-- ── Step 1: Form ─────────────────────────────────────────── -->
|
||||
<div v-if="step === 1" class="feedback-body">
|
||||
<div class="form-group">
|
||||
<label class="form-label">Type</label>
|
||||
<div class="filter-chip-row">
|
||||
<button
|
||||
v-for="t in types"
|
||||
:key="t.value"
|
||||
:class="['btn-chip', { active: form.type === t.value }]"
|
||||
@click="form.type = t.value"
|
||||
type="button"
|
||||
>{{ t.label }}</button>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="form-group">
|
||||
<label class="form-label">Title <span class="form-required">*</span></label>
|
||||
<input
|
||||
v-model="form.title"
|
||||
class="form-input"
|
||||
type="text"
|
||||
placeholder="Short summary of the issue or idea"
|
||||
maxlength="120"
|
||||
/>
|
||||
</div>
|
||||
|
||||
<div class="form-group">
|
||||
<label class="form-label">Description <span class="form-required">*</span></label>
|
||||
<textarea
|
||||
v-model="form.description"
|
||||
class="form-input feedback-textarea"
|
||||
placeholder="Describe what happened or what you'd like to see…"
|
||||
rows="4"
|
||||
/>
|
||||
</div>
|
||||
|
||||
<div v-if="form.type === 'bug'" class="form-group">
|
||||
<label class="form-label">Reproduction steps</label>
|
||||
<textarea
|
||||
v-model="form.repro"
|
||||
class="form-input feedback-textarea"
|
||||
placeholder="1. Go to… 2. Tap… 3. See error"
|
||||
rows="3"
|
||||
/>
|
||||
</div>
|
||||
|
||||
<div class="form-group">
|
||||
<label class="form-label">Screenshot <span class="text-muted text-xs">(optional, max 5 MB)</span></label>
|
||||
<input
|
||||
type="file"
|
||||
accept="image/*"
|
||||
class="form-input-file"
|
||||
@change="onScreenshotChange"
|
||||
ref="fileInput"
|
||||
/>
|
||||
<div v-if="screenshotPreview" class="screenshot-preview">
|
||||
<img :src="screenshotPreview" alt="Screenshot preview" />
|
||||
<button class="screenshot-remove btn-link" type="button" @click="clearScreenshot" aria-label="Remove screenshot">Remove</button>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<p v-if="stepError" class="feedback-error">{{ stepError }}</p>
|
||||
</div>
|
||||
|
||||
<!-- ── Step 2: Attribution + confirm ──────────────────────────── -->
|
||||
<div v-if="step === 2" class="feedback-body">
|
||||
<div class="feedback-summary card">
|
||||
<div class="feedback-summary-row">
|
||||
<span class="text-muted text-sm">Type</span>
|
||||
<span class="text-sm font-semibold">{{ typeLabel }}</span>
|
||||
</div>
|
||||
<div class="feedback-summary-row">
|
||||
<span class="text-muted text-sm">Title</span>
|
||||
<span class="text-sm">{{ form.title }}</span>
|
||||
</div>
|
||||
<div class="feedback-summary-row">
|
||||
<span class="text-muted text-sm">Description</span>
|
||||
<span class="text-sm feedback-summary-desc">{{ form.description }}</span>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="form-group mt-md">
|
||||
<label class="form-label">Attribution (optional)</label>
|
||||
<input
|
||||
v-model="form.submitter"
|
||||
class="form-input"
|
||||
type="text"
|
||||
placeholder="Your name <email@example.com>"
|
||||
/>
|
||||
<p class="text-muted text-xs mt-xs">Include your name and email in the issue if you'd like a response. Never required.</p>
|
||||
</div>
|
||||
|
||||
<p v-if="submitError" class="feedback-error">{{ submitError }}</p>
|
||||
<div v-if="submitted" class="feedback-success">
|
||||
Issue filed! <a :href="issueUrl" target="_blank" rel="noopener" class="feedback-link">View on Forgejo →</a>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- Footer nav -->
|
||||
<div class="feedback-footer">
|
||||
<button v-if="step === 2 && !submitted" class="btn btn-ghost" @click="step = 1" :disabled="loading">← Back</button>
|
||||
<button v-if="!submitted" class="btn btn-ghost" @click="close" :disabled="loading">Cancel</button>
|
||||
<button
|
||||
v-if="step === 1"
|
||||
class="btn btn-primary"
|
||||
@click="nextStep"
|
||||
>Next →</button>
|
||||
<button
|
||||
v-if="step === 2 && !submitted"
|
||||
class="btn btn-primary"
|
||||
@click="submit"
|
||||
:disabled="loading"
|
||||
>{{ loading ? 'Filing…' : 'Submit' }}</button>
|
||||
<button v-if="submitted" class="btn btn-primary" @click="close">Done</button>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</Transition>
|
||||
</Teleport>
|
||||
</template>
|
||||
|
||||
<script setup lang="ts">
|
||||
import { ref, computed, onMounted } from 'vue'
|
||||
|
||||
const props = defineProps<{ currentTab?: string }>()
|
||||
|
||||
const fileInput = ref<HTMLInputElement | null>(null)
|
||||
const screenshotB64 = ref<string | null>(null)
|
||||
const screenshotPreview = ref<string | null>(null)
|
||||
const screenshotFilename = ref('screenshot.png')
|
||||
|
||||
function onScreenshotChange(event: Event) {
  const file = (event.target as HTMLInputElement).files?.[0]
  if (!file) return
  // Enforce the 5 MB limit advertised in the form label.
  if (file.size > 5 * 1024 * 1024) {
    stepError.value = 'Screenshot must be 5 MB or smaller.'
    clearScreenshot()
    return
  }
  screenshotFilename.value = file.name
  const reader = new FileReader()
  reader.onload = (e) => {
    const result = e.target?.result as string
    screenshotB64.value = result
    screenshotPreview.value = result
  }
  reader.readAsDataURL(file)
}
|
||||
|
||||
function clearScreenshot() {
|
||||
screenshotB64.value = null
|
||||
screenshotPreview.value = null
|
||||
if (fileInput.value) fileInput.value.value = ''
|
||||
}
|
||||
|
||||
const apiBase = (import.meta.env.VITE_API_BASE as string) ?? ''
|
||||
|
||||
// Probe once on mount — hidden until confirmed enabled so button never flashes
|
||||
const enabled = ref(false)
|
||||
onMounted(async () => {
|
||||
try {
|
||||
const res = await fetch(`${apiBase}/api/v1/feedback/status`)
|
||||
if (res.ok) {
|
||||
const data = await res.json()
|
||||
enabled.value = data.enabled === true
|
||||
}
|
||||
} catch { /* network error — stay hidden */ }
|
||||
})
|
||||
|
||||
const open = ref(false)
|
||||
const step = ref(1)
|
||||
const loading = ref(false)
|
||||
const stepError = ref('')
|
||||
const submitError = ref('')
|
||||
const submitted = ref(false)
|
||||
const issueUrl = ref('')
|
||||
|
||||
const types: { value: 'bug' | 'feature' | 'other'; label: string }[] = [
|
||||
{ value: 'bug', label: '🐛 Bug' },
|
||||
{ value: 'feature', label: '✨ Feature request' },
|
||||
{ value: 'other', label: '💬 Other' },
|
||||
]
|
||||
|
||||
const form = ref({
|
||||
type: 'bug' as 'bug' | 'feature' | 'other',
|
||||
title: '',
|
||||
description: '',
|
||||
repro: '',
|
||||
submitter: '',
|
||||
})
|
||||
|
||||
const typeLabel = computed(() => types.find(t => t.value === form.value.type)?.label ?? '')
|
||||
|
||||
function close() {
|
||||
open.value = false
|
||||
// reset after transition
|
||||
setTimeout(reset, 300)
|
||||
}
|
||||
|
||||
function reset() {
|
||||
step.value = 1
|
||||
loading.value = false
|
||||
stepError.value = ''
|
||||
submitError.value = ''
|
||||
submitted.value = false
|
||||
issueUrl.value = ''
|
||||
form.value = { type: 'bug', title: '', description: '', repro: '', submitter: '' }
|
||||
clearScreenshot()
|
||||
}
|
||||
|
||||
function nextStep() {
|
||||
stepError.value = ''
|
||||
if (!form.value.title.trim() || !form.value.description.trim()) {
|
||||
stepError.value = 'Please fill in both Title and Description.'
|
||||
return
|
||||
}
|
||||
step.value = 2
|
||||
}
|
||||
|
||||
async function submit() {
|
||||
loading.value = true
|
||||
submitError.value = ''
|
||||
try {
|
||||
const res = await fetch(`${apiBase}/api/v1/feedback`, {
|
||||
method: 'POST',
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: JSON.stringify({
|
||||
title: form.value.title.trim(),
|
||||
description: form.value.description.trim(),
|
||||
type: form.value.type,
|
||||
repro: form.value.repro.trim(),
|
||||
tab: props.currentTab ?? 'unknown',
|
||||
submitter: form.value.submitter.trim(),
|
||||
}),
|
||||
})
|
||||
if (!res.ok) {
|
||||
const err = await res.json().catch(() => ({ detail: res.statusText }))
|
||||
submitError.value = err.detail ?? 'Submission failed.'
|
||||
return
|
||||
}
|
||||
const data = await res.json()
|
||||
issueUrl.value = data.issue_url
|
||||
|
||||
// Upload screenshot if provided
|
||||
if (screenshotB64.value) {
|
||||
try {
|
||||
await fetch(`${apiBase}/api/v1/feedback/attach`, {
|
||||
method: 'POST',
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: JSON.stringify({
|
||||
issue_number: data.issue_number,
|
||||
filename: screenshotFilename.value,
|
||||
image_b64: screenshotB64.value,
|
||||
}),
|
||||
})
|
||||
// Non-fatal: if attach fails, the issue was still filed
|
||||
} catch { /* ignore attach errors */ }
|
||||
}
|
||||
|
||||
submitted.value = true
|
||||
} catch {
|
||||
submitError.value = 'Network error — please try again.'
|
||||
} finally {
|
||||
loading.value = false
|
||||
}
|
||||
}
|
||||
</script>
|
||||
|
||||
<style scoped>
|
||||
/* ── Floating action button ─────────────────────────────────────────── */
|
||||
.feedback-fab {
|
||||
position: fixed;
|
||||
right: var(--spacing-md);
|
||||
bottom: calc(68px + var(--spacing-md)); /* above mobile bottom nav */
|
||||
z-index: 190;
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: var(--spacing-xs);
|
||||
padding: 9px var(--spacing-md);
|
||||
background: var(--color-bg-elevated);
|
||||
border: 1px solid var(--color-border);
|
||||
border-radius: 999px;
|
||||
color: var(--color-text-secondary);
|
||||
font-size: var(--font-size-sm);
|
||||
font-family: var(--font-body);
|
||||
font-weight: 500;
|
||||
cursor: pointer;
|
||||
box-shadow: var(--shadow-md);
|
||||
transition: background 0.15s, color 0.15s, box-shadow 0.15s, border-color 0.15s;
|
||||
}
|
||||
.feedback-fab:hover {
|
||||
background: var(--color-bg-card);
|
||||
color: var(--color-text-primary);
|
||||
border-color: var(--color-border-focus);
|
||||
box-shadow: var(--shadow-lg);
|
||||
}
|
||||
.feedback-fab-icon { width: 15px; height: 15px; flex-shrink: 0; }
|
||||
.feedback-fab-label { white-space: nowrap; }
|
||||
|
||||
/* On desktop, bottom nav is gone — drop to standard corner */
|
||||
@media (min-width: 769px) {
|
||||
.feedback-fab {
|
||||
bottom: var(--spacing-lg);
|
||||
}
|
||||
}
|
||||
|
||||
/* ── Overlay ──────────────────────────────────────────────────────────── */
|
||||
.feedback-overlay {
|
||||
position: fixed;
|
||||
inset: 0;
|
||||
background: rgba(0, 0, 0, 0.55);
|
||||
z-index: 1000;
|
||||
display: flex;
|
||||
align-items: flex-end;
|
||||
justify-content: center;
|
||||
padding: 0;
|
||||
}
|
||||
|
||||
@media (min-width: 500px) {
|
||||
.feedback-overlay {
|
||||
align-items: center;
|
||||
padding: var(--spacing-md);
|
||||
}
|
||||
}
|
||||
|
||||
/* ── Modal ────────────────────────────────────────────────────────────── */
|
||||
.feedback-modal {
|
||||
background: var(--color-bg-elevated);
|
||||
border: 1px solid var(--color-border);
|
||||
border-radius: var(--radius-lg) var(--radius-lg) 0 0;
|
||||
width: 100%;
|
||||
max-height: 90vh;
|
||||
overflow-y: auto;
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
box-shadow: var(--shadow-xl);
|
||||
}
|
||||
|
||||
@media (min-width: 500px) {
|
||||
.feedback-modal {
|
||||
border-radius: var(--radius-lg);
|
||||
width: 100%;
|
||||
max-width: 520px;
|
||||
max-height: 85vh;
|
||||
}
|
||||
}
|
||||
|
||||
.feedback-header {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: space-between;
|
||||
padding: var(--spacing-md) var(--spacing-md) var(--spacing-sm);
|
||||
border-bottom: 1px solid var(--color-border);
|
||||
flex-shrink: 0;
|
||||
}
|
||||
.feedback-title {
|
||||
font-family: var(--font-display);
|
||||
font-size: var(--font-size-lg);
|
||||
font-weight: 600;
|
||||
margin: 0;
|
||||
}
|
||||
.feedback-close {
|
||||
background: transparent;
|
||||
border: none;
|
||||
color: var(--color-text-muted);
|
||||
cursor: pointer;
|
||||
padding: 4px;
|
||||
border-radius: var(--radius-sm);
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: center;
|
||||
}
|
||||
.feedback-close:hover { color: var(--color-text-primary); }
|
||||
|
||||
.feedback-body {
|
||||
padding: var(--spacing-md);
|
||||
flex: 1;
|
||||
overflow-y: auto;
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: var(--spacing-md);
|
||||
}
|
||||
|
||||
.feedback-footer {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: flex-end;
|
||||
gap: var(--spacing-sm);
|
||||
padding: var(--spacing-sm) var(--spacing-md);
|
||||
border-top: 1px solid var(--color-border);
|
||||
flex-shrink: 0;
|
||||
}
|
||||
|
||||
.feedback-textarea {
|
||||
resize: vertical;
|
||||
min-height: 80px;
|
||||
font-family: var(--font-body);
|
||||
font-size: var(--font-size-sm);
|
||||
}
|
||||
|
||||
.form-required { color: var(--color-error); margin-left: 2px; }
|
||||
|
||||
.feedback-error {
|
||||
color: var(--color-error);
|
||||
font-size: var(--font-size-sm);
|
||||
margin: 0;
|
||||
}
|
||||
|
||||
.feedback-success {
|
||||
color: var(--color-success);
|
||||
font-size: var(--font-size-sm);
|
||||
padding: var(--spacing-sm) var(--spacing-md);
|
||||
background: var(--color-success-bg);
|
||||
border: 1px solid var(--color-success-border);
|
||||
border-radius: var(--radius-md);
|
||||
}
|
||||
.feedback-link { color: var(--color-success); font-weight: 600; text-decoration: underline; }
|
||||
|
||||
/* Summary card (step 2) */
|
||||
.feedback-summary {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: var(--spacing-xs);
|
||||
padding: var(--spacing-sm) var(--spacing-md);
|
||||
background: var(--color-bg-secondary);
|
||||
border-radius: var(--radius-md);
|
||||
border: 1px solid var(--color-border);
|
||||
}
|
||||
.feedback-summary-row {
|
||||
display: flex;
|
||||
gap: var(--spacing-md);
|
||||
align-items: flex-start;
|
||||
}
|
||||
.feedback-summary-row > :first-child { min-width: 72px; flex-shrink: 0; }
|
||||
.feedback-summary-desc {
|
||||
white-space: pre-wrap;
|
||||
word-break: break-word;
|
||||
}
|
||||
|
||||
.mt-md { margin-top: var(--spacing-md); }
|
||||
.mt-xs { margin-top: var(--spacing-xs); }
|
||||
|
||||
/* ── Form elements ────────────────────────────────────────────────────── */
|
||||
.form-group {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: var(--spacing-xs);
|
||||
}
|
||||
|
||||
.form-label {
|
||||
font-size: var(--font-size-sm);
|
||||
font-weight: 600;
|
||||
color: var(--color-text-muted);
|
||||
text-transform: uppercase;
|
||||
letter-spacing: 0.06em;
|
||||
}
|
||||
|
||||
.form-input {
|
||||
width: 100%;
|
||||
padding: var(--spacing-xs) var(--spacing-sm);
|
||||
background: var(--color-bg-secondary);
|
||||
border: 1px solid var(--color-border);
|
||||
border-radius: var(--radius-md);
|
||||
color: var(--color-text-primary);
|
||||
font-family: var(--font-body);
|
||||
font-size: var(--font-size-sm);
|
||||
line-height: 1.5;
|
||||
transition: border-color 0.15s;
|
||||
box-sizing: border-box;
|
||||
}
|
||||
.form-input:focus {
|
||||
outline: none;
|
||||
border-color: var(--color-border-focus);
|
||||
}
|
||||
.form-input::placeholder { color: var(--color-text-muted); opacity: 0.7; }
|
||||
|
||||
/* ── Buttons ──────────────────────────────────────────────────────────── */
|
||||
.btn {
|
||||
display: inline-flex;
|
||||
align-items: center;
|
||||
justify-content: center;
|
||||
gap: var(--spacing-xs);
|
||||
padding: var(--spacing-xs) var(--spacing-md);
|
||||
border-radius: var(--radius-md);
|
||||
font-family: var(--font-body);
|
||||
font-size: var(--font-size-sm);
|
||||
font-weight: 500;
|
||||
cursor: pointer;
|
||||
transition: background 0.15s, color 0.15s, border-color 0.15s;
|
||||
white-space: nowrap;
|
||||
}
|
||||
.btn:disabled { opacity: 0.5; cursor: not-allowed; }
|
||||
|
||||
.btn-primary {
|
||||
background: var(--color-primary);
|
||||
color: #fff;
|
||||
border: 1px solid var(--color-primary);
|
||||
}
|
||||
.btn-primary:hover:not(:disabled) { filter: brightness(1.1); }
|
||||
|
||||
.btn-ghost {
|
||||
background: transparent;
|
||||
color: var(--color-text-secondary);
|
||||
border: 1px solid var(--color-border);
|
||||
}
|
||||
.btn-ghost:hover:not(:disabled) {
|
||||
background: var(--color-bg-secondary);
|
||||
color: var(--color-text-primary);
|
||||
border-color: var(--color-border-focus);
|
||||
}
|
||||
|
||||
/* ── Filter chips ─────────────────────────────────────────────────────── */
|
||||
.filter-chip-row {
|
||||
display: flex;
|
||||
flex-wrap: wrap;
|
||||
gap: var(--spacing-xs);
|
||||
}
|
||||
|
||||
.btn-chip {
|
||||
padding: 5px var(--spacing-sm);
|
||||
background: var(--color-bg-secondary);
|
||||
border: 1px solid var(--color-border);
|
||||
border-radius: 999px;
|
||||
font-family: var(--font-body);
|
||||
font-size: var(--font-size-sm);
|
||||
font-weight: 500;
|
||||
color: var(--color-text-secondary);
|
||||
cursor: pointer;
|
||||
transition: background 0.15s, color 0.15s, border-color 0.15s;
|
||||
}
|
||||
.btn-chip.active,
|
||||
.btn-chip:hover {
|
||||
background: color-mix(in srgb, var(--color-primary) 15%, transparent);
|
||||
border-color: var(--color-primary);
|
||||
color: var(--color-primary);
|
||||
}
|
||||
|
||||
/* ── Card ─────────────────────────────────────────────────────────────── */
|
||||
.card {
|
||||
background: var(--color-bg-card);
|
||||
border: 1px solid var(--color-border);
|
||||
border-radius: var(--radius-md);
|
||||
}
|
||||
|
||||
/* ── Text utilities ───────────────────────────────────────────────────── */
|
||||
.text-muted { color: var(--color-text-muted); }
|
||||
.text-sm { font-size: var(--font-size-sm); line-height: 1.5; }
|
||||
.text-xs { font-size: 0.75rem; line-height: 1.5; }
|
||||
.font-semibold { font-weight: 600; }
|
||||
|
||||
/* ── Screenshot attachment ────────────────────────────────────────────── */
|
||||
.form-input-file {
|
||||
display: block;
|
||||
width: 100%;
|
||||
padding: var(--spacing-xs) var(--spacing-sm);
|
||||
background: var(--color-bg-secondary);
|
||||
border: 1px dashed var(--color-border);
|
||||
border-radius: var(--radius-md);
|
||||
color: var(--color-text-secondary);
|
||||
font-family: var(--font-body);
|
||||
font-size: var(--font-size-sm);
|
||||
cursor: pointer;
|
||||
box-sizing: border-box;
|
||||
}
|
||||
.form-input-file:focus { outline: 2px solid var(--color-border-focus); outline-offset: 2px; }
|
||||
|
||||
.screenshot-preview {
|
||||
margin-top: var(--spacing-xs);
|
||||
display: flex;
|
||||
align-items: flex-start;
|
||||
gap: var(--spacing-sm);
|
||||
}
|
||||
.screenshot-preview img {
|
||||
max-width: 160px;
|
||||
max-height: 100px;
|
||||
border-radius: var(--radius-sm);
|
||||
border: 1px solid var(--color-border);
|
||||
object-fit: cover;
|
||||
}
|
||||
.screenshot-remove {
|
||||
font-size: var(--font-size-xs);
|
||||
color: var(--color-text-muted);
|
||||
background: none;
|
||||
border: none;
|
||||
cursor: pointer;
|
||||
padding: 2px 4px;
|
||||
min-height: 24px;
|
||||
}
|
||||
.screenshot-remove:hover { color: var(--color-error); }
|
||||
|
||||
.btn-link {
|
||||
background: none;
|
||||
border: none;
|
||||
color: var(--color-primary);
|
||||
cursor: pointer;
|
||||
padding: 0;
|
||||
font-family: var(--font-body);
|
||||
font-size: inherit;
|
||||
text-decoration: underline;
|
||||
}
|
||||
|
||||
/* Transition */
|
||||
.modal-fade-enter-active, .modal-fade-leave-active { transition: opacity 0.2s ease; }
|
||||
.modal-fade-enter-from, .modal-fade-leave-to { opacity: 0; }
|
||||
</style>
|
||||
|
|
@ -20,6 +20,35 @@
|
|||
--radius-lg: 16px;
|
||||
--shadow-card: 0 2px 8px rgba(0,0,0,0.4);
|
||||
--transition-fast: 150ms ease;
|
||||
|
||||
/* Spacing scale */
|
||||
--spacing-xs: 0.25rem;
|
||||
--spacing-sm: 0.5rem;
|
||||
--spacing-md: 1rem;
|
||||
--spacing-lg: 1.5rem;
|
||||
|
||||
/* Font scale */
|
||||
--font-body: var(--font-base);
|
||||
--font-display: var(--font-base);
|
||||
--font-size-xs: 0.75rem;
|
||||
--font-size-sm: 0.875rem;
|
||||
--font-size-lg: 1.125rem;
|
||||
|
||||
/* Shadow aliases */
|
||||
--shadow-md: var(--shadow-card);
|
||||
--shadow-lg: var(--shadow-card);
|
||||
--shadow-xl: 0 4px 20px rgba(0,0,0,0.5);
|
||||
|
||||
/* Color aliases for shared component compat */
|
||||
--color-primary: var(--color-accent);
|
||||
--color-text-primary: var(--color-text);
|
||||
--color-text-secondary: var(--color-text-muted);
|
||||
--color-bg-elevated: var(--color-surface);
|
||||
--color-bg-card: var(--color-surface);
|
||||
--color-bg-secondary: var(--color-bg);
|
||||
--color-border-focus: var(--color-accent);
|
||||
--color-success-bg: color-mix(in srgb, var(--color-success) 15%, transparent);
|
||||
--color-success-border: color-mix(in srgb, var(--color-success) 35%, transparent);
|
||||
}
|
||||
|
||||
@media (prefers-color-scheme: light) {
|
||||
|
|
|
|||
|
|
@ -25,6 +25,25 @@
|
|||
:bm25-score="cite.bm25_score ?? undefined"
|
||||
/>
|
||||
</div>
|
||||
<div v-if="msg.role === 'assistant' && chatFeedbackEnabled" class="message-thumbs" role="group" aria-label="Rate this answer">
|
||||
<button
|
||||
class="thumb-btn"
|
||||
:class="{ active: msg.rating === 1 }"
|
||||
@click="rate(i, 1)"
|
||||
:disabled="msg.rating != null"
|
||||
title="Helpful"
|
||||
aria-label="Mark as helpful"
|
||||
>👍</button>
|
||||
<button
|
||||
class="thumb-btn"
|
||||
:class="{ active: msg.rating === -1 }"
|
||||
@click="rate(i, -1)"
|
||||
:disabled="msg.rating != null"
|
||||
title="Not helpful"
|
||||
aria-label="Mark as not helpful"
|
||||
>👎</button>
|
||||
<span v-if="msg.rating != null" class="thumb-thanks">Thanks!</span>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="message assistant loading" v-if="thinking">
|
||||
|
|
@ -77,6 +96,7 @@ interface ChatMessage {
|
|||
role: "user" | "assistant"
|
||||
content: string
|
||||
citations?: Citation[]
|
||||
rating?: 1 | -1
|
||||
}
|
||||
|
||||
const history = ref<ChatMessage[]>([])
|
||||
|
|
@ -88,6 +108,7 @@ const messagesEl = ref<HTMLElement | null>(null)
|
|||
const inputEl = ref<HTMLInputElement | null>(null)
|
||||
const allDocs = ref<Document[]>([])
|
||||
const selectedDocs = ref<string[]>([])
|
||||
const chatFeedbackEnabled = ref(false)
|
||||
|
||||
const readyDocs = computed(() => allDocs.value.filter(d => d.status === "ready"))
|
||||
const docTitles = computed(() =>
|
||||
|
|
@ -96,6 +117,7 @@ const docTitles = computed(() =>
|
|||
|
||||
onMounted(async () => {
|
||||
allDocs.value = await api.getLibrary().catch(() => [])
|
||||
api.chatFeedbackStatus().then(s => { chatFeedbackEnabled.value = s.enabled }).catch(() => {})
|
||||
inputEl.value?.focus()
|
||||
})
|
||||
|
||||
|
|
@ -137,6 +159,17 @@ function scrollBottom() {
|
|||
messagesEl.value.scrollTop = messagesEl.value.scrollHeight
|
||||
}
|
||||
}
|
||||
|
||||
async function rate(index: number, rating: 1 | -1) {
|
||||
const msg = history.value[index]
|
||||
if (!msg || msg.role !== "assistant" || msg.rating != null) return
|
||||
// Update UI immediately (optimistic)
|
||||
history.value[index] = { ...msg, rating }
|
||||
const question = index > 0 ? (history.value[index - 1]?.content ?? "") : ""
|
||||
await api.submitChatFeedback(rating, question, msg.content, selectedDocs.value).catch(() => {
|
||||
// Non-fatal — rating is cosmetic, ignore network errors
|
||||
})
|
||||
}
|
||||
</script>
|
||||
|
||||
<style scoped>
|
||||
|
|
@ -232,6 +265,27 @@ function scrollBottom() {
|
|||
font-size: 0.85rem; margin-bottom: 0.5rem; cursor: pointer; line-height: 1.4;
|
||||
}
|
||||
|
||||
.message-thumbs {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 0.35rem;
|
||||
margin-top: 0.4rem;
|
||||
}
|
||||
.thumb-btn {
|
||||
background: transparent;
|
||||
border: 1px solid var(--color-border);
|
||||
border-radius: var(--radius-sm);
|
||||
cursor: pointer;
|
||||
font-size: 0.9rem;
|
||||
padding: 2px 6px;
|
||||
line-height: 1;
|
||||
transition: background var(--transition-fast), border-color var(--transition-fast);
|
||||
}
|
||||
.thumb-btn:hover:not(:disabled) { background: var(--color-surface-alt); border-color: var(--color-accent); }
|
||||
.thumb-btn.active { background: var(--color-surface-alt); border-color: var(--color-accent); }
|
||||
.thumb-btn:disabled { opacity: 0.4; cursor: default; }
|
||||
.thumb-thanks { font-size: 0.75rem; color: var(--color-text-muted); }
|
||||
|
||||
@media (max-width: 640px) {
|
||||
.chat-layout { flex-direction: column-reverse; }
|
||||
.sidebar { width: 100%; height: auto; max-height: 30vh; border-left: none; border-top: 1px solid var(--color-border); }
|
||||
|
|
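Review note: the optimistic update in `rate()` above can be isolated as a pure function, which makes the guard easy to unit-test. The following is a minimal sketch, not part of this commit. `applyRating` and `questionFor` are hypothetical names, and the `ChatMessage` shape is trimmed to the fields the guard needs.

```typescript
type Rating = 1 | -1

interface ChatMessage {
  role: "user" | "assistant"
  content: string
  rating?: Rating
}

// Hypothetical helper (not in the commit): returns a new history with the
// rating applied, or the same array when the guard rejects the change.
function applyRating(history: ChatMessage[], index: number, rating: Rating): ChatMessage[] {
  const msg = history[index]
  // Same guard as rate(): assistant messages only, one rating per message.
  if (!msg || msg.role !== "assistant" || msg.rating != null) return history
  return history.map((m, i) => (i === index ? { ...m, rating } : m))
}

// The question sent with the feedback is the message just before the answer.
function questionFor(history: ChatMessage[], index: number): string {
  return index > 0 ? (history[index - 1]?.content ?? "") : ""
}
```

Returning the original array on a rejected rating lets a caller detect the no-op with a reference check and skip the network request.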
|
|||
|
|
@ -2,16 +2,23 @@
|
|||
<main class="library">
|
||||
<header class="library-header">
|
||||
<h1>Library</h1>
|
||||
<div class="header-actions">
|
||||
<button class="btn-secondary" @click="triggerUpload" :disabled="uploading">
|
||||
{{ uploading ? "Uploading..." : "Upload PDF / EPUB" }}
|
||||
</button>
|
||||
<input ref="fileInput" type="file" accept=".pdf,.epub" style="display:none" @change="handleUpload">
|
||||
<button class="btn-primary" @click="scan" :disabled="scanning">
|
||||
{{ scanning ? "Scanning..." : "Scan for PDFs" }}
|
||||
</button>
|
||||
</div>
|
||||
</header>
|
||||
|
||||
<p class="error-msg" v-if="error">{{ error }}</p>
|
||||
|
||||
<p class="empty-state" v-if="!loading && docs.length === 0">
|
||||
No books indexed yet. Click "Scan for PDFs" to discover PDFs in your books directory.<br>
|
||||
Make sure your PDF directory is mounted at <code>/books</code> inside the container.
|
||||
No documents indexed yet.<br>
|
||||
<strong>Upload a PDF or EPUB</strong> using the button above, or mount a directory and click
|
||||
<strong>Scan for PDFs</strong> to index an entire collection.
|
||||
</p>
|
||||
|
||||
<div class="doc-grid" v-else>
|
||||
|
|
@ -39,8 +46,10 @@ import DocumentCard from "@/components/DocumentCard.vue"
|
|||
const docs = ref<Document[]>([])
|
||||
const loading = ref(true)
|
||||
const scanning = ref(false)
|
||||
const uploading = ref(false)
|
||||
const error = ref<string | null>(null)
|
||||
const scanResult = ref<{ discovered: number; queued: number } | null>(null)
|
||||
const fileInput = ref<HTMLInputElement | null>(null)
|
||||
|
||||
async function load() {
|
||||
loading.value = true
|
||||
|
|
@ -88,18 +97,45 @@ async function remove(id: string) {
|
|||
}
|
||||
}
|
||||
|
||||
function triggerUpload() {
|
||||
fileInput.value?.click()
|
||||
}
|
||||
|
||||
async function handleUpload(event: Event) {
|
||||
const input = event.target as HTMLInputElement
|
||||
const file = input.files?.[0]
|
||||
if (!file) return
|
||||
uploading.value = true
|
||||
error.value = null
|
||||
try {
|
||||
await api.uploadDocument(file)
|
||||
await load()
|
||||
} catch (e) {
|
||||
error.value = e instanceof Error ? e.message : "Upload failed"
|
||||
} finally {
|
||||
uploading.value = false
|
||||
input.value = ""
|
||||
}
|
||||
}
|
||||
|
||||
onMounted(load)
|
||||
</script>
|
||||
|
||||
<style scoped>
|
||||
.library { padding: 1.5rem; max-width: 1200px; margin: 0 auto; }
|
||||
.library-header { display: flex; align-items: center; justify-content: space-between; margin-bottom: 1.5rem; flex-wrap: wrap; gap: 1rem; }
|
||||
.header-actions { display: flex; gap: 0.5rem; flex-wrap: wrap; }
|
||||
h1 { font-size: 1.5rem; }
|
||||
.btn-primary {
|
||||
background: var(--color-accent); color: #fff; border: none; padding: 0.6rem 1.2rem;
|
||||
border-radius: var(--radius-sm); cursor: pointer; font-size: 0.95rem;
|
||||
}
|
||||
.btn-primary:disabled { opacity: 0.5; cursor: default; }
|
||||
.btn-secondary {
|
||||
background: transparent; color: var(--color-accent); border: 1px solid var(--color-accent);
|
||||
padding: 0.6rem 1.2rem; border-radius: var(--radius-sm); cursor: pointer; font-size: 0.95rem;
|
||||
}
|
||||
.btn-secondary:disabled { opacity: 0.5; cursor: default; }
|
||||
.doc-grid { display: grid; grid-template-columns: repeat(auto-fill, minmax(280px, 1fr)); gap: 1rem; }
|
||||
.empty-state { color: var(--color-text-muted); line-height: 1.8; }
|
||||
.empty-state code { font-family: var(--font-mono); background: var(--color-surface-alt); padding: 2px 6px; border-radius: 3px; }
|
||||
|
|
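Review note: the `accept=".pdf,.epub"` attribute on the file input only filters the browser's file picker, so a user can still select another file type. A client-side check before `api.uploadDocument(file)` would catch that early. The following is a minimal sketch, not part of this commit. `isSupportedUpload` and `SUPPORTED_EXTENSIONS` are hypothetical names, and the server is assumed to do its own validation regardless.

```typescript
// Extensions taken from the accept=".pdf,.epub" attribute on the file input.
const SUPPORTED_EXTENSIONS = [".pdf", ".epub"] as const

// Hypothetical helper (not in the commit): true when the file name ends in a
// supported extension, compared case-insensitively. A bare ".pdf" with no
// base name is rejected.
function isSupportedUpload(filename: string): boolean {
  const lower = filename.toLowerCase()
  return SUPPORTED_EXTENSIONS.some(ext => lower.length > ext.length && lower.endsWith(ext))
}
```

In `handleUpload`, this could run right after the `if (!file) return` guard, setting `error.value` and returning early when the check fails.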
|
|||