pagepiper/app/services/synthesizer.py
pyr0ball e52bdb5128 feat: RAG retrieval quality, artifact cleaning, and ingestion progress UI
Retrieval:
- Add _fetch_adjacent() to retriever: fetches page ± 1 chunks from DB
  after ranking so mid-sentence EPUB chunk boundaries don't lose context
- Fix vec DB doc-filter: oversample to top_k*20 before Python filter
  instead of post-filtering an already-small global pool (fixes wrong-book
  results when searching within a single document)
- top_k default 5 → 10; context per chunk 500 → 1500 chars; citation
  snippet 200 → 400 chars
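
A rough sketch of the two retrieval changes above, not the retriever's actual code. `Hit`, `filter_to_doc`, and `with_adjacent` are hypothetical names, and a plain dict stands in for the chunk table in the DB.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    doc_id: str
    page_number: int
    distance: float  # smaller is better


def filter_to_doc(ranked_hits: list[Hit], doc_id: str,
                  top_k: int = 10, oversample: int = 20) -> list[Hit]:
    """Keep the best top_k hits that belong to one document.

    ranked_hits is assumed to be sorted best-first across ALL documents.
    Filtering a pool of only top_k hits can leave nothing for the requested
    book, so the pool is widened to top_k * oversample before filtering.
    """
    pool = ranked_hits[: top_k * oversample]
    return [h for h in pool if h.doc_id == doc_id][:top_k]


def with_adjacent(hits: list[Hit],
                  pages: dict[tuple[str, int], str]) -> dict[tuple[str, int], str]:
    """Collect each hit's page plus page - 1 and page + 1, skipping pages
    that do not exist and keeping first-seen order without duplicates."""
    found: dict[tuple[str, int], str] = {}
    for h in hits:
        for page in (h.page_number - 1, h.page_number, h.page_number + 1):
            key = (h.doc_id, page)
            if key in pages:
                found.setdefault(key, pages[key])
    return found
```

With `oversample=1` this reproduces the old behaviour: if the global top 10 all come from another book, the filtered result is empty.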

Artifact cleaning:
- Add scripts/text_clean.py: strips ABC Amber LIT Converter watermarks,
  processtext.com URLs, bare page numbers, piracy stamps from extracted text
- Wire clean_paragraph() into ingest_pdf.py and new ingest_epub.py
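
A minimal sketch of what a cleaner like clean_paragraph() might do. The regex patterns below are assumptions for illustration, not the contents of scripts/text_clean.py, and piracy stamps are left out because their wording is not given here.

```python
import re

# Assumed patterns; the real script may match different strings.
_WATERMARK = re.compile(r"[^\n]*ABC Amber LIT Converter[^\n]*", re.IGNORECASE)
_PROCESSTEXT_URL = re.compile(
    r"(?:https?://)?(?:www\.)?processtext\.com\S*", re.IGNORECASE
)
# A line holding nothing but a 1-4 digit number is treated as a page number.
_BARE_PAGE_NUMBER = re.compile(r"^[ \t]*\d{1,4}[ \t]*$", re.MULTILINE)


def clean_paragraph(text: str) -> str:
    """Strip converter watermarks, processtext.com URLs, and bare page numbers."""
    text = _WATERMARK.sub("", text)
    text = _PROCESSTEXT_URL.sub("", text)
    text = _BARE_PAGE_NUMBER.sub("", text)
    # Drop the empty lines left behind by the substitutions.
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```

Numbers inside a sentence survive, since only lines consisting solely of digits are removed.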

Startup validation:
- _check_vec_schema() at boot: detects embedding dimension mismatch,
  deletes stale vec DB, and queues sequential re-embed in background thread
- Sequential _reembed_docs() prevents SQLite lock races on startup re-embed
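
One way the boot check and the sequential re-embed could fit together, as a sketch only. The `vec_meta` table, its `embedding_dim` key, and both function names are hypothetical; the real _check_vec_schema() may read the dimension differently.

```python
import sqlite3
import threading
from pathlib import Path
from typing import Callable, Iterable


def check_vec_schema(db_path: Path, expected_dim: int) -> bool:
    """Return True if the vec DB was stale and has been deleted."""
    if not db_path.exists():
        return False
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT value FROM vec_meta WHERE key = 'embedding_dim'"
        ).fetchone()
    except sqlite3.OperationalError:
        row = None  # metadata table missing: treat the DB as stale
    finally:
        conn.close()
    if row is not None and int(row[0]) == expected_dim:
        return False
    db_path.unlink()  # dimension mismatch: drop the stale vectors
    return True


def reembed_sequentially(doc_ids: Iterable[str],
                         embed_one: Callable[[str], None]) -> threading.Thread:
    """Re-embed documents one at a time on a single background thread,
    so only one writer touches the SQLite file at any moment."""
    ids = list(doc_ids)

    def _run() -> None:
        for doc_id in ids:
            embed_one(doc_id)

    worker = threading.Thread(target=_run, daemon=True)
    worker.start()
    return worker
```

Using one worker thread rather than one thread per document is what keeps the writes from racing for the SQLite lock.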

cf-orch integration:
- Wire CF_ORCH_URL / CF_LICENSE_KEY into LLMRouter backend config so
  allocate() fires and keeps the Ollama model warm between requests

Ingestion progress UI:
- GET /api/library/{doc_id}/status now returns vec_count from page_vecs_meta
- DocumentCard.vue polls status every 3 s while processing and shows
  two-phase progress: indeterminate animation during extraction,
  determinate "Embedding N/M pages" bar once vectors start landing

Other:
- Chat feedback endpoint + thumbs up/down UI (FeedbackButton.vue)
- EPUB ingest script (ingest_epub.py) with heading-based chunking
- migration 002: chat_feedback table
- README.md with setup and feature overview
2026-05-06 08:25:58 -07:00


# app/services/synthesizer.py
"""
LLM answer synthesis over retrieved chunks.
BSL 1.1 — requires LLMRouter (Ollama BYOK or cloud tier).
"""
from __future__ import annotations

from dataclasses import dataclass

from app.services.retriever import RetrievedChunk

_SYSTEM_PROMPT = (
    "You are a helpful document assistant. "
    "Answer the user's question using ONLY the provided document excerpts. "
    "For each claim, cite the source page as [p.N]. "
    "If the excerpts are insufficient, say so. Do not invent information."
)


@dataclass(frozen=True)
class Citation:
    doc_id: str
    page_number: int
    snippet: str
    bm25_score: float


@dataclass(frozen=True)
class SynthesisResult:
    answer: str
    citations: tuple[Citation, ...]


class Synthesizer:
    def __init__(self, llm) -> None:  # LLMRouter
        self._llm = llm

    def synthesize(
        self,
        message: str,
        history: list[dict],
        chunks: list[RetrievedChunk],
    ) -> SynthesisResult:
        # 1500 chars (~300 words) per chunk: enough to capture definitions that
        # appear mid-paragraph without blowing past a 32k-context model's limit.
        context_parts = [f"[p.{c.page_number}]\n{c.text[:1500]}" for c in chunks]
        context = "\n\n---\n\n".join(context_parts)
        prompt = f"Document excerpts:\n\n{context}\n\nQuestion: {message}"
        answer = self._llm.complete(prompt, system=_SYSTEM_PROMPT)
        citations = tuple(
            Citation(
                doc_id=c.doc_id,
                page_number=c.page_number,
                snippet=c.text[:400],
                bm25_score=c.bm25_score,
            )
            for c in chunks
        )
        return SynthesisResult(answer=answer, citations=citations)
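
To see the prompt that synthesize() assembles without a running LLMRouter, the context-building step can be reproduced in isolation. This is a standalone sketch: `FakeChunk`, `build_prompt`, and `EchoLLM` are hypothetical stand-ins, and `EchoLLM.complete` only mimics the call shape used above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeChunk:
    # Only the two fields the prompt builder reads.
    page_number: int
    text: str


def build_prompt(message: str, chunks: list[FakeChunk], max_chars: int = 1500) -> str:
    """Same assembly as synthesize(): page tag, truncated text, '---' separators."""
    parts = [f"[p.{c.page_number}]\n{c.text[:max_chars]}" for c in chunks]
    context = "\n\n---\n\n".join(parts)
    return f"Document excerpts:\n\n{context}\n\nQuestion: {message}"


class EchoLLM:
    """Stub with the complete(prompt, system=...) shape; returns a canned reply."""

    def complete(self, prompt: str, system: str = "") -> str:
        return f"(stub) saw {prompt.count('[p.')} excerpt(s)"


if __name__ == "__main__":
    chunks = [FakeChunk(3, "X is a thing."), FakeChunk(4, "More.")]
    prompt = build_prompt("What is X?", chunks)
    print(prompt)
    print(EchoLLM().complete(prompt))
```

Passing an object like `EchoLLM` to `Synthesizer` should work the same way in a unit test, since the class only calls `complete(prompt, system=...)` on it.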