feat: replace nomic-embed-text retriever with Agent-ModernColBERT for semantic chunk search #8

Open
opened 2026-05-13 15:46:04 -07:00 by pyr0ball · 0 comments
Owner

Background

Pagepiper currently uses nomic-embed-text (bi-encoder) + cosine similarity for chunk retrieval. This works for simple keyword-adjacent queries but loses nuance on complex rulebook questions like:

"what is the action economy for a fighter casting a spell while prone"

A bi-encoder collapses the whole query into a single vector, losing the multi-part reasoning structure.

Proposed upgrade

lightonai/Agent-ModernColBERT — a late-interaction retriever built on ModernBERT. Instead of one vector per chunk, it stores token-level embeddings and computes MaxSim interaction at query time. Designed specifically for agentic/multi-hop queries.
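To make the difference between the two scoring schemes concrete, here is a minimal pure-Python sketch of MaxSim next to a single-vector cosine score. The token vectors are toy 2-d values chosen for illustration, and `pooled_cosine` (mean-pool then cosine) is only a stand-in for a bi-encoder, not the actual nomic-embed-text pipeline.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """Late interaction: for each query token take its best-matching
    document token (cosine similarity), then sum those maxima."""
    q = [_normalize(v) for v in query_tokens]
    d = [_normalize(v) for v in doc_tokens]
    return sum(max(_dot(qv, dv) for dv in d) for qv in q)


def pooled_cosine(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """Toy bi-encoder stand-in: mean-pool each side into one vector, then cosine."""
    def mean(vectors: Sequence[Vector]) -> List[float]:
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    return _dot(_normalize(mean(query_tokens)), _normalize(mean(doc_tokens)))


# Toy data: a two-part query, one chunk covering both parts separately,
# and one chunk that only sits "in between" them.
query = [[1.0, 0.0], [0.0, 1.0]]
chunk_both_parts = [[1.0, 0.0], [0.0, 1.0]]
chunk_blurred = [[1.0, 1.0]]
```

With these toy values the pooled score is identical for both chunks, while MaxSim ranks `chunk_both_parts` higher, which is the multi-part behaviour the proposal is after.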

The model is registered in the cf-orch model registry as agent-moderncolbert (cf-retriever service type, ~800MB VRAM).

What to change

Option A — in-process (simpler)

Load Agent-ModernColBERT directly in app/services/retriever.py using the pylate library (the recommended ColBERT inference library from LightOn). Replace the nomic-embed-text embedding + cosine search step.
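A rough sketch of what Option A could look like, assuming the `pylate` calls below (`models.ColBERT`, `indexes.Voyager`, `retrieve.ColBERT`) match the installed version; check them against the pylate docs before relying on this. The function names, the `data/colbert-index` folder, and the `chunks` index name are hypothetical and not taken from the Pagepiper codebase.

```python
from typing import Any, Dict, List, Tuple

MODEL_NAME = "lightonai/Agent-ModernColBERT"


def load_pylate(index_folder: str = "data/colbert-index") -> Tuple[Any, Any, Any]:
    """Build the model, index, and retriever objects.

    pylate is imported lazily so this module can be imported (and the
    helpers below unit-tested with fakes) without the dependency installed.
    The folder and index name are assumptions for illustration.
    """
    from pylate import indexes, models, retrieve

    model = models.ColBERT(model_name_or_path=MODEL_NAME)
    index = indexes.Voyager(
        index_folder=index_folder,
        index_name="chunks",
        override=False,  # keep an existing on-disk index instead of rebuilding
    )
    retriever = retrieve.ColBERT(index=index)
    return model, index, retriever


def index_chunks(model: Any, index: Any, chunks: Dict[str, str]) -> None:
    """Encode chunk texts as documents and add them to the index.

    `chunks` maps a chunk id (e.g. the SQLite row id as a string) to its text.
    """
    ids = list(chunks)
    embeddings = model.encode([chunks[i] for i in ids], is_query=False)
    index.add_documents(documents_ids=ids, documents_embeddings=embeddings)


def retrieve_chunks(model: Any, retriever: Any, query: str, k: int = 5) -> List[Tuple[str, float]]:
    """Return (chunk_id, score) pairs for a single query, best first."""
    query_embeddings = model.encode([query], is_query=True)
    # retrieve() returns one result list per query; we sent exactly one query.
    hits = retriever.retrieve(queries_embeddings=query_embeddings, k=k)[0]
    return [(hit["id"], hit["score"]) for hit in hits]
```

Passing the model, index, and retriever in as arguments keeps the helpers easy to test with stand-in objects and avoids loading the model at import time.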

Option B — via cf-orch (consistent with fleet model management)

Route retrieval through CFOrchClient.task_allocate("pagepiper", "retrieve") once a cf-retriever service is defined. Keeps model loading/VRAM outside the FastAPI process.

Option A is the right first step — Option B is worth revisiting when cf-retriever is a fully managed cf-orch service.
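One way to keep the later move to Option B cheap is to have the chat code depend on a small retriever interface rather than on pylate directly. The sketch below is only an illustration of that seam: `Retriever`, `CallableRetriever`, and `top_chunk_ids` are hypothetical names, and nothing here reflects the real `CFOrchClient` API.

```python
from typing import Callable, List, Protocol, Tuple

Hit = Tuple[str, float]  # (chunk_id, score)


class Retriever(Protocol):
    """Anything that can return scored chunk ids for a query."""

    def retrieve(self, query: str, k: int) -> List[Hit]:
        ...


class CallableRetriever:
    """Adapts a plain function to the Retriever interface.

    For Option A the function would wrap the in-process ColBERT search;
    for Option B it would wrap whatever call cf-orch ends up exposing.
    """

    def __init__(self, search_fn: Callable[[str, int], List[Hit]]) -> None:
        self._search_fn = search_fn

    def retrieve(self, query: str, k: int) -> List[Hit]:
        return self._search_fn(query, k)


def top_chunk_ids(retriever: Retriever, query: str, k: int = 5) -> List[str]:
    """What a caller such as the chat endpoint needs: ids only, best first."""
    return [chunk_id for chunk_id, _score in retriever.retrieve(query, k)]
```

With this shape, switching backends means constructing a different `Retriever`, and the `/chat` handler itself stays untouched.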

Index storage note

ColBERT stores per-token embeddings (~128 floats/token) rather than one vector per chunk, so the index grows with the total token count and will be larger than the current embedding store. For typical TTRPG rulebook collections this is acceptable; flag it if storage becomes a concern at scale.
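For a back-of-the-envelope sense of scale, the arithmetic can be sketched as below. It assumes uncompressed float32 storage, 128 values per token as stated above, and a 768-dimensional single vector for the current store; the chunk count and tokens-per-chunk figures are made-up inputs, and a real index backend may compress or add overhead.

```python
def colbert_index_bytes(
    num_chunks: int,
    tokens_per_chunk: int,
    dim: int = 128,
    bytes_per_value: int = 4,
) -> int:
    """Raw size of a per-token index: one `dim`-sized vector per token."""
    return num_chunks * tokens_per_chunk * dim * bytes_per_value


def single_vector_index_bytes(
    num_chunks: int,
    dim: int = 768,
    bytes_per_value: int = 4,
) -> int:
    """Raw size of a one-vector-per-chunk store (dim is an assumption)."""
    return num_chunks * dim * bytes_per_value


# Hypothetical collection: 10,000 chunks averaging 300 tokens each.
late_interaction = colbert_index_bytes(10_000, 300)   # 1,536,000,000 bytes
single_vector = single_vector_index_bytes(10_000)     # 30,720,000 bytes
growth_factor = late_interaction / single_vector      # 50.0
```

Under these assumptions the growth factor depends only on tokens per chunk and the two dimensions, so shorter chunks shrink the gap proportionally.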

Acceptance criteria

  • pylate added to dependencies
  • app/services/retriever.py uses Agent-ModernColBERT for chunk retrieval
  • Existing /chat endpoint behaviour unchanged (retriever is internal)
  • ColBERT index built/persisted alongside existing SQLite store
  • Qualitative test: complex multi-part rulebook query returns more relevant chunks than nomic baseline
  • Add pagepiper.retrieve to assignments.yaml in cf-orch pointing at agent-moderncolbert

Related

  • cf-orch model registry: agent-moderncolbert (already registered)
  • app/services/retriever.py, app/api/chat.py
  • lightonai/Agent-ModernColBERT on HuggingFace
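For the qualitative-test criterion, a small side-by-side harness may be enough. The sketch below assumes each retriever can be wrapped as a function returning chunk ids best-first and that a handful of queries have hand-labelled relevant chunks; the names and the choice of precision@k are illustrative, not a prescribed metric.

```python
from typing import Callable, Dict, List, Set

# A retriever here is any function: (query, k) -> chunk ids, best first.
RetrieveFn = Callable[[str, int], List[str]]


def precision_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved chunk ids that are labelled relevant."""
    top = retrieved[:k]
    if not top:
        return 0.0
    return sum(1 for chunk_id in top if chunk_id in relevant) / len(top)


def compare_retrievers(
    baseline: RetrieveFn,
    candidate: RetrieveFn,
    labelled_queries: Dict[str, Set[str]],
    k: int = 5,
) -> Dict[str, Dict[str, float]]:
    """Score both retrievers on every labelled query and report per query."""
    report: Dict[str, Dict[str, float]] = {}
    for query, relevant in labelled_queries.items():
        report[query] = {
            "baseline": precision_at_k(baseline(query, k), relevant, k),
            "candidate": precision_at_k(candidate(query, k), relevant, k),
        }
    return report
```

Reading the per-query report alongside the retrieved chunk text keeps the check qualitative while still making regressions against the nomic baseline easy to spot.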
Reference: Circuit-Forge/pagepiper#8