feat: replace nomic-embed-text retriever with Agent-ModernColBERT for semantic chunk search #8

Open

opened 2026-05-13 15:46:04 -07:00 by pyr0ball · 0 comments

Background

Pagepiper currently uses nomic-embed-text (bi-encoder) + cosine similarity for chunk retrieval. This works for simple keyword-adjacent queries but loses nuance on complex rulebook questions like:

"what is the action economy for a fighter casting a spell while prone"

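A bi-encoder collapses the whole query into a single vector, so the separate parts of a multi-part question (here: action economy, fighter, spellcasting, prone) are blended together before any matching happens.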

Proposed upgrade

lightonai/Agent-ModernColBERT — a late-interaction retriever built on ModernBERT. Instead of one vector per chunk, it stores token-level embeddings and computes MaxSim interaction at query time. Designed specifically for agentic/multi-hop queries.
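
To make the late-interaction scoring concrete, here is a minimal pure-Python sketch of MaxSim: for each query token, take the best similarity against any chunk token, then sum over query tokens. The 2-dimensional vectors are made-up toy values, not real model output, and the function names are illustrative only.

```python
from math import sqrt

Vector = list[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens: list[Vector], chunk_tokens: list[Vector]) -> float:
    """Late-interaction score: sum over query tokens of the best-matching chunk token."""
    q = [normalize(v) for v in query_tokens]
    d = [normalize(v) for v in chunk_tokens]
    return sum(max(dot(qt, dt) for dt in d) for qt in q)


# Toy example: two query "tokens" pointing in different directions.
query = [[1.0, 0.0], [0.0, 1.0]]
chunk_a = [[1.0, 0.0], [0.0, 1.0]]  # covers both query aspects
chunk_b = [[1.0, 0.0], [1.0, 0.0]]  # covers only the first aspect

score_a = maxsim(query, chunk_a)
score_b = maxsim(query, chunk_b)
print(score_a, score_b)
```

The chunk that matches every part of the query scores higher than the one that matches only one part strongly, which is the behaviour we want for multi-part rulebook questions.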

Model is registered in cf-orch model registry as agent-moderncolbert (cf-retriever service type, ~800MB VRAM).

What to change

Option A — in-process (simpler)

Load Agent-ModernColBERT directly in app/services/retriever.py using the pylate library (the recommended ColBERT inference library from LightOn). Replace the nomic-embed-text embedding + cosine search step.
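
A rough sketch of what Option A could look like, assuming the PyLate API as shown in its README (`models.ColBERT`, `indexes.Voyager`, `retrieve.ColBERT`); verify the calls against the installed version. `ColbertRetriever`, `RetrievedChunk`, `to_chunks`, the index folder, and the index name are hypothetical and not existing Pagepiper code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievedChunk:
    chunk_id: str
    score: float


def to_chunks(raw_hits: list[dict]) -> list[RetrievedChunk]:
    """Convert one query's raw hits ({"id": ..., "score": ...}) into typed results."""
    return [RetrievedChunk(str(h["id"]), float(h["score"])) for h in raw_hits]


class ColbertRetriever:
    """Hypothetical replacement for the nomic-embed-text + cosine search step."""

    def __init__(self, index_folder: str = "data/colbert-index") -> None:
        # Imported lazily so this module can be imported without pylate installed.
        from pylate import indexes, models, retrieve

        self._model = models.ColBERT(model_name_or_path="lightonai/Agent-ModernColBERT")
        # Assumption: a Voyager index on disk; folder and name are placeholders.
        self._index = indexes.Voyager(index_folder=index_folder, index_name="chunks")
        self._retriever = retrieve.ColBERT(index=self._index)

    def add_chunks(self, chunk_ids: list[str], texts: list[str]) -> None:
        """Encode chunk texts as documents and add them to the index."""
        embeddings = self._model.encode(texts, is_query=False)
        self._index.add_documents(documents_ids=chunk_ids, documents_embeddings=embeddings)

    def search(self, query: str, k: int = 5) -> list[RetrievedChunk]:
        """Encode the query and return the top-k chunks by MaxSim score."""
        query_embeddings = self._model.encode([query], is_query=True)
        hits = self._retriever.retrieve(queries_embeddings=query_embeddings, k=k)
        return to_chunks(hits[0])  # one result list per query; we sent one query
```

Returning chunk ids rather than text keeps the SQLite store as the source of truth for chunk content.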

Option B — via cf-orch (consistent with fleet model management)

Route retrieval through CFOrchClient.task_allocate("pagepiper", "retrieve") once a cf-retriever service is defined. Keeps model loading/VRAM outside the FastAPI process.

Option A is the right first step — Option B is worth revisiting when cf-retriever is a fully managed cf-orch service.
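
One way to keep Option B open while shipping Option A first is to have the chat code depend on a small retriever interface rather than a concrete class. The sketch below is hypothetical: `ChunkRetriever`, `KeywordOverlapRetriever`, and `build_context` are illustrative names that do not exist in Pagepiper, and the stand-in backend uses naive word overlap only so the example runs without a model.

```python
from typing import Protocol


class ChunkRetriever(Protocol):
    """Anything that can return the ids of the top-k chunks for a query."""

    def search(self, query: str, k: int) -> list[str]: ...


class KeywordOverlapRetriever:
    """Stand-in backend for illustration; NOT ColBERT and NOT the cf-orch path."""

    def __init__(self, chunks: dict[str, str]) -> None:
        self._chunks = chunks

    def search(self, query: str, k: int) -> list[str]:
        words = set(query.lower().split())
        # Rank chunks by how many query words they share (ties broken by id).
        ranked = sorted(
            self._chunks,
            key=lambda cid: (-len(words & set(self._chunks[cid].lower().split())), cid),
        )
        return ranked[:k]


def build_context(retriever: ChunkRetriever, query: str, k: int = 2) -> list[str]:
    """Caller code sees only the interface, so backends can be swapped later."""
    return retriever.search(query, k)


demo = KeywordOverlapRetriever(
    {
        "c1": "casting a spell requires an action",
        "c2": "standing up from prone costs movement",
        "c3": "long rest rules",
    }
)
top = build_context(demo, "casting a spell while prone")
print(top)
```

An in-process ColBERT class and a later cf-orch-backed class would both satisfy the same `search` signature, so switching between them would not touch the `/chat` endpoint.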

Index storage note

ColBERT stores per-token embeddings (~128 floats per token) rather than one vector per chunk, so the index will be larger than the current embedding store, and the gap grows with the number of tokens per chunk. For typical TTRPG rulebook collections this is acceptable; flag it if storage becomes a concern at scale.
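
A back-of-envelope estimate of the size difference, as a sketch. Only the 128 floats per token comes from the note above; the chunk count, tokens per chunk, single-vector dimension, and float32 width are placeholder assumptions to replace with real numbers, and index overhead or compression is ignored.

```python
def index_bytes(num_chunks: int, vectors_per_chunk: int, dim: int, bytes_per_float: int = 4) -> int:
    """Raw embedding storage, ignoring index overhead and compression."""
    return num_chunks * vectors_per_chunk * dim * bytes_per_float


NUM_CHUNKS = 10_000      # assumption: chunks across a rulebook collection
TOKENS_PER_CHUNK = 300   # assumption: average tokens per chunk
SINGLE_VECTOR_DIM = 768  # assumption: dimension of the current single-vector store

single_vector = index_bytes(NUM_CHUNKS, 1, SINGLE_VECTOR_DIM)
per_token = index_bytes(NUM_CHUNKS, TOKENS_PER_CHUNK, 128)

print(f"single-vector store: {single_vector / 1e6:.1f} MB")
print(f"per-token store:     {per_token / 1e6:.1f} MB")
print(f"ratio:               {per_token / single_vector:.0f}x")
```

With these placeholder numbers, raw float32 storage grows from about 31 MB to about 1.5 GB, roughly 50x. The real ratio depends on actual chunk lengths and on whatever compression the chosen index applies.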

Acceptance criteria

pylate added to dependencies
app/services/retriever.py uses Agent-ModernColBERT for chunk retrieval
Existing /chat endpoint behaviour unchanged (retriever is internal)
ColBERT index built/persisted alongside existing SQLite store
Qualitative test: complex multi-part rulebook query returns more relevant chunks than nomic baseline
pagepiper.retrieve added to assignments.yaml in cf-orch, pointing at agent-moderncolbert
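
For the qualitative test, a small helper that diffs the two retrievers' top-k results for the same query may make side-by-side review easier. This is only a sketch: `compare_rankings` is a hypothetical helper, and the chunk ids below are invented placeholders, not real retrieval output.

```python
def compare_rankings(baseline: list[str], candidate: list[str]) -> dict[str, list[str]]:
    """Split two top-k chunk-id lists into shared and retriever-specific results."""
    base_set, cand_set = set(baseline), set(candidate)
    return {
        "shared": [c for c in candidate if c in base_set],
        "only_baseline": [c for c in baseline if c not in cand_set],
        "only_candidate": [c for c in candidate if c not in base_set],
    }


# Placeholder top-3 chunk ids for one query from each retriever (made up).
nomic_top = ["spellcasting", "fighter-class", "conditions-index"]
colbert_top = ["spellcasting", "prone-condition", "actions-in-combat"]

diff = compare_rankings(nomic_top, colbert_top)
for bucket, chunk_ids in diff.items():
    print(f"{bucket}: {chunk_ids}")
```

A human still has to judge whether the chunks unique to each retriever are relevant; the helper only narrows down what to read.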

Related

cf-orch model registry: agent-moderncolbert (already registered)
app/services/retriever.py, app/api/chat.py
lightonai/Agent-ModernColBERT on HuggingFace


Reference: Circuit-Forge/pagepiper#8
