Design ref: llama-conductor patterns for provenance tracking, refusal-first, and context compaction #11

Open

opened 2026-05-31 09:46:48 -07:00 by pyr0ball · 0 comments

Owner

Reference project

GitHub: https://github.com/BobbyLLM/llama-conductor
License: AGPL-3.0 — do NOT incorporate code; patterns only

Relevant patterns for Robin

Provenance on every answer

llama-conductor appends a footer to every response: Confidence: <tier> | Source: <path>. Sources are tiered: user-defined cheatsheets first, then wiki/docs, then web, then model weights as last resort.

For Robin, every answer to a Linux migration question should surface:

- Which source answered it (local man pages, CF knowledge base, distro docs, web, or model inference)
- Confidence tier (authoritative, documented, inferred)

This is critical for a migration companion — a user following wrong advice about their system can cause real harm. Explicit sourcing lets them verify before acting.
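A minimal sketch of what a provenance-carrying answer could look like for Robin. The `Answer` class, the `SOURCE_TIERS` mapping, and the source-kind names are hypothetical; this issue names the sources and the tiers but does not yet decide which source earns which tier, so the mapping below is only a placeholder.

```python
from dataclasses import dataclass
from enum import Enum


class ConfidenceTier(Enum):
    AUTHORITATIVE = "authoritative"
    DOCUMENTED = "documented"
    INFERRED = "inferred"


# Assumption: placeholder mapping from source kind to tier.
# The real assignment is still an open design question.
SOURCE_TIERS = {
    "man_pages": ConfidenceTier.AUTHORITATIVE,
    "cf_knowledge_base": ConfidenceTier.AUTHORITATIVE,
    "distro_docs": ConfidenceTier.DOCUMENTED,
    "web": ConfidenceTier.DOCUMENTED,
    "model_inference": ConfidenceTier.INFERRED,
}


@dataclass(frozen=True)
class Answer:
    text: str
    source_kind: str   # one of the SOURCE_TIERS keys
    source_path: str   # where the user can verify the claim

    @property
    def confidence_tier(self) -> ConfidenceTier:
        return SOURCE_TIERS[self.source_kind]

    def render(self) -> str:
        """Append a footer in the 'Confidence: <tier> | Source: <path>' shape."""
        footer = (
            f"Confidence: {self.confidence_tier.value} | Source: {self.source_path}"
        )
        return f"{self.text}\n\n{footer}"

    def to_envelope(self) -> dict:
        """Structured form with explicit source and confidence_tier fields."""
        return {
            "text": self.text,
            "source": self.source_path,
            "confidence_tier": self.confidence_tier.value,
        }


answer = Answer("Use nmcli to list connections.", "man_pages", "man:nmcli(1)")
print(answer.render())
```

Deriving the tier from the source kind, rather than letting the model pick it, keeps the footer deterministic: the same source always yields the same tier.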

Explicit refusal when evidence is absent

llama-conductor refuses to speculate when no grounding evidence exists. Robin must do the same: if a hardware compatibility question has no known answer in the local index, say so explicitly rather than generate a plausible-sounding but wrong answer.
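The refusal-first rule could be sketched roughly as below. Everything here is illustrative: `Evidence`, `lookup`, `answer`, and the keyword-match retrieval are hypothetical stand-ins for whatever index Robin ends up using, and the index entry is made-up sample data, not a real compatibility claim.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    source: str
    text: str


REFUSAL = (
    "I have no grounded information on that in the local index, "
    "so I won't guess."
)


def lookup(question: str, index: dict[str, Evidence]) -> list[Evidence]:
    """Toy retrieval: return entries whose key appears in the question."""
    q = question.lower()
    return [ev for key, ev in index.items() if key in q]


def answer(question: str, index: dict[str, Evidence]) -> str:
    evidence = lookup(question, index)
    if not evidence:
        # No grounding evidence: refuse explicitly instead of generating.
        return REFUSAL
    best = evidence[0]
    return f"{best.text}\n\nSource: {best.source}"


# Made-up sample data for illustration only.
index = {
    "examplecard 9000": Evidence(
        source="local index (sample entry)",
        text="ExampleCard 9000 is listed as working in the sample index.",
    )
}

print(answer("Does the ExampleCard 9000 work?", index))
print(answer("Does the OtherCard 42 work?", index))
```

The point of the shape is that the refusal branch sits before any generation step, so a missing index entry can never fall through to model inference by accident.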

Context compaction ("Vodka CTC")

llama-conductor bounds prompt size and applies configurable compaction pressure to prevent memory bloat during long assistant sessions. Robin is designed for ongoing conversations across a user's migration journey, so unmanaged context growth will degrade response quality and increase VRAM usage over time. Compaction with explicit preserved anchors (facts the user has confirmed) is the right pattern.
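One way to picture compaction with preserved anchors, as a rough sketch rather than llama-conductor's actual algorithm: `Turn`, `estimate_tokens`, and `compact` are hypothetical names, the token estimate is a crude word count, and this version drops old turns outright instead of summarizing them so that it stays model-free.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Turn:
    text: str
    anchor: bool = False  # True for facts the user has confirmed


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: count whitespace-separated words."""
    return len(text.split())


def compact(turns: list[Turn], budget: int) -> list[Turn]:
    """Drop the oldest non-anchor turns until the estimate fits the budget.

    Anchors are never dropped, so the result can still exceed the budget
    if the anchors alone are larger than it.
    """
    kept = list(turns)
    total = sum(estimate_tokens(t.text) for t in kept)
    i = 0
    while total > budget and i < len(kept):
        if kept[i].anchor:
            i += 1  # preserved: skip over it
            continue
        total -= estimate_tokens(kept[i].text)
        del kept[i]  # next turn slides into position i
    return kept


history = [
    Turn("my gpu is an example card", anchor=True),
    Turn("one two three four"),
    Turn("five six"),
    Turn("seven eight nine"),
]
for turn in compact(history, budget=10):
    print(turn.text)
```

Here `budget` plays the role of the compaction pressure knob: lowering it evicts more of the old, unconfirmed conversation while confirmed facts survive.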

Deterministic fact store (TTL-keyed, separate from model weights)

Users can store facts with !! and retrieve them with ??. Facts have a TTL and a touch mechanism. For Robin, this maps onto the user's confirmed hardware, installed packages, distro version, and stated preferences: stored deterministically, not inferred from conversation history each time.
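A minimal sketch of a TTL-keyed fact store, under stated assumptions: `FactStore` and its methods are hypothetical, and `touch` is assumed to reset a fact's expiry, since this issue does not spell out what llama-conductor's touch mechanism does. The clock is injectable so expiry can be exercised without sleeping.

```python
from __future__ import annotations

import time
from typing import Callable, Optional


class FactStore:
    """Deterministic key/value facts with per-entry expiry."""

    def __init__(
        self,
        default_ttl: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._default_ttl = default_ttl
        self._clock = clock
        self._facts: dict[str, tuple[str, float]] = {}  # key -> (value, expires_at)

    def put(self, key: str, value: str, ttl: Optional[float] = None) -> None:
        lifetime = self._default_ttl if ttl is None else ttl
        self._facts[key] = (value, self._clock() + lifetime)

    def get(self, key: str) -> Optional[str]:
        entry = self._facts.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._facts[key]  # expired: forget it rather than serve stale data
            return None
        return value

    def touch(self, key: str, ttl: Optional[float] = None) -> bool:
        """Assumed semantics: reset the expiry of a live fact."""
        value = self.get(key)
        if value is None:
            return False
        self.put(key, value, ttl)
        return True


store = FactStore(default_ttl=60.0)
store.put("distro", "Debian 12")
print(store.get("distro"))
```

Because lookups either return the stored value or `None`, a fact like the distro version is never re-derived from conversation history; it is either on record or it is not.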

Next steps

- [ ] Define Robin's fact schema: hardware, distro, confirmed-working packages, user preferences
- [ ] Design confidence tier system for Robin answers
- [ ] Define compaction policy: what gets preserved vs. summarized as sessions grow
- [ ] Add source + confidence_tier to Robin's response envelope in API design
