Add circuitforge_core.memory module: MemoryClient wraps the mnemo HTTP sidecar for entity / relation storage. All operations no-op gracefully when sidecar is unavailable so products can import unconditionally. Adds optional [memory] extras entry in pyproject.toml (mnemo-sdk>=0.1.0).
4.2 KiB
circuitforge_core.memory
Persistent knowledge graph for CF products, backed by the mnemo sidecar.
What it does
mnemo runs as a sidecar process alongside a product's FastAPI backend. It:
- Extracts named entities and relationships from text you feed it
- Persists them in a local SQLite database with WAL mode
- Returns a formatted context block for prompt injection in under 5ms
cf_core.memory wraps mnemo's Python SDK with CF-standard config,
graceful degradation (no-ops when the sidecar is absent), and
exponential backoff with automatic reconnect after transient failures.
Install
pip install circuitforge-core[memory]
Docker Compose setup
Add the mnemo service to your product's compose.yml alongside ollama.
Peregrine is the reference implementation — copy the block from
peregrine/compose.yml:
services:
mnemo:
image: ghcr.io/zaydmulani09/mnemo:latest
ports:
- "${MNEMO_PORT:-8080}:8080"
volumes:
- mnemo-data:/data
environment:
- MNEMO_DB_PATH=/data/mnemo.db
- MNEMO_LLM_PROVIDER=${MNEMO_LLM_PROVIDER:-ollama}
- MNEMO_LLM_BASE_URL=${MNEMO_LLM_BASE_URL:-http://ollama:11434/v1}
- MNEMO_LLM_API_KEY=${MNEMO_LLM_API_KEY:-ollama}
- MNEMO_LLM_MODEL=${MNEMO_LLM_MODEL:-llama3.2:3b}
depends_on:
- ollama
healthcheck:
test: ["CMD", "wget", "-q", "--spider", "http://localhost:8080/health"]
interval: 15s
timeout: 5s
retries: 3
profiles: [memory]
restart: unless-stopped
volumes:
mnemo-data:
Add these to the product's api service environment:
environment:
- MNEMO_HOST=${MNEMO_HOST:-mnemo}
- MNEMO_PORT=${MNEMO_PORT:-8080}
Launch with:
docker compose --profile memory --profile cpu up -d
# or alongside a GPU profile:
docker compose --profile memory --profile single-gpu up -d
Environment variables
| Variable | Default | Description |
|---|---|---|
MNEMO_HOST |
localhost |
Sidecar hostname (use mnemo in Docker) |
MNEMO_PORT |
8080 |
Sidecar port |
MNEMO_TIMEOUT |
10.0 |
HTTP timeout in seconds |
The sidecar itself is configured via MNEMO_LLM_* env vars (see compose block above).
FastAPI integration
from contextlib import asynccontextmanager
from fastapi import FastAPI
from circuitforge_core.memory import MemoryClient, MemoryConfig
memory = MemoryClient(MemoryConfig.from_env())
@asynccontextmanager
async def lifespan(app: FastAPI):
await memory.connect() # no-op + warning if sidecar absent
yield
await memory.close()
app = FastAPI(lifespan=lifespan)
API
# Store a text fragment (conversation turn, fact, user preference, etc.)
await memory.remember("User avoids shellfish and prefers dark mode", source="settings")
# Retrieve a prompt-ready context block
context = await memory.recall("What are this user's dietary restrictions?")
system_prompt = f"You are a helpful assistant.\n\n{context}"
# List extracted entities
entities = await memory.entities(limit=20)
# Stats snapshot
stats = await memory.stats() # MemoryStats | None
# Wipe everything (irreversible)
await memory.wipe()
All methods return empty values (False, "", [], None) when the
sidecar is not available — no try/except needed in product code.
Resilience model
| Event | Behaviour |
|---|---|
| Sidecar absent at startup | connect() logs once, enters no-op mode |
| First call failure | Warning logged, 5s backoff scheduled |
| Nth consecutive failure | Backoff doubles each time (5→10→20→40→60s cap) |
After _MAX_FAILURES (3) |
Client marked unavailable; all calls no-op |
| Cooldown elapses | Next call silently attempts reconnect |
| Successful call | Failure counter and retry timer reset |
strict=True |
MemoryUnavailableError raised instead of no-op |
Chunking note
mnemo stores each remember() call as a single chunk — it does not
automatically split large texts. For best retrieval quality, chunk on the
caller side before ingesting:
# Good: one turn per ingest call
for turn in conversation_turns:
await memory.remember(turn, source="chat", session_id=session_id)
# Avoid: one giant blob
await memory.remember(entire_conversation_as_one_string)