text
Text processing utilities. Normalization, truncation, chunking, and token estimation — shared across all products that manipulate text before or after LLM inference.
from circuitforge_core.text import normalize, chunk, truncate, estimate_tokens
normalize(text: str) -> str
Strips excess whitespace, normalizes unicode (NFC), and removes null bytes and control characters that can cause downstream issues with SQLite FTS5 or LLM tokenizers.
from circuitforge_core.text import normalize
clean = normalize(" Hello\u00a0world\x00 ")
# → "Hello world"
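As a rough illustration of the cleanup steps described above, the following standard-library sketch applies NFC normalization, drops non-whitespace control characters, and collapses whitespace runs. The name normalize_sketch is hypothetical and this is not the library's implementation; in particular, it flattens paragraph breaks into single spaces, which the real normalize may not do.

```python
import unicodedata


def normalize_sketch(text: str) -> str:
    """Hypothetical stand-in for normalize(); standard library only."""
    # Compose characters into canonical (NFC) form.
    text = unicodedata.normalize("NFC", text)
    # Drop control characters (category Cc) such as null bytes, but keep
    # whitespace controls like tab and newline so words stay separated.
    kept = "".join(
        ch for ch in text
        if ch.isspace() or unicodedata.category(ch) != "Cc"
    )
    # Collapse every whitespace run (including U+00A0) to a single space.
    return " ".join(kept.split())


print(normalize_sketch(" Hello\u00a0world\x00 "))  # Hello world
```

Note that NFC alone leaves the non-breaking space untouched; in this sketch it is the whitespace-collapsing step that turns it into a regular space.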
truncate(text: str, max_tokens: int, model: str = "default") -> str
Truncates text to approximately max_tokens tokens, breaking at sentence or paragraph boundaries where possible. Uses a simple byte-based heuristic (1 token ≈ 4 bytes) unless a specific model tokenizer is requested.
excerpt = truncate(long_doc, max_tokens=2048)
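To make the byte-based heuristic concrete, here is a minimal sketch of how such a truncation could work, assuming 4 bytes per token and treating ". ", "! ", "? " and blank lines as boundaries. The name truncate_sketch and the boundary rules are assumptions for illustration, not the library's actual logic.

```python
def truncate_sketch(text: str, max_tokens: int, bytes_per_token: int = 4) -> str:
    """Hypothetical stand-in for truncate() using the 4-bytes-per-token rule."""
    budget = max_tokens * bytes_per_token
    data = text.encode("utf-8")
    if len(data) <= budget:
        return text
    # Cut at the byte budget; errors="ignore" drops a multi-byte character
    # that the cut split in half.
    head = data[:budget].decode("utf-8", errors="ignore")
    # Candidate cut points: the last paragraph break, or just after the
    # last sentence-ending punctuation mark that is followed by a space.
    candidates = [head.rfind("\n\n")]
    for mark in (". ", "! ", "? "):
        pos = head.rfind(mark)
        if pos != -1:
            candidates.append(pos + 1)
    cut = max(candidates)
    # Fall back to the raw cut when no boundary was found.
    return head[:cut].rstrip() if cut > 0 else head.rstrip()


doc = "First sentence. Second sentence. Third sentence that is long."
print(truncate_sketch(doc, max_tokens=6))  # First sentence.
```

The result never exceeds the byte budget, so it may come out noticeably shorter than max_tokens when the nearest boundary is far from the cut.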
chunk(text: str, chunk_size: int, overlap: int = 0) -> list[str]
Splits text into overlapping chunks for RAG (retrieval-augmented generation) pipelines. Respects paragraph boundaries.
chunks = chunk(article_text, chunk_size=512, overlap=64)
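The sketch below shows one way overlapping, paragraph-aware chunking can be done. It is an assumption-heavy illustration: sizes are counted in characters (the unit of chunk_size is not stated here), paragraphs are split on blank lines, and the name chunk_sketch is hypothetical. A single paragraph longer than chunk_size is kept whole, and the carried-over overlap can push a chunk slightly past chunk_size.

```python
def chunk_sketch(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Hypothetical stand-in for chunk(); sizes are in characters."""
    if overlap < 0 or overlap >= chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        # Keep packing paragraphs while they fit; an oversized first
        # paragraph is accepted as-is rather than split mid-paragraph.
        if len(candidate) <= chunk_size or not current:
            current = candidate
            continue
        chunks.append(current)
        # Carry the tail of the finished chunk into the next one.
        tail = current[-overlap:] if overlap else ""
        current = f"{tail}\n\n{para}" if tail else para
    if current:
        chunks.append(current)
    return chunks


print(chunk_sketch("aaaa\n\nbbbb\n\ncccc", chunk_size=10, overlap=2))
# ['aaaa\n\nbbbb', 'bb\n\ncccc']
```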
estimate_tokens(text: str, model: str = "default") -> int
Estimates token count without loading a full tokenizer. Accurate enough for context window budget planning (within ~10%).
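For context window budget planning, an estimate with roughly 10% error suggests padding the total before comparing it to the window. The sketch below assumes the same 4-bytes-per-token heuristic that truncate uses; estimate_tokens_sketch and fits_context are hypothetical names, and the real estimate_tokens may compute its figure differently.

```python
import math


def estimate_tokens_sketch(text: str, bytes_per_token: int = 4) -> int:
    """Hypothetical estimator: UTF-8 byte length divided by 4, rounded up."""
    return math.ceil(len(text.encode("utf-8")) / bytes_per_token)


def fits_context(parts, context_window, reserved_output, margin=0.10):
    """Return True if the prompt parts plus reserved output fit the window.

    The estimate is padded by `margin` to absorb estimation error.
    """
    estimated = sum(estimate_tokens_sketch(p) for p in parts)
    padded = math.ceil(estimated * (1 + margin))
    return padded + reserved_output <= context_window


system_prompt = "You are a helpful assistant."
document = "a" * 4000  # about 1000 estimated tokens
print(fits_context([system_prompt, document], context_window=2048, reserved_output=512))
```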
FTS5 helpers
SQLite FTS5 has quirks with special characters in MATCH expressions. The text module provides helpers used by the recipe engine and other FTS5 consumers:
from circuitforge_core.text import fts_quote, strip_apostrophes
# Always double-quote FTS5 terms — bare tokens break on brand names
query = " ".join(fts_quote(term) for term in tokens)
# → '"chicken" "breast" "lemon"'
# Strip apostrophes before FTS5 queries
clean = strip_apostrophes("O'Doul's")
# → "ODouls"
!!! warning "FTS5 gotcha"
    Always quote ALL terms in MATCH expressions. Bare tokens break on brand names (e.g., O'Doul's), plant-based ingredient names, and anything with punctuation.
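The quoting rule can be sketched in a few lines. FTS5 treats a double-quoted string as a literal string, and an embedded double quote is escaped by doubling it. The names fts_quote_sketch and build_match_query are hypothetical; whether the real fts_quote escapes embedded quotes in the same way is an assumption.

```python
def fts_quote_sketch(term: str) -> str:
    """Wrap a term in double quotes, doubling any embedded double quote."""
    return '"' + term.replace('"', '""') + '"'


def build_match_query(tokens: list[str]) -> str:
    """Quote every non-empty token and join them with spaces."""
    return " ".join(fts_quote_sketch(t) for t in tokens if t)


print(build_match_query(["chicken", "breast", "lemon"]))
# "chicken" "breast" "lemon"
```

The resulting string is meant to be passed as a bound parameter to a MATCH expression rather than interpolated into the SQL text.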
LLM inference service
circuitforge_core.text.app is a self-contained FastAPI inference server. It exposes a local LLM (or PII classifier) over HTTP so that products can call it via CF_TEXT_URL without bundling heavy ML dependencies themselves.
What are you running?
Three independent paths — pick one before installing:
| Path | Use case | Extra |
|---|---|---|
| LLM inference | Chat, completion, summarisation using a GGUF or HuggingFace model | text-llamacpp or text-transformers |
| VLM inference | Vision-language model that accepts images alongside text | text-llamacpp (GGUF with --mmproj) or text-transformers |
| Classifier / PII filter | NER-based PII detection and redaction | text-transformers |
LLM inference (GGUF via llama.cpp)
pip install "circuitforge-core[text-llamacpp]"
python -m circuitforge_core.text.app \
--model /path/to/model.gguf \
--port 8006 \
--gpu-id 0
4-bit quantisation (GGUF files ending in q4_k_m, q4_0, etc.) runs well on 6–8 GB VRAM. Unquantised half-precision (f16) files require more.
Multi-GPU (splits across two GPUs via device_map=auto):
python -m circuitforge_core.text.app \
--model /path/to/large-model \
--port 8006 \
--gpu-ids 0,1
LLM inference (HuggingFace transformers)
pip install "circuitforge-core[text-transformers]"
# 4-bit quantisation (bitsandbytes):
pip install "circuitforge-core[text-transformers-4bit]"
python -m circuitforge_core.text.app \
--model /path/to/model-or-hf-repo \
--backend transformers \
--port 8006
VLM inference (GGUF with mmproj)
LLaVA-style models (LLaVA, BakLLaVA, llava-phi) require a separate projector file (--mmproj):
python -m circuitforge_core.text.app \
--model /path/to/llava-model.gguf \
--mmproj /path/to/mmproj.gguf \
--port 8006 \
--gpu-id 0
Embedded VLMs (Qwen2-VL, MiniCPM-V, Moondream) have the projector baked in — no --mmproj needed.
Sending images via the multimodal API:
POST /chat
{
"messages": [
{
"role": "user",
"content": [
{"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<b64>"}},
{"type": "text", "text": "What is in this document?"}
]
}
]
}
Sending an image to a text-only model returns HTTP 422.
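A client might assemble the request body above as follows. This is a minimal standard-library sketch: build_image_chat_payload and post_chat are hypothetical helper names, the base URL is whatever the server was started on, and the response is returned as parsed JSON without assuming its shape.

```python
import base64
import json
import urllib.request


def build_image_chat_payload(image_bytes: bytes, prompt: str,
                             mime_type: str = "image/jpeg") -> dict:
    """Build a /chat body with one image block followed by one text block."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url",
                     "image_url": {"url": f"data:{mime_type};base64,{b64}"}},
                    {"type": "text", "text": prompt},
                ],
            }
        ]
    }


def post_chat(base_url: str, payload: dict) -> dict:
    """POST the payload to /chat and return the decoded JSON response."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for non-2xx statuses, so the
    # 422 returned by a text-only model surfaces as an exception here.
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With a server running, a call would look like post_chat("http://localhost:8006", build_image_chat_payload(open("scan.jpg", "rb").read(), "What is in this document?")), where scan.jpg is a placeholder file name.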
Classifier / PII filter
pip install "circuitforge-core[text-transformers]"
python -m circuitforge_core.text.app \
--backend classifier \
--model dslim/bert-base-NER \
--port 8006
Recommended model for English PII detection: dslim/bert-base-NER. Substituting other HuggingFace NER models is supported.
Calling the filter endpoint:
POST /filter
{
"text": "Please contact John Smith at john@example.com.",
"mode": "redact"
}
Modes: redact (replace spans with [REDACTED]), detect (return boolean), spans (return span list with labels and confidence).
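The following sketch builds a /filter request with the standard library and rejects modes other than the three listed above. build_filter_request is a hypothetical helper, and the response format for each mode is not specified in this section, so the sketch stops at constructing the request.

```python
import json
import urllib.request

VALID_MODES = {"redact", "detect", "spans"}


def build_filter_request(base_url: str, text: str,
                         mode: str = "redact") -> urllib.request.Request:
    """Build (but do not send) a POST /filter request."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    body = json.dumps({"text": text, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/filter",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_filter_request(
    "http://localhost:8006",
    "Please contact John Smith at john@example.com.",
)
print(req.get_method(), req.full_url)  # POST http://localhost:8006/filter
```

Passing the request to urllib.request.urlopen sends it; inspect the raw JSON first, since the exact fields returned per mode are not documented here.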
Mock mode (no model required)
CF_TEXT_MOCK=1 python -m circuitforge_core.text.app --port 8006
Returns deterministic canned responses for all endpoints. No GPU, no model download. Suitable for CI and integration testing.
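In CI, the test job usually needs to wait until the server accepts requests before running. The sketch below polls GET /health until the status field reads "ok", matching the health payload listed under API endpoints. wait_until_healthy and fetch_health are hypothetical helpers, and the timeout values are arbitrary.

```python
import json
import time
import urllib.request


def fetch_health(base_url: str) -> dict:
    """Fetch and decode GET /health."""
    url = base_url.rstrip("/") + "/health"
    with urllib.request.urlopen(url, timeout=2) as resp:
        return json.loads(resp.read())


def wait_until_healthy(base_url: str, timeout_s: float = 30.0,
                       interval_s: float = 0.5, fetch=fetch_health) -> dict:
    """Poll /health until status is "ok" or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            payload = fetch(base_url)
            if payload.get("status") == "ok":
                return payload
        except (OSError, ValueError):
            # Connection refused or invalid JSON: server is not ready yet.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{base_url} not healthy after {timeout_s}s")
        time.sleep(interval_s)
```

The fetch parameter exists so the polling logic can be unit-tested with a fake function instead of a live server.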
Configuration
| Variable | Default | Description |
|---|---|---|
| CF_TEXT_URL | (none) | URL products use to reach cf-text (e.g. http://localhost:8006) |
| CF_TEXT_MOCK | (none) | Set to 1 to enable mock mode |
CLI flags: --model, --backend (llamacpp/transformers/classifier/mock), --port, --gpu-id, --gpu-ids, --mmproj.
API endpoints
| Endpoint | Backend | Description |
|---|---|---|
| GET /health | all | {"status":"ok","model":str,"backend":str,"vram_mb":int} |
| POST /generate | text-gen | Single prompt completion |
| POST /chat | text-gen | OpenAI-compatible chat (supports multimodal content blocks) |
| POST /v1/chat/completions | text-gen | OpenAI-compatible alias for /chat |
| POST /filter | classifier | PII detection and redaction |
Connecting from a product
CF_TEXT_URL=http://localhost:8006
Products using cf-core's LLM router pick this up automatically when the text backend is enabled in config/llm.yaml.
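For a script that talks to the service directly instead of going through the LLM router, resolving endpoints from CF_TEXT_URL might look like the sketch below. cf_text_endpoint is a hypothetical helper; failing loudly when the variable is unset is a design choice made for this example, not documented behaviour.

```python
import os


def cf_text_endpoint(path: str, env=None) -> str:
    """Join CF_TEXT_URL with an endpoint path such as "/chat"."""
    env = os.environ if env is None else env
    base = env.get("CF_TEXT_URL")
    if not base:
        raise RuntimeError("CF_TEXT_URL is not set")
    # Normalise slashes so both "http://host:8006/" and "chat" work.
    return base.rstrip("/") + "/" + path.lstrip("/")


print(cf_text_endpoint("/chat", {"CF_TEXT_URL": "http://localhost:8006"}))
# http://localhost:8006/chat
```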