circuitforge-core/docs/modules/text.md
pyr0ball 5eab4c43a4
docs: self-hoster service docs for text, video, and mqtt modules
text.md: add LLM inference service section with three-path decision
table (GGUF/transformers/VLM mmproj/classifier), multimodal content-
block API, mock mode, CF_TEXT_URL wiring.

video.md: new file covering Marlin-2B service, server-local video_path
callout, CUDA 13 nightly path, trust_remote_code note, MIT/BSL boundary
(current wrapper is MIT; special sauce pipelines go in separate BSL
module, not cf-core).

mqtt.md: new file covering broker vs serial decision tree, MQTTClient
usage, TopicRouter.matches() NotImplementedError with workaround, install
extras.
2026-06-05 11:59:48 -07:00


text

Text processing utilities. Normalization, truncation, chunking, and token estimation — shared across all products that manipulate text before or after LLM inference.

from circuitforge_core.text import normalize, chunk, truncate, estimate_tokens

normalize(text: str) -> str

Strips excess whitespace, normalizes unicode (NFC), and removes null bytes and control characters that can cause downstream issues with SQLite FTS5 or LLM tokenizers.

from circuitforge_core.text import normalize

clean = normalize("  Hello\u00a0world\x00  ")
# → "Hello world"
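
The steps described above can be approximated with the standard library. This is only a sketch, not the module's actual implementation: normalize_sketch is a hypothetical name, and keeping newlines while collapsing only horizontal whitespace is an assumption.

```python
import re
import unicodedata


def normalize_sketch(text: str) -> str:
    """Approximate the documented normalize() steps (hypothetical helper)."""
    # 1. Unicode NFC normalization: compose combining sequences.
    text = unicodedata.normalize("NFC", text)
    # 2. Drop null bytes and other control characters (category "Cc").
    #    Newlines and tabs are kept here (an assumption of this sketch);
    #    tabs are turned into spaces by the next step.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    # 3. Collapse runs of non-newline whitespace (including U+00A0) to one space.
    text = re.sub(r"[^\S\n]+", " ", text)
    # 4. Trim leading and trailing whitespace.
    return text.strip()


print(normalize_sketch("  Hello\u00a0world\x00  "))
```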

truncate(text: str, max_tokens: int, model: str = "default") -> str

Truncates text to approximately max_tokens tokens, breaking at sentence or paragraph boundaries where possible. Uses a simple byte-based heuristic (1 token ≈ 4 bytes) unless a specific model tokenizer is requested.

excerpt = truncate(long_doc, max_tokens=2048)
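
To make the byte-based heuristic concrete, here is a minimal sketch of truncation at a sentence boundary. It assumes the 1 token ≈ 4 bytes rule stated above; truncate_sketch and its sentence-ending regex are hypothetical and may differ from what the module does internally.

```python
import re

BYTES_PER_TOKEN = 4  # heuristic from the text above


def truncate_sketch(text: str, max_tokens: int) -> str:
    """Cut text to a byte budget, then back up to the last sentence end."""
    budget = max_tokens * BYTES_PER_TOKEN
    raw = text.encode("utf-8")
    if len(raw) <= budget:
        return text
    # Hard cut at the byte budget; errors="ignore" drops a trailing
    # partial multi-byte character instead of raising.
    cut = raw[:budget].decode("utf-8", errors="ignore")
    # Prefer ending on the last sentence terminator inside the cut.
    ends = [m.end() for m in re.finditer(r"[.!?](?=\s|$)", cut)]
    if ends:
        return cut[: ends[-1]]
    # No sentence boundary found: fall back to the hard cut.
    return cut.rstrip()


print(truncate_sketch("First sentence. Second sentence is long.", max_tokens=5))
```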

chunk(text: str, chunk_size: int, overlap: int = 0) -> list[str]

Splits text into chunks for RAG (retrieval-augmented generation) pipelines, with adjacent chunks sharing overlap units of context when overlap is greater than 0. Respects paragraph boundaries.

chunks = chunk(article_text, chunk_size=512, overlap=64)
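
The overlap behaviour can be pictured as a sliding window. The sketch below measures chunk_size and overlap in whitespace-separated words and ignores paragraph boundaries; both are simplifying assumptions, and chunk_sketch is a hypothetical name rather than the module's implementation.

```python
def chunk_sketch(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Sliding-window chunking over words (units are an assumption)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


for piece in chunk_sketch("a b c d e f g h i j", chunk_size=4, overlap=1):
    print(piece)
```

Each chunk starts with the last overlap words of the previous one, so a sentence that straddles a boundary is still retrievable from at least one chunk.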

estimate_tokens(text: str, model: str = "default") -> int

Estimates token count without loading a full tokenizer. Accurate enough for context window budget planning (within ~10%).
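
A typical use is checking whether a prompt fits a context window before sending it. The sketch below assumes estimate_tokens applies the same 4-bytes-per-token heuristic that truncate documents, which the text does not confirm; estimate_tokens_sketch and fits_context are hypothetical helpers.

```python
import math

BYTES_PER_TOKEN = 4  # assumed to match the truncate() heuristic


def estimate_tokens_sketch(text: str) -> int:
    """Rough token count from the UTF-8 byte length."""
    return math.ceil(len(text.encode("utf-8")) / BYTES_PER_TOKEN)


def fits_context(prompt: str, context_window: int, reserve_for_output: int) -> bool:
    """True if the prompt plus the reserved output budget fits the window."""
    return estimate_tokens_sketch(prompt) + reserve_for_output <= context_window


prompt = "Summarise the following article in three bullet points."
print(estimate_tokens_sketch(prompt), fits_context(prompt, 4096, 512))
```

Because the estimate is approximate, leaving some headroom in reserve_for_output is a reasonable precaution.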

FTS5 helpers

SQLite FTS5 has quirks with special characters in MATCH expressions. The text module provides helpers used by the recipe engine and other FTS5 consumers:

from circuitforge_core.text import fts_quote, strip_apostrophes

# Always double-quote FTS5 terms — bare tokens break on brand names
query = " ".join(fts_quote(term) for term in tokens)
# → '"chicken" "breast" "lemon"'

# Strip apostrophes before FTS5 queries
clean = strip_apostrophes("O'Doul's")
# → "ODouls"

!!! warning "FTS5 gotcha"
    Always quote ALL terms in MATCH expressions. Bare tokens break on brand names (e.g., O'Doul's), plant-based ingredient names, and anything with punctuation.
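
For readers who want to see what the quoting amounts to, here is a sketch of equivalent helpers. It relies on the FTS5 rule that a double-quoted string escapes an embedded double quote by doubling it; the function names are hypothetical stand-ins, not the module's code.

```python
def strip_apostrophes_sketch(term: str) -> str:
    """Remove straight and typographic apostrophes."""
    return term.replace("'", "").replace("\u2019", "")


def fts_quote_sketch(term: str) -> str:
    """Wrap a term in double quotes, doubling any embedded double quote."""
    return '"' + term.replace('"', '""') + '"'


def build_match(tokens: list[str]) -> str:
    """Build a MATCH expression in which every term is quoted."""
    return " ".join(fts_quote_sketch(strip_apostrophes_sketch(t)) for t in tokens)


print(build_match(["chicken", "breast", "lemon"]))
```

The resulting string is meant to be passed as a bound parameter (for example `WHERE recipes MATCH ?`, with a hypothetical table name) rather than interpolated into the SQL text.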


LLM inference service

circuitforge_core.text.app is a self-contained FastAPI inference server. It exposes a local LLM (or PII classifier) over HTTP so that products can call it via CF_TEXT_URL without bundling heavy ML dependencies themselves.

What are you running?

Three independent paths — pick one before installing:

| Path | Use case | Extra |
|------|----------|-------|
| LLM inference | Chat, completion, summarisation using a GGUF or HuggingFace model | text-llamacpp or text-transformers |
| VLM inference | Vision-language model that accepts images alongside text | text-llamacpp (GGUF with --mmproj) or text-transformers |
| Classifier / PII filter | NER-based PII detection and redaction | text-transformers |

LLM inference (GGUF via llama.cpp)

pip install "circuitforge-core[text-llamacpp]"
python -m circuitforge_core.text.app \
    --model /path/to/model.gguf \
    --port 8006 \
    --gpu-id 0

4-bit quantisation (GGUF files ending in q4_k_m, q4_0, etc.) runs well on 6-8 GB of VRAM. Unquantised (f16) weights require more.
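
A back-of-the-envelope estimate shows why quantisation matters for VRAM. The sketch below counts weight storage only and ignores the KV cache and runtime overhead; the 7B parameter count is a hypothetical example, and 4.5 bits per weight is the q4_0 block layout (4-bit values plus one 16-bit scale per 32 weights).

```python
GIB = 2**30


def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Memory needed for the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / GIB


# Hypothetical 7B-parameter model at two precisions.
for label, bpw in [("q4_0", 4.5), ("f16", 16.0)]:
    print(f"7B @ {label}: ~{weight_gib(7e9, bpw):.1f} GiB of weights")
```

Under these assumptions the quantised weights take roughly 3.7 GiB against about 13 GiB for f16, before any context memory is added.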

Multi-GPU (splits across two GPUs via device_map=auto):

python -m circuitforge_core.text.app \
    --model /path/to/large-model \
    --port 8006 \
    --gpu-ids 0,1

LLM inference (HuggingFace transformers)

pip install "circuitforge-core[text-transformers]"
# 4-bit quantisation (bitsandbytes):
pip install "circuitforge-core[text-transformers-4bit]"
python -m circuitforge_core.text.app \
    --model /path/to/model-or-hf-repo \
    --backend transformers \
    --port 8006

VLM inference (GGUF with mmproj)

LLaVA-style models (LLaVA, BakLLaVA, llava-phi) require a separate projector file (--mmproj):

python -m circuitforge_core.text.app \
    --model /path/to/llava-model.gguf \
    --mmproj /path/to/mmproj.gguf \
    --port 8006 \
    --gpu-id 0

Embedded VLMs (Qwen2-VL, MiniCPM-V, Moondream) have the projector baked in — no --mmproj needed.

Sending images via the multimodal API:

POST /chat
{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<b64>"}},
        {"type": "text", "text": "What is in this document?"}
      ]
    }
  ]
}

Sending an image to a text-only model returns HTTP 422.
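
From Python, the request body above can be assembled with the standard library. This is a sketch: build_image_chat_payload and post_chat are hypothetical helpers, and the response is returned as parsed JSON without further interpretation because its shape is not documented here.

```python
import base64
import json
import urllib.request


def build_image_chat_payload(image_bytes: bytes, question: str,
                             mime: str = "image/jpeg") -> dict:
    """Build a /chat body with one image block and one text block."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url",
                     "image_url": {"url": f"data:{mime};base64,{b64}"}},
                    {"type": "text", "text": question},
                ],
            }
        ]
    }


def post_chat(base_url: str, payload: dict, timeout: float = 60.0) -> dict:
    """POST the payload to /chat and return the decoded JSON response."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # A non-2xx status (such as the 422 above) raises urllib.error.HTTPError.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

A call would look like `post_chat("http://localhost:8006", build_image_chat_payload(data, "What is in this document?"))`, where data holds the raw image bytes.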


Classifier / PII filter

pip install "circuitforge-core[text-transformers]"
python -m circuitforge_core.text.app \
    --backend classifier \
    --model dslim/bert-base-NER \
    --port 8006

Recommended model for English PII detection: dslim/bert-base-NER. Substituting other HuggingFace NER models is supported.

Calling the filter endpoint:

POST /filter
{
  "text": "Please contact John Smith at john@example.com.",
  "mode": "redact"
}

Modes: redact (replace spans with [REDACTED]), detect (return boolean), spans (return span list with labels and confidence).
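
A small client-side sketch for the filter endpoint, validating the mode before the request leaves the process. build_filter_request is a hypothetical helper; the response format for each mode is not specified beyond the summary above, so decoding is left to the caller.

```python
import json
import urllib.request

FILTER_MODES = ("redact", "detect", "spans")  # modes listed above


def build_filter_request(base_url: str, text: str,
                         mode: str = "redact") -> urllib.request.Request:
    """Build (but do not send) a POST /filter request."""
    if mode not in FILTER_MODES:
        raise ValueError(f"mode must be one of {FILTER_MODES}, got {mode!r}")
    body = json.dumps({"text": text, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/filter",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_filter_request(
    "http://localhost:8006",
    "Please contact John Smith at john@example.com.",
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` sends it to a running server.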


Mock mode (no model required)

CF_TEXT_MOCK=1 python -m circuitforge_core.text.app --port 8006

Returns deterministic canned responses for all endpoints. No GPU, no model download. Suitable for CI and integration testing.
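
In a test suite, mock mode can be started as a subprocess and polled until it answers. The sketch below assumes the /health payload documented under API endpoints; start_mock_server and wait_for_health are hypothetical helpers, and the timeouts are arbitrary.

```python
import http.client
import json
import os
import subprocess
import sys
import time
import urllib.request


def start_mock_server(port: int = 8006) -> subprocess.Popen:
    """Launch cf-text in mock mode as a child process."""
    env = dict(os.environ, CF_TEXT_MOCK="1")
    return subprocess.Popen(
        [sys.executable, "-m", "circuitforge_core.text.app", "--port", str(port)],
        env=env,
    )


def wait_for_health(base_url: str, timeout_s: float = 30.0,
                    interval_s: float = 0.5) -> bool:
    """Poll GET /health until it reports status "ok" or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    url = base_url.rstrip("/") + "/health"
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2.0) as resp:
                payload = json.load(resp)
            if isinstance(payload, dict) and payload.get("status") == "ok":
                return True
        except (OSError, ValueError, http.client.HTTPException):
            pass  # server not up yet, or response not valid JSON
        time.sleep(interval_s)
    return False
```

A fixture would call start_mock_server(), assert wait_for_health("http://localhost:8006"), run the tests, and terminate the process afterwards.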


Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| CF_TEXT_URL | | URL products use to reach cf-text (e.g. http://localhost:8006) |
| CF_TEXT_MOCK | | Set to 1 to enable mock mode |

CLI flags: --model, --backend (llamacpp/transformers/classifier/mock), --port, --gpu-id, --gpu-ids, --mmproj.


API endpoints

| Endpoint | Backend | Description |
|----------|---------|-------------|
| GET /health | all | {"status":"ok","model":str,"backend":str,"vram_mb":int} |
| POST /generate | text-gen | Single prompt completion |
| POST /chat | text-gen | OpenAI-compatible chat (supports multimodal content blocks) |
| POST /v1/chat/completions | text-gen | OpenAI-compatible alias for /chat |
| POST /filter | classifier | PII detection and redaction |
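
On the client side, the /health payload maps naturally onto a small typed record. This is a sketch built only from the four fields listed in the table; the Health class and parse_health are hypothetical, and whether any field can be null (for example in mock mode) is not documented.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Health:
    status: str
    model: str
    backend: str
    vram_mb: int


def parse_health(payload: dict) -> Health:
    """Convert a decoded /health response into a Health record."""
    return Health(
        status=str(payload["status"]),
        model=str(payload["model"]),
        backend=str(payload["backend"]),
        vram_mb=int(payload["vram_mb"]),
    )


# Illustrative values only, not real server output.
sample = '{"status": "ok", "model": "example.gguf", "backend": "llamacpp", "vram_mb": 5120}'
print(parse_health(json.loads(sample)))
```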

Connecting from a product

CF_TEXT_URL=http://localhost:8006

Products using cf-core's LLM router pick this up automatically when the text backend is enabled in config/llm.yaml.
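
Code that talks to cf-text directly, without the LLM router, can resolve the endpoint from the same variable. The helper below is a hypothetical sketch and says nothing about how the router itself reads its configuration.

```python
import os
from collections.abc import Mapping
from typing import Optional


def cf_text_endpoint(path: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Join CF_TEXT_URL with an endpoint path, failing loudly if unset."""
    env = os.environ if env is None else env
    base = env.get("CF_TEXT_URL", "").strip()
    if not base:
        raise RuntimeError("CF_TEXT_URL is not set")
    return base.rstrip("/") + "/" + path.lstrip("/")


print(cf_text_endpoint("/chat", {"CF_TEXT_URL": "http://localhost:8006"}))
```

Failing early on a missing variable keeps a misconfigured product from silently sending requests to the wrong host.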