pyr0ball 5eab4c43a4

CI / test (push) Waiting to run

Details

Mirror / mirror (push) Waiting to run

Details

docs: self-hoster service docs for text, video, and mqtt modules

text.md: add LLM inference service section with three-path decision
table (GGUF/transformers/VLM mmproj/classifier), multimodal content-
block API, mock mode, CF_TEXT_URL wiring.

video.md: new file covering Marlin-2B service, server-local video_path
callout, CUDA 13 nightly path, trust_remote_code note, MIT/BSL boundary
(current wrapper is MIT; special sauce pipelines go in separate BSL
module, not cf-core).

mqtt.md: new file covering broker vs serial decision tree, MQTTClient
usage, TopicRouter.matches() NotImplementedError with workaround, install
extras.

2026-06-05 11:59:48 -07:00

6.2 KiB

Raw Blame History

text

Text processing utilities. Normalization, truncation, chunking, and token estimation — shared across all products that manipulate text before or after LLM inference.

from circuitforge_core.text import normalize, chunk, truncate, estimate_tokens

`normalize(text: str) -> str`

Strips excess whitespace, normalizes unicode (NFC), and removes null bytes and control characters that can cause downstream issues with SQLite FTS5 or LLM tokenizers.

from circuitforge_core.text import normalize

clean = normalize("  Hello\u00a0world\x00  ")
# → "Hello world"

`truncate(text: str, max_tokens: int, model: str = "default") -> str`

Truncates text to approximately max_tokens tokens, breaking at sentence or paragraph boundaries where possible. Uses a simple byte-based heuristic (1 token ≈ 4 bytes) unless a specific model tokenizer is requested.

excerpt = truncate(long_doc, max_tokens=2048)

`chunk(text: str, chunk_size: int, overlap: int = 0) -> list[str]`

Splits text into overlapping chunks for RAG (retrieval-augmented generation) pipelines. Respects paragraph boundaries.

chunks = chunk(article_text, chunk_size=512, overlap=64)

`estimate_tokens(text: str, model: str = "default") -> int`

Estimates token count without loading a full tokenizer. Accurate enough for context window budget planning (within ~10%).

FTS5 helpers

SQLite FTS5 has quirks with special characters in MATCH expressions. The text module provides helpers used by the recipe engine and other FTS5 consumers:

from circuitforge_core.text import fts_quote, strip_apostrophes

# Always double-quote FTS5 terms — bare tokens break on brand names
query = " ".join(fts_quote(term) for term in tokens)
# → '"chicken" "breast" "lemon"'

# Strip apostrophes before FTS5 queries
clean = strip_apostrophes("O'Doul's")
# → "ODoulS"

!!! warning "FTS5 gotcha" Always quote ALL terms in MATCH expressions. Bare tokens break on brand names (e.g., O'Doul's), plant-based ingredient names, and anything with punctuation.

LLM inference service

circuitforge_core.text.app is a self-contained FastAPI inference server. It exposes a local LLM (or PII classifier) over HTTP so that products can call it via CF_TEXT_URL without bundling heavy ML dependencies themselves.

What are you running?

Three independent paths — pick one before installing:

Path	Use case	Extra
LLM inference	Chat, completion, summarisation using a GGUF or HuggingFace model	`text-llamacpp` or `text-transformers`
VLM inference	Vision-language model that accepts images alongside text	`text-llamacpp` (GGUF with `--mmproj`) or `text-transformers`
Classifier / PII filter	NER-based PII detection and redaction	`text-transformers`

LLM inference (GGUF via llama.cpp)

pip install "circuitforge-core[text-llamacpp]"

python -m circuitforge_core.text.app \
    --model /path/to/model.gguf \
    --port 8006 \
    --gpu-id 0

4-bit quantisation (GGUF files ending in q4_k_m, q4_0, etc.) runs well on 6–8GB VRAM. Full-precision (f16) requires more.

Multi-GPU (splits across two GPUs via device_map=auto):

python -m circuitforge_core.text.app \
    --model /path/to/large-model \
    --port 8006 \
    --gpu-ids 0,1

LLM inference (HuggingFace transformers)

pip install "circuitforge-core[text-transformers]"
# 4-bit quantisation (bitsandbytes):
pip install "circuitforge-core[text-transformers-4bit]"

python -m circuitforge_core.text.app \
    --model /path/to/model-or-hf-repo \
    --backend transformers \
    --port 8006

VLM inference (GGUF with mmproj)

LLaVA-style models (LLaVA, BakLLaVA, llava-phi) require a separate projector file (--mmproj):

python -m circuitforge_core.text.app \
    --model /path/to/llava-model.gguf \
    --mmproj /path/to/mmproj.gguf \
    --port 8006 \
    --gpu-id 0

Embedded VLMs (Qwen2-VL, MiniCPM-V, Moondream) have the projector baked in — no --mmproj needed.

Sending images via the multimodal API:

POST /chat
{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<b64>"}},
        {"type": "text", "text": "What is in this document?"}
      ]
    }
  ]
}

Sending an image to a text-only model returns HTTP 422.

Classifier / PII filter

pip install "circuitforge-core[text-transformers]"

python -m circuitforge_core.text.app \
    --backend classifier \
    --model dslim/bert-base-NER \
    --port 8006

Recommended model for English PII detection: dslim/bert-base-NER. Substituting other HuggingFace NER models is supported.

Calling the filter endpoint:

POST /filter
{
  "text": "Please contact John Smith at john@example.com.",
  "mode": "redact"
}

Modes: redact (replace spans with [REDACTED]), detect (return boolean), spans (return span list with labels and confidence).

Mock mode (no model required)

CF_TEXT_MOCK=1 python -m circuitforge_core.text.app --port 8006

Returns deterministic canned responses for all endpoints. No GPU, no model download. Suitable for CI and integration testing.

Configuration

Variable	Default	Description
`CF_TEXT_URL`	—	URL products use to reach cf-text (e.g. `http://localhost:8006`)
`CF_TEXT_MOCK`	—	Set to `1` to enable mock mode

CLI flags: --model, --backend (llamacpp/transformers/classifier/mock), --port, --gpu-id, --gpu-ids, --mmproj.

API endpoints

Endpoint	Backend	Description
`GET /health`	all	`{"status":"ok","model":str,"backend":str,"vram_mb":int}`
`POST /generate`	text-gen	Single prompt completion
`POST /chat`	text-gen	OpenAI-compatible chat (supports multimodal content blocks)
`POST /v1/chat/completions`	text-gen	OpenAI-compatible alias for `/chat`
`POST /filter`	classifier	PII detection and redaction

Connecting from a product

CF_TEXT_URL=http://localhost:8006

Products using cf-core's LLM router pick this up automatically when the text backend is enabled in config/llm.yaml.

6.2 KiB Raw Blame History Unescape Escape