- video and mqtt rows added to module table
- text row clarified: utilities + LLM inference service with backend list
- Install section: video-service, mqtt, meshtastic-service, memory extras
- Version badge bumped to 0.21.0
- Tiers description: drop Ultra mention
text.md: add LLM inference service section with three-path decision
table (GGUF/transformers/VLM mmproj/classifier), multimodal content-
block API, mock mode, CF_TEXT_URL wiring.
video.md: new file covering Marlin-2B service, server-local video_path
callout, CUDA 13 nightly path, trust_remote_code note, MIT/BSL boundary
(current wrapper is MIT; special-sauce pipelines go in a separate BSL
module, not cf-core).
mqtt.md: new file covering broker vs serial decision tree, MQTTClient
usage, TopicRouter.matches() NotImplementedError with workaround, install
extras.
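TopicRouter.matches() currently raises NotImplementedError. One possible stopgap is a standalone matcher that follows the MQTT wildcard rules: '+' matches exactly one level, and '#' matches the parent level plus any child levels. topic_matches is a hypothetical helper, not necessarily the workaround described in mqtt.md.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if `topic` matches the MQTT subscription `topic_filter`.

    '+' matches exactly one topic level; '#' matches the parent level and
    any number of child levels, and must be the last level of the filter.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")

    # Topics starting with '$' (e.g. $SYS/...) are not matched by a leading wildcard.
    if topic.startswith("$") and filter_levels[0] in ("+", "#"):
        return False

    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: only valid as the final filter level.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False

    # Every filter level matched; the topic must not have extra levels.
    return len(filter_levels) == len(topic_levels)
```

If MQTTClient wraps paho-mqtt (an assumption), paho.mqtt.client.topic_matches_sub(sub, topic) performs the same check.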
PIIFilter and ClassifierBackend are privacy infrastructure, not
commercial AI features. Gating privacy controls behind a commercial
licence contradicts CF's privacy pillar. MIT applies per the repo root
LICENSE.
Add twine upload step to release workflow so circuitforge-core lands on
both public PyPI and the Circuit-Forge Forgejo Packages index
(--extra-index-url for cf-orch installs). Reuses FORGEJO_PYPI_TOKEN for
the release creation step. Update installation.md to document the editable
install pattern and the optional extras.
- cloud_session: bypass IPs now honour valid JWTs so logged-in devs
land on their own account DB; invalid/expired JWTs soft-fail to a guest
session instead of a hard 401 (public endpoints stay accessible with stale cookies)
- tasks/scheduler: log unhandled exceptions that escape run_task_fn to
prevent silent task stalls in the batch worker
- reranker: add module-level logger for structured log output
- text/transformers: use BitsAndBytesConfig for quantization (deprecated
load_in_4bit/load_in_8bit kwargs removed in transformers 4.40+)
- __init__: derive __version__ from installed package metadata so editable
installs always report the correct version string
Add circuitforge_core.memory module: MemoryClient wraps the mnemo HTTP sidecar
for entity / relation storage. All operations no-op gracefully when sidecar is
unavailable so products can import unconditionally. Adds optional [memory]
extras entry in pyproject.toml (mnemo-sdk>=0.1.0).
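A minimal sketch of the no-op-when-unavailable pattern. The class name, default URL, /entities endpoint, and payload shape are all hypothetical; mnemo's real API and MemoryClient's method names may differ.

```python
import json
import logging
import urllib.error
import urllib.request
from typing import Any, Optional

logger = logging.getLogger(__name__)


class SidecarMemoryClient:
    """Hypothetical sketch: every call degrades to a no-op if the sidecar is down."""

    def __init__(self, base_url: str = "http://127.0.0.1:8765", timeout: float = 2.0) -> None:
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout

    def _post(self, path: str, payload: dict[str, Any]) -> Optional[dict[str, Any]]:
        req = urllib.request.Request(
            f"{self.base_url}{path}",
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        try:
            with urllib.request.urlopen(req, timeout=self.timeout) as resp:
                return json.loads(resp.read().decode("utf-8"))
        except (urllib.error.URLError, OSError, ValueError) as exc:
            # Unreachable sidecar, timeout, HTTP error, or non-JSON body:
            # log and return None so the calling product keeps running.
            logger.debug("memory sidecar unavailable: %s", exc)
            return None

    def add_entity(self, name: str, kind: str) -> Optional[dict[str, Any]]:
        return self._post("/entities", {"name": name, "kind": kind})
```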
Add ClassifierBackend (NER/PII via transformers token-classification pipeline)
and TextFilter (redact / detect / spans modes). MockClassifierBackend provides
deterministic PII spans for tests and CI without GPU. Enables privacy-safe
pre-screening before LLM inference.
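A sketch of how the three TextFilter modes could relate, assuming a backend that returns character-offset spans. RegexMockClassifier, PIISpan, and filter_text are hypothetical stand-ins (an email regex replaces the transformers pipeline); the real MockClassifierBackend's spans are not reproduced here.

```python
import re
from dataclasses import dataclass
from typing import Literal, Union


@dataclass(frozen=True)
class PIISpan:
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    label: str   # entity type, e.g. "EMAIL"


class RegexMockClassifier:
    """Deterministic stand-in for a token-classification model (no GPU)."""

    _EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

    def classify(self, text: str) -> list[PIISpan]:
        return [PIISpan(m.start(), m.end(), "EMAIL") for m in self._EMAIL.finditer(text)]


Mode = Literal["redact", "detect", "spans"]


def filter_text(text: str, backend: RegexMockClassifier, mode: Mode) -> Union[str, bool, list[PIISpan]]:
    spans = backend.classify(text)
    if mode == "detect":
        return bool(spans)
    if mode == "spans":
        return spans
    # "redact": replace spans right-to-left so earlier offsets stay valid.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        text = text[: span.start] + f"[{span.label}]" + text[span.end :]
    return text
```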
Missing from the initial extras list; required by QwenVLVideoProcessor
at inference time. On CUDA 13 nodes must be installed from the PyTorch
nightly cu130 index to avoid a torch version downgrade:
pip install --index-url https://download.pytorch.org/whl/nightly/cu130 torch torchvision
Discovered during Muninn deployment (2026-05-26).
Humans own design, architecture, code review, testing, and
verification. LLMs are part of our development workflow.
Links to circuitforge.tech/positions for our full position.
CUDA defaults to FASTEST_FIRST device ordering, which does not match
nvidia-smi's PCI bus order on multi-GPU nodes. On Muninn, the RTX 3090
is cuda:0 and the Quadro RTX 4000 is cuda:1 — the opposite of nvidia-smi.
Two fixes:
1. Set CUDA_DEVICE_ORDER=PCI_BUS_ID so --gpu-id always matches nvidia-smi
and the muninn.yaml profile GPU index assignments.
2. Use direct assignment (os.environ[...] = ...) instead of setdefault —
setdefault silently no-ops if CUDA_VISIBLE_DEVICES is already present
in the environment (conda activation, prior run, system default).
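A minimal sketch of both fixes, using a hypothetical pin_gpu() helper. CUDA reads these variables when it initializes, so this has to run before torch (or anything else) touches the GPU.

```python
import os


def pin_gpu(gpu_id: int) -> None:
    """Make `gpu_id` refer to the same device as nvidia-smi's index."""
    # Enumerate devices in PCI bus order (what nvidia-smi shows) instead of
    # CUDA's default FASTEST_FIRST order.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Direct assignment: os.environ.setdefault() would keep a stale value
    # inherited from conda activation, a prior run, or the system environment.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
```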
Add the circuitforge_core.video package implementing the cf-video inference
service managed by cf-orch.
Service endpoints:
GET /health — liveness check; model name + VRAM
POST /caption — dense scene description + timestamped event list
POST /find — temporal grounding of a natural-language event query
Backend hierarchy:
VideoBackend (Protocol)
MarlinBackend — NemoStation/Marlin-2B via transformers>=5.7.0
MockVideoBackend — deterministic stub; no GPU required
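A sketch of how the Protocol and the mock could fit together. The method names, argument lists, and the Event/CaptionResult shapes are assumptions for illustration; only the endpoint purposes and the list[float] | None span type come from this commit.

```python
from dataclasses import dataclass, field
from typing import Optional, Protocol, runtime_checkable


@dataclass
class Event:
    description: str
    # (start_s, end_s) as list[float] | None so it serializes directly to JSON.
    span: Optional[list[float]] = None


@dataclass
class CaptionResult:
    summary: str
    events: list[Event] = field(default_factory=list)


@runtime_checkable
class VideoBackend(Protocol):
    model_name: str

    def caption(self, video_path: str, max_new_tokens: int) -> CaptionResult: ...

    def find(self, video_path: str, event: str) -> Optional[list[float]]: ...


class MockVideoBackend:
    """Deterministic stub: fixed output, no GPU or video file needed."""

    model_name = "mock"

    def caption(self, video_path: str, max_new_tokens: int) -> CaptionResult:
        return CaptionResult(
            summary=f"mock caption for {video_path}",
            events=[Event("person enters frame", [0.0, 2.5])],
        )

    def find(self, video_path: str, event: str) -> Optional[list[float]]:
        return [0.0, 2.5] if event else None
```

Because MockVideoBackend satisfies the Protocol structurally, the service can swap it in without inheritance.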
Pydantic request/response models enforce parameter bounds at the API
boundary (max_new_tokens ge/le, event min_length=1). Span is serialized
as list[float] | None for JSON compatibility.
MarlinBackend loads eagerly in __init__ so cf-orch's 2-second liveness
poll catches load failures immediately. The FORCE_QWENVL_VIDEO_READER env
var defaults to torchcodec (faster than the av path) and is set before
transformers is imported.
pyproject.toml extras:
video-marlin — torch, transformers, torchcodec, qwen-vl-utils, av, Pillow
video-service — video-marlin + fastapi + uvicorn
Test coverage: 46 tests across test_mock_backend.py and test_app.py.
All pass without a GPU or a real video file.
Closes: #71
Adds three-layer dedup infrastructure for community recipe posts:
- Migration 006: similar_to_ref self-FK, title lower() index, recipe_id index
- CommunityPost.similar_to_ref optional field (frozen dataclass, defaults None)
- SharedStore.search_similar_posts(): title ILIKE + recipe_id match, ordered by relevance
- insert_post() wires similar_to_ref into the INSERT
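A sketch of the similarity lookup. The shared store uses ILIKE (Postgres); this stand-in uses SQLite, whose LIKE is case-insensitive only for ASCII. The table name, the LIMIT, and the ranking rule (same recipe_id first, then exact title) are assumptions.

```python
import sqlite3
from typing import Optional


def search_similar_posts(
    conn: sqlite3.Connection, title: str, recipe_id: Optional[str] = None, limit: int = 5
) -> list[tuple]:
    """Return (id, title, recipe_id) rows resembling `title` or sharing `recipe_id`."""
    return conn.execute(
        """
        SELECT id, title, recipe_id
        FROM community_posts
        WHERE title LIKE '%' || :title || '%'
           OR (:recipe_id IS NOT NULL AND recipe_id = :recipe_id)
        ORDER BY
            COALESCE(recipe_id = :recipe_id, 0) DESC,  -- same recipe ranks first
            lower(title) = lower(:title) DESC,         -- then exact title matches
            id
        LIMIT :limit
        """,
        {"title": title, "recipe_id": recipe_id, "limit": limit},
    ).fetchall()
```

insert_post() would then store the top hit's id in similar_to_ref.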
- LLMRouter.__init__ now accepts a Path | dict; pagepiper ingest scripts
pass a runtime-constructed config dict instead of a temp file
- _check_ollama_model_pulled() preflight on embed(): checks /api/tags once
per backend URL and raises RuntimeError("...Fix: ollama pull <model>")
when the configured embedding model is not pulled; silently skips for
non-Ollama backends (vLLM, etc.) that don't expose /api/tags
- 6 new tests: dict init paths (x2) + preflight scenarios (x4)
- Existing embed tests updated to mock requests.get to avoid live Ollama calls
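A sketch of the preflight logic as a hypothetical stand-in for _check_ollama_model_pulled(), assuming Ollama's /api/tags response shape ({"models": [{"name": ...}]}). The HTTP call is injected as fetch_tags instead of requests.get so the example runs offline, and a None result stands in for a backend without /api/tags.

```python
from typing import Callable, Optional

# Model names per backend URL, so /api/tags is queried at most once per URL.
_pulled_models: dict[str, Optional[set[str]]] = {}


def check_ollama_model_pulled(
    base_url: str,
    model: str,
    fetch_tags: Callable[[str], Optional[dict]],
) -> None:
    """Raise RuntimeError if `model` is not pulled on the Ollama server at `base_url`."""
    if base_url not in _pulled_models:
        payload = fetch_tags(f"{base_url.rstrip('/')}/api/tags")
        _pulled_models[base_url] = (
            None if payload is None else {m["name"] for m in payload.get("models", [])}
        )
    names = _pulled_models[base_url]
    if names is None:
        return  # not an Ollama backend (e.g. vLLM): skip silently
    # Ollama reports untagged pulls as "<name>:latest".
    if model in names or f"{model}:latest" in names:
        return
    raise RuntimeError(f"Embedding model {model!r} is not pulled. Fix: ollama pull {model}")
```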
Adds embed(texts, model_override, fallback_order) to LLMRouter. Only
openai_compat backends are tried (Ollama/vLLM expose /v1/embeddings;
anthropic and vision_service do not). Uses embedding_model from backend
config when present, falls back to the chat model otherwise. Supports
cf-orch allocation and raises RuntimeError when all backends are exhausted.
4 tests added (TDD: RED → GREEN), 763 total passing, no regressions.
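A sketch of the fallback loop, written as a hypothetical free function embed_with_fallback(). The backend config keys (type, model, embedding_model), the injected call_embeddings transport, and the omission of cf-orch allocation are simplifications, not the router's actual internals.

```python
from typing import Callable, Optional

EmbedFn = Callable[[dict, str, list[str]], list[list[float]]]


def embed_with_fallback(
    texts: list[str],
    backends: dict[str, dict],
    fallback_order: list[str],
    call_embeddings: EmbedFn,
    model_override: Optional[str] = None,
) -> list[list[float]]:
    errors: list[str] = []
    for name in fallback_order:
        cfg = backends.get(name, {})
        # Only OpenAI-compatible servers expose /v1/embeddings.
        if cfg.get("type") != "openai_compat":
            continue
        # Prefer an explicit override, then the embedding model, then the chat model.
        model = model_override or cfg.get("embedding_model") or cfg["model"]
        try:
            return call_embeddings(cfg, model, texts)
        except Exception as exc:  # record the failure and try the next backend
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All embedding backends exhausted: " + "; ".join(errors))
```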
Implements the VectorStore ABC using sqlite-vec virtual tables.
Two-table design (vec0 virtual + companion meta) supports upsert,
top-k ANN query with optional metadata post-filter, delete by ID,
and bulk delete_where. Also renames VectorMatch.id → entry_id to
avoid shadowing the Python builtin, updating base.py and all tests.
Installed: sqlite-vec 0.1.9
Tests: 16 passed (7 base + 9 integration)
VectorMatch.entry_id renamed to VectorMatch.id to match the API contract
expected by downstream consumers (pagepiper T7). The dataclass remains frozen
to prevent field reassignment; metadata is kept as plain dict for JSON
deserialization compatibility.
- Renamed VectorMatch.entry_id field to id
- Updated all test references to use .id accessor
- Simplified metadata to plain dict (removed MappingProxyType wrapping)
- All 7 tests passing
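A sketch of the resulting dataclass. Only id and metadata come from this commit; score is a hypothetical extra field, and match_from_json is an illustrative helper.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class VectorMatch:
    id: str                     # renamed back from entry_id
    score: float                # hypothetical field for this sketch
    metadata: dict[str, Any] = field(default_factory=dict)  # plain dict: JSON round-trips


def match_from_json(payload: str) -> VectorMatch:
    # json.loads yields plain dicts, which fit the metadata field directly.
    return VectorMatch(**json.loads(payload))
```

frozen=True rejects m.id = ..., but the dict inside stays mutable. Wrapping it in types.MappingProxyType would block that, at the cost that json.dumps cannot serialize a mappingproxy.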
- Add module-level guards for pytesseract and PIL.Image (enables patching in tests)
- Move `import io` from inside _ocr_page to module-level stdlib imports
- Extract _ensure_pil_image() helper with TypeError guard so isinstance check
does not blow up when Image is patched to a MagicMock in tests
- Add 3 new tests: pdfplumber=None ImportError, sparse-page OCR fallback,
OCR render failure returns empty chunk
- Coverage: 96% (up from 64%)
- Set points.flags.writeable = False in HandsDetector.detect() so in-place
mutation of HandLandmarks.points raises ValueError (frozen=True alone does not
protect numpy array contents)
- Extend test_handlandmarks_is_immutable to assert ValueError on array mutation
- Add test_camera.py with 3 tests covering is_open, frames() yield/break
behaviour, and context manager release (was at 0% coverage)
- Remove unused `import numpy as np` from camera.py; fix frames() return
annotation to Iterator (np.ndarray ref removed with the import)
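A minimal sketch of the immutability fix, using a hypothetical freeze_landmarks() helper in place of HandsDetector.detect(); the (N, 3) shape comment is an assumption.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class HandLandmarks:
    points: np.ndarray  # (N, 3) landmark coordinates


def freeze_landmarks(raw) -> HandLandmarks:
    # Copy so the caller's array is not switched to read-only as a side effect.
    points = np.array(raw, dtype=np.float32, copy=True)
    # frozen=True blocks `landmarks.points = ...` but not `landmarks.points[0] = ...`;
    # clearing the writeable flag closes that gap.
    points.flags.writeable = False
    return HandLandmarks(points=points)
```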
Five backends: BGE (FlagEmbedding), Qwen3 (generative yes/no logit scorer,
batched forward pass), CrossEncoder (sentence-transformers, covers mxbai-rerank
/ ms-marco / jina), Cohere (BYOK cloud), Remote (HTTP delegate to cf-reranker
service). Mock adapter for tests. 54 tests.
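For the Qwen3 backend, a common formulation (assumed here) scores each (query, document) pair as the softmax probability of the "yes" token against the "no" token at the final position, which equals sigmoid(yes - no). A sketch of that arithmetic on plain floats; extracting the logits from the batched forward pass is omitted.

```python
import math


def yes_no_score(yes_logit: float, no_logit: float) -> float:
    """P("yes") from a two-way softmax over the "yes"/"no" token logits."""
    # softmax([yes, no])[0] == 1 / (1 + exp(no - yes)); branching on the sign
    # keeps exp() from overflowing on large logit gaps.
    diff = no_logit - yes_logit
    if diff >= 0:
        z = math.exp(-diff)
        return z / (1.0 + z)
    return 1.0 / (1.0 + math.exp(diff))


def score_batch(logit_pairs: list[tuple[float, float]]) -> list[float]:
    """Score a batch of (yes_logit, no_logit) pairs from one forward pass."""
    return [yes_no_score(y, n) for y, n in logit_pairs]
```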
cf-reranker FastAPI service app (port 8011), run by cf-orch as a managed
process; defaults to Qwen3-Reranker-0.6B.
make_reranker() auto-detects CF_ORCH_URL and routes to cf-orch cf-reranker
when set — cloud apps (Kiwi, Peregrine, Snipe) get remote Qwen3 reranking
with zero code changes. Local dev falls back to local BGE.
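A sketch of the routing decision. RemoteReranker and LocalBGEReranker are placeholder classes, and handing CF_ORCH_URL straight to the remote client is an assumption; the real factory goes through cf-orch to reach the cf-reranker service.

```python
import os
from typing import Mapping, Optional, Union


class RemoteReranker:
    """Placeholder for the HTTP delegate to cf-reranker."""

    def __init__(self, orch_url: str) -> None:
        self.orch_url = orch_url


class LocalBGEReranker:
    """Placeholder for the in-process BGE backend."""


def make_reranker(env: Optional[Mapping[str, str]] = None) -> Union[RemoteReranker, LocalBGEReranker]:
    env = os.environ if env is None else env
    orch_url = env.get("CF_ORCH_URL")
    if orch_url:
        # Cloud: delegate to the cf-reranker service managed by cf-orch.
        return RemoteReranker(orch_url)
    # Local dev: rerank in-process.
    return LocalBGEReranker()
```

Taking env as a parameter keeps the factory testable without mutating os.environ.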
pyproject extras: reranker-bge, reranker-qwen3, reranker-cross-encoder,
reranker-cohere, reranker-service.
Extracted from kiwi/avocet where it was duplicated. Reads llm.yaml via
the same path LLMRouter uses — products can now import detect_byok from
cf-core instead of maintaining their own copy.
Extracts the JWT validation + Heimdall tier resolution + guest session pattern
that was duplicated across kiwi and peregrine into a single reusable module.
CloudSessionFactory is parameterized by product name. Products instantiate it
once at module level and call .dependency() to get a FastAPI-compatible Depends()
function. .require_tier(min_tier) returns a dependency factory for gated routes.
CloudUser carries:
user_id — Directus UUID, "local" (self-hosted), "local-dev" (bypass), "anon-<uuid>"
tier — free | paid | premium | ultra | local
product — which CF product this session is for
has_byok — whether user has a configured LLM backend
meta — dict for product-specific extras (household_id, license_key, etc.)
Products can pass extra_meta= to attach product-specific fields without
subclassing. The module is FastAPI-only (fastapi is a lazy import so local-mode
products that never hit cloud paths don't pay the import cost).
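A sketch of the resolution logic behind .dependency(), stripped of FastAPI. The JWT validator and the Heimdall lookup are injected callables; the "sub" claim, the guest tier, and the TIER_RANK ordering are assumptions. In this sketch invalid tokens soft-fail to a guest session.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

# Hypothetical gate ordering; "local" (self-hosted) passes every gate here.
TIER_RANK = {"free": 0, "paid": 1, "premium": 2, "ultra": 3, "local": 99}


@dataclass(frozen=True)
class CloudUser:
    user_id: str
    tier: str
    product: str
    has_byok: bool = False
    meta: dict[str, Any] = field(default_factory=dict)


def resolve_user(
    token: Optional[str],
    product: str,
    validate_jwt: Callable[[str], dict],  # assumed to raise ValueError for bad/expired tokens
    lookup_tier: Callable[[str], str],    # stand-in for the Heimdall tier lookup
) -> CloudUser:
    if token:
        try:
            claims = validate_jwt(token)
        except ValueError:
            claims = None  # stale cookie: fall through to a guest session, no 401
        if claims is not None:
            uid = claims["sub"]
            return CloudUser(user_id=uid, tier=lookup_tier(uid), product=product)
    return CloudUser(user_id=f"anon-{uuid.uuid4()}", tier="free", product=product)


def tier_allows(user: CloudUser, min_tier: str) -> bool:
    return TIER_RANK.get(user.tier, -1) >= TIER_RANK[min_tier]
```

In the FastAPI version, .require_tier(min_tier) would wrap a check like tier_allows() and reject the request (for example with HTTPException(status_code=403)) when it fails.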
- backends/ollama.py: routes requests to a running Ollama instance via HTTP API
- backends/vllm.py: routes requests to vllm's OpenAI-compatible API
(/v1/chat/completions); cf-text holds no GPU memory in proxy mode
- hardware/tiers.py: register cf-musicgen in 8GB, 16GB, and 32GB VRAM tiers
- tts/app.py: use inline type comment for _backend to avoid runtime global warning
- tts/backends/base.py: minor style cleanup
- create_app: add gpu_ids param; when set, exports CUDA_VISIBLE_DEVICES=<ids>
so HuggingFace Accelerate auto-shards across all listed devices
- CLI: add --gpu-ids arg (e.g. "0,1"); overrides --gpu-id when provided
- backends/base.py: propagate gpu_ids through TextBackend.generate
so backends can be aware of the visible device set
Single-GPU deployments are unaffected — --gpu-id=0 remains the default.
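A sketch of the flag precedence with argparse. The flag names come from this commit; parse_args, visible_devices, and apply_gpu_env are hypothetical helpers.

```python
import argparse
import os
from typing import Optional


def parse_args(argv: Optional[list[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument("--gpu-id", type=int, default=0)
    parser.add_argument("--gpu-ids", default=None, help='comma-separated, e.g. "0,1"')
    return parser.parse_args(argv)


def visible_devices(args: argparse.Namespace) -> str:
    """--gpu-ids wins when given; otherwise fall back to the single --gpu-id."""
    if args.gpu_ids:
        ids = [part.strip() for part in args.gpu_ids.split(",") if part.strip()]
        # int() rejects typos at startup instead of deep inside Accelerate.
        return ",".join(str(int(i)) for i in ids)
    return str(args.gpu_id)


def apply_gpu_env(args: argparse.Namespace) -> None:
    os.environ["CUDA_VISIBLE_DEVICES"] = visible_devices(args)
```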
Adds community subcategory tagging for corpus recipes (kiwi#118).
Any product with a recipe corpus can use this to let users tag recipes
into browse taxonomy locations that FTS missed.
- 005_recipe_tags.sql: recipe_tags (per-recipe taxonomy tag with upvote
counter) + recipe_tag_votes (dedup table; submitter self-vote at insert)
- store.py: submit_recipe_tag(), upvote_recipe_tag(), get_recipe_tag_by_id(),
list_tags_for_recipe(), get_accepted_recipe_ids_for_subcategory()
Acceptance threshold: upvotes >= 2 (submitter counts as 1, one more needed).
Tags keyed as recipe_source='corpus' for future community-recipe extension.
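A sketch of the two-table vote flow on SQLite. The table and function names come from this commit, but the columns, signatures, and SQL dialect are assumptions and may differ from 005_recipe_tags.sql.

```python
import sqlite3

SCHEMA = """
CREATE TABLE recipe_tags (
    id            INTEGER PRIMARY KEY,
    recipe_id     TEXT NOT NULL,
    recipe_source TEXT NOT NULL DEFAULT 'corpus',
    subcategory   TEXT NOT NULL,
    upvotes       INTEGER NOT NULL DEFAULT 1,   -- starts at 1: submitter's self-vote
    UNIQUE (recipe_source, recipe_id, subcategory)
);
CREATE TABLE recipe_tag_votes (
    tag_id  INTEGER NOT NULL REFERENCES recipe_tags(id),
    user_id TEXT NOT NULL,
    PRIMARY KEY (tag_id, user_id)               -- one vote per user per tag
);
"""

ACCEPT_THRESHOLD = 2


def submit_recipe_tag(conn: sqlite3.Connection, recipe_id: str, subcategory: str, user_id: str) -> int:
    with conn:  # one transaction: tag row plus the submitter's vote
        cur = conn.execute(
            "INSERT INTO recipe_tags (recipe_id, subcategory) VALUES (?, ?)",
            (recipe_id, subcategory),
        )
        tag_id = cur.lastrowid
        conn.execute("INSERT INTO recipe_tag_votes VALUES (?, ?)", (tag_id, user_id))
    return tag_id


def upvote_recipe_tag(conn: sqlite3.Connection, tag_id: int, user_id: str) -> bool:
    """Return True if the vote counted, False if this user already voted."""
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO recipe_tag_votes VALUES (?, ?)", (tag_id, user_id)
        )
        if cur.rowcount == 0:
            return False
        conn.execute("UPDATE recipe_tags SET upvotes = upvotes + 1 WHERE id = ?", (tag_id,))
    return True


def get_accepted_recipe_ids_for_subcategory(conn: sqlite3.Connection, subcategory: str) -> list[str]:
    rows = conn.execute(
        "SELECT recipe_id FROM recipe_tags WHERE subcategory = ? AND upvotes >= ? ORDER BY recipe_id",
        (subcategory, ACCEPT_THRESHOLD),
    )
    return [r[0] for r in rows]
```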