text.md: add LLM inference service section with three-path decision table (GGUF/transformers/VLM mmproj/classifier), multimodal content- block API, mock mode, CF_TEXT_URL wiring. video.md: new file covering Marlin-2B service, server-local video_path callout, CUDA 13 nightly path, trust_remote_code note, MIT/BSL boundary (current wrapper is MIT; special sauce pipelines go in separate BSL module, not cf-core). mqtt.md: new file covering broker vs serial decision tree, MQTTClient usage, TopicRouter.matches() NotImplementedError with workaround, install extras.
5.2 KiB
circuitforge_core.video
Video captioning and temporal grounding service using Marlin-2B (Apache 2.0).
What it does
- Caption: Produces a scene summary and a timestamped list of detected events for a video file.
- Find: Grounds a natural-language event description to a time span within the video.
Prerequisites
Hardware
| GPU VRAM | Result |
|---|---|
| 16GB+ | Recommended for full-precision inference |
| 12GB | Minimum for most videos |
| Under 12GB | OOM likely on longer clips |
CPU mode is not supported — Marlin-2B requires a CUDA-capable GPU.
CUDA version
nvidia-smi | grep "CUDA Version"
| CUDA version | Install path |
|---|---|
| 12.x or earlier | Standard install — see below |
| 13.x (RTX 50-series / Blackwell) | PyTorch nightly required — see below |
Security note
Marlin-2B requires trust_remote_code=True. Review the model's modeling_marlin.py on HuggingFace before deploying on a production node. The model is Apache 2.0 and the source is auditable at huggingface.co/NemoStation/Marlin-2B.
Install
Standard (CUDA 12.x):
pip install "circuitforge-core[video-service]"
CUDA 13.x (RTX 50-series / Blackwell) — PyTorch nightly required:
pip install --index-url https://download.pytorch.org/whl/nightly/cu130 torch torchvision
pip install "circuitforge-core[video-service]" --no-deps
pip install transformers>=5.7.0 torchcodec "qwen-vl-utils>=0.0.14" av Pillow accelerate fastapi "uvicorn[standard]"
Running the service
Download the model to a local path first (one-time, approximately 4–6 GB):
huggingface-cli download NemoStation/Marlin-2B --local-dir /path/to/models/Marlin-2B
Start the service:
CUDA_DEVICE_ORDER=PCI_BUS_ID python -m circuitforge_core.video.app \
--model /path/to/models/Marlin-2B \
--port 8016 \
--gpu-id 0
The service blocks at startup until the model is loaded, then prints ready status. Confirm:
curl http://localhost:8016/health
# {"status": "ok", "model": "/path/to/models/Marlin-2B", "vram_mb": ...}
Point products at the service with:
CF_VIDEO_URL=http://localhost:8016
API reference
GET /health
Returns 200 when model is loaded. vram_mb is the GPU memory in use.
{"status": "ok", "model": "/models/Marlin-2B", "vram_mb": 4200}
POST /caption
Generate a scene summary and timestamped events for a video.
Important:
video_pathmust be an absolute path on the machine running cf-video — not the calling machine. If cf-video runs in Docker, mount your video directory into the container and use the container-side path.
Request:
{
"video_path": "/absolute/path/to/video.mp4",
"max_new_tokens": 2048
}
Response:
{
"scene": "A kitchen scene where someone prepares pasta.",
"events": [
{"start": 0.0, "end": 4.5, "description": "Filling pot with water"},
{"start": 4.5, "end": 12.0, "description": "Boiling water on stovetop"}
],
"caption": "Kitchen cooking scene with pasta preparation steps.",
"model": "/models/Marlin-2B"
}
POST /find
Ground a natural-language event description to a time span.
Request:
{
"video_path": "/absolute/path/to/video.mp4",
"event": "person adds salt to the water",
"max_new_tokens": 256
}
Response:
{
"span": [8.2, 10.6],
"format_ok": true,
"raw": "[8.2, 10.6]",
"model": "/models/Marlin-2B"
}
span is null when the model cannot ground the event in the video. format_ok indicates whether the model produced a parseable time range.
Docker Compose setup
# compose.yml excerpt
services:
cf-video:
image: ghcr.io/circuit-forge/cf-video:latest # or build locally
network_mode: host
environment:
CF_VIDEO_MODEL: /models/Marlin-2B
CF_VIDEO_PORT: "8016"
volumes:
- /path/to/models/Marlin-2B:/models/Marlin-2B:ro
- /path/to/your/videos:/videos:ro # mount video storage
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
restart: unless-stopped
Pass video paths relative to the container mount:
{"video_path": "/videos/my-video.mp4", "event": "person enters room"}
Troubleshooting
CUDA out of memory
Marlin-2B requires 12GB+ VRAM. No CPU fallback is available.
No such file or directory: /home/user/video.mp4
video_path is resolved on the server, not the client. If cf-video runs in Docker, you must mount the directory containing the video into the container and use the container-side path.
CUDA version mismatch RTX 50-series (Blackwell) cards use CUDA 13. Standard PyTorch stable does not support CUDA 13 — install PyTorch nightly as described in Prerequisites.
trust_remote_code errors
Make sure transformers >= 5.7.0 is installed. Older versions do not support the Marlin architecture registration.
License
- cf-video service code: MIT — CircuitForge LLC
- Marlin-2B model: Apache 2.0 — NemoStation