pyr0ball 5eab4c43a4

CI / test (push) Waiting to run

Details

Mirror / mirror (push) Waiting to run

Details

docs: self-hoster service docs for text, video, and mqtt modules

text.md: add LLM inference service section with three-path decision
table (GGUF/transformers/VLM mmproj/classifier), multimodal content-
block API, mock mode, CF_TEXT_URL wiring.

video.md: new file covering Marlin-2B service, server-local video_path
callout, CUDA 13 nightly path, trust_remote_code note, MIT/BSL boundary
(current wrapper is MIT; special sauce pipelines go in separate BSL
module, not cf-core).

mqtt.md: new file covering broker vs serial decision tree, MQTTClient
usage, TopicRouter.matches() NotImplementedError with workaround, install
extras.

2026-06-05 11:59:48 -07:00

5.2 KiB

Raw Blame History

circuitforge_core.video

Video captioning and temporal grounding service using Marlin-2B (Apache 2.0).

What it does

Caption: Produces a scene summary and a timestamped list of detected events for a video file.
Find: Grounds a natural-language event description to a time span within the video.

Prerequisites

Hardware

GPU VRAM	Result
16GB+	Recommended for full-precision inference
12GB	Minimum for most videos
Under 12GB	OOM likely on longer clips

CPU mode is not supported — Marlin-2B requires a CUDA-capable GPU.

CUDA version

nvidia-smi | grep "CUDA Version"

CUDA version	Install path
12.x or earlier	Standard install — see below
13.x (RTX 50-series / Blackwell)	PyTorch nightly required — see below

Security note

Marlin-2B requires trust_remote_code=True. Review the model's modeling_marlin.py on HuggingFace before deploying on a production node. The model is Apache 2.0 and the source is auditable at huggingface.co/NemoStation/Marlin-2B.

Install

Standard (CUDA 12.x):

pip install "circuitforge-core[video-service]"

CUDA 13.x (RTX 50-series / Blackwell) — PyTorch nightly required:

pip install --index-url https://download.pytorch.org/whl/nightly/cu130 torch torchvision
pip install "circuitforge-core[video-service]" --no-deps
pip install transformers>=5.7.0 torchcodec "qwen-vl-utils>=0.0.14" av Pillow accelerate fastapi "uvicorn[standard]"

Running the service

Download the model to a local path first (one-time, approximately 4–6 GB):

huggingface-cli download NemoStation/Marlin-2B --local-dir /path/to/models/Marlin-2B

Start the service:

CUDA_DEVICE_ORDER=PCI_BUS_ID python -m circuitforge_core.video.app \
    --model /path/to/models/Marlin-2B \
    --port 8016 \
    --gpu-id 0

The service blocks at startup until the model is loaded, then prints ready status. Confirm:

curl http://localhost:8016/health
# {"status": "ok", "model": "/path/to/models/Marlin-2B", "vram_mb": ...}

Point products at the service with:

CF_VIDEO_URL=http://localhost:8016

API reference

`GET /health`

Returns 200 when model is loaded. vram_mb is the GPU memory in use.

{"status": "ok", "model": "/models/Marlin-2B", "vram_mb": 4200}

`POST /caption`

Generate a scene summary and timestamped events for a video.

Important: video_path must be an absolute path on the machine running cf-video — not the calling machine. If cf-video runs in Docker, mount your video directory into the container and use the container-side path.

Request:

{
  "video_path": "/absolute/path/to/video.mp4",
  "max_new_tokens": 2048
}

Response:

{
  "scene": "A kitchen scene where someone prepares pasta.",
  "events": [
    {"start": 0.0, "end": 4.5, "description": "Filling pot with water"},
    {"start": 4.5, "end": 12.0, "description": "Boiling water on stovetop"}
  ],
  "caption": "Kitchen cooking scene with pasta preparation steps.",
  "model": "/models/Marlin-2B"
}

`POST /find`

Ground a natural-language event description to a time span.

Request:

{
  "video_path": "/absolute/path/to/video.mp4",
  "event": "person adds salt to the water",
  "max_new_tokens": 256
}

Response:

{
  "span": [8.2, 10.6],
  "format_ok": true,
  "raw": "[8.2, 10.6]",
  "model": "/models/Marlin-2B"
}

span is null when the model cannot ground the event in the video. format_ok indicates whether the model produced a parseable time range.

Docker Compose setup

# compose.yml excerpt
services:
  cf-video:
    image: ghcr.io/circuit-forge/cf-video:latest   # or build locally
    network_mode: host
    environment:
      CF_VIDEO_MODEL: /models/Marlin-2B
      CF_VIDEO_PORT: "8016"
    volumes:
      - /path/to/models/Marlin-2B:/models/Marlin-2B:ro
      - /path/to/your/videos:/videos:ro              # mount video storage
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped

Pass video paths relative to the container mount:

{"video_path": "/videos/my-video.mp4", "event": "person enters room"}

Troubleshooting

CUDA out of memory Marlin-2B requires 12GB+ VRAM. No CPU fallback is available.

No such file or directory: /home/user/video.mp4 video_path is resolved on the server, not the client. If cf-video runs in Docker, you must mount the directory containing the video into the container and use the container-side path.

CUDA version mismatch RTX 50-series (Blackwell) cards use CUDA 13. Standard PyTorch stable does not support CUDA 13 — install PyTorch nightly as described in Prerequisites.

trust_remote_code errors Make sure transformers >= 5.7.0 is installed. Older versions do not support the Marlin architecture registration.

License

cf-video service code: MIT — CircuitForge LLC
Marlin-2B model: Apache 2.0 — NemoStation

5.2 KiB Raw Blame History Unescape Escape