# circuitforge_core.video

Video captioning and temporal grounding service using [Marlin-2B](https://huggingface.co/NemoStation/Marlin-2B) (Apache 2.0).

## What it does

- **Caption:** Produces a scene summary and a timestamped list of detected events for a video file.
- **Find:** Grounds a natural-language event description to a time span within the video.

## Prerequisites

### Hardware

| GPU VRAM | Result |
|----------|--------|
| 16GB+ | Recommended for full-precision inference |
| 12GB | Minimum for most videos |
| Under 12GB | OOM likely on longer clips |

CPU mode is not supported — Marlin-2B requires a CUDA-capable GPU.

### CUDA version

```bash
nvidia-smi | grep "CUDA Version"
```

| CUDA version | Install path |
|---|---|
| 12.x or earlier | Standard install — see below |
| 13.x (RTX 50-series / Blackwell) | PyTorch nightly required — see below |

### Security note

Marlin-2B requires `trust_remote_code=True`. Review the model's `modeling_marlin.py` on HuggingFace before deploying on a production node. The model is Apache 2.0 and the source is auditable at [huggingface.co/NemoStation/Marlin-2B](https://huggingface.co/NemoStation/Marlin-2B).

---

## Install

Standard (CUDA 12.x):

```bash
pip install "circuitforge-core[video-service]"
```

CUDA 13.x (RTX 50-series / Blackwell) — PyTorch nightly required:

```bash
pip install --index-url https://download.pytorch.org/whl/nightly/cu130 torch torchvision
pip install "circuitforge-core[video-service]" --no-deps
pip install transformers>=5.7.0 torchcodec "qwen-vl-utils>=0.0.14" av Pillow accelerate fastapi "uvicorn[standard]"
```

---

## Running the service

Download the model to a local path first (one-time, approximately 4–6 GB):

```bash
huggingface-cli download NemoStation/Marlin-2B --local-dir /path/to/models/Marlin-2B
```

Start the service:

```bash
CUDA_DEVICE_ORDER=PCI_BUS_ID python -m circuitforge_core.video.app \
    --model /path/to/models/Marlin-2B \
    --port 8016 \
    --gpu-id 0
```

The service blocks at startup until the model is loaded, then prints ready status. Confirm:

```bash
curl http://localhost:8016/health
# {"status": "ok", "model": "/path/to/models/Marlin-2B", "vram_mb": ...}
```

Point products at the service with:

```bash
CF_VIDEO_URL=http://localhost:8016
```

---

## API reference

### `GET /health`

Returns 200 when model is loaded. `vram_mb` is the GPU memory in use.

```json
{"status": "ok", "model": "/models/Marlin-2B", "vram_mb": 4200}
```

### `POST /caption`

Generate a scene summary and timestamped events for a video.

> **Important:** `video_path` must be an absolute path on the machine running cf-video — not the calling machine. If cf-video runs in Docker, mount your video directory into the container and use the container-side path.

**Request:**

```json
{
  "video_path": "/absolute/path/to/video.mp4",
  "max_new_tokens": 2048
}
```

**Response:**

```json
{
  "scene": "A kitchen scene where someone prepares pasta.",
  "events": [
    {"start": 0.0, "end": 4.5, "description": "Filling pot with water"},
    {"start": 4.5, "end": 12.0, "description": "Boiling water on stovetop"}
  ],
  "caption": "Kitchen cooking scene with pasta preparation steps.",
  "model": "/models/Marlin-2B"
}
```

### `POST /find`

Ground a natural-language event description to a time span.

**Request:**

```json
{
  "video_path": "/absolute/path/to/video.mp4",
  "event": "person adds salt to the water",
  "max_new_tokens": 256
}
```

**Response:**

```json
{
  "span": [8.2, 10.6],
  "format_ok": true,
  "raw": "[8.2, 10.6]",
  "model": "/models/Marlin-2B"
}
```

`span` is `null` when the model cannot ground the event in the video. `format_ok` indicates whether the model produced a parseable time range.

---

## Docker Compose setup

```yaml
# compose.yml excerpt
services:
  cf-video:
    image: ghcr.io/circuit-forge/cf-video:latest   # or build locally
    network_mode: host
    environment:
      CF_VIDEO_MODEL: /models/Marlin-2B
      CF_VIDEO_PORT: "8016"
    volumes:
      - /path/to/models/Marlin-2B:/models/Marlin-2B:ro
      - /path/to/your/videos:/videos:ro              # mount video storage
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped
```

Pass video paths relative to the container mount:

```json
{"video_path": "/videos/my-video.mp4", "event": "person enters room"}
```

---

## Troubleshooting

**`CUDA out of memory`**
Marlin-2B requires 12GB+ VRAM. No CPU fallback is available.

**`No such file or directory: /home/user/video.mp4`**
`video_path` is resolved on the server, not the client. If cf-video runs in Docker, you must mount the directory containing the video into the container and use the container-side path.

**CUDA version mismatch**
RTX 50-series (Blackwell) cards use CUDA 13. Standard PyTorch stable does not support CUDA 13 — install PyTorch nightly as described in Prerequisites.

**`trust_remote_code` errors**
Make sure `transformers >= 5.7.0` is installed. Older versions do not support the Marlin architecture registration.

---

## License

- cf-video service code: MIT — CircuitForge LLC
- Marlin-2B model: [Apache 2.0](https://huggingface.co/NemoStation/Marlin-2B) — NemoStation