peregrine/docs/plans/2026-02-26-dual-gpu-plan.md

# Dual-GPU / Dual-Inference Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Add `DUAL_GPU_MODE=ollama|vllm|mixed` env var that gates which inference service occupies GPU 1 on dual-GPU systems, plus a first-run download size warning in preflight.

**Architecture:** Sub-profiles (`dual-gpu-ollama`, `dual-gpu-vllm`, `dual-gpu-mixed`) are injected alongside `--profile dual-gpu` by the Makefile based on `DUAL_GPU_MODE`. The LLM router requires zero changes — `_is_reachable()` naturally skips backends that aren't running. Preflight gains `ollama_research` as a tracked service and emits a size warning block.

**Tech Stack:** Docker Compose profiles, Python (preflight.py), YAML (llm.yaml, compose files), bash (Makefile, manage.sh)

**Design doc:** `docs/plans/2026-02-26-dual-gpu-design.md`

**Test runner:** `conda run -n job-seeker python -m pytest tests/ -v`

---

### Task 1: Update `config/llm.yaml`

**Files:**
- Modify: `config/llm.yaml`

**Step 1: Add `vllm_research` backend and update `research_fallback_order`**

Open `config/llm.yaml`. After the `vllm:` block, add:

```yaml
  vllm_research:
    api_key: ''
    base_url: http://host.docker.internal:8000/v1
    enabled: true
    model: __auto__
    supports_images: false
    type: openai_compat
```

Replace `research_fallback_order:` section with:

```yaml
research_fallback_order:
- claude_code
- vllm_research
- ollama_research
- github_copilot
- anthropic
```

**Step 2: Verify YAML parses cleanly**

```bash
conda run -n job-seeker python -c "import yaml; yaml.safe_load(open('config/llm.yaml'))"
```

Expected: no output (no error).

**Step 3: Run existing llm config test**

```bash
conda run -n job-seeker python -m pytest tests/test_llm_router.py::test_config_loads -v
```

Expected: PASS

**Step 4: Commit**

```bash
git add config/llm.yaml
git commit -m "feat: add vllm_research backend and update research_fallback_order"
```

---

### Task 2: Write failing tests for preflight changes

**Files:**
- Create: `tests/test_preflight.py`

No existing test file for preflight. Write all tests upfront — they fail until Task 3–5 implement the code.

**Step 1: Create `tests/test_preflight.py`**

```python
"""Tests for scripts/preflight.py additions: dual-GPU service table, size warning, VRAM check."""
import pytest
from pathlib import Path
from unittest.mock import patch
import yaml
import tempfile
import os


# ── Service table ──────────────────────────────────────────────────────────────

def test_ollama_research_in_services():
    """ollama_research must be in _SERVICES at port 11435."""
    from scripts.preflight import _SERVICES
    assert "ollama_research" in _SERVICES
    _, default_port, env_var, docker_owned, adoptable = _SERVICES["ollama_research"]
    assert default_port == 11435
    assert env_var == "OLLAMA_RESEARCH_PORT"
    assert docker_owned is True
    assert adoptable is True


def test_ollama_research_in_llm_backends():
    """ollama_research must be a standalone key in _LLM_BACKENDS (not nested under ollama)."""
    from scripts.preflight import _LLM_BACKENDS
    assert "ollama_research" in _LLM_BACKENDS
    # Should map to the ollama_research llm backend
    backend_names = [name for name, _ in _LLM_BACKENDS["ollama_research"]]
    assert "ollama_research" in backend_names


def test_vllm_research_in_llm_backends():
    """vllm_research must be registered under vllm in _LLM_BACKENDS."""
    from scripts.preflight import _LLM_BACKENDS
    assert "vllm" in _LLM_BACKENDS
    backend_names = [name for name, _ in _LLM_BACKENDS["vllm"]]
    assert "vllm_research" in backend_names


def test_ollama_research_in_docker_internal():
    """ollama_research must map to internal port 11434 (Ollama's container port)."""
    from scripts.preflight import _DOCKER_INTERNAL
    assert "ollama_research" in _DOCKER_INTERNAL
    hostname, port = _DOCKER_INTERNAL["ollama_research"]
    assert hostname == "ollama_research"
    assert port == 11434  # container-internal port is always 11434


def test_ollama_not_mapped_to_ollama_research_backend():
    """ollama service key must only update the ollama llm backend, not ollama_research."""
    from scripts.preflight import _LLM_BACKENDS
    ollama_backend_names = [name for name, _ in _LLM_BACKENDS.get("ollama", [])]
    assert "ollama_research" not in ollama_backend_names


# ── Download size warning ──────────────────────────────────────────────────────

def test_download_size_remote_profile():
    """Remote profile: only searxng + app, no ollama, no vision, no vllm."""
    from scripts.preflight import _download_size_mb
    sizes = _download_size_mb("remote", "ollama")
    assert "searxng" in sizes
    assert "app" in sizes
    assert "ollama" not in sizes
    assert "vision_image" not in sizes
    assert "vllm_image" not in sizes


def test_download_size_cpu_profile():
    """CPU profile: adds ollama image + llama3.2:3b weights."""
    from scripts.preflight import _download_size_mb
    sizes = _download_size_mb("cpu", "ollama")
    assert "ollama" in sizes
    assert "llama3_2_3b" in sizes
    assert "vision_image" not in sizes


def test_download_size_single_gpu_profile():
    """Single-GPU: adds vision image + moondream2 weights."""
    from scripts.preflight import _download_size_mb
    sizes = _download_size_mb("single-gpu", "ollama")
    assert "vision_image" in sizes
    assert "moondream2" in sizes
    assert "vllm_image" not in sizes


def test_download_size_dual_gpu_ollama_mode():
    """dual-gpu + ollama mode: no vllm image."""
    from scripts.preflight import _download_size_mb
    sizes = _download_size_mb("dual-gpu", "ollama")
    assert "vllm_image" not in sizes


def test_download_size_dual_gpu_vllm_mode():
    """dual-gpu + vllm mode: adds ~10 GB vllm image."""
    from scripts.preflight import _download_size_mb
    sizes = _download_size_mb("dual-gpu", "vllm")
    assert "vllm_image" in sizes
    assert sizes["vllm_image"] >= 9000  # at least 9 GB


def test_download_size_dual_gpu_mixed_mode():
    """dual-gpu + mixed mode: also includes vllm image."""
    from scripts.preflight import _download_size_mb
    sizes = _download_size_mb("dual-gpu", "mixed")
    assert "vllm_image" in sizes


# ── Mixed-mode VRAM warning ────────────────────────────────────────────────────

def test_mixed_mode_vram_warning_triggered():
    """Should return a warning string when GPU 1 has < 12 GB free in mixed mode."""
    from scripts.preflight import _mixed_mode_vram_warning
    gpus = [
        {"name": "RTX 3090", "vram_total_gb": 24.0, "vram_free_gb": 20.0},
        {"name": "RTX 3090", "vram_total_gb": 24.0, "vram_free_gb": 8.0},  # tight
    ]
    warning = _mixed_mode_vram_warning(gpus, "mixed")
    assert warning is not None
    assert "8.0" in warning or "GPU 1" in warning


def test_mixed_mode_vram_warning_not_triggered_with_headroom():
    """Should return None when GPU 1 has >= 12 GB free."""
    from scripts.preflight import _mixed_mode_vram_warning
    gpus = [
        {"name": "RTX 4090", "vram_total_gb": 24.0, "vram_free_gb": 20.0},
        {"name": "RTX 4090", "vram_total_gb": 24.0, "vram_free_gb": 18.0},  # plenty
    ]
    warning = _mixed_mode_vram_warning(gpus, "mixed")
    assert warning is None


def test_mixed_mode_vram_warning_not_triggered_for_other_modes():
    """Warning only applies in mixed mode."""
    from scripts.preflight import _mixed_mode_vram_warning
    gpus = [
        {"name": "RTX 3090", "vram_total_gb": 24.0, "vram_free_gb": 20.0},
        {"name": "RTX 3090", "vram_total_gb": 24.0, "vram_free_gb": 6.0},
    ]
    assert _mixed_mode_vram_warning(gpus, "ollama") is None
    assert _mixed_mode_vram_warning(gpus, "vllm") is None


# ── update_llm_yaml with ollama_research ──────────────────────────────────────

def test_update_llm_yaml_sets_ollama_research_url_docker_internal():
    """ollama_research backend URL must be set to ollama_research:11434 when Docker-owned."""
    from scripts.preflight import update_llm_yaml

    llm_cfg = {
        "backends": {
            "ollama": {"base_url": "http://old", "type": "openai_compat"},
            "ollama_research": {"base_url": "http://old", "type": "openai_compat"},
            "vllm": {"base_url": "http://old", "type": "openai_compat"},
            "vllm_research": {"base_url": "http://old", "type": "openai_compat"},
            "vision_service": {"base_url": "http://old", "type": "vision_service"},
        }
    }

    with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) as f:
        yaml.dump(llm_cfg, f)
        tmp_path = Path(f.name)

    ports = {
        "ollama": {
            "resolved": 11434, "external": False, "env_var": "OLLAMA_PORT"
        },
        "ollama_research": {
            "resolved": 11435, "external": False, "env_var": "OLLAMA_RESEARCH_PORT"
        },
        "vllm": {
            "resolved": 8000, "external": False, "env_var": "VLLM_PORT"
        },
        "vision": {
            "resolved": 8002, "external": False, "env_var": "VISION_PORT"
        },
    }

    try:
        # Patch LLM_YAML to point at our temp file
        with patch("scripts.preflight.LLM_YAML", tmp_path):
            update_llm_yaml(ports)

        result = yaml.safe_load(tmp_path.read_text())
        # Docker-internal: use service name + container port
        assert result["backends"]["ollama_research"]["base_url"] == "http://ollama_research:11434/v1"
        # vllm_research must match vllm's URL
        assert result["backends"]["vllm_research"]["base_url"] == result["backends"]["vllm"]["base_url"]
    finally:
        tmp_path.unlink()


def test_update_llm_yaml_sets_ollama_research_url_external():
    """When ollama_research is external (adopted), URL uses host.docker.internal:11435."""
    from scripts.preflight import update_llm_yaml

    llm_cfg = {
        "backends": {
            "ollama": {"base_url": "http://old", "type": "openai_compat"},
            "ollama_research": {"base_url": "http://old", "type": "openai_compat"},
        }
    }

    with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) as f:
        yaml.dump(llm_cfg, f)
        tmp_path = Path(f.name)

    ports = {
        "ollama": {"resolved": 11434, "external": False, "env_var": "OLLAMA_PORT"},
        "ollama_research": {"resolved": 11435, "external": True, "env_var": "OLLAMA_RESEARCH_PORT"},
    }

    try:
        with patch("scripts.preflight.LLM_YAML", tmp_path):
            update_llm_yaml(ports)
        result = yaml.safe_load(tmp_path.read_text())
        assert result["backends"]["ollama_research"]["base_url"] == "http://host.docker.internal:11435/v1"
    finally:
        tmp_path.unlink()
```

**Step 2: Run tests to confirm they all fail**

```bash
conda run -n job-seeker python -m pytest tests/test_preflight.py -v 2>&1 | head -50
```

Expected: all FAIL with `ImportError` or `AssertionError` — that's correct.

**Step 3: Commit failing tests**

```bash
git add tests/test_preflight.py
git commit -m "test: add failing tests for dual-gpu preflight additions"
```

---

### Task 3: `preflight.py` — service table additions

**Files:**
- Modify: `scripts/preflight.py:46-67` (`_SERVICES`, `_LLM_BACKENDS`, `_DOCKER_INTERNAL`)

**Step 1: Update `_SERVICES`**

Find the `_SERVICES` dict (currently ends at the `"ollama"` entry). Add `ollama_research` as a new entry:

```python
_SERVICES: dict[str, tuple[str, int, str, bool, bool]] = {
    "streamlit":        ("streamlit_port",        8501,  "STREAMLIT_PORT",        True,  False),
    "searxng":          ("searxng_port",           8888,  "SEARXNG_PORT",          True,  True),
    "vllm":             ("vllm_port",              8000,  "VLLM_PORT",             True,  True),
    "vision":           ("vision_port",            8002,  "VISION_PORT",           True,  True),
    "ollama":           ("ollama_port",            11434, "OLLAMA_PORT",           True,  True),
    "ollama_research":  ("ollama_research_port",   11435, "OLLAMA_RESEARCH_PORT",  True,  True),
}
```

**Step 2: Update `_LLM_BACKENDS`**

Replace the existing dict:

```python
_LLM_BACKENDS: dict[str, list[tuple[str, str]]] = {
    "ollama":          [("ollama", "/v1")],
    "ollama_research": [("ollama_research", "/v1")],
    "vllm":            [("vllm", "/v1"), ("vllm_research", "/v1")],
    "vision":          [("vision_service", "")],
}
```

**Step 3: Update `_DOCKER_INTERNAL`**

Add `ollama_research` entry:

```python
_DOCKER_INTERNAL: dict[str, tuple[str, int]] = {
    "ollama":          ("ollama",          11434),
    "ollama_research": ("ollama_research", 11434),  # container-internal port is always 11434
    "vllm":            ("vllm",            8000),
    "vision":          ("vision",          8002),
    "searxng":         ("searxng",         8080),
}
```

**Step 4: Run service table tests**

```bash
conda run -n job-seeker python -m pytest tests/test_preflight.py::test_ollama_research_in_services tests/test_preflight.py::test_ollama_research_in_llm_backends tests/test_preflight.py::test_vllm_research_in_llm_backends tests/test_preflight.py::test_ollama_research_in_docker_internal tests/test_preflight.py::test_ollama_not_mapped_to_ollama_research_backend tests/test_preflight.py::test_update_llm_yaml_sets_ollama_research_url_docker_internal tests/test_preflight.py::test_update_llm_yaml_sets_ollama_research_url_external -v
```

Expected: all PASS

**Step 5: Commit**

```bash
git add scripts/preflight.py
git commit -m "feat: add ollama_research to preflight service table and LLM backend map"
```

---

### Task 4: `preflight.py` — `_download_size_mb()` pure function

**Files:**
- Modify: `scripts/preflight.py` (add new function after `calc_cpu_offload_gb`)

**Step 1: Add the function**

After `calc_cpu_offload_gb()`, add:

```python
def _download_size_mb(profile: str, dual_gpu_mode: str = "ollama") -> dict[str, int]:
    """
    Return estimated first-run download sizes in MB, keyed by component name.
    Profile-aware: only includes components that will actually be pulled.
    """
    sizes: dict[str, int] = {
        "searxng": 300,
        "app":     1500,
    }
    if profile in ("cpu", "single-gpu", "dual-gpu"):
        sizes["ollama"]      = 800
        sizes["llama3_2_3b"] = 2000
    if profile in ("single-gpu", "dual-gpu"):
        sizes["vision_image"] = 3000
        sizes["moondream2"]   = 1800
    if profile == "dual-gpu" and dual_gpu_mode in ("vllm", "mixed"):
        sizes["vllm_image"] = 10000
    return sizes
```

**Step 2: Run download size tests**

```bash
conda run -n job-seeker python -m pytest tests/test_preflight.py -k "download_size" -v
```

Expected: all PASS

**Step 3: Commit**

```bash
git add scripts/preflight.py
git commit -m "feat: add _download_size_mb() pure function for preflight size warning"
```

---

### Task 5: `preflight.py` — VRAM warning, size report block, DUAL_GPU_MODE default

**Files:**
- Modify: `scripts/preflight.py` (three additions to `main()` and a new helper)

**Step 1: Add `_mixed_mode_vram_warning()` after `_download_size_mb()`**

```python
def _mixed_mode_vram_warning(gpus: list[dict], dual_gpu_mode: str) -> str | None:
    """
    Return a warning string if GPU 1 likely lacks VRAM for mixed mode, else None.
    Only relevant when dual_gpu_mode == 'mixed' and at least 2 GPUs are present.
    """
    if dual_gpu_mode != "mixed" or len(gpus) < 2:
        return None
    free = gpus[1]["vram_free_gb"]
    if free < 12:
        return (
            f"⚠  DUAL_GPU_MODE=mixed: GPU 1 has only {free:.1f} GB free — "
            f"running ollama_research + vllm together may cause OOM. "
            f"Consider DUAL_GPU_MODE=ollama or DUAL_GPU_MODE=vllm."
        )
    return None
```

**Step 2: Run VRAM warning tests**

```bash
conda run -n job-seeker python -m pytest tests/test_preflight.py -k "vram" -v
```

Expected: all PASS

**Step 3: Wire size warning into `main()` report block**

In `main()`, find the closing `print("╚═...═╝")` line. Add the size warning block just before it:

```python
        # ── Download size warning ──────────────────────────────────────────────
        dual_gpu_mode = os.environ.get("DUAL_GPU_MODE", "ollama")
        sizes = _download_size_mb(profile, dual_gpu_mode)
        total_mb = sum(sizes.values())
        print("║")
        print("║  Download sizes (first-run estimates)")
        print("║    Docker images")
        print(f"║      app (Python build)   ~{sizes.get('app', 0):,} MB")
        if "searxng" in sizes:
            print(f"║      searxng/searxng       ~{sizes['searxng']:,} MB")
        if "ollama" in sizes:
            shared_note = "  (shared by ollama + ollama_research)" if profile == "dual-gpu" and dual_gpu_mode in ("ollama", "mixed") else ""
            print(f"║      ollama/ollama         ~{sizes['ollama']:,} MB{shared_note}")
        if "vision_image" in sizes:
            print(f"║      vision service        ~{sizes['vision_image']:,} MB  (torch + moondream)")
        if "vllm_image" in sizes:
            print(f"║      vllm/vllm-openai      ~{sizes['vllm_image']:,} MB")
        print("║    Model weights  (lazy-loaded on first use)")
        if "llama3_2_3b" in sizes:
            print(f"║      llama3.2:3b            ~{sizes['llama3_2_3b']:,} MB  → OLLAMA_MODELS_DIR")
        if "moondream2" in sizes:
            print(f"║      moondream2             ~{sizes['moondream2']:,} MB  → vision container cache")
        if profile == "dual-gpu" and dual_gpu_mode in ("ollama", "mixed"):
            print("║    Note: ollama + ollama_research share model dir — no double download")
        print(f"║  ⚠  Total first-run: ~{total_mb / 1024:.1f} GB  (models persist between restarts)")

        # ── Mixed-mode VRAM warning ────────────────────────────────────────────
        vram_warn = _mixed_mode_vram_warning(gpus, dual_gpu_mode)
        if vram_warn:
            print("║")
            print(f"║  {vram_warn}")
```

**Step 4: Wire `DUAL_GPU_MODE` default into `write_env()` block in `main()`**

In `main()`, find the `if not args.check_only:` block. After `env_updates["PEREGRINE_GPU_NAMES"]`, add:

```python
        # Write DUAL_GPU_MODE default for new 2-GPU setups (don't override user's choice)
        if len(gpus) >= 2:
            existing_env: dict[str, str] = {}
            if ENV_FILE.exists():
                for line in ENV_FILE.read_text().splitlines():
                    if "=" in line and not line.startswith("#"):
                        k, _, v = line.partition("=")
                        existing_env[k.strip()] = v.strip()
            if "DUAL_GPU_MODE" not in existing_env:
                env_updates["DUAL_GPU_MODE"] = "ollama"
```

**Step 5: Add `import os` if not already present at top of file**

Check line 1–30 of `scripts/preflight.py`. `import os` is already present inside `get_cpu_cores()` as a local import — move it to the top-level imports block:

```python
import os  # add alongside existing stdlib imports
```

And remove the local `import os` inside `get_cpu_cores()`.

**Step 6: Run all preflight tests**

```bash
conda run -n job-seeker python -m pytest tests/test_preflight.py -v
```

Expected: all PASS

**Step 7: Smoke-check the preflight report output**

```bash
conda run -n job-seeker python scripts/preflight.py --check-only
```

Expected: report includes the `Download sizes` block near the bottom.

**Step 8: Commit**

```bash
git add scripts/preflight.py
git commit -m "feat: add DUAL_GPU_MODE default, VRAM warning, and download size report to preflight"
```

---

### Task 6: `compose.yml` — `ollama_research` service + profile updates

**Files:**
- Modify: `compose.yml`

**Step 1: Update `ollama` profiles line**

Find:
```yaml
    profiles: [cpu, single-gpu, dual-gpu]
```
Replace with:
```yaml
    profiles: [cpu, single-gpu, dual-gpu-ollama, dual-gpu-vllm, dual-gpu-mixed]
```

**Step 2: Update `vision` profiles line**

Find:
```yaml
    profiles: [single-gpu, dual-gpu]
```
Replace with:
```yaml
    profiles: [single-gpu, dual-gpu-ollama, dual-gpu-vllm, dual-gpu-mixed]
```

**Step 3: Update `vllm` profiles line**

Find:
```yaml
    profiles: [dual-gpu]
```
Replace with:
```yaml
    profiles: [dual-gpu-vllm, dual-gpu-mixed]
```

**Step 4: Add `ollama_research` service**

After the closing lines of the `ollama` service block, add:

```yaml
  ollama_research:
    image: ollama/ollama:latest
    ports:
      - "${OLLAMA_RESEARCH_PORT:-11435}:11434"
    volumes:
      - ${OLLAMA_MODELS_DIR:-~/models/ollama}:/root/.ollama
      - ./docker/ollama/entrypoint.sh:/entrypoint.sh
    environment:
      - OLLAMA_MODELS=/root/.ollama
      - DEFAULT_OLLAMA_MODEL=${OLLAMA_RESEARCH_MODEL:-llama3.2:3b}
    entrypoint: ["/bin/bash", "/entrypoint.sh"]
    profiles: [dual-gpu-ollama, dual-gpu-mixed]
    restart: unless-stopped
```

**Step 5: Validate compose YAML**

```bash
docker compose -f compose.yml config --quiet
```

Expected: no errors.

**Step 6: Commit**

```bash
git add compose.yml
git commit -m "feat: add ollama_research service and update profiles for dual-gpu sub-profiles"
```

---

### Task 7: GPU overlay files — `compose.gpu.yml` and `compose.podman-gpu.yml`

**Files:**
- Modify: `compose.gpu.yml`
- Modify: `compose.podman-gpu.yml`

**Step 1: Add `ollama_research` to `compose.gpu.yml`**

After the `ollama:` block, add:

```yaml
  ollama_research:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["1"]
              capabilities: [gpu]
```

**Step 2: Add `ollama_research` to `compose.podman-gpu.yml`**

After the `ollama:` block, add:

```yaml
  ollama_research:
    devices:
      - nvidia.com/gpu=1
    deploy:
      resources:
        reservations:
          devices: []
```

**Step 3: Validate both files**

```bash
docker compose -f compose.yml -f compose.gpu.yml config --quiet
```

Expected: no errors.

**Step 4: Commit**

```bash
git add compose.gpu.yml compose.podman-gpu.yml
git commit -m "feat: assign ollama_research to GPU 1 in Docker and Podman GPU overlays"
```

---

### Task 8: `Makefile` + `manage.sh` — `DUAL_GPU_MODE` injection and help text

**Files:**
- Modify: `Makefile`
- Modify: `manage.sh`

**Step 1: Update `Makefile`**

After the `COMPOSE_OVERRIDE` variable, add `DUAL_GPU_MODE` reading:

```makefile
DUAL_GPU_MODE ?= $(shell grep -m1 '^DUAL_GPU_MODE=' .env 2>/dev/null | cut -d= -f2 || echo ollama)
```

In the GPU overlay block, find:
```makefile
else
  ifneq (,$(findstring gpu,$(PROFILE)))
    COMPOSE_FILES := -f compose.yml $(COMPOSE_OVERRIDE) -f compose.gpu.yml
  endif
endif
```

Replace the `else` branch with:
```makefile
else
  ifneq (,$(findstring gpu,$(PROFILE)))
    COMPOSE_FILES := -f compose.yml $(COMPOSE_OVERRIDE) -f compose.gpu.yml
  endif
endif
ifeq ($(PROFILE),dual-gpu)
  COMPOSE_FILES += --profile dual-gpu-$(DUAL_GPU_MODE)
endif
```

**Step 2: Update `manage.sh` — profiles help block**

Find the profiles section in `usage()`:
```bash
    echo "    dual-gpu     Ollama + Vision + vLLM on GPU 0+1"
```

Replace with:
```bash
    echo "    dual-gpu     Ollama + Vision on GPU 0; GPU 1 set by DUAL_GPU_MODE"
    echo "                 DUAL_GPU_MODE=ollama  (default) ollama_research on GPU 1"
    echo "                 DUAL_GPU_MODE=vllm             vllm on GPU 1"
    echo "                 DUAL_GPU_MODE=mixed            both on GPU 1 (VRAM-split)"
```

**Step 3: Verify Makefile parses**

```bash
make help
```

Expected: help table prints cleanly, no make errors.

**Step 4: Verify manage.sh help**

```bash
./manage.sh help
```

Expected: new dual-gpu description appears in profiles section.

**Step 5: Commit**

```bash
git add Makefile manage.sh
git commit -m "feat: inject DUAL_GPU_MODE sub-profile in Makefile; update manage.sh help"
```

---

### Task 9: Integration smoke test

**Goal:** Verify the full chain works for `DUAL_GPU_MODE=ollama` without actually starting Docker (dry-run compose config check).

**Step 1: Write `DUAL_GPU_MODE=ollama` to `.env` temporarily**

```bash
echo "DUAL_GPU_MODE=ollama" >> .env
```

**Step 2: Dry-run compose config for dual-gpu + dual-gpu-ollama**

```bash
docker compose -f compose.yml -f compose.gpu.yml --profile dual-gpu --profile dual-gpu-ollama config 2>&1 | grep -E "^  [a-z]|image:|ports:"
```

Expected output includes:
- `ollama:` service with port 11434
- `ollama_research:` service with port 11435
- `vision:` service
- `searxng:` service
- **No** `vllm:` service

**Step 3: Dry-run for `DUAL_GPU_MODE=vllm`**

```bash
docker compose -f compose.yml -f compose.gpu.yml --profile dual-gpu --profile dual-gpu-vllm config 2>&1 | grep -E "^  [a-z]|image:|ports:"
```

Expected:
- `ollama:` service (port 11434)
- `vllm:` service (port 8000)
- **No** `ollama_research:` service

**Step 4: Run full test suite**

```bash
conda run -n job-seeker python -m pytest tests/ -v
```

Expected: all existing tests PASS, all new preflight tests PASS.

**Step 5: Clean up `.env` test entry**

```bash
# Remove the test DUAL_GPU_MODE line (preflight will re-write it correctly on next run)
sed -i '/^DUAL_GPU_MODE=/d' .env
```

**Step 6: Final commit**

```bash
git add .env  # in case preflight rewrote it during testing
git commit -m "feat: dual-gpu DUAL_GPU_MODE complete — ollama/vllm/mixed GPU 1 selection"
```