feat: dual-GPU DUAL_GPU_MODE complete — ollama/vllm/mixed GPU 1 selection
This commit is contained in: parent 7ef95dd9ba, commit 11f6334f28
2 changed files with 1068 additions and 0 deletions

docs/plans/2026-02-26-dual-gpu-design.md (new file, 257 lines)

# Peregrine — Dual-GPU / Dual-Inference Design

**Date:** 2026-02-26
**Status:** Approved — ready for implementation
**Scope:** Peregrine (reference impl; patterns propagate to future products)

---

## Goal

Replace the fixed `dual-gpu` profile (Ollama + vLLM hardwired to GPU 0 + GPU 1) with a
`DUAL_GPU_MODE` env var that selects which inference stack occupies GPU 1. Simultaneously
add a first-run download size warning to preflight so users know what they're in for before
Docker starts pulling images and models.

---

## Modes

| `DUAL_GPU_MODE` | GPU 0 | GPU 1 | Research backend |
|-----------------|-------|-------|------------------|
| `ollama` (default) | ollama + vision | ollama_research | `ollama_research` |
| `vllm` | ollama + vision | vllm | `vllm_research` |
| `mixed` | ollama + vision | ollama_research + vllm (VRAM-split) | `vllm_research` → `ollama_research` fallback |

`mixed` requires sufficient VRAM on GPU 1. Preflight warns (but does not block) when GPU 1 has
< 12 GB free before starting in mixed mode.

Cover letters always use `ollama` on GPU 0. Research uses whichever GPU 1 backend is
reachable. The LLM router's `_is_reachable()` check handles this transparently — the
fallback chain simply skips services that aren't running.

---

## Compose Profile Architecture

Docker Compose profiles gate which services start per mode.
`DUAL_GPU_MODE` is read by the Makefile and passed as a second `--profile` flag.

### Service → profile mapping

| Service | Profiles |
|---------|----------|
| `ollama` | `cpu`, `single-gpu`, `dual-gpu-ollama`, `dual-gpu-vllm`, `dual-gpu-mixed` |
| `vision` | `single-gpu`, `dual-gpu-ollama`, `dual-gpu-vllm`, `dual-gpu-mixed` |
| `ollama_research` | `dual-gpu-ollama`, `dual-gpu-mixed` |
| `vllm` | `dual-gpu-vllm`, `dual-gpu-mixed` |
| `finetune` | `finetune` |

User-facing profiles remain: `remote`, `cpu`, `single-gpu`, `dual-gpu`.
Sub-profiles (`dual-gpu-ollama`, `dual-gpu-vllm`, `dual-gpu-mixed`) are injected by the
Makefile and never typed by the user.

---

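The sub-profile injection can be sketched as a tiny selector. This is a Python illustration only; the real logic is a Makefile conditional (see File Changes below), and the function name here is made up:

```python
def compose_profile_flags(profile: str, dual_gpu_mode: str = "ollama") -> list[str]:
    """Build the --profile flags that the Makefile hands to docker compose."""
    flags = ["--profile", profile]
    # Only the user-facing dual-gpu profile gets a hidden sub-profile appended.
    if profile == "dual-gpu":
        flags += ["--profile", f"dual-gpu-{dual_gpu_mode}"]
    return flags

print(compose_profile_flags("single-gpu"))
# ['--profile', 'single-gpu']
print(compose_profile_flags("dual-gpu", "mixed"))
# ['--profile', 'dual-gpu', '--profile', 'dual-gpu-mixed']
```

With the service → profile table above, `dual-gpu` + `mixed` therefore starts ollama, vision, ollama_research, and vllm, while `dual-gpu` + `ollama` never pulls the vllm image.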
## File Changes

### `compose.yml`

**`ollama`** — add all dual-gpu sub-profiles to `profiles`:
```yaml
profiles: [cpu, single-gpu, dual-gpu-ollama, dual-gpu-vllm, dual-gpu-mixed]
```

**`vision`** — same pattern:
```yaml
profiles: [single-gpu, dual-gpu-ollama, dual-gpu-vllm, dual-gpu-mixed]
```

**`vllm`** — change from `[dual-gpu]` to:
```yaml
profiles: [dual-gpu-vllm, dual-gpu-mixed]
```

**`ollama_research`** — new service:
```yaml
ollama_research:
  image: ollama/ollama:latest
  ports:
    - "${OLLAMA_RESEARCH_PORT:-11435}:11434"
  volumes:
    - ${OLLAMA_MODELS_DIR:-~/models/ollama}:/root/.ollama  # shared — no double download
    - ./docker/ollama/entrypoint.sh:/entrypoint.sh
  environment:
    - OLLAMA_MODELS=/root/.ollama
    - DEFAULT_OLLAMA_MODEL=${OLLAMA_RESEARCH_MODEL:-llama3.2:3b}
  entrypoint: ["/bin/bash", "/entrypoint.sh"]
  profiles: [dual-gpu-ollama, dual-gpu-mixed]
  restart: unless-stopped
```

### `compose.gpu.yml`

Add an `ollama_research` block (GPU 1). `vllm` stays on GPU 1 as-is:
```yaml
ollama_research:
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            device_ids: ["1"]
            capabilities: [gpu]
```

### `compose.podman-gpu.yml`

Same addition for Podman CDI:
```yaml
ollama_research:
  devices:
    - nvidia.com/gpu=1
  deploy:
    resources:
      reservations:
        devices: []
```

### `Makefile`

Two additions after the existing `COMPOSE` detection:

```makefile
DUAL_GPU_MODE ?= $(shell grep -m1 '^DUAL_GPU_MODE=' .env 2>/dev/null | cut -d= -f2 || echo ollama)

# GPU overlay: matches single-gpu, dual-gpu (findstring gpu already covers these)
# Sub-profile injection for dual-gpu modes:
ifeq ($(PROFILE),dual-gpu)
COMPOSE_FILES += --profile dual-gpu-$(DUAL_GPU_MODE)
endif
```

Update the `manage.sh` usage block to document the `dual-gpu` profile with a `DUAL_GPU_MODE` note:
```
dual-gpu      Ollama + Vision on GPU 0; GPU 1 mode set by DUAL_GPU_MODE
                DUAL_GPU_MODE=ollama (default)  ollama_research on GPU 1
                DUAL_GPU_MODE=vllm              vllm on GPU 1
                DUAL_GPU_MODE=mixed             both on GPU 1 (VRAM-split; see preflight warning)
```

### `scripts/preflight.py`

**1. `_SERVICES` — add `ollama_research`:**
```python
"ollama_research": ("ollama_research_port", 11435, "OLLAMA_RESEARCH_PORT", True, True),
```

**2. `_LLM_BACKENDS` — add entries for both new backends:**
```python
"ollama_research": [("ollama_research", "/v1")],
# vllm_research is an alias for vllm's port — preflight updates base_url for both:
"vllm": [("vllm", "/v1"), ("vllm_research", "/v1")],
```

**3. `_DOCKER_INTERNAL` — add `ollama_research`:**
```python
"ollama_research": ("ollama_research", 11434),  # container-internal port is always 11434
```

**4. `recommend_profile()` — unchanged** (still returns `"dual-gpu"` for 2 GPUs).
Write `DUAL_GPU_MODE=ollama` to `.env` when first setting up a 2-GPU system.

**5. Mixed-mode VRAM warning** — after the GPU resource section, before the closing line:
```python
dual_gpu_mode = os.environ.get("DUAL_GPU_MODE", "ollama")
if dual_gpu_mode == "mixed" and len(gpus) >= 2:
    if gpus[1]["vram_free_gb"] < 12:
        print(f"║  ⚠ DUAL_GPU_MODE=mixed: GPU 1 has only {gpus[1]['vram_free_gb']:.1f} GB free")
        print("║    Running ollama_research + vllm together may cause OOM.")
        print("║    Consider DUAL_GPU_MODE=ollama or DUAL_GPU_MODE=vllm instead.")
```

**6. Download size warning** — profile-aware block added just before the closing `╚` line:

```
║  Download sizes (first-run estimates)
║    Docker images
║      ollama/ollama       ~800 MB    (shared by ollama + ollama_research)
║      searxng/searxng     ~300 MB
║      app (Python build)  ~1.5 GB
║      vision service      ~3.0 GB    [single-gpu and above]
║      vllm/vllm-openai    ~10.0 GB   [vllm / mixed mode only]
║
║    Model weights (lazy-loaded on first use)
║      llama3.2:3b         ~2.0 GB  → OLLAMA_MODELS_DIR
║      moondream2          ~1.8 GB  → vision container cache  [single-gpu+]
║      Note: ollama + ollama_research share the same model dir — no double download
║
║  ⚠ Total first-run: ~X GB (models persist between restarts)
```

The total is summed at runtime based on the active profile + `DUAL_GPU_MODE`.

Size table (used by the warning calculator):

| Component | Size | Condition |
|-----------|------|-----------|
| `ollama/ollama` image | 800 MB | cpu, single-gpu, dual-gpu |
| `searxng/searxng` image | 300 MB | always |
| app image | 1,500 MB | always |
| vision service image | 3,000 MB | single-gpu, dual-gpu |
| `vllm/vllm-openai` image | 10,000 MB | vllm or mixed mode |
| llama3.2:3b weights | 2,000 MB | cpu, single-gpu, dual-gpu |
| moondream2 weights | 1,800 MB | single-gpu, dual-gpu |

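As a sanity check on the runtime sum, the worst case (dual-gpu in mixed mode, which pulls every row of the table) can be tallied directly. The dict keys are ad hoc labels for this sketch; the division by 1024 mirrors how the runtime calculation renders MB as GB:

```python
# Estimated first-run component sizes in MB, mirroring the table above.
COMPONENT_MB = {
    "searxng": 300, "app": 1500,               # always
    "ollama": 800, "llama3_2_3b": 2000,        # cpu, single-gpu, dual-gpu
    "vision_image": 3000, "moondream2": 1800,  # single-gpu, dual-gpu
    "vllm_image": 10000,                       # vllm or mixed mode only
}

# dual-gpu + mixed includes every component:
total_mb = sum(COMPONENT_MB.values())
print(f"~{total_mb / 1024:.1f} GB")  # ~18.9 GB
```

Lighter profiles drop rows: cpu mode totals 4,600 MB (~4.5 GB), and remote mode pulls only searxng + app (~1.8 GB).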
### `config/llm.yaml`

**Add `vllm_research` backend:**
```yaml
vllm_research:
  api_key: ''
  base_url: http://host.docker.internal:8000/v1  # same port as vllm; preflight keeps in sync
  enabled: true
  model: __auto__
  supports_images: false
  type: openai_compat
```

**Update `research_fallback_order`:**
```yaml
research_fallback_order:
  - claude_code
  - vllm_research
  - ollama_research
  - github_copilot
  - anthropic
```

`vllm` stays in the main `fallback_order` (cover letters). `vllm_research` is the explicit
research alias for the same service — different config key, same port, makes routing intent
readable in the YAML.

---

## Downstream Compatibility

The LLM router requires no changes. `_is_reachable()` already skips backends that aren't
responding. When `DUAL_GPU_MODE=ollama`, `vllm_research` is unreachable and skipped;
`ollama_research` is up and used. When `DUAL_GPU_MODE=vllm`, the reverse. `mixed` mode
makes both reachable; `vllm_research` wins as the higher-priority entry.

Preflight's `update_llm_yaml()` keeps `base_url` values correct for both adopted (external)
and Docker-internal routing automatically, since `vllm_research` is registered under the
`"vllm"` key in `_LLM_BACKENDS`.

---

## Future Considerations

- **Triple-GPU / 3+ service configs:** When a third product is active, extract this pattern
  into `circuitforge-core` as a reusable inference topology manager.
- **Dual vLLM:** Two vLLM instances (e.g., different model sizes per task) follow the same
  pattern — add `vllm_research` as a separate compose service on its own port.
- **VRAM-aware model selection:** Preflight could suggest smaller models when VRAM is tight
  in mixed mode (e.g., swap llama3.2:3b → llama3.2:1b for the research instance).
- **Queue optimizer (1-GPU / CPU):** When only one inference backend is available and a batch
  of tasks is queued, group by task type (all cover letters first, then all research briefs)
  to avoid repeated model context switches. Tracked separately.

docs/plans/2026-02-26-dual-gpu-plan.md (new file, 811 lines)

# Dual-GPU / Dual-Inference Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Add `DUAL_GPU_MODE=ollama|vllm|mixed` env var that gates which inference service occupies GPU 1 on dual-GPU systems, plus a first-run download size warning in preflight.

**Architecture:** Sub-profiles (`dual-gpu-ollama`, `dual-gpu-vllm`, `dual-gpu-mixed`) are injected alongside `--profile dual-gpu` by the Makefile based on `DUAL_GPU_MODE`. The LLM router requires zero changes — `_is_reachable()` naturally skips backends that aren't running. Preflight gains `ollama_research` as a tracked service and emits a size warning block.

**Tech Stack:** Docker Compose profiles, Python (preflight.py), YAML (llm.yaml, compose files), bash (Makefile, manage.sh)

**Design doc:** `docs/plans/2026-02-26-dual-gpu-design.md`

**Test runner:** `conda run -n job-seeker python -m pytest tests/ -v`

---

### Task 1: Update `config/llm.yaml`

**Files:**
- Modify: `config/llm.yaml`

**Step 1: Add `vllm_research` backend and update `research_fallback_order`**

Open `config/llm.yaml`. After the `vllm:` block, add:

```yaml
vllm_research:
  api_key: ''
  base_url: http://host.docker.internal:8000/v1
  enabled: true
  model: __auto__
  supports_images: false
  type: openai_compat
```

Replace the `research_fallback_order:` section with:

```yaml
research_fallback_order:
  - claude_code
  - vllm_research
  - ollama_research
  - github_copilot
  - anthropic
```

**Step 2: Verify YAML parses cleanly**

```bash
conda run -n job-seeker python -c "import yaml; yaml.safe_load(open('config/llm.yaml'))"
```

Expected: no output (no error).

**Step 3: Run existing llm config test**

```bash
conda run -n job-seeker python -m pytest tests/test_llm_router.py::test_config_loads -v
```

Expected: PASS

**Step 4: Commit**

```bash
git add config/llm.yaml
git commit -m "feat: add vllm_research backend and update research_fallback_order"
```

---

### Task 2: Write failing tests for preflight changes

**Files:**
- Create: `tests/test_preflight.py`

No existing test file covers preflight. Write all tests upfront — they fail until Tasks 3–5 implement the code.

**Step 1: Create `tests/test_preflight.py`**

```python
"""Tests for scripts/preflight.py additions: dual-GPU service table, size warning, VRAM check."""
import os
import tempfile
from pathlib import Path
from unittest.mock import patch

import pytest
import yaml


# ── Service table ──────────────────────────────────────────────────────────────

def test_ollama_research_in_services():
    """ollama_research must be in _SERVICES at port 11435."""
    from scripts.preflight import _SERVICES
    assert "ollama_research" in _SERVICES
    _, default_port, env_var, docker_owned, adoptable = _SERVICES["ollama_research"]
    assert default_port == 11435
    assert env_var == "OLLAMA_RESEARCH_PORT"
    assert docker_owned is True
    assert adoptable is True


def test_ollama_research_in_llm_backends():
    """ollama_research must be a standalone key in _LLM_BACKENDS (not nested under ollama)."""
    from scripts.preflight import _LLM_BACKENDS
    assert "ollama_research" in _LLM_BACKENDS
    # Should map to the ollama_research llm backend
    backend_names = [name for name, _ in _LLM_BACKENDS["ollama_research"]]
    assert "ollama_research" in backend_names


def test_vllm_research_in_llm_backends():
    """vllm_research must be registered under vllm in _LLM_BACKENDS."""
    from scripts.preflight import _LLM_BACKENDS
    assert "vllm" in _LLM_BACKENDS
    backend_names = [name for name, _ in _LLM_BACKENDS["vllm"]]
    assert "vllm_research" in backend_names


def test_ollama_research_in_docker_internal():
    """ollama_research must map to internal port 11434 (Ollama's container port)."""
    from scripts.preflight import _DOCKER_INTERNAL
    assert "ollama_research" in _DOCKER_INTERNAL
    hostname, port = _DOCKER_INTERNAL["ollama_research"]
    assert hostname == "ollama_research"
    assert port == 11434  # container-internal port is always 11434


def test_ollama_not_mapped_to_ollama_research_backend():
    """ollama service key must only update the ollama llm backend, not ollama_research."""
    from scripts.preflight import _LLM_BACKENDS
    ollama_backend_names = [name for name, _ in _LLM_BACKENDS.get("ollama", [])]
    assert "ollama_research" not in ollama_backend_names


# ── Download size warning ──────────────────────────────────────────────────────

def test_download_size_remote_profile():
    """Remote profile: only searxng + app, no ollama, no vision, no vllm."""
    from scripts.preflight import _download_size_mb
    sizes = _download_size_mb("remote", "ollama")
    assert "searxng" in sizes
    assert "app" in sizes
    assert "ollama" not in sizes
    assert "vision_image" not in sizes
    assert "vllm_image" not in sizes


def test_download_size_cpu_profile():
    """CPU profile: adds ollama image + llama3.2:3b weights."""
    from scripts.preflight import _download_size_mb
    sizes = _download_size_mb("cpu", "ollama")
    assert "ollama" in sizes
    assert "llama3_2_3b" in sizes
    assert "vision_image" not in sizes


def test_download_size_single_gpu_profile():
    """Single-GPU: adds vision image + moondream2 weights."""
    from scripts.preflight import _download_size_mb
    sizes = _download_size_mb("single-gpu", "ollama")
    assert "vision_image" in sizes
    assert "moondream2" in sizes
    assert "vllm_image" not in sizes


def test_download_size_dual_gpu_ollama_mode():
    """dual-gpu + ollama mode: no vllm image."""
    from scripts.preflight import _download_size_mb
    sizes = _download_size_mb("dual-gpu", "ollama")
    assert "vllm_image" not in sizes


def test_download_size_dual_gpu_vllm_mode():
    """dual-gpu + vllm mode: adds ~10 GB vllm image."""
    from scripts.preflight import _download_size_mb
    sizes = _download_size_mb("dual-gpu", "vllm")
    assert "vllm_image" in sizes
    assert sizes["vllm_image"] >= 9000  # at least 9 GB


def test_download_size_dual_gpu_mixed_mode():
    """dual-gpu + mixed mode: also includes vllm image."""
    from scripts.preflight import _download_size_mb
    sizes = _download_size_mb("dual-gpu", "mixed")
    assert "vllm_image" in sizes


# ── Mixed-mode VRAM warning ────────────────────────────────────────────────────

def test_mixed_mode_vram_warning_triggered():
    """Should return a warning string when GPU 1 has < 12 GB free in mixed mode."""
    from scripts.preflight import _mixed_mode_vram_warning
    gpus = [
        {"name": "RTX 3090", "vram_total_gb": 24.0, "vram_free_gb": 20.0},
        {"name": "RTX 3090", "vram_total_gb": 24.0, "vram_free_gb": 8.0},  # tight
    ]
    warning = _mixed_mode_vram_warning(gpus, "mixed")
    assert warning is not None
    assert "8.0" in warning or "GPU 1" in warning


def test_mixed_mode_vram_warning_not_triggered_with_headroom():
    """Should return None when GPU 1 has >= 12 GB free."""
    from scripts.preflight import _mixed_mode_vram_warning
    gpus = [
        {"name": "RTX 4090", "vram_total_gb": 24.0, "vram_free_gb": 20.0},
        {"name": "RTX 4090", "vram_total_gb": 24.0, "vram_free_gb": 18.0},  # plenty
    ]
    warning = _mixed_mode_vram_warning(gpus, "mixed")
    assert warning is None


def test_mixed_mode_vram_warning_not_triggered_for_other_modes():
    """Warning only applies in mixed mode."""
    from scripts.preflight import _mixed_mode_vram_warning
    gpus = [
        {"name": "RTX 3090", "vram_total_gb": 24.0, "vram_free_gb": 20.0},
        {"name": "RTX 3090", "vram_total_gb": 24.0, "vram_free_gb": 6.0},
    ]
    assert _mixed_mode_vram_warning(gpus, "ollama") is None
    assert _mixed_mode_vram_warning(gpus, "vllm") is None


# ── update_llm_yaml with ollama_research ──────────────────────────────────────

def test_update_llm_yaml_sets_ollama_research_url_docker_internal():
    """ollama_research backend URL must be set to ollama_research:11434 when Docker-owned."""
    from scripts.preflight import update_llm_yaml

    llm_cfg = {
        "backends": {
            "ollama": {"base_url": "http://old", "type": "openai_compat"},
            "ollama_research": {"base_url": "http://old", "type": "openai_compat"},
            "vllm": {"base_url": "http://old", "type": "openai_compat"},
            "vllm_research": {"base_url": "http://old", "type": "openai_compat"},
            "vision_service": {"base_url": "http://old", "type": "vision_service"},
        }
    }

    with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) as f:
        yaml.dump(llm_cfg, f)
        tmp_path = Path(f.name)

    ports = {
        "ollama": {"resolved": 11434, "external": False, "env_var": "OLLAMA_PORT"},
        "ollama_research": {"resolved": 11435, "external": False, "env_var": "OLLAMA_RESEARCH_PORT"},
        "vllm": {"resolved": 8000, "external": False, "env_var": "VLLM_PORT"},
        "vision": {"resolved": 8002, "external": False, "env_var": "VISION_PORT"},
    }

    try:
        # Patch LLM_YAML to point at our temp file
        with patch("scripts.preflight.LLM_YAML", tmp_path):
            update_llm_yaml(ports)

        result = yaml.safe_load(tmp_path.read_text())
        # Docker-internal: use service name + container port
        assert result["backends"]["ollama_research"]["base_url"] == "http://ollama_research:11434/v1"
        # vllm_research must match vllm's URL
        assert result["backends"]["vllm_research"]["base_url"] == result["backends"]["vllm"]["base_url"]
    finally:
        tmp_path.unlink()


def test_update_llm_yaml_sets_ollama_research_url_external():
    """When ollama_research is external (adopted), URL uses host.docker.internal:11435."""
    from scripts.preflight import update_llm_yaml

    llm_cfg = {
        "backends": {
            "ollama": {"base_url": "http://old", "type": "openai_compat"},
            "ollama_research": {"base_url": "http://old", "type": "openai_compat"},
        }
    }

    with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) as f:
        yaml.dump(llm_cfg, f)
        tmp_path = Path(f.name)

    ports = {
        "ollama": {"resolved": 11434, "external": False, "env_var": "OLLAMA_PORT"},
        "ollama_research": {"resolved": 11435, "external": True, "env_var": "OLLAMA_RESEARCH_PORT"},
    }

    try:
        with patch("scripts.preflight.LLM_YAML", tmp_path):
            update_llm_yaml(ports)
        result = yaml.safe_load(tmp_path.read_text())
        assert result["backends"]["ollama_research"]["base_url"] == "http://host.docker.internal:11435/v1"
    finally:
        tmp_path.unlink()
```

**Step 2: Run tests to confirm they all fail**

```bash
conda run -n job-seeker python -m pytest tests/test_preflight.py -v 2>&1 | head -50
```

Expected: all FAIL with `ImportError` or `AssertionError` — that's correct.

**Step 3: Commit failing tests**

```bash
git add tests/test_preflight.py
git commit -m "test: add failing tests for dual-gpu preflight additions"
```

---

### Task 3: `preflight.py` — service table additions

**Files:**
- Modify: `scripts/preflight.py:46-67` (`_SERVICES`, `_LLM_BACKENDS`, `_DOCKER_INTERNAL`)

**Step 1: Update `_SERVICES`**

Find the `_SERVICES` dict (currently ends at the `"ollama"` entry). Add `ollama_research` as a new entry:

```python
_SERVICES: dict[str, tuple[str, int, str, bool, bool]] = {
    "streamlit": ("streamlit_port", 8501, "STREAMLIT_PORT", True, False),
    "searxng": ("searxng_port", 8888, "SEARXNG_PORT", True, True),
    "vllm": ("vllm_port", 8000, "VLLM_PORT", True, True),
    "vision": ("vision_port", 8002, "VISION_PORT", True, True),
    "ollama": ("ollama_port", 11434, "OLLAMA_PORT", True, True),
    "ollama_research": ("ollama_research_port", 11435, "OLLAMA_RESEARCH_PORT", True, True),
}
```
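Each value is a 5-tuple of (config key, default host port, env var override, docker_owned, adoptable), as exercised by the tests in Task 2. A small standalone sketch (the helper and its name are hypothetical, not part of preflight.py) makes the field order explicit:

```python
# (config key, default host port, env var override, docker_owned, adoptable)
SERVICES = {
    "ollama": ("ollama_port", 11434, "OLLAMA_PORT", True, True),
    "ollama_research": ("ollama_research_port", 11435, "OLLAMA_RESEARCH_PORT", True, True),
}

def resolved_port(service: str, env: dict[str, str]) -> int:
    """Hypothetical helper: an env var override wins over the default port."""
    _key, default_port, env_var, _docker_owned, _adoptable = SERVICES[service]
    return int(env.get(env_var, default_port))

print(resolved_port("ollama_research", {}))                                 # 11435
print(resolved_port("ollama_research", {"OLLAMA_RESEARCH_PORT": "21435"}))  # 21435
```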
**Step 2: Update `_LLM_BACKENDS`**

Replace the existing dict:

```python
_LLM_BACKENDS: dict[str, list[tuple[str, str]]] = {
    "ollama": [("ollama", "/v1")],
    "ollama_research": [("ollama_research", "/v1")],
    "vllm": [("vllm", "/v1"), ("vllm_research", "/v1")],
    "vision": [("vision_service", "")],
}
```

**Step 3: Update `_DOCKER_INTERNAL`**

Add `ollama_research` entry:

```python
_DOCKER_INTERNAL: dict[str, tuple[str, int]] = {
    "ollama": ("ollama", 11434),
    "ollama_research": ("ollama_research", 11434),  # container-internal port is always 11434
    "vllm": ("vllm", 8000),
    "vision": ("vision", 8002),
    "searxng": ("searxng", 8080),
}
```

**Step 4: Run service table tests**

```bash
conda run -n job-seeker python -m pytest tests/test_preflight.py::test_ollama_research_in_services tests/test_preflight.py::test_ollama_research_in_llm_backends tests/test_preflight.py::test_vllm_research_in_llm_backends tests/test_preflight.py::test_ollama_research_in_docker_internal tests/test_preflight.py::test_ollama_not_mapped_to_ollama_research_backend tests/test_preflight.py::test_update_llm_yaml_sets_ollama_research_url_docker_internal tests/test_preflight.py::test_update_llm_yaml_sets_ollama_research_url_external -v
```

Expected: all PASS

**Step 5: Commit**

```bash
git add scripts/preflight.py
git commit -m "feat: add ollama_research to preflight service table and LLM backend map"
```

---

### Task 4: `preflight.py` — `_download_size_mb()` pure function

**Files:**
- Modify: `scripts/preflight.py` (add new function after `calc_cpu_offload_gb`)

**Step 1: Add the function**

After `calc_cpu_offload_gb()`, add:

```python
def _download_size_mb(profile: str, dual_gpu_mode: str = "ollama") -> dict[str, int]:
    """
    Return estimated first-run download sizes in MB, keyed by component name.

    Profile-aware: only includes components that will actually be pulled.
    """
    sizes: dict[str, int] = {
        "searxng": 300,
        "app": 1500,
    }
    if profile in ("cpu", "single-gpu", "dual-gpu"):
        sizes["ollama"] = 800
        sizes["llama3_2_3b"] = 2000
    if profile in ("single-gpu", "dual-gpu"):
        sizes["vision_image"] = 3000
        sizes["moondream2"] = 1800
    if profile == "dual-gpu" and dual_gpu_mode in ("vllm", "mixed"):
        sizes["vllm_image"] = 10000
    return sizes
```

**Step 2: Run download size tests**

```bash
conda run -n job-seeker python -m pytest tests/test_preflight.py -k "download_size" -v
```

Expected: all PASS

**Step 3: Commit**

```bash
git add scripts/preflight.py
git commit -m "feat: add _download_size_mb() pure function for preflight size warning"
```

---

### Task 5: `preflight.py` — VRAM warning, size report block, DUAL_GPU_MODE default
|
||||
|
||||
**Files:**
|
||||
- Modify: `scripts/preflight.py` (three additions to `main()` and a new helper)
|
||||
|
||||
**Step 1: Add `_mixed_mode_vram_warning()` after `_download_size_mb()`**
|
||||
|
||||
```python
|
||||
def _mixed_mode_vram_warning(gpus: list[dict], dual_gpu_mode: str) -> str | None:
|
||||
"""
|
||||
Return a warning string if GPU 1 likely lacks VRAM for mixed mode, else None.
|
||||
Only relevant when dual_gpu_mode == 'mixed' and at least 2 GPUs are present.
|
||||
"""
|
||||
if dual_gpu_mode != "mixed" or len(gpus) < 2:
|
||||
return None
|
||||
free = gpus[1]["vram_free_gb"]
|
||||
if free < 12:
|
||||
return (
|
||||
f"⚠ DUAL_GPU_MODE=mixed: GPU 1 has only {free:.1f} GB free — "
|
||||
f"running ollama_research + vllm together may cause OOM. "
|
||||
f"Consider DUAL_GPU_MODE=ollama or DUAL_GPU_MODE=vllm."
|
||||
)
|
||||
return None
|
||||
```
|
||||
|
||||
**Step 2: Run VRAM warning tests**
|
||||
|
||||
```bash
|
||||
conda run -n job-seeker python -m pytest tests/test_preflight.py -k "vram" -v
|
||||
```
|
||||
|
||||
Expected: all PASS
|
||||
|
||||
**Step 3: Wire size warning into `main()` report block**
|
||||
|
||||
In `main()`, find the closing `print("╚═...═╝")` line. Add the size warning block just before it:
|
||||
|
||||
```python
|
||||
# ── Download size warning ──────────────────────────────────────────────
|
||||
dual_gpu_mode = os.environ.get("DUAL_GPU_MODE", "ollama")
|
||||
sizes = _download_size_mb(profile, dual_gpu_mode)
|
||||
total_mb = sum(sizes.values())
|
||||
print("║")
|
||||
print("║ Download sizes (first-run estimates)")
|
||||
print("║ Docker images")
|
||||
print(f"║ app (Python build) ~{sizes.get('app', 0):,} MB")
|
||||
if "searxng" in sizes:
|
||||
print(f"║ searxng/searxng ~{sizes['searxng']:,} MB")
|
||||
if "ollama" in sizes:
|
||||
shared_note = " (shared by ollama + ollama_research)" if profile == "dual-gpu" and dual_gpu_mode in ("ollama", "mixed") else ""
|
||||
print(f"║ ollama/ollama ~{sizes['ollama']:,} MB{shared_note}")
|
||||
if "vision_image" in sizes:
|
||||
print(f"║ vision service ~{sizes['vision_image']:,} MB (torch + moondream)")
|
||||
if "vllm_image" in sizes:
|
||||
print(f"║ vllm/vllm-openai ~{sizes['vllm_image']:,} MB")
|
||||
print("║ Model weights (lazy-loaded on first use)")
|
||||
if "llama3_2_3b" in sizes:
|
||||
print(f"║ llama3.2:3b ~{sizes['llama3_2_3b']:,} MB → OLLAMA_MODELS_DIR")
|
||||
if "moondream2" in sizes:
|
||||
print(f"║ moondream2 ~{sizes['moondream2']:,} MB → vision container cache")
|
||||
if profile == "dual-gpu" and dual_gpu_mode in ("ollama", "mixed"):
|
||||
print("║ Note: ollama + ollama_research share model dir — no double download")
|
||||
print(f"║ ⚠ Total first-run: ~{total_mb / 1024:.1f} GB (models persist between restarts)")
|
||||
|
||||
# ── Mixed-mode VRAM warning ────────────────────────────────────────────
|
||||
vram_warn = _mixed_mode_vram_warning(gpus, dual_gpu_mode)
|
||||
if vram_warn:
|
||||
print("║")
|
||||
print(f"║ {vram_warn}")
|
||||
```
|
||||
|
||||
**Step 4: Wire `DUAL_GPU_MODE` default into `write_env()` block in `main()`**
|
||||
|
||||
In `main()`, find the `if not args.check_only:` block. After `env_updates["PEREGRINE_GPU_NAMES"]`, add:
|
||||
|
||||
```python
|
||||
# Write DUAL_GPU_MODE default for new 2-GPU setups (don't override user's choice)
|
||||
if len(gpus) >= 2:
|
||||
existing_env: dict[str, str] = {}
|
||||
if ENV_FILE.exists():
|
||||
for line in ENV_FILE.read_text().splitlines():
|
||||
if "=" in line and not line.startswith("#"):
|
||||
k, _, v = line.partition("=")
|
||||
existing_env[k.strip()] = v.strip()
|
||||
if "DUAL_GPU_MODE" not in existing_env:
|
||||
env_updates["DUAL_GPU_MODE"] = "ollama"
|
||||
```
|
||||
|
||||
**Step 5: Add `import os` if not already present at top of file**
|
||||
|
||||
Check line 1–30 of `scripts/preflight.py`. `import os` is already present inside `get_cpu_cores()` as a local import — move it to the top-level imports block:
|
||||
|
||||
```python
|
||||
import os # add alongside existing stdlib imports
|
||||
```
|
||||
|
||||
And remove the local `import os` inside `get_cpu_cores()`.
|
||||
|
||||
**Step 6: Run all preflight tests**
|
||||
|
||||
```bash
|
||||
conda run -n job-seeker python -m pytest tests/test_preflight.py -v
|
||||
```

Expected: all PASS

**Step 7: Smoke-check the preflight report output**

```bash
conda run -n job-seeker python scripts/preflight.py --check-only
```

Expected: report includes the `Download sizes` block near the bottom.

**Step 8: Commit**

```bash
git add scripts/preflight.py
git commit -m "feat: add DUAL_GPU_MODE default, VRAM warning, and download size report to preflight"
```

---

### Task 6: `compose.yml` — `ollama_research` service + profile updates

**Files:**
- Modify: `compose.yml`

**Step 1: Update `ollama` profiles line**

Find:
```yaml
profiles: [cpu, single-gpu, dual-gpu]
```
Replace with:
```yaml
profiles: [cpu, single-gpu, dual-gpu-ollama, dual-gpu-vllm, dual-gpu-mixed]
```

**Step 2: Update `vision` profiles line**

Find:
```yaml
profiles: [single-gpu, dual-gpu]
```
Replace with:
```yaml
profiles: [single-gpu, dual-gpu-ollama, dual-gpu-vllm, dual-gpu-mixed]
```

**Step 3: Update `vllm` profiles line**

Find:
```yaml
profiles: [dual-gpu]
```
Replace with:
```yaml
profiles: [dual-gpu-vllm, dual-gpu-mixed]
```

**Step 4: Add `ollama_research` service**

After the closing lines of the `ollama` service block, add:

```yaml
ollama_research:
  image: ollama/ollama:latest
  ports:
    - "${OLLAMA_RESEARCH_PORT:-11435}:11434"
  volumes:
    - ${OLLAMA_MODELS_DIR:-~/models/ollama}:/root/.ollama
    - ./docker/ollama/entrypoint.sh:/entrypoint.sh
  environment:
    - OLLAMA_MODELS=/root/.ollama
    - DEFAULT_OLLAMA_MODEL=${OLLAMA_RESEARCH_MODEL:-llama3.2:3b}
  entrypoint: ["/bin/bash", "/entrypoint.sh"]
  profiles: [dual-gpu-ollama, dual-gpu-mixed]
  restart: unless-stopped
```
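
For the research fallback chain, this service only needs to answer HTTP on its mapped port. A minimal sketch of such a probe; the router's actual `_is_reachable()` may differ, and the URL below is just an example:

```python
import urllib.request

def is_reachable(base_url: str, timeout: float = 2.0) -> bool:
    # Plain GET against the service root; Ollama answers on its API port.
    try:
        with urllib.request.urlopen(base_url, timeout=timeout):
            return True
    except OSError:  # connection refused, timeout, DNS failure, ...
        return False

# e.g. is_reachable("http://localhost:11435")  # ollama_research, if running
```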

**Step 5: Validate compose YAML**

```bash
docker compose -f compose.yml config --quiet
```

Expected: no errors.

**Step 6: Commit**

```bash
git add compose.yml
git commit -m "feat: add ollama_research service and update profiles for dual-gpu sub-profiles"
```

---

### Task 7: GPU overlay files — `compose.gpu.yml` and `compose.podman-gpu.yml`

**Files:**
- Modify: `compose.gpu.yml`
- Modify: `compose.podman-gpu.yml`

**Step 1: Add `ollama_research` to `compose.gpu.yml`**

After the `ollama:` block, add:

```yaml
ollama_research:
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            device_ids: ["1"]
            capabilities: [gpu]
```

**Step 2: Add `ollama_research` to `compose.podman-gpu.yml`**

After the `ollama:` block, add:

```yaml
ollama_research:
  devices:
    - nvidia.com/gpu=1
  deploy:
    resources:
      reservations:
        devices: []
```

**Step 3: Validate both files**

```bash
docker compose -f compose.yml -f compose.gpu.yml config --quiet
```

Expected: no errors. (The Podman overlay uses CDI device strings, which `docker compose config` may reject; validate it with your Podman compose tooling instead.)

**Step 4: Commit**

```bash
git add compose.gpu.yml compose.podman-gpu.yml
git commit -m "feat: assign ollama_research to GPU 1 in Docker and Podman GPU overlays"
```

---

### Task 8: `Makefile` + `manage.sh` — `DUAL_GPU_MODE` injection and help text

**Files:**
- Modify: `Makefile`
- Modify: `manage.sh`

**Step 1: Update `Makefile`**

After the `COMPOSE_OVERRIDE` variable, add `DUAL_GPU_MODE` reading:

```makefile
# $(or ...) supplies the default; a trailing "|| echo ollama" would not fire,
# since the pipeline's exit status comes from cut, which succeeds on no match.
DUAL_GPU_MODE ?= $(or $(shell grep -m1 '^DUAL_GPU_MODE=' .env 2>/dev/null | cut -d= -f2),ollama)
```

In the GPU overlay block, find:
```makefile
else
ifneq (,$(findstring gpu,$(PROFILE)))
COMPOSE_FILES := -f compose.yml $(COMPOSE_OVERRIDE) -f compose.gpu.yml
endif
endif
```

Replace that block with (the trailing `ifeq` appends the sub-profile flag after the closing `endif`):
```makefile
else
ifneq (,$(findstring gpu,$(PROFILE)))
COMPOSE_FILES := -f compose.yml $(COMPOSE_OVERRIDE) -f compose.gpu.yml
endif
endif
ifeq ($(PROFILE),dual-gpu)
COMPOSE_FILES += --profile dual-gpu-$(DUAL_GPU_MODE)
endif
```
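
The selection logic, rendered in Python for clarity (illustrative only; `COMPOSE_OVERRIDE` and the CPU branch are omitted):

```python
def compose_flags(profile: str, dual_gpu_mode: str = "ollama") -> list[str]:
    # GPU profiles add the overlay file; dual-gpu also appends its
    # DUAL_GPU_MODE sub-profile as a second --profile flag.
    flags = ["-f", "compose.yml"]
    if "gpu" in profile:
        flags += ["-f", "compose.gpu.yml"]
    flags += ["--profile", profile]
    if profile == "dual-gpu":
        flags += ["--profile", f"dual-gpu-{dual_gpu_mode}"]
    return flags

assert compose_flags("dual-gpu", "vllm")[-2:] == ["--profile", "dual-gpu-vllm"]
```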

**Step 2: Update `manage.sh` — profiles help block**

Find the profiles section in `usage()`:
```bash
echo "  dual-gpu    Ollama + Vision + vLLM on GPU 0+1"
```

Replace with:
```bash
echo "  dual-gpu    Ollama + Vision on GPU 0; GPU 1 set by DUAL_GPU_MODE"
echo "    DUAL_GPU_MODE=ollama (default)  ollama_research on GPU 1"
echo "    DUAL_GPU_MODE=vllm              vllm on GPU 1"
echo "    DUAL_GPU_MODE=mixed             both on GPU 1 (VRAM-split)"
```

**Step 3: Verify Makefile parses**

```bash
make help
```

Expected: help table prints cleanly, no make errors.

**Step 4: Verify manage.sh help**

```bash
./manage.sh help
```

Expected: new dual-gpu description appears in profiles section.

**Step 5: Commit**

```bash
git add Makefile manage.sh
git commit -m "feat: inject DUAL_GPU_MODE sub-profile in Makefile; update manage.sh help"
```

---

### Task 9: Integration smoke test

**Goal:** Verify the full chain works for `DUAL_GPU_MODE=ollama` without actually starting Docker (dry-run compose config check).

**Step 1: Write `DUAL_GPU_MODE=ollama` to `.env` temporarily**

```bash
echo "DUAL_GPU_MODE=ollama" >> .env
```

**Step 2: Dry-run compose config for dual-gpu + dual-gpu-ollama**

```bash
docker compose -f compose.yml -f compose.gpu.yml --profile dual-gpu --profile dual-gpu-ollama config 2>&1 | grep -E "^  [a-z]|image:|ports:"
```

Expected output includes:
- `ollama:` service with port 11434
- `ollama_research:` service with port 11435
- `vision:` service
- `searxng:` service
- **No** `vllm:` service
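
These expectations can also be checked mechanically by parsing the rendered config (for example via `docker compose … config --format json`, where supported). A sketch, with `check_services` as a hypothetical helper:

```python
def check_services(config: dict, required: set[str],
                   forbidden: set[str]) -> list[str]:
    # Compare a parsed compose config against expected service sets;
    # returns human-readable problems (empty list means the check passed).
    services = set(config.get("services", {}))
    problems = [f"missing: {s}" for s in sorted(required - services)]
    problems += [f"unexpected: {s}" for s in sorted(forbidden & services)]
    return problems

cfg = {"services": {"ollama": {}, "ollama_research": {}, "vision": {}, "searxng": {}}}
assert check_services(cfg, {"ollama", "ollama_research", "vision", "searxng"}, {"vllm"}) == []
```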

**Step 3: Dry-run for `DUAL_GPU_MODE=vllm`**

```bash
docker compose -f compose.yml -f compose.gpu.yml --profile dual-gpu --profile dual-gpu-vllm config 2>&1 | grep -E "^  [a-z]|image:|ports:"
```

Expected:
- `ollama:` service (port 11434)
- `vllm:` service (port 8000)
- **No** `ollama_research:` service

**Step 4: Run full test suite**

```bash
conda run -n job-seeker python -m pytest tests/ -v
```

Expected: all existing tests PASS, all new preflight tests PASS.

**Step 5: Clean up `.env` test entry**

```bash
# Remove the test DUAL_GPU_MODE line (preflight will re-write it correctly on next run)
sed -i '/^DUAL_GPU_MODE=/d' .env
```

**Step 6: Final commit**

```bash
git add .env  # in case preflight rewrote it during testing
git commit -m "feat: dual-gpu DUAL_GPU_MODE complete — ollama/vllm/mixed GPU 1 selection"
```