# Docker Profiles

Peregrine uses Docker Compose profiles to start only the services your hardware supports. Choose a profile with `./manage.sh start --profile <name>`.

`manage.sh` delegates to `make`, which auto-detects Docker vs Podman and applies the correct GPU overlay: `compose.gpu.yml` for Docker, `compose.podman-gpu.yml` for Podman (CDI-based). You do not need to specify the overlay manually.

---

## Profile Reference

| Profile | Services started | Use case |
|---------|-----------------|----------|
| `cpu` | `web`, `api`, `ollama`, `searxng` | No GPU. Local models on CPU. Recommended default for new installs. |
| `single-gpu` | `web`, `api`, `ollama`, `vision`, `searxng` | One NVIDIA GPU. Covers cover letters, research, and vision. |
| `dual-gpu` | `web`, `api`, `ollama`, `vllm`, `vision`, `searxng` | Two NVIDIA GPUs. GPU split controlled by `DUAL_GPU_MODE`. |
| `cf-orch` | `web`, `api`, `searxng` | No local LLM. Inference routed to CircuitForge GPU cluster. Requires a Paid or higher license. |
| `remote` | `web`, `api`, `searxng` | No local LLM. Inference goes to cloud API keys (Anthropic, OpenAI-compatible). |
| `memory` | (any + memory flag) | Add-on. Enables RAM-optimised container limits for low-RAM machines. Combine with another profile. |

---

## Service Descriptions

| Service | Image / Source | Host Port | Purpose |
|---------|---------------|-----------|---------|
| `web` | `Dockerfile.web` (Nginx + Vue SPA) | `VUE_PORT` (default 8506) | Main UI: serves the Vue frontend and proxies `/api/` to `api` |
| `api` | `Dockerfile` (FastAPI) | Internal only (proxied through `web`) | REST API: all backend logic |
| `ollama` | `ollama/ollama` | 11434 | Local model inference: cover letters and general tasks |
| `vllm` | `vllm/vllm-openai` | 8000 | High-throughput inference: research tasks |
| `vision` | `scripts/vision_service/` | 8002 | Moondream2: survey screenshot analysis |
| `searxng` | `searxng/searxng` | 8888 | Private meta-search: company research web scraping |

The `web` container runs Nginx internally on port 80, mapped to `VUE_PORT` on the host. The Nginx config proxies `/api/` requests to `api:8601`; the FastAPI container is not exposed directly.

---

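As a quick cross-reference of the Profile Reference and Service Descriptions tables, the sketch below looks up which default host ports a profile would occupy. The dictionaries are transcribed from those two tables; the names `PROFILE_SERVICES`, `DEFAULT_HOST_PORTS`, and `host_ports` are illustrative and are not part of Peregrine.

```python
# Transcribed from the Profile Reference table. `memory` is an add-on, so it
# is omitted. The `dual-gpu` row is taken at face value; the actual services
# may depend on DUAL_GPU_MODE.
PROFILE_SERVICES = {
    "cpu": ["web", "api", "ollama", "searxng"],
    "single-gpu": ["web", "api", "ollama", "vision", "searxng"],
    "dual-gpu": ["web", "api", "ollama", "vllm", "vision", "searxng"],
    "cf-orch": ["web", "api", "searxng"],
    "remote": ["web", "api", "searxng"],
}

# Default host ports from the Service Descriptions table.
# `api` is internal only (proxied through `web`), so it has no host port.
DEFAULT_HOST_PORTS = {
    "web": 8506,
    "ollama": 11434,
    "vllm": 8000,
    "vision": 8002,
    "searxng": 8888,
}


def host_ports(profile: str) -> dict[str, int]:
    """Return {service: default host port} for the services a profile starts."""
    services = PROFILE_SERVICES[profile]
    return {s: DEFAULT_HOST_PORTS[s] for s in services if s in DEFAULT_HOST_PORTS}
```

For example, `host_ports("remote")` contains only `web` and `searxng`, which matches the table: no local inference ports are bound.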
## Choosing a Profile

### cpu

Use `cpu` if:

- You have no GPU but want local inference (good for privacy)
- Your usage is light; cover letter generation may take several minutes per request

Pull a model after starting:

```bash
docker exec -it peregrine-ollama-1 ollama pull llama3.2:3b
```

`llama3.2:3b` is the recommended CPU model; it runs on machines with 8 GB of system RAM.

### single-gpu

Use `single-gpu` if:

- You have one NVIDIA GPU with at least 8 GB VRAM

This is the recommended profile for most single-user installs.

The vision service (Moondream2) starts on the same GPU using 4-bit quantisation (~1.5 GB VRAM). Pull a model after starting:

```bash
docker exec -it peregrine-ollama-1 ollama pull llama3.1:8b
```

### dual-gpu

Use `dual-gpu` if:

- You have two or more NVIDIA GPUs

By default, GPU 0 handles Ollama (cover letters) and GPU 1 handles vLLM (research). See [Dual-GPU Modes](#dual-gpu-modes) below to configure how the two GPUs are split.

### cf-orch

Use `cf-orch` if:

- You have access to a CircuitForge GPU cluster running the cf-orch coordinator
- You hold a Paid or higher license

No local GPU is required; inference is handled by the cluster.

Set `CF_ORCH_URL` in `.env` to your coordinator address:

```bash
CF_ORCH_URL=http://10.1.10.71:7700
```

The wizard hardware step lets you enter the URL interactively and verifies the connection before saving.

### remote

Use `remote` if:

- You have no local GPU and no cf-orch cluster
- You are using Anthropic Claude, OpenAI, or another cloud API exclusively

Configure at least one external LLM backend in **Settings → LLM Backends** after first login.

### memory (add-on)

Use the `memory` add-on alongside any profile for machines with limited RAM:

```bash
./manage.sh start --profile single-gpu --profile memory
```

This applies conservative container memory limits to prevent the OOM (out-of-memory) killer from terminating containers.

---

## Dual-GPU Modes

When using `dual-gpu`, `DUAL_GPU_MODE` in `.env` controls how inference is split across the two GPUs:

| Mode | GPU 0 | GPU 1 | Use case |
|------|-------|-------|----------|
| `mixed` (default) | Ollama | vLLM | Best overall: fast cover letters + high-throughput research |
| `ollama` | Ollama | Ollama | Both GPUs run Ollama; no vLLM; useful if vLLM models are too large for one card |
| `vllm` | vLLM | vLLM | Both GPUs run vLLM (tensor parallel); maximum research throughput |

Set in `.env`:

```bash
DUAL_GPU_MODE=mixed # default
# DUAL_GPU_MODE=ollama
# DUAL_GPU_MODE=vllm
```

The Makefile expands `dual-gpu` into `--profile dual-gpu-$(DUAL_GPU_MODE)` before passing it to `docker compose`. The `compose.gpu.yml` overlay defines the `dual-gpu-mixed`, `dual-gpu-ollama`, and `dual-gpu-vllm` profile variants.

---

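The Makefile expansion described in the Dual-GPU Modes section can be sketched in Python to show the mapping. This is an illustration only: the real logic lives in the Makefile, `expand_profile` is a hypothetical name, and passing non-dual profiles through unchanged is an assumption.

```python
# The three variants that the doc says compose.gpu.yml defines.
VALID_DUAL_GPU_MODES = ("mixed", "ollama", "vllm")


def expand_profile(profile: str, dual_gpu_mode: str = "mixed") -> str:
    """Map a user-facing profile name to the Compose profile variant."""
    if profile != "dual-gpu":
        # Assumption: every other profile is passed through unchanged.
        return profile
    if dual_gpu_mode not in VALID_DUAL_GPU_MODES:
        raise ValueError(f"unknown DUAL_GPU_MODE: {dual_gpu_mode!r}")
    return f"dual-gpu-{dual_gpu_mode}"
```

With `DUAL_GPU_MODE=vllm`, `expand_profile("dual-gpu", "vllm")` yields `dual-gpu-vllm`, the variant in which both GPUs run vLLM.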
## GPU Memory Guidance

| GPU VRAM | Recommended profile | Notes |
|----------|-------------------|-------|
| < 4 GB | `cpu` | GPU too small for practical model loading |
| 4–8 GB | `single-gpu` | Run smaller models (3B–8B parameters) |
| 8–16 GB | `single-gpu` | Run 8B–13B models comfortably |
| 16–24 GB | `single-gpu` | Run 13B–34B models |
| 24 GB+ (one card) | `single-gpu` | 70B models with quantisation |
| 16+ GB (two cards) | `dual-gpu` | Parallel cover letters + research |

---

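The GPU Memory Guidance table can be read as a small decision function. The sketch below is one possible reading, not the logic used by `preflight.py`. In particular, it assumes that "16+ GB (two cards)" means at least 16 GB in total across the two largest usable cards; `recommend_profile` is a hypothetical name.

```python
def recommend_profile(vram_gb_per_gpu: list[float]) -> str:
    """Suggest a profile from per-GPU VRAM, following the table's thresholds."""
    # Cards under 4 GB are treated as unusable ("too small for practical
    # model loading" in the table).
    usable = sorted((v for v in vram_gb_per_gpu if v >= 4), reverse=True)
    if not usable:
        return "cpu"
    # Assumption: the two-card row refers to combined VRAM of the two
    # largest usable cards.
    if len(usable) >= 2 and usable[0] + usable[1] >= 16:
        return "dual-gpu"
    return "single-gpu"
```

Under these assumptions, a single 24 GB card maps to `single-gpu`, two 8 GB cards map to `dual-gpu`, and a machine with no GPU (an empty list) maps to `cpu`.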
## How preflight.py Works

`./manage.sh start` calls `scripts/preflight.py` before launching Docker. Preflight does the following:

1. **Port conflict detection**: checks whether `VUE_PORT`, `OLLAMA_PORT`, `VLLM_PORT`, `SEARXNG_PORT`, and `VISION_PORT` are already in use. Reports any conflicts and suggests alternatives.

2. **External service adoption**: if Ollama or SearXNG are already running on their configured ports (common when using native Ollama on macOS, or a shared SearXNG instance), preflight writes a `compose.override.yml` that stubs out the duplicate containers. The running process is adopted rather than replaced.

3. **GPU enumeration**: queries `nvidia-smi` for GPU count and VRAM per card. On Apple Silicon Macs, falls back to `system_profiler SPDisplaysDataType` and returns unified memory as the VRAM figure.

4. **RAM check**: reads `/proc/meminfo` (Linux) or `vm_stat` (macOS) for available system RAM.

5. **KV cache offload**: if GPU VRAM is less than 10 GB, preflight calculates `CPU_OFFLOAD_GB` and writes it to `.env`. The vLLM container picks this up via `--cpu-offload-gb` to overflow the KV cache to system RAM.

6. **Profile recommendation**: writes `RECOMMENDED_PROFILE` to `.env`. This is informational only; `./manage.sh start --profile <name>` uses the profile you specify.

Run preflight independently at any time:

```bash
./manage.sh preflight
# or
conda run -n cf python scripts/preflight.py
```

---

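To make preflight steps 1, 3, and 4 more concrete, here is a minimal sketch of how such checks are commonly written in Python. It is not the code in `scripts/preflight.py`; the function names are hypothetical. The parsers take text as input, so they can be tried without a GPU or a Linux host.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return s.connect_ex((host, port)) == 0


def parse_vram_mib(nvidia_smi_csv: str) -> list[int]:
    """Parse per-GPU VRAM (MiB), one value per line.

    Expects the output of:
    nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits
    """
    return [int(line) for line in nvidia_smi_csv.splitlines() if line.strip()]


def mem_available_gib(meminfo_text: str) -> float:
    """Read MemAvailable (reported in kB) from /proc/meminfo contents."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) / (1024 * 1024)
    raise ValueError("MemAvailable not found")
```

A caller would feed `parse_vram_mib` the captured `nvidia-smi` output and `mem_available_gib` the contents of `/proc/meminfo`. The macOS fallbacks named in steps 3 and 4 are not covered here.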
## Podman Support

Podman is fully supported as a Docker drop-in. `install.sh` detects whether Podman or Docker is available, and `manage.sh`/`make` use it automatically.

### GPU setup for Podman (CDI)

Podman uses the CDI (Container Device Interface) standard for GPU passthrough, rather than Docker's `--gpus all` flag. Generate the CDI spec once after driver installation:

```bash
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```

Without this step, GPU profiles start but containers have no GPU access.

### Rootless Podman

Rootless Podman is supported. If you encounter permission errors on the Docker socket, ensure `podman.socket` is running for your user:

```bash
systemctl --user enable --now podman.socket
```

The `make` layer auto-detects rootless Podman and uses `XDG_RUNTIME_DIR/podman/podman.sock` instead of `/var/run/docker.sock`.

---

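The socket selection described under Rootless Podman amounts to a simple branch. The sketch below expresses it in Python for illustration; the real detection happens in the `make` layer, and `container_socket` is a hypothetical helper that takes the rootless decision as an input rather than detecting it.

```python
import os


def container_socket(rootless_podman: bool, env=None) -> str:
    """Return the container engine socket path for the given setup."""
    env = os.environ if env is None else env
    if rootless_podman:
        runtime_dir = env.get("XDG_RUNTIME_DIR")
        if not runtime_dir:
            # Without XDG_RUNTIME_DIR there is no per-user socket location.
            raise RuntimeError("XDG_RUNTIME_DIR is not set")
        return f"{runtime_dir}/podman/podman.sock"
    return "/var/run/docker.sock"
```

On a typical systemd host, `XDG_RUNTIME_DIR` is `/run/user/<uid>`, so the rootless path resolves to something like `/run/user/1000/podman/podman.sock`.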
## Customising Ports

Edit `.env` before running `./manage.sh start`:

```bash
VUE_PORT=8506 # main UI (Vue SPA)
OLLAMA_PORT=11434
VLLM_PORT=8000
SEARXNG_PORT=8888
VISION_PORT=8002
```

All containers read from `.env` via the `env_file` directive in `compose.yml`.

---

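If you script around the port settings in `.env`, a small reader that falls back to the documented defaults can be handy. The following is a sketch under simple assumptions (unquoted integer values, `#` starts a comment). It is not how Compose or Peregrine parse the file, and `read_ports` is a hypothetical name.

```python
# Defaults as listed in the Customising Ports section.
DEFAULT_PORTS = {
    "VUE_PORT": 8506,
    "OLLAMA_PORT": 11434,
    "VLLM_PORT": 8000,
    "SEARXNG_PORT": 8888,
    "VISION_PORT": 8002,
}


def read_ports(env_text: str) -> dict[str, int]:
    """Return the port settings from .env text, using defaults for missing keys."""
    ports = dict(DEFAULT_PORTS)
    for raw in env_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop full-line and inline comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key in ports:
            ports[key] = int(value.strip())  # raises ValueError on non-numeric values
    return ports
```

For instance, a file containing only `VUE_PORT=9000 # main UI` returns 9000 for `VUE_PORT` and the defaults for the other four ports.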
## Wizard Test Instance

A separate compose file is available for testing first-run and onboarding wizard flows without touching your main data:

```bash
docker compose -f compose.wizard-test.yml --project-name peregrine-wizard up -d
```

The wizard test instance runs on port **8507** with ephemeral storage: every `docker compose restart` wipes the database back to a clean slate. It uses the same images as the main instance but mounts a minimal LLM config so the wizard detection endpoints work correctly.