
# Docker Profiles
Peregrine uses Docker Compose profiles to start only the services your hardware supports. Choose a profile with `./manage.sh start --profile <name>`.
`manage.sh` delegates to `make`, which auto-detects Docker vs Podman and applies the correct GPU overlay: `compose.gpu.yml` for Docker, `compose.podman-gpu.yml` for Podman (CDI-based). You do not need to specify the overlay manually.

---
## Profile Reference
| Profile | Services started | Use case |
|---------|-----------------|----------|
| `cpu` | `web`, `api`, `ollama`, `searxng` | No GPU. Local models on CPU. Recommended default for new installs. |
| `single-gpu` | `web`, `api`, `ollama`, `vision`, `searxng` | One NVIDIA GPU. Handles cover letters, research, and vision. |
| `dual-gpu` | `web`, `api`, `ollama`, `vllm`, `vision`, `searxng` | Two NVIDIA GPUs. GPU split controlled by `DUAL_GPU_MODE`. |
| `cf-orch` | `web`, `api`, `searxng` | No local LLM. Inference routed to a CircuitForge GPU cluster. Requires a Paid or higher license. |
| `remote` | `web`, `api`, `searxng` | No local LLM. Inference is sent to cloud APIs (Anthropic, OpenAI-compatible) using your API keys. |
| `memory` | (any + memory flag) | Enables RAM-optimised container limits for low-RAM machines. Combine with another profile. |
---
## Service Descriptions
| Service | Image / Source | Host Port | Purpose |
|---------|---------------|-----------|---------|
| `web` | `Dockerfile.web` (Nginx + Vue SPA) | `VUE_PORT` (default 8506) | Main UI — serves the Vue frontend and proxies `/api/` to `api` |
| `api` | `Dockerfile` (FastAPI) | Internal only (proxied through `web`) | REST API — all backend logic |
| `ollama` | `ollama/ollama` | 11434 | Local model inference — cover letters and general tasks |
| `vllm` | `vllm/vllm-openai` | 8000 | High-throughput inference — research tasks |
| `vision` | `scripts/vision_service/` | 8002 | Moondream2 — survey screenshot analysis |
| `searxng` | `searxng/searxng` | 8888 | Private meta-search — company research web scraping |

The `web` container runs Nginx internally on port 80, mapped to `VUE_PORT` on the host. The Nginx config proxies `/api/` requests to `api:8601`; the FastAPI container is not exposed directly.

---
## Choosing a Profile
### cpu
Use `cpu` if:
- You have no GPU but want local inference (good for privacy)
- Your usage is light; on CPU, cover letter generation may take several minutes per request

Pull a model after starting:
```bash
docker exec -it peregrine-ollama-1 ollama pull llama3.2:3b
```
`llama3.2:3b` is the recommended CPU model — it runs on machines with 8 GB of system RAM.
### single-gpu
Use `single-gpu` if:
- You have one NVIDIA GPU with at least 8 GB VRAM
- Recommended for most single-user installs

The vision service (Moondream2) starts on the same GPU using 4-bit quantisation (~1.5 GB VRAM). Pull a model after starting:
```bash
docker exec -it peregrine-ollama-1 ollama pull llama3.1:8b
```
### dual-gpu
Use `dual-gpu` if:
- You have two or more NVIDIA GPUs

By default, GPU 0 handles Ollama (cover letters) and GPU 1 handles vLLM (research). See [Dual-GPU Modes](#dual-gpu-modes) below to configure how the two GPUs are split.
### cf-orch
Use `cf-orch` if:
- You have access to a CircuitForge GPU cluster running the cf-orch coordinator
- You want to run without a local GPU; inference is handled by the cluster
- You have a Paid or higher license

Set `CF_ORCH_URL` in `.env` to your coordinator address:
```bash
CF_ORCH_URL=http://10.1.10.71:7700
```
The wizard hardware step lets you enter the URL interactively and verifies the connection before saving.
### remote
Use `remote` if:
- You have no local GPU and no cf-orch cluster
- You are using Anthropic Claude, OpenAI, or another cloud API exclusively

Configure at least one external LLM backend in **Settings → LLM Backends** after first login.
### memory (add-on)
Use the `memory` add-on alongside any profile for machines with limited RAM:
```bash
./manage.sh start --profile single-gpu --profile memory
```
This applies conservative container memory limits to reduce the risk of the OOM (out-of-memory) killer terminating containers.

---
## Dual-GPU Modes
When using `dual-gpu`, `DUAL_GPU_MODE` in `.env` controls how the two GPUs are used:
| Mode | GPU 0 | GPU 1 | Use case |
|------|-------|-------|----------|
| `mixed` (default) | Ollama | vLLM | Best overall: fast cover letters + high-throughput research |
| `ollama` | Ollama | Ollama | Both GPUs run Ollama; no vLLM; useful if vLLM models are too large for one card |
| `vllm` | vLLM | vLLM | Both GPUs run vLLM (tensor parallel); maximum research throughput |

Set in `.env`:
```bash
DUAL_GPU_MODE=mixed # default
# DUAL_GPU_MODE=ollama
# DUAL_GPU_MODE=vllm
```
The Makefile expands `dual-gpu` into `--profile dual-gpu-$(DUAL_GPU_MODE)` before passing it to `docker compose`. The `compose.gpu.yml` overlay defines the `dual-gpu-mixed`, `dual-gpu-ollama`, and `dual-gpu-vllm` profile variants.
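
As a rough illustration, the expansion can be sketched as a small shell function. `expand_profile` is a hypothetical name, not part of Peregrine; the real logic lives in the Makefile and may differ, for example in how it reports an invalid `DUAL_GPU_MODE`.

```bash
# Hypothetical sketch of the Makefile's dual-gpu expansion; not Peregrine's actual code.
# Prints the --profile argument to pass to `docker compose` for one profile name.
# Example: DUAL_GPU_MODE=vllm expand_profile dual-gpu   -> --profile dual-gpu-vllm
expand_profile() {
  local profile=$1
  local mode=${DUAL_GPU_MODE:-mixed}   # mixed is the documented default

  if [[ $profile != dual-gpu ]]; then
    printf '%s\n' "--profile $profile"   # other profiles pass through unchanged
    return 0
  fi

  case $mode in
    mixed|ollama|vllm) printf '%s\n' "--profile dual-gpu-$mode" ;;
    *) echo "unknown DUAL_GPU_MODE: $mode" >&2; return 1 ;;
  esac
}
```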

---
## GPU Memory Guidance
| GPU VRAM | Recommended profile | Notes |
|----------|-------------------|-------|
| < 4 GB | `cpu` | GPU too small for practical model loading |
| 4–8 GB | `single-gpu` | Run smaller models (3B–8B parameters) |
| 8–16 GB | `single-gpu` | Run 8B–13B models comfortably |
| 16–24 GB | `single-gpu` | Run 13B–34B models |
| 24 GB+ (one card) | `single-gpu` | 70B models with quantisation |
| 16+ GB (two cards) | `dual-gpu` | Parallel cover letters + research |
---
## How preflight.py Works
`./manage.sh start` calls `scripts/preflight.py` before launching Docker. Preflight does the following:
1. **Port conflict detection**: checks whether `VUE_PORT`, `OLLAMA_PORT`, `VLLM_PORT`, `SEARXNG_PORT`, and `VISION_PORT` are already in use. Reports any conflicts and suggests alternatives.
2. **External service adoption**: if Ollama or SearXNG is already running on its configured port (common when using native Ollama on macOS, or a shared SearXNG instance), preflight writes a `compose.override.yml` that stubs out the duplicate containers. The running process is adopted rather than replaced.
3. **GPU enumeration**: queries `nvidia-smi` for GPU count and VRAM per card. On Apple Silicon Macs, it falls back to `system_profiler SPDisplaysDataType` and reports unified memory as the VRAM figure.
4. **RAM check**: reads `/proc/meminfo` (Linux) or `vm_stat` (macOS) for available system RAM.
5. **CPU offload**: if GPU VRAM is less than 10 GB, preflight calculates `CPU_OFFLOAD_GB` and writes it to `.env`. The vLLM container picks this up via `--cpu-offload-gb`, which lets vLLM keep part of the model weights in system RAM.
6. **Profile recommendation**: writes `RECOMMENDED_PROFILE` to `.env`. This is informational only; `./manage.sh start --profile <name>` uses the profile you specify.

Run preflight independently at any time:
```bash
./manage.sh preflight
# or
conda run -n cf python scripts/preflight.py
```
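
Preflight itself is a Python script, but the parsing behind the GPU enumeration, RAM check, and profile recommendation steps can be sketched in a few lines of shell. This is a hypothetical illustration, not the code in `scripts/preflight.py`: the function names are invented, it covers only the Linux/NVIDIA path (no `system_profiler` or `vm_stat` fallback), and `recommend_profile` assumes the "16+ GB (two cards)" row of the GPU Memory Guidance table means combined VRAM.

```bash
# Hypothetical sketch; not the logic in scripts/preflight.py.

# Reads `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`
# output on stdin (one MiB value per GPU) and prints "<gpu_count> <smallest_card_gb>",
# with each card's VRAM rounded to the nearest GB.
gpu_summary() {
  awk 'NF { n++; gb = int($1 / 1024 + 0.5); if (n == 1 || gb < min) min = gb }
       END { print n + 0, min + 0 }'
}

# Reads /proc/meminfo on stdin and prints MemAvailable in whole GiB (rounded down).
available_ram_gb() {
  awk '/^MemAvailable:/ { print int($2 / 1048576) }'   # /proc/meminfo values are in kB
}

# Maps GPU count and smallest-card VRAM (GB) to a profile name,
# following the GPU Memory Guidance table.
recommend_profile() {
  local gpu_count=$1 vram_gb=$2
  if (( gpu_count == 0 || vram_gb < 4 )); then
    echo cpu
  elif (( gpu_count >= 2 && gpu_count * vram_gb >= 16 )); then
    echo dual-gpu   # assumption: "16+ GB (two cards)" means combined VRAM
  else
    echo single-gpu
  fi
}

# Example (Linux with NVIDIA drivers):
#   read -r count vram < <(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | gpu_summary)
#   recommend_profile "$count" "$vram"
#   available_ram_gb < /proc/meminfo
```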
---
## Podman Support
Podman is fully supported as a Docker drop-in. `install.sh` detects whether Podman or Docker is available, and `manage.sh`/`make` use it automatically.
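
A minimal sketch of that detection, together with the overlay choice described at the top of this page. `detect_engine` and `gpu_overlay` are hypothetical names; in particular, which engine wins when both are installed is an assumption here (Podman first), not documented behaviour of `install.sh` or the Makefile.

```bash
# Hypothetical sketch; install.sh and the Makefile may detect the engine differently.

# Prints "podman" or "docker" depending on which CLI is on PATH.
# Assumption: Podman is preferred when both are installed.
detect_engine() {
  if command -v podman >/dev/null 2>&1; then
    echo podman
  elif command -v docker >/dev/null 2>&1; then
    echo docker
  else
    echo "neither podman nor docker found on PATH" >&2
    return 1
  fi
}

# Maps an engine name to the GPU overlay file this page documents for it.
gpu_overlay() {
  case $1 in
    docker) echo compose.gpu.yml ;;
    podman) echo compose.podman-gpu.yml ;;   # CDI-based GPU passthrough
    *) echo "unsupported engine: $1" >&2; return 1 ;;
  esac
}
```

For example, `engine=$(detect_engine) && echo "$engine with $(gpu_overlay "$engine")"` prints the engine and overlay this sketch would pick on your machine.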
### GPU setup for Podman (CDI)
Podman uses the CDI (Container Device Interface) standard for GPU passthrough, rather than Docker's `--gpus all` flag. Generate the CDI spec once after driver installation:
```bash
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```
Without this step, GPU profiles start but containers have no GPU access.
### Rootless Podman
Rootless Podman is supported. If you encounter permission errors on the Docker socket, ensure `podman.socket` is running for your user:
```bash
systemctl --user enable --now podman.socket
```
The `make` layer auto-detects rootless Podman and uses `$XDG_RUNTIME_DIR/podman/podman.sock` instead of `/var/run/docker.sock`.
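
The socket choice can be sketched as follows. `container_socket` is a hypothetical helper, and the `/run/user/<uid>` fallback for an unset `XDG_RUNTIME_DIR` is an assumption; `podman info --format '{{.Host.Security.Rootless}}'` is one way to ask Podman whether it runs rootless. One possible use is `export DOCKER_HOST="unix://$(container_socket)"` to point Docker-compatible clients at the chosen socket.

```bash
# Hypothetical sketch of the socket selection described above; the Makefile may differ.
# Optional $1 ("true"/"false") overrides the rootless check, which is handy for testing.
container_socket() {
  local rootless=${1:-$(podman info --format '{{.Host.Security.Rootless}}' 2>/dev/null)}
  if [[ $rootless == true ]]; then
    # Assumption: fall back to /run/user/<uid> when XDG_RUNTIME_DIR is unset.
    echo "${XDG_RUNTIME_DIR:-/run/user/$(id -u)}/podman/podman.sock"
  else
    echo /var/run/docker.sock
  fi
}
```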

---
## Customising Ports
Edit `.env` before running `./manage.sh start`:
```bash
VUE_PORT=8506 # main UI (Vue SPA)
OLLAMA_PORT=11434
VLLM_PORT=8000
SEARXNG_PORT=8888
VISION_PORT=8002
```
All containers read from `.env` via the `env_file` directive in `compose.yml`.
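
To see which of these ports are already taken before you edit `.env` (the same question preflight's port conflict detection answers), a rough shell check might look like this. It is a hypothetical helper, not part of `manage.sh`; it relies on bash's `/dev/tcp` pseudo-device and GNU `timeout`, and only probes `127.0.0.1`. If your `.env` is valid shell syntax, running `set -a; . ./.env; set +a` first makes `check_ports` use your overrides.

```bash
# Hypothetical helper; not part of manage.sh or preflight.py.

# Documented default host ports, keyed by their .env variable names.
declare -A PORT_DEFAULTS=(
  [VUE_PORT]=8506 [OLLAMA_PORT]=11434 [VLLM_PORT]=8000
  [SEARXNG_PORT]=8888 [VISION_PORT]=8002
)

# Succeeds if something accepts TCP connections on 127.0.0.1:<port>.
port_in_use() {
  timeout 1 bash -c "exec 3<>/dev/tcp/127.0.0.1/$1" 2>/dev/null
}

# Reports every configured port that is already taken; returns the number of conflicts.
check_ports() {
  local name port conflicts=0
  for name in "${!PORT_DEFAULTS[@]}"; do
    port=${!name:-${PORT_DEFAULTS[$name]}}   # value from the environment, else the default
    if port_in_use "$port"; then
      echo "$name=$port is already in use"
      conflicts=$((conflicts + 1))
    fi
  done
  return "$conflicts"
}
```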

---
## Wizard Test Instance
A separate compose file is available for testing first-run and onboarding wizard flows without touching your main data:
```bash
docker compose -f compose.wizard-test.yml --project-name peregrine-wizard up -d
```
The wizard test instance runs on port **8507** with ephemeral storage: every `docker compose restart` wipes the database back to a clean slate. It uses the same images as the main instance but mounts a minimal LLM config so the wizard detection endpoints work correctly.