peregrine/docs/getting-started/docker-profiles.md
pyr0ball 82c26074d8 fix: search prefs wizard data loss, resume sync link, docs + GUI help links
2026-06-15 16:52:56 -07:00


Docker Profiles

Peregrine uses Docker Compose profiles to start only the services your hardware supports. Choose a profile with ./manage.sh start --profile <name>.

manage.sh delegates to make, which auto-detects Docker vs Podman and applies the correct GPU overlay — compose.gpu.yml for Docker, compose.podman-gpu.yml for Podman (CDI-based). You do not need to specify the overlay manually.
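
The overlay selection can be pictured with a short Python sketch. It is illustrative only: the real logic lives in the Makefile, and the function names `detect_runtime` and `gpu_overlay`, as well as the Docker-first preference order, are assumptions rather than part of Peregrine.

```python
import shutil

# Overlay file names come from the paragraph above; everything else is hypothetical.
GPU_OVERLAYS = {
    "docker": "compose.gpu.yml",
    "podman": "compose.podman-gpu.yml",  # CDI-based
}


def detect_runtime(which=shutil.which):
    """Return "docker" or "podman", whichever is found first on PATH.

    Assumption: Docker is preferred when both are installed.
    """
    for runtime in ("docker", "podman"):
        if which(runtime):
            return runtime
    raise RuntimeError("neither docker nor podman found on PATH")


def gpu_overlay(runtime):
    """Map a container runtime to its GPU overlay compose file."""
    return GPU_OVERLAYS[runtime]
```

Passing `which` as a parameter keeps the lookup easy to exercise without either runtime installed.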


Profile Reference

| Profile | Services started | Use case |
|---|---|---|
| cpu | web, api, ollama, searxng | No GPU. Local models on CPU. Recommended default for new installs. |
| single-gpu | web, api, ollama, vision, searxng | One NVIDIA GPU. Covers cover letters, research, and vision. |
| dual-gpu | web, api, ollama, vllm, vision, searxng | Two NVIDIA GPUs. GPU split controlled by DUAL_GPU_MODE. |
| cf-orch | web, api, searxng | No local LLM. Inference routed to CircuitForge GPU cluster. Requires Paid license. |
| remote | web, api, searxng | No local LLM. Inference goes to cloud API keys (Anthropic, OpenAI-compatible). |
| memory | (any + memory flag) | Enables RAM-optimised container limits for low-RAM machines. Combine with another profile. |

Service Descriptions

| Service | Image / Source | Host Port | Purpose |
|---|---|---|---|
| web | Dockerfile.web (Nginx + Vue SPA) | VUE_PORT (default 8506) | Main UI: serves the Vue frontend and proxies /api/ to api |
| api | Dockerfile (FastAPI) | Internal only (proxied through web) | REST API: all backend logic |
| ollama | ollama/ollama | 11434 | Local model inference: cover letters and general tasks |
| vllm | vllm/vllm-openai | 8000 | High-throughput inference: research tasks |
| vision | scripts/vision_service/ | 8002 | Moondream2: survey screenshot analysis |
| searxng | searxng/searxng | 8888 | Private meta-search: company research web scraping |

The web container runs Nginx internally on port 80, mapped to VUE_PORT on the host. The Nginx config proxies /api/ requests to api:8601 — the FastAPI container is not exposed directly.


Choosing a Profile

cpu

Use cpu if:

  • You have no GPU but want local inference (good for privacy)
  • You only need light use; cover letter generation may take several minutes per request

Pull a model after starting:

docker exec -it peregrine-ollama-1 ollama pull llama3.2:3b

llama3.2:3b is the recommended CPU model — it runs on machines with 8 GB of system RAM.

single-gpu

Use single-gpu if:

  • You have one NVIDIA GPU with at least 8 GB VRAM
  • Recommended for most single-user installs

The vision service (Moondream2) starts on the same GPU using 4-bit quantisation (~1.5 GB VRAM). Pull a model after starting:

docker exec -it peregrine-ollama-1 ollama pull llama3.1:8b

dual-gpu

Use dual-gpu if:

  • You have two or more NVIDIA GPUs
  • Default: GPU 0 handles Ollama (cover letters), GPU 1 handles vLLM (research)

See Dual-GPU Modes below to configure how the two GPUs are split.

cf-orch

Use cf-orch if:

  • You have access to a CircuitForge GPU cluster running the cf-orch coordinator
  • No local GPU required — inference is handled by the cluster
  • Requires a Paid or higher license

Set CF_ORCH_URL in .env to your coordinator address:

CF_ORCH_URL=http://10.1.10.71:7700

The wizard hardware step lets you enter the URL interactively and verifies the connection before saving.

remote

Use remote if:

  • You have no local GPU and no cf-orch cluster
  • You are using Anthropic Claude, OpenAI, or another cloud API exclusively

Configure at least one external LLM backend in Settings → LLM Backends after first login.

memory (add-on)

Use the memory add-on alongside any profile for machines with limited RAM:

./manage.sh start --profile single-gpu --profile memory

This applies conservative container memory limits to prevent the OOM (out-of-memory) killer from terminating containers.


Dual-GPU Modes

When using dual-gpu, DUAL_GPU_MODE in .env controls how Ollama and vLLM are split across the two GPUs:

| Mode | GPU 0 | GPU 1 | Use case |
|---|---|---|---|
| mixed (default) | Ollama | vLLM | Best overall: fast cover letters + high-throughput research |
| ollama | Ollama | Ollama | Both GPUs run Ollama; no vLLM; useful if vLLM models are too large for one card |
| vllm | vLLM | vLLM | Both GPUs run vLLM (tensor parallel); maximum research throughput |

Set in .env:

DUAL_GPU_MODE=mixed    # default
# DUAL_GPU_MODE=ollama
# DUAL_GPU_MODE=vllm

The Makefile expands dual-gpu into --profile dual-gpu-$(DUAL_GPU_MODE) before passing it to docker compose. The compose.gpu.yml overlay defines the dual-gpu-mixed, dual-gpu-ollama, and dual-gpu-vllm profile variants.
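
As a rough illustration of that expansion, here is a minimal Python sketch. The actual work is done by the Makefile; the function name `expand_profile` and the validation of unknown modes are assumptions added for clarity.

```python
# Mode names are taken from the Dual-GPU Modes table.
VALID_MODES = ("mixed", "ollama", "vllm")


def expand_profile(profile, env):
    """Expand "dual-gpu" into its mode-specific compose profile name.

    `env` is a mapping of .env values. Other profiles pass through unchanged.
    """
    if profile != "dual-gpu":
        return profile
    mode = env.get("DUAL_GPU_MODE", "mixed")  # mixed is the documented default
    if mode not in VALID_MODES:
        # Assumption: rejecting unknown modes; the Makefile may behave differently.
        raise ValueError(f"unknown DUAL_GPU_MODE: {mode!r}")
    return f"dual-gpu-{mode}"
```

For example, `expand_profile("dual-gpu", {"DUAL_GPU_MODE": "vllm"})` yields the `dual-gpu-vllm` variant defined in the overlay.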


GPU Memory Guidance

| GPU VRAM | Recommended profile | Notes |
|---|---|---|
| < 4 GB | cpu | GPU too small for practical model loading |
| 4–8 GB | single-gpu | Run smaller models (3B–8B parameters) |
| 8–16 GB | single-gpu | Run 8B–13B models comfortably |
| 16–24 GB | single-gpu | Run 13B–34B models |
| 24 GB+ (one card) | single-gpu | 70B models with quantisation |
| 16+ GB (two cards) | dual-gpu | Parallel cover letters + research |
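
The guidance above can be turned into a small decision function. This is a sketch, not Peregrine's own logic: the name `recommend_profile` is hypothetical, and it assumes that "16+ GB (two cards)" refers to the combined VRAM of two cards that each clear the 4 GB floor.

```python
def recommend_profile(vram_gb_per_card):
    """Suggest a profile from a list of per-card VRAM sizes in GB."""
    # Cards under 4 GB are treated as too small for practical model loading.
    usable = sorted((v for v in vram_gb_per_card if v >= 4), reverse=True)
    if not usable:
        return "cpu"
    # Assumption: the two-card row means at least 16 GB combined across two cards.
    if len(usable) >= 2 and usable[0] + usable[1] >= 16:
        return "dual-gpu"
    return "single-gpu"
```

A machine with one 12 GB card and one 2 GB card would therefore get single-gpu, since the smaller card is ignored.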

How preflight.py Works

./manage.sh start calls scripts/preflight.py before launching Docker. Preflight does the following:

  1. Port conflict detection — checks whether VUE_PORT, OLLAMA_PORT, VLLM_PORT, SEARXNG_PORT, and VISION_PORT are already in use. Reports any conflicts and suggests alternatives.

  2. External service adoption — if Ollama or SearXNG are already running on their configured ports (common when using native Ollama on macOS, or a shared SearXNG instance), preflight writes a compose.override.yml that stubs out the duplicate containers. The running process is adopted rather than replaced.

  3. GPU enumeration — queries nvidia-smi for GPU count and VRAM per card. On Apple Silicon Macs, falls back to system_profiler SPDisplaysDataType and returns unified memory as the VRAM figure.

  4. RAM check — reads /proc/meminfo (Linux) or vm_stat (macOS) for available system RAM.

  5. KV cache offload — if GPU VRAM is less than 10 GB, preflight calculates CPU_OFFLOAD_GB and writes it to .env. The vLLM container picks this up via --cpu-offload-gb to overflow the KV cache to system RAM.

  6. Profile recommendation — writes RECOMMENDED_PROFILE to .env. This is informational only; ./manage.sh start --profile <name> uses the profile you specify.
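
To make step 1 concrete, here is one common way to detect a port conflict in Python. It is a sketch under assumptions: the helper names are hypothetical, and preflight.py may probe ports differently (for example by binding rather than connecting).

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def find_conflicts(ports):
    """Given {"VUE_PORT": 8506, ...}, return the entries that are already taken."""
    return {name: port for name, port in ports.items() if port_in_use(port)}


def suggest_alternative(port, limit=20):
    """Return the next free port above `port`, or None if none is found."""
    for candidate in range(port + 1, port + 1 + limit):
        if not port_in_use(candidate):
            return candidate
    return None
```

Calling `find_conflicts({"VUE_PORT": 8506, "OLLAMA_PORT": 11434})` on a machine with a native Ollama running would report only the Ollama entry.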

Run preflight independently at any time:

./manage.sh preflight
# or
conda run -n cf python scripts/preflight.py
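
For a feel of what the GPU enumeration and RAM check steps involve on Linux, the sketch below parses `nvidia-smi` CSV output and `/proc/meminfo`. The function names are hypothetical and this is not a copy of preflight.py; the `--query-gpu` and `--format` options are standard `nvidia-smi` flags, with `memory.total` reported in MiB.

```python
import subprocess

NVIDIA_SMI_QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text):
    """Parse lines like "0, 8192" into [(index, vram_mib), ...]."""
    gpus = []
    for line in text.splitlines():
        if not line.strip():
            continue
        index, mem_mib = (field.strip() for field in line.split(","))
        gpus.append((int(index), int(mem_mib)))
    return gpus


def enumerate_gpus():
    """Run nvidia-smi and return its GPUs, or [] when it is missing or fails."""
    try:
        result = subprocess.run(
            NVIDIA_SMI_QUERY, capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_gpu_csv(result.stdout)


def parse_mem_available_kib(meminfo_text):
    """Return the MemAvailable value from /proc/meminfo contents, in KiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    return None
```

Keeping the parsers separate from the subprocess call makes them easy to check against captured output from another machine.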

Podman Support

Podman is fully supported as a drop-in replacement for Docker. install.sh detects whether Podman or Docker is available, and manage.sh and make use the detected runtime automatically.

GPU setup for Podman (CDI)

Podman uses the CDI (Container Device Interface) standard for GPU passthrough, rather than Docker's --gpus all flag. Generate the CDI spec once after driver installation:

sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

Without this step, GPU profiles start but containers have no GPU access.

Rootless Podman

Rootless Podman is supported. If you encounter permission errors on the Docker socket, ensure podman.socket is running for your user:

systemctl --user enable --now podman.socket

The make layer auto-detects rootless Podman and uses $XDG_RUNTIME_DIR/podman/podman.sock instead of /var/run/docker.sock.
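
The socket choice can be sketched in Python as follows. This is an illustration, not the Makefile's code: the function name `container_socket` is hypothetical, the `/run/user/<uid>` fallback is the usual systemd convention rather than something this guide specifies, and rootful Podman is simply treated like Docker here.

```python
import os


def container_socket(runtime, rootless, env=os.environ):
    """Return the API socket path to use for the given runtime."""
    if runtime == "podman" and rootless:
        # Assumption: fall back to the systemd per-user runtime directory.
        runtime_dir = env.get("XDG_RUNTIME_DIR") or f"/run/user/{os.getuid()}"
        return f"{runtime_dir}/podman/podman.sock"
    # Assumption: every other case uses the Docker socket path.
    return "/var/run/docker.sock"
```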


Customising Ports

Edit .env before running ./manage.sh start:

VUE_PORT=8506          # main UI (Vue SPA)
OLLAMA_PORT=11434
VLLM_PORT=8000
SEARXNG_PORT=8888
VISION_PORT=8002

All containers read from .env via the env_file directive in compose.yml.
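
To check which host ports a given .env would produce, a small parser like the one below can help. It is a sketch only: `load_ports` is a hypothetical helper, the defaults are copied from the example above, and the parsing covers simple `KEY=value  # comment` lines rather than the full .env syntax Compose accepts.

```python
# Defaults mirror the example .env values shown above.
DEFAULT_PORTS = {
    "VUE_PORT": 8506,
    "OLLAMA_PORT": 11434,
    "VLLM_PORT": 8000,
    "SEARXNG_PORT": 8888,
    "VISION_PORT": 8002,
}


def load_ports(env_text):
    """Return the effective port mapping for the given .env contents."""
    ports = dict(DEFAULT_PORTS)
    for raw in env_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments, full-line or trailing
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in ports and value:
            ports[key] = int(value)
    return ports
```

Unknown keys are ignored, so the same function can be pointed at a full .env file that also holds settings such as DUAL_GPU_MODE.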


Wizard Test Instance

A separate compose file is available for testing first-run and onboarding wizard flows without touching your main data:

docker compose -f compose.wizard-test.yml --project-name peregrine-wizard up -d

The wizard test instance runs on port 8507 with ephemeral storage: every docker compose restart wipes the database back to a clean slate. It uses the same images as the main instance but mounts a minimal LLM config so the wizard detection endpoints work correctly.