Docker Profiles
Peregrine uses Docker Compose profiles to start only the services your hardware supports. Choose a profile with ./manage.sh start --profile <name>.
manage.sh delegates to make, which auto-detects Docker vs Podman and applies the correct GPU overlay — compose.gpu.yml for Docker, compose.podman-gpu.yml for Podman (CDI-based). You do not need to specify the overlay manually.
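As a rough illustration of the overlay selection described above, the logic might look like the following Python sketch. The overlay file names come from the text; the function names and the detection order (Docker first, then Podman) are assumptions for illustration, not the actual Makefile logic.

```python
import shutil

# Overlay file names as stated in this guide.
OVERLAYS = {"docker": "compose.gpu.yml", "podman": "compose.podman-gpu.yml"}


def detect_runtime(which=shutil.which):
    """Return 'docker' or 'podman', whichever is found first on PATH, else None.

    The lookup order is an assumption; the real make layer may prefer Podman.
    """
    for name in ("docker", "podman"):
        if which(name):
            return name
    return None


def gpu_overlay(runtime):
    """Map a container runtime name to its GPU overlay file."""
    return OVERLAYS[runtime]
```

Passing `which` as a parameter keeps the detection testable on machines that have neither runtime installed.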
Profile Reference
| Profile | Services started | Use case |
|---|---|---|
| cpu | web, api, ollama, searxng | No GPU. Local models on CPU. Recommended default for new installs. |
| single-gpu | web, api, ollama, vision, searxng | One NVIDIA GPU. Covers cover letters, research, and vision. |
| dual-gpu | web, api, ollama, vllm, vision, searxng | Two NVIDIA GPUs. GPU split controlled by DUAL_GPU_MODE. |
| cf-orch | web, api, searxng | No local LLM. Inference routed to CircuitForge GPU cluster. Requires Paid license. |
| remote | web, api, searxng | No local LLM. Inference goes to cloud API keys (Anthropic, OpenAI-compatible). |
| memory | (any + memory flag) | Enables RAM-optimised container limits for low-RAM machines. Combine with another profile. |
Service Descriptions
| Service | Image / Source | Host Port | Purpose |
|---|---|---|---|
| web | Dockerfile.web (Nginx + Vue SPA) | VUE_PORT (default 8506) | Main UI: serves the Vue frontend and proxies /api/ to api |
| api | Dockerfile (FastAPI) | Internal only (proxied through web) | REST API: all backend logic |
| ollama | ollama/ollama | 11434 | Local model inference: cover letters and general tasks |
| vllm | vllm/vllm-openai | 8000 | High-throughput inference: research tasks |
| vision | scripts/vision_service/ | 8002 | Moondream2: survey screenshot analysis |
| searxng | searxng/searxng | 8888 | Private meta-search: company research web scraping |
The web container runs Nginx internally on port 80, mapped to VUE_PORT on the host. The Nginx config proxies /api/ requests to api:8601 — the FastAPI container is not exposed directly.
Choosing a Profile
cpu
Use cpu if:
- You have no GPU but want local inference (good for privacy)
- You only need light use and can tolerate slow generation; a cover letter may take several minutes per request
Pull a model after starting:
docker exec -it peregrine-ollama-1 ollama pull llama3.2:3b
llama3.2:3b is the recommended CPU model — it runs on machines with 8 GB of system RAM.
single-gpu
Use single-gpu if:
- You have one NVIDIA GPU with at least 8 GB VRAM
- Recommended for most single-user installs
The vision service (Moondream2) starts on the same GPU using 4-bit quantisation (~1.5 GB VRAM). Pull a model after starting:
docker exec -it peregrine-ollama-1 ollama pull llama3.1:8b
dual-gpu
Use dual-gpu if:
- You have two or more NVIDIA GPUs
- Default: GPU 0 handles Ollama (cover letters), GPU 1 handles vLLM (research)
See Dual-GPU Modes below to configure how the two GPUs are split.
cf-orch
Use cf-orch if:
- You have access to a CircuitForge GPU cluster running the cf-orch coordinator
- No local GPU required — inference is handled by the cluster
- Requires a Paid or higher license
Set CF_ORCH_URL in .env to your coordinator address:
CF_ORCH_URL=http://10.1.10.71:7700
The wizard hardware step lets you enter the URL interactively and verifies the connection before saving.
remote
Use remote if:
- You have no local GPU and no cf-orch cluster
- You are using Anthropic Claude, OpenAI, or another cloud API exclusively
Configure at least one external LLM backend in Settings → LLM Backends after first login.
memory (add-on)
Use the memory add-on alongside any profile for machines with limited RAM:
./manage.sh start --profile single-gpu --profile memory
This applies conservative container memory limits to prevent the OOM (out-of-memory) killer from terminating containers.
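The way profiles combine on the command line can be sketched in Python. The profile names and the rule that memory must accompany another profile come from this guide; `start_command` is a hypothetical helper, not part of manage.sh.

```python
# Profile names as listed in the Profile Reference table.
BASE_PROFILES = {"cpu", "single-gpu", "dual-gpu", "cf-orch", "remote"}
ADDON_PROFILES = {"memory"}


def start_command(profiles):
    """Build the manage.sh argument list for one or more profiles.

    Raises ValueError for unknown names, or when only add-ons are given,
    since the memory add-on is meant to be combined with a base profile.
    """
    unknown = [p for p in profiles if p not in BASE_PROFILES | ADDON_PROFILES]
    if unknown:
        raise ValueError(f"unknown profile(s): {', '.join(unknown)}")
    if not any(p in BASE_PROFILES for p in profiles):
        raise ValueError("at least one base profile is required")
    cmd = ["./manage.sh", "start"]
    for name in profiles:
        cmd += ["--profile", name]
    return cmd
```

For example, `start_command(["single-gpu", "memory"])` yields the same arguments as the command shown above.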
Dual-GPU Modes
When using dual-gpu, DUAL_GPU_MODE in .env controls how the second GPU is used:
| Mode | GPU 0 | GPU 1 | Use case |
|---|---|---|---|
| mixed (default) | Ollama | vLLM | Best overall: fast cover letters + high-throughput research |
| ollama | Ollama | Ollama | Both GPUs run Ollama; no vLLM; useful if vLLM models are too large for one card |
| vllm | vLLM | vLLM | Both GPUs run vLLM (tensor parallel); maximum research throughput |
Set in .env:
DUAL_GPU_MODE=mixed # default
# DUAL_GPU_MODE=ollama
# DUAL_GPU_MODE=vllm
The Makefile expands dual-gpu into --profile dual-gpu-$(DUAL_GPU_MODE) before passing it to docker compose. The compose.gpu.yml overlay defines the dual-gpu-mixed, dual-gpu-ollama, and dual-gpu-vllm profile variants.
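The expansion the Makefile performs can be mirrored in a few lines of Python. This is a sketch of the naming rule only; `expand_profile` is a hypothetical name, and the real logic lives in the Makefile.

```python
VALID_MODES = ("mixed", "ollama", "vllm")


def expand_profile(profile, dual_gpu_mode="mixed"):
    """Expand 'dual-gpu' into 'dual-gpu-<mode>'; leave other profiles untouched."""
    if profile != "dual-gpu":
        return profile
    if dual_gpu_mode not in VALID_MODES:
        raise ValueError(f"DUAL_GPU_MODE must be one of {VALID_MODES}")
    return f"dual-gpu-{dual_gpu_mode}"
```

With the default mode, `expand_profile("dual-gpu")` gives `dual-gpu-mixed`, which matches one of the variants defined in the overlay.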
GPU Memory Guidance
| GPU VRAM | Recommended profile | Notes |
|---|---|---|
| < 4 GB | cpu | GPU too small for practical model loading |
| 4–8 GB | single-gpu | Run smaller models (3B–8B parameters) |
| 8–16 GB | single-gpu | Run 8B–13B models comfortably |
| 16–24 GB | single-gpu | Run 13B–34B models |
| 24 GB+ (one card) | single-gpu | 70B models with quantisation |
| 16+ GB (two cards) | dual-gpu | Parallel cover letters + research |
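The table can be read as a simple decision rule. The sketch below encodes it in Python under two assumptions that the table does not state explicitly: "16+ GB (two cards)" is interpreted as at least 16 GB on each of two cards, and cards under 4 GB are ignored. `recommend_profile` is a hypothetical helper and is not the recommendation logic in preflight.py.

```python
def recommend_profile(vram_gb):
    """Suggest a profile from a list of per-card VRAM sizes in GB."""
    usable = [v for v in vram_gb if v >= 4]      # cards under 4 GB are impractical
    if not usable:
        return "cpu"
    large = [v for v in usable if v >= 16]       # assumption: 16 GB per card
    if len(large) >= 2:
        return "dual-gpu"
    return "single-gpu"
```

A machine with two 8 GB cards falls back to single-gpu under this reading; adjust the threshold if you interpret the last row as 16 GB combined.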
How preflight.py Works
./manage.sh start calls scripts/preflight.py before launching Docker. Preflight does the following:
- Port conflict detection: checks whether VUE_PORT, OLLAMA_PORT, VLLM_PORT, SEARXNG_PORT, and VISION_PORT are already in use. Reports any conflicts and suggests alternatives.
- External service adoption: if Ollama or SearXNG are already running on their configured ports (common when using native Ollama on macOS, or a shared SearXNG instance), preflight writes a compose.override.yml that stubs out the duplicate containers. The running process is adopted rather than replaced.
- GPU enumeration: queries nvidia-smi for GPU count and VRAM per card. On Apple Silicon Macs, falls back to system_profiler SPDisplaysDataType and returns unified memory as the VRAM figure.
- RAM check: reads /proc/meminfo (Linux) or vm_stat (macOS) for available system RAM.
- KV cache offload: if GPU VRAM is less than 10 GB, preflight calculates CPU_OFFLOAD_GB and writes it to .env. The vLLM container picks this up via --cpu-offload-gb to overflow the KV cache to system RAM.
- Profile recommendation: writes RECOMMENDED_PROFILE to .env. This is informational only; ./manage.sh start --profile <name> uses the profile you specify.
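To make the port conflict step concrete, here is a minimal sketch of one common way to detect a port that is already in use: attempt a TCP connection to it on the loopback interface. The function names are hypothetical, and preflight.py may use a different technique (for example, trying to bind the port).

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return s.connect_ex((host, port)) == 0


def find_conflicts(ports):
    """Given a mapping of variable name to port, return the names whose port is taken."""
    return [name for name, port in ports.items() if port_in_use(port)]
```

Calling `find_conflicts({"VUE_PORT": 8506, "OLLAMA_PORT": 11434})` returns the variables whose ports are occupied on the local machine, which is the information needed to report a conflict or to decide whether an existing service can be adopted.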
Run preflight independently at any time:
./manage.sh preflight
# or
conda run -n cf python scripts/preflight.py
Podman Support
Podman is fully supported as a drop-in replacement for Docker. install.sh detects whether Podman or Docker is available, and manage.sh and make use the detected runtime automatically.
GPU setup for Podman (CDI)
Podman uses the CDI (Container Device Interface) standard for GPU passthrough, rather than Docker's --gpus all flag. Generate the CDI spec once after driver installation:
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
Without this step, GPU profiles start but containers have no GPU access.
Rootless Podman
Rootless Podman is supported. If you encounter permission errors on the Docker socket, ensure podman.socket is running for your user:
systemctl --user enable --now podman.socket
The make layer auto-detects rootless Podman and uses $XDG_RUNTIME_DIR/podman/podman.sock instead of /var/run/docker.sock.
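The socket selection can be illustrated with a short Python sketch. The two socket paths come from the text above; `container_socket` and its `rootless_podman` flag are hypothetical and stand in for whatever detection the make layer performs.

```python
import os


def container_socket(rootless_podman, env=None):
    """Return the container API socket path for the detected setup."""
    env = os.environ if env is None else env
    if rootless_podman:
        runtime_dir = env.get("XDG_RUNTIME_DIR")
        if not runtime_dir:
            raise RuntimeError("XDG_RUNTIME_DIR is not set; is a user session active?")
        return f"{runtime_dir}/podman/podman.sock"
    return "/var/run/docker.sock"
```

On a typical Linux desktop, where XDG_RUNTIME_DIR is /run/user/1000, the rootless path resolves to /run/user/1000/podman/podman.sock.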
Customising Ports
Edit .env before running ./manage.sh start:
VUE_PORT=8506 # main UI (Vue SPA)
OLLAMA_PORT=11434
VLLM_PORT=8000
SEARXNG_PORT=8888
VISION_PORT=8002
All containers read from .env via the env_file directive in compose.yml.
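If you want to check which ports a given .env resolves to, a small parser along these lines may help. It is a sketch that handles only simple KEY=value lines with optional trailing comments; the fallback values mirror the ports listed in this guide and are assumed to be the defaults, and `read_ports` is a hypothetical helper.

```python
# Fallback values taken from the port listing above (assumed defaults).
DEFAULT_PORTS = {
    "VUE_PORT": 8506,
    "OLLAMA_PORT": 11434,
    "VLLM_PORT": 8000,
    "SEARXNG_PORT": 8888,
    "VISION_PORT": 8002,
}


def read_ports(env_text):
    """Return the effective port mapping for the given .env contents."""
    ports = dict(DEFAULT_PORTS)
    for line in env_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key in ports:
            ports[key] = int(value.strip())
    return ports
```

Keys that are not port variables are ignored, so the function can be fed the whole file.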
Wizard Test Instance
A separate compose file is available for testing first-run and onboarding wizard flows without touching your main data:
docker compose -f compose.wizard-test.yml --project-name peregrine-wizard up -d
The wizard test instance runs on port 8507 with ephemeral storage, so every docker compose restart wipes the database back to a clean slate. It uses the same images as the main instance but mounts a minimal LLM config so that the wizard detection endpoints work correctly.