# Docker Profiles

Peregrine uses Docker Compose profiles to start only the services your hardware can support. Choose a profile with `make start PROFILE=<name>`.


## Profile Reference

| Profile | Services started | Use case |
| --- | --- | --- |
| `remote` | app, searxng | No GPU. LLM calls go to an external API (Anthropic or any OpenAI-compatible endpoint). |
| `cpu` | app, ollama, searxng | No GPU. Runs local models on CPU; functional but slow. |
| `single-gpu` | app, ollama, vision, searxng | One NVIDIA GPU. Covers cover letters, research, and vision (survey screenshots). |
| `dual-gpu` | app, ollama, vllm, vision, searxng | Two NVIDIA GPUs. GPU 0 runs Ollama (cover letters); GPU 1 runs vLLM (research). |
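Under the hood, each profile simply selects a subset of Compose services. The mapping in the table above can be sketched as a small shell function (illustrative only; the function is not part of the repo, which drives this through Docker Compose profiles):

```bash
#!/usr/bin/env bash
# Illustrative sketch: map a profile name to the services it starts,
# mirroring the table above. Peregrine itself does this via Docker
# Compose profiles in compose.yml, not via this function.
services_for_profile() {
  case "$1" in
    remote)     echo "app searxng" ;;
    cpu)        echo "app ollama searxng" ;;
    single-gpu) echo "app ollama vision searxng" ;;
    dual-gpu)   echo "app ollama vllm vision searxng" ;;
    *)          echo "unknown profile: $1" >&2; return 1 ;;
  esac
}

services_for_profile dual-gpu   # prints: app ollama vllm vision searxng
```

Assuming the services carry these profile tags in `compose.yml`, the equivalent manual invocation would be `docker compose --profile <name> up -d`.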

## Service Descriptions

| Service | Image / Source | Port | Purpose |
| --- | --- | --- | --- |
| app | Dockerfile (Streamlit) | 8501 | The main Peregrine UI |
| ollama | ollama/ollama | 11434 | Local model inference: cover letters and general tasks |
| vllm | vllm/vllm-openai | 8000 | High-throughput local inference: research tasks |
| vision | scripts/vision_service/ | 8002 | Moondream2: survey screenshot analysis |
| searxng | searxng/searxng | 8888 | Private meta-search engine used for company research web scraping |
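A quick way to confirm which services came up is to probe each default port. This loop is a hypothetical smoke check, not a script shipped with Peregrine; the ports are the defaults from the table above:

```bash
#!/usr/bin/env bash
# Illustrative smoke check: probe each service's default port.
# Name:port pairs mirror the service table above.
for svc_port in app:8501 ollama:11434 vllm:8000 vision:8002 searxng:8888; do
  name=${svc_port%%:*}   # text before the first colon
  port=${svc_port##*:}   # text after the last colon
  if curl -fsS -o /dev/null --max-time 2 "http://localhost:$port"; then
    echo "$name OK"
  else
    echo "$name not responding on port $port"
  fi
done
```

Services from profiles you did not start (e.g. vllm under `single-gpu`) will simply report as not responding.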

## Choosing a Profile

### remote

Use `remote` if:

- You have no NVIDIA GPU
- You plan to use Anthropic Claude or another API-hosted model exclusively
- You want the fastest startup (only two containers)

You must configure at least one external LLM backend in Settings → LLM Backends.

### cpu

Use `cpu` if:

- You have no GPU but want to run models locally (e.g. for privacy)
- You can accept slower performance for light use: cover letter generation may take several minutes per request

Pull a model after the container starts:

```bash
docker exec -it peregrine-ollama-1 ollama pull llama3.1:8b
```

### single-gpu

Use `single-gpu` if you have one NVIDIA GPU with at least 8 GB VRAM. This is the recommended profile for most single-user installs.

The vision service (Moondream2) starts on the same GPU using 4-bit quantisation (~1.5 GB VRAM).

### dual-gpu

Use `dual-gpu` if you have two or more NVIDIA GPUs:

- GPU 0 handles Ollama (cover letters, quick tasks)
- GPU 1 handles vLLM (research, long-context tasks)
- The vision service shares GPU 0 with Ollama

## GPU Memory Guidance

| GPU VRAM | Recommended profile | Notes |
| --- | --- | --- |
| < 4 GB | cpu | GPU too small for practical model loading |
| 4–8 GB | single-gpu | Run smaller models (3B–8B parameters) |
| 8–16 GB | single-gpu | Run 8B–13B models comfortably |
| 16–24 GB | single-gpu | Run 13B–34B models |
| 24 GB+ | single-gpu or dual-gpu | 70B models with quantisation |
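The guidance above can be expressed as a simple lookup. This function is illustrative only (the thresholds mirror the table; it is not code from the repo), and on Linux you could feed it real numbers from `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`:

```bash
#!/usr/bin/env bash
# Illustrative lookup mirroring the GPU memory guidance table above.
# Takes whole GB of VRAM; not code from the Peregrine repo.
recommend_profile() {
  local vram_gb=$1
  if   [ "$vram_gb" -lt 4 ];  then echo "cpu (GPU too small)"
  elif [ "$vram_gb" -lt 8 ];  then echo "single-gpu (3B-8B models)"
  elif [ "$vram_gb" -lt 16 ]; then echo "single-gpu (8B-13B models)"
  elif [ "$vram_gb" -lt 24 ]; then echo "single-gpu (13B-34B models)"
  else                             echo "single-gpu or dual-gpu (70B quantised)"
  fi
}

recommend_profile 12   # prints: single-gpu (8B-13B models)
```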

## How preflight.py Works

`make start` calls `scripts/preflight.py` before launching Docker. Preflight does the following:

1. **Port conflict detection**: checks whether `STREAMLIT_PORT`, `OLLAMA_PORT`, `VLLM_PORT`, `SEARXNG_PORT`, and `VISION_PORT` are already in use, reports any conflicts, and suggests alternatives.

2. **GPU enumeration**: queries `nvidia-smi` for GPU count and VRAM per card.

3. **RAM check**: reads `/proc/meminfo` (Linux) or `vm_stat` (macOS) to determine available system RAM.

4. **KV cache offload**: if GPU VRAM is less than 10 GB, preflight calculates `CPU_OFFLOAD_GB` (the amount of KV cache to spill to system RAM) and writes it to `.env`. The vLLM container picks this up via `--cpu-offload-gb`.

5. **Profile recommendation**: writes `RECOMMENDED_PROFILE` to `.env`. This is informational; `make start` uses the `PROFILE` variable you specify (defaulting to `remote`).
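The offload calculation in step 4 can be sketched as follows. The actual formula in `scripts/preflight.py` is not shown in this document, so treat this as a hypothetical illustration of the idea (spill whatever doesn't fit under the 10 GB threshold into system RAM):

```bash
#!/usr/bin/env bash
# Hypothetical sketch of step 4; the real formula in scripts/preflight.py
# may differ. Assumes the 10 GB VRAM threshold described above.
compute_cpu_offload() {
  local vram_gb=$1 threshold=10
  if [ "$vram_gb" -lt "$threshold" ]; then
    # Spill the shortfall into system RAM, e.g. 8 GB VRAM -> offload 2 GB
    echo "CPU_OFFLOAD_GB=$((threshold - vram_gb))"
  else
    # Enough VRAM: no KV cache spill needed
    echo "CPU_OFFLOAD_GB=0"
  fi
}

compute_cpu_offload 8   # prints: CPU_OFFLOAD_GB=2
```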

You can run preflight independently:

```bash
make preflight
# or
python scripts/preflight.py
```

## Customising Ports

Edit `.env` before running `make start`:

```
STREAMLIT_PORT=8501
OLLAMA_PORT=11434
VLLM_PORT=8000
SEARXNG_PORT=8888
VISION_PORT=8002
```

All containers read from `.env` via the `env_file` directive in `compose.yml`.
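For reference, the wiring looks roughly like this. The fragment below is a sketch, not the repo's actual `compose.yml`; only the `env_file` pattern and the default port are taken from this page:

```yaml
services:
  app:
    build: .
    env_file: .env                    # every service loads the shared .env
    ports:
      - "${STREAMLIT_PORT}:8501"      # host port comes from .env
```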