pyr0ball 82c26074d8 fix: search prefs wizard data loss, resume sync link, docs + GUI help links

Bug fixes (filed as #125–#128):
- Wizard step 7 read data.titles instead of data.search.titles — user-entered
  job titles and locations were silently dropped on every wizard run (#125)
- GET /api/settings/search returned "titles" key but store expected "job_titles" —
  Settings → Search Prefs always showed empty even when data existed (#126)
- remote_only preference not persisted during wizard setup (#127)
- apply-to-profile didn't set default_resume_id in user.yaml, so future
  Resume Profile saves never synced back to the library entry (#128)

Also:
- Wizard step headings corrected (off-by-one after Training step was inserted)
- Ollama host in wizard inference step now reads from saved wizard state
- Resume upload during wizard now creates a library entry and sets it as default

Docs:
- New: docs/user-guide/daily-workflow.md — end-to-end daily usage guide
- Updated: docs/user-guide/settings.md — rewritten for Vue SPA (was Streamlit)
- mkdocs.yml nav: Daily Workflow added as first User Guide entry

GUI help links:
- web/src/composables/useDocsUrl.ts — shared docs base URL composable
- Home: "Daily Workflow guide ↗" link in subtitle
- Job Review: "? Docs" link in title row
- Resume Library: "? Help" link in header
- Settings → Resume Profile: "? Help" link in page header
- Settings → Search Prefs: "? Help" link in page header

2026-06-15 16:52:56 -07:00

8.4 KiB

Docker Profiles

Peregrine uses Docker Compose profiles to start only the services your hardware supports. Choose a profile with ./manage.sh start --profile <name>.

manage.sh delegates to make, which auto-detects Docker vs Podman and applies the correct GPU overlay — compose.gpu.yml for Docker, compose.podman-gpu.yml for Podman (CDI-based). You do not need to specify the overlay manually.
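For illustration, the engine detection and overlay choice can be modelled in a few lines of Python. This is a hypothetical sketch, not the Makefile's real logic. The function names are invented, and preferring Docker when both engines are installed is an assumption.

```python
import shutil
from typing import Callable, Optional

# GPU overlay per container engine, as described above.
GPU_OVERLAYS = {
    "docker": "compose.gpu.yml",
    "podman": "compose.podman-gpu.yml",  # CDI-based GPU passthrough
}


def detect_engine(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return "docker" or "podman".

    Assumption: Docker is preferred when both are installed.
    """
    for engine in ("docker", "podman"):
        if which(engine):
            return engine
    raise RuntimeError("Neither docker nor podman was found on PATH")


def compose_files(engine: str, gpu_profile: bool) -> list[str]:
    """Base compose file, plus the engine's GPU overlay for GPU profiles."""
    files = ["compose.yml"]
    if gpu_profile:
        files.append(GPU_OVERLAYS[engine])
    return files
```

Passing a stub in place of shutil.which lets you exercise the choice without either engine installed.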

Profile Reference

| Profile | Services started | Use case |
|---|---|---|
| `cpu` | `web`, `api`, `ollama`, `searxng` | No GPU. Local models on CPU. Recommended default for new installs. |
| `single-gpu` | `web`, `api`, `ollama`, `vision`, `searxng` | One NVIDIA GPU. Handles cover letters, research, and vision tasks. |
| `dual-gpu` | `web`, `api`, `ollama`, `vllm`, `vision`, `searxng` | Two NVIDIA GPUs. GPU split controlled by `DUAL_GPU_MODE`. |
| `cf-orch` | `web`, `api`, `searxng` | No local LLM. Inference routed to a CircuitForge GPU cluster. Requires a Paid or higher license. |
| `remote` | `web`, `api`, `searxng` | No local LLM. Inference goes to cloud APIs (Anthropic, OpenAI-compatible) using your API keys. |
| `memory` | (services of the base profile) | Applies RAM-optimised container limits for low-RAM machines. Add-on: combine with another profile. |
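The same table can be expressed as data, with a hypothetical helper that resolves a set of --profile flags into services. Compose starts a service if any of its profiles is selected, which is why the helper takes a union. The check that memory needs a base profile mirrors the table. The names are illustrative and not part of Peregrine.

```python
# Services started by each profile, transcribed from the table above.
PROFILE_SERVICES: dict[str, set[str]] = {
    "cpu": {"web", "api", "ollama", "searxng"},
    "single-gpu": {"web", "api", "ollama", "vision", "searxng"},
    "dual-gpu": {"web", "api", "ollama", "vllm", "vision", "searxng"},
    "cf-orch": {"web", "api", "searxng"},
    "remote": {"web", "api", "searxng"},
}
ADD_ONS = {"memory"}  # changes container limits, adds no services


def services_for(profiles: list[str]) -> set[str]:
    """Union of services for the selected profiles; 'memory' needs a base profile."""
    base = [p for p in profiles if p not in ADD_ONS]
    unknown = [p for p in base if p not in PROFILE_SERVICES]
    if unknown:
        raise ValueError(f"Unknown profile(s): {unknown}")
    if not base:
        raise ValueError("'memory' is an add-on; combine it with another profile")
    services: set[str] = set()
    for p in base:
        services |= PROFILE_SERVICES[p]
    return services
```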

Service Descriptions

| Service | Image / Source | Host port | Purpose |
|---|---|---|---|
| `web` | `Dockerfile.web` (Nginx + Vue SPA) | `VUE_PORT` (default 8506) | Main UI: serves the Vue frontend and proxies `/api/` to `api` |
| `api` | `Dockerfile` (FastAPI) | Internal only (proxied through `web`) | REST API for all backend logic |
| `ollama` | `ollama/ollama` | 11434 | Local model inference for cover letters and general tasks |
| `vllm` | `vllm/vllm-openai` | 8000 | High-throughput inference for research tasks |
| `vision` | `scripts/vision_service/` | 8002 | Moondream2 for survey screenshot analysis |
| `searxng` | `searxng/searxng` | 8888 | Private meta-search for company research web scraping |

The web container runs Nginx internally on port 80, mapped to VUE_PORT on the host. The Nginx config proxies /api/ requests to api:8601 — the FastAPI container is not exposed directly.
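Here is a small model of that routing rule. It is purely illustrative: the authoritative rule is the Nginx config inside the web container, and these function names are hypothetical. It also shows why the host only ever talks to VUE_PORT, for both UI and API requests.

```python
# Hypothetical model of the web container's routing; the real rule is in its Nginx config.
API_UPSTREAM = "api:8601"  # FastAPI on the internal Compose network


def upstream_for(path: str) -> str:
    """Where a request arriving at the web container is handled."""
    if path.startswith("/api/"):
        return API_UPSTREAM
    return "vue-spa"  # static files built from the Vue frontend


def host_url(path: str, vue_port: int = 8506) -> str:
    """From the host, every request (UI or API) goes through VUE_PORT."""
    return f"http://localhost:{vue_port}{path}"
```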

Choosing a Profile

cpu

Use cpu if:

- You have no GPU but want local inference (good for privacy)
- Your usage is light: on CPU, cover letter generation may take several minutes per request

Pull a model after starting:

```bash
docker exec -it peregrine-ollama-1 ollama pull llama3.2:3b
```

llama3.2:3b is the recommended CPU model — it runs on machines with 8 GB of system RAM.
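To confirm the pull from a script, you can ask Ollama which models it has through its HTTP API. GET /api/tags on port 11434 returns a JSON object whose models list carries each model's name. This is a minimal sketch that assumes the default port is published on localhost. The helper names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default OLLAMA_PORT on this page


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> dict:
    """GET /api/tags from a running Ollama instance."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def installed_models(tags: dict) -> set[str]:
    """Extract model names from an /api/tags response body."""
    return {m["name"] for m in tags.get("models", [])}


def has_model(tags: dict, name: str) -> bool:
    # A model pulled without an explicit tag is listed as "<name>:latest".
    names = installed_models(tags)
    return name in names or f"{name}:latest" in names
```

For example, has_model(fetch_tags(), "llama3.2:3b") should be true once the pull above has finished.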

single-gpu

Use single-gpu if:

- You have one NVIDIA GPU with at least 8 GB VRAM
- You are setting up a typical single-user install (the recommended case)

The vision service (Moondream2) starts on the same GPU using 4-bit quantisation (~1.5 GB VRAM). Pull a model after starting:

```bash
docker exec -it peregrine-ollama-1 ollama pull llama3.1:8b
```
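You can sanity-check whether a model fits next to Moondream2 on one card with a rule of thumb. Quantised weights take roughly parameters * bits / 8 bytes, so an 8B model at 4 bits needs about 4 GB before runtime overhead. The sketch below turns that rule into a check. The 1 GB overhead allowance for KV cache and buffers is an assumption, and real usage varies with context length and quantisation format.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of quantised weights: parameters * bits / 8 bytes."""
    return params_billion * bits_per_weight / 8


def fits_alongside_vision(card_gb: float, params_billion: float,
                          bits_per_weight: float = 4.0,
                          vision_gb: float = 1.5,
                          overhead_gb: float = 1.0) -> bool:
    """Rough check that an Ollama model and Moondream2 can share one card.

    vision_gb is the ~1.5 GB figure above; overhead_gb is an assumed
    allowance for KV cache and runtime buffers.
    """
    needed = weights_gb(params_billion, bits_per_weight) + vision_gb + overhead_gb
    return needed <= card_gb
```

On an 8 GB card, an 8B model passes (4.0 + 1.5 + 1.0 = 6.5 GB), while a 13B model does not (6.5 + 1.5 + 1.0 = 9.0 GB).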

dual-gpu

Use dual-gpu if:

- You have two or more NVIDIA GPUs

By default, GPU 0 handles Ollama (cover letters) and GPU 1 handles vLLM (research).

See Dual-GPU Modes below to configure how the two GPUs are split.

cf-orch

Use cf-orch if:

- You have access to a CircuitForge GPU cluster running the cf-orch coordinator
- You do not need a local GPU; inference is handled by the cluster
- You have a Paid or higher license

Set CF_ORCH_URL in .env to your coordinator address:

```bash
CF_ORCH_URL=http://10.1.10.71:7700
```

The wizard hardware step lets you enter the URL interactively and verifies the connection before saving.
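How the wizard verifies the connection is not described here. As a hypothetical pre-check you can run yourself, the sketch below parses CF_ORCH_URL and tests whether a TCP connection to the coordinator's host and port succeeds. It does not tell you whether cf-orch is healthy or whether your license is valid.

```python
import socket
from urllib.parse import urlsplit


def orch_address(url: str) -> tuple[str, int]:
    """Split CF_ORCH_URL into (host, port); the default port follows the scheme."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"Not an http(s) URL: {url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def reachable(url: str, timeout: float = 3.0) -> bool:
    """True if a TCP connection to the coordinator's host:port succeeds."""
    host, port = orch_address(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```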

remote

Use remote if:

- You have no local GPU and no cf-orch cluster
- You are using Anthropic Claude, OpenAI, or another cloud API exclusively

Configure at least one external LLM backend in Settings → LLM Backends after first login.

memory (add-on)

Use the memory add-on alongside any profile for machines with limited RAM:

```bash
./manage.sh start --profile single-gpu --profile memory
```

This applies conservative container memory limits to prevent the OOM (out-of-memory) killer from terminating containers.

Dual-GPU Modes

When using dual-gpu, DUAL_GPU_MODE in .env controls how the two GPUs are shared between Ollama and vLLM:

| Mode | GPU 0 | GPU 1 | Use case |
|---|---|---|---|
| `mixed` (default) | Ollama | vLLM | Best overall: fast cover letters plus high-throughput research |
| `ollama` | Ollama | Ollama | Both GPUs run Ollama and vLLM is not started; useful if vLLM models are too large for one card |
| `vllm` | vLLM | vLLM | Both GPUs run vLLM (tensor parallel) for maximum research throughput |

Set in .env:

```bash
DUAL_GPU_MODE=mixed    # default
# DUAL_GPU_MODE=ollama
# DUAL_GPU_MODE=vllm
```

The Makefile expands dual-gpu into --profile dual-gpu-$(DUAL_GPU_MODE) before passing it to docker compose. The compose.gpu.yml overlay defines the dual-gpu-mixed, dual-gpu-ollama, and dual-gpu-vllm profile variants.
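The expansion is a simple string substitution. The Python model below is for illustration only: the real step is a Makefile rule, and the function names are hypothetical. Rejecting unknown DUAL_GPU_MODE values is an added assumption, since this page does not say how the Makefile handles them.

```python
VALID_MODES = ("mixed", "ollama", "vllm")


def expand_profiles(profiles: list[str], dual_gpu_mode: str = "mixed") -> list[str]:
    """Mimic the Makefile step: dual-gpu -> dual-gpu-<DUAL_GPU_MODE>."""
    if dual_gpu_mode not in VALID_MODES:
        raise ValueError(f"DUAL_GPU_MODE must be one of {VALID_MODES}, got {dual_gpu_mode!r}")
    return [f"dual-gpu-{dual_gpu_mode}" if p == "dual-gpu" else p for p in profiles]


def compose_command(engine: str, files: list[str], profiles: list[str]) -> list[str]:
    """Build an argv like: docker compose -f ... --profile ... up -d"""
    cmd = [engine, "compose"]
    for f in files:
        cmd += ["-f", f]
    for p in profiles:
        cmd += ["--profile", p]
    return cmd + ["up", "-d"]
```

With DUAL_GPU_MODE=vllm, ./manage.sh start --profile dual-gpu --profile memory would therefore reach Compose as --profile dual-gpu-vllm --profile memory.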

GPU Memory Guidance

| GPU VRAM | Recommended profile | Notes |
|---|---|---|
| < 4 GB | `cpu` | GPU too small for practical model loading |
| 4–8 GB | `single-gpu` | Run smaller models (3B–8B parameters) |
| 8–16 GB | `single-gpu` | Run 8B–13B models comfortably |
| 16–24 GB | `single-gpu` | Run 13B–34B models |
| 24 GB+ (one card) | `single-gpu` | 70B models with quantisation |
| 16+ GB (two cards) | `dual-gpu` | Parallel cover letters + research |
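The table can be read as a simple lookup, rendered below as hypothetical Python. It is not how preflight.py computes RECOMMENDED_PROFILE, since that logic is not documented here. Two choices are assumptions: "16+ GB (two cards)" is read as combined VRAM across two or more cards, and a card sitting exactly on a band boundary goes into the higher band.

```python
def recommend_profile(vram_gb: list[float]) -> str:
    """Map per-card VRAM (GB) to a profile, following the guidance table.

    Assumption: "16+ GB (two cards)" means 16 GB or more combined.
    """
    if len(vram_gb) >= 2 and sum(vram_gb) >= 16:
        return "dual-gpu"
    if vram_gb and max(vram_gb) >= 4:
        return "single-gpu"
    return "cpu"


def model_size_hint(card_gb: float) -> str:
    """Model size band for one card, per the Notes column."""
    if card_gb < 4:
        return "GPU too small; use cpu"
    if card_gb < 8:
        return "3B-8B"
    if card_gb < 16:
        return "8B-13B"
    if card_gb < 24:
        return "13B-34B"
    return "70B with quantisation"
```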

How preflight.py Works

./manage.sh start calls scripts/preflight.py before launching Docker. Preflight does the following:

- Port conflict detection: checks whether VUE_PORT, OLLAMA_PORT, VLLM_PORT, SEARXNG_PORT, and VISION_PORT are already in use, reports any conflicts, and suggests alternatives.
- External service adoption: if Ollama or SearXNG is already running on its configured port (common when using native Ollama on macOS, or a shared SearXNG instance), preflight writes a compose.override.yml that stubs out the duplicate container. The running process is adopted rather than replaced.
- GPU enumeration: queries nvidia-smi for GPU count and VRAM per card. On Apple Silicon Macs, it falls back to system_profiler SPDisplaysDataType and reports unified memory as the VRAM figure.
- RAM check: reads /proc/meminfo (Linux) or vm_stat (macOS) for available system RAM.
- CPU offload: if GPU VRAM is less than 10 GB, preflight calculates CPU_OFFLOAD_GB and writes it to .env. The vLLM container passes this value to vLLM's --cpu-offload-gb option, which keeps part of the model weights in system RAM when they do not fit in VRAM.
- Profile recommendation: writes RECOMMENDED_PROFILE to .env. This is informational only; ./manage.sh start --profile <name> always uses the profile you specify.
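The sketch below covers two of these steps, port conflict detection and NVIDIA GPU enumeration. It is not the code in scripts/preflight.py, and the helper names are hypothetical. Binding to a port is one common way to test whether it is free, and the nvidia-smi flags used are real options of that tool. A port reported free can still be taken before docker compose up runs.

```python
import socket
import subprocess
from typing import Optional


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """True if binding host:port fails, i.e. something already holds the port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return True
    return False


def find_free_port(start: int, limit: int = 100) -> Optional[int]:
    """First free port at or above start, to suggest as an alternative."""
    for port in range(start, min(start + limit, 65536)):
        if not port_in_use(port):
            return port
    return None


def parse_nvidia_smi(output: str) -> list[float]:
    """Convert memory.total values (MiB, one line per GPU) to GB."""
    return [int(line) / 1024 for line in output.splitlines() if line.strip()]


def gpu_vram_gb() -> list[float]:
    """Per-card VRAM via nvidia-smi; empty list if the tool is missing or fails."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_nvidia_smi(out)
```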

Run preflight independently at any time:

```bash
./manage.sh preflight
# or
conda run -n cf python scripts/preflight.py
```

Podman Support

Podman is fully supported as a Docker drop-in. install.sh detects whether Podman or Docker is available, and manage.sh and make use whichever it finds automatically.

GPU setup for Podman (CDI)

Podman uses the CDI (Container Device Interface) standard for GPU passthrough, rather than Docker's --gpus all flag. Generate the CDI spec once after driver installation:

```bash
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```

Without this step, GPU profiles start but containers have no GPU access.
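If a GPU profile starts but the containers see no GPU, check that a CDI spec for NVIDIA devices exists. CDI specs live in /etc/cdi (where the command above writes nvidia.yaml) or /var/run/cdi. The NVIDIA spec declares the device kind nvidia.com/gpu, which Podman requests with --device nvidia.com/gpu=all. The helper below is a hypothetical diagnostic, not part of Peregrine.

```python
from pathlib import Path

# Standard CDI spec directories.
CDI_DIRS = (Path("/etc/cdi"), Path("/var/run/cdi"))


def nvidia_cdi_specs(dirs=CDI_DIRS) -> list[Path]:
    """Return CDI spec files that declare the nvidia.com/gpu device kind."""
    found = []
    for d in dirs:
        if not d.is_dir():
            continue
        # *.y*ml matches both .yaml and .yml spec files
        for spec in sorted(d.glob("*.y*ml")) + sorted(d.glob("*.json")):
            if "nvidia.com/gpu" in spec.read_text(errors="ignore"):
                found.append(spec)
    return found
```

An empty result means the nvidia-ctk step above has not been run, or has written its spec somewhere else.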

Rootless Podman

Rootless Podman is supported. If you encounter permission errors on the Docker socket, ensure podman.socket is running for your user:

```bash
systemctl --user enable --now podman.socket
```

The make layer auto-detects rootless Podman and uses $XDG_RUNTIME_DIR/podman/podman.sock instead of /var/run/docker.sock.
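The socket choice amounts to the following hypothetical Python sketch, which is not the make layer's code. When XDG_RUNTIME_DIR is unset, it falls back to /run/user/<uid>; that fallback is an assumption, though it is the usual systemd location. Rootless detection is passed in as a parameter rather than probed.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def compose_socket(engine: str, rootless: bool,
                   env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API socket path to target (sketch)."""
    env = os.environ if env is None else env
    if engine == "podman" and rootless:
        # Assumed fallback when XDG_RUNTIME_DIR is not set
        runtime_dir = env.get("XDG_RUNTIME_DIR") or f"/run/user/{os.getuid()}"
        return str(Path(runtime_dir) / "podman" / "podman.sock")
    return "/var/run/docker.sock"


def docker_host(socket_path: str) -> str:
    """Format a socket path as a DOCKER_HOST value."""
    return f"unix://{socket_path}"
```

A common way to point Docker-compatible clients at the Podman socket is to export the result as DOCKER_HOST, for example DOCKER_HOST=unix://$XDG_RUNTIME_DIR/podman/podman.sock.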

Customising Ports

Edit .env before running ./manage.sh start:

```bash
VUE_PORT=8506          # main UI (Vue SPA)
OLLAMA_PORT=11434
VLLM_PORT=8000
SEARXNG_PORT=8888
VISION_PORT=8002
```

All containers read from .env via the env_file directive in compose.yml.
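Helper scripts that need the same settings can read .env directly. The parser below is a minimal sketch for simple KEY=VALUE files like the port example above. It is not Compose's env_file parser. It treats whitespace followed by # as the start of an inline comment, an assumption chosen to match the examples on this page.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and full-line comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop an inline comment that follows whitespace, e.g. "8506   # main UI"
        for sep in (" #", "\t#"):
            if sep in value:
                value = value.split(sep, 1)[0]
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def port_settings(env_file: Path) -> dict[str, int]:
    """Return the *_PORT entries from a .env file as integers."""
    env = parse_env(env_file.read_text())
    return {k: int(v) for k, v in env.items() if k.endswith("_PORT")}
```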

Wizard Test Instance

A separate compose file is available for testing first-run and onboarding wizard flows without touching your main data:

```bash
docker compose -f compose.wizard-test.yml --project-name peregrine-wizard up -d
```

The wizard test instance runs on port 8507 with ephemeral storage: every docker compose restart wipes the database back to a clean slate. It uses the same images as the main instance but mounts a minimal LLM config so the wizard detection endpoints work correctly.
