Adds a full MkDocs documentation site under `docs/` with the Material theme.

- **Getting Started:** installation walkthrough, 7-step first-run wizard guide, Docker Compose profile reference with GPU memory guidance and `preflight.py` description.
- **User Guide:** job discovery (search profiles, custom boards, enrichment), job review (sorting, match scores, batch actions), apply workspace (cover letter gen, PDF export, mark applied), interviews (kanban stages, company research auto-trigger, survey assistant), email sync (IMAP, Gmail App Password, classification labels, stage auto-updates), integrations (all 13 drivers with tier requirements), settings (every tab documented).
- **Developer Guide:** contributing (dev env setup, code style, branch naming, PR checklist), architecture (ASCII layer diagram, design decisions), adding scrapers (full `scrape()` interface, registration, search profile config, test patterns), adding integrations (`IntegrationBase` full interface, auto-discovery, tier gating, test patterns), testing (patterns, fixtures, what not to test).
- **Reference:** tier system (full `FEATURES` table, `can_use`/`tier_label` API, dev override, adding gates), LLM router (backend types, `complete()` signature, fallback chains, vision routing, `__auto__` resolution, adding backends), config files (every file with field-level docs and gitignore status).

Also adds `CONTRIBUTING.md` at the repo root pointing to the docs site.
# Docker Profiles
Peregrine uses Docker Compose profiles to start only the services your hardware can support. Choose a profile with `make start PROFILE=<name>`.
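
For example, on a machine with a single NVIDIA GPU:

```bash
make start PROFILE=single-gpu
```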

---

## Profile Reference

| Profile | Services started | Use case |
|---------|------------------|----------|
| `remote` | `app`, `searxng` | No GPU. LLM calls go to an external API (Anthropic, OpenAI-compatible). |
| `cpu` | `app`, `ollama`, `searxng` | No GPU. Runs local models on CPU — functional but slow. |
| `single-gpu` | `app`, `ollama`, `vision`, `searxng` | One NVIDIA GPU. Covers cover letters, research, and vision (survey screenshots). |
| `dual-gpu` | `app`, `ollama`, `vllm`, `vision`, `searxng` | Two NVIDIA GPUs. GPU 0 = Ollama (cover letters), GPU 1 = vLLM (research). |
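
These are standard Docker Compose profiles under the hood. Assuming `make start` is a thin wrapper around `docker compose` (a sketch, not a guaranteed equivalence), starting the single-GPU stack manually would look roughly like:

```bash
# Start only the services tagged with the single-gpu profile
docker compose --profile single-gpu up -d
```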

---

## Service Descriptions

| Service | Image / Source | Port | Purpose |
|---------|----------------|------|---------|
| `app` | `Dockerfile` (Streamlit) | 8501 | The main Peregrine UI |
| `ollama` | `ollama/ollama` | 11434 | Local model inference — cover letters and general tasks |
| `vllm` | `vllm/vllm-openai` | 8000 | High-throughput local inference — research tasks |
| `vision` | `scripts/vision_service/` | 8002 | Moondream2 — survey screenshot analysis |
| `searxng` | `searxng/searxng` | 8888 | Private meta-search engine — company research web scraping |
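
Once a profile is up, you can sanity-check the expected services (ports assume the defaults above; only the services in your chosen profile will respond):

```bash
docker compose ps   # list running containers and their states

# Probe a couple of endpoints (adjust for your profile)
curl -sf http://localhost:8501 >/dev/null && echo "app OK"
curl -sf http://localhost:11434 >/dev/null && echo "ollama OK"
```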

---

## Choosing a Profile

### remote

Use `remote` if:

- You have no NVIDIA GPU
- You plan to use Anthropic Claude or another API-hosted model exclusively
- You want the fastest startup (only two containers)

You must configure at least one external LLM backend in **Settings → LLM Backends**.
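
Before the first run, it can help to confirm the host can reach your chosen provider. A minimal sketch assuming Anthropic (a status such as 401 still proves connectivity — it means reachable but unauthenticated):

```bash
# Print the HTTP status from Anthropic's public messages endpoint
curl -s -o /dev/null -w "%{http_code}\n" https://api.anthropic.com/v1/messages

make start PROFILE=remote
```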

### cpu

Use `cpu` if:

- You have no GPU but want to run models locally (e.g. for privacy)
- Your usage is light: cover letter generation can take several minutes per request on CPU

Pull a model after the container starts:
```bash
docker exec -it peregrine-ollama-1 ollama pull llama3.1:8b
```
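
Then verify the model responds (expect this to be slow on CPU):

```bash
docker exec -it peregrine-ollama-1 ollama run llama3.1:8b "Say hello"
```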

### single-gpu

Use `single-gpu` if:

- You have one NVIDIA GPU with at least 8 GB of VRAM (smaller cards can still work; see the memory table below)
- You want the recommended setup for most single-user installs

The vision service (Moondream2) starts on the same GPU using 4-bit quantisation (~1.5 GB VRAM).
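
To confirm the container actually sees the GPU (container name assumes Compose's default `<project>-<service>-1` naming):

```bash
docker exec peregrine-ollama-1 nvidia-smi
```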

### dual-gpu

Use `dual-gpu` if you have two or more NVIDIA GPUs. Under this profile:

- GPU 0 handles Ollama (cover letters, quick tasks)
- GPU 1 handles vLLM (research, long-context tasks)
- The vision service shares GPU 0 with Ollama
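
To verify the GPU split, list the devices each container can see (container names again assume default Compose naming):

```bash
docker exec peregrine-ollama-1 nvidia-smi -L   # should list GPU 0
docker exec peregrine-vllm-1 nvidia-smi -L     # should list GPU 1
```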

---

## GPU Memory Guidance

| GPU VRAM | Recommended profile | Notes |
|----------|---------------------|-------|
| < 4 GB | `cpu` | GPU too small for practical model loading |
| 4–8 GB | `single-gpu` | Run smaller models (3B–8B parameters) |
| 8–16 GB | `single-gpu` | Run 8B–13B models comfortably |
| 16–24 GB | `single-gpu` | Run 13B–34B models |
| 24 GB+ | `single-gpu` or `dual-gpu` | 70B models with quantisation |
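
To find your row in the table, query each card's total VRAM directly:

```bash
nvidia-smi --query-gpu=name,memory.total --format=csv
```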

---

## How preflight.py Works

`make start` calls `scripts/preflight.py` before launching Docker. Preflight does the following:

1. **Port conflict detection** — checks whether `STREAMLIT_PORT`, `OLLAMA_PORT`, `VLLM_PORT`, `SEARXNG_PORT`, and `VISION_PORT` are already in use. Reports any conflicts and suggests alternatives.
2. **GPU enumeration** — queries `nvidia-smi` for GPU count and VRAM per card.
3. **RAM check** — reads `/proc/meminfo` (Linux) or `vm_stat` (macOS) to determine available system RAM.
4. **KV cache offload** — if GPU VRAM is less than 10 GB, preflight calculates `CPU_OFFLOAD_GB` (the amount of KV cache to spill to system RAM) and writes it to `.env`. The vLLM container picks this up via `--cpu-offload-gb`.
5. **Profile recommendation** — writes `RECOMMENDED_PROFILE` to `.env`. This is informational; `make start` uses the `PROFILE` variable you specify (defaulting to `remote`). Example entries are shown below.
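
As an illustration, on a single 8 GB GPU machine, steps 4 and 5 might leave entries like these in `.env` (the values are hypothetical examples, not guaranteed output):

```bash
# Example .env entries written by preflight — illustrative values only
CPU_OFFLOAD_GB=4
RECOMMENDED_PROFILE=single-gpu
```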

You can run preflight independently:

```bash
make preflight
# or
python scripts/preflight.py
```

---

## Customising Ports

Edit `.env` before running `make start`:

```bash
STREAMLIT_PORT=8501
OLLAMA_PORT=11434
VLLM_PORT=8000
SEARXNG_PORT=8888
VISION_PORT=8002
```
All containers read from `.env` via the `env_file` directive in `compose.yml`.
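
For example, to move the UI off a busy port 8501 before starting (GNU `sed` shown; on macOS, edit `.env` by hand — any free port works):

```bash
sed -i 's/^STREAMLIT_PORT=.*/STREAMLIT_PORT=8601/' .env
make start
# The UI is now served at http://localhost:8601
```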