# Installation

This page walks through a full Peregrine installation from scratch.

---

## Prerequisites

- **Git** — to clone the repository
- **Internet connection** — `install.sh` downloads Docker/Podman and other dependencies
- **Operating system**: Ubuntu/Debian, Fedora/RHEL, Arch Linux, or macOS (with Docker Desktop)

!!! warning "Windows"
    Windows is not supported. Use [WSL2 with Ubuntu](https://docs.microsoft.com/windows/wsl/install) instead.
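
Before cloning, it can help to confirm the basics are in place. The snippet below is a minimal sketch and not part of Peregrine: `missing_commands` is a hypothetical helper that prints any command from its argument list that is not on your `PATH`.

```bash
# Hypothetical helper (not shipped with Peregrine): print each named
# command that cannot be found on PATH, one per line.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Git is the only tool needed up front; install.sh handles the rest.
missing=$(missing_commands git)
if [ -n "$missing" ]; then
    echo "Install these before continuing: $missing"
else
    echo "Prerequisites look fine."
fi
```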

---

## Step 1 — Clone the repository

```bash
git clone https://git.opensourcesolarpunk.com/Circuit-Forge/peregrine
cd peregrine
```

---

## Step 2 — Run install.sh

```bash
bash install.sh
```

`install.sh` performs the following automatically:

1. **Detects your platform** (Ubuntu/Debian, Fedora/RHEL, Arch, macOS)
2. **Installs Git** if not already present
3. **Installs Docker Engine** (or Podman if Docker is not available) via official repositories
4. **Adds your user to the `docker` group** so you do not need `sudo` for docker commands (Linux only — log out and back in after this)
5. **Detects NVIDIA GPUs** — if `nvidia-smi` is present and working, installs the NVIDIA Container Toolkit and configures Docker/Podman to use it
6. **Creates `.env` from `.env.example`** — edit `.env` to customise ports and model storage paths before starting
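
The platform detection in step 1 can be approximated as follows. This is a sketch of the general technique, assuming `/etc/os-release` on Linux and `uname -s` on macOS. It is not taken from `install.sh`, and `classify_platform` is a hypothetical name.

```bash
# Hypothetical sketch: map an os-release ID (or "darwin") to one of the
# platform families listed above. Not the actual install.sh logic.
classify_platform() {
    case "$1" in
        ubuntu|debian)               echo "debian" ;;
        fedora|rhel|rocky|almalinux) echo "fedora" ;;
        arch|manjaro)                echo "arch" ;;
        darwin)                      echo "macos" ;;
        *)                           echo "unsupported" ;;
    esac
}

# Work out the ID for the current machine.
if [ "$(uname -s)" = "Darwin" ]; then
    os_id="darwin"
elif [ -r /etc/os-release ]; then
    # os-release is a shell-compatible file that defines ID=...
    os_id=$(. /etc/os-release && printf '%s' "$ID")
else
    os_id="unknown"
fi

echo "Detected platform family: $(classify_platform "$os_id")"
```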

!!! note "macOS"
    `install.sh` installs Docker Desktop via Homebrew (`brew install --cask docker`) then exits. Open Docker Desktop, start it, then re-run the script. Ollama can also run natively for Metal GPU-accelerated inference — see the macOS note in Step 4.

!!! note "GPU requirement"
    For GPU support, `nvidia-smi` must return output before you run `install.sh`. Install your NVIDIA driver first.
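
After the script finishes, a few quick checks confirm that the steps above took effect. The sketch below uses only standard commands (`id -nG`, `command -v`); `has_group` is a hypothetical helper, and the checks assume you run them from the repository root.

```bash
# Hypothetical helper: succeed if group $1 appears in the
# space-separated group list $2.
has_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Which container engine is available?
for engine in docker podman; do
    if command -v "$engine" >/dev/null 2>&1; then
        echo "$engine: found"
    fi
done

# Linux only: group membership shows up after you log out and back in.
if has_group docker "$(id -nG)"; then
    echo "Current user is in the docker group."
else
    echo "Not in the docker group (yet). Log out and back in, then re-check."
fi

# .env should have been created from .env.example.
[ -f .env ] && echo ".env exists" || echo ".env not found in $(pwd)"
```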

---

## Step 2a — Podman users: GPU CDI setup

If you prefer rootless Podman over Docker, `install.sh` detects it, and `manage.sh` and `make` use it automatically. For GPU profiles to work with Podman, you must first generate a CDI spec:

```bash
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```

Run this once after installing the NVIDIA driver. Without it, GPU profiles will start, but the containers will not have GPU access. Docker users can skip this step, because Docker uses `--gpus all` instead of CDI.
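
To confirm the spec was generated, you can list the devices it exposes and run a throwaway container. `nvidia-ctk cdi list` and the `nvidia.com/gpu=all` device name come from the NVIDIA Container Toolkit. The `ubuntu` image and the `cdi_spec_present` helper are illustrative choices, not something Peregrine prescribes.

```bash
# Hypothetical helper: succeed if a CDI spec file exists and is non-empty.
# Defaults to the path used by the generate command above.
cdi_spec_present() {
    [ -s "${1:-/etc/cdi/nvidia.yaml}" ]
}

if cdi_spec_present; then
    echo "CDI spec found."
    # List the device names defined in the spec (e.g. nvidia.com/gpu=all).
    if command -v nvidia-ctk >/dev/null 2>&1; then
        nvidia-ctk cdi list || echo "nvidia-ctk could not list devices"
    fi
else
    echo "No CDI spec at /etc/cdi/nvidia.yaml; run the generate command above."
fi

# Optional end-to-end check; uncomment on a machine with Podman and a GPU:
# podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi -L
```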

---

## Step 3 — (Optional) Edit .env

The `.env` file controls ports and volume mount paths. The defaults work for most single-user installs:

```bash
# Main UI port
VUE_PORT=8506

# Model paths — use full absolute paths, not ~ (tilde does not expand inside containers)
DOCS_DIR=/home/yourname/Documents/JobSearch
OLLAMA_MODELS_DIR=/home/yourname/models/ollama

# Inference model defaults
OLLAMA_DEFAULT_MODEL=llama3.2:3b

# External API keys — only needed for the "remote" profile or BYOK unlock
ANTHROPIC_API_KEY=
```

Change `VUE_PORT` if 8506 is taken on your machine. See [Docker Profiles](docker-profiles.md) for a full port reference.
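
If you prefer to script the change rather than edit the file by hand, a small helper like the one below would do. `set_env_var` is a hypothetical function, and the sketch assumes simple `KEY=value` lines as in the example above.

```bash
# Hypothetical helper: set KEY=VALUE in an env file by removing any
# existing line for KEY and appending the new assignment at the end.
set_env_var() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    # Copy every line except an existing assignment for the key.
    grep -v "^${key}=" "$file" > "$tmp" || true
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Write back in place so the file keeps its permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example: move the UI to port 8600 and use an absolute models path.
# "$HOME" is expanded by your shell here, so .env receives a full path.
# set_env_var .env VUE_PORT 8600
# set_env_var .env OLLAMA_MODELS_DIR "$HOME/models/ollama"
```

The example calls are commented out so that pasting the block does not modify your `.env` by accident.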

---

## Step 4 — Start Peregrine

Choose a profile based on your hardware:

```bash
./manage.sh start                        # cpu — local Ollama on CPU (recommended default)
./manage.sh start --profile single-gpu   # one NVIDIA GPU
./manage.sh start --profile dual-gpu     # two NVIDIA GPUs
./manage.sh start --profile remote       # no local LLM — use cloud API keys only
```
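
If you are unsure which profile applies, counting the GPUs that `nvidia-smi -L` reports is a reasonable starting point. The mapping below is an illustrative sketch with a hypothetical `suggest_profile` helper; `manage.sh` and `preflight.py` remain the authority on what your hardware supports.

```bash
# Hypothetical helper: map a GPU count to one of the profiles above.
suggest_profile() {
    case "$1" in
        0) echo "cpu" ;;
        1) echo "single-gpu" ;;
        *) echo "dual-gpu" ;;   # two or more GPUs
    esac
}

# nvidia-smi -L prints one line per GPU; treat a missing or failing
# nvidia-smi as zero GPUs.
if command -v nvidia-smi >/dev/null 2>&1; then
    gpu_count=$(nvidia-smi -L 2>/dev/null | grep -c '^GPU ' || true)
else
    gpu_count=0
fi

echo "Suggested: ./manage.sh start --profile $(suggest_profile "$gpu_count")"
```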

`manage.sh start` runs `preflight.py` first, which checks for port conflicts and writes GPU/RAM recommendations to `.env`. Then it calls `docker compose` (or `podman compose`) with the right compose file overlay for your hardware.
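
If you want to check for a port conflict yourself before starting, one approach is to scan the listening sockets. The sketch below does not reproduce the internals of `preflight.py`. It assumes the `ss` tool from iproute2 on Linux, and `port_in_use` is a hypothetical helper that reads `ss -ltn` style output on stdin.

```bash
# Hypothetical helper: read `ss -ltn` style output on stdin and succeed
# if any local address ends in :PORT.
port_in_use() {
    awk -v port="$1" '
        # Column 4 of ss -ltn is the local address, e.g. 0.0.0.0:8506
        $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

port=${VUE_PORT:-8506}
if command -v ss >/dev/null 2>&1; then
    if ss -ltn | port_in_use "$port"; then
        echo "Port $port is already in use; change VUE_PORT in .env."
    else
        echo "Port $port looks free."
    fi
fi
```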

!!! tip "macOS with native Ollama"
    If you installed Ollama natively via Homebrew for Metal GPU inference, start with `--profile cpu`. The container API on port 8506 connects to your host's Ollama at `localhost:11434` automatically.

---

## Step 5 — Open the UI

Navigate to **http://localhost:8506** (or whatever `VUE_PORT` you set).

The first-run wizard launches automatically. See [First-Run Wizard](first-run-wizard.md) for a step-by-step guide.
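
Containers can take a moment to come up. If the page does not load straight away, a short polling loop tells you when the UI starts answering. This is a sketch that assumes `curl` is installed; `wait_until` is a hypothetical helper.

```bash
# Hypothetical helper: run a command up to N times, sleeping DELAY
# seconds between attempts; succeed as soon as the command succeeds.
wait_until() {
    attempts=$1 delay=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        "$@" >/dev/null 2>&1 && return 0
        attempts=$((attempts - 1))
        [ "$attempts" -gt 0 ] && sleep "$delay"
    done
    return 1
}

# Usage (uncomment to poll the UI for up to about 60 seconds):
# url="http://localhost:${VUE_PORT:-8506}"
# wait_until 30 2 curl -fsS "$url" && echo "UI is up at $url"
```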

---

## Supported Platforms

| Platform | Tested | Notes |
|----------|--------|-------|
| Ubuntu 22.04 / 24.04 | Yes | Primary target |
| Debian 12 | Yes | |
| Fedora 39/40 | Yes | |
| RHEL / Rocky / AlmaLinux | Yes | |
| Arch Linux / Manjaro | Yes | |
| macOS (Apple Silicon) | Yes | Docker Desktop required; GPU via native Ollama (Metal) |
| macOS (Intel) | Yes | Docker Desktop required; no GPU support |
| Windows | No | Use WSL2 with Ubuntu |

---

## GPU Support

Only NVIDIA GPUs are supported. AMD ROCm is not currently supported.

Requirements:

- NVIDIA driver installed and `nvidia-smi` working before running `install.sh`
- CUDA 12.x recommended (CUDA 11.x may work but is untested)
- Minimum 8 GB VRAM for `single-gpu` profile with default models
- **Podman users:** CDI spec required — see Step 2a above

For `dual-gpu`, both cards must be NVIDIA. GPU 0 handles Ollama (cover letters, general tasks) and GPU 1 handles the research workload. The exact behaviour is controlled by `DUAL_GPU_MODE` — see [Docker Profiles](docker-profiles.md#dual-gpu-modes).

If your GPU has less than 10 GB VRAM, `preflight.py` calculates a `CPU_OFFLOAD_GB` value and writes it to `.env`. The vLLM container passes this value to `--cpu-offload-gb`, which keeps part of the model weights in system RAM to free up VRAM, at the cost of slower inference.
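
To see where your card falls relative to these thresholds, you can query its total memory. The query flags below are standard `nvidia-smi` options. The `vram_tier` helper and its rounding are assumptions made for illustration, and the actual `CPU_OFFLOAD_GB` calculation in `preflight.py` is not reproduced here.

```bash
# Hypothetical helper: classify total VRAM (in MiB, as nvidia-smi
# reports it) against the 8 GB minimum and 10 GB offload threshold.
vram_tier() {
    # Round to the nearest GiB so an "8 GB" card reporting 8188 MiB
    # is not treated as below the minimum.
    gib=$(( ($1 + 512) / 1024 ))
    if [ "$gib" -lt 8 ]; then
        echo "below-minimum"
    elif [ "$gib" -lt 10 ]; then
        echo "offload-likely"
    else
        echo "ok"
    fi
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # One line per GPU, e.g. "0, 24576"
    nvidia-smi --query-gpu=index,memory.total --format=csv,noheader,nounits |
    while IFS=', ' read -r idx mib; do
        echo "GPU $idx: ${mib} MiB -> $(vram_tier "$mib")"
    done
fi
```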

---

## Stopping Peregrine

```bash
./manage.sh stop       # stop all containers
./manage.sh restart    # stop then start again (runs preflight first)
```

---

## Reinstalling / Clean State

```bash
./manage.sh clean      # removes containers, images, and data volumes (destructive)
```

You will be prompted to type `yes` to confirm.