# hardware

GPU enumeration and VRAM-tier profile generation. Used by `manage.sh` at startup to recommend a Docker Compose profile and by the cf-orch coordinator for resource allocation.

```python
from circuitforge_core.hardware import get_gpus, recommend_profile, HardwareProfile
```

## GPU detection

`get_gpus()` returns a list of detected GPUs with their VRAM capacity. Detection strategy:

1. Try `nvidia-smi` (Linux/Windows NVIDIA)
2. Fall back to `system_profiler SPDisplaysDataType` on Darwin when `hw.optional.arm64=1` (Apple Silicon)
3. Return CPU-only profile if neither succeeds

```python
gpus = get_gpus()
# On an NVIDIA host:
# [{"name": "RTX 4090", "vram_gb": 24.0, "type": "nvidia"}]
# On an Apple Silicon Mac:
# [{"name": "Apple M2 Max", "vram_gb": 32.0, "type": "apple_silicon"}]
```
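
The fallback chain above can be sketched roughly as follows. This is a minimal illustration, not the library's actual implementation: `parse_nvidia_smi` and `detect_gpus_sketch` are hypothetical names, and returning an empty list for the CPU-only case is an assumption. The Apple Silicon branch is omitted because this page does not say how VRAM is derived from `system_profiler` output.

```python
import shutil
import subprocess


def parse_nvidia_smi(csv_text: str) -> list[dict]:
    """Parse `name, memory.total` rows (CSV, no header, MiB without units)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split on the last comma so GPU names containing commas survive.
        name, _, mem_mib = line.rpartition(",")
        if not name:
            continue  # skip malformed rows
        gpus.append({
            "name": name.strip(),
            "vram_gb": round(float(mem_mib) / 1024, 1),
            "type": "nvidia",
        })
    return gpus


def detect_gpus_sketch() -> list[dict]:
    """Hypothetical stand-in for get_gpus(): step 1, then the step 3 fallback."""
    if shutil.which("nvidia-smi"):
        try:
            out = subprocess.run(
                ["nvidia-smi", "--query-gpu=name,memory.total",
                 "--format=csv,noheader,nounits"],
                capture_output=True, text=True, check=True, timeout=10,
            ).stdout
            gpus = parse_nvidia_smi(out)
            if gpus:
                return gpus
        except (subprocess.SubprocessError, OSError, ValueError):
            pass  # treat any nvidia-smi failure as "no NVIDIA GPU"
    # ASSUMPTION: CPU-only is represented here as an empty list.
    return []
```

Keeping the parser separate from the subprocess call makes the parsing logic testable on machines without an NVIDIA driver.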

## Compose profile recommendation

```python
profile = recommend_profile(gpus)
# "single-gpu" | "dual-gpu" | "cpu" | "remote"
```

Profile selection rules:

- `single-gpu`: one NVIDIA GPU with >= 8 GB VRAM
- `dual-gpu`: two or more NVIDIA GPUs
- `cpu`: no NVIDIA GPU (Apple Silicon uses `cpu` since Docker on Mac has no Metal passthrough)
- `remote`: explicitly requested or when local inference would exceed available VRAM
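
As a rough sketch, the selection rules could be expressed like this. `recommend_profile_sketch` and its `force_remote` flag are hypothetical and only illustrate the rules listed above; the real `recommend_profile` signature may differ. The rules do not say what happens with a single NVIDIA GPU under 8 GB, so the sketch assumes `remote` for that case.

```python
def recommend_profile_sketch(gpus: list[dict], force_remote: bool = False) -> str:
    """Map a GPU list (as returned by get_gpus()) to a Compose profile name."""
    if force_remote:
        return "remote"  # explicitly requested

    # Only NVIDIA GPUs count; Apple Silicon falls through to "cpu".
    nvidia = [g for g in gpus if g.get("type") == "nvidia"]

    if len(nvidia) >= 2:
        return "dual-gpu"
    if len(nvidia) == 1:
        if nvidia[0]["vram_gb"] >= 8:
            return "single-gpu"
        # ASSUMPTION: a lone GPU under 8 GB is treated as insufficient VRAM.
        return "remote"
    return "cpu"
```

For example, a single 24 GB NVIDIA card maps to `single-gpu`, while an Apple Silicon entry maps to `cpu`.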

!!! note "Apple Silicon"
    Apple Silicon Macs should run Ollama natively (outside Docker) for Metal-accelerated inference. Docker on macOS runs in a Linux VM with no Metal passthrough. `preflight.py` in each product detects a native Ollama instance listening on port 11434 and adopts it automatically.
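
One simple way to probe for a native Ollama instance is a TCP connection attempt on port 11434. The helper below is a hypothetical sketch; this page does not document how `preflight.py` actually performs the check, and the `127.0.0.1` host and half-second timeout are assumptions.

```python
import socket


def ollama_is_listening(host: str = "127.0.0.1", port: int = 11434,
                        timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

This only confirms that a process is listening on the port, not that the process is Ollama. A stricter check would issue an HTTP request against the Ollama API.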

## VRAM tiers

| VRAM | Models that fit |
|------|----------------|
| < 4 GB | Quantized 1B–4B models (Phi-3 mini, Llama 3.2 3B Q4) |
| 4–8 GB | 7B–8B models Q4 (Llama 3.1 8B, Mistral 7B) |
| 8–16 GB | 13B–14B models Q4, 7B models at FP16 |
| 16–24 GB | 30B models Q4, 13B models Q8 (13B at FP16 needs about 26 GB) |
| 24 GB+ | 70B models Q4 (about 35 GB of weights; needs 40 GB+ or a multi-GPU split) |
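
The tiers follow from simple arithmetic: weight memory is roughly parameter count times bits per weight, divided by 8. The helper below is an illustrative sketch (`weights_gb` is a hypothetical name, and GB here means 10^9 bytes). It covers weights only; the KV cache and runtime overhead come on top, and real Q4 formats use somewhat more than 4 bits per weight.

```python
def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights in GB (weights only)."""
    return params_billion * bits_per_weight / 8


print(weights_gb(7, 4))    # 7B at 4-bit   -> 3.5
print(weights_gb(7, 16))   # 7B at 16-bit  -> 14.0
print(weights_gb(70, 4))   # 70B at 4-bit  -> 35.0
```

Leaving a couple of GB of headroom above these figures gives a more realistic picture of what fits in a given tier.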

## HardwareProfile

The `HardwareProfile` dataclass is written to `compose.override.yml` by `preflight.py` at product startup, making GPU capabilities available to Docker Compose without hardcoding.
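
The real fields of `HardwareProfile` are not listed on this page. Purely as an illustration, the sketch below assumes a profile holding the detected GPUs and the recommended Compose profile, and shows how such data could be rendered as a Compose override that reserves NVIDIA devices. The class name, its fields, the `to_override_yaml` method, and the `inference` service name are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HardwareProfileSketch:
    # ASSUMPTION: hypothetical fields, not the library's actual schema.
    gpus: list[dict] = field(default_factory=list)
    compose_profile: str = "cpu"

    def to_override_yaml(self, service: str = "inference") -> str:
        """Render a Compose override reserving all detected NVIDIA GPUs."""
        nvidia_count = sum(1 for g in self.gpus if g.get("type") == "nvidia")
        if nvidia_count == 0:
            return "services: {}\n"  # nothing to override on CPU-only hosts
        return (
            "services:\n"
            f"  {service}:\n"
            "    deploy:\n"
            "      resources:\n"
            "        reservations:\n"
            "          devices:\n"
            "            - driver: nvidia\n"
            f"              count: {nvidia_count}\n"
            "              capabilities: [gpu]\n"
        )
```

The `deploy.resources.reservations.devices` block is the standard Compose syntax for GPU access. Everything else here is a placeholder for whatever `preflight.py` actually writes.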