# llm

LLM router with a configurable fallback chain. Abstracts over Ollama, vLLM, Anthropic, and any OpenAI-compatible backend. Products never talk to a specific LLM backend directly.

```python
from circuitforge_core.llm import LLMRouter
```

## Design principle

The router implements "local inference first." Cloud backends sit at the end of the fallback chain. A product configured with only Ollama will never silently fall through to a paid API.

## Configuration

The router reads `config/llm.yaml` from the product's working directory (or the path passed to the constructor). Each product maintains its own `llm.yaml`; cf-core provides the router, not the config.

```yaml
# config/llm.yaml example
fallback_order:
  - ollama
  - vllm
  - anthropic

ollama:
  enabled: true
  base_url: http://localhost:11434
  model: llama3.2:3b

vllm:
  enabled: false
  base_url: http://localhost:8000

anthropic:
  enabled: false
  api_key_env: ANTHROPIC_API_KEY
```

## API

### `LLMRouter(config_path=None, cloud_mode=False)`

Instantiate the router. In most products, instantiation happens inside a shim that injects product-specific config resolution.

### `router.complete(prompt, system=None, images=None, fallback_order=None) -> str`

Send a completion request. Tries backends in order; falls through on error or unavailability.

```python
router = LLMRouter()
response = router.complete(
    prompt="Summarize this recipe in one sentence.",
    system="You are a cooking assistant.",
)
```

Pass `images: list[str]` (base64-encoded) for vision requests — non-vision backends are automatically skipped when images are present.

Pass `fallback_order=["vllm", "anthropic"]` to override the config chain for a specific call (useful for task-specific routing).

### `router.stream(prompt, system=None) -> Iterator[str]`

Streaming variant. Yields token chunks as they arrive. Not all backends support streaming; the router logs a warning and falls back to a non-streaming backend if needed.

## Shim requirement

!!! warning "Always use the product shim"
    Scripts and endpoints must import `LLMRouter` from the product shim (`scripts/llm_router.py` or `app/llm_router.py`), never directly from `circuitforge_core.llm.router`. The shim handles tri-level config resolution (env vars override config file overrides defaults) and cloud mode wiring. Bypassing it breaks cloud deployments silently.

## Backends

| Backend | Type | Notes |
|---------|------|-------|
| `ollama` | Local | Preferred default; model names from `config/llm.yaml` |
| `vllm` | Local GPU | For high-throughput or large models |
| `anthropic` | Cloud | Requires `ANTHROPIC_API_KEY` env var |
| `openai` | Cloud | Any OpenAI-compatible endpoint |
| `claude_code` | Local wrapper | claude-bridge OpenAI-compatible wrapper on :3009 |

## Vision routing

When images are included in a `complete()` call, the router checks each backend's vision capability before trying it. Configure vision priority separately:

```yaml
vision_fallback_order:
  - vision_service   # local moondream2 via FastAPI on :8002
  - claude_code
  - anthropic
```