
# llm

LLM router with a configurable fallback chain. Abstracts over Ollama, vLLM, Anthropic, and any OpenAI-compatible backend. Products never talk to a specific LLM backend directly.

```python
from circuitforge_core.llm import LLMRouter
```

## Design principle

The router implements "local inference first." Cloud backends sit at the end of the fallback chain. A product configured with only Ollama will never silently fall through to a paid API.
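The fallback behavior described above can be sketched as a simple loop (an illustrative sketch, not the actual cf-core source; backend names and the `enabled` flag mirror `config/llm.yaml`):

```python
# Sketch of the fallback chain: try enabled backends in order,
# skip disabled ones, raise only if every backend fails.

class BackendUnavailable(Exception):
    pass

def complete(prompt, backends, fallback_order):
    errors = {}
    for name in fallback_order:
        backend = backends.get(name)
        if backend is None or not backend.get("enabled"):
            continue  # disabled backends are skipped, never silently enabled
        try:
            return backend["call"](prompt)
        except BackendUnavailable as exc:
            errors[name] = exc  # fall through to the next backend
    raise RuntimeError(f"all backends failed: {list(errors)}")

# A product configured with only Ollama enabled never reaches a cloud backend:
backends = {
    "ollama": {"enabled": True, "call": lambda p: f"ollama: {p}"},
    "anthropic": {"enabled": False, "call": lambda p: f"anthropic: {p}"},
}
complete("hi", backends, ["ollama", "anthropic"])  # -> "ollama: hi"
```

Because a disabled backend is skipped rather than errored on, the cloud entries in the chain are inert until a product explicitly enables them.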

## Configuration

The router reads config/llm.yaml from the product's working directory (or the path passed to the constructor). Each product maintains its own llm.yaml; cf-core provides the router, not the config.

```yaml
# config/llm.yaml example
fallback_order:
  - ollama
  - vllm
  - anthropic

ollama:
  enabled: true
  base_url: http://localhost:11434
  model: llama3.2:3b

vllm:
  enabled: false
  base_url: http://localhost:8000

anthropic:
  enabled: false
  api_key_env: ANTHROPIC_API_KEY
```

## API

### `LLMRouter(config_path=None, cloud_mode=False)`

Instantiate the router. In most products, instantiation happens inside a shim that injects product-specific config resolution.

### `router.complete(prompt, system=None, images=None, fallback_order=None) -> str`

Send a completion request. Tries backends in order; falls through on error or unavailability.

```python
router = LLMRouter()
response = router.complete(
    prompt="Summarize this recipe in one sentence.",
    system="You are a cooking assistant.",
)
```

Pass `images: list[str]` (base64-encoded) for vision requests; non-vision backends are automatically skipped when images are present.

Pass `fallback_order=["vllm", "anthropic"]` to override the config chain for a specific call (useful for task-specific routing).

### `router.stream(prompt, system=None) -> Iterator[str]`

Streaming variant. Yields token chunks as they arrive. Not all backends support streaming; the router logs a warning and falls back to a non-streaming backend if needed.
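The streaming fallback can be sketched as follows (a hypothetical sketch with stub backends, not the router's actual internals):

```python
# Sketch: prefer a streaming-capable backend; if none exists in the
# chain, fall back to a non-streaming backend and yield one chunk.

def stream(prompt, backends):
    for backend in backends:
        if "stream" in backend:
            yield from backend["stream"](prompt)  # token chunks as they arrive
            return
    for backend in backends:
        if "complete" in backend:
            # the real router logs a warning before this fallback
            yield backend["complete"](prompt)  # whole response as one chunk
            return

backends = [
    {"complete": lambda p: p.upper()},      # no streaming support
    {"stream": lambda p: iter(p.split())},  # streams word-sized chunks
]
chunks = list(stream("hello there", backends))  # ["hello", "there"]
```

Callers can treat both cases uniformly: iterate over the generator and concatenate chunks, regardless of which backend actually served the request.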

## Shim requirement

!!! warning "Always use the product shim"
    Scripts and endpoints must import `LLMRouter` from the product shim (`scripts/llm_router.py` or `app/llm_router.py`), never directly from `circuitforge_core.llm.router`. The shim handles tri-level config resolution (env vars override the config file, which overrides defaults) and cloud-mode wiring. Bypassing it breaks cloud deployments silently.
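The tri-level resolution the shim performs can be sketched like this (key names, env-var names, and the `resolve` helper are all hypothetical, shown only to illustrate the precedence order):

```python
import os

# Hypothetical sketch of tri-level config resolution.
# Precedence: env var > config/llm.yaml > built-in default.

DEFAULTS = {"ollama.base_url": "http://localhost:11434"}

def resolve(key, config, env_var):
    if env_var in os.environ:
        return os.environ[env_var]  # 1. environment wins
    if key in config:
        return config[key]          # 2. then the product's config file
    return DEFAULTS[key]            # 3. then the library default

config = {"ollama.base_url": "http://gpu-box:11434"}
resolve("ollama.base_url", config, "OLLAMA_BASE_URL")
# -> config value, unless OLLAMA_BASE_URL is set in the environment
```

This is why bypassing the shim fails silently in cloud deployments: code that reads the config file directly never sees the env-var overrides that cloud mode relies on.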

## Backends

| Backend | Type | Notes |
| --- | --- | --- |
| `ollama` | Local | Preferred default; model names from `config/llm.yaml` |
| `vllm` | Local (GPU) | For high-throughput or large models |
| `anthropic` | Cloud | Requires `ANTHROPIC_API_KEY` env var |
| `openai` | Cloud | Any OpenAI-compatible endpoint |
| `claude_code` | Local wrapper | `claude-bridge` OpenAI-compatible wrapper on `:3009` |

## Vision routing

When images are included in a complete() call, the router checks each backend's vision capability before trying it. Configure vision priority separately:

```yaml
vision_fallback_order:
  - vision_service   # local moondream2 via FastAPI on :8002
  - claude_code
  - anthropic
```
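The routing decision can be sketched as chain selection followed by a capability filter (a sketch under stated assumptions: the capability flags and `route` helper are illustrative, not cf-core's actual data model):

```python
# Sketch: pick the vision chain when images are present,
# then drop any backend that cannot handle images.

CAPABILITIES = {
    "ollama": {"vision": False},
    "vision_service": {"vision": True},
    "claude_code": {"vision": True},
    "anthropic": {"vision": True},
}

def route(fallback_order, vision_fallback_order, images=None):
    chain = vision_fallback_order if images else fallback_order
    if not images:
        return chain
    return [b for b in chain if CAPABILITIES.get(b, {}).get("vision")]

route(["ollama", "anthropic"], ["vision_service", "anthropic"])
# text-only call -> ["ollama", "anthropic"]
```

Note the filter runs even on the vision chain, so a misconfigured entry (for example, a text-only backend listed under `vision_fallback_order`) is skipped rather than sent an image payload it cannot handle.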