snipe/config/llm.yaml.example
pyr0ball af1ffa1d94
feat: wire Search with AI to cf-orch → Ollama (llama3.1:8b)
- Add app/llm/router.py shim — tri-level config lookup:
  repo config/llm.yaml → ~/.config/circuitforge/llm.yaml → env vars
- Add config/llm.cloud.yaml — ollama via cf-orch, llama3.1:8b
- Add config/llm.yaml.example — self-hosted reference config
- compose.cloud.yml: mount llm.cloud.yaml, set CF_ORCH_URL,
  add host.docker.internal:host-gateway (required on Linux Docker)
- api/main.py: use app.llm.router.LLMRouter (shim) not core directly
- .env.example: update LLM section to reference config/llm.yaml.example
- .gitignore: exclude config/llm.yaml (keep example + cloud yaml)
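The tri-level lookup the shim performs could be sketched roughly as below. This is an illustrative sketch only, not the actual `app.llm.router` API; the function names and the `LLM_BASE_URL`/`LLM_MODEL` env-var names are assumptions.

```python
import os
from pathlib import Path
from typing import Optional

def find_llm_config(repo_root: Path = Path(".")) -> Optional[Path]:
    """Return the first config file that exists, in lookup order:
    repo config/llm.yaml, then ~/.config/circuitforge/llm.yaml."""
    candidates = [
        repo_root / "config" / "llm.yaml",
        Path.home() / ".config" / "circuitforge" / "llm.yaml",
    ]
    for path in candidates:
        if path.is_file():
            return path
    return None  # no file found: caller falls back to env vars

def env_fallback() -> dict:
    """Build a minimal single-backend config from env vars
    (env-var names here are hypothetical)."""
    return {
        "backends": {
            "ollama": {
                "type": "openai_compat",
                "base_url": os.environ.get("LLM_BASE_URL",
                                           "http://localhost:11434/v1"),
                "model": os.environ.get("LLM_MODEL", "llama3.1:8b"),
                "enabled": True,
            }
        },
        "fallback_order": ["ollama"],
    }
```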

End-to-end tested: 3.2s for "used RTX 3080 under $400, no mining cards"
via cloud container → host.docker.internal:11434 → Ollama llama3.1:8b
2026-04-14 13:23:44 -07:00


# config/llm.yaml.example
# Snipe — LLM backend configuration
#
# Copy to config/llm.yaml and edit for your setup.
# The query builder ("Search with AI") walks fallback_order for text requests.
#
# Backends are tried in fallback_order until one succeeds.
# Set enabled: false to skip a backend without removing it.
#
# CF Orchestrator (cf-orch): when CF_ORCH_URL is set in the environment and a
# backend has a cf_orch block, allocations are routed through cf-orch for
# VRAM-aware scheduling. Omit cf_orch to hit the backend directly.
backends:
  anthropic:
    type: anthropic
    api_key_env: ANTHROPIC_API_KEY
    model: claude-haiku-4-5-20251001
    enabled: false
    supports_images: false
  openai:
    type: openai_compat
    base_url: https://api.openai.com/v1
    api_key_env: OPENAI_API_KEY
    model: gpt-4o-mini
    enabled: false
    supports_images: false
  ollama:
    type: openai_compat
    base_url: http://localhost:11434/v1
    api_key: ollama
    model: llama3.1:8b
    enabled: true
    supports_images: false
    # Uncomment to route through cf-orch for VRAM-aware scheduling:
    # cf_orch:
    #   service: ollama
    #   ttl_s: 300

fallback_order:
  - anthropic
  - openai
  - ollama
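The fallback semantics above (try backends in fallback_order, skipping any with enabled: false) could be sketched as a small selection helper. This is only an illustration of the config's semantics; the real logic lives in app/llm/router.py, and `first_enabled` is a hypothetical name.

```python
from typing import Optional

def first_enabled(config: dict) -> Optional[str]:
    """Return the first backend in fallback_order that is enabled.
    In practice each candidate would be tried until one succeeds;
    this only picks the starting point."""
    for name in config.get("fallback_order", []):
        backend = config.get("backends", {}).get(name, {})
        if backend.get("enabled", False):
            return name
    return None

# With the example config above (anthropic and openai disabled,
# ollama enabled), this selects "ollama".
```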