ollama: register as tracked service in cf-orch (VRAM accounting + adopt-if-running) #16

Closed
opened 2026-04-02 21:48:50 -07:00 by pyr0ball · 0 comments
Owner

Problem

Ollama is listed in GPU profiles with a max_mb budget, but cf-orch has no visibility into it:

  • Its VRAM footprint (e.g. 5.8 GB on GPU 0) is invisible to the allocator, so the coordinator may try to start a second service on a GPU that Ollama has already filled
  • It does not appear in the dashboard service instance table
  • cf-orch cannot start or stop it, even when it has a ProcessSpec

This also blocks self-hoster generalisation: Ollama is the most common local LLM backend, and most self-hosters will already have it running as a system service.

Proposed solution

1. adopt-if-running ProcessSpec mode

Add an adopt: true flag to the managed block. On coordinator startup (or first allocation attempt for that service), the agent:

  1. Checks if Ollama is already reachable (GET http://localhost:11434/api/tags)
  2. If yes → registers a running instance in the ServiceRegistry with the known URL; no process is started
  3. If no → starts Ollama via the ProcessSpec exec path as normal

# Profile example
ollama:
  max_mb: 6500
  priority: 1
  managed:
    type: process
    adopt: true                  # adopt if already running
    exec_path: "/usr/local/bin/ollama"
    args_template: "serve"
    port: 11434
    host_port: 11434
    health_path: /api/tags       # non-standard health endpoint
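
The adopt-or-start decision in the three steps above could be sketched in Python roughly as follows. This is a minimal illustration, not the cf-orch implementation: `ManagedSpec`, `ensure_service`, `http_reachable`, the dict-based registry, and the `"starting"` state are all hypothetical names chosen for the example; only the field names mirror the profile YAML.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List
import urllib.error
import urllib.request


@dataclass
class ManagedSpec:
    """Hypothetical stand-in for the `managed:` block of a profile."""
    exec_path: str
    args: List[str]
    host_port: int
    health_path: str = "/health"
    adopt: bool = False


def http_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def ensure_service(
    name: str,
    spec: ManagedSpec,
    registry: Dict[str, dict],
    start: Callable[[ManagedSpec], None],
    probe: Callable[[str], bool] = http_reachable,
) -> str:
    """Adopt an already-running service or start it; report which happened."""
    base_url = f"http://localhost:{spec.host_port}"
    if spec.adopt and probe(base_url + spec.health_path):
        # Steps 1-2: reachable, so register it as running and start nothing.
        registry[name] = {"state": "running", "url": base_url, "adopted": True}
        return "adopted"
    # Step 3: not reachable (or adopt disabled), so take the normal start path.
    start(spec)
    registry[name] = {"state": "starting", "url": base_url, "adopted": False}
    return "started"
```

Passing `probe` and `start` as parameters keeps the decision testable without a live Ollama; a real agent would presumably pass its own process launcher as `start`.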

2. health_path field in ProcessSpec

Ollama does not expose GET /health — it uses GET /api/tags. Add an optional health_path field to ProcessSpec so the probe loop uses the correct endpoint.
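
One way the optional field might look, as a hedged sketch: the `ProcessSpec` below is a simplified, hypothetical version of the real class (only `health_path` and the `/health` default come from this issue), and `health_url` is an invented helper name.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed default, matching the GET /health convention mentioned above.
DEFAULT_HEALTH_PATH = "/health"


@dataclass
class ProcessSpec:
    """Simplified, hypothetical ProcessSpec with an optional health_path."""
    exec_path: str
    host_port: int
    health_path: Optional[str] = None

    def health_url(self, host: str = "localhost") -> str:
        """Build the URL the probe loop would poll for this service."""
        path = self.health_path or DEFAULT_HEALTH_PATH
        if not path.startswith("/"):
            path = "/" + path  # tolerate "api/tags" as well as "/api/tags"
        return f"http://{host}:{self.host_port}{path}"
```

With this shape, a spec that sets `health_path="/api/tags"` on port 11434 would be probed at `http://localhost:11434/api/tags`, while specs that omit the field keep the existing `/health` behaviour.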

3. VRAM accounting

Once Ollama is a tracked running instance, the existing VRAM lease machinery accounts for it automatically. The allocator will correctly refuse to schedule competing services on the same GPU when Ollama is loaded.
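
To illustrate the intended effect, here is a toy lease ledger. It is not the existing cf-orch lease machinery; `GpuLedger`, `try_lease`, and `free_mb` are hypothetical names, and reserving the full `max_mb` budget per service is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GpuLedger:
    """Hypothetical per-GPU lease ledger, for illustration only."""
    total_mb: int
    leases: Dict[str, int] = field(default_factory=dict)

    def free_mb(self) -> int:
        """VRAM not yet reserved by any lease."""
        return self.total_mb - sum(self.leases.values())

    def try_lease(self, service: str, max_mb: int) -> bool:
        """Reserve max_mb for a service; refuse if it would not fit."""
        if service in self.leases:
            return True  # already holds a lease, nothing to do
        if max_mb > self.free_mb():
            return False
        self.leases[service] = max_mb
        return True
```

For example, on a hypothetical 8192 MB GPU, once an adopted Ollama holds its 6500 MB lease, a request from another service for 4000 MB would be refused because only 1692 MB remain unreserved.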

4. Dashboard visibility

Ollama will appear in the Service Instances table like any other managed service.

Acceptance criteria

  • Ollama running independently → cf-orch agent detects it on startup and registers it as running
  • Ollama not running → cf-orch starts it via ProcessSpec as normal
  • Ollama VRAM lease reflected in allocator (blocks competing allocations)
  • Dashboard shows Ollama instance with state and URL
  • health_path field honoured by probe loop
  • All four GPU profile YAMLs updated with managed: + adopt: true blocks for Ollama
Reference: Circuit-Forge/circuitforge-core#16