ollama: register as tracked service in cf-orch (VRAM accounting + adopt-if-running) #16

Closed

opened 2026-04-02 21:48:50 -07:00 by pyr0ball · 0 comments

pyr0ball commented

2026-04-02 21:48:50 -07:00

Owner

Problem

Ollama is listed in GPU profiles with a max_mb budget, but cf-orch has no visibility into it:

- Its VRAM footprint (e.g. 5.8 GB on GPU 0) is invisible to the allocator, so the coordinator may try to start a second service on a GPU that Ollama has already filled
- It does not appear in the dashboard service instance table
- cf-orch cannot start or stop it, even when it has a ProcessSpec

This also blocks self-hoster generalisation: Ollama is the most common local LLM backend, and most self-hosters will already have it running as a system service.

Proposed solution

1. `adopt-if-running` ProcessSpec mode

Add an `adopt: true` flag to the managed block. On coordinator startup (or first allocation attempt for that service), the agent:

1. Checks if Ollama is already reachable (`GET http://localhost:11434/api/tags`)
2. If yes → registers a `running` instance in the ServiceRegistry with the known URL; no process is started
3. If no → starts Ollama via the ProcessSpec exec path as normal

```yaml
# Profile example
ollama:
  max_mb: 6500
  priority: 1
  managed:
    type: process
    adopt: true                  # adopt if already running
    exec_path: "/usr/local/bin/ollama"
    args_template: "serve"
    port: 11434
    host_port: 11434
    health_path: /api/tags       # non-standard health endpoint
```
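
The adopt-or-start decision described in this section could look roughly like the sketch below. It is a sketch under assumptions, not cf-orch's implementation: the `ProcessSpec` dataclass is a stand-in that mirrors only the fields from the profile example, and `is_reachable`, `adopt_or_start`, and the `probe`/`spawn` parameters are hypothetical names. The caller would register the returned URL in the ServiceRegistry as a `running` instance.

```python
import subprocess
import urllib.error
import urllib.request
from dataclasses import dataclass


@dataclass
class ProcessSpec:
    # Hypothetical stand-in mirroring the profile's managed block.
    exec_path: str
    args_template: str
    port: int
    adopt: bool = False
    health_path: str = "/health"  # assumed default probe path


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def adopt_or_start(spec, probe=is_reachable, spawn=subprocess.Popen):
    """Return (url, adopted). A process is spawned only if nothing answers."""
    url = f"http://localhost:{spec.port}"
    if spec.adopt and probe(url + spec.health_path):
        return url, True  # already running: adopt it, do not spawn
    spawn([spec.exec_path, *spec.args_template.split()])
    return url, False
```

Passing `probe` and `spawn` as parameters keeps the decision testable without a live Ollama or a real subprocess.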

2. `health_path` field in ProcessSpec

Ollama does not expose `GET /health`; it uses `GET /api/tags` instead. Add an optional `health_path` field to ProcessSpec so the probe loop uses the correct endpoint.
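
As an illustration, the probe loop might build its URL as in the sketch below. The function name `health_url` and the fallback to `/health` when the field is unset are assumptions; the issue only implies that `/health` is the current default.

```python
from typing import Optional


def health_url(host: str, port: int, health_path: Optional[str] = None) -> str:
    """Build the probe URL, falling back to /health when no path is set."""
    path = health_path or "/health"  # assumed default
    if not path.startswith("/"):
        path = "/" + path  # tolerate "api/tags" written without a slash
    return f"http://{host}:{port}{path}"
```

With the profile above, `health_url("localhost", 11434, "/api/tags")` yields `http://localhost:11434/api/tags`, while a service without the field keeps probing `/health`.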

3. VRAM accounting

Once Ollama is a tracked running instance, the existing VRAM lease machinery accounts for it automatically. The allocator will correctly refuse to schedule competing services on the same GPU when Ollama is loaded.
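
To make the intended effect concrete, here is a toy model of per-GPU lease accounting. It is purely illustrative: `GpuLedger`, `try_lease`, and the 8192 MB capacity are hypothetical, and cf-orch's existing lease machinery may be structured quite differently.

```python
from dataclasses import dataclass, field


@dataclass
class GpuLedger:
    """Hypothetical per-GPU ledger of reserved VRAM, in MB."""
    total_mb: int
    leases: dict = field(default_factory=dict)  # service name -> reserved MB

    def free_mb(self) -> int:
        return self.total_mb - sum(self.leases.values())

    def try_lease(self, service: str, max_mb: int) -> bool:
        """Reserve max_mb for service, or refuse if it does not fit."""
        # Ignore the service's own previous lease so a renewal is not double-counted.
        others = sum(mb for name, mb in self.leases.items() if name != service)
        if others + max_mb > self.total_mb:
            return False
        self.leases[service] = max_mb
        return True
```

In this model, once an adopted Ollama holds its 6500 MB lease on an 8192 MB GPU, a request for 4000 MB from another service is refused, whereas before the lease existed it would have been granted.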

4. Dashboard visibility

Ollama will appear in the Service Instances table like any other managed service.

Acceptance criteria

- [ ] Ollama running independently → cf-orch agent detects it on startup and registers as `running`
- [ ] Ollama not running → cf-orch starts it via ProcessSpec as normal
- [ ] Ollama VRAM lease reflected in allocator (blocks competing allocations)
- [ ] Dashboard shows Ollama instance with state and URL
- [ ] `health_path` field honoured by probe loop
- [ ] All four GPU profile YAMLs updated with `managed:` + `adopt: true` blocks for Ollama
