- ProcessSpec: adopt (bool) and health_path (str, default /health) fields
- ServiceManager: adopt=True probes health_path before spawning; is_running()
uses health probe for adopt services rather than proc table + socket check
- _probe_health() helper: urllib GET on localhost:port+path, returns bool (sketch after this list)
- Agent /services/{service}/start: returns adopted=True when service was
already running; coordinator sets state=running immediately (no probe wait)
- ServiceInstance: health_path field (default /health)
- service_registry.upsert_instance(): health_path kwarg
- Probe loop uses inst.health_path instead of hardcoded /health
- coordinator allocate_service: looks up health_path from profile spec via
_get_health_path() and stores on ServiceInstance
- All GPU profiles (2/4/6/8/16/24 GB + cpu-16/32): ollama managed block
with adopt=true, health_path=/api/tags, port 11434
- 11 new tests
closes #15
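A minimal sketch of the health probe, assuming the helper takes a port and path and treats any 2xx response as healthy (the exact signature in ServiceManager may differ):

```python
import urllib.request


def _probe_health(port: int, path: str = "/health", timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to localhost:<port><path> answers with a 2xx."""
    url = f"http://127.0.0.1:{port}{path}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # connection refused, timeout, and URLError all subclass OSError
        return False
```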
- NodeStore: SQLite persistence for known agent nodes
(~/.local/share/circuitforge/cf-orch-nodes.db)
- upsert on every register(); prune_stale() for 30-day cleanup
- survives coordinator restarts — data readable by next process
- AgentSupervisor.restore_from_store(): reload known nodes on startup,
mark all offline; heartbeat loop brings back any that respond
- AgentSupervisor.register(): persists to NodeStore on every call
- cli.py coordinator: NodeStore wired in; restore_from_store() called
before uvicorn starts
- cli.py agent: one-shot registration replaced with a persistent reconnect
loop (daemon thread, 30 s interval); after a coordinator restart, nodes
reappear within one cycle with no manual intervention on agent hosts
- 16 new tests: NodeStore (8) + AgentSupervisor watchdog (8)
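A sketch of the agent-side reconnect loop; the /agents/register path, payload shape, and httpx usage are illustrative assumptions, not the real cli.py code:

```python
import threading
import time

import httpx

RECONNECT_INTERVAL_S = 30.0  # one registration attempt per cycle


def start_reconnect_loop(coordinator_url: str, payload: dict) -> threading.Thread:
    """Keep re-registering this agent so a restarted coordinator re-learns it."""

    def _loop() -> None:
        while True:
            try:
                httpx.post(f"{coordinator_url}/agents/register", json=payload, timeout=5.0)
            except httpx.HTTPError:
                pass  # coordinator down or unreachable; try again next cycle
            time.sleep(RECONNECT_INTERVAL_S)

    thread = threading.Thread(target=_loop, daemon=True, name="cf-agent-reconnect")
    thread.start()
    return thread
```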
max_mb // 2 was too loose — Qwen2.5-3B needs ~5.9 GB on an 8 GB card
but the threshold only required 3.25 GB free, allowing Ollama to hold
4.5 GB while a load attempt was still dispatched (causing OOM crash).
- node_selector: can_fit = free_mb >= service_max_mb (was // 2)
- coordinator /start: same threshold fix + updated error message
- tests: two new node_selector tests pin the full-ceiling semantics;
updated stale docstring in coordinator app test
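For illustration, a sketch of the full-ceiling check in context; the candidate and field names are assumptions, only the `free_mb >= service_max_mb` comparison comes from the change:

```python
def pick_gpu(candidates: list[dict], service_max_mb: int) -> dict | None:
    """Return the first GPU whose free VRAM covers the service's full declared max.

    With the old `free_mb >= service_max_mb // 2` check, a 6.5 GB service passed
    with only 3.25 GB free and could OOM once the model actually loaded.
    """
    for gpu in candidates:
        if gpu["free_mb"] >= service_max_mb:  # full ceiling, not // 2
            return gpu
    return None
```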
- ServiceRegistry: add sweep_expired_allocations() to remove stale TTL
allocations and transition instances to idle (sketch after this list); add
get_allocation() helper
- AgentSupervisor._run_idle_sweep: call sweep_expired_allocations() before
idle-timeout check so crashed-caller leaks are cleaned up each sweep tick
- schema._parse_managed: copy raw dict before extracting 'type' key instead
of mutating caller's dict with pop()
- app.release_allocation: validate allocation belongs to the given service
path param before releasing; return 404 if mismatch
- router._try_cf_orch_alloc: replace print() with logger.warning(); add
module-level logger = logging.getLogger(__name__)
- tests: add test_sweep_expired_allocations covering TTL expiry and idle
state transition
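A sketch of the TTL sweep under assumed registry internals (a dict of allocations with id / expires_at / instance_id fields, and instances with a state attribute; the real ServiceRegistry layout may differ):

```python
import time


def sweep_expired_allocations(registry) -> int:
    """Remove TTL-expired allocations and flip instances with no remaining lease to idle."""
    now = time.time()
    expired = [a for a in registry.allocations.values() if a.expires_at <= now]
    for alloc in expired:
        del registry.allocations[alloc.id]
    for alloc in expired:
        still_leased = any(a.instance_id == alloc.instance_id
                           for a in registry.allocations.values())
        if not still_leased:
            registry.instances[alloc.instance_id].state = "idle"
    return len(expired)
```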
- CRITICAL: idle sweep now calls mark_stopped() after successful HTTP stop,
preventing repeated stop POSTs on every 3rd tick for the same instance
- CRITICAL: active_allocations() now filters by gpu_id to avoid marking wrong
instance idle on multi-GPU nodes when an allocation is released
- CRITICAL: VRAM pre-flight guard in ensure_service was dead code — added the
actual HTTPException(503) before the candidate loop
- IMPORTANT: register() now updates agent_url on re-registration if it changed,
so relocated agents are tracked correctly
- IMPORTANT: updated test_service_registry.py callers of active_allocations()
to pass the now-required gpu_id argument
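A sketch of the now-live pre-flight guard described above, assuming nodes expose a list of GPUs with a free_mb field; only the HTTPException(503) raised before the candidate loop comes from the change itself:

```python
from fastapi import HTTPException


def vram_preflight(candidate_nodes: list, service_max_mb: int) -> None:
    """Reject the request up front when no GPU anywhere has enough free VRAM."""
    fits_somewhere = any(
        gpu.free_mb >= service_max_mb
        for node in candidate_nodes
        for gpu in node.gpus
    )
    if not fits_somewhere:
        raise HTTPException(
            status_code=503,
            detail=f"No GPU with {service_max_mb} MB free VRAM is available",
        )
```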
Add idle_stop_after_s to ServiceProfile (default 0 = never stop).
Set 600s (10 min) timeout on vllm slot in all single-GPU profiles.
Backward compatible; non-vllm services inherit default 0 (no auto-stop).
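A minimal sketch of the new field and the check it enables (an illustrative subset of the real profile schema, not the actual class):

```python
from dataclasses import dataclass


@dataclass
class ServiceProfile:  # illustrative subset, not the full schema
    name: str
    idle_stop_after_s: int = 0  # 0 = never auto-stop (backward compatible)


def should_auto_stop(profile: ServiceProfile, idle_for_s: float) -> bool:
    """Auto-stop only when a positive timeout is configured and has elapsed."""
    return profile.idle_stop_after_s > 0 and idle_for_s >= profile.idle_stop_after_s
```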
Implements CFOrchClient with allocate() (sync contextmanager) and
allocate_async() (async contextmanager) for cf-orch GPU resource
allocation. Releases allocation on exit; ignores 404 on release;
raises RuntimeError on non-2xx allocation response. Exports
CFOrchClient and Allocation from circuitforge_core.resources.
Note: async test uses unittest.mock rather than httpretty — httpretty
only patches stdlib sockets and does not intercept httpx async (anyio)
transport.
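A hedged usage sketch: the coordinator URL and the allocate() keyword arguments are assumptions; only the contextmanager shape and the release/error behaviour come from the description above.

```python
import asyncio

from circuitforge_core.resources import CFOrchClient

client = CFOrchClient("http://localhost:8400")  # coordinator URL is an assumption

# Sync: the allocation is released on exit even on error; a 404 on release is
# ignored, and a non-2xx allocation response raises RuntimeError.
with client.allocate(service="vllm", vram_mb=6500) as alloc:
    print("holding", alloc)


async def hold_async() -> None:
    # Async variant mirrors the sync contextmanager.
    async with client.allocate_async(service="vllm", vram_mb=6500) as alloc:
        print("holding", alloc)


asyncio.run(hold_async())
```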
- dashboard.html: node-centric layout — GPU cards with VRAM bars and
sparklines, active leases table with TTL progress bars, service health
pill, auto-refreshes every 5s via fetch() against the local JSON API
- All dynamic content set via DOM textContent / createElementNS — no
innerHTML with user-sourced strings
- coordinator/app.py: serves dashboard.html at GET / (HTMLResponse,
excluded from OpenAPI schema); HTML read at import time from package dir
- test_dashboard_serves_html: verifies 200, content-type text/html,
and key route markers present
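A sketch of the coordinator-side route, assuming the standard FastAPI pattern for an HTML page kept out of the OpenAPI schema; the file name and read-at-import behaviour follow the description, everything else is illustrative:

```python
from pathlib import Path

from fastapi import FastAPI
from fastapi.responses import HTMLResponse

app = FastAPI()

# Read once at import time from the package directory.
_DASHBOARD_HTML = (Path(__file__).parent / "dashboard.html").read_text(encoding="utf-8")


@app.get("/", include_in_schema=False)  # excluded from the OpenAPI schema
def dashboard() -> HTMLResponse:
    return HTMLResponse(_DASHBOARD_HTML)
```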
- C1: Remove _reserved_vram decrement from _scheduler_loop reaper; sole
responsibility now belongs to _batch_worker's finally block, eliminating
the double-decrement race that could drive _reserved_vram negative.
- C2: Move TaskScheduler construction (including the VRAM detection httpx call)
outside _scheduler_lock in get_scheduler(); the lock is now held only for the
final singleton assignment, preventing ~2 s of lock contention on the first
call (sketch after this list).
- I1: Add RunTaskFn type alias (Callable[...]) and use it in __init__ and
get_scheduler() instead of bare Callable.
- I2: Replace namedtuple TaskSpec with typed NamedTuple class.
- I3: Parameterize _queues annotation as dict[str, deque[TaskSpec]].
- I4: Wrap _queues read in start() with self._lock.
- I5: Replace time.sleep() ordering assertion in test_vram_budget_blocks_second_type
with event-based synchronization using type_a_started/type_b_started events.
- M2: Use sqlite3.connect() as context manager in _load_queued_tasks.
- M3: Strengthen weak assertion in test_enqueue_returns_false_when_queue_full.
- M4: Add test_reserved_vram_zero_after_task_completes to catch C1 regression.
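A sketch of the C2 lock-narrowing pattern referenced above; TaskScheduler here is a stand-in for the real class, and the singleton names are illustrative:

```python
import threading


class TaskScheduler:  # stand-in; real construction includes a slow VRAM-detection httpx call
    def __init__(self, run_task_fn):
        self.run_task_fn = run_task_fn


_scheduler_lock = threading.Lock()
_scheduler: TaskScheduler | None = None


def get_scheduler(run_task_fn) -> TaskScheduler:
    """Build outside the lock; hold it only for the final singleton assignment."""
    global _scheduler
    if _scheduler is not None:
        return _scheduler
    candidate = TaskScheduler(run_task_fn)  # slow part runs without the lock held
    with _scheduler_lock:
        if _scheduler is None:  # another thread may have finished first
            _scheduler = candidate
    return _scheduler
```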
Adds test_detect_vram_preflight_fallback to cover the spec path where
cf-orch is unreachable but scripts.preflight.get_gpus() succeeds,
verifying detect_available_vram_gb() returns the summed total VRAM.
Uses sys.modules injection to simulate the preflight module being present.
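A sketch of the sys.modules injection technique (pytest's monkeypatch fixture is assumed, and the get_gpus() return shape is an assumption):

```python
import sys
import types


def test_preflight_fallback_sketch(monkeypatch):
    """Fabricate a scripts.preflight module so the fallback path can be exercised."""
    fake_preflight = types.ModuleType("scripts.preflight")
    fake_preflight.get_gpus = lambda: [{"vram_total_gb": 8}, {"vram_total_gb": 16}]

    fake_scripts = types.ModuleType("scripts")
    fake_scripts.preflight = fake_preflight

    monkeypatch.setitem(sys.modules, "scripts", fake_scripts)
    monkeypatch.setitem(sys.modules, "scripts.preflight", fake_preflight)

    # With cf-orch unreachable, detect_available_vram_gb() would sum 8 + 16 = 24
    # via the injected preflight module.
```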
Extract generic batch scheduler into circuitforge_core.tasks.scheduler
so any CircuitForge product can use it. Includes VRAM detection via
cf-orch coordinator (cooperative free-VRAM), preflight fallback, and
unlimited fallback; singleton API; full test coverage (12 tests).
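A sketch of the three-step detection order; the coordinator endpoint, response shape, preflight return format, and the use of infinity for "unlimited" are all assumptions, only the ordering (coordinator, then preflight, then unlimited) is from the description:

```python
import httpx


def detect_available_vram_gb(coordinator_url: str | None = None) -> float:
    # 1. Cooperative free VRAM from the cf-orch coordinator, if reachable.
    if coordinator_url:
        try:
            resp = httpx.get(f"{coordinator_url}/vram/free", timeout=2.0)
            resp.raise_for_status()
            return float(resp.json()["free_gb"])
        except (httpx.HTTPError, KeyError, ValueError):
            pass
    # 2. Local preflight probe: sum total VRAM across detected GPUs.
    try:
        from scripts.preflight import get_gpus
        return float(sum(g["vram_total_gb"] for g in get_gpus()))
    except Exception:
        pass
    # 3. No information at all: treat VRAM as unlimited so nothing is blocked.
    return float("inf")
```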