feat: auto service lifecycle — /allocate, NodeSelector, idle sweep, CFOrchClient #9

Merged

pyr0ball merged 15 commits from feature/orch-auto-lifecycle into main

2026-04-02 14:11:36 -07:00

Author	SHA1	Message	Date
pyr0ball	e58c3aea23	fix: TTL sweep, immutability, service-scoped release, logger in orch alloc - ServiceRegistry: add sweep_expired_allocations() to remove stale TTL allocations and transition instances to idle; add get_allocation() helper - AgentSupervisor._run_idle_sweep: call sweep_expired_allocations() before idle-timeout check so crashed-caller leaks are cleaned up each sweep tick - schema._parse_managed: copy raw dict before extracting 'type' key instead of mutating caller's dict with pop() - app.release_allocation: validate allocation belongs to the given service path param before releasing; return 404 if mismatch - router._try_cf_orch_alloc: replace print() with logger.warning(); add module-level logger = logging.getLogger(__name__) - tests: add test_sweep_expired_allocations covering TTL expiry and idle state transition	2026-04-02 12:55:38 -07:00
pyr0ball	1a20b80a50	test: add VRAM pre-flight 503 test for ensure_service	2026-04-02 12:49:50 -07:00
pyr0ball	02806359af	feat: add Services table to coordinator dashboard	2026-04-02 12:47:27 -07:00
pyr0ball	a4ccaaf3e2	fix: address coordinator/idle-sweep quality issues from review - CRITICAL: idle sweep now calls mark_stopped() after successful HTTP stop, preventing repeated stop POSTs on every 3rd tick for the same instance - CRITICAL: active_allocations() now filters by gpu_id to avoid marking wrong instance idle on multi-GPU nodes when an allocation is released - CRITICAL: VRAM pre-flight guard in ensure_service was dead code — added the actual HTTPException(503) before the candidate loop - IMPORTANT: register() now updates agent_url on re-registration if it changed, so relocated agents are tracked correctly - IMPORTANT: updated test_service_registry.py callers of active_allocations() to pass the now-required gpu_id argument	2026-04-02 12:45:31 -07:00
pyr0ball	49ab9e4e88	feat: wire ServiceRegistry into coordinator allocate endpoints	2026-04-02 12:30:58 -07:00
pyr0ball	c299482e0d	feat: add idle sweep to AgentSupervisor	2026-04-02 12:30:28 -07:00
pyr0ball	1e168ac636	feat(profiles): add idle_stop_after_s field; set 600s for vllm slot - Add idle_stop_after_s to ServiceProfile (default 0 = never stop). Set 600s (10 min) timeout on vllm slot in all single-GPU profiles. Backward compatible; non-vllm services inherit default 0 (no auto-stop).	2026-04-02 12:24:19 -07:00
pyr0ball	9754f522d9	feat(orch): add ServiceRegistry — allocation tracking + idle state machine	2026-04-02 12:22:46 -07:00
pyr0ball	17a24173f7	feat(llm): add cf_orch allocation support to LLMRouter backends	2026-04-02 12:19:17 -07:00
pyr0ball	f741e6a80b	fix(orch): hoist service-known check; capture resident_keys once in allocate	2026-04-02 11:45:48 -07:00
pyr0ball	defaf39883	feat(core): add CFOrchClient sync+async context manager - Implements CFOrchClient with allocate() (sync contextmanager) and allocate_async() (async contextmanager) for cf-orch GPU resource allocation. Releases allocation on exit; ignores 404 on release; raises RuntimeError on non-2xx allocation response. Exports CFOrchClient and Allocation from circuitforge_core.resources. Note: async test uses unittest.mock rather than httpretty, because httpretty only patches stdlib sockets and does not intercept httpx async (anyio) transport.	2026-04-02 11:44:35 -07:00
pyr0ball	8201f6b3e9	feat(orch): add /api/services/{service}/allocate with auto node selection	2026-04-02 11:25:38 -07:00
pyr0ball	52d2c5cf38	feat(orch): expose online_agents() and resident_keys() helpers	2026-04-02 11:22:29 -07:00
pyr0ball	d600fb6651	refactor(orch): hoist service_max_mb lookup; clarify warm-fallback comments	2026-04-02 11:21:20 -07:00
pyr0ball	13eb0c85f1	feat(orch): add NodeSelector — warm-first GPU scoring	2026-04-02 11:18:44 -07:00
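
For reviewers skimming the log: the TTL sweep described in e58c3aea23 (drop expired allocations, then move an instance to idle once nothing is holding it) could look roughly like the sketch below. This is not the merged code. `ServiceRegistrySketch`, the `expires_at` field, the `(service, gpu_id)` state key, the `"running"`/`"idle"` strings, and the explicit `now` parameter are all assumptions made for illustration; only the method names `sweep_expired_allocations()` and `active_allocations()` (with its `gpu_id` filter) come from the commit messages.

```python
from dataclasses import dataclass, field


@dataclass
class Allocation:
    allocation_id: str
    service: str
    gpu_id: int
    expires_at: float  # hypothetical field: absolute deadline in seconds


@dataclass
class ServiceRegistrySketch:
    """Illustrative stand-in for ServiceRegistry; not the real class."""

    allocations: dict = field(default_factory=dict)     # allocation_id -> Allocation
    instance_state: dict = field(default_factory=dict)  # (service, gpu_id) -> state

    def allocate(self, allocation_id, service, gpu_id, ttl_s, now):
        # Record the allocation and mark the backing instance as in use.
        self.allocations[allocation_id] = Allocation(
            allocation_id, service, gpu_id, expires_at=now + ttl_s
        )
        self.instance_state[(service, gpu_id)] = "running"

    def active_allocations(self, service, gpu_id):
        # Filter by gpu_id as well as service, so releasing on one GPU
        # cannot idle an instance on another GPU of the same node.
        return [
            a for a in self.allocations.values()
            if a.service == service and a.gpu_id == gpu_id
        ]

    def sweep_expired_allocations(self, now) -> list:
        # Collect first, then delete, to avoid mutating the dict mid-iteration.
        expired = [a for a in self.allocations.values() if a.expires_at <= now]
        for a in expired:
            del self.allocations[a.allocation_id]
            # Only go idle when no other allocation still holds the instance.
            if not self.active_allocations(a.service, a.gpu_id):
                self.instance_state[(a.service, a.gpu_id)] = "idle"
        return [a.allocation_id for a in expired]
```

Passing `now` in explicitly, rather than reading the clock inside the method, is a choice made here so the expiry path can be exercised in a test without sleeping; the merged implementation may do this differently. Calling the sweep before the idle-timeout check, as the commit describes for `AgentSupervisor._run_idle_sweep`, means an allocation leaked by a crashed caller stops pinning the instance once its TTL has passed.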