Commit graph

29 commits

Author SHA1 Message Date
c78341fc6f feat(orch): replace Ouro/vllm-Docker with generic HF inference server; add ProcessSpec
- Add circuitforge_core/resources/inference/llm_server.py: generic OpenAI-compatible
  FastAPI server for any HuggingFace causal LM (Phi-4-mini-instruct, Qwen2.5-3B-Instruct)
- Add service_manager.py + service_probe.py: ProcessSpec start/stop/is_running support
  (Popen-based; socket probe confirms readiness before marking running)
- Update all 4 public GPU profiles to use ProcessSpec→llm_server instead of Docker vllm:
  6gb (max_mb 5500), 8gb (max_mb 6500), 16gb/24gb (max_mb 9000)
- Model candidates: Phi-4-mini-instruct first (7.2GB), Qwen2.5-3B-Instruct fallback (5.8GB)
- Remove ouro_server.py (Ouro incompatible with transformers 5.x; vllm Docker also incompatible)
- Add 17 tests for ServiceManager ProcessSpec (start/stop/is_running/list/get_url)
2026-04-02 15:33:08 -07:00
e58c3aea23 fix: TTL sweep, immutability, service-scoped release, logger in orch alloc
- ServiceRegistry: add sweep_expired_allocations() to remove stale TTL
  allocations and transition instances to idle; add get_allocation() helper
- AgentSupervisor._run_idle_sweep: call sweep_expired_allocations() before
  idle-timeout check so crashed-caller leaks are cleaned up each sweep tick
- schema._parse_managed: copy raw dict before extracting 'type' key instead
  of mutating caller's dict with pop()
- app.release_allocation: validate allocation belongs to the given service
  path param before releasing; return 404 if mismatch
- router._try_cf_orch_alloc: replace print() with logger.warning(); add
  module-level logger = logging.getLogger(__name__)
- tests: add test_sweep_expired_allocations covering TTL expiry and idle
  state transition
2026-04-02 12:55:38 -07:00
1a20b80a50 test: add VRAM pre-flight 503 test for ensure_service 2026-04-02 12:49:50 -07:00
a4ccaaf3e2 fix: address coordinator/idle-sweep quality issues from review
- CRITICAL: idle sweep now calls mark_stopped() after successful HTTP stop,
  preventing repeated stop POSTs on every 3rd tick for the same instance
- CRITICAL: active_allocations() now filters by gpu_id to avoid marking wrong
  instance idle on multi-GPU nodes when an allocation is released
- CRITICAL: VRAM pre-flight guard in ensure_service was dead code — added the
  actual HTTPException(503) before the candidate loop
- IMPORTANT: register() now updates agent_url on re-registration if it changed,
  so relocated agents are tracked correctly
- IMPORTANT: updated test_service_registry.py callers of active_allocations()
  to pass the now-required gpu_id argument
2026-04-02 12:45:31 -07:00
49ab9e4e88 feat: wire ServiceRegistry into coordinator allocate endpoints 2026-04-02 12:30:58 -07:00
c299482e0d feat: add idle sweep to AgentSupervisor 2026-04-02 12:30:28 -07:00
1e168ac636 feat(profiles): add idle_stop_after_s field; set 600s for vllm slot
Add idle_stop_after_s to ServiceProfile (default 0 = never stop).
Set 600s (10 min) timeout on vllm slot in all single-GPU profiles.
Backward compatible; non-vllm services inherit default 0 (no auto-stop).
2026-04-02 12:24:19 -07:00
9754f522d9 feat(orch): add ServiceRegistry — allocation tracking + idle state machine 2026-04-02 12:22:46 -07:00
defaf39883 feat(core): add CFOrchClient sync+async context manager
Implements CFOrchClient with allocate() (sync contextmanager) and
allocate_async() (async contextmanager) for cf-orch GPU resource
allocation. Releases allocation on exit; ignores 404 on release;
raises RuntimeError on non-2xx allocation response. Exports
CFOrchClient and Allocation from circuitforge_core.resources.

Note: async test uses unittest.mock rather than httpretty — httpretty
only patches stdlib sockets and does not intercept httpx async (anyio)
transport.
2026-04-02 11:44:35 -07:00
8201f6b3e9 feat(orch): add /api/services/{service}/allocate with auto node selection 2026-04-02 11:25:38 -07:00
52d2c5cf38 feat(orch): expose online_agents() and resident_keys() helpers 2026-04-02 11:22:29 -07:00
13eb0c85f1 feat(orch): add NodeSelector — warm-first GPU scoring 2026-04-02 11:18:44 -07:00
7aa0ad7a51 feat(dashboard): add self-hosted coordinator dashboard at GET /
- dashboard.html: node-centric layout — GPU cards with VRAM bars and
  sparklines, active leases table with TTL progress bars, service health
  pill, auto-refreshes every 5s via fetch() against the local JSON API
- All dynamic content set via DOM textContent / createElementNS — no
  innerHTML with user-sourced strings
- coordinator/app.py: serves dashboard.html at GET / (HTMLResponse,
  excluded from OpenAPI schema); HTML read at import time from package dir
- test_dashboard_serves_html: verifies 200, content-type text/html,
  and key route markers present
2026-03-31 18:57:25 -07:00
d755e9ea2c test(resources): add integration tests for full lease/eviction cycle 2026-03-30 22:37:06 -07:00
70017abd35 feat(resources): add cf-orch CLI with start, agent, status, install-service commands 2026-03-30 22:27:11 -07:00
4bcd297b18 feat(resources): add cforch-coordinator FastAPI app with lease/node/profile endpoints 2026-03-30 22:01:46 -07:00
cede761d82 feat(resources): add AgentSupervisor and EvictionEngine 2026-03-30 21:44:42 -07:00
7718911652 feat(resources): add cforch-agent FastAPI app with /health /gpu-info /evict 2026-03-30 20:51:08 -07:00
4a857d5339 feat(resources): add EvictionExecutor with SIGTERM/grace/SIGKILL sequence 2026-03-30 20:46:45 -07:00
a79fd10f45 fix(resources): patch subprocess at import site in gpu_monitor tests 2026-03-30 20:45:01 -07:00
3dcbe801f1 feat(resources): add GpuMonitor for nvidia-smi polling 2026-03-30 20:42:57 -07:00
6b239b76e3 fix(resources): rename lambda var; convert asyncio.run test to async 2026-03-30 20:41:03 -07:00
d60503f059 feat(resources): add LeaseManager with VRAM tracking and eviction candidate selection 2026-03-30 20:38:51 -07:00
cdd8072b32 fix(resources): move MagicMock import to module level in profile registry tests 2026-03-30 20:36:40 -07:00
0389f4f167 feat(resources): add ProfileRegistry with auto-detect and public profile loading 2026-03-30 20:34:16 -07:00
bfc1f7b7b9 fix(resources): guard non-dict YAML in load_profile; remove unused FIXTURES constant 2026-03-30 20:30:30 -07:00
c6a58b6a37 feat(resources): add GPU profile schema and public 8GB/6GB/2GB profiles 2026-03-30 20:28:06 -07:00
b774afb6b0 fix(resources): add expires_at sentinel comment; move pytest import to module level 2026-03-30 20:25:58 -07:00
0888f0f16b feat(resources): add shared VRAMLease, GpuInfo, NodeInfo models 2026-03-30 20:21:37 -07:00