feat(resources): cf-orch GPU VRAM orchestration — Plan A core #1

Merged
pyr0ball merged 21 commits from feature/cforch-core-orchestration into main 2026-03-31 10:43:53 -07:00

SHA1 Message Date
db4e3047fd fix(resources): address code review findings from final review
- eviction_engine: replace deprecated asyncio.get_event_loop() with
  get_running_loop() (Python 3.12 compatibility)
- eviction_engine: remove unused httpx import
- coordinator app: return 422 for unknown node_id instead of silently
  falling back to hardcoded localhost URL
- eviction_executor: guard against pid <= 0 to prevent accidental
  SIGTERM to process group
- pyproject.toml: move pytest-asyncio to [dev] extras, not [orch]
- profile_registry: document CPU profile exclusion from list_public()
2026-03-30 22:46:07 -07:00
d755e9ea2c test(resources): add integration tests for full lease/eviction cycle 2026-03-30 22:37:06 -07:00
1f296c0cdb feat(resources): add [orch] package extras, cf-orch entry point, Docker compose 2026-03-30 22:34:40 -07:00
5fb3a2b41e style(resources): apply Black formatting to cli.py 2026-03-30 22:33:38 -07:00
70017abd35 feat(resources): add cf-orch CLI with start, agent, status, install-service commands 2026-03-30 22:27:11 -07:00
dba49a47fe refactor(resources): rename cforch → cf-orch in FastAPI titles 2026-03-30 22:22:48 -07:00
4bcd297b18 feat(resources): add cforch-coordinator FastAPI app with lease/node/profile endpoints 2026-03-30 22:01:46 -07:00
cede761d82 feat(resources): add AgentSupervisor and EvictionEngine 2026-03-30 21:44:42 -07:00
7718911652 feat(resources): add cforch-agent FastAPI app with /health /gpu-info /evict 2026-03-30 20:51:08 -07:00
4a857d5339 feat(resources): add EvictionExecutor with SIGTERM/grace/SIGKILL sequence 2026-03-30 20:46:45 -07:00
a79fd10f45 fix(resources): patch subprocess at import site in gpu_monitor tests 2026-03-30 20:45:01 -07:00
3dcbe801f1 feat(resources): add GpuMonitor for nvidia-smi polling 2026-03-30 20:42:57 -07:00
6b239b76e3 fix(resources): rename lambda var; convert asyncio.run test to async 2026-03-30 20:41:03 -07:00
d60503f059 feat(resources): add LeaseManager with VRAM tracking and eviction candidate selection 2026-03-30 20:38:51 -07:00
cdd8072b32 fix(resources): move MagicMock import to module level in profile registry tests 2026-03-30 20:36:40 -07:00
0389f4f167 feat(resources): add ProfileRegistry with auto-detect and public profile loading 2026-03-30 20:34:16 -07:00
5429e3f595 feat(resources): add 24GB, 16GB, 4GB, CPU+32GB, CPU+16GB public profiles 2026-03-30 20:32:13 -07:00
bfc1f7b7b9 fix(resources): guard non-dict YAML in load_profile; remove unused FIXTURES constant 2026-03-30 20:30:30 -07:00
c6a58b6a37 feat(resources): add GPU profile schema and public 8GB/6GB/2GB profiles 2026-03-30 20:28:06 -07:00
b774afb6b0 fix(resources): add expires_at sentinel comment; move pytest import to module level 2026-03-30 20:25:58 -07:00
0888f0f16b feat(resources): add shared VRAMLease, GpuInfo, NodeInfo models 2026-03-30 20:21:37 -07:00
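
Note on 4a857d5339 and db4e3047fd: the SIGTERM/grace/SIGKILL sequence and the `pid <= 0` guard could look roughly like the sketch below. This is not the code from the PR. The function name `evict`, its parameters, and its return strings are assumptions made for illustration, and the sketch is POSIX-only because it uses `signal.SIGKILL`. The guard matters because `os.kill(0, sig)` signals every process in the caller's process group, and a negative pid targets a process group (or, for -1, every process the caller may signal).

```python
import os
import signal
import time


def evict(pid: int, grace_s: float = 5.0, poll_s: float = 0.1) -> str:
    """Hypothetical helper: send SIGTERM, wait up to grace_s, then SIGKILL."""
    # pid 0 means "my own process group" and negative pids address
    # process groups, so refuse both before sending any signal.
    if pid <= 0:
        raise ValueError(f"refusing to signal pid {pid}")

    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return "not_running"

    # Poll until the process disappears or the grace period runs out.
    # Caveat: an unreaped child of this process (a zombie) still counts
    # as existing here, so a real executor would also need to reap it.
    deadline = time.monotonic() + grace_s
    while time.monotonic() < deadline:
        try:
            os.kill(pid, 0)  # signal 0 only checks for existence
        except ProcessLookupError:
            return "terminated"
        time.sleep(poll_s)

    # Grace period expired: escalate.
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return "terminated"
    return "killed"
```

Calling `evict(0)` or `evict(-1)` raises `ValueError` before any signal is sent, which is the behaviour the review fix describes.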