feat(orch): agent self-registration + coordinator heartbeat loop #4
Reference: Circuit-Forge/circuitforge-core#4
Summary
- `POST /api/nodes`: agents self-register by posting `{node_id, agent_url}`; coordinator immediately polls for GPU info
- Coordinator lifespan starts/stops the `AgentSupervisor` heartbeat loop (it was never started before, which was the root cause of the empty dashboard)
- `cf-orch start` pre-registers the local agent so its GPUs appear immediately without a separate `cf-orch agent` call
- `cf-orch agent` fires a registration POST to the coordinator in a daemon thread after a 2s delay, then blocks on `uvicorn.run()`
- `--advertise-host` flag on `cf-orch agent` for NATted/multi-homed nodes (e.g. Navi behind a VPN)

Test plan
- `conda run -n cf pytest tests/ -q`: 94/94 pass
- `cf-orch start --node-id heimdall` → `GET /api/nodes` returns heimdall with both RTX 4000s
- http://10.1.10.71:7700/
- `cf-orch agent --coordinator http://10.1.10.71:7700 --node-id navi --advertise-host 10.1.10.10`

coordinator/app.py:
- Add POST /api/nodes: agents POST {node_id, agent_url} to self-register; coordinator immediately polls the new agent for GPU info
- Add lifespan context manager that starts/stops AgentSupervisor heartbeat loop (previously the loop was never started)

cli.py start:
- Add --node-id flag (default 'local')
- Pre-register the local agent URL (http://127.0.0.1:{agent_port}) so the heartbeat loop can poll it immediately on startup
- Drop redundant lease_manager.register_gpu() call; supervisor.poll_agent() now does this via the heartbeat after the agent responds

cli.py agent:
- Add --advertise-host flag for NATted/multi-homed nodes
- Fire registration POST to coordinator in a daemon thread (2s delay) so uvicorn.run() can start binding immediately; no double uvicorn.run()

Pull request closed
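For reviewers who want a picture of the coordinator-side change, here is a minimal stdlib-only sketch of the idea: registering a node triggers an immediate poll, and a lifespan-style context manager starts and stops the heartbeat loop. It is not the code in `coordinator/app.py`. `NodeRegistry`, `PollFn`, the `interval` parameter, and the shape of the polled GPU info are hypothetical names chosen for illustration; the real `AgentSupervisor` may be structured quite differently.

```python
import asyncio
import contextlib
from typing import Awaitable, Callable, Dict

# Hypothetical signature: given an agent URL, return that agent's GPU info.
PollFn = Callable[[str], Awaitable[dict]]


class NodeRegistry:
    """Illustrative stand-in for a supervisor that tracks agents and their GPUs."""

    def __init__(self, poll: PollFn, interval: float = 10.0) -> None:
        self._poll = poll
        self._interval = interval          # seconds between heartbeat rounds (assumed)
        self.agents: Dict[str, str] = {}   # node_id -> agent_url
        self.gpu_info: Dict[str, dict] = {}

    async def register(self, node_id: str, agent_url: str) -> None:
        # Record the agent, then poll it right away so its GPUs show up
        # without waiting for the next heartbeat round.
        self.agents[node_id] = agent_url
        await self.poll_agent(node_id)

    async def poll_agent(self, node_id: str) -> None:
        try:
            self.gpu_info[node_id] = await self._poll(self.agents[node_id])
        except Exception:
            # Agent unreachable: drop stale info and retry on the next round.
            self.gpu_info.pop(node_id, None)

    async def heartbeat_loop(self) -> None:
        while True:
            for node_id in list(self.agents):
                await self.poll_agent(node_id)
            await asyncio.sleep(self._interval)


@contextlib.asynccontextmanager
async def lifespan(registry: NodeRegistry):
    # Start the loop on entry; cancel it and wait for it to finish on exit.
    task = asyncio.create_task(registry.heartbeat_loop())
    try:
        yield registry
    finally:
        task.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await task
```

The point of tying the loop to a lifespan is that it cannot be silently skipped: if the application starts, the heartbeat starts, and pre-registered agents (such as the local one) are picked up on the first round.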
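The agent-side behaviour (delayed registration from a daemon thread, with an optional advertised host) could be sketched roughly as below. This is an assumption-laden illustration, not the contents of `cli.py`: `build_agent_url`, `post_registration`, `register_in_background`, the injectable `post` parameter, and the agent port 7701 used in the usage note are all hypothetical. Only the `/api/nodes` path, the `{node_id, agent_url}` payload, and the 2s delay come from the description above.

```python
import json
import threading
import time
import urllib.request
from typing import Callable, Optional


def build_agent_url(bind_host: str, port: int,
                    advertise_host: Optional[str] = None) -> str:
    """URL the coordinator should use to reach this agent.

    On NATted or multi-homed nodes the bind address is often not reachable
    from the coordinator, so an explicitly advertised host takes precedence.
    """
    return f"http://{advertise_host or bind_host}:{port}"


def post_registration(coordinator_url: str, node_id: str, agent_url: str) -> int:
    """POST {node_id, agent_url} to the coordinator; return the HTTP status."""
    body = json.dumps({"node_id": node_id, "agent_url": agent_url}).encode()
    req = urllib.request.Request(
        coordinator_url.rstrip("/") + "/api/nodes",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


def register_in_background(
    coordinator_url: str,
    node_id: str,
    agent_url: str,
    delay: float = 2.0,
    post: Callable[[str, str, str], int] = post_registration,
) -> threading.Thread:
    """Register after `delay` seconds without blocking the caller."""

    def _worker() -> None:
        # Give the agent's own HTTP server time to start listening, since the
        # coordinator polls the agent as soon as it is registered.
        time.sleep(delay)
        try:
            post(coordinator_url, node_id, agent_url)
        except OSError as exc:  # urllib's URLError/HTTPError derive from OSError
            print(f"registration with {coordinator_url} failed: {exc}")

    thread = threading.Thread(target=_worker, daemon=True)
    thread.start()
    return thread
```

In a CLI entry point, the call order would be `register_in_background(...)` first and the blocking server call (here, `uvicorn.run()`) second, for example with `build_agent_url("0.0.0.0", 7701, advertise_host="10.1.10.10")` as the advertised URL. Because the thread is a daemon, it never keeps the process alive if the server exits early.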