fix(orch): transition vllm instance state from starting to running after port probe #10
Reference: Circuit-Forge/circuitforge-core#10
Problem

After a successful /allocate, the coordinator seeds the service instance with state: starting. There is no background probe to flip it to running once llm_server is accepting connections.

Symptom: /api/services always shows state: starting even when the server is healthy at port 8000.

Suggested approach

Add a lifespan background task in the coordinator that polls starting instances every few seconds, probes their /health URL, and calls service_registry.upsert_instance(state='running') on success, or failed after a configurable timeout.

Alternative: agent emits a /services/{service}/ready callback to the coordinator once is_running() returns True.

Relevant files

circuitforge_core/resources/coordinator/app.py: upsert_instance seeds state
circuitforge_core/resources/agent/service_manager.py: is_running() socket probe
circuitforge_core/resources/agent/service_probe.py: probe utilities

Fixed in feature/orch-llm-server@a7290c1. _run_instance_probe_loop runs as a background asyncio task in coordinator lifespan. Polls all starting instances every 5 s via GET /health; transitions to running on 200, or stopped after 300 s timeout.
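
For readers who have not opened the commit, the state transition described in the fix can be sketched roughly as follows. This is a minimal illustration, not the code in a7290c1: the Instance dataclass, probe_once, run_probe_loop, and the injected check_health callable are hypothetical names, and the real loop works against the coordinator's service registry rather than a plain dict. The 5 s interval, the 300 s timeout, and the starting, running, and stopped states are taken from the note above. The health check is passed in as a callable so the sketch stays independent of any particular HTTP client.

```python
import asyncio
import time
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict

PROBE_INTERVAL_S = 5.0   # from the issue: poll every 5 s
PROBE_TIMEOUT_S = 300.0  # from the issue: give up after 300 s


@dataclass
class Instance:
    """Hypothetical stand-in for one service registry entry."""
    health_url: str
    state: str = "starting"
    started_at: float = field(default_factory=time.monotonic)


async def probe_once(
    instances: Dict[str, Instance],
    check_health: Callable[[str], Awaitable[bool]],
    now: float,
    timeout_s: float = PROBE_TIMEOUT_S,
) -> None:
    """Run one probe pass over every instance still in 'starting'."""
    for inst in instances.values():
        if inst.state != "starting":
            continue  # only instances that are still starting are probed
        try:
            healthy = await check_health(inst.health_url)
        except Exception:
            healthy = False  # a refused connection counts as "not ready yet"
        if healthy:
            inst.state = "running"
        elif now - inst.started_at >= timeout_s:
            inst.state = "stopped"


async def run_probe_loop(
    instances: Dict[str, Instance],
    check_health: Callable[[str], Awaitable[bool]],
    interval_s: float = PROBE_INTERVAL_S,
) -> None:
    """Background loop; cancel the task to stop it (e.g. on shutdown)."""
    while True:
        await probe_once(instances, check_health, now=time.monotonic())
        await asyncio.sleep(interval_s)
```

In a coordinator with an async lifespan hook, the loop would presumably be started with asyncio.create_task(run_probe_loop(...)) on startup and cancelled on shutdown. Keeping the single pass (probe_once) separate from the sleep loop makes the transition logic testable without waiting on real timers.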