# Changelog

All notable changes to `circuitforge-core` are documented here.
The format follows Keep a Changelog; versions follow Semantic Versioning.

## [0.4.0] — 2026-04-02

### Added

**Agent watchdog — coordinator-restart reconnect** (closes #15)

- `NodeStore`: SQLite persistence for known agent nodes (`~/.local/share/circuitforge/cf-orch-nodes.db`); `upsert` on every registration, `prune_stale` removes nodes unseen for 30+ days
- `AgentSupervisor.restore_from_store()`: reloads all previously-known nodes on coordinator startup; nodes start with `offline=False` and come online within one heartbeat cycle (~10 s) without touching the agent processes
- `AgentSupervisor.register()` now persists to `NodeStore` on every call
- Agent CLI: one-shot registration replaced with a persistent 30 s reconnect loop (daemon thread); coordinator restart → remote nodes (Navi, Strahl, etc.) reappear automatically with no manual intervention
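
The reconnect loop above can be sketched roughly as follows. This is a minimal illustration, not the real agent CLI: `run_reconnect_loop` and its `register` callable are hypothetical names standing in for the agent's registration path.

```python
import threading

def run_reconnect_loop(register, interval_s=30.0, stop_event=None):
    """Re-attempt coordinator registration every `interval_s` seconds.

    `register` is a stand-in for one registration attempt; failures are
    swallowed so the loop survives a coordinator restart and simply
    re-registers on the next tick.
    """
    stop_event = stop_event or threading.Event()
    while not stop_event.is_set():
        try:
            register()
        except Exception:
            pass  # coordinator unreachable: retry on the next tick
        stop_event.wait(interval_s)

# In the agent this would run as a daemon thread, e.g.:
# threading.Thread(target=run_reconnect_loop, args=(register,), daemon=True).start()
```

Running it as a daemon thread means the loop never blocks agent shutdown, which matches the "no manual intervention" behaviour described above.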

**Ollama adopt-if-running + configurable health path** (closes #16)

- `ProcessSpec.adopt` (bool, default `False`): when `True`, `ServiceManager.start()` probes the health endpoint first and claims the already-running process rather than spawning a new one — designed for system daemons like Ollama
- `ProcessSpec.health_path` (str, default `"/health"`): configurable health probe path; Ollama uses `/api/tags`
- `ServiceManager._probe_health()`: shared urllib health check used by both `start()` and `is_running()` for adopt services
- Agent `/services/{service}/start` response includes `adopted: true` when the service was claimed rather than started; coordinator sets instance state to `running` immediately (skips the probe-loop wait)
- `ServiceInstance.health_path` field; `upsert_instance(health_path=)` kwarg
- Coordinator probe loop uses `inst.health_path` instead of hardcoded `/health`; `_get_health_path()` helper looks up the ProcessSpec health path from the profile registry
- All GPU profiles (2/4/6/8/16/24 GB + cpu-16/32 GB): `ollama` service now has a `managed:` block with `adopt: true`, `health_path: /api/tags`, port 11434
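
A rough sketch of the adopt-if-running decision, under stated assumptions: `probe_health` and `start_action` are illustrative names, not the real `ServiceManager` API, and the real probe lives inside `start()`.

```python
from urllib.error import URLError
from urllib.request import urlopen

def probe_health(base_url, health_path="/health", timeout_s=2.0):
    """GET base_url + health_path; True only on an HTTP 2xx answer."""
    try:
        with urlopen(base_url.rstrip("/") + health_path, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except (URLError, OSError):
        return False  # refused, timed out, or no such host

def start_action(adopt, already_healthy):
    """What start() should do for a given spec (illustrative)."""
    if adopt and already_healthy:
        return "adopted"  # claim the live daemon; reported as adopted: true
    return "spawn"        # normal path: launch a new process
```

For an Ollama-style spec this would be called with `health_path="/api/tags"` against `http://127.0.0.1:11434` before any spawn attempt.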
## [0.3.0] — 2026-04-02

### Added

**Hardware module (`circuitforge_core.hardware`)** — closes #5

- `detect_hardware()`: probes nvidia-smi / rocm-smi / Apple system_profiler / CPU fallback → `HardwareSpec`
- `select_tier(vram_mb)`: maps physical VRAM to a named `VramTier` (CPU / 2 / 4 / 6 / 8 / 16 / 24 GB)
- `generate_profile(spec)`: converts a `HardwareSpec` + service URLs → `LLMConfig` (llm.yaml-compatible)
- `HardwareSpec`, `LLMBackendConfig`, `LLMConfig` dataclasses
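
The tier mapping can be sketched like this; the thresholds and tier names below are illustrative assumptions, the real mapping is the `VramTier` enum in `circuitforge_core.hardware`.

```python
def select_tier(vram_mb):
    """Map physical VRAM (MB) to the largest tier it fully covers.

    Falls through the tiers from largest to smallest and returns the
    first one the card satisfies; anything below the smallest GPU tier
    lands on the CPU tier.
    """
    tiers = [(24576, "24gb"), (16384, "16gb"), (8192, "8gb"),
             (6144, "6gb"), (4096, "4gb"), (2048, "2gb")]
    for threshold, name in tiers:
        if vram_mb >= threshold:
            return name
    return "cpu"
```

Note the mapping is conservative: a 12 GB card gets the 8 GB tier, never a tier it cannot fully satisfy.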

**cf-docuvision service (`circuitforge_core.resources.docuvision`)** — closes #8

- FastAPI HTTP service wrapping ByteDance/Dolphin-v2 (Qwen2.5-VL backbone, ~8 GB VRAM)
- `POST /extract`: accepts `image_b64` or `image_path` + `hint` (auto / table / text / form) → `ExtractResponse`
- Lazy model loading — the model stays unloaded until the first request
- JSON-structured output with 21 element types; plain-text fallback when model returns unstructured output
- `ProcessSpec` managed blocks wired into all four GPU profiles (6 / 8 / 16 / 24 GB)
- `--gpu-id` flag respected via `CUDA_VISIBLE_DEVICES`

**Documents module (`circuitforge_core.documents`)** — closes #7

- `ingest(image_bytes, hint) → StructuredDocument` — single call for all consumers
- Primary path: cf-docuvision HTTP service; automatic fallback to `LLMRouter` vision; graceful empty doc on total failure
- `StructuredDocument`, `Element`, `ParsedTable` frozen dataclasses with `.headings` / `.paragraphs` convenience properties
- `CF_DOCUVISION_URL` env var for service URL override
- `DocuvisionClient`: reusable HTTP client for cf-docuvision with `is_healthy()` probe
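
The fallback chain can be sketched as below. This is a simplified model: the real `ingest()` builds its clients internally, whereas here `docuvision`, `router_vision`, and `empty_doc` are hypothetical parameters standing in for them.

```python
def ingest(image_bytes, hint, docuvision, router_vision, empty_doc):
    """Fallback chain: cf-docuvision → LLMRouter vision → empty doc.

    Each backend is tried in order; any exception moves on to the next,
    and total failure yields the graceful empty document rather than
    raising to the caller.
    """
    for backend in (docuvision, router_vision):
        try:
            return backend(image_bytes, hint)
        except Exception:
            continue  # backend down or failed: try the next one
    return empty_doc
```

The design gives every consumer one call site with no error handling of its own — degraded extraction quality, never a crash.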

**Coordinator probe loop tests** — closes #13

- 4 async tests for `_run_instance_probe_loop`: healthy transition, timeout eviction, state cleanup, no-URL guard
## [0.2.0] — 2026-04-02

### Added

**Orchestrator — auto service lifecycle**

- `ServiceRegistry`: in-memory allocation tracker with state machine (starting → running → idle → stopped)
- `NodeSelector`: warm-first GPU scoring — prefers nodes already running the requested model, falls back to highest free VRAM
- `/api/services/{service}/allocate` coordinator endpoint: auto-selects the best node, starts the llm_server process via the agent, returns the URL
- `CFOrchClient`: sync + async context managers for coordinator allocation/release
- Idle sweep in `AgentSupervisor`: stops instances that have been idle longer than `idle_stop_after_s` (default 600 s for the vllm slot)
- Background health probe loop: coordinator polls all `starting` instances every 5 s via `GET /health`; promotes to `running` on 200, marks `stopped` after a 300 s timeout (closes #10)
- Services table in the coordinator dashboard HTML
- `idle_stop_after_s` field in service profiles
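
One step of the probe loop's state transitions can be modelled as below; `probe_tick` and the instance-dict shape are illustrative assumptions, not the real `_run_instance_probe_loop` internals.

```python
def probe_tick(inst, healthy, now, timeout_s=300.0):
    """One coordinator probe step for a single instance (illustrative).

    A `starting` instance is promoted to `running` on a healthy answer,
    and marked `stopped` once it has been starting longer than
    `timeout_s` without one. Other states pass through untouched.
    """
    if inst["state"] == "starting":
        if healthy:
            inst["state"] = "running"
        elif now - inst["started_at"] > timeout_s:
            inst["state"] = "stopped"
    return inst["state"]
```

In the real loop this check runs every 5 s against `GET /health`, with `timeout_s=300` matching the eviction window above.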

**LLM Router**

- cf-orch allocation support in `LLMRouter` backends
- VRAM lease acquisition/release wired through scheduler batch workers

**Scheduler**

- cf-orch VRAM lease per batch worker — prevents over-subscription
- `join()` on batch worker threads during shutdown

**HF inference server (`llm_server.py`)**

- Generic HuggingFace `transformers` inference server replacing Ouro/vllm-Docker-specific code
- `ProcessSpec` wiring in agent `service_manager.py`
- Handles the transformers 5.x `BatchEncoding` return type from `apply_chat_template`
- Uses the `dtype=` kwarg (replaces the deprecated `torch_dtype=`)

### Fixed

- VRAM pre-flight threshold tightened: coordinator and `NodeSelector` now require the full `service_max_mb` free (was `max_mb // 2`), preventing instances from starting on GPUs with insufficient headroom (closes #11 / related)
- `ServiceInstance` now seeded correctly on the first `/allocate` call
- TTL sweep, immutability, and service-scoped release correctness in the allocation path
- Coordinator logger added for allocation path visibility
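
The tightened pre-flight rule amounts to the following sketch; `pick_gpu` and its list-of-readings input are illustrative, not the real `NodeSelector` API.

```python
def pick_gpu(free_mb_per_gpu, service_max_mb):
    """Pre-flight: only GPUs with the full service budget free qualify.

    `free_mb_per_gpu` holds one free-VRAM reading per GPU; returns the
    index of the best candidate, or None when no GPU has enough headroom
    (refusing to start beats over-subscribing).
    """
    eligible = [(free, i) for i, free in enumerate(free_mb_per_gpu)
                if free >= service_max_mb]  # was: free >= service_max_mb // 2
    if not eligible:
        return None
    return max(eligible)[1]  # among eligible GPUs, most free VRAM wins
```

Under the old half-budget rule a GPU with 4.1 GB free could be handed an 8 GB service; under the tightened rule it is simply skipped.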

### Changed

- Removed Ouro/vllm-Docker specifics from llm_server — now a generic HF inference endpoint
## [0.1.0] — 2026-03-01

### Added

- Package scaffold (`circuitforge_core`)
- DB base connection and migration runner
- Generalised tier system with BYOK (bring your own key) and local-vision unlocks
- LLM router extracted from Peregrine (fallback chain, vision-aware, BYOK support)
- Config module and vision router stub
- cf-orch orchestrator: coordinator (port 7700) + agent (port 7701)
- Agent registration + VRAM lease wiring
- Coordinator dashboard (HTML)