circuitforge-core

Author	SHA1	Message	Date
pyr0ball	67701f0d29	feat(orch): agent self-registration and coordinator heartbeat loop coordinator/app.py: - Add POST /api/nodes — agents POST {node_id, agent_url} to self-register; coordinator immediately polls the new agent for GPU info - Add lifespan context manager that starts/stops AgentSupervisor heartbeat loop (previously the loop was never started) cli.py start: - Add --node-id flag (default 'local') - Pre-register the local agent URL (http://127.0.0.1:{agent_port}) so the heartbeat loop can poll it immediately on startup - Drop redundant lease_manager.register_gpu() call — supervisor.poll_agent() now does this via the heartbeat after the agent responds cli.py agent: - Add --advertise-host flag for NATted/multi-homed nodes - Fire registration POST to coordinator in a daemon thread (2s delay) so uvicorn.run() can start binding immediately; no double uvicorn.run()	2026-03-31 19:20:35 -07:00
pyr0ball	7aa0ad7a51	feat(dashboard): add self-hosted coordinator dashboard at GET / - dashboard.html: node-centric layout — GPU cards with VRAM bars and sparklines, active leases table with TTL progress bars, service health pill, auto-refreshes every 5s via fetch() against the local JSON API - All dynamic content set via DOM textContent / createElementNS — no innerHTML with user-sourced strings - coordinator/app.py: serves dashboard.html at GET / (HTMLResponse, excluded from OpenAPI schema); HTML read at import time from package dir - test_dashboard_serves_html: verifies 200, content-type text/html, and key route markers present	2026-03-31 18:57:25 -07:00
pyr0ball	c027fe6137	fix(core): SQLite timeout=30, INSERT OR IGNORE migrations, parameterize tier unlockables - get_connection(): add timeout=30 to both sqlite3 and pysqlcipher3 paths so concurrent writers retry instead of immediately raising OperationalError - run_migrations(): INSERT OR IGNORE so two Store() calls racing on first boot don't hit a UNIQUE constraint on the migrations table - can_use() / tier_label(): accept _byok_unlockable and _local_vision_unlockable overrides so products pass their own frozensets rather than sharing module-level constants (required for circuitforge-core to serve multiple products cleanly)	2026-03-31 10:37:51 -07:00
pyr0ball	22bad8590a	fix(tasks): fix VRAM accounting race, lock scope, type annotations - C1: Remove _reserved_vram decrement from _scheduler_loop reaper; sole responsibility now belongs to _batch_worker's finally block, eliminating the double-decrement race that could drive _reserved_vram negative. - C2: Move TaskScheduler construction (including VRAM detection httpx call) outside _scheduler_lock in get_scheduler(); lock is now only held for the final singleton assignment, preventing 2s lock contention on first call. - I1: Add RunTaskFn type alias (Callable[...]) and use it in __init__ and get_scheduler() instead of bare Callable. - I2: Replace namedtuple TaskSpec with typed NamedTuple class. - I3: Parameterize _queues annotation as dict[str, deque[TaskSpec]]. - I4: Wrap _queues read in start() with self._lock. - I5: Replace time.sleep() ordering assertion in test_vram_budget_blocks_second_type with event-based synchronization using type_a_started/type_b_started events. - M2: Use sqlite3.connect() as context manager in _load_queued_tasks. - M3: Strengthen weak assertion in test_enqueue_returns_false_when_queue_full. - M4: Add test_reserved_vram_zero_after_task_completes to catch C1 regression.	2026-03-31 09:15:09 -07:00
pyr0ball	5801928f8e	feat(tasks): add shared VRAM-aware LLM task scheduler Extract generic batch scheduler into circuitforge_core.tasks.scheduler so any CircuitForge product can use it. Includes VRAM detection via cf-orch coordinator (cooperative free-VRAM), preflight fallback, and unlimited fallback; singleton API; full test coverage (12 tests).	2026-03-30 23:12:23 -07:00
pyr0ball	db4e3047fd	fix(resources): address code review findings from final review - eviction_engine: replace deprecated asyncio.get_event_loop() with get_running_loop() (Python 3.12 compatibility) - eviction_engine: remove unused httpx import - coordinator app: return 422 for unknown node_id instead of silently falling back to hardcoded localhost URL - eviction_executor: guard against pid <= 0 to prevent accidental SIGTERM to process group - pyproject.toml: move pytest-asyncio to [dev] extras, not [orch] - profile_registry: document CPU profile exclusion from list_public()	2026-03-30 22:46:07 -07:00
pyr0ball	1f296c0cdb	feat(resources): add [orch] package extras, cf-orch entry point, Docker compose	2026-03-30 22:34:40 -07:00
pyr0ball	5fb3a2b41e	style(resources): apply Black formatting to cli.py	2026-03-30 22:33:38 -07:00
pyr0ball	70017abd35	feat(resources): add cf-orch CLI with start, agent, status, install-service commands	2026-03-30 22:27:11 -07:00
pyr0ball	dba49a47fe	refactor(resources): rename cforch → cf-orch in FastAPI titles	2026-03-30 22:22:48 -07:00
pyr0ball	4bcd297b18	feat(resources): add cforch-coordinator FastAPI app with lease/node/profile endpoints	2026-03-30 22:01:46 -07:00
pyr0ball	cede761d82	feat(resources): add AgentSupervisor and EvictionEngine	2026-03-30 21:44:42 -07:00
pyr0ball	7718911652	feat(resources): add cforch-agent FastAPI app with /health /gpu-info /evict	2026-03-30 20:51:08 -07:00
pyr0ball	4a857d5339	feat(resources): add EvictionExecutor with SIGTERM/grace/SIGKILL sequence	2026-03-30 20:46:45 -07:00
pyr0ball	3dcbe801f1	feat(resources): add GpuMonitor for nvidia-smi polling	2026-03-30 20:42:57 -07:00
pyr0ball	6b239b76e3	fix(resources): rename lambda var; convert asyncio.run test to async	2026-03-30 20:41:03 -07:00
pyr0ball	d60503f059	feat(resources): add LeaseManager with VRAM tracking and eviction candidate selection	2026-03-30 20:38:51 -07:00
pyr0ball	0389f4f167	feat(resources): add ProfileRegistry with auto-detect and public profile loading	2026-03-30 20:34:16 -07:00
pyr0ball	5429e3f595	feat(resources): add 24GB, 16GB, 4GB, CPU+32GB, CPU+16GB public profiles	2026-03-30 20:32:13 -07:00
pyr0ball	bfc1f7b7b9	fix(resources): guard non-dict YAML in load_profile; remove unused FIXTURES constant	2026-03-30 20:30:30 -07:00
pyr0ball	c6a58b6a37	feat(resources): add GPU profile schema and public 8GB/6GB/2GB profiles	2026-03-30 20:28:06 -07:00
pyr0ball	b774afb6b0	fix(resources): add expires_at sentinel comment; move pytest import to module level	2026-03-30 20:25:58 -07:00
pyr0ball	0888f0f16b	feat(resources): add shared VRAMLease, GpuInfo, NodeInfo models	2026-03-30 20:21:37 -07:00
pyr0ball	56042dffba	feat: add wizard and pipeline stubs	2026-03-25 11:09:40 -07:00
pyr0ball	e09622729c	feat: add config module and vision router stub	2026-03-25 11:08:03 -07:00
pyr0ball	ae4624158e	feat: add LLM router (extracted from Peregrine)	2026-03-25 11:06:29 -07:00
pyr0ball	97ee2c20b6	feat: add generalised tier system with BYOK and local vision unlocks	2026-03-25 11:04:55 -07:00
pyr0ball	76506a390e	feat: add db base connection and migration runner	2026-03-25 11:03:35 -07:00
pyr0ball	c4c9b78b91	feat: scaffold circuitforge-core package	2026-03-25 11:02:26 -07:00

29 commits
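Commit c027fe6137 pairs two SQLite fixes: a 30-second busy timeout on every connection, and `INSERT OR IGNORE` when recording applied migrations. The log does not show the real `get_connection()` / `run_migrations()` code. What follows is only a minimal sketch of that pattern using the standard `sqlite3` module. Three parts of it are hypothetical: the `migrations(name)` table, the ordered `(name, sql)` migration list, and the assumption that each migration is idempotent.

```python
import sqlite3


def get_connection(path: str) -> sqlite3.Connection:
    # timeout=30: if another connection holds the write lock, keep retrying
    # for up to 30 seconds before raising sqlite3.OperationalError.
    return sqlite3.connect(path, timeout=30)


def run_migrations(conn: sqlite3.Connection, migrations: list[tuple[str, str]]) -> list[str]:
    """Apply pending migrations in order; return the names applied by this call.

    Hypothetical schema: migrations(name TEXT PRIMARY KEY). Each migration's
    SQL is assumed idempotent (CREATE ... IF NOT EXISTS), because two racing
    processes may both run it before either one records it.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS migrations (name TEXT PRIMARY KEY)")
    conn.commit()
    done = {row[0] for row in conn.execute("SELECT name FROM migrations")}
    applied: list[str] = []
    for name, sql in migrations:
        if name in done:
            continue
        conn.executescript(sql)
        # OR IGNORE: if a racing process recorded this name after our SELECT,
        # the insert becomes a no-op instead of a UNIQUE constraint error.
        conn.execute("INSERT OR IGNORE INTO migrations (name) VALUES (?)", (name,))
        conn.commit()
        applied.append(name)
    return applied
```

The busy timeout covers writers that collide on the database lock. `OR IGNORE` covers a narrower case: two `Store()` instances both see a migration as pending on first boot and both try to record it.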
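Commit 22bad8590a (item C1) fixes a double-decrement of `_reserved_vram`. After the fix, the batch worker's `finally` block is the only place the reservation is released, and the reaper loop no longer touches it. The sketch below illustrates that single-owner rule. `VramBudget`, `try_reserve()`, `release()` and `batch_worker()` are hypothetical names, not the actual `circuitforge_core.tasks.scheduler` API. For brevity the sketch also takes the reservation inside the worker.

```python
import threading
from typing import Callable, Iterable


class VramBudget:
    """Hypothetical stand-in for the scheduler's reserved-VRAM counter."""

    def __init__(self, total_mb: int) -> None:
        self.total_mb = total_mb
        self._reserved_mb = 0
        self._lock = threading.Lock()

    def try_reserve(self, mb: int) -> bool:
        # Reserve only if the request fits in the remaining budget.
        with self._lock:
            if self._reserved_mb + mb > self.total_mb:
                return False
            self._reserved_mb += mb
            return True

    def release(self, mb: int) -> None:
        with self._lock:
            self._reserved_mb -= mb

    @property
    def reserved_mb(self) -> int:
        with self._lock:
            return self._reserved_mb


def batch_worker(budget: VramBudget, mb: int, tasks: Iterable[object],
                 run_task: Callable[[object], None]) -> bool:
    if not budget.try_reserve(mb):
        return False  # not enough VRAM; caller may retry later
    try:
        for task in tasks:
            run_task(task)
    finally:
        # Sole release point: runs once whether the batch succeeds or raises,
        # so no other loop can decrement the same reservation a second time.
        budget.release(mb)
    return True
```

Whether a batch finishes or raises, its VRAM is returned exactly once. This is the property the commit's M4 regression test (`test_reserved_vram_zero_after_task_completes`) is described as guarding.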