circuitforge-core

Author	SHA1	Message	Date
pyr0ball	a36f469d60	chore: CHANGELOG for v0.3.0	2026-04-02 18:56:49 -07:00
pyr0ball	1de5ec767c	Merge pull request 'feat: hardware detection, cf-docuvision service, documents ingestion pipeline' (#14) from feature/hardware-docuvision into main	2026-04-02 18:55:50 -07:00
pyr0ball	cd9864b5e8	feat: hardware detection, cf-docuvision service, documents ingestion pipeline Closes #5, #7, #8, #13 ## hardware module (closes #5) - HardwareSpec, LLMBackendConfig, LLMConfig dataclasses - VramTier ladder (CPU / 2 / 4 / 6 / 8 / 16 / 24 GB) with select_tier() - generate_profile() maps HardwareSpec → LLMConfig for llm.yaml generation - detect_hardware() with nvidia-smi / rocm-smi / system_profiler / cpu fallback - 31 tests across tiers, generator, and detect ## cf-docuvision service (closes #8) - FastAPI service wrapping ByteDance/Dolphin-v2 (Qwen2.5-VL backbone) - POST /extract: image_b64 or image_path + hint → ExtractResponse - Lazy model loading; JSON-structured output with plain-text fallback - ProcessSpec managed blocks added to all four GPU profiles (6/8/16/24 GB) - 14 tests ## documents module (closes #7) - StructuredDocument, Element, ParsedTable dataclasses (frozen, composable) - DocuvisionClient: thin HTTP client for cf-docuvision POST /extract - ingest(): primary cf-docuvision path → LLMRouter vision fallback → empty doc - CF_DOCUVISION_URL env var for URL override - 22 tests ## coordinator probe loop (closes #13) - _run_instance_probe_loop: starting → running on 200; starting → stopped on timeout - 4 async tests with CancelledError-based tick control	2026-04-02 18:53:25 -07:00
pyr0ball	482c430cdb	docs: add CHANGELOG for v0.1.0 and v0.2.0	2026-04-02 17:25:06 -07:00
pyr0ball	749e51ccca	Merge pull request 'feat(orch): health probe loop + VRAM pre-flight fix' (#12) from feature/orch-llm-server into main	2026-04-02 17:24:09 -07:00
pyr0ball	a7290c1240	feat(orch): background health probe loop — starting → running transition Coordinator now polls all 'starting' instances every 5 s via GET /health. On 200: state → running. After 300 s without a healthy response: state → stopped. Closes #10.	2026-04-02 17:18:16 -07:00
pyr0ball	bd132851ec	fix(orch): tighten VRAM pre-flight to require full max_mb free (not half) max_mb // 2 was too loose — Qwen2.5-3B needs ~5.9 GB on an 8 GB card but the threshold only required 3.25 GB free, allowing Ollama to hold 4.5 GB while a load attempt was still dispatched (causing OOM crash). - node_selector: can_fit = free_mb >= service_max_mb (was // 2) - coordinator /start: same threshold fix + updated error message - tests: two new node_selector tests pin the full-ceiling semantics; updated stale docstring in coordinator app test	2026-04-02 16:44:36 -07:00
pyr0ball	2d095f0090	fix(llm-server): handle transformers 5.x BatchEncoding; use dtype kwarg - apply_chat_template() returns BatchEncoding in transformers 5.x (not bare tensor); extract .input_ids explicitly with fallback for 4.x compat - Switch from deprecated torch_dtype= to dtype= in from_pretrained()	2026-04-02 16:36:07 -07:00
pyr0ball	c78341fc6f	feat(orch): replace Ouro/vllm-Docker with generic HF inference server; add ProcessSpec - Add circuitforge_core/resources/inference/llm_server.py: generic OpenAI-compatible FastAPI server for any HuggingFace causal LM (Phi-4-mini-instruct, Qwen2.5-3B-Instruct) - Add service_manager.py + service_probe.py: ProcessSpec start/stop/is_running support (Popen-based; socket probe confirms readiness before marking running) - Update all 4 public GPU profiles to use ProcessSpec→llm_server instead of Docker vllm: 6gb (max_mb 5500), 8gb (max_mb 6500), 16gb/24gb (max_mb 9000) - Model candidates: Phi-4-mini-instruct first (7.2GB), Qwen2.5-3B-Instruct fallback (5.8GB) - Remove ouro_server.py (Ouro incompatible with transformers 5.x; vllm Docker also incompatible) - Add 17 tests for ServiceManager ProcessSpec (start/stop/is_running/list/get_url)	2026-04-02 15:33:08 -07:00
pyr0ball	27999925cf	fix(orch): seed ServiceInstance on first allocate start	2026-04-02 14:22:55 -07:00
pyr0ball	c5e12b74f2	Merge pull request 'feat: auto service lifecycle — /allocate, NodeSelector, idle sweep, CFOrchClient' (#9) from feature/orch-auto-lifecycle into main	2026-04-02 14:11:36 -07:00
pyr0ball	e58c3aea23	fix: TTL sweep, immutability, service-scoped release, logger in orch alloc - ServiceRegistry: add sweep_expired_allocations() to remove stale TTL allocations and transition instances to idle; add get_allocation() helper - AgentSupervisor._run_idle_sweep: call sweep_expired_allocations() before idle-timeout check so crashed-caller leaks are cleaned up each sweep tick - schema._parse_managed: copy raw dict before extracting 'type' key instead of mutating caller's dict with pop() - app.release_allocation: validate allocation belongs to the given service path param before releasing; return 404 if mismatch - router._try_cf_orch_alloc: replace print() with logger.warning(); add module-level logger = logging.getLogger(__name__) - tests: add test_sweep_expired_allocations covering TTL expiry and idle state transition	2026-04-02 12:55:38 -07:00
pyr0ball	1a20b80a50	test: add VRAM pre-flight 503 test for ensure_service	2026-04-02 12:49:50 -07:00
pyr0ball	02806359af	feat: add Services table to coordinator dashboard	2026-04-02 12:47:27 -07:00
pyr0ball	a4ccaaf3e2	fix: address coordinator/idle-sweep quality issues from review - CRITICAL: idle sweep now calls mark_stopped() after successful HTTP stop, preventing repeated stop POSTs on every 3rd tick for the same instance - CRITICAL: active_allocations() now filters by gpu_id to avoid marking wrong instance idle on multi-GPU nodes when an allocation is released - CRITICAL: VRAM pre-flight guard in ensure_service was dead code — added the actual HTTPException(503) before the candidate loop - IMPORTANT: register() now updates agent_url on re-registration if it changed, so relocated agents are tracked correctly - IMPORTANT: updated test_service_registry.py callers of active_allocations() to pass the now-required gpu_id argument	2026-04-02 12:45:31 -07:00
pyr0ball	49ab9e4e88	feat: wire ServiceRegistry into coordinator allocate endpoints	2026-04-02 12:30:58 -07:00
pyr0ball	c299482e0d	feat: add idle sweep to AgentSupervisor	2026-04-02 12:30:28 -07:00
pyr0ball	1e168ac636	feat(profiles): add idle_stop_after_s field; set 600s for vllm slot Add idle_stop_after_s to ServiceProfile (default 0 = never stop). Set 600s (10 min) timeout on vllm slot in all single-GPU profiles. Backward compatible; non-vllm services inherit default 0 (no auto-stop).	2026-04-02 12:24:19 -07:00
pyr0ball	9754f522d9	feat(orch): add ServiceRegistry — allocation tracking + idle state machine	2026-04-02 12:22:46 -07:00
pyr0ball	17a24173f7	feat(llm): add cf_orch allocation support to LLMRouter backends	2026-04-02 12:19:17 -07:00
pyr0ball	f741e6a80b	fix(orch): hoist service-known check; capture resident_keys once in allocate	2026-04-02 11:45:48 -07:00
pyr0ball	defaf39883	feat(core): add CFOrchClient sync+async context manager Implements CFOrchClient with allocate() (sync contextmanager) and allocate_async() (async contextmanager) for cf-orch GPU resource allocation. Releases allocation on exit; ignores 404 on release; raises RuntimeError on non-2xx allocation response. Exports CFOrchClient and Allocation from circuitforge_core.resources. Note: async test uses unittest.mock rather than httpretty — httpretty only patches stdlib sockets and does not intercept httpx async (anyio) transport.	2026-04-02 11:44:35 -07:00
pyr0ball	8201f6b3e9	feat(orch): add /api/services/{service}/allocate with auto node selection	2026-04-02 11:25:38 -07:00
pyr0ball	52d2c5cf38	feat(orch): expose online_agents() and resident_keys() helpers	2026-04-02 11:22:29 -07:00
pyr0ball	d600fb6651	refactor(orch): hoist service_max_mb lookup; clarify warm-fallback comments	2026-04-02 11:21:20 -07:00
pyr0ball	13eb0c85f1	feat(orch): add NodeSelector — warm-first GPU scoring	2026-04-02 11:18:44 -07:00
pyr0ball	427182aae7	feat: cf-orch agent registration + VRAM lease wiring Merges feature/orch-agent-registration into main. - Agent self-registration and coordinator heartbeat loop - TaskScheduler acquires/releases cf-orch VRAM lease per batch worker - shutdown() now joins batch worker threads for clean teardown - 94 tests passing	2026-04-01 11:21:38 -07:00
pyr0ball	aa51794f45	fix(scheduler): join batch worker threads in shutdown() Previously shutdown() only joined the scheduler loop thread. Batch worker threads (which decrement _reserved_vram in their finally block) could still be running when shutdown returned, leaving stale VRAM accounting. Now snapshots active workers under lock and joins them all. Snapshot-then-join pattern avoids holding the lock across blocking join calls (which would deadlock since workers acquire the same lock on exit).	2026-04-01 11:21:30 -07:00
pyr0ball	6b8e421eb2	feat(scheduler): acquire/release cf-orch VRAM lease per batch worker Before running a batch of tasks, the scheduler now requests a VRAM lease from the cf-orch coordinator (POST /api/leases). The lease is held for the full batch and released in the finally block so it's always cleaned up even on error. Falls back gracefully if the coordinator is unreachable. Adds coordinator_url and service_name params to TaskScheduler.__init__ and get_scheduler() so callers can override the default localhost:7700.	2026-04-01 11:06:16 -07:00
pyr0ball	67701f0d29	feat(orch): agent self-registration and coordinator heartbeat loop coordinator/app.py: - Add POST /api/nodes — agents POST {node_id, agent_url} to self-register; coordinator immediately polls the new agent for GPU info - Add lifespan context manager that starts/stops AgentSupervisor heartbeat loop (previously the loop was never started) cli.py start: - Add --node-id flag (default 'local') - Pre-register the local agent URL (http://127.0.0.1:{agent_port}) so the heartbeat loop can poll it immediately on startup - Drop redundant lease_manager.register_gpu() call — supervisor.poll_agent() now does this via the heartbeat after the agent responds cli.py agent: - Add --advertise-host flag for NATted/multi-homed nodes - Fire registration POST to coordinator in a daemon thread (2s delay) so uvicorn.run() can start binding immediately; no double uvicorn.run()	2026-03-31 19:20:35 -07:00
pyr0ball	4596aad290	Merge pull request 'feat(dashboard): self-hosted coordinator dashboard at GET /' (#3) from feature/orch-dashboard into main	2026-03-31 18:59:46 -07:00
pyr0ball	7aa0ad7a51	feat(dashboard): add self-hosted coordinator dashboard at GET / - dashboard.html: node-centric layout — GPU cards with VRAM bars and sparklines, active leases table with TTL progress bars, service health pill, auto-refreshes every 5s via fetch() against the local JSON API - All dynamic content set via DOM textContent / createElementNS — no innerHTML with user-sourced strings - coordinator/app.py: serves dashboard.html at GET / (HTMLResponse, excluded from OpenAPI schema); HTML read at import time from package dir - test_dashboard_serves_html: verifies 200, content-type text/html, and key route markers present	2026-03-31 18:57:25 -07:00
pyr0ball	563b73ce85	Merge pull request 'feat(tasks): shared VRAM-aware LLM task scheduler' (#2) from feature/shared-task-scheduler into main	2026-03-31 10:45:21 -07:00
pyr0ball	99f4e95018	Merge pull request 'feat(resources): cf-orch GPU VRAM orchestration — Plan A core' (#1) from feature/cforch-core-orchestration into main	2026-03-31 10:43:52 -07:00
pyr0ball	c027fe6137	fix(core): SQLite timeout=30, INSERT OR IGNORE migrations, parameterize tier unlockables - get_connection(): add timeout=30 to both sqlite3 and pysqlcipher3 paths so concurrent writers retry instead of immediately raising OperationalError - run_migrations(): INSERT OR IGNORE so two Store() calls racing on first boot don't hit a UNIQUE constraint on the migrations table - can_use() / tier_label(): accept _byok_unlockable and _local_vision_unlockable overrides so products pass their own frozensets rather than sharing module-level constants (required for circuitforge-core to serve multiple products cleanly)	2026-03-31 10:37:51 -07:00
pyr0ball	22bad8590a	fix(tasks): fix VRAM accounting race, lock scope, type annotations - C1: Remove _reserved_vram decrement from _scheduler_loop reaper; sole responsibility now belongs to _batch_worker's finally block, eliminating the double-decrement race that could drive _reserved_vram negative. - C2: Move TaskScheduler construction (including VRAM detection httpx call) outside _scheduler_lock in get_scheduler(); lock is now only held for the final singleton assignment, preventing 2s lock contention on first call. - I1: Add RunTaskFn type alias (Callable[...]) and use it in __init__ and get_scheduler() instead of bare Callable. - I2: Replace namedtuple TaskSpec with typed NamedTuple class. - I3: Parameterize _queues annotation as dict[str, deque[TaskSpec]]. - I4: Wrap _queues read in start() with self._lock. - I5: Replace time.sleep() ordering assertion in test_vram_budget_blocks_second_type with event-based synchronization using type_a_started/type_b_started events. - M2: Use sqlite3.connect() as context manager in _load_queued_tasks. - M3: Strengthen weak assertion in test_enqueue_returns_false_when_queue_full. - M4: Add test_reserved_vram_zero_after_task_completes to catch C1 regression.	2026-03-31 09:15:09 -07:00
pyr0ball	09a5087c72	test(tasks): add preflight fallback coverage to scheduler tests Adds test_detect_vram_preflight_fallback to cover the spec path where cf-orch is unreachable but scripts.preflight.get_gpus() succeeds, verifying detect_available_vram_gb() returns the summed total VRAM. Uses sys.modules injection to simulate the preflight module being present.	2026-03-30 23:15:19 -07:00
pyr0ball	5801928f8e	feat(tasks): add shared VRAM-aware LLM task scheduler Extract generic batch scheduler into circuitforge_core.tasks.scheduler so any CircuitForge product can use it. Includes VRAM detection via cf-orch coordinator (cooperative free-VRAM), preflight fallback, and unlimited fallback; singleton API; full test coverage (12 tests).	2026-03-30 23:12:23 -07:00
pyr0ball	db4e3047fd	fix(resources): address code review findings from final review - eviction_engine: replace deprecated asyncio.get_event_loop() with get_running_loop() (Python 3.12 compatibility) - eviction_engine: remove unused httpx import - coordinator app: return 422 for unknown node_id instead of silently falling back to hardcoded localhost URL - eviction_executor: guard against pid <= 0 to prevent accidental SIGTERM to process group - pyproject.toml: move pytest-asyncio to [dev] extras, not [orch] - profile_registry: document CPU profile exclusion from list_public()	2026-03-30 22:46:07 -07:00
pyr0ball	d755e9ea2c	test(resources): add integration tests for full lease/eviction cycle	2026-03-30 22:37:06 -07:00
pyr0ball	1f296c0cdb	feat(resources): add [orch] package extras, cf-orch entry point, Docker compose	2026-03-30 22:34:40 -07:00
pyr0ball	5fb3a2b41e	style(resources): apply Black formatting to cli.py	2026-03-30 22:33:38 -07:00
pyr0ball	70017abd35	feat(resources): add cf-orch CLI with start, agent, status, install-service commands	2026-03-30 22:27:11 -07:00
pyr0ball	dba49a47fe	refactor(resources): rename cforch → cf-orch in FastAPI titles	2026-03-30 22:22:48 -07:00
pyr0ball	4bcd297b18	feat(resources): add cforch-coordinator FastAPI app with lease/node/profile endpoints	2026-03-30 22:01:46 -07:00
pyr0ball	cede761d82	feat(resources): add AgentSupervisor and EvictionEngine	2026-03-30 21:44:42 -07:00
pyr0ball	7718911652	feat(resources): add cforch-agent FastAPI app with /health /gpu-info /evict	2026-03-30 20:51:08 -07:00
pyr0ball	4a857d5339	feat(resources): add EvictionExecutor with SIGTERM/grace/SIGKILL sequence	2026-03-30 20:46:45 -07:00
pyr0ball	a79fd10f45	fix(resources): patch subprocess at import site in gpu_monitor tests	2026-03-30 20:45:01 -07:00
pyr0ball	3dcbe801f1	feat(resources): add GpuMonitor for nvidia-smi polling	2026-03-30 20:42:57 -07:00
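
Note on bd132851ec (VRAM pre-flight fix): a minimal sketch of why the half-ceiling check let the failing load through, using the figures quoted in that commit message. The function names `can_fit_half` and `can_fit_full` are hypothetical, and the 8192 MB card size is an assumption for an "8 GB card"; only the comparison `free_mb >= service_max_mb` (previously `// 2`) is taken from the log.

```python
def can_fit_half(free_mb: int, service_max_mb: int) -> bool:
    """Old behavior per the commit message: only half of max_mb had to be free."""
    return free_mb >= service_max_mb // 2


def can_fit_full(free_mb: int, service_max_mb: int) -> bool:
    """New behavior per the commit message: the full max_mb must be free."""
    return free_mb >= service_max_mb


# Figures from the log: the 8gb profile uses max_mb 6500 (c78341fc6f),
# and Ollama was holding 4.5 GB when the load was dispatched.
CARD_TOTAL_MB = 8192          # assumption: nominal size of an 8 GB card
OLLAMA_HELD_MB = 4608         # 4.5 GB expressed in MB
SERVICE_MAX_MB = 6500

free_mb = CARD_TOTAL_MB - OLLAMA_HELD_MB   # VRAM left for the new service

old_decision = can_fit_half(free_mb, SERVICE_MAX_MB)  # threshold is 3250 MB
new_decision = can_fit_full(free_mb, SERVICE_MAX_MB)  # threshold is 6500 MB
```

Under these assumptions the old check accepts the node (the load is dispatched and can run out of memory), while the new check rejects it before any load attempt.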


65 commits
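
Note on a7290c1240 (health probe loop): the state transitions described in that commit message can be sketched as a pure function. This is an illustration only, not the coordinator's code: `next_state` and its parameters are hypothetical names, and the real loop (`_run_instance_probe_loop`, per cd9864b5e8) polls GET /health every 5 s rather than being called by hand.

```python
from typing import Optional

PROBE_TIMEOUT_S = 300.0  # from the commit message: stop after 300 s unhealthy


def next_state(
    state: str,
    status_code: Optional[int],
    elapsed_s: float,
    timeout_s: float = PROBE_TIMEOUT_S,
) -> str:
    """Return the instance state after one probe tick.

    status_code is the HTTP status of GET /health, or None if the
    request failed. elapsed_s is the time since the instance entered
    'starting'.
    """
    if state != "starting":
        # Only 'starting' instances are probed; other states pass through.
        return state
    if status_code == 200:
        return "running"
    if elapsed_s >= timeout_s:
        return "stopped"
    return "starting"
```

For example, `next_state("starting", 200, 12.0)` yields `"running"`, while a failed probe at 300 s or later yields `"stopped"`.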