Commit graph

31 commits

Author SHA1 Message Date
c027fe6137 fix(core): SQLite timeout=30, INSERT OR IGNORE migrations, parameterize tier unlockables
- get_connection(): add timeout=30 to both sqlite3 and pysqlcipher3 paths so
  concurrent writers retry instead of immediately raising OperationalError
- run_migrations(): INSERT OR IGNORE so two Store() calls racing on first boot
  don't hit a UNIQUE constraint on the migrations table
- can_use() / tier_label(): accept _byok_unlockable and _local_vision_unlockable
  overrides so products pass their own frozensets rather than sharing module-level
  constants (required for circuitforge-core to serve multiple products cleanly)
2026-03-31 10:37:51 -07:00
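The two database changes in this commit can be illustrated with a minimal sketch using only the standard-library sqlite3 module. The table layout (a single `name` primary-key column) and the helper `applied()` are assumptions made for illustration; the actual circuitforge-core schema and the pysqlcipher3 path are not shown in the log.

```python
import sqlite3


def get_connection(path: str) -> sqlite3.Connection:
    # timeout=30: wait up to 30 seconds for a lock held by another
    # connection before sqlite3 raises OperationalError.
    return sqlite3.connect(path, timeout=30)


def run_migrations(conn: sqlite3.Connection, names: list[str]) -> None:
    # Hypothetical schema: the PRIMARY KEY provides the uniqueness
    # constraint that a plain INSERT would trip over on a second run.
    conn.execute("CREATE TABLE IF NOT EXISTS migrations (name TEXT PRIMARY KEY)")
    for name in names:
        # INSERT OR IGNORE silently skips rows that already exist, so two
        # callers recording the same migration do not raise IntegrityError.
        conn.execute("INSERT OR IGNORE INTO migrations (name) VALUES (?)", (name,))
    conn.commit()


def applied(conn: sqlite3.Connection) -> list[str]:
    # Illustrative helper: list recorded migrations in name order.
    return [row[0] for row in conn.execute("SELECT name FROM migrations ORDER BY name")]
```

Running `run_migrations()` from two connections against the same file records each migration once; the second call becomes a no-op rather than an error.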
22bad8590a fix(tasks): fix VRAM accounting race, lock scope, type annotations
- C1: Remove _reserved_vram decrement from _scheduler_loop reaper; sole
  responsibility now belongs to _batch_worker's finally block, eliminating
  the double-decrement race that could drive _reserved_vram negative.
- C2: Move TaskScheduler construction (including VRAM detection httpx call)
  outside _scheduler_lock in get_scheduler(); lock is now only held for the
  final singleton assignment, preventing 2s lock contention on first call.
- I1: Add RunTaskFn type alias (Callable[...]) and use it in __init__ and
  get_scheduler() instead of bare Callable.
- I2: Replace namedtuple TaskSpec with typed NamedTuple class.
- I3: Parameterize _queues annotation as dict[str, deque[TaskSpec]].
- I4: Wrap _queues read in start() with self._lock.
- I5: Replace time.sleep() ordering assertion in test_vram_budget_blocks_second_type
  with event-based synchronization using type_a_started/type_b_started events.
- M2: Use sqlite3.connect() as context manager in _load_queued_tasks.
- M3: Strengthen weak assertion in test_enqueue_returns_false_when_queue_full.
- M4: Add test_reserved_vram_zero_after_task_completes to catch C1 regression.
2026-03-31 09:15:09 -07:00
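Items C1 and C2 above can be sketched roughly as follows. This is not the project's code: the class body, `run_batch()`, and the `detect_vram` parameter are hypothetical stand-ins that show only the two patterns (a single owner for the VRAM release, and slow construction kept outside the singleton lock).

```python
import threading
from typing import Callable, Optional


class TaskScheduler:
    def __init__(self, available_vram_gb: float) -> None:
        self._lock = threading.Lock()
        self._available_vram = available_vram_gb
        self._reserved_vram = 0.0

    def run_batch(self, vram_gb: float, fn: Callable[[], None]) -> None:
        with self._lock:
            self._reserved_vram += vram_gb
        try:
            fn()
        finally:
            # C1: this finally block is the only place the reservation is
            # released, so it cannot be decremented twice.
            with self._lock:
                self._reserved_vram -= vram_gb


_scheduler: Optional[TaskScheduler] = None
_scheduler_lock = threading.Lock()


def get_scheduler(detect_vram: Callable[[], float] = lambda: 8.0) -> TaskScheduler:
    global _scheduler
    if _scheduler is not None:
        return _scheduler
    # C2: the slow part (VRAM detection) runs without holding the lock.
    candidate = TaskScheduler(detect_vram())
    with _scheduler_lock:
        # The lock only guards the final assignment; if another thread won
        # the race, its instance is kept and the candidate is discarded.
        if _scheduler is None:
            _scheduler = candidate
        return _scheduler
```

One trade-off of this shape is that two racing first callers may both pay the construction cost, with one result thrown away; in exchange, no caller waits on the lock while detection is in progress.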
09a5087c72 test(tasks): add preflight fallback coverage to scheduler tests
Adds test_detect_vram_preflight_fallback to cover the spec path where
cf-orch is unreachable but scripts.preflight.get_gpus() succeeds,
verifying detect_available_vram_gb() returns the summed total VRAM.
Uses sys.modules injection to simulate the preflight module being present.
2026-03-30 23:15:19 -07:00
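The sys.modules injection technique mentioned in this commit might look like the sketch below. The helper names and the `vram_total_gb` key are assumptions for illustration; the real return shape of `scripts.preflight.get_gpus()` is not given in the log.

```python
import sys
import types
from typing import Optional
from unittest import mock


def detect_from_preflight() -> Optional[float]:
    # Hypothetical helper: sum total VRAM over the GPUs reported by the
    # preflight module, or return None when the module is not importable.
    try:
        from scripts.preflight import get_gpus
    except ImportError:
        return None
    return float(sum(gpu["vram_total_gb"] for gpu in get_gpus()))


def detect_with_fake_preflight(gpus: list[dict]) -> Optional[float]:
    # Build stand-in modules and register them under the names the import
    # statement will look up. patch.dict restores sys.modules on exit.
    package = types.ModuleType("scripts")
    fake = types.ModuleType("scripts.preflight")
    fake.get_gpus = lambda: gpus
    package.preflight = fake
    with mock.patch.dict(
        sys.modules, {"scripts": package, "scripts.preflight": fake}
    ):
        return detect_from_preflight()
```

Because the import system consults sys.modules before searching the filesystem, the test does not need a real `scripts` package on the path.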
5801928f8e feat(tasks): add shared VRAM-aware LLM task scheduler
Extract generic batch scheduler into circuitforge_core.tasks.scheduler
so any CircuitForge product can use it. Includes VRAM detection via
cf-orch coordinator (cooperative free-VRAM), preflight fallback, and
unlimited fallback; singleton API; full test coverage (12 tests).
2026-03-30 23:12:23 -07:00
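The three-stage detection order described here (cf-orch coordinator, then preflight, then unlimited) can be sketched as a chain of probes. The function name, the probe signature, and the use of infinity to represent "unlimited" are all assumptions; the log does not say how the scheduler encodes the unlimited case.

```python
import math
from typing import Callable, Iterable, Optional

# A probe returns available VRAM in GB, returns None when it has no answer,
# or raises when its backend is unreachable.
Probe = Callable[[], Optional[float]]


def detect_vram_with_fallbacks(probes: Iterable[Probe]) -> float:
    for probe in probes:
        try:
            value = probe()
        except Exception:
            # Treat any probe failure as "try the next source".
            continue
        if value is not None:
            return value
    # No probe answered: fall back to an unlimited budget (assumed encoding).
    return math.inf
```

A caller would pass the probes in priority order, for example a coordinator query first and a preflight query second, so that the first source able to answer wins.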
db4e3047fd fix(resources): address code review findings from final review
- eviction_engine: replace deprecated asyncio.get_event_loop() with
  get_running_loop() (Python 3.12 compatibility)
- eviction_engine: remove unused httpx import
- coordinator app: return 422 for unknown node_id instead of silently
  falling back to hardcoded localhost URL
- eviction_executor: guard against pid <= 0 to prevent accidental
  SIGTERM to process group
- pyproject.toml: move pytest-asyncio to [dev] extras, not [orch]
- profile_registry: document CPU profile exclusion from list_public()
2026-03-30 22:46:07 -07:00
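Two of the review fixes above lend themselves to a short sketch: the pid guard and the move to get_running_loop(). The function names below are hypothetical and the bodies are reduced to the essentials. The guard matters because, on POSIX, `os.kill(0, sig)` signals every process in the caller's process group and a negative pid addresses a group (or, for -1, every process the caller may signal).

```python
import asyncio
import os
import signal
from typing import Any, Callable


def terminate(pid: int) -> None:
    # Refuse pid <= 0 so a bad or defaulted value can never turn into a
    # signal aimed at a whole process group.
    if pid <= 0:
        raise ValueError(f"refusing to signal pid {pid}")
    os.kill(pid, signal.SIGTERM)


async def run_blocking(fn: Callable[..., Any], *args: Any) -> Any:
    # get_running_loop() returns the loop driving this coroutine and raises
    # RuntimeError if called with no loop running, unlike the deprecated
    # implicit behaviour of get_event_loop().
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, fn, *args)
```

For instance, `asyncio.run(run_blocking(sum, [1, 2, 3]))` evaluates the blocking call in the default executor and returns 6.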
d755e9ea2c test(resources): add integration tests for full lease/eviction cycle 2026-03-30 22:37:06 -07:00
1f296c0cdb feat(resources): add [orch] package extras, cf-orch entry point, Docker compose 2026-03-30 22:34:40 -07:00
5fb3a2b41e style(resources): apply Black formatting to cli.py 2026-03-30 22:33:38 -07:00
70017abd35 feat(resources): add cf-orch CLI with start, agent, status, install-service commands 2026-03-30 22:27:11 -07:00
dba49a47fe refactor(resources): rename cforch → cf-orch in FastAPI titles 2026-03-30 22:22:48 -07:00
4bcd297b18 feat(resources): add cforch-coordinator FastAPI app with lease/node/profile endpoints 2026-03-30 22:01:46 -07:00
cede761d82 feat(resources): add AgentSupervisor and EvictionEngine 2026-03-30 21:44:42 -07:00
7718911652 feat(resources): add cforch-agent FastAPI app with /health /gpu-info /evict 2026-03-30 20:51:08 -07:00
4a857d5339 feat(resources): add EvictionExecutor with SIGTERM/grace/SIGKILL sequence 2026-03-30 20:46:45 -07:00
a79fd10f45 fix(resources): patch subprocess at import site in gpu_monitor tests 2026-03-30 20:45:01 -07:00
3dcbe801f1 feat(resources): add GpuMonitor for nvidia-smi polling 2026-03-30 20:42:57 -07:00
6b239b76e3 fix(resources): rename lambda var; convert asyncio.run test to async 2026-03-30 20:41:03 -07:00
d60503f059 feat(resources): add LeaseManager with VRAM tracking and eviction candidate selection 2026-03-30 20:38:51 -07:00
cdd8072b32 fix(resources): move MagicMock import to module level in profile registry tests 2026-03-30 20:36:40 -07:00
0389f4f167 feat(resources): add ProfileRegistry with auto-detect and public profile loading 2026-03-30 20:34:16 -07:00
5429e3f595 feat(resources): add 24GB, 16GB, 4GB, CPU+32GB, CPU+16GB public profiles 2026-03-30 20:32:13 -07:00
bfc1f7b7b9 fix(resources): guard non-dict YAML in load_profile; remove unused FIXTURES constant 2026-03-30 20:30:30 -07:00
c6a58b6a37 feat(resources): add GPU profile schema and public 8GB/6GB/2GB profiles 2026-03-30 20:28:06 -07:00
b774afb6b0 fix(resources): add expires_at sentinel comment; move pytest import to module level 2026-03-30 20:25:58 -07:00
0888f0f16b feat(resources): add shared VRAMLease, GpuInfo, NodeInfo models 2026-03-30 20:21:37 -07:00
56042dffba feat: add wizard and pipeline stubs 2026-03-25 11:09:40 -07:00
e09622729c feat: add config module and vision router stub 2026-03-25 11:08:03 -07:00
ae4624158e feat: add LLM router (extracted from Peregrine) 2026-03-25 11:06:29 -07:00
97ee2c20b6 feat: add generalised tier system with BYOK and local vision unlocks 2026-03-25 11:04:55 -07:00
76506a390e feat: add db base connection and migration runner 2026-03-25 11:03:35 -07:00
c4c9b78b91 feat: scaffold circuitforge-core package 2026-03-25 11:02:26 -07:00