feat(tasks): shared VRAM-aware LLM task scheduler #2

Merged

pyr0ball merged 4 commits from feature/shared-task-scheduler into main

2026-03-31 10:45:21 -07:00

pyr0ball commented

2026-03-31 10:42:22 -07:00

Owner

Summary

Adds circuitforge_core.tasks.scheduler — a generic, VRAM-aware background task scheduler extracted from Peregrine
Products supply task_types, vram_budgets, and a run_task_fn; the core handles threading, VRAM accounting, and queue depth
VRAM detection priority: cf-orch /api/nodes (lease-aware free VRAM) → scripts.preflight.get_gpus() total → 999.0 (unlimited fallback)
Double-checked locking: VRAM detection runs outside the singleton lock to avoid holding it across network I/O
_batch_worker is the sole authority for _reserved_vram accounting (in finally) — fixes double-decrement race vs _scheduler_loop reaper
Also fixes: SQLite timeout=30 on all connections, INSERT OR IGNORE in migrations, parameterized _byok_unlockable/_local_vision_unlockable in can_use()/tier_label()
13 tests covering VRAM detection paths, singleton, reset, startup loading, missing table, accounting correctness
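The division of labour above can be illustrated with a deliberately simplified sketch. It is not the `circuitforge_core.tasks.scheduler` code. `ToyScheduler`, `try_start()`, `max_queue_depth`, the explicit `available_vram_gb` argument and the `TaskSpec` fields are all hypothetical. What the sketch borrows from this PR is the overall shape:

- product-supplied `task_types`, `vram_budgets` and `run_task_fn`
- a `RunTaskFn` alias and a typed `TaskSpec` NamedTuple
- `enqueue()` returning `False` when a queue is full
- `_reserved_vram` released only in the worker's `finally` block

```python
import threading
from collections import deque
from typing import Callable, NamedTuple, Optional

# Hypothetical signature for the product-supplied runner: (task_type, task_id).
RunTaskFn = Callable[[str, int], None]


class TaskSpec(NamedTuple):
    task_id: int
    task_type: str


class ToyScheduler:
    """Illustrative only: VRAM-gated dispatch with per-type bounded queues."""

    def __init__(self, task_types: list[str], vram_budgets: dict[str, float],
                 run_task_fn: RunTaskFn, available_vram_gb: float,
                 max_queue_depth: int = 100) -> None:
        self._run_task_fn = run_task_fn
        self._budgets = dict(vram_budgets)       # GB each task type needs
        self._available = available_vram_gb
        self._max_depth = max_queue_depth
        self._lock = threading.Lock()
        self._reserved_vram = 0.0
        self._queues: dict[str, deque[TaskSpec]] = {t: deque() for t in task_types}

    def enqueue(self, task_id: int, task_type: str) -> bool:
        """Queue a task; return False if that type's queue is already full."""
        with self._lock:
            queue = self._queues[task_type]
            if len(queue) >= self._max_depth:
                return False
            queue.append(TaskSpec(task_id, task_type))
            return True

    def try_start(self, task_type: str) -> Optional[threading.Thread]:
        """Start the next task of this type if its VRAM budget still fits."""
        with self._lock:
            queue = self._queues[task_type]
            budget = self._budgets.get(task_type, 0.0)
            if not queue or self._reserved_vram + budget > self._available:
                return None
            spec = queue.popleft()
            self._reserved_vram += budget        # reserve before the thread starts
        worker = threading.Thread(target=self._worker, args=(spec, budget))
        worker.start()
        return worker

    def _worker(self, spec: TaskSpec, budget: float) -> None:
        try:
            self._run_task_fn(spec.task_type, spec.task_id)
        finally:
            # Sole place the reservation is released, so no other code path
            # can decrement it a second time.
            with self._lock:
                self._reserved_vram -= budget
```

Because the reservation is taken under the lock before the worker thread starts and is released only in `_worker`'s `finally`, every increment has exactly one matching decrement. As a result, `_reserved_vram` returns to zero even if `run_task_fn` raises.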

⚠️ Depends on #1 (cf-orch) — merge that first so detect_available_vram_gb() can reach the /api/nodes endpoint in production.

Test plan

conda run -n cf pytest tests/test_tasks/ -v — 13 tests pass
VRAM detection falls back gracefully when cf-orch is not running
Singleton resets cleanly between tests via reset_scheduler()
_reserved_vram returns to 0 after task completes


pyr0ball added 25 commits 2026-03-31 10:42:22 -07:00

feat(resources): add shared VRAMLease, GpuInfo, NodeInfo models 0888f0f16b

fix(resources): add expires_at sentinel comment; move pytest import to module level b774afb6b0

feat(resources): add GPU profile schema and public 8GB/6GB/2GB profiles c6a58b6a37

fix(resources): guard non-dict YAML in load_profile; remove unused FIXTURES constant bfc1f7b7b9

feat(resources): add 24GB, 16GB, 4GB, CPU+32GB, CPU+16GB public profiles 5429e3f595

feat(resources): add ProfileRegistry with auto-detect and public profile loading 0389f4f167

fix(resources): move MagicMock import to module level in profile registry tests cdd8072b32

feat(resources): add LeaseManager with VRAM tracking and eviction candidate selection d60503f059

fix(resources): rename lambda var; convert asyncio.run test to async 6b239b76e3

feat(resources): add GpuMonitor for nvidia-smi polling 3dcbe801f1

fix(resources): patch subprocess at import site in gpu_monitor tests a79fd10f45

feat(resources): add EvictionExecutor with SIGTERM/grace/SIGKILL sequence 4a857d5339

feat(resources): add cforch-agent FastAPI app with /health /gpu-info /evict 7718911652

feat(resources): add AgentSupervisor and EvictionEngine cede761d82

feat(resources): add cforch-coordinator FastAPI app with lease/node/profile endpoints 4bcd297b18

refactor(resources): rename cforch → cf-orch in FastAPI titles dba49a47fe

feat(resources): add cf-orch CLI with start, agent, status, install-service commands 70017abd35

style(resources): apply Black formatting to cli.py 5fb3a2b41e

feat(resources): add [orch] package extras, cf-orch entry point, Docker compose 1f296c0cdb

test(resources): add integration tests for full lease/eviction cycle d755e9ea2c

fix(resources): address code review findings from final review db4e3047fd

- eviction_engine: replace deprecated asyncio.get_event_loop() with
  get_running_loop() (Python 3.12 compatibility)
- eviction_engine: remove unused httpx import
- coordinator app: return 422 for unknown node_id instead of silently
  falling back to hardcoded localhost URL
- eviction_executor: guard against pid <= 0 to prevent accidental
  SIGTERM to process group
- pyproject.toml: move pytest-asyncio to [dev] extras, not [orch]
- profile_registry: document CPU profile exclusion from list_public()
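The eviction changes above combine three ideas:

- the SIGTERM/grace/SIGKILL sequence
- the `pid <= 0` guard
- `asyncio.get_running_loop()` in place of the deprecated `get_event_loop()`

The sketch below shows all three, assuming a POSIX host. `evict()`, `_pid_alive()` and the `is_alive` hook are hypothetical names, not the `eviction_executor` API. The guard matters because `kill(2)` treats `0` as the caller's own process group, `-1` as every process the caller may signal, and other negative values as a process-group ID.

```python
import asyncio
import os
import signal
from typing import Callable, Optional


def _pid_alive(pid: int) -> bool:
    """Probe with signal 0: nothing is delivered, only existence is checked."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


async def evict(pid: int, grace_s: float = 5.0, poll_s: float = 0.05,
                is_alive: Optional[Callable[[], bool]] = None) -> str:
    """SIGTERM, wait up to grace_s, then SIGKILL if the process is still alive."""
    if pid <= 0:
        # 0 and negative PIDs address process groups (or everything), never one process.
        raise ValueError(f"refusing to signal pid {pid}")
    alive = is_alive or (lambda: _pid_alive(pid))
    loop = asyncio.get_running_loop()  # only valid inside a running event loop
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return "already_gone"
    deadline = loop.time() + grace_s
    while loop.time() < deadline:
        if not alive():
            return "terminated"
        await asyncio.sleep(poll_s)  # yield to the loop instead of blocking it
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return "terminated"  # exited between the last poll and the kill
    return "killed"
```

A zombie child of the calling process still passes the signal-0 probe. When evicting your own children, pass an `is_alive` that reaps, for example `lambda: proc.poll() is None` for a `subprocess.Popen`.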

feat(tasks): add shared VRAM-aware LLM task scheduler 5801928f8e

Extract generic batch scheduler into circuitforge_core.tasks.scheduler
so any CircuitForge product can use it. Includes VRAM detection via
cf-orch coordinator (cooperative free-VRAM), preflight fallback, and
unlimited fallback; singleton API; full test coverage (12 tests).
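The detection order described above can be sketched as a chain of probes where the first one that succeeds wins. Only the order comes from this PR: cf-orch `/api/nodes`, then the `scripts.preflight.get_gpus()` total, then 999.0.

The following are assumptions for illustration:

- the JSON shape parsed by the hypothetical `_orch_free_vram_gb()` (node objects with `gpus` lists carrying `vram_free_mb`)
- the `vram_total_gb` key on preflight entries
- the `detect_vram_gb()` and `default_probes()` helpers themselves

```python
import json
import urllib.request
from typing import Callable

UNLIMITED_VRAM_GB = 999.0  # sentinel: no GPU information, do not throttle


def _orch_free_vram_gb(base_url: str, timeout_s: float = 2.0) -> float:
    # ASSUMED response shape: [{"gpus": [{"vram_free_mb": 4096}, ...]}, ...]
    with urllib.request.urlopen(f"{base_url}/api/nodes", timeout=timeout_s) as resp:
        nodes = json.load(resp)
    return sum(gpu["vram_free_mb"] for node in nodes for gpu in node["gpus"]) / 1024


def _preflight_total_vram_gb() -> float:
    from scripts.preflight import get_gpus  # optional product-side module
    # ASSUMED: each GPU entry exposes a "vram_total_gb" key.
    return sum(gpu["vram_total_gb"] for gpu in get_gpus())


def default_probes(orch_url: str) -> list[Callable[[], float]]:
    """Probes in priority order: lease-aware free VRAM, then total VRAM."""
    return [lambda: _orch_free_vram_gb(orch_url), _preflight_total_vram_gb]


def detect_vram_gb(probes: list[Callable[[], float]]) -> float:
    """Return the first probe result that succeeds, else the unlimited sentinel."""
    for probe in probes:
        try:
            return float(probe())
        except Exception:  # unreachable coordinator, missing module, bad payload
            continue
    return UNLIMITED_VRAM_GB
```

Keeping each source behind a zero-argument probe lets tests swap in fakes without network access, and it makes the priority order explicit in one list.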

test(tasks): add preflight fallback coverage to scheduler tests 09a5087c72

Adds test_detect_vram_preflight_fallback to cover the spec path where
cf-orch is unreachable but scripts.preflight.get_gpus() succeeds,
verifying detect_available_vram_gb() returns the summed total VRAM.
Uses sys.modules injection to simulate the preflight module being present.
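The `sys.modules` injection technique mentioned above can be sketched with `unittest.mock.patch.dict`. It restores `sys.modules` when the block exits, so the fake module does not leak into later tests.

Two parts of the sketch are hypothetical:

- the helper `preflight_total_vram_gb()`
- the fake `get_gpus()` return shape (`vram_total_gb` keys), since this PR does not show the real module's return type

```python
import sys
import types
from unittest import mock


def preflight_total_vram_gb() -> float:
    """Sum total VRAM from scripts.preflight, imported lazily at call time."""
    from scripts.preflight import get_gpus  # resolved via sys.modules first
    return sum(gpu["vram_total_gb"] for gpu in get_gpus())


def test_preflight_fallback_sums_total_vram() -> None:
    fake_pkg = types.ModuleType("scripts")
    fake_preflight = types.ModuleType("scripts.preflight")
    # Hypothetical return shape: one dict per detected GPU.
    fake_preflight.get_gpus = lambda: [{"vram_total_gb": 8.0}, {"vram_total_gb": 16.0}]
    fake_pkg.preflight = fake_preflight
    injected = {"scripts": fake_pkg, "scripts.preflight": fake_preflight}
    with mock.patch.dict(sys.modules, injected):
        assert preflight_total_vram_gb() == 24.0
```

The import must happen inside the function, not at module level, for the injection to take effect at test time.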

fix(tasks): fix VRAM accounting race, lock scope, type annotations 22bad8590a

- C1: Remove _reserved_vram decrement from _scheduler_loop reaper; sole
  responsibility now belongs to _batch_worker's finally block, eliminating
  the double-decrement race that could drive _reserved_vram negative.
- C2: Move TaskScheduler construction (including VRAM detection httpx call)
  outside _scheduler_lock in get_scheduler(); lock is now only held for the
  final singleton assignment, so concurrent first callers no longer block on the
  lock for up to 2s while VRAM detection runs.
- I1: Add RunTaskFn type alias (Callable[...]) and use it in __init__ and
  get_scheduler() instead of bare Callable.
- I2: Replace namedtuple TaskSpec with typed NamedTuple class.
- I3: Parameterize _queues annotation as dict[str, deque[TaskSpec]].
- I4: Wrap _queues read in start() with self._lock.
- I5: Replace time.sleep() ordering assertion in test_vram_budget_blocks_second_type
  with event-based synchronization using type_a_started/type_b_started events.
- M2: Use sqlite3.connect() as context manager in _load_queued_tasks.
- M3: Strengthen weak assertion in test_enqueue_returns_false_when_queue_full.
- M4: Add test_reserved_vram_zero_after_task_completes to catch C1 regression.
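The C2 change (build outside the lock, publish under it) can be sketched as follows. `get_scheduler()` and `reset_scheduler()` are names from this PR, but the signatures here, the `_Scheduler` stand-in and its `shutdown()` method are hypothetical.

Because construction, including a possibly slow VRAM probe, runs without the lock, two first callers may both build an instance. The second check under the lock keeps exactly one, and the loser is discarded.

```python
import threading
from typing import Callable, Optional


class _Scheduler:
    """Stand-in for TaskScheduler; construction may involve slow network I/O."""

    def __init__(self, vram_gb: float) -> None:
        self.vram_gb = vram_gb
        self.stopped = False

    def shutdown(self) -> None:
        self.stopped = True


_scheduler: Optional[_Scheduler] = None
_scheduler_lock = threading.Lock()


def get_scheduler(detect_vram: Callable[[], float] = lambda: 8.0) -> _Scheduler:
    global _scheduler
    existing = _scheduler  # fast path without the lock (reference reads are atomic in CPython)
    if existing is not None:
        return existing
    candidate = _Scheduler(detect_vram())  # slow work happens outside the lock
    with _scheduler_lock:
        if _scheduler is None:  # second check: another thread may have won
            _scheduler = candidate
            return candidate
        winner = _scheduler
    candidate.shutdown()  # discard the losing instance
    return winner


def reset_scheduler() -> None:
    """Test helper: drop the singleton so the next call rebuilds it."""
    global _scheduler
    with _scheduler_lock:
        old, _scheduler = _scheduler, None
    if old is not None:
        old.shutdown()
```

The cost of this design is occasional duplicate construction on a cold start. In exchange, no caller ever waits on the lock for network I/O.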

fix(core): SQLite timeout=30, INSERT OR IGNORE migrations, parameterize tier unlockables c027fe6137

- get_connection(): add timeout=30 to both sqlite3 and pysqlcipher3 paths so
  concurrent writers wait up to 30s for the lock (the stdlib sqlite3 default is
  5s) before raising OperationalError
- run_migrations(): INSERT OR IGNORE so two Store() calls racing on first boot
  don't fail with a UNIQUE constraint violation on the migrations table
- can_use() / tier_label(): accept _byok_unlockable and _local_vision_unlockable
  overrides so products pass their own frozensets rather than sharing module-level
  constants (required for circuitforge-core to serve multiple products cleanly)
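The two SQLite changes above can be sketched with the standard-library `sqlite3` module alone; the pysqlcipher3 path is omitted. The `migrations` table layout and the `record_migration()` helper are hypothetical.

- `timeout=30` makes a connection wait up to 30 seconds for a competing writer's lock before raising `OperationalError`.
- `INSERT OR IGNORE` turns a duplicate migration record into a no-op instead of an `IntegrityError`.

```python
import sqlite3

# Hypothetical schema: one row per applied migration.
MIGRATIONS_DDL = "CREATE TABLE IF NOT EXISTS migrations (name TEXT PRIMARY KEY)"


def get_connection(path: str) -> sqlite3.Connection:
    # Wait up to 30 s for a competing writer's lock before raising
    # sqlite3.OperationalError ("database is locked").
    return sqlite3.connect(path, timeout=30)


def record_migration(conn: sqlite3.Connection, name: str) -> bool:
    """Record a migration; return True only if this call inserted the row."""
    with conn:  # commit on success, roll back on exception
        conn.execute(MIGRATIONS_DDL)
        cur = conn.execute(
            "INSERT OR IGNORE INTO migrations (name) VALUES (?)", (name,)
        )
    return cur.rowcount == 1  # 0 when the row already existed
```

Two processes booting against the same database can both call `record_migration()` safely. One of them inserts the row, and the other sees `False` instead of an exception.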