feat(tasks): shared VRAM-aware LLM task scheduler #2

Merged
pyr0ball merged 4 commits from feature/shared-task-scheduler into main 2026-03-31 10:45:21 -07:00

Summary

  • Adds circuitforge_core.tasks.scheduler — a generic, VRAM-aware background task scheduler extracted from Peregrine
  • Products supply task_types, vram_budgets, and a run_task_fn; the core handles threading, VRAM accounting, and queue depth
  • VRAM detection priority: cf-orch /api/nodes (lease-aware free VRAM) → scripts.preflight.get_gpus() total → 999.0 (unlimited fallback)
  • Double-checked locking: VRAM detection runs outside the singleton lock to avoid holding it across network I/O
  • _batch_worker is the sole authority for _reserved_vram accounting (in finally) — fixes double-decrement race vs _scheduler_loop reaper
  • Also fixes: SQLite timeout=30 on all connections, INSERT OR IGNORE in migrations, parameterized _byok_unlockable/_local_vision_unlockable in can_use()/tier_label()
  • 13 tests covering VRAM detection paths, singleton, reset, startup loading, missing table, accounting correctness
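The double-checked locking bullet above can be sketched roughly as follows. This is a minimal illustration, not the merged code: the `TaskScheduler` stand-in, the `detect_vram` parameter, and the signature of `get_scheduler()` are assumptions made for the example. The point is that the expensive constructor runs before the lock is taken, and the lock only guards the final assignment.

```python
import threading

_scheduler = None
_scheduler_lock = threading.Lock()


class TaskScheduler:
    """Stand-in for the real scheduler; only the constructor cost matters here."""

    def __init__(self, detect_vram):
        # Assumption: in the real code this step may involve network I/O.
        self.available_vram_gb = detect_vram()


def get_scheduler(detect_vram=lambda: 999.0):
    """Return the process-wide scheduler, creating it on first use."""
    global _scheduler
    # First check, without the lock: cheap once the singleton exists.
    existing = _scheduler
    if existing is not None:
        return existing
    # Build a candidate while NOT holding the lock, so slow VRAM
    # detection never blocks other callers.
    candidate = TaskScheduler(detect_vram)
    with _scheduler_lock:
        # Second check, under the lock: another thread may have won.
        if _scheduler is None:
            _scheduler = candidate
        return _scheduler


def reset_scheduler():
    """Drop the singleton so tests start from a clean state."""
    global _scheduler
    with _scheduler_lock:
        _scheduler = None
```

A thread that loses the race simply discards its candidate. That costs one redundant detection call in the worst case, which is the trade-off for keeping the lock hold time short.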

⚠️ Depends on #1 (cf-orch) — merge that first so detect_available_vram_gb() can reach the /api/nodes endpoint in production.
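The detection priority described in the summary (cf-orch, then preflight, then 999.0) amounts to a fallback chain. The sketch below shows only that control flow. The function name `detect_vram_with_fallbacks` and the two injected callables are hypothetical; the real `detect_available_vram_gb()` talks to `/api/nodes` and `scripts.preflight.get_gpus()` directly, and its response handling is not shown in this PR description.

```python
from typing import Callable, Optional

UNLIMITED_VRAM_GB = 999.0  # fallback value named in the PR summary


def detect_vram_with_fallbacks(
    query_orch: Callable[[], Optional[float]],
    query_preflight: Callable[[], Optional[float]],
) -> float:
    """Try each VRAM source in priority order; fall back to 'unlimited'."""
    for source in (query_orch, query_preflight):
        try:
            value = source()
        except Exception:
            # Assumption: any failure (connection refused, import error,
            # malformed response) just moves on to the next source.
            continue
        if value is not None and value > 0:
            return float(value)
    return UNLIMITED_VRAM_GB
```

Usage, with stand-in sources:

```python
def orch_down():
    raise ConnectionError("cf-orch not running")

detect_vram_with_fallbacks(orch_down, lambda: 24.0)   # preflight total is used
detect_vram_with_fallbacks(orch_down, lambda: None)   # falls through to 999.0
```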

Test plan

  • conda run -n cf pytest tests/test_tasks/ -v — 13 tests pass
  • VRAM detection falls back gracefully when cf-orch is not running
  • Singleton resets cleanly between tests via reset_scheduler()
  • _reserved_vram returns to 0 after task completes
pyr0ball added 25 commits 2026-03-31 10:42:22 -07:00
- eviction_engine: replace deprecated asyncio.get_event_loop() with
  get_running_loop() (Python 3.12 compatibility)
- eviction_engine: remove unused httpx import
- coordinator app: return 422 for unknown node_id instead of silently
  falling back to hardcoded localhost URL
- eviction_executor: guard against pid <= 0 to prevent accidental
  SIGTERM to process group
- pyproject.toml: move pytest-asyncio to [dev] extras, not [orch]
- profile_registry: document CPU profile exclusion from list_public()

Extract generic batch scheduler into circuitforge_core.tasks.scheduler
so any CircuitForge product can use it. Includes VRAM detection via
cf-orch coordinator (cooperative free-VRAM), preflight fallback, and
unlimited fallback; singleton API; full test coverage (12 tests).

Adds test_detect_vram_preflight_fallback to cover the spec path where
cf-orch is unreachable but scripts.preflight.get_gpus() succeeds,
verifying detect_available_vram_gb() returns the summed total VRAM.
Uses sys.modules injection to simulate the preflight module being present.
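The `sys.modules` injection mentioned above can be illustrated like this. It is a sketch of the technique only: the helper names are hypothetical, and the shape of the `get_gpus()` return value (a list of dicts with a `vram_total_gb` key) is an assumption, since the PR does not show it.

```python
import importlib
import sys
import types


def total_vram_from_preflight() -> float:
    """Hypothetical helper: sum total VRAM reported by scripts.preflight."""
    preflight = importlib.import_module("scripts.preflight")
    # Assumed record shape: {"vram_total_gb": <float>} per GPU.
    return sum(gpu["vram_total_gb"] for gpu in preflight.get_gpus())


def run_with_fake_preflight(gpus):
    """Run the helper with a fake scripts.preflight module injected."""
    fake_pkg = types.ModuleType("scripts")
    fake_pkg.__path__ = []  # an (empty) __path__ marks it as a package
    fake_mod = types.ModuleType("scripts.preflight")
    fake_mod.get_gpus = lambda: gpus
    fake_pkg.preflight = fake_mod

    names = ("scripts", "scripts.preflight")
    saved = {name: sys.modules.get(name) for name in names}
    sys.modules["scripts"] = fake_pkg
    sys.modules["scripts.preflight"] = fake_mod
    try:
        return total_vram_from_preflight()
    finally:
        # Restore whatever was there before so other tests are unaffected.
        for name, module in saved.items():
            if module is None:
                sys.modules.pop(name, None)
            else:
                sys.modules[name] = module
```

In a pytest suite, `monkeypatch.setitem(sys.modules, ...)` gives the same effect with automatic cleanup.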
- C1: Remove _reserved_vram decrement from _scheduler_loop reaper; sole
  responsibility now belongs to _batch_worker's finally block, eliminating
  the double-decrement race that could drive _reserved_vram negative.
- C2: Move TaskScheduler construction (including VRAM detection httpx call)
  outside _scheduler_lock in get_scheduler(); lock is now only held for the
  final singleton assignment, preventing 2s lock contention on first call.
- I1: Add RunTaskFn type alias (Callable[...]) and use it in __init__ and
  get_scheduler() instead of bare Callable.
- I2: Replace namedtuple TaskSpec with typed NamedTuple class.
- I3: Parameterize _queues annotation as dict[str, deque[TaskSpec]].
- I4: Wrap _queues read in start() with self._lock.
- I5: Replace time.sleep() ordering assertion in test_vram_budget_blocks_second_type
  with event-based synchronization using type_a_started/type_b_started events.
- M2: Use sqlite3.connect() as context manager in _load_queued_tasks.
- M3: Strengthen weak assertion in test_enqueue_returns_false_when_queue_full.
- M4: Add test_reserved_vram_zero_after_task_completes to catch C1 regression.
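The C1 fix and the M4 regression test come down to one rule: whoever runs the batch releases the reservation, exactly once, in `finally`. The sketch below is an assumption-laden illustration of that rule. `VramLedger`, `try_reserve`, `release`, and the `batch_worker` signature are hypothetical names, not the scheduler's real internals, and swallowing task exceptions is a simplification.

```python
import threading


class VramLedger:
    """Hypothetical stand-in for the scheduler's VRAM bookkeeping."""

    def __init__(self, available_gb):
        self.available_gb = available_gb
        self._reserved_vram = 0.0
        self._lock = threading.Lock()

    def try_reserve(self, budget_gb):
        """Reserve budget_gb if it fits; return whether it was reserved."""
        with self._lock:
            if self._reserved_vram + budget_gb > self.available_gb:
                return False
            self._reserved_vram += budget_gb
            return True

    def release(self, budget_gb):
        with self._lock:
            self._reserved_vram -= budget_gb


def batch_worker(ledger, budget_gb, tasks, run_task_fn):
    """Run every task, then release the reservation exactly once."""
    try:
        for task in tasks:
            try:
                run_task_fn(task)
            except Exception:
                # Simplification: a failing task does not stop the batch.
                pass
    finally:
        # Sole release point. A reaper that also decremented here would
        # drive _reserved_vram negative, which is the race C1 removes.
        ledger.release(budget_gb)
```

A check in the spirit of `test_reserved_vram_zero_after_task_completes` would reserve a budget, run the worker thread to completion (including with a task function that raises), and assert that `_reserved_vram` is back to 0.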
- get_connection(): add timeout=30 to both sqlite3 and pysqlcipher3 paths so
  concurrent writers wait up to 30 seconds for the lock before raising OperationalError
- run_migrations(): INSERT OR IGNORE so two Store() calls racing on first boot
  don't hit a UNIQUE constraint on the migrations table
- can_use() / tier_label(): accept _byok_unlockable and _local_vision_unlockable
  overrides so products pass their own frozensets rather than sharing module-level
  constants (required for circuitforge-core to serve multiple products cleanly)
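For the two SQLite changes, a minimal sketch using only the standard-library `sqlite3` module is shown below. The `migrations` table layout and the `record_migration` helper are assumptions for illustration; the real schema and `run_migrations()` are not shown in this PR.

```python
import sqlite3


def get_connection(path):
    """Open a connection that waits up to 30 s on a locked database."""
    return sqlite3.connect(path, timeout=30)


def record_migration(conn, name):
    """Record a migration as applied; a duplicate name is silently skipped."""
    # Hypothetical schema: one row per applied migration, keyed by name.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS migrations (name TEXT PRIMARY KEY)"
    )
    # INSERT OR IGNORE turns a uniqueness conflict into a no-op, so two
    # callers racing on first boot cannot fail on the same row.
    conn.execute(
        "INSERT OR IGNORE INTO migrations (name) VALUES (?)", (name,)
    )
    conn.commit()
```

Calling `record_migration` twice with the same name leaves a single row and raises nothing, which is the behavior the commit message describes for two `Store()` calls racing.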
pyr0ball merged commit 563b73ce85 into main 2026-03-31 10:45:21 -07:00
Reference: Circuit-Forge/circuitforge-core#2