feat(tasks): shared VRAM-aware LLM task scheduler #2

Merged

pyr0ball merged 4 commits from feature/shared-task-scheduler into main

2026-03-31 10:45:21 -07:00

Author	SHA1	Message	Date
pyr0ball	c027fe6137	fix(core): SQLite timeout=30, INSERT OR IGNORE migrations, parameterize tier unlockables - get_connection(): add timeout=30 to both sqlite3 and pysqlcipher3 paths so concurrent writers retry instead of immediately raising OperationalError - run_migrations(): INSERT OR IGNORE so two Store() calls racing on first boot don't hit a UNIQUE constraint on the migrations table - can_use() / tier_label(): accept _byok_unlockable and _local_vision_unlockable overrides so products pass their own frozensets rather than sharing module-level constants (required for circuitforge-core to serve multiple products cleanly)	2026-03-31 10:37:51 -07:00
pyr0ball	22bad8590a	fix(tasks): fix VRAM accounting race, lock scope, type annotations - C1: Remove _reserved_vram decrement from _scheduler_loop reaper; sole responsibility now belongs to _batch_worker's finally block, eliminating the double-decrement race that could drive _reserved_vram negative. - C2: Move TaskScheduler construction (including VRAM detection httpx call) outside _scheduler_lock in get_scheduler(); lock is now only held for the final singleton assignment, preventing 2s lock contention on first call. - I1: Add RunTaskFn type alias (Callable[...]) and use it in __init__ and get_scheduler() instead of bare Callable. - I2: Replace namedtuple TaskSpec with typed NamedTuple class. - I3: Parameterize _queues annotation as dict[str, deque[TaskSpec]]. - I4: Wrap _queues read in start() with self._lock. - I5: Replace time.sleep() ordering assertion in test_vram_budget_blocks_second_type with event-based synchronization using type_a_started/type_b_started events. - M2: Use sqlite3.connect() as context manager in _load_queued_tasks. - M3: Strengthen weak assertion in test_enqueue_returns_false_when_queue_full. - M4: Add test_reserved_vram_zero_after_task_completes to catch C1 regression.	2026-03-31 09:15:09 -07:00
pyr0ball	09a5087c72	test(tasks): add preflight fallback coverage to scheduler tests. Adds test_detect_vram_preflight_fallback to cover the spec path where cf-orch is unreachable but scripts.preflight.get_gpus() succeeds, verifying detect_available_vram_gb() returns the summed total VRAM. Uses sys.modules injection to simulate the preflight module being present.	2026-03-30 23:15:19 -07:00
pyr0ball	5801928f8e	feat(tasks): add shared VRAM-aware LLM task scheduler. Extract generic batch scheduler into circuitforge_core.tasks.scheduler so any CircuitForge product can use it. Includes VRAM detection via cf-orch coordinator (cooperative free-VRAM), preflight fallback, and unlimited fallback; singleton API; full test coverage (12 tests).	2026-03-30 23:12:23 -07:00
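
For readers unfamiliar with the two SQLite changes in c027fe6137, here is a minimal sketch of the idea, not the code from the commit. The `migrations` table layout and the `applied_migrations` helper are assumptions made for illustration; only `timeout=30` and `INSERT OR IGNORE` come from the commit message.

```python
import sqlite3
from typing import Iterable, List


def get_connection(path: str) -> sqlite3.Connection:
    # timeout=30: wait up to 30 seconds for a competing writer's lock to
    # clear before raising OperationalError("database is locked").
    return sqlite3.connect(path, timeout=30)


def run_migrations(conn: sqlite3.Connection, names: Iterable[str]) -> None:
    # Hypothetical schema: one row per applied migration, keyed by name.
    conn.execute("CREATE TABLE IF NOT EXISTS migrations (name TEXT PRIMARY KEY)")
    with conn:  # commits on success, rolls back on error
        for name in names:
            # OR IGNORE: a second caller recording the same name is a no-op
            # instead of a UNIQUE/PRIMARY KEY constraint failure.
            conn.execute(
                "INSERT OR IGNORE INTO migrations (name) VALUES (?)", (name,)
            )


def applied_migrations(conn: sqlite3.Connection) -> List[str]:
    rows = conn.execute("SELECT name FROM migrations ORDER BY name").fetchall()
    return [row[0] for row in rows]
```

Calling `run_migrations` twice with the same names leaves one row per name, which is the property the first-boot race depends on.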
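
The C1 fix in 22bad8590a makes the batch worker's `finally` block the only place that releases reserved VRAM. The sketch below illustrates that ownership rule under simplifying assumptions: `MiniScheduler`, `run_batch`, and the `TaskSpec` fields are hypothetical, and only the names `_reserved_vram`, `_batch_worker`, and `TaskSpec` are taken from the commit message.

```python
import threading
from collections import deque
from typing import Callable, Deque, Dict, NamedTuple


class TaskSpec(NamedTuple):
    # Hypothetical fields; the real TaskSpec layout is not shown in this PR.
    task_id: int
    task_type: str


class MiniScheduler:
    def __init__(self, run_task_fn: Callable[[TaskSpec], None],
                 vram_cost_gb: Dict[str, float]) -> None:
        self._run_task_fn = run_task_fn
        self._vram_cost_gb = vram_cost_gb
        self._lock = threading.Lock()
        self._reserved_vram = 0.0

    def run_batch(self, task_type: str, tasks: Deque[TaskSpec]) -> threading.Thread:
        cost = self._vram_cost_gb[task_type]
        with self._lock:
            self._reserved_vram += cost  # reserve before the worker starts
        worker = threading.Thread(target=self._batch_worker, args=(tasks, cost))
        worker.start()
        return worker

    def _batch_worker(self, tasks: Deque[TaskSpec], cost: float) -> None:
        try:
            while tasks:
                self._run_task_fn(tasks.popleft())
        finally:
            # The single release point. No reaper or supervisor loop also
            # decrements, so the counter cannot be released twice.
            with self._lock:
                self._reserved_vram -= cost

    @property
    def reserved_vram(self) -> float:
        with self._lock:
            return self._reserved_vram
```

A check in the spirit of test_reserved_vram_zero_after_task_completes then reduces to joining the worker and asserting that `reserved_vram` is back to zero.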
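
The C2 change (construct outside the lock, hold the lock only for the singleton assignment) can be sketched as follows. This is an illustration of the pattern, not the project's `get_scheduler()`: the `TaskScheduler` body is a placeholder, and how the real code treats a candidate that loses the race is not stated in the commit message.

```python
import threading
from typing import Optional


class TaskScheduler:
    def __init__(self) -> None:
        # Placeholder for slow setup such as a VRAM-detection HTTP call.
        self.available_vram_gb = 0.0


_scheduler: Optional[TaskScheduler] = None
_scheduler_lock = threading.Lock()


def get_scheduler() -> TaskScheduler:
    global _scheduler
    if _scheduler is not None:
        return _scheduler
    # Slow construction happens without holding the lock, so other callers
    # are never blocked behind it.
    candidate = TaskScheduler()
    with _scheduler_lock:
        # Only the assignment is guarded. If another thread won the race,
        # its instance is kept and this candidate is discarded.
        if _scheduler is None:
            _scheduler = candidate
        return _scheduler
```

The trade-off is that two callers racing on first use may both pay the construction cost, but every caller still receives the same instance.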
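
The detection order described in 5801928f8e and exercised in 09a5087c72 (coordinator, then preflight, then unlimited) might look roughly like the sketch below. Several details are assumptions: the `query_coordinator` parameter stands in for the cf-orch call, `UNLIMITED_GB` is a made-up sentinel, and the `"vram_total_gb"` key is a guess at what `get_gpus()` returns. Only the function and module names come from the commit messages.

```python
import importlib
from typing import Callable, Optional

UNLIMITED_GB = 999.0  # hypothetical sentinel meaning "no VRAM limit known"


def _vram_from_preflight() -> Optional[float]:
    try:
        preflight = importlib.import_module("scripts.preflight")
    except ImportError:
        return None  # module not installed alongside this product
    # Assumed shape: a list of dicts with a "vram_total_gb" entry per GPU.
    return float(sum(gpu["vram_total_gb"] for gpu in preflight.get_gpus()))


def detect_available_vram_gb(
    query_coordinator: Optional[Callable[[], float]] = None,
) -> float:
    # 1. Ask the coordinator for cooperative free VRAM, if one is configured.
    if query_coordinator is not None:
        try:
            return float(query_coordinator())
        except Exception:
            pass  # coordinator unreachable: fall through to preflight
    # 2. Fall back to summing total VRAM reported by the preflight module.
    from_preflight = _vram_from_preflight()
    if from_preflight is not None:
        return from_preflight
    # 3. Nothing available: treat VRAM as unlimited.
    return UNLIMITED_GB
```

A test can place a stub `types.ModuleType("scripts.preflight")` into `sys.modules` (for example with `unittest.mock.patch.dict`) so the import succeeds without the real module, which is the injection technique the test commit describes.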
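
Item I5 in 22bad8590a replaces a `time.sleep()` ordering assertion with events. The following self-contained sketch shows the general technique only. The semaphore is a stand-in for a VRAM budget that fits one task type at a time, and `run_budget_ordering_check` and `release_a` are hypothetical; the event names `type_a_started` and `type_b_started` are from the commit message.

```python
import threading
from typing import Optional


def run_budget_ordering_check() -> bool:
    """Return True if type B stayed blocked while type A held the budget."""
    budget = threading.Semaphore(1)  # stand-in for the VRAM budget
    type_a_started = threading.Event()
    type_b_started = threading.Event()
    release_a = threading.Event()

    def worker(started: threading.Event,
               release: Optional[threading.Event] = None) -> None:
        with budget:  # blocks until the budget is free
            started.set()
            if release is not None:
                release.wait(timeout=5)  # hold the budget until told to stop

    a = threading.Thread(target=worker, args=(type_a_started, release_a))
    b = threading.Thread(target=worker, args=(type_b_started,))

    a.start()
    assert type_a_started.wait(timeout=5)  # A now holds the budget
    b.start()
    # B cannot start while A holds the budget; the short wait only gives
    # B a chance to run, it is not what enforces the ordering.
    b_blocked = not type_b_started.wait(timeout=0.1)

    release_a.set()  # let A finish and free the budget
    assert type_b_started.wait(timeout=5)  # B starts once the budget is free
    a.join()
    b.join()
    return b_blocked
```

Because each step waits on an explicit signal, the ordering holds regardless of how slowly the threads are scheduled, which is what a fixed sleep cannot guarantee.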