feat: LLM queue optimizer — resource-aware batch scheduler (closes #2) #13

Merged
pyr0ball merged 17 commits from feature/llm-queue-optimizer into main 2026-03-15 05:11:30 -07:00

Summary

  • New scripts/task_scheduler.py: Resource-aware TaskScheduler singleton that groups LLM tasks by type into per-type deques, schedules batches using a VRAM budget check (deepest queue wins), and runs each type serially to avoid repeated model context-switching. Up to N type batches may run concurrently when VRAM fits.
  • scripts/task_runner.py: submit_task() now routes cover_letter, company_research, and wizard_generate through the scheduler; all other types (discovery, email_sync, etc.) continue spawning free threads unchanged.
  • scripts/db.py: reset_running_tasks() — on restart, marks only running tasks failed while leaving queued tasks intact for the scheduler to resume (durability).
  • app/app.py: _startup() calls reset_running_tasks() instead of the old inline SQL that cleared both queued and running rows.
  • config/llm.yaml.example: Documented scheduler.vram_budgets and max_queue_depth config keys.
  • tests/test_task_scheduler.py (new): 24 tests covering budget loading, VRAM detection, enqueue depth guard, FIFO ordering, concurrent batches, mid-batch pickup, crash recovery, singleton thread-safety, and durability.
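The "deepest queue wins" selection under a VRAM budget can be sketched as follows. This is a minimal illustration, not the actual implementation: the names (`SchedulerSketch`, `queues`, `budgets`, `reserved`) and the MB units are assumptions.

```python
from collections import deque

class SchedulerSketch:
    """Illustrative sketch of VRAM-budgeted, deepest-queue-first batch selection."""

    def __init__(self, total_vram_mb, budgets):
        self.total_vram = total_vram_mb              # VRAM ceiling in MB
        self.budgets = budgets                       # per-type VRAM budget, e.g. {"cover_letter": 4096}
        self.reserved = 0                            # VRAM held by batches already running
        self.queues = {t: deque() for t in budgets}  # per-type FIFO deques

    def pick_next_type(self):
        """Return the deepest non-empty queue whose budget fits remaining VRAM."""
        candidates = sorted(
            (t for t, q in self.queues.items() if q),
            key=lambda t: len(self.queues[t]),
            reverse=True,
        )
        for t in candidates:
            fits = self.reserved + self.budgets[t] <= self.total_vram
            # Starvation guard: if nothing is running, start the deepest
            # batch even when its budget exceeds the ceiling.
            if fits or self.reserved == 0:
                return t
        return None
```

Running each selected type serially (one batch per type at a time) is what avoids the repeated model context-switching the summary mentions; concurrency comes only from distinct types whose combined budgets fit.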

Key design decisions

  • No tier gating — scheduler applies to all tiers; especially important for the shared cloud instance where multiple users compete for GPU resources
  • Starvation guard — when _reserved_vram == 0, at least one batch always starts even if its budget exceeds the VRAM ceiling (prevents permanent deadlock on under-resourced systems)
  • Durability — queued rows survive restarts and are re-loaded into deques on TaskScheduler.__init__; only running rows (results unknown) are reset to failed
  • Circular import avoided — task_scheduler.py never imports task_runner.py; routing lives in submit_task(), with _run_task passed in as a parameter
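The durability behavior amounts to a single status update on restart. A minimal sketch, assuming an SQLite `tasks` table with `status` and `error` columns (the real schema may differ):

```python
import sqlite3

def reset_running_tasks(conn):
    """On restart, mark only `running` tasks failed; leave `queued` rows
    intact so the scheduler can reload them into its deques."""
    conn.execute(
        "UPDATE tasks SET status = 'failed', "
        "error = 'interrupted by restart' "
        "WHERE status = 'running'"
    )
    conn.commit()
```

The point of scoping the update to `running` is that those results are genuinely unknown after a crash, while `queued` work lost nothing and can simply be resumed.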

Test plan

  • 469 tests pass, 1 pre-existing failure excluded (test_generate_calls_llm_router, tracked in issue #12)
  • tests/test_task_scheduler.py: 24 new tests, all passing
  • tests/test_task_runner.py: all passing, no regressions
  • Full suite baseline: 445 → 469 (24 new tests added)
  • app/app.py syntax verified with py_compile
  • config/llm.yaml.example YAML valid
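The singleton thread-safety test mentioned above (and the "lock, not GIL" review fix in the commit log) can be illustrated with double-checked locking. This is a generic sketch, not the project's `TaskScheduler` code:

```python
import threading

class Singleton:
    """Double-checked locking: the lock, not the GIL, guarantees that
    concurrent first calls construct exactly one instance."""
    _instance = None
    _lock = threading.Lock()

    @classmethod
    def instance(cls):
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = cls()
        return cls._instance

def check_singleton_thread_safety():
    results = []
    threads = [
        threading.Thread(target=lambda: results.append(Singleton.instance()))
        for _ in range(8)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # All threads must observe the same object.
    assert len({id(r) for r in results}) == 1
```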
pyr0ball added 17 commits 2026-03-15 05:07:33 -07:00
Prevents worktree directories from being tracked.
Resource-aware batch scheduler for LLM tasks. Closes #2.
Addresses 16 review findings across two passes:
- Clarify _active.pop/double-decrement non-issue
- Fix app.py change target (inline SQL, not kill_stuck_tasks)
- Scope durability to LLM types only
- Add _budgets to state table with load logic
- Fix singleton safety explanation (lock, not GIL)
- Ghost row fix: mark dropped tasks failed in DB
- Document static _available_vram as known limitation
- Fix test_llm_tasks_batch_by_type description
- Eliminate circular import via routing split in submit_task()
- Add missing budget warning at construction
11-task TDD plan across 3 reviewed chunks. Covers:
- reset_running_tasks() db helper
- TaskScheduler skeleton + __init__ + enqueue + loop + workers
- Thread-safe singleton, durability, submit_task routing shim
- app.py startup change + full suite verification
Replaces the spawn-per-task model for LLM task types with scheduler
routing: cover_letter, company_research, and wizard_generate are now
enqueued via the TaskScheduler singleton for VRAM-aware batching.
Non-LLM tasks (discovery, email_sync, etc.) continue to spawn daemon
threads directly. Adds autouse clean_scheduler fixture to
test_task_runner.py to prevent singleton cross-test contamination.
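The routing split this commit describes can be sketched as below. The function signature is an assumption (in the real code the scheduler is a singleton and `_run_task` is internal to task_runner); the shape of the split is what matters:

```python
import threading

# LLM task types routed through the scheduler (from the commit message).
LLM_TYPES = {"cover_letter", "company_research", "wizard_generate"}

def submit_task(task_type, payload, run_task, scheduler):
    """Route LLM types through the scheduler; spawn a daemon thread otherwise."""
    if task_type in LLM_TYPES:
        # run_task is passed in as a parameter so the scheduler module
        # never imports task_runner (avoids the circular import).
        scheduler.enqueue(task_type, payload, run_task)
        return None
    thread = threading.Thread(
        target=run_task, args=(task_type, payload), daemon=True
    )
    thread.start()
    return thread
```

Keeping the routing decision in `submit_task()` means callers are unchanged: they still submit by type and never need to know which path a task takes.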
feat: LLM queue optimizer complete — closes #2
Some checks failed
CI / test (pull_request) Failing after 32s
22091760bd
Resource-aware batch scheduler for LLM tasks:
- scripts/task_scheduler.py (new): TaskScheduler singleton with VRAM-aware
  batch scheduling, durability, thread-safe singleton, memory safety
- scripts/task_runner.py: submit_task() routes LLM types through scheduler
- scripts/db.py: reset_running_tasks() for durable restart behavior
- app/app.py: _startup() preserves queued tasks on restart
- config/llm.yaml.example: scheduler VRAM budget config documented
- tests/test_task_scheduler.py (new): 24 tests covering all behaviors

Pre-existing failure: test_generate_calls_llm_router (issue #12, unrelated)
pyr0ball self-assigned this 2026-03-15 05:08:24 -07:00
pyr0ball added this to The Menagerie project 2026-03-15 05:08:26 -07:00
pyr0ball added this to the Beta — Private Access milestone 2026-03-15 05:08:30 -07:00
pyr0ball added the feature-request label 2026-03-15 05:08:37 -07:00
pyr0ball merged commit e6f0e41de4 into main 2026-03-15 05:11:30 -07:00
pyr0ball deleted branch feature/llm-queue-optimizer 2026-03-15 05:11:30 -07:00
Reference: Circuit-Forge/peregrine#13