feat: LLM queue optimizer — resource-aware batch scheduler (closes #2) #13

Merged
pyr0ball merged 17 commits from feature/llm-queue-optimizer into main 2026-03-15 05:11:30 -07:00

Summary

  • New scripts/task_scheduler.py: Resource-aware TaskScheduler singleton that groups LLM tasks by type into per-type deques, schedules batches using a VRAM budget check (deepest queue wins), and runs each type serially to avoid repeated model context-switching. Up to N type batches may run concurrently when VRAM fits.
  • scripts/task_runner.py: submit_task() now routes cover_letter, company_research, and wizard_generate through the scheduler; all other types (discovery, email_sync, etc.) continue spawning free threads unchanged.
  • scripts/db.py: reset_running_tasks() — on restart, marks only running tasks failed while leaving queued tasks intact for the scheduler to resume (durability).
  • app/app.py: _startup() calls reset_running_tasks() instead of the old inline SQL that cleared both queued and running rows.
  • config/llm.yaml.example: Documented scheduler.vram_budgets and max_queue_depth config keys.
  • tests/test_task_scheduler.py (new): 24 tests covering budget loading, VRAM detection, enqueue depth guard, FIFO ordering, concurrent batches, mid-batch pickup, crash recovery, singleton thread-safety, and durability.
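The "deepest queue wins" selection under a VRAM budget can be sketched as follows. This is a minimal illustration, not the actual implementation: the names (`SchedulerSketch`, `queues`, `budgets`, `reserved`) and the MB units are assumptions.

```python
from collections import deque

class SchedulerSketch:
    """Illustrative sketch of VRAM-budgeted, deepest-queue-first batch selection."""

    def __init__(self, total_vram_mb, budgets):
        self.total_vram = total_vram_mb              # VRAM ceiling in MB
        self.budgets = budgets                       # per-type VRAM budget, e.g. {"cover_letter": 4096}
        self.reserved = 0                            # VRAM held by batches already running
        self.queues = {t: deque() for t in budgets}  # per-type FIFO deques

    def pick_next_type(self):
        """Return the deepest non-empty queue whose budget fits remaining VRAM."""
        candidates = sorted(
            (t for t, q in self.queues.items() if q),
            key=lambda t: len(self.queues[t]),
            reverse=True,
        )
        for t in candidates:
            fits = self.reserved + self.budgets[t] <= self.total_vram
            # Starvation guard: if nothing is running, start the deepest
            # batch even when its budget exceeds the ceiling.
            if fits or self.reserved == 0:
                return t
        return None
```

Running each selected type serially (one batch per type at a time) is what avoids the repeated model context-switching the summary mentions; concurrency comes only from distinct types whose combined budgets fit.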

Key design decisions

  • No tier gating — scheduler applies to all tiers; especially important for the shared cloud instance where multiple users compete for GPU resources
  • Starvation guard — when _reserved_vram == 0, at least one batch always starts even if its budget exceeds the VRAM ceiling (prevents permanent deadlock on under-resourced systems)
  • Durability — queued rows survive restarts and are re-loaded into deques on TaskScheduler.__init__; only running rows (results unknown) are reset to failed
  • Circular import avoided — task_scheduler.py never imports task_runner.py; routing lives in submit_task(), with _run_task passed in as a parameter
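The durability behavior amounts to a single status update on restart. A minimal sketch, assuming an SQLite `tasks` table with `status` and `error` columns (the real schema may differ):

```python
import sqlite3

def reset_running_tasks(conn):
    """On restart, mark only `running` tasks failed; leave `queued` rows
    intact so the scheduler can reload them into its deques."""
    conn.execute(
        "UPDATE tasks SET status = 'failed', "
        "error = 'interrupted by restart' "
        "WHERE status = 'running'"
    )
    conn.commit()
```

The point of scoping the update to `running` is that those results are genuinely unknown after a crash, while `queued` work lost nothing and can simply be resumed.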

Test plan

  • 469 tests pass, 1 pre-existing failure excluded (test_generate_calls_llm_router, tracked in issue #12)
  • tests/test_task_scheduler.py: 24 new tests, all passing
  • tests/test_task_runner.py: all passing, no regressions
  • Full suite baseline: 445 → 469 (24 new tests added)
  • app/app.py syntax verified with py_compile
  • config/llm.yaml.example YAML valid
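The singleton thread-safety test mentioned above (and the "lock, not GIL" review fix in the commit log) can be illustrated with double-checked locking. This is a generic sketch, not the project's `TaskScheduler` code:

```python
import threading

class Singleton:
    """Double-checked locking: the lock, not the GIL, guarantees that
    concurrent first calls construct exactly one instance."""
    _instance = None
    _lock = threading.Lock()

    @classmethod
    def instance(cls):
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = cls()
        return cls._instance

def check_singleton_thread_safety():
    results = []
    threads = [
        threading.Thread(target=lambda: results.append(Singleton.instance()))
        for _ in range(8)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # All threads must observe the same object.
    assert len({id(r) for r in results}) == 1
```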
pyr0ball added 17 commits 2026-03-15 05:07:33 -07:00
Prevents worktree directories from being tracked.
Resource-aware batch scheduler for LLM tasks. Closes #2.
Addresses 16 review findings across two passes:
- Clarify _active.pop/double-decrement non-issue
- Fix app.py change target (inline SQL, not kill_stuck_tasks)
- Scope durability to LLM types only
- Add _budgets to state table with load logic
- Fix singleton safety explanation (lock, not GIL)
- Ghost row fix: mark dropped tasks failed in DB
- Document static _available_vram as known limitation
- Fix test_llm_tasks_batch_by_type description
- Eliminate circular import via routing split in submit_task()
- Add missing budget warning at construction
11-task TDD plan across 3 reviewed chunks. Covers:
- reset_running_tasks() db helper
- TaskScheduler skeleton + __init__ + enqueue + loop + workers
- Thread-safe singleton, durability, submit_task routing shim
- app.py startup change + full suite verification
Replaces the spawn-per-task model for LLM task types with scheduler
routing: cover_letter, company_research, and wizard_generate are now
enqueued via the TaskScheduler singleton for VRAM-aware batching.
Non-LLM tasks (discovery, email_sync, etc.) continue to spawn daemon
threads directly. Adds autouse clean_scheduler fixture to
test_task_runner.py to prevent singleton cross-test contamination.
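The routing split this commit describes can be sketched as below. The function signature is an assumption (in the real code the scheduler is a singleton and `_run_task` is internal to task_runner); the shape of the split is what matters:

```python
import threading

# LLM task types routed through the scheduler (from the commit message).
LLM_TYPES = {"cover_letter", "company_research", "wizard_generate"}

def submit_task(task_type, payload, run_task, scheduler):
    """Route LLM types through the scheduler; spawn a daemon thread otherwise."""
    if task_type in LLM_TYPES:
        # run_task is passed in as a parameter so the scheduler module
        # never imports task_runner (avoids the circular import).
        scheduler.enqueue(task_type, payload, run_task)
        return None
    thread = threading.Thread(
        target=run_task, args=(task_type, payload), daemon=True
    )
    thread.start()
    return thread
```

Keeping the routing decision in `submit_task()` means callers are unchanged: they still submit by type and never need to know which path a task takes.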
feat: LLM queue optimizer complete — closes #2
Some checks failed
CI / test (pull_request) Failing after 32s
22091760bd
Resource-aware batch scheduler for LLM tasks:
- scripts/task_scheduler.py (new): TaskScheduler singleton with VRAM-aware
  batch scheduling, durability, thread-safe singleton, memory safety
- scripts/task_runner.py: submit_task() routes LLM types through scheduler
- scripts/db.py: reset_running_tasks() for durable restart behavior
- app/app.py: _startup() preserves queued tasks on restart
- config/llm.yaml.example: scheduler VRAM budget config documented
- tests/test_task_scheduler.py (new): 24 tests covering all behaviors

Pre-existing failure: test_generate_calls_llm_router (issue #12, unrelated)
pyr0ball self-assigned this 2026-03-15 05:08:24 -07:00
pyr0ball added this to The Menagerie project 2026-03-15 05:08:26 -07:00
pyr0ball added this to the Beta — Private Access milestone 2026-03-15 05:08:30 -07:00
pyr0ball added the feature-request label 2026-03-15 05:08:37 -07:00
pyr0ball merged commit e6f0e41de4 into main 2026-03-15 05:11:30 -07:00
pyr0ball deleted branch feature/llm-queue-optimizer 2026-03-15 05:11:30 -07:00
Reference: Circuit-Forge/peregrine#13