peregrine/docs/superpowers/plans/2026-03-14-llm-queue-optimizer.md
pyr0ball 07166325dd docs: add LLM queue optimizer implementation plan
11-task TDD plan across 3 reviewed chunks. Covers:
- reset_running_tasks() db helper
- TaskScheduler skeleton + __init__ + enqueue + loop + workers
- Thread-safe singleton, durability, submit_task routing shim
- app.py startup change + full suite verification
2026-03-14 17:11:49 -07:00

# LLM Queue Optimizer Implementation Plan
> **For agentic workers:** REQUIRED: Use superpowers:subagent-driven-development (if subagents available) or superpowers:executing-plans to implement this plan. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Replace Peregrine's spawn-per-task LLM threading model with a resource-aware batch scheduler that groups tasks by model type, respects VRAM budgets, and survives process restarts.
**Architecture:** A new `TaskScheduler` singleton (in `scripts/task_scheduler.py`) maintains per-type deques for LLM tasks (`cover_letter`, `company_research`, `wizard_generate`). A scheduler daemon thread picks the deepest queue that fits in available VRAM and runs it serially; multiple type batches may overlap when VRAM allows. Non-LLM tasks (`discovery`, `email_sync`, etc.) continue to spawn free threads unchanged. On restart, `queued` LLM tasks are re-loaded from SQLite; only `running` tasks (results unknown) are reset to `failed`.
**Tech Stack:** Python 3.12, SQLite (via `scripts/db.py`), `threading`, `collections.deque`, `scripts/preflight.py` (VRAM detection), pytest
**Spec:** `docs/superpowers/specs/2026-03-14-llm-queue-optimizer-design.md`
**Worktree:** `/Library/Development/CircuitForge/peregrine/.worktrees/feature-llm-queue-optimizer/`
**All commands run from worktree root.** Pytest: `/devl/miniconda3/envs/job-seeker/bin/pytest`
---
## Chunk 1: Foundation
Tasks 1–3. DB helper, config update, and skeleton module. No threading yet.
---
### Task 1: `reset_running_tasks()` in `scripts/db.py`
Adds a focused restart-safe helper that resets only `running` tasks to `failed`, leaving `queued` rows untouched for the scheduler to resume.
**Files:**
- Modify: `scripts/db.py` (after `kill_stuck_tasks()`, ~line 367)
- Create: `tests/test_task_scheduler.py` (first test)
- [ ] **Step 1: Create the test file with the first failing test**
Create `tests/test_task_scheduler.py`:
```python
# tests/test_task_scheduler.py
"""Tests for scripts/task_scheduler.py and related db helpers."""
import sqlite3
import threading
import time
from collections import deque
from pathlib import Path

import pytest

from scripts.db import init_db, reset_running_tasks


@pytest.fixture
def tmp_db(tmp_path):
    # Nest the db one level down so the scheduler's later config lookup
    # (db.parent.parent / "config" / "llm.yaml") resolves inside this test's
    # unique tmp_path rather than the shared pytest basetemp directory.
    data_dir = tmp_path / "data"
    data_dir.mkdir()
    db = data_dir / "test.db"
    init_db(db)
    return db


def test_reset_running_tasks_resets_only_running(tmp_db):
    """reset_running_tasks() marks running→failed but leaves queued untouched."""
    conn = sqlite3.connect(tmp_db)
    conn.execute(
        "INSERT INTO background_tasks (task_type, job_id, status) VALUES (?,?,?)",
        ("cover_letter", 1, "running"),
    )
    conn.execute(
        "INSERT INTO background_tasks (task_type, job_id, status) VALUES (?,?,?)",
        ("company_research", 2, "queued"),
    )
    conn.commit()
    conn.close()

    count = reset_running_tasks(tmp_db)

    conn = sqlite3.connect(tmp_db)
    rows = {r[0]: r[1] for r in conn.execute(
        "SELECT task_type, status FROM background_tasks"
    ).fetchall()}
    conn.close()
    assert count == 1
    assert rows["cover_letter"] == "failed"
    assert rows["company_research"] == "queued"


def test_reset_running_tasks_returns_zero_when_nothing_running(tmp_db):
    """Returns 0 when no running tasks exist."""
    conn = sqlite3.connect(tmp_db)
    conn.execute(
        "INSERT INTO background_tasks (task_type, job_id, status) VALUES (?,?,?)",
        ("cover_letter", 1, "queued"),
    )
    conn.commit()
    conn.close()
    assert reset_running_tasks(tmp_db) == 0
```
- [ ] **Step 2: Run tests to confirm they fail**
```bash
/devl/miniconda3/envs/job-seeker/bin/pytest tests/test_task_scheduler.py -v
```
Expected: `ImportError: cannot import name 'reset_running_tasks' from 'scripts.db'`
- [ ] **Step 3: Add `reset_running_tasks()` to `scripts/db.py`**
Insert after `kill_stuck_tasks()` (~line 367):
```python
def reset_running_tasks(db_path: Path = DEFAULT_DB) -> int:
    """On restart: mark in-flight tasks failed. Queued tasks survive for the scheduler."""
    conn = sqlite3.connect(db_path)
    count = conn.execute(
        "UPDATE background_tasks SET status='failed', error='Interrupted by restart',"
        " finished_at=datetime('now') WHERE status='running'"
    ).rowcount
    conn.commit()
    conn.close()
    return count
```
- [ ] **Step 4: Run tests to confirm they pass**
```bash
/devl/miniconda3/envs/job-seeker/bin/pytest tests/test_task_scheduler.py -v
```
Expected: `2 passed`
- [ ] **Step 5: Commit**
```bash
git add scripts/db.py tests/test_task_scheduler.py
git commit -m "feat(db): add reset_running_tasks() for durable scheduler restart"
```
---
### Task 2: Add `scheduler:` section to `config/llm.yaml.example`
Documents VRAM budgets so operators know what to configure.
**Files:**
- Modify: `config/llm.yaml.example` (append at end)
- [ ] **Step 1: Append scheduler config section**
Add to the end of `config/llm.yaml.example`:
```yaml
# ── Scheduler — LLM batch queue optimizer ─────────────────────────────────────
# The scheduler batches LLM tasks by model type to avoid GPU model switching.
# VRAM budgets are conservative peak estimates (GB) for each task type.
# Increase if your models are larger; decrease if tasks share GPU memory well.
scheduler:
  vram_budgets:
    cover_letter: 2.5      # alex-cover-writer:latest (~2GB GGUF + headroom)
    company_research: 5.0  # llama3.1:8b or vllm model
    wizard_generate: 2.5   # same model family as cover_letter
  max_queue_depth: 500     # max pending tasks per type before drops (with logged warning)
```
- [ ] **Step 2: Verify the file is valid YAML**
```bash
conda run -n job-seeker python -c "import yaml; yaml.safe_load(open('config/llm.yaml.example'))"
```
Expected: no output (no error)
- [ ] **Step 3: Commit**
```bash
git add config/llm.yaml.example
git commit -m "docs(config): add scheduler VRAM budget config to llm.yaml.example"
```
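The override semantics the scheduler applies to these values (implemented in Task 4) are a plain key-by-key `dict.update` merge, sketched here with the default budgets:

```python
# Key-by-key merge: config values override defaults; unlisted types keep theirs.
DEFAULT_VRAM_BUDGETS = {"cover_letter": 2.5, "company_research": 5.0, "wizard_generate": 2.5}
overrides = {"cover_letter": 9.9}  # e.g. parsed from scheduler.vram_budgets

budgets = dict(DEFAULT_VRAM_BUDGETS)
budgets.update(overrides)
assert budgets["cover_letter"] == 9.9       # overridden
assert budgets["company_research"] == 5.0   # default retained
```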
---
### Task 3: Create `scripts/task_scheduler.py` skeleton
Establishes the module with constants, `TaskSpec`, and an empty `TaskScheduler` class. Subsequent tasks fill in the implementation method by method under TDD.
**Files:**
- Create: `scripts/task_scheduler.py`
- [ ] **Step 1: Create the skeleton file**
Create `scripts/task_scheduler.py`:
```python
# scripts/task_scheduler.py
"""Resource-aware batch scheduler for LLM background tasks.

Routes LLM task types through per-type deques with VRAM-aware scheduling.
Non-LLM tasks bypass this module — routing lives in scripts/task_runner.py.

Public API:
    LLM_TASK_TYPES    — set of task type strings routed through the scheduler
    get_scheduler()   — lazy singleton accessor
    reset_scheduler() — test teardown only
"""
import logging
import sqlite3
import threading
from collections import deque, namedtuple
from pathlib import Path
from typing import Callable, Optional

# Module-level import so tests can monkeypatch scripts.task_scheduler._get_gpus
try:
    from scripts.preflight import get_gpus as _get_gpus
except Exception:  # graceful degradation if preflight unavailable
    _get_gpus = lambda: []

logger = logging.getLogger(__name__)

# Task types that go through the scheduler (all others spawn free threads)
LLM_TASK_TYPES: frozenset[str] = frozenset({
    "cover_letter",
    "company_research",
    "wizard_generate",
})

# Conservative peak VRAM estimates (GB) per task type.
# Overridable per-install via scheduler.vram_budgets in config/llm.yaml.
DEFAULT_VRAM_BUDGETS: dict[str, float] = {
    "cover_letter": 2.5,      # alex-cover-writer:latest (~2GB GGUF + headroom)
    "company_research": 5.0,  # llama3.1:8b or vllm model
    "wizard_generate": 2.5,   # same model family as cover_letter
}

# Lightweight task descriptor stored in per-type deques
TaskSpec = namedtuple("TaskSpec", ["id", "job_id", "params"])


class TaskScheduler:
    """Resource-aware LLM task batch scheduler. Use get_scheduler() — not direct construction."""
    pass


# ── Singleton ─────────────────────────────────────────────────────────────────
_scheduler: Optional[TaskScheduler] = None
_scheduler_lock = threading.Lock()


def get_scheduler(db_path: Path, run_task_fn: Callable = None) -> TaskScheduler:
    """Return the process-level TaskScheduler singleton, constructing it if needed.

    run_task_fn is required on the first call (when the singleton is constructed);
    ignored on subsequent calls. Pass scripts.task_runner._run_task.
    """
    raise NotImplementedError


def reset_scheduler() -> None:
    """Shut down and clear the singleton. TEST TEARDOWN ONLY — not for production use."""
    raise NotImplementedError
```
- [ ] **Step 2: Verify the module imports cleanly**
```bash
conda run -n job-seeker python -c "from scripts.task_scheduler import LLM_TASK_TYPES, TaskSpec, TaskScheduler; print('ok')"
```
Expected: `ok`
- [ ] **Step 3: Commit**
```bash
git add scripts/task_scheduler.py
git commit -m "feat(scheduler): add task_scheduler.py skeleton with constants and TaskSpec"
```
---
## Chunk 2: Scheduler Core
Tasks 4–6. Implements `TaskScheduler` method by method under TDD: `__init__`, `enqueue()`, and the scheduler loop with batch workers.
---
### Task 4: `TaskScheduler.__init__()` — budget loading and VRAM detection
**Files:**
- Modify: `scripts/task_scheduler.py` (replace `pass` in class body)
- Modify: `tests/test_task_scheduler.py` (add tests)
- [ ] **Step 1: Add failing tests**
Append to `tests/test_task_scheduler.py`:
```python
from scripts.task_scheduler import (
    TaskScheduler, LLM_TASK_TYPES, DEFAULT_VRAM_BUDGETS,
    get_scheduler, reset_scheduler,
)


def _noop_run_task(*args, **kwargs):
    """Stand-in for _run_task that does nothing."""
    pass


@pytest.fixture(autouse=True)
def clean_scheduler():
    """Reset singleton between every test."""
    yield
    try:
        reset_scheduler()
    except NotImplementedError:
        pass  # still a stub until Task 7 implements reset_scheduler()


def test_default_budgets_used_when_no_config(tmp_db):
    """Scheduler falls back to DEFAULT_VRAM_BUDGETS when config key absent."""
    s = TaskScheduler(tmp_db, _noop_run_task)
    assert s._budgets == DEFAULT_VRAM_BUDGETS


def test_config_budgets_override_defaults(tmp_db):
    """Values in llm.yaml scheduler.vram_budgets override defaults."""
    config_dir = tmp_db.parent.parent / "config"
    config_dir.mkdir(parents=True, exist_ok=True)
    # Two-space nesting matters: cover_letter must be a child of
    # vram_budgets, not its sibling, for the override to apply.
    (config_dir / "llm.yaml").write_text(
        "scheduler:\n  vram_budgets:\n    cover_letter: 9.9\n"
    )
    s = TaskScheduler(tmp_db, _noop_run_task)
    assert s._budgets["cover_letter"] == 9.9
    # Non-overridden keys still use defaults
    assert s._budgets["company_research"] == DEFAULT_VRAM_BUDGETS["company_research"]


def test_missing_budget_logs_warning(tmp_db, caplog):
    """A type in LLM_TASK_TYPES with no budget entry logs a warning."""
    import logging
    from scripts import task_scheduler as ts

    # Temporarily rebind the module-level frozenset with an extra budget-less type
    original = frozenset(LLM_TASK_TYPES)
    ts.LLM_TASK_TYPES = frozenset(LLM_TASK_TYPES | {"orphan_type"})
    try:
        with caplog.at_level(logging.WARNING, logger="scripts.task_scheduler"):
            s = TaskScheduler(tmp_db, _noop_run_task)
        assert any("orphan_type" in r.message for r in caplog.records)
    finally:
        ts.LLM_TASK_TYPES = original


def test_cpu_only_system_gets_unlimited_vram(tmp_db, monkeypatch):
    """_available_vram is 999.0 when _get_gpus() returns empty list."""
    # Patch the module-level _get_gpus in task_scheduler (not preflight)
    # so __init__'s _ts_mod._get_gpus() call picks up the mock.
    monkeypatch.setattr("scripts.task_scheduler._get_gpus", lambda: [])
    s = TaskScheduler(tmp_db, _noop_run_task)
    assert s._available_vram == 999.0


def test_gpu_vram_summed_across_all_gpus(tmp_db, monkeypatch):
    """_available_vram sums vram_total_gb across all detected GPUs."""
    fake_gpus = [
        {"name": "RTX 3090", "vram_total_gb": 24.0, "vram_free_gb": 20.0},
        {"name": "RTX 3090", "vram_total_gb": 24.0, "vram_free_gb": 18.0},
    ]
    monkeypatch.setattr("scripts.task_scheduler._get_gpus", lambda: fake_gpus)
    s = TaskScheduler(tmp_db, _noop_run_task)
    assert s._available_vram == 48.0
```
- [ ] **Step 2: Run to confirm failures**
```bash
/devl/miniconda3/envs/job-seeker/bin/pytest tests/test_task_scheduler.py -v -k "budget or vram or warning"
```
Expected: multiple failures — `TaskScheduler.__init__` not implemented yet
- [ ] **Step 3: Implement `__init__`**
Replace `pass` in the `TaskScheduler` class with:
```python
    def __init__(self, db_path: Path, run_task_fn: Callable) -> None:
        self._db_path = db_path
        self._run_task = run_task_fn
        self._lock = threading.Lock()
        self._wake = threading.Event()
        self._stop = threading.Event()
        self._queues: dict[str, deque] = {}
        self._active: dict[str, threading.Thread] = {}
        self._reserved_vram: float = 0.0
        self._thread: Optional[threading.Thread] = None

        # Load VRAM budgets: defaults + optional config overrides
        self._budgets: dict[str, float] = dict(DEFAULT_VRAM_BUDGETS)
        config_path = db_path.parent.parent / "config" / "llm.yaml"
        self._max_queue_depth: int = 500
        if config_path.exists():
            try:
                import yaml
                with open(config_path) as f:
                    cfg = yaml.safe_load(f) or {}
                sched_cfg = cfg.get("scheduler", {})
                self._budgets.update(sched_cfg.get("vram_budgets", {}))
                self._max_queue_depth = sched_cfg.get("max_queue_depth", 500)
            except Exception as exc:
                logger.warning("Failed to load scheduler config from %s: %s", config_path, exc)

        # Warn on LLM types with no budget entry after merge
        for t in LLM_TASK_TYPES:
            if t not in self._budgets:
                logger.warning(
                    "No VRAM budget defined for LLM task type %r — "
                    "defaulting to 0.0 GB (unlimited concurrency for this type)", t
                )

        # Detect total GPU VRAM; fall back to unlimited (999) on CPU-only systems.
        # Uses module-level _get_gpus so tests can monkeypatch scripts.task_scheduler._get_gpus.
        try:
            from scripts import task_scheduler as _ts_mod
            gpus = _ts_mod._get_gpus()
            self._available_vram: float = (
                sum(g["vram_total_gb"] for g in gpus) if gpus else 999.0
            )
        except Exception:
            self._available_vram = 999.0
```
- [ ] **Step 4: Run tests to confirm they pass**
```bash
/devl/miniconda3/envs/job-seeker/bin/pytest tests/test_task_scheduler.py -v -k "budget or vram or warning"
```
Expected: 5 passed
- [ ] **Step 5: Commit**
```bash
git add scripts/task_scheduler.py tests/test_task_scheduler.py
git commit -m "feat(scheduler): implement TaskScheduler.__init__ with budget loading and VRAM detection"
```
---
### Task 5: `TaskScheduler.enqueue()` — depth guard and ghost-row cleanup
**Files:**
- Modify: `scripts/task_scheduler.py` (add `enqueue` method)
- Modify: `tests/test_task_scheduler.py` (add tests)
- [ ] **Step 1: Add failing tests**
Append to `tests/test_task_scheduler.py`:
```python
def test_enqueue_adds_taskspec_to_deque(tmp_db):
    """enqueue() appends a TaskSpec to the correct per-type deque."""
    s = TaskScheduler(tmp_db, _noop_run_task)
    s.enqueue(1, "cover_letter", 10, None)
    s.enqueue(2, "cover_letter", 11, '{"key": "val"}')
    assert len(s._queues["cover_letter"]) == 2
    assert s._queues["cover_letter"][0].id == 1
    assert s._queues["cover_letter"][1].id == 2


def test_enqueue_wakes_scheduler(tmp_db):
    """enqueue() sets the _wake event so the scheduler loop re-evaluates."""
    s = TaskScheduler(tmp_db, _noop_run_task)
    assert not s._wake.is_set()
    s.enqueue(1, "cover_letter", 10, None)
    assert s._wake.is_set()


def test_max_queue_depth_marks_task_failed(tmp_db):
    """When queue is at max_queue_depth, dropped task is marked failed in DB."""
    from scripts.db import insert_task
    from scripts.task_scheduler import TaskSpec

    s = TaskScheduler(tmp_db, _noop_run_task)
    s._max_queue_depth = 2
    # Fill the queue to the limit via direct deque manipulation (no DB rows needed)
    s._queues.setdefault("cover_letter", deque())
    s._queues["cover_letter"].append(TaskSpec(99, 1, None))
    s._queues["cover_letter"].append(TaskSpec(100, 2, None))
    # Insert a real DB row for the task we're about to drop
    task_id, _ = insert_task(tmp_db, "cover_letter", 3)
    # This enqueue should be rejected and the DB row marked failed
    s.enqueue(task_id, "cover_letter", 3, None)

    conn = sqlite3.connect(tmp_db)
    row = conn.execute(
        "SELECT status, error FROM background_tasks WHERE id=?", (task_id,)
    ).fetchone()
    conn.close()
    assert row[0] == "failed"
    assert "depth" in row[1].lower()
    # Queue length unchanged
    assert len(s._queues["cover_letter"]) == 2


def test_max_queue_depth_logs_warning(tmp_db, caplog):
    """Queue depth overflow logs a WARNING."""
    import logging
    from scripts.db import insert_task

    s = TaskScheduler(tmp_db, _noop_run_task)
    s._max_queue_depth = 0  # immediately at limit
    task_id, _ = insert_task(tmp_db, "cover_letter", 1)
    with caplog.at_level(logging.WARNING, logger="scripts.task_scheduler"):
        s.enqueue(task_id, "cover_letter", 1, None)
    assert any("depth" in r.message.lower() for r in caplog.records)
```
- [ ] **Step 2: Run to confirm failures**
```bash
/devl/miniconda3/envs/job-seeker/bin/pytest tests/test_task_scheduler.py -v -k "enqueue or depth"
```
Expected: failures — `enqueue` not defined
- [ ] **Step 3: Implement `enqueue()`**
Add method to `TaskScheduler` (after `__init__`):
```python
    def enqueue(self, task_id: int, task_type: str, job_id: int,
                params: Optional[str]) -> None:
        """Add an LLM task to the scheduler queue.

        If the queue for this type is at max_queue_depth, the task is marked
        failed in SQLite immediately (no ghost queued rows) and a warning is logged.
        """
        from scripts.db import update_task_status
        with self._lock:
            q = self._queues.setdefault(task_type, deque())
            if len(q) >= self._max_queue_depth:
                logger.warning(
                    "Queue depth limit reached for %s (max=%d) — task %d dropped",
                    task_type, self._max_queue_depth, task_id,
                )
                update_task_status(self._db_path, task_id, "failed",
                                   error="Queue depth limit reached")
                return
            q.append(TaskSpec(task_id, job_id, params))
        self._wake.set()
```
- [ ] **Step 4: Run tests to confirm they pass**
```bash
/devl/miniconda3/envs/job-seeker/bin/pytest tests/test_task_scheduler.py -v -k "enqueue or depth"
```
Expected: 4 passed
- [ ] **Step 5: Commit**
```bash
git add scripts/task_scheduler.py tests/test_task_scheduler.py
git commit -m "feat(scheduler): implement enqueue() with depth guard and ghost-row cleanup"
```
---
### Task 6: Scheduler loop, batch worker, `start()`, and `shutdown()`
The core execution engine. The scheduler loop picks the deepest eligible queue and starts a serial batch worker for it.
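The selection rule can be distilled into a pure function (a hypothetical sketch for illustration, not the method this task adds; the real loop also skips types that already have an active batch):

```python
def pick_next_type(queues, budgets, reserved, available):
    """Deepest non-empty queue whose VRAM budget still fits (illustration only)."""
    eligible = [
        t for t, q in queues.items()
        if q and reserved + budgets.get(t, 0.0) <= available
    ]
    return max(eligible, key=lambda t: len(queues[t]), default=None)

# With 3.0 GB available: cover_letter (2.5 GB, depth 3) beats company_research (5.0 GB, depth 1)
```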
**Files:**
- Modify: `scripts/task_scheduler.py` (add `start`, `shutdown`, `_scheduler_loop`, `_batch_worker`)
- Modify: `tests/test_task_scheduler.py` (add threading tests)
- [ ] **Step 1: Add failing tests**
Append to `tests/test_task_scheduler.py`:
```python
# ── Threading helpers ─────────────────────────────────────────────────────────
def _make_recording_run_task(log: list, done_event: threading.Event, expected: int):
    """Returns a mock _run_task that records (task_id, task_type) and sets done when expected count reached."""
    def _run(db_path, task_id, task_type, job_id, params):
        log.append((task_id, task_type))
        if len(log) >= expected:
            done_event.set()
    return _run


def _start_scheduler(tmp_db, run_task_fn, available_vram=999.0):
    s = TaskScheduler(tmp_db, run_task_fn)
    s._available_vram = available_vram
    s.start()
    return s


# ── Tests ─────────────────────────────────────────────────────────────────────
def test_deepest_queue_wins_first_slot(tmp_db):
    """Type with more queued tasks starts first when VRAM only fits one type."""
    log, done = [], threading.Event()
    # Build scheduler but DO NOT start it yet — enqueue all tasks first
    # so the scheduler sees the full picture on its very first wake.
    run_task_fn = _make_recording_run_task(log, done, 4)
    s = TaskScheduler(tmp_db, run_task_fn)
    s._available_vram = 3.0  # fits cover_letter (2.5) but not +company_research (5.0)
    # Enqueue cover_letter (3 tasks) and company_research (1 task) before start.
    # cover_letter has the deeper queue and must win the first batch slot.
    for i in range(3):
        s.enqueue(i + 1, "cover_letter", i + 1, None)
    s.enqueue(4, "company_research", 4, None)
    s.start()  # scheduler now sees all tasks atomically on its first iteration
    assert done.wait(timeout=5.0), "timed out — not all 4 tasks completed"
    s.shutdown()
    assert len(log) == 4
    cl = [i for i, (_, t) in enumerate(log) if t == "cover_letter"]
    cr = [i for i, (_, t) in enumerate(log) if t == "company_research"]
    assert len(cl) == 3 and len(cr) == 1
    assert max(cl) < min(cr), "All cover_letter tasks must finish before company_research starts"


def test_fifo_within_type(tmp_db):
    """Tasks of the same type execute in arrival (FIFO) order."""
    log, done = [], threading.Event()
    s = _start_scheduler(tmp_db, _make_recording_run_task(log, done, 3))
    for task_id in [10, 20, 30]:
        s.enqueue(task_id, "cover_letter", task_id, None)
    assert done.wait(timeout=5.0), "timed out — not all 3 tasks completed"
    s.shutdown()
    assert [task_id for task_id, _ in log] == [10, 20, 30]


def test_concurrent_batches_when_vram_allows(tmp_db):
    """Two type batches start simultaneously when VRAM fits both."""
    started = {"cover_letter": threading.Event(), "company_research": threading.Event()}
    all_done = threading.Event()
    log = []

    def run_task(db_path, task_id, task_type, job_id, params):
        started[task_type].set()
        log.append(task_type)
        if len(log) >= 2:
            all_done.set()

    # VRAM=10.0 fits both cover_letter (2.5) and company_research (5.0) simultaneously
    s = _start_scheduler(tmp_db, run_task, available_vram=10.0)
    s.enqueue(1, "cover_letter", 1, None)
    s.enqueue(2, "company_research", 2, None)
    all_done.wait(timeout=5.0)
    s.shutdown()
    # Both types should have started (possibly overlapping)
    assert started["cover_letter"].is_set()
    assert started["company_research"].is_set()


def test_new_tasks_picked_up_mid_batch(tmp_db):
    """A task enqueued while a batch is running is consumed in the same batch."""
    log, done = [], threading.Event()
    task1_started = threading.Event()  # fires when task 1 begins executing
    task2_ready = threading.Event()    # fires when task 2 has been enqueued

    def run_task(db_path, task_id, task_type, job_id, params):
        if task_id == 1:
            task1_started.set()            # signal: task 1 is now running
            task2_ready.wait(timeout=2.0)  # wait for task 2 to be in the deque
        log.append(task_id)
        if len(log) >= 2:
            done.set()

    s = _start_scheduler(tmp_db, run_task)
    s.enqueue(1, "cover_letter", 1, None)
    task1_started.wait(timeout=2.0)  # wait until task 1 is actually executing
    s.enqueue(2, "cover_letter", 2, None)
    task2_ready.set()  # unblock task 1 so it finishes
    assert done.wait(timeout=5.0), "timed out — task 2 never picked up mid-batch"
    s.shutdown()
    assert log == [1, 2]


def test_worker_crash_releases_vram(tmp_db):
    """If _run_task raises, _reserved_vram returns to 0 and scheduler continues."""
    log, done = [], threading.Event()

    def run_task(db_path, task_id, task_type, job_id, params):
        if task_id == 1:
            raise RuntimeError("simulated failure")
        log.append(task_id)
        done.set()

    s = _start_scheduler(tmp_db, run_task, available_vram=3.0)
    s.enqueue(1, "cover_letter", 1, None)
    s.enqueue(2, "cover_letter", 2, None)
    assert done.wait(timeout=5.0), "timed out — task 2 never completed after task 1 crash"
    s.shutdown()
    # Second task still ran, VRAM was released
    assert 2 in log
    assert s._reserved_vram == 0.0
```
- [ ] **Step 2: Run to confirm failures**
```bash
/devl/miniconda3/envs/job-seeker/bin/pytest tests/test_task_scheduler.py -v -k "deepest or fifo or concurrent or mid_batch or crash"
```
Expected: failures — `start`, `shutdown` not defined
- [ ] **Step 3: Implement `start()`, `shutdown()`, `_scheduler_loop()`, `_batch_worker()`**
Add these methods to `TaskScheduler`:
```python
    def start(self) -> None:
        """Start the background scheduler loop thread. Call once after construction."""
        self._thread = threading.Thread(
            target=self._scheduler_loop, name="task-scheduler", daemon=True
        )
        self._thread.start()

    def shutdown(self, timeout: float = 5.0) -> None:
        """Signal the scheduler to stop and wait for it to exit."""
        self._stop.set()
        self._wake.set()  # unblock any wait()
        if self._thread and self._thread.is_alive():
            self._thread.join(timeout=timeout)

    def _scheduler_loop(self) -> None:
        """Main scheduler daemon — wakes on enqueue or batch completion."""
        while not self._stop.is_set():
            self._wake.wait(timeout=30)
            self._wake.clear()
            with self._lock:
                # Defense in depth: reap externally-killed batch threads.
                # In normal operation _active.pop() runs in finally before _wake fires,
                # so this reap finds nothing — no double-decrement risk.
                for t, thread in list(self._active.items()):
                    if not thread.is_alive():
                        self._reserved_vram -= self._budgets.get(t, 0.0)
                        del self._active[t]
                # Start new type batches while VRAM allows
                candidates = sorted(
                    [t for t in self._queues if self._queues[t] and t not in self._active],
                    key=lambda t: len(self._queues[t]),
                    reverse=True,
                )
                for task_type in candidates:
                    budget = self._budgets.get(task_type, 0.0)
                    if self._reserved_vram + budget <= self._available_vram:
                        thread = threading.Thread(
                            target=self._batch_worker,
                            args=(task_type,),
                            name=f"batch-{task_type}",
                            daemon=True,
                        )
                        self._active[task_type] = thread
                        self._reserved_vram += budget
                        thread.start()

    def _batch_worker(self, task_type: str) -> None:
        """Serial consumer for one task type. Runs until the type's deque is empty."""
        try:
            while True:
                with self._lock:
                    q = self._queues.get(task_type)
                    if not q:
                        break
                    task = q.popleft()
                # _run_task is scripts.task_runner._run_task (passed at construction);
                # runs outside the lock so enqueue() can add tasks mid-batch.
                self._run_task(
                    self._db_path, task.id, task_type, task.job_id, task.params
                )
        finally:
            # Always release — even if _run_task raises.
            # _active.pop here prevents the scheduler loop reap from double-decrementing.
            with self._lock:
                self._active.pop(task_type, None)
                self._reserved_vram -= self._budgets.get(task_type, 0.0)
            self._wake.set()
```
- [ ] **Step 4: Run tests to confirm they pass**
```bash
/devl/miniconda3/envs/job-seeker/bin/pytest tests/test_task_scheduler.py -v -k "deepest or fifo or concurrent or mid_batch or crash"
```
Expected: 5 passed
- [ ] **Step 5: Run all scheduler tests so far**
```bash
/devl/miniconda3/envs/job-seeker/bin/pytest tests/test_task_scheduler.py -v
```
Expected: all passing (no regressions)
- [ ] **Step 6: Commit**
```bash
git add scripts/task_scheduler.py tests/test_task_scheduler.py
git commit -m "feat(scheduler): implement scheduler loop and batch worker with VRAM-aware scheduling"
```
---
## Chunk 3: Integration
Tasks 7–11. Singleton, durability, `submit_task` routing shim, `app.py` startup change, and full-suite verification.
---
### Task 7: Singleton — `get_scheduler()` and `reset_scheduler()`
**Files:**
- Modify: `scripts/task_scheduler.py` (implement the two functions)
- Modify: `tests/test_task_scheduler.py` (add tests)
- [ ] **Step 1: Add failing tests**
Append to `tests/test_task_scheduler.py`:
```python
def test_get_scheduler_returns_singleton(tmp_db):
    """Multiple calls to get_scheduler() return the same instance."""
    s1 = get_scheduler(tmp_db, _noop_run_task)
    s2 = get_scheduler(tmp_db, _noop_run_task)
    assert s1 is s2


def test_singleton_thread_safe(tmp_db):
    """Concurrent get_scheduler() calls produce exactly one instance."""
    instances = []
    errors = []

    def _get():
        try:
            instances.append(get_scheduler(tmp_db, _noop_run_task))
        except Exception as e:
            errors.append(e)

    threads = [threading.Thread(target=_get) for _ in range(20)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    assert not errors
    assert len(set(id(s) for s in instances)) == 1  # all the same object


def test_reset_scheduler_cleans_up(tmp_db):
    """reset_scheduler() shuts down the scheduler; no threads linger."""
    s = get_scheduler(tmp_db, _noop_run_task)
    thread = s._thread
    assert thread.is_alive()
    reset_scheduler()
    thread.join(timeout=2.0)
    assert not thread.is_alive()
    # After reset, get_scheduler creates a fresh instance
    s2 = get_scheduler(tmp_db, _noop_run_task)
    assert s2 is not s
```
- [ ] **Step 2: Run to confirm failures**
```bash
/devl/miniconda3/envs/job-seeker/bin/pytest tests/test_task_scheduler.py -v -k "singleton or reset_scheduler"
```
Expected: failures — `get_scheduler` / `reset_scheduler` raise `NotImplementedError`
- [ ] **Step 3: Implement `get_scheduler()` and `reset_scheduler()`**
Replace the `raise NotImplementedError` stubs at the bottom of `scripts/task_scheduler.py`:
```python
def get_scheduler(db_path: Path, run_task_fn: Callable = None) -> TaskScheduler:
    """Return the process-level TaskScheduler singleton, constructing it if needed.

    run_task_fn is required on the first call; ignored on subsequent calls.
    Safety: inner lock + double-check prevents double-construction under races.
    The outer None check is a fast-path performance optimisation only.
    """
    global _scheduler
    if _scheduler is None:  # fast path — avoids lock on steady state
        with _scheduler_lock:
            if _scheduler is None:  # re-check under lock (double-checked locking)
                if run_task_fn is None:
                    raise ValueError("run_task_fn required on first get_scheduler() call")
                _scheduler = TaskScheduler(db_path, run_task_fn)
                _scheduler.start()
    return _scheduler


def reset_scheduler() -> None:
    """Shut down and clear the singleton. TEST TEARDOWN ONLY."""
    global _scheduler
    with _scheduler_lock:
        if _scheduler is not None:
            _scheduler.shutdown()
        _scheduler = None
```
- [ ] **Step 4: Run tests to confirm they pass**
```bash
/devl/miniconda3/envs/job-seeker/bin/pytest tests/test_task_scheduler.py -v -k "singleton or reset_scheduler"
```
Expected: 3 passed
- [ ] **Step 5: Commit**
```bash
git add scripts/task_scheduler.py tests/test_task_scheduler.py
git commit -m "feat(scheduler): implement thread-safe singleton get_scheduler/reset_scheduler"
```
---
### Task 8: Durability — re-queue surviving `queued` rows on startup
On construction, the scheduler loads pre-existing `queued` LLM tasks from SQLite into deques, so they execute after restart without user re-submission.
**Files:**
- Modify: `scripts/task_scheduler.py` (add durability query to `__init__`)
- Modify: `tests/test_task_scheduler.py` (add tests)
- [ ] **Step 1: Add failing tests**
Append to `tests/test_task_scheduler.py`:
```python
def test_durability_loads_queued_llm_tasks_on_startup(tmp_db):
    """Scheduler loads pre-existing queued LLM tasks into deques at construction."""
    from scripts.db import insert_task

    # Pre-insert queued rows simulating a prior run
    id1, _ = insert_task(tmp_db, "cover_letter", 1)
    id2, _ = insert_task(tmp_db, "company_research", 2)
    s = TaskScheduler(tmp_db, _noop_run_task)
    assert len(s._queues.get("cover_letter", [])) == 1
    assert s._queues["cover_letter"][0].id == id1
    assert len(s._queues.get("company_research", [])) == 1
    assert s._queues["company_research"][0].id == id2


def test_durability_excludes_non_llm_queued_tasks(tmp_db):
    """Non-LLM queued tasks are not loaded into the scheduler deques."""
    from scripts.db import insert_task

    insert_task(tmp_db, "discovery", 0)
    insert_task(tmp_db, "email_sync", 0)
    s = TaskScheduler(tmp_db, _noop_run_task)
    assert "discovery" not in s._queues or len(s._queues["discovery"]) == 0
    assert "email_sync" not in s._queues or len(s._queues["email_sync"]) == 0


def test_durability_preserves_fifo_order(tmp_db):
    """Queued tasks are loaded in created_at (FIFO) order."""
    conn = sqlite3.connect(tmp_db)
    # Insert with explicit timestamps to control order
    conn.execute(
        "INSERT INTO background_tasks (task_type, job_id, params, status, created_at)"
        " VALUES (?,?,?,?,?)", ("cover_letter", 1, None, "queued", "2026-01-01 10:00:00")
    )
    conn.execute(
        "INSERT INTO background_tasks (task_type, job_id, params, status, created_at)"
        " VALUES (?,?,?,?,?)", ("cover_letter", 2, None, "queued", "2026-01-01 09:00:00")
    )
    conn.commit()
    ids = [r[0] for r in conn.execute(
        "SELECT id FROM background_tasks ORDER BY created_at ASC"
    ).fetchall()]
    conn.close()
    s = TaskScheduler(tmp_db, _noop_run_task)
    loaded_ids = [t.id for t in s._queues["cover_letter"]]
    assert loaded_ids == ids
```
- [ ] **Step 2: Run to confirm failures**
```bash
/devl/miniconda3/envs/job-seeker/bin/pytest tests/test_task_scheduler.py -v -k "durability"
```
Expected: failures — deques empty on construction (durability not implemented yet)
- [ ] **Step 3: Add durability query to `__init__`**
Append to the end of `TaskScheduler.__init__()` (after VRAM detection):
```python
# Durability: reload surviving 'queued' LLM tasks from prior run
self._load_queued_tasks()
```
Add the private method to `TaskScheduler`:
```python
def _load_queued_tasks(self) -> None:
"""Load pre-existing queued LLM tasks from SQLite into deques (called once in __init__)."""
        llm_types = sorted(LLM_TASK_TYPES)  # sorted for a deterministic query-parameter order
placeholders = ",".join("?" * len(llm_types))
conn = sqlite3.connect(self._db_path)
rows = conn.execute(
f"SELECT id, task_type, job_id, params FROM background_tasks"
f" WHERE status='queued' AND task_type IN ({placeholders})"
f" ORDER BY created_at ASC",
llm_types,
).fetchall()
conn.close()
for row_id, task_type, job_id, params in rows:
q = self._queues.setdefault(task_type, deque())
q.append(TaskSpec(row_id, job_id, params))
if rows:
logger.info("Scheduler: resumed %d queued task(s) from prior run", len(rows))
```
- [ ] **Step 4: Run tests to confirm they pass**
```bash
/devl/miniconda3/envs/job-seeker/bin/pytest tests/test_task_scheduler.py -v -k "durability"
```
Expected: 3 passed
- [ ] **Step 5: Commit**
```bash
git add scripts/task_scheduler.py tests/test_task_scheduler.py
git commit -m "feat(scheduler): add durability — re-queue surviving LLM tasks on startup"
```
---
### Task 9: `submit_task()` routing shim in `task_runner.py`
Replaces the old spawn-per-task model with scheduler routing for LLM tasks while leaving non-LLM tasks unchanged.
**Files:**
- Modify: `scripts/task_runner.py` (`submit_task` function)
- Modify: `tests/test_task_scheduler.py` (add integration test)
- [ ] **Step 1: Add failing test**
Append to `tests/test_task_scheduler.py`:
```python
def test_non_llm_tasks_bypass_scheduler(tmp_db):
    """submit_task() for non-LLM types invokes _run_task directly, not enqueue()."""
    import time
    import unittest.mock as mock
    from scripts import task_runner
    # Initialize the singleton properly so submit_task routes correctly
    s = get_scheduler(tmp_db, _noop_run_task)
    run_task_calls = []
    enqueue_calls = []
    def recording_run_task(*args, **kwargs):
        run_task_calls.append(args[2])  # task_type is the 3rd positional arg
    def recording_enqueue(task_id, task_type, job_id, params):
        enqueue_calls.append(task_type)
    with mock.patch.object(task_runner, "_run_task", recording_run_task), \
         mock.patch.object(s, "enqueue", recording_enqueue):
        task_runner.submit_task(tmp_db, "discovery", 0)
        # _run_task is invoked on a daemon thread; poll briefly for the call to land
        for _ in range(100):
            if run_task_calls:
                break
            time.sleep(0.01)
    # discovery goes directly to _run_task; enqueue is never called
    assert run_task_calls == ["discovery"]
    assert "discovery" not in enqueue_calls
    # The scheduler deque is untouched
    assert "discovery" not in s._queues or len(s._queues["discovery"]) == 0

def test_llm_tasks_routed_to_scheduler(tmp_db):
    """submit_task() for LLM types calls enqueue(), not _run_task directly."""
    import unittest.mock as mock
    from scripts import task_runner
    s = get_scheduler(tmp_db, _noop_run_task)
    enqueue_calls = []
    original_enqueue = s.enqueue
    def recording_enqueue(*args, **kwargs):
        enqueue_calls.append(args[1])  # task_type is the 2nd positional arg
        return original_enqueue(*args, **kwargs)
    with mock.patch.object(s, "enqueue", recording_enqueue):
        task_runner.submit_task(tmp_db, "cover_letter", 1)
    assert "cover_letter" in enqueue_calls
```
- [ ] **Step 2: Run to confirm failures**
```bash
/devl/miniconda3/envs/job-seeker/bin/pytest tests/test_task_scheduler.py -v -k "bypass or routed"
```
Expected: failures — `submit_task` still spawns threads for all types
- [ ] **Step 3: Update `submit_task()` in `scripts/task_runner.py`**
Replace the existing `submit_task` function:
```python
def submit_task(db_path: Path = DEFAULT_DB, task_type: str = "",
                job_id: int | None = None,
                params: str | None = None) -> tuple[int, bool]:
    """Submit a background task.

    LLM task types (cover_letter, company_research, wizard_generate) are routed
    through the TaskScheduler for VRAM-aware batch scheduling.
    All other types spawn a free daemon thread as before.

    Returns (task_id, True) if a new task was queued.
    Returns (existing_id, False) if an identical task is already in-flight.
    """
task_id, is_new = insert_task(db_path, task_type, job_id or 0, params=params)
if is_new:
from scripts.task_scheduler import get_scheduler, LLM_TASK_TYPES
if task_type in LLM_TASK_TYPES:
get_scheduler(db_path, run_task_fn=_run_task).enqueue(
task_id, task_type, job_id or 0, params
)
else:
t = threading.Thread(
target=_run_task,
args=(db_path, task_id, task_type, job_id or 0, params),
daemon=True,
)
t.start()
return task_id, is_new
```
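The `(task_id, is_new)` tuple comes from `insert_task()` in `scripts/db.py`, which is outside this chunk. As a rough in-memory model of the dedup contract the docstring relies on (hypothetical names; the real helper dedups via SQLite, not a dict):

```python
# Hypothetical in-memory model of insert_task()'s dedup contract.
_inflight: dict[tuple[str, int], int] = {}  # (task_type, job_id) -> task_id
_next_id = 1

def insert_task_sketch(task_type: str, job_id: int) -> tuple[int, bool]:
    """Return (task_id, True) for a new task, (existing_id, False) for a duplicate."""
    global _next_id
    key = (task_type, job_id)
    if key in _inflight:
        return _inflight[key], False
    task_id, _next_id = _next_id, _next_id + 1
    _inflight[key] = task_id
    return task_id, True
```

Only when `is_new` is true does `submit_task()` enqueue or spawn anything; a duplicate submission returns the existing id and does no extra work.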
- [ ] **Step 4: Run tests to confirm they pass**
```bash
/devl/miniconda3/envs/job-seeker/bin/pytest tests/test_task_scheduler.py -v -k "bypass or routed"
```
Expected: 2 passed
- [ ] **Step 5: Commit**
```bash
git add scripts/task_runner.py tests/test_task_scheduler.py
git commit -m "feat(task_runner): route LLM tasks through scheduler in submit_task()"
```
---
### Task 10: `app.py` startup — replace inline SQL with `reset_running_tasks()`
Enables durability by leaving `queued` rows intact on restart.
**Files:**
- Modify: `app/app.py` (`_startup` function)
- [ ] **Step 1: Locate the exact lines to change in `app/app.py`**
The block to replace is inside `_startup()`. It looks like:
```python
conn.execute(
"UPDATE background_tasks SET status='failed', error='Interrupted by server restart',"
" finished_at=datetime('now') WHERE status IN ('queued','running')"
)
conn.commit()
```
- [ ] **Step 2: Replace the inline SQL block**
In `app/app.py`, find `_startup()`. At the start of the function body, **before** the existing `conn = sqlite3.connect(get_db_path())` block, add:
```python
# Reset only in-flight tasks — queued tasks survive for the scheduler to resume.
# MUST run before any submit_task() call in this function.
from scripts.db import reset_running_tasks
reset_running_tasks(get_db_path())
```
Then delete the inline SQL block and its `conn.commit()` call. Leave the `conn = sqlite3.connect(...)` that follows (used by the SearXNG re-queue logic) untouched.
The result should look like:
```python
@st.cache_resource
def _startup() -> None:
"""Runs exactly once per server lifetime (st.cache_resource).
1. Marks zombie tasks as failed.
2. Auto-queues re-runs for any research generated without SearXNG data,
if SearXNG is now reachable.
"""
# Reset only in-flight tasks — queued tasks survive for the scheduler to resume.
# MUST run before any submit_task() call in this function.
from scripts.db import reset_running_tasks
reset_running_tasks(get_db_path())
conn = sqlite3.connect(get_db_path())
# ... remainder of function unchanged ...
```
- [ ] **Step 3: Verify the app module has valid syntax**
```bash
conda run -n job-seeker python -m py_compile app/app.py && echo "syntax ok"
```
Expected: `syntax ok` (avoids executing Streamlit module-level code which would fail outside a server context)
- [ ] **Step 4: Commit**
```bash
git add app/app.py
git commit -m "feat(app): use reset_running_tasks() on startup to preserve queued tasks"
```
---
### Task 11: Full suite verification
Run the complete test suite against the baseline (pre-existing failure already documented in issue #12).
**Files:** none — verification only
- [ ] **Step 1: Run the full test suite excluding the known pre-existing failure**
```bash
/devl/miniconda3/envs/job-seeker/bin/pytest tests/ -v -k "not test_generate_calls_llm_router" 2>&1 | tail -10
```
Expected: `N passed` with zero failures. Any failure here is a regression introduced by this feature.
- [ ] **Step 1b: Confirm the pre-existing failure still exists (and only that one)**
```bash
/devl/miniconda3/envs/job-seeker/bin/pytest tests/ -v 2>&1 | grep -E "FAILED|passed|failed" | tail -5
```
Expected: exactly `1 failed` (the pre-existing `test_generate_calls_llm_router`, tracked in issue #12)
- [ ] **Step 2: Verify no regressions in task runner tests**
```bash
/devl/miniconda3/envs/job-seeker/bin/pytest tests/ -v -k "task_runner or task_scheduler" 2>&1 | tail -20
```
Expected: all passing
- [ ] **Step 3: Final commit — update branch with feature complete marker**
```bash
git commit --allow-empty -m "feat: LLM queue optimizer complete — closes #2

Resource-aware batch scheduler for LLM tasks:
- scripts/task_scheduler.py (new): TaskScheduler singleton with VRAM-aware
batch scheduling, durability, thread-safe singleton, memory safety
- scripts/task_runner.py: submit_task() routes LLM types through scheduler
- scripts/db.py: reset_running_tasks() for durable restart behavior
- app/app.py: _startup() preserves queued tasks on restart
- config/llm.yaml.example: scheduler VRAM budget config documented
- tests/test_task_scheduler.py (new): 13 tests covering all behaviors
Pre-existing failure: test_generate_calls_llm_router (issue #12, unrelated)"
```