[New Feature] Queue optimizer: batch LLM tasks by type on single-GPU/CPU systems #2

Closed
opened 2026-02-27 00:02:36 -08:00 by pyr0ball · 0 comments
Owner

Summary

On single-GPU and CPU-only systems, the background task queue processes jobs in arrival order regardless of task type. This forces the LLM to context-switch between the cover letter model and the research model repeatedly, wasting time on model loads/swaps.

Add a queue optimizer that batches tasks by type before dispatching — run all pending cover letter tasks first, then all pending research tasks (or vice versa), rather than round-robin by arrival.

Motivation

Model loading is the dominant latency on single-GPU systems:

  • meghan-cover-writer:latest (~2GB GGUF) — loaded for cover letter gen
  • Research model (llama3.1:8b or a vLLM-served model) — different weights

If a user approves 10 jobs at once, the current queue alternates: cover letter → research → cover letter → research → ..., forcing a model swap for each of the 20 tasks. With batching, each model is loaded once per batch — two loads total instead of twenty.

Design

Batch ordering strategy

After any task is enqueued, the optimizer reorders the background_tasks queue (status=queued) by:

  1. Group by task_type
  2. Order groups so the type with the most pending tasks runs first (greedy batching)
  3. Within each group, preserve arrival order (FIFO)
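A minimal sketch of this ordering strategy in pure Python (the `batch_order` helper and the tuple layout are assumptions for illustration, not existing code):

```python
from collections import Counter

def batch_order(tasks):
    """Reorder queued tasks: largest task_type group first (greedy
    batching), arrival order (FIFO) preserved within each group.
    Each task is a (task_id, task_type, created_at) tuple."""
    counts = Counter(t[1] for t in tasks)
    # Bigger groups sort first via the negated count; ties between
    # equal-sized groups break deterministically on task_type, and
    # created_at keeps FIFO order inside each group.
    return sorted(tasks, key=lambda t: (-counts[t[1]], t[1], t[2]))
```

With, say, three cover-letter tasks and one research task interleaved by arrival, all cover letters would dispatch first, in their original order, then the research task.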

Trigger

  • On any submit_task() call, reorder queued rows in SQLite
  • task_runner.py already pulls ORDER BY created_at ASC — change to ORDER BY batch_priority ASC, created_at ASC
  • Add batch_priority ephemeral column (or reorder by updating a sort_order column)

Fallback

  • If only one task type is in the queue, no reordering (existing behavior preserved)
  • On dual-GPU systems, the optimizer is disabled (both task types already run in parallel, one model per device)

Tier

Paid tier only. Free tier keeps arrival-order queue (simpler, no optimization overhead).

Files

  • scripts/task_runner.py — add _reorder_queue() called from submit_task()
  • app/db.py — possibly add sort_order column to background_tasks
  • tests/test_task_runner.py — new test file

Acceptance Criteria

  • Queuing N cover letter tasks + M research tasks (with N ≥ M, per the greedy ordering) processes all cover letters before any research task (on single-GPU)
  • Arrival order within each type is preserved
  • Dual-GPU systems bypass the optimizer
  • Free tier uses unmodified arrival-order queue
  • Existing background task tests still pass
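The first two criteria could be covered in tests/test_task_runner.py along these lines (the `batch_order` helper and dict-shaped tasks are assumptions standing in for the eventual implementation):

```python
from collections import Counter

def batch_order(tasks):
    # Hypothetical helper: greedy batching, FIFO within each type.
    counts = Counter(t["task_type"] for t in tasks)
    return sorted(
        tasks,
        key=lambda t: (-counts[t["task_type"]], t["task_type"], t["created_at"]),
    )

def test_cover_letters_batch_before_research():
    tasks = [
        {"task_type": "cover_letter", "created_at": i} for i in (0, 2, 4)
    ] + [{"task_type": "research", "created_at": i} for i in (1, 3)]
    ordered = batch_order(tasks)
    # All cover letters (the larger batch) dispatch before any research task...
    assert [t["task_type"] for t in ordered] == ["cover_letter"] * 3 + ["research"] * 2
    # ...and arrival order is preserved inside each group.
    assert [t["created_at"] for t in ordered] == [0, 2, 4, 1, 3]
```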
pyr0ball self-assigned this 2026-03-14 16:37:46 -07:00
pyr0ball added this to the The Menagerie project 2026-03-14 16:37:55 -07:00
pyr0ball added this to the Beta — Private Access milestone 2026-03-14 16:38:12 -07:00
pyr0ball added the
feature-request
label 2026-03-14 16:41:22 -07:00
Reference: Circuit-Forge/peregrine#2