[New Feature] Queue optimizer: batch LLM tasks by type on single-GPU/CPU systems #2

Closed
opened 2026-02-27 00:02:36 -08:00 by pyr0ball · 0 comments
Owner

Summary

On single-GPU and CPU-only systems, the background task queue processes jobs in arrival order regardless of task type. This forces the LLM to context-switch between the cover letter model and the research model repeatedly, wasting time on model loads/swaps.

Add a queue optimizer that batches tasks by type before dispatching — run all pending cover letter tasks first, then all pending research tasks (or vice versa), rather than round-robin by arrival.

Motivation

Model loading is the dominant latency on single-GPU systems:

  • meghan-cover-writer:latest (~2GB GGUF) — loaded for cover letter gen
  • Research model (llama3.1:8b or a vLLM-served model) — different weights

If a user approves 10 jobs at once, the current queue alternates: cover letter → research → cover letter → research → ..., forcing a model swap for each of the 20 tasks. With batching, each model is loaded once per batch — two loads total instead of twenty.

Design

Batch ordering strategy

After any task is enqueued, the optimizer reorders the background_tasks queue (status=queued) by:

  1. Group by task_type
  2. Order groups so the type with the most pending tasks runs first (greedy batching)
  3. Within each group, preserve arrival order (FIFO)
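A minimal sketch of this ordering strategy in pure Python (the `batch_order` helper and the tuple layout are assumptions for illustration, not existing code):

```python
from collections import Counter

def batch_order(tasks):
    """Reorder queued tasks: largest task_type group first (greedy
    batching), arrival order (FIFO) preserved within each group.
    Each task is a (task_id, task_type, created_at) tuple."""
    counts = Counter(t[1] for t in tasks)
    # Bigger groups sort first via the negated count; ties between
    # equal-sized groups break deterministically on task_type, and
    # created_at keeps FIFO order inside each group.
    return sorted(tasks, key=lambda t: (-counts[t[1]], t[1], t[2]))
```

With, say, three cover-letter tasks and one research task interleaved by arrival, all cover letters would dispatch first, in their original order, then the research task.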

Trigger

  • On any submit_task() call, reorder queued rows in SQLite
  • task_runner.py already pulls ORDER BY created_at ASC — change to ORDER BY batch_priority ASC, created_at ASC
  • Add batch_priority ephemeral column (or reorder by updating a sort_order column)

Fallback

  • If only one task type is in the queue, no reordering (existing behavior preserved)
  • On dual-GPU systems, the optimizer is disabled (both task types already run in parallel, one model per device)

Tier

Paid tier only. Free tier keeps arrival-order queue (simpler, no optimization overhead).

Files

  • scripts/task_runner.py — add _reorder_queue() called from submit_task()
  • app/db.py — possibly add sort_order column to background_tasks
  • tests/test_task_runner.py — new test file

Acceptance Criteria

  • Queuing N cover letter tasks + M research tasks (with N ≥ M, per the greedy ordering) processes all cover letters before any research task (on single-GPU)
  • Arrival order within each type is preserved
  • Dual-GPU systems bypass the optimizer
  • Free tier uses unmodified arrival-order queue
  • Existing background task tests still pass
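The first two criteria could be covered in tests/test_task_runner.py along these lines (the `batch_order` helper and dict-shaped tasks are assumptions standing in for the eventual implementation):

```python
from collections import Counter

def batch_order(tasks):
    # Hypothetical helper: greedy batching, FIFO within each type.
    counts = Counter(t["task_type"] for t in tasks)
    return sorted(
        tasks,
        key=lambda t: (-counts[t["task_type"]], t["task_type"], t["created_at"]),
    )

def test_cover_letters_batch_before_research():
    tasks = [
        {"task_type": "cover_letter", "created_at": i} for i in (0, 2, 4)
    ] + [{"task_type": "research", "created_at": i} for i in (1, 3)]
    ordered = batch_order(tasks)
    # All cover letters (the larger batch) dispatch before any research task...
    assert [t["task_type"] for t in ordered] == ["cover_letter"] * 3 + ["research"] * 2
    # ...and arrival order is preserved inside each group.
    assert [t["created_at"] for t in ordered] == [0, 2, 4, 1, 3]
```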
pyr0ball self-assigned this 2026-03-14 16:37:46 -07:00
pyr0ball added this to the The Menagerie project 2026-03-14 16:37:55 -07:00
pyr0ball added this to the Beta — Private Access milestone 2026-03-14 16:38:12 -07:00
pyr0ball added the
feature-request
label 2026-03-14 16:41:22 -07:00
Reference: Circuit-Forge/peregrine#2