[New Feature] Queue optimizer: batch LLM tasks by type on single-GPU/CPU systems #2
Reference: Circuit-Forge/peregrine#2
## Summary
On single-GPU and CPU-only systems, the background task queue processes jobs in arrival order regardless of task type. This forces the runtime to swap repeatedly between the cover letter model and the research model, wasting time on model loads.
Add a queue optimizer that batches tasks by type before dispatching — run all pending cover letter tasks first, then all pending research tasks (or vice versa), rather than round-robin by arrival.
## Motivation
Model loading is the dominant latency on single-GPU systems:

- `meghan-cover-writer:latest` (~2 GB GGUF) — loaded for cover letter generation
- `llama3.1:8b` (or a vLLM model) — different weights, loaded for research

If a user approves 10 jobs at once, the current queue alternates: cover letter → research → cover letter → research → ..., loading a model 20 times. The optimizer loads each model once per batch.
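The swap cost can be illustrated with a quick count of model loads under each ordering (a sketch; the task-type labels are illustrative, not the real queue values):

```python
from itertools import groupby

def model_loads(tasks):
    """Count model loads: a load happens whenever the task type
    differs from the previous one (each type uses different weights)."""
    return len([key for key, _ in groupby(tasks)])

# 10 approved jobs -> 10 cover letter tasks + 10 research tasks
arrival = ["cover_letter", "research"] * 10   # alternating by arrival order
batched = sorted(arrival)                     # grouped by task type

print(model_loads(arrival))  # 20 loads
print(model_loads(batched))  # 2 loads
```

Grouping by type turns 20 loads into 2, regardless of how the 10 jobs interleave on arrival.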
## Design
### Batch ordering strategy
After any task is enqueued, the optimizer reorders the `background_tasks` queue (status = `queued`) by `task_type`.

### Trigger

On each `submit_task()` call, reorder the queued rows in SQLite. `task_runner.py` already pulls with `ORDER BY created_at ASC`; change this to `ORDER BY batch_priority ASC, created_at ASC`, where `batch_priority` is an ephemeral column (or reorder by updating a `sort_order` column).

### Fallback
## Tier
Paid tier only. Free tier keeps arrival-order queue (simpler, no optimization overhead).
## Files

- `scripts/task_runner.py` — add `_reorder_queue()`, called from `submit_task()`
- `app/db.py` — possibly add a `sort_order` column to `background_tasks`
- `tests/test_task_runner.py` — new test file

## Acceptance Criteria
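As a starting point for `tests/test_task_runner.py`, a minimal sketch of the batching property the tests should pin down. `batch_order()` here is a pure-Python stand-in for the ordering `_reorder_queue()` should produce (the real tests would go through the SQLite queue); the guarantee is: group by `task_type`, keep arrival order within a type:

```python
# Hypothetical sketch for tests/test_task_runner.py; batch_order is a
# stand-in, not the real task_runner API.

def batch_order(tasks):
    # Stable sort by type: sorted() preserves arrival order among equal
    # keys, which is the within-batch FIFO guarantee we want.
    return sorted(tasks, key=lambda t: t["task_type"])

def test_groups_by_type_keeps_fifo_within_type():
    arrival = [
        {"id": 1, "task_type": "cover_letter"},
        {"id": 2, "task_type": "research"},
        {"id": 3, "task_type": "cover_letter"},
        {"id": 4, "task_type": "research"},
    ]
    assert [t["id"] for t in batch_order(arrival)] == [1, 3, 2, 4]

def test_empty_queue_is_noop():
    assert batch_order([]) == []
```

Both tests run under plain pytest discovery; the FIFO-within-type case is the one most likely to regress if the reorder is implemented with a non-stable sort.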