survey/analyze blocks workers — needs async task queue #107


Closed

opened 2026-04-20 06:36:12 -07:00 by pyr0ball · 1 comment

pyr0ball commented

2026-04-20 06:36:12 -07:00

Owner

Problem

POST /api/jobs/:id/survey/analyze is the only LLM endpoint in Peregrine without an async task queue. It blocks its FastAPI worker thread for the full LLM inference duration.

Discovered during load testing: at 100 concurrent users with survey tasks at weight 2, ~22 greenlets were blocked on LLM inference at the same time. This exhausted all LLM backends and starved every async task (cover letter, research, ATS optimizer) into 90-second poll timeouts.

Evidence (load test spike 2026-04-20, 100 users, 10 min)

survey/analyze [text]    p50=120,000ms  (2 min median, fully blocking)
survey/analyze [visual]  23 x HTTP 500 (All LLM backends exhausted)
cover_letter/task [poll] 58 x 90s timeout (last=pending)
research/task [poll]     18 x 90s timeout (last=pending)

Fix

Mirror the pattern used by cover_letter, research, and resume_optimizer:

1. POST queues a survey_analyze background task, returns task_id
2. GET /survey/analyze/task polls: queued -> running -> done/failed
3. Result stored in survey_responses; frontend polls until done
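The submit-then-poll flow described above can be sketched roughly as follows. This is a minimal in-memory illustration using only the Python standard library, not Peregrine's actual code: `TaskStore`, `submit`, `poll`, `wait_until_finished`, and `fake_analyze` are hypothetical names, a dict stands in for the survey_responses storage, and the interval and timeout defaults are illustrative.

```python
import threading
import time
import uuid
from concurrent.futures import ThreadPoolExecutor


class TaskStore:
    """Hypothetical in-memory stand-in for a persistent task table."""

    def __init__(self, max_workers=2):
        self._lock = threading.Lock()
        self._tasks = {}  # task_id -> {"status", "result", "error"}
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def submit(self, fn, *args):
        """Record the task as queued and return its id without waiting."""
        task_id = uuid.uuid4().hex
        with self._lock:
            self._tasks[task_id] = {"status": "queued", "result": None, "error": None}
        self._pool.submit(self._run, task_id, fn, args)
        return task_id

    def _run(self, task_id, fn, args):
        # Runs on a pool thread, so the caller of submit() is never blocked.
        self._set(task_id, status="running")
        try:
            result = fn(*args)
        except Exception as exc:
            self._set(task_id, status="failed", error=str(exc))
        else:
            self._set(task_id, status="done", result=result)

    def _set(self, task_id, **fields):
        with self._lock:
            self._tasks[task_id].update(fields)

    def poll(self, task_id):
        """Return a snapshot of the task state (what a poll route would serve)."""
        with self._lock:
            return dict(self._tasks[task_id])


def wait_until_finished(store, task_id, interval=3.0, timeout=90.0):
    """Client-side loop: poll until done/failed or until the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = store.poll(task_id)
        if state["status"] in ("done", "failed"):
            return state
        time.sleep(interval)
    return store.poll(task_id)


def fake_analyze(question):
    """Placeholder for the slow LLM call (assumption: any blocking function)."""
    time.sleep(0.1)
    return {"question": question, "answer": "placeholder"}


store = TaskStore()
task_id = store.submit(fake_analyze, "How do you handle conflict?")
print(wait_until_finished(store, task_id, interval=0.05, timeout=5.0)["status"])
```

The point of the sketch is that `submit()` returns as soon as the task is recorded, so the request handler is free again while inference runs elsewhere; a bounded pool also caps how many inferences run at once.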

Impact

Free tier (text-mode quick analysis): fast, but still blocks under concurrency. Paid tier (detailed + visual, using a vision model): definitely needs queuing. The core job pipeline is unaffected, but all LLM endpoints degrade under real concurrency.


pyr0ball commented

2026-04-20 11:06:40 -07:00

Author

Owner

Fixed in commit 9101e71.

Moved POST /survey/analyze off the FastAPI worker thread by routing it through the LLM task queue (same pattern as cover_letter, company_research, resume_optimize):

scripts/survey_assistant.py — extracted prompt builders + run_survey_analyze()
scripts/task_scheduler.py — added survey_analyze to LLM_TASK_TYPES, 2.5 GB VRAM budget
scripts/task_runner.py — added elif branch, result stored as JSON
dev-api.py — sync endpoint → submit_task() + new poll route GET /survey/analyze/task
web/src/stores/survey.ts — analyze() now submits task and polls at 3s interval

A duplicate submission made while a task is still in flight attaches silently to the existing task (the same task_id is returned).
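One way the attach-to-existing behavior could work is sketched below. This is an assumption-laden illustration, not the code from the commit: `InFlightRegistry` and its methods are hypothetical, and keying on `(task_type, job_id)` is a guess at what identifies a duplicate.

```python
import threading
import uuid


class InFlightRegistry:
    """Hypothetical registry that maps an in-flight task key to its task_id."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}  # (task_type, job_id) -> task_id

    def submit(self, task_type, job_id):
        """Return (task_id, created). A duplicate gets the existing id back."""
        key = (task_type, job_id)
        with self._lock:
            existing = self._in_flight.get(key)
            if existing is not None:
                return existing, False  # attach silently to the running task
            task_id = uuid.uuid4().hex
            self._in_flight[key] = task_id
            return task_id, True

    def finish(self, task_type, job_id):
        """Clear the key once the task is done or failed, allowing a re-run."""
        with self._lock:
            self._in_flight.pop((task_type, job_id), None)


registry = InFlightRegistry()
first_id, first_created = registry.submit("survey_analyze", 42)
second_id, second_created = registry.submit("survey_analyze", 42)
print(first_id == second_id, first_created, second_created)
```

Checking and inserting under a single lock keeps two near-simultaneous submissions from both creating a task, so a double-click in the frontend costs one inference rather than two.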
