Imitation pipeline: culture-fit survey analysis #31

Open
opened 2026-04-10 22:39:06 -07:00 by pyr0ball · 0 comments
Owner

Context: Peregrine's Survey Assistant page (7_Survey.py) helps candidates answer employer culture-fit surveys by analyzing question text or a screenshot and recommending answers. This is a two-mode pipeline: vision (screenshot input) and text (pasted survey). Both are candidates for fine-tuned replacement.

What Peregrine uses this for:
The user pastes survey question text or uploads a screenshot. In Quick mode the model outputs a lettered answer recommendation with a one-sentence rationale per question (format: 1. B — reason). In Analysis mode it evaluates each answer option individually before making a recommendation. A system prompt frames the model as an advisor who knows the candidate values collaboration, communication, growth, and impact.

Input/output schema (text path):

  • System prompt: "You are a job application advisor helping a candidate answer a culture-fit survey. The candidate values collaborative teamwork, clear communication, growth, and impact. Choose answers that present them in the best professional light."
  • Input: raw survey text (questions + multiple-choice options as pasted)
  • Output Quick mode: "1. B — brief reason\n2. A — brief reason\n..." — one line per question
  • Output Analysis mode: per-option evaluation paragraphs + final recommendation
  • Fallback chain used: research_fallback_order from config/llm.yaml
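The Quick-mode output format above is rigid enough to parse mechanically, which is part of what makes it a good fine-tune target. A minimal sketch of such a parser follows; the regex and function name are illustrative, not taken from 7_Survey.py:

```python
import re

# One Quick-mode line looks like "1. B — brief reason" (em dash or hyphen).
QUICK_LINE = re.compile(r"^(\d+)\.\s*([A-Z])\s*[—-]\s*(.+)$")

def parse_quick_output(text: str) -> list[tuple[int, str, str]]:
    """Extract (question_number, answer_letter, rationale) from each line."""
    results = []
    for line in text.strip().splitlines():
        m = QUICK_LINE.match(line.strip())
        if m:
            results.append((int(m.group(1)), m.group(2), m.group(3).strip()))
    return results

print(parse_quick_output("1. B — values teamwork\n2. A — shows growth mindset"))
# → [(1, 'B', 'values teamwork'), (2, 'A', 'shows growth mindset')]
```

A parser like this doubles as a validation step when grading fine-tuned model outputs against the schema.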

Input/output schema (vision/screenshot path):

  • No system prompt; prompt is embedded in the user turn
  • Input: base64-encoded screenshot image
  • Output Quick mode: same "1. B — brief reason" format extracted from the screenshot
  • Output Analysis mode: same multi-paragraph structure
  • Fallback chain used: vision_fallback_order from config/llm.yaml (vision_service → claude_code → anthropic)
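Assembling the vision-path input can be sketched as follows. The message shape here is a generic assumption for illustration, not the exact payload 7_Survey.py builds; only the base64 encoding and the absence of a system prompt come from the schema above:

```python
import base64

def encode_screenshot(image_bytes: bytes) -> str:
    """Base64-encode raw screenshot bytes for transport in the user turn."""
    return base64.b64encode(image_bytes).decode("ascii")

def build_vision_turn(image_bytes: bytes, prompt: str) -> dict:
    # No system prompt on this path; the instruction rides in the user turn.
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image", "data": encode_screenshot(image_bytes)},
        ],
    }
```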

Current model/fallback chain:

  • Text: research_fallback_order (typically claude_code → vllm → ollama_research → ...)
  • Vision: vision_fallback_order (vision_service [moondream2] → claude_code → anthropic); non-vision backends skipped automatically when images are present
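The "non-vision backends skipped automatically" behavior can be sketched as a filter over the configured order. Backend names match config/llm.yaml; the capability set is an assumption for illustration:

```python
# Assumed capability map — which backends accept image inputs.
VISION_CAPABLE = {"vision_service", "claude_code", "anthropic"}

def select_backends(order: list[str], has_image: bool) -> list[str]:
    """Return the fallback order, dropping non-vision backends for image requests."""
    if not has_image:
        return list(order)
    return [b for b in order if b in VISION_CAPABLE]

text_chain = ["claude_code", "vllm", "ollama_research"]
print(select_backends(text_chain, has_image=True))
# → ['claude_code'] — vllm and ollama_research are skipped for image input
```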

Recommended model domain:

  • Text path: instruction-following + classification, 3B-7B. The task is constrained multiple-choice reasoning with a fixed output format. This is a strong fine-tune candidate because the output schema is rigid and the reasoning pattern is repetitive.
  • Vision path: multimodal OCR + reasoning; cannot be replicated by a text-only model. The moondream2 vision service already handles this; a fine-tuned multimodal model is a longer-term goal.

Can Avocet produce training data for it?
Yes for the text path. Avocet's label tool is well-suited: present the survey text + the model's recommendation and ask the labeler whether each answer choice is correct, with the option to override. The survey_responses table in Peregrine's staging.db already stores raw_input, llm_output, mode, and source per response — existing accepted outputs are silver labels.

Suggested data collection approach:

  • Silver labels: export survey_responses rows where source=text_paste and the user did not override the output (implicit acceptance signal)
  • Build an Avocet card that shows survey text + model answer recommendations; labeler marks each question answer as correct/incorrect and optionally selects the better answer
  • Quick mode and Analysis mode should be treated as separate fine-tune targets (different output schemas)
  • Vision path: defer fine-tuning; focus on text path first; moondream2 is already a reasonable baseline for screenshot OCR
  • Preference pairs: where the user did override a recommendation, the (prompt, override) pair is a natural DPO rejected/chosen pair
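The silver-label and preference-pair exports above can be sketched as two queries over survey_responses. Column names raw_input, llm_output, mode, and source come from this issue; the user_override column is an assumed addition representing the override signal:

```python
import sqlite3

def export_training_data(con: sqlite3.Connection):
    """Split survey_responses into silver SFT rows and DPO preference pairs."""
    # Silver labels: text-paste responses the user accepted as-is.
    silver = con.execute(
        "SELECT raw_input, llm_output, mode FROM survey_responses "
        "WHERE source = 'text_paste' AND user_override IS NULL"
    ).fetchall()
    # DPO pairs: where the user overrode, (llm_output, override) maps to
    # (rejected, chosen).
    pairs = con.execute(
        "SELECT raw_input, llm_output, user_override FROM survey_responses "
        "WHERE user_override IS NOT NULL"
    ).fetchall()
    silver_rows = [{"prompt": p, "completion": c, "mode": m} for p, c, m in silver]
    dpo_rows = [{"prompt": p, "rejected": r, "chosen": c} for p, r, c in pairs]
    return silver_rows, dpo_rows
```

Keeping the mode column on silver rows makes it easy to split Quick and Analysis into the separate fine-tune targets suggested above.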

Related: Peregrine app/pages/7_Survey.py; survey_responses table in staging.db

pyr0ball added the enhancement label 2026-04-10 22:39:06 -07:00

Reference: Circuit-Forge/avocet#31