Imitation pipeline: resume career summary generation #30
Context: Peregrine's resume parser uses regex as its primary extraction path but falls back to an LLM to generate a career summary when no summary section is detected in the document. This is a small, bounded generation task well-suited to a dedicated fine-tuned 1B-3B model.
What Peregrine uses this for:
After parse_resume() runs regex-based section extraction, structure_resume() checks whether a career_summary field was populated. If not (common for resumes that lack an explicit Summary or Objective section), it calls _llm_career_summary(), which sends the first 1500 characters of the raw resume text to the LLM and requests a 2-3 sentence professional career summary.
Input/output schema:
"Write a 2-3 sentence professional career summary for this candidate based on their resume. Return only the summary text, no labels.\n\nResume:\n{raw_text[:1500]}"
Current model/fallback chain:
Default LLMRouter(): no task-specific override; uses fallback_order from config/llm.yaml (typically claude_code → ollama → vllm → copilot → anthropic).
Recommended model domain:
Extraction + summarization, 1B-3B. The task is highly constrained: input is structured resume text, output is a fixed-format 2-3 sentence summary. This is the simplest LLM task in the Peregrine pipeline and the lowest bar for a fine-tuned replacement — a 1B model with 500-1000 examples should match or exceed the base model quality.
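For reference, the fallback flow and prompt described above can be sketched roughly as follows. This is a minimal illustration, not the actual code in scripts/resume_parser.py: the names build_summary_prompt and ensure_career_summary are hypothetical, and the complete callable stands in for whatever LLMRouter (or a fine-tuned replacement) exposes, since its real interface is not shown in this issue.

```python
from typing import Callable

# Prompt text copied from the input/output schema in this issue.
PROMPT_TEMPLATE = (
    "Write a 2-3 sentence professional career summary for this candidate "
    "based on their resume. Return only the summary text, no labels.\n\n"
    "Resume:\n{excerpt}"
)
MAX_CHARS = 1500  # only the first 1500 characters of raw text are sent


def build_summary_prompt(raw_text: str) -> str:
    """Hypothetical helper: fill the prompt with a truncated resume excerpt."""
    return PROMPT_TEMPLATE.format(excerpt=raw_text[:MAX_CHARS])


def ensure_career_summary(
    parsed: dict, raw_text: str, complete: Callable[[str], str]
) -> dict:
    """Hypothetical stand-in for the check made in structure_resume().

    `complete` is an assumed text-in/text-out callable; the model is only
    invoked when regex extraction left career_summary empty.
    """
    if parsed.get("career_summary"):
        return parsed  # regex path already found a summary; no LLM call
    result = dict(parsed)  # avoid mutating the caller's dict
    result["career_summary"] = complete(build_summary_prompt(raw_text)).strip()
    return result
```

Keeping the model behind a plain callable means a fine-tuned 1B-3B model could be swapped in for this one task without touching the rest of the fallback chain.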
Can Avocet produce training data for it?
Yes — straightforwardly. Any resume text + human-written or reviewed career summary is a valid training pair. Peregrine users who have an existing summary section provide implicit ground truth (resume text in → summary section out). Labeling effort is low: present the generated summary alongside the resume excerpt and ask for a thumbs up/down with optional edit.
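The implicit ground-truth idea above might be sketched like this. It is an illustrative sketch under stated assumptions: the record keys raw_text and career_summary, the helper names harvest_pairs and to_jsonl, and the JSONL output shape are all hypothetical, and the caller is assumed to pass only records whose summary came from regex extraction rather than from the LLM fallback.

```python
import json
from typing import Iterable, Iterator


def harvest_pairs(records: Iterable[dict], max_chars: int = 1500) -> Iterator[dict]:
    """Yield (input, output) training pairs from parsed resumes.

    Assumes each record has hypothetical keys "raw_text" and
    "career_summary", and that the summary was extracted, not generated.
    """
    for record in records:
        summary = (record.get("career_summary") or "").strip()
        raw_text = record.get("raw_text") or ""
        if not summary or not raw_text:
            continue  # nothing to learn from without both halves
        # Assumption: drop the summary text from the input so the model
        # learns to write a summary rather than copy one it was shown.
        excerpt = raw_text.replace(summary, "", 1).strip()[:max_chars]
        yield {"input": excerpt, "output": summary, "source": "extracted"}


def to_jsonl(pairs: Iterable[dict]) -> str:
    """Serialize pairs as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(pair, ensure_ascii=False) for pair in pairs)
```

Removing the summary from the input is a design choice made for this sketch; without it, resumes with a summary near the top would leak the target into the first 1500 characters.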
Suggested data collection approach:
When parse_resume() successfully extracts a summary section, that (resume text, summary) pair is a free training example requiring no human review.
Related: Peregrine scripts/resume_parser.py (_llm_career_summary, structure_resume)