peregrine/docs/plans/2026-02-22-research-workflow-design.md
pyr0ball c368c7a977 chore: seed Peregrine from personal job-seeker (pre-generalization)
App: Peregrine
Company: Circuit Forge LLC
Source: github.com/pyr0ball/job-seeker (personal fork, not linked)
2026-02-24 18:25:39 -08:00

Research Workflow Redesign

Date: 2026-02-22
Status: Approved

Problem

The current company_research.py produces shallow output:

  • Resume context is a hardcoded 2-sentence blurb — talking points aren't grounded in Meghan's actual experience
  • Search coverage is limited: CEO, HQ, LinkedIn, one generic news query
  • Output has 4 sections; new data categories (tech stack, funding, culture, competitors) have nowhere to go
  • No skills/keyword config to drive experience matching against the JD

Approach: Query Expansion + Parallel JSON Searches + Single LLM Pass

Run all searches (the sequential companyScraper pass plus the new parallel SearXNG JSON queries), aggregate the results into a structured context block, pre-select resume experiences by keyword score, and produce all expanded sections in a single LLM call.


Design

1. Search Pipeline

Phase 1 — companyScraper (unchanged, sequential)

  • CEO name, HQ address, LinkedIn URL

Phase 1b — Parallel SearXNG JSON queries (new/expanded)

Six queries run concurrently via daemon threads:

| Intent | Query pattern |
|---|---|
| Recent news/press | "{company}" news 2025 2026 |
| Funding & investors | "{company}" funding round investors Series valuation |
| Tech stack | "{company}" tech stack engineering technology platform |
| Competitors | "{company}" competitors alternatives vs market |
| Culture / Glassdoor | "{company}" glassdoor culture reviews employees |
| CEO press (if found) | "{ceo}" "{company}" |

Each returns 3–4 deduplicated snippets (title + content + URL), labeled by type. Results are best-effort: any failed query is silently skipped.
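
The fan-out described above could be sketched roughly as follows. This is a minimal illustration, not the actual company_research.py code: `build_queries`, `run_parallel_searches`, the label keys, and the injected `search_fn` (standing in for whatever function calls SearXNG and returns a list of result dicts) are all hypothetical names. Deduplicating by URL and the default cap of 4 snippets per query are assumptions.

```python
import threading

# Query templates copied from the table above; the dict keys are illustrative labels.
QUERY_PATTERNS = {
    "news": '"{company}" news 2025 2026',
    "funding": '"{company}" funding round investors Series valuation',
    "tech": '"{company}" tech stack engineering technology platform',
    "competitors": '"{company}" competitors alternatives vs market',
    "culture": '"{company}" glassdoor culture reviews employees',
}
CEO_PATTERN = '"{ceo}" "{company}"'


def build_queries(company, ceo=None):
    """Return {label: query}; the CEO query is added only if a CEO was found."""
    queries = {label: p.format(company=company) for label, p in QUERY_PATTERNS.items()}
    if ceo:
        queries["ceo_press"] = CEO_PATTERN.format(ceo=ceo, company=company)
    return queries


def run_parallel_searches(queries, search_fn, max_snippets=4, timeout=20.0):
    """Run every query in its own daemon thread and collect labeled snippets.

    search_fn(query) is assumed to return a list of dicts with
    "title", "content" and "url" keys.
    """
    results = {}
    lock = threading.Lock()

    def worker(label, query):
        try:
            hits = search_fn(query)
        except Exception:
            return  # best-effort: a failed query is silently skipped
        seen, snippets = set(), []
        for hit in hits:
            url = hit.get("url")
            if url in seen:  # assumption: deduplicate by URL
                continue
            seen.add(url)
            snippets.append({
                "type": label,
                "title": hit.get("title", ""),
                "content": hit.get("content", ""),
                "url": url,
            })
            if len(snippets) >= max_snippets:
                break
        with lock:
            results[label] = snippets

    threads = [
        threading.Thread(target=worker, args=(label, query), daemon=True)
        for label, query in queries.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout)  # a query still running after the timeout is left out
    with lock:
        return dict(results)
```

Passing `search_fn` in as a parameter keeps the threading logic testable without a live SearXNG instance.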


2. Resume Matching

config/resume_keywords.yaml — three categories, tag-managed via Settings UI:

skills:
  - Customer Success
  - Technical Account Management
  - Revenue Operations
  - Salesforce
  - Gainsight
  - data analysis
  - stakeholder management

domains:
  - B2B SaaS
  - enterprise software
  - security / compliance
  - post-sale lifecycle

keywords:
  - QBR
  - churn reduction
  - NRR / ARR
  - onboarding
  - renewal
  - executive sponsorship
  - VOC

Matching logic:

  1. Case-insensitive substring check of all keywords against JD text → matched_keywords list
  2. Score each experience entry: count of matched keywords appearing in position title + responsibility bullets
  3. Top 2 by score → included in prompt as full detail (position, company, period, all bullets)
  4. Remaining entries → condensed one-liners ("Founder @ M3 Consulting, 2023–present")
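
The four steps above can be sketched as follows. This is an illustrative outline only: the function names are hypothetical, and the experience-entry keys (`position`, `company`, `period`, `bullets`) are assumed from the fields listed in step 3, not taken from the real resume format.

```python
def match_keywords(jd_text, keywords):
    """Step 1: case-insensitive substring check of each keyword against the JD."""
    jd = jd_text.lower()
    return [kw for kw in keywords if kw.lower() in jd]


def score_experience(entry, matched_keywords):
    """Step 2: count matched keywords appearing in the title or the bullets."""
    haystack = " ".join([entry["position"], *entry["bullets"]]).lower()
    return sum(1 for kw in matched_keywords if kw.lower() in haystack)


def select_experiences(entries, matched_keywords, top_n=2):
    """Steps 3 and 4: full detail for the top scorers, one-liners for the rest."""
    # sorted() is stable, so entries with equal scores keep their resume order.
    ranked = sorted(
        entries,
        key=lambda e: score_experience(e, matched_keywords),
        reverse=True,
    )
    top = ranked[:top_n]
    one_liners = [
        f'{e["position"]} @ {e["company"]}, {e["period"]}' for e in ranked[top_n:]
    ]
    return top, one_liners
```

One consequence of plain substring matching is worth noting: a combined tag such as "NRR / ARR" only matches a JD that contains that exact string, so splitting such entries into separate tags may give better recall.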

UpGuard NDA rule (explicit in prompt): reference as "enterprise security vendor" in general; only name UpGuard directly if the role has a strong security/compliance focus.


3. LLM Context Block Structure

## Role Context
{title} at {company}

## Job Description
{JD text, up to 2500 chars}

## Meghan's Matched Experience
[Top 2 scored experience entries — full detail]

Also in Meghan's background: [remaining entries as one-liners]

## Matched Skills & Keywords
Skills matching this JD: {matched_keywords joined}

## Live Company Data
- CEO: {name}
- HQ: {location}
- LinkedIn: {url}

## News & Press
[snippets]

## Funding & Investors
[snippets]

## Tech Stack
[snippets]

## Competitors
[snippets]

## Culture & Employee Signals
[snippets]
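
Assembling this block might look like the sketch below. It is a rough illustration under several assumptions: `build_context` and `format_entry` are hypothetical names, the snippet dicts carry `title`, `content` and `url` keys, the company data uses `ceo`, `hq` and `linkedin` keys, and a section with no results is rendered as "(no results)" instead of being dropped. The template above has no slot for CEO press snippets, so the sketch leaves them out.

```python
# (snippet label, section heading) pairs in the order used by the template above.
SNIPPET_SECTIONS = [
    ("news", "News & Press"),
    ("funding", "Funding & Investors"),
    ("tech", "Tech Stack"),
    ("competitors", "Competitors"),
    ("culture", "Culture & Employee Signals"),
]


def format_entry(entry):
    """Render one experience entry in full detail (assumed keys)."""
    head = f'{entry["position"]}, {entry["company"]} ({entry["period"]})'
    return "\n".join([head] + [f"- {b}" for b in entry["bullets"]])


def build_context(title, company, jd_text, top_entries, one_liners,
                  matched_keywords, company_data, snippets, jd_limit=2500):
    """Join all sections into the single context string sent to the LLM."""
    experience = "\n\n".join(format_entry(e) for e in top_entries)
    background = "; ".join(one_liners)
    keywords = ", ".join(matched_keywords)
    ceo = company_data.get("ceo", "unknown")
    hq = company_data.get("hq", "unknown")
    linkedin = company_data.get("linkedin", "unknown")

    parts = [
        f"## Role Context\n{title} at {company}",
        f"## Job Description\n{jd_text[:jd_limit]}",  # JD capped at 2500 chars
        f"## Meghan's Matched Experience\n{experience}\n\n"
        f"Also in Meghan's background: {background}",
        f"## Matched Skills & Keywords\nSkills matching this JD: {keywords}",
        f"## Live Company Data\n- CEO: {ceo}\n- HQ: {hq}\n- LinkedIn: {linkedin}",
    ]
    for label, heading in SNIPPET_SECTIONS:
        lines = [
            f'- {s["title"]}: {s["content"]} ({s["url"]})'
            for s in snippets.get(label, [])
        ]
        body = "\n".join(lines) if lines else "(no results)"
        parts.append(f"## {heading}\n{body}")
    return "\n\n".join(parts)
```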

4. Output Sections (7, up from 4)

| Section header | Purpose |
|---|---|
| ## Company Overview | What they do, business model, size/stage, market position |
| ## Leadership & Culture | CEO background, leadership team, philosophy |
| ## Tech Stack & Product | What they build, relevant technology, product direction |
| ## Funding & Market Position | Stage, investors, recent rounds, competitor landscape |
| ## Recent Developments | News, launches, pivots, exec moves |
| ## Red Flags & Watch-outs | Culture issues, layoffs, exec departures, financial stress |
| ## Talking Points for Meghan | 5 role-matched, resume-grounded, UpGuard-aware talking points ready to speak aloud |

The talking-points prompt instructs the LLM to cite the specific matched experience by name, reference matched skills, apply the UpGuard NDA rule, and frame each point as a ready-to-speak sentence.
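
Splitting the LLM response back into these seven sections could work along the lines below. This is a sketch, not the existing parser: `parse_sections` and `missing_sections` are hypothetical names, and it assumes the model emits each header on its own line exactly as written in the table.

```python
import re

# The seven headers from the table above, without the leading "## ".
EXPECTED_SECTIONS = [
    "Company Overview",
    "Leadership & Culture",
    "Tech Stack & Product",
    "Funding & Market Position",
    "Recent Developments",
    "Red Flags & Watch-outs",
    "Talking Points for Meghan",
]

# Matches level-2 headers only; "### Sub" does not match because the
# character after "##" must be whitespace.
HEADER_RE = re.compile(r"^##\s+(.*\S)\s*$")


def parse_sections(raw_output):
    """Split raw LLM output into {header: body}; text before the first header is dropped."""
    sections = {}
    current = None
    for line in raw_output.splitlines():
        match = HEADER_RE.match(line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {header: "\n".join(body).strip() for header, body in sections.items()}


def missing_sections(sections):
    """List the expected headers that the model failed to produce."""
    return [h for h in EXPECTED_SECTIONS if h not in sections]
```

Checking for missing headers lets the caller fall back to the stored raw output when the model skips a section.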


5. DB Schema Changes

Add columns to company_research table:

ALTER TABLE company_research ADD COLUMN tech_brief TEXT;
ALTER TABLE company_research ADD COLUMN funding_brief TEXT;
ALTER TABLE company_research ADD COLUMN competitors_brief TEXT;
ALTER TABLE company_research ADD COLUMN red_flags TEXT;

Existing columns (company_brief, ceo_brief, talking_points, raw_output) unchanged.
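
Because ALTER TABLE ... ADD COLUMN fails when the column already exists, the migration is safer if it checks first. The sketch below assumes the database is SQLite accessed through Python's sqlite3 module (the document does not say which engine scripts/db.py uses), and `migrate_company_research` is a hypothetical helper name.

```python
import sqlite3

# The four columns introduced by this design.
NEW_COLUMNS = ["tech_brief", "funding_brief", "competitors_brief", "red_flags"]


def migrate_company_research(conn):
    """Add any missing columns; safe to run on every startup.

    Returns the list of columns that were actually added.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(company_research)")}
    added = []
    for column in NEW_COLUMNS:
        if column not in existing:
            # Column names come from the fixed list above, never from user input.
            conn.execute(f"ALTER TABLE company_research ADD COLUMN {column} TEXT")
            added.append(column)
    conn.commit()
    return added
```

Existing rows simply get NULL in the new columns, so get_research would need to tolerate missing values for research saved before the migration.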


6. Settings UI — Skills & Keywords Tab

New tab in app/pages/2_Settings.py:

  • One expander or subheader per category (Skills, Domains, Keywords)
  • Tag chips rendered with st.pills or columns of st.badge-style buttons with ×
  • Inline text input + Add button per category
  • Each add/remove saves immediately to config/resume_keywords.yaml
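
The add/remove behaviour behind the chips could be kept separate from the Streamlit widgets, roughly as sketched here. The helper names and the `save` callback (which would write the dict to config/resume_keywords.yaml) are hypothetical, and skipping case-insensitive duplicates is an assumption, not a stated requirement.

```python
CATEGORIES = ("skills", "domains", "keywords")


def add_tag(config, category, tag, save=None):
    """Return a new config with the tag appended; call save() only on change."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    tag = tag.strip()
    tags = list(config.get(category, []))
    # Assumption: ignore empty input and case-insensitive duplicates.
    if not tag or tag.lower() in (t.lower() for t in tags):
        return config
    updated = {**config, category: tags + [tag]}
    if save is not None:
        save(updated)  # persist immediately, as the design requires
    return updated


def remove_tag(config, category, tag, save=None):
    """Return a new config without the tag; call save() only on change."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    tags = list(config.get(category, []))
    if tag not in tags:
        return config
    updated = {**config, category: [t for t in tags if t != tag]}
    if save is not None:
        save(updated)
    return updated
```

Returning a new dict instead of mutating in place keeps the helpers easy to unit test and avoids surprises across Streamlit reruns.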

7. Interview Prep UI Changes

app/pages/6_Interview_Prep.py — render new sections alongside existing ones:

  • Tech Stack & Product (new panel)
  • Funding & Market Position (new panel)
  • Red Flags & Watch-outs (new panel, visually distinct — e.g. orange/amber)
  • Talking Points promoted to top (most useful during a live call)

Files Affected

| File | Change |
|---|---|
| scripts/company_research.py | Parallel search queries, resume matching, expanded prompt + sections |
| scripts/db.py | Add 4 new columns to company_research; update save_research / get_research |
| config/resume_keywords.yaml | New file |
| config/resume_keywords.yaml.example | New committed template |
| app/pages/2_Settings.py | New Skills & Keywords tab |
| app/pages/6_Interview_Prep.py | Render new sections |
| tests/test_db.py | Tests for new columns |
| tests/test_company_research.py | New test file for matching logic + section parsing |