pyr0ball c368c7a977 chore: seed Peregrine from personal job-seeker (pre-generalization)

App: Peregrine
Company: Circuit Forge LLC
Source: github.com/pyr0ball/job-seeker (personal fork, not linked)

2026-02-24 18:25:39 -08:00


Research Workflow Redesign

Date: 2026-02-22
Status: Approved

Problem

The current company_research.py produces shallow output:

- Resume context is a hardcoded 2-sentence blurb, so talking points aren't grounded in Meghan's actual experience
- Search coverage is limited to CEO, HQ, LinkedIn, and one generic news query
- Output has only 4 sections; new data categories (tech stack, funding, culture, competitors) have nowhere to go
- There is no skills/keyword config to drive experience matching against the JD

Approach: Query Expansion + Parallel JSON Searches + Single LLM Pass

Run all searches (companyScraper sequentially, plus the new SearXNG JSON queries in parallel), aggregate the results into a structured context block, pre-select resume experiences by keyword score, and produce all expanded sections in a single LLM call.

Design

1. Search Pipeline

Phase 1 — companyScraper (unchanged, sequential)

CEO name, HQ address, LinkedIn URL

Phase 1b — Parallel SearXNG JSON queries (new/expanded)

Six queries run concurrently via daemon threads:

| Intent | Query pattern |
| --- | --- |
| Recent news/press | `"{company}" news 2025 2026` |
| Funding & investors | `"{company}" funding round investors Series valuation` |
| Tech stack | `"{company}" tech stack engineering technology platform` |
| Competitors | `"{company}" competitors alternatives vs market` |
| Culture / Glassdoor | `"{company}" glassdoor culture reviews employees` |
| CEO press (if found) | `"{ceo}" "{company}"` |

Each returns 3–4 deduplicated snippets (title + content + URL), labeled by type. Results are best-effort — any failed query is silently skipped.
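The fan-out described above could be sketched roughly as follows. This is a minimal illustration, not the actual `company_research.py` code: the function names `build_queries` and `run_parallel`, the dictionary labels, and the injected `fetch` callable are all assumptions. A real `fetch` would call the SearXNG search endpoint with `format=json` and return its `results` list; it is passed in here so the sketch runs without a network.

```python
import threading
from typing import Callable, Dict, List, Optional

# Query templates copied from the table above; the dict keys are hypothetical labels.
QUERY_PATTERNS = {
    "news": '"{company}" news 2025 2026',
    "funding": '"{company}" funding round investors Series valuation',
    "tech": '"{company}" tech stack engineering technology platform',
    "competitors": '"{company}" competitors alternatives vs market',
    "culture": '"{company}" glassdoor culture reviews employees',
}
CEO_PATTERN = '"{ceo}" "{company}"'


def build_queries(company: str, ceo: Optional[str] = None) -> Dict[str, str]:
    """Expand the templates; the CEO query is added only if a CEO name was found."""
    queries = {label: p.format(company=company) for label, p in QUERY_PATTERNS.items()}
    if ceo:
        queries["ceo_press"] = CEO_PATTERN.format(ceo=ceo, company=company)
    return queries


def run_parallel(
    queries: Dict[str, str],
    fetch: Callable[[str], List[dict]],  # hypothetical: query string -> list of result dicts
    max_snippets: int = 4,
    timeout: float = 20.0,
) -> Dict[str, List[dict]]:
    """Run one daemon thread per query and collect deduplicated snippets by label."""
    results: Dict[str, List[dict]] = {}
    lock = threading.Lock()

    def worker(label: str, query: str) -> None:
        try:
            hits = fetch(query)
        except Exception:
            return  # best-effort: a failed query is silently skipped
        seen, snippets = set(), []
        for hit in hits:
            url = hit.get("url")
            if not url or url in seen:
                continue  # deduplicate by URL (an assumed dedup key)
            seen.add(url)
            snippets.append(
                {"title": hit.get("title", ""), "content": hit.get("content", ""), "url": url}
            )
            if len(snippets) >= max_snippets:
                break
        with lock:
            results[label] = snippets

    threads = [
        threading.Thread(target=worker, args=(label, query), daemon=True)
        for label, query in queries.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout)  # the timeout applies per thread, not to the whole batch
    with lock:
        return dict(results)
```

Because the threads are daemons and each `join` is bounded, a hung query cannot block the research run indefinitely; its label is simply absent from the returned dict.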

2. Resume Matching

config/resume_keywords.yaml — three categories, tag-managed via Settings UI:

skills:
  - Customer Success
  - Technical Account Management
  - Revenue Operations
  - Salesforce
  - Gainsight
  - data analysis
  - stakeholder management

domains:
  - B2B SaaS
  - enterprise software
  - security / compliance
  - post-sale lifecycle

keywords:
  - QBR
  - churn reduction
  - NRR / ARR
  - onboarding
  - renewal
  - executive sponsorship
  - VOC

Matching logic:

- Case-insensitive substring check of all keywords against JD text → matched_keywords list
- Score each experience entry: count of matched keywords appearing in position title + responsibility bullets
- Top 2 by score → included in prompt as full detail (position, company, period, all bullets)
- Remaining entries → condensed one-liners ("Founder @ M3 Consulting, 2023–present")
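The matching steps above can be sketched as a few small functions. This is an illustrative sketch only: the function names and the shape of a resume entry (a dict with `position`, `company`, `period`, and `bullets` keys) are assumptions, not the project's actual data model.

```python
from typing import Dict, List, Tuple


def match_keywords(keywords: List[str], jd_text: str) -> List[str]:
    """Case-insensitive substring check of every keyword against the JD text."""
    jd = jd_text.lower()
    return [kw for kw in keywords if kw.lower() in jd]


def score_experience(entry: Dict, matched: List[str]) -> int:
    """Count matched keywords that appear in the position title or any bullet."""
    haystack = " ".join([entry.get("position", "")] + list(entry.get("bullets", []))).lower()
    return sum(1 for kw in matched if kw.lower() in haystack)


def one_liner(entry: Dict) -> str:
    """Condensed form, e.g. 'Founder @ M3 Consulting, 2023–present'."""
    return f'{entry["position"]} @ {entry["company"]}, {entry["period"]}'


def select_experiences(
    entries: List[Dict], matched: List[str], top_n: int = 2
) -> Tuple[List[Dict], List[str]]:
    """Return (top entries in full detail, remaining entries as one-liners)."""
    # sorted() is stable, so ties keep their original resume order.
    ranked = sorted(entries, key=lambda e: score_experience(e, matched), reverse=True)
    return ranked[:top_n], [one_liner(e) for e in ranked[top_n:]]
```

Note that a plain substring check matches "QBR" inside "QBRs" but will also match short keywords inside unrelated words, which may be worth covering in the matching tests.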

UpGuard NDA rule (explicit in prompt): reference as "enterprise security vendor" in general; only name UpGuard directly if the role has a strong security/compliance focus.

3. LLM Context Block Structure

## Role Context
{title} at {company}

## Job Description
{JD text, up to 2500 chars}

## Meghan's Matched Experience
[Top 2 scored experience entries — full detail]

Also in Meghan's background: [remaining entries as one-liners]

## Matched Skills & Keywords
Skills matching this JD: {matched_keywords joined}

## Live Company Data
- CEO: {name}
- HQ: {location}
- LinkedIn: {url}

## News & Press
[snippets]

## Funding & Investors
[snippets]

## Tech Stack
[snippets]

## Competitors
[snippets]

## Culture & Employee Signals
[snippets]
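One way to assemble this block is sketched below. The function and parameter names, the snippet dictionary keys, and the `unknown` / `(none found)` placeholders for missing data are assumptions made for illustration; the design only fixes the section headings and the 2500-character JD limit.

```python
from typing import Dict, List

# (snippet key, section heading) pairs; the keys are hypothetical labels.
SNIPPET_SECTIONS = [
    ("news", "News & Press"),
    ("funding", "Funding & Investors"),
    ("tech", "Tech Stack"),
    ("competitors", "Competitors"),
    ("culture", "Culture & Employee Signals"),
]


def format_entry(entry: Dict) -> str:
    """Full-detail rendering of one experience entry."""
    header = f'{entry["position"]} @ {entry["company"]} ({entry["period"]})'
    return "\n".join([header] + [f"- {b}" for b in entry.get("bullets", [])])


def format_snippets(items: List[Dict]) -> str:
    """Render snippets as a bullet list; the empty placeholder is an assumption."""
    if not items:
        return "(none found)"
    return "\n".join(f'- {s["title"]}: {s["content"]} ({s["url"]})' for s in items)


def build_context(
    title: str,
    company: str,
    jd_text: str,
    top_entries: List[Dict],
    other_lines: List[str],
    matched: List[str],
    live: Dict[str, str],
    snippets: Dict[str, List[Dict]],
    jd_limit: int = 2500,
) -> str:
    """Assemble the context block in the section order shown above."""
    experience = "\n\n".join(format_entry(e) for e in top_entries)
    parts = [
        f"## Role Context\n{title} at {company}",
        f"## Job Description\n{jd_text[:jd_limit]}",
        f"## Meghan's Matched Experience\n{experience}\n\n"
        f"Also in Meghan's background: {'; '.join(other_lines)}",
        f"## Matched Skills & Keywords\nSkills matching this JD: {', '.join(matched)}",
        "## Live Company Data\n"
        f"- CEO: {live.get('ceo', 'unknown')}\n"
        f"- HQ: {live.get('hq', 'unknown')}\n"
        f"- LinkedIn: {live.get('linkedin', 'unknown')}",
    ]
    for key, heading in SNIPPET_SECTIONS:
        parts.append(f"## {heading}\n{format_snippets(snippets.get(key, []))}")
    return "\n\n".join(parts)
```

Keeping every heading present even when a search returned nothing gives the LLM a consistent structure and makes missing data explicit rather than silently absent.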

4. Output Sections (7, up from 4)

| Section header | Purpose |
| --- | --- |
| `## Company Overview` | What they do, business model, size/stage, market position |
| `## Leadership & Culture` | CEO background, leadership team, philosophy |
| `## Tech Stack & Product` | What they build, relevant technology, product direction |
| `## Funding & Market Position` | Stage, investors, recent rounds, competitor landscape |
| `## Recent Developments` | News, launches, pivots, exec moves |
| `## Red Flags & Watch-outs` | Culture issues, layoffs, exec departures, financial stress |
| `## Talking Points for Meghan` | 5 role-matched, resume-grounded, UpGuard-aware talking points ready to speak aloud |

The talking points prompt instructs the LLM to: cite the specific matched experience by name, reference matched skills, apply the UpGuard NDA rule, and frame each point as a ready-to-speak sentence.
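Since the single LLM call returns all seven sections in one response, the output has to be split back into sections before it can be stored. A minimal parsing sketch follows; `parse_sections` and `missing_sections` are hypothetical names, and it assumes the model emits the `## ` headers exactly as listed in the table.

```python
import re
from typing import Dict, List

# Section names taken from the table above.
EXPECTED_SECTIONS = [
    "Company Overview",
    "Leadership & Culture",
    "Tech Stack & Product",
    "Funding & Market Position",
    "Recent Developments",
    "Red Flags & Watch-outs",
    "Talking Points for Meghan",
]

# Matches level-2 headers only ("## Title"), not "###" sub-headers.
HEADER_RE = re.compile(r"^##\s+(.*\S)\s*$")


def parse_sections(llm_output: str) -> Dict[str, str]:
    """Split LLM output into {section name: body}; text before the first header is dropped."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in llm_output.splitlines():
        match = HEADER_RE.match(line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}


def missing_sections(parsed: Dict[str, str]) -> List[str]:
    """List expected sections the model failed to produce."""
    return [name for name in EXPECTED_SECTIONS if name not in parsed]
```

Checking `missing_sections` after parsing gives the caller a cheap way to detect a truncated or malformed response before anything is written to the database.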

5. DB Schema Changes

Add columns to company_research table:

ALTER TABLE company_research ADD COLUMN tech_brief TEXT;
ALTER TABLE company_research ADD COLUMN funding_brief TEXT;
ALTER TABLE company_research ADD COLUMN competitors_brief TEXT;
ALTER TABLE company_research ADD COLUMN red_flags TEXT;

Existing columns (company_brief, ceo_brief, talking_points, raw_output) unchanged.
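If the database is SQLite (an assumption; the design does not name the engine), a plain `ALTER TABLE ... ADD COLUMN` fails when the column already exists, so the migration is easier to re-run safely if it checks first. A minimal sketch, with `migrate_company_research` as a hypothetical helper name:

```python
import sqlite3
from typing import List

# Column names taken from the ALTER TABLE statements above.
NEW_COLUMNS = ["tech_brief", "funding_brief", "competitors_brief", "red_flags"]


def migrate_company_research(conn: sqlite3.Connection) -> List[str]:
    """Add any missing columns to company_research; return the ones that were added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(company_research)")}
    added = []
    for column in NEW_COLUMNS:
        if column not in existing:
            # Column names come from the fixed list above, never from user input.
            conn.execute(f"ALTER TABLE company_research ADD COLUMN {column} TEXT")
            added.append(column)
    conn.commit()
    return added
```

Running it twice is harmless: the second call finds all four columns present and returns an empty list.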

6. Settings UI — Skills & Keywords Tab

New tab in app/pages/2_Settings.py:

- One expander or subheader per category (Skills, Domains, Keywords)
- Tag chips rendered with st.pills or columns of st.badge-style buttons with ×
- Inline text input + Add button per category
- Each add/remove saves immediately to config/resume_keywords.yaml
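The add/remove behaviour behind the tab can be kept separate from the Streamlit widgets, which makes it testable without a running UI. The sketch below is an assumption about how that logic might look: `add_tag`, `remove_tag`, and the injected `save` callback (which would write the dict back to `config/resume_keywords.yaml`) are hypothetical names, and the case-insensitive duplicate check is a design choice not stated in the plan.

```python
from typing import Callable, Dict, List

# Category names taken from the YAML layout in section 2.
CATEGORIES = ("skills", "domains", "keywords")

Config = Dict[str, List[str]]


def add_tag(config: Config, category: str, tag: str, save: Callable[[Config], None]) -> bool:
    """Append a tag and persist immediately; blank or duplicate tags are ignored."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    tag = tag.strip()
    tags = config.setdefault(category, [])
    if not tag or tag.lower() in (t.lower() for t in tags):
        return False  # nothing changed, so nothing is saved
    tags.append(tag)
    save(config)
    return True


def remove_tag(config: Config, category: str, tag: str, save: Callable[[Config], None]) -> bool:
    """Remove a tag and persist immediately; returns False if the tag was not present."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    tags = config.get(category, [])
    if tag not in tags:
        return False
    tags.remove(tag)
    save(config)
    return True
```

The boolean return value lets the page decide whether to rerun or show a "tag already exists" hint without re-reading the file.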

7. Interview Prep UI Changes

app/pages/6_Interview_Prep.py — render new sections alongside existing ones:

- Tech Stack & Product (new panel)
- Funding & Market Position (new panel)
- Red Flags & Watch-outs (new panel, visually distinct, e.g. orange/amber)
- Talking Points promoted to top (most useful during a live call)

Files Affected

| File | Change |
| --- | --- |
| `scripts/company_research.py` | Parallel search queries, resume matching, expanded prompt + sections |
| `scripts/db.py` | Add 4 new columns to `company_research`; update `save_research` / `get_research` |
| `config/resume_keywords.yaml` | New file |
| `config/resume_keywords.yaml.example` | New committed template |
| `app/pages/2_Settings.py` | New Skills & Keywords tab |
| `app/pages/6_Interview_Prep.py` | Render new sections |
| `tests/test_db.py` | Tests for new columns |
| `tests/test_company_research.py` | New test file for matching logic + section parsing |
