
# Research Workflow Redesign

**Date:** 2026-02-22
**Status:** Approved

## Problem

The current `company_research.py` produces shallow output:
- Resume context is a hardcoded 2-sentence blurb — talking points aren't grounded in Alex's actual experience
- Search coverage is limited to CEO, HQ, LinkedIn, and a single generic news query
- Output has 4 sections; new data categories (tech stack, funding, culture, competitors) have nowhere to go
- No skills/keywords config to drive matching of Alex's experience against the job description (JD)

## Approach: Query Expansion + Parallel JSON Searches + Single LLM Pass

Run all searches (the existing sequential companyScraper lookups plus new parallel SearXNG JSON queries), aggregate the results into a structured context block, pre-select resume experiences by keyword score, and produce all expanded sections in a single LLM call.

---

## Design

### 1. Search Pipeline

**Phase 1 — companyScraper (unchanged, sequential)**
- CEO name, HQ address, LinkedIn URL

**Phase 1b — Parallel SearXNG JSON queries (new/expanded)**

Six queries run concurrently via daemon threads:

| Intent | Query pattern |
|---|---|
| Recent news/press | `"{company}" news 2025 2026` |
| Funding & investors | `"{company}" funding round investors Series valuation` |
| Tech stack | `"{company}" tech stack engineering technology platform` |
| Competitors | `"{company}" competitors alternatives vs market` |
| Culture / Glassdoor | `"{company}" glassdoor culture reviews employees` |
| CEO press (only if Phase 1 found a CEO) | `"{ceo}" "{company}"` |

Each returns 3–4 deduplicated snippets (title + content + URL), labeled by type.
Results are best-effort — any failed query is silently skipped.
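A minimal sketch of Phase 1b follows, assuming a SearXNG instance at a hypothetical `SEARXNG_URL` with JSON output enabled (`/search?q=...&format=json`, which returns a `results` list of objects with `title`, `content`, and `url`). The intent labels (`news`, `funding`, ...) and the function names are illustrative, not the actual `company_research.py` API, and "deduplicated" is interpreted here as deduplicated by URL. The fetch function is injectable so the threading and best-effort logic can be exercised without a network.

```python
from __future__ import annotations

import json
import threading
import time
import urllib.parse
import urllib.request
from typing import Callable, Optional

# Assumption: local SearXNG with "json" listed under search.formats in settings.yml.
SEARXNG_URL = "http://localhost:8888/search"

Snippet = dict[str, str]


def build_queries(company: str, ceo: Optional[str]) -> dict[str, str]:
    """Map each (hypothetical) intent label to its query string."""
    queries = {
        "news": f'"{company}" news 2025 2026',
        "funding": f'"{company}" funding round investors Series valuation',
        "tech": f'"{company}" tech stack engineering technology platform',
        "competitors": f'"{company}" competitors alternatives vs market',
        "culture": f'"{company}" glassdoor culture reviews employees',
    }
    if ceo:  # CEO press query only runs when Phase 1 found a CEO name
        queries["ceo_press"] = f'"{ceo}" "{company}"'
    return queries


def searxng_fetch(query: str) -> list[dict]:
    """Fetch raw results from the SearXNG JSON endpoint."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    with urllib.request.urlopen(f"{SEARXNG_URL}?{params}", timeout=10) as resp:
        return json.load(resp).get("results", [])


def run_searches(
    queries: dict[str, str],
    fetch: Callable[[str], list[dict]] = searxng_fetch,
    max_snippets: int = 4,
    timeout: float = 15.0,
) -> dict[str, list[Snippet]]:
    """Run all queries concurrently in daemon threads; return snippets keyed by label."""
    results: dict[str, list[Snippet]] = {}
    lock = threading.Lock()

    def worker(label: str, query: str) -> None:
        try:
            raw = fetch(query)
        except Exception:
            return  # best-effort: a failed query is skipped silently
        seen: set[str] = set()
        snippets: list[Snippet] = []
        for r in raw:
            url = r.get("url")
            if not url or url in seen:  # dedupe by URL
                continue
            seen.add(url)
            snippets.append({
                "title": r.get("title") or "",
                "content": r.get("content") or "",
                "url": url,
            })
            if len(snippets) == max_snippets:
                break
        with lock:
            results[label] = snippets

    threads = [threading.Thread(target=worker, args=item, daemon=True) for item in queries.items()]
    for t in threads:
        t.start()
    deadline = time.monotonic() + timeout
    for t in threads:
        # Threads that overrun the shared deadline are abandoned (daemon), not killed.
        t.join(max(0.0, deadline - time.monotonic()))
    with lock:
        return dict(results)
```

Daemon threads keep a hung query from blocking interpreter shutdown; the shared deadline bounds the total wait rather than the wait per thread.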

---

### 2. Resume Matching

**`config/resume_keywords.yaml`**: three categories, managed as tags in the Settings UI:

```yaml
skills:
  - Customer Success
  - Technical Account Management
  - Revenue Operations
  - Salesforce
  - Gainsight
  - data analysis
  - stakeholder management

domains:
  - B2B SaaS
  - enterprise software
  - security / compliance
  - post-sale lifecycle

keywords:
  - QBR
  - churn reduction
  - NRR / ARR
  - onboarding
  - renewal
  - executive sponsorship
  - VOC
```

**Matching logic:**
1. Case-insensitive substring check of every configured term (skills, domains, and keywords) against the JD text → `matched_keywords` list
2. Score each experience entry: count of matched keywords appearing in position title + responsibility bullets
3. Top 2 by score → included in prompt as full detail (position, company, period, all bullets)
4. Remaining entries → condensed one-liners ("Founder @ M3 Consulting, 2023–present")
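The four matching steps can be sketched as plain functions. This is illustrative only: the `Experience` shape and all function names are hypothetical, and the config is passed in as a dict (in practice it would come from `yaml.safe_load` on `config/resume_keywords.yaml`, using PyYAML).

```python
from __future__ import annotations

from dataclasses import dataclass, field

CATEGORIES = ("skills", "domains", "keywords")


@dataclass
class Experience:
    """Hypothetical shape of one resume entry."""
    position: str
    company: str
    period: str
    bullets: list[str] = field(default_factory=list)


def match_keywords(jd_text: str, config: dict[str, list[str]]) -> list[str]:
    """Step 1: case-insensitive substring check of every configured term against the JD."""
    jd = jd_text.lower()
    matched: list[str] = []
    for category in CATEGORIES:
        for term in config.get(category) or []:
            if term.lower() in jd and term not in matched:
                matched.append(term)
    return matched


def score_experience(exp: Experience, matched: list[str]) -> int:
    """Step 2: count matched terms appearing in the title or any bullet."""
    text = " ".join([exp.position, *exp.bullets]).lower()
    return sum(1 for term in matched if term.lower() in text)


def select_experiences(
    experiences: list[Experience], matched: list[str], top_n: int = 2
) -> tuple[list[Experience], list[Experience]]:
    """Steps 3-4: split into the top-N entries and the remainder."""
    # sorted() is stable even with reverse=True, so ties keep resume order.
    ranked = sorted(experiences, key=lambda e: score_experience(e, matched), reverse=True)
    return ranked[:top_n], ranked[top_n:]


def format_full(exp: Experience) -> str:
    """Full detail for a top entry: position, company, period, all bullets."""
    return "\n".join([f"{exp.position}, {exp.company} ({exp.period})"] + [f"- {b}" for b in exp.bullets])


def format_one_liner(exp: Experience) -> str:
    """Condensed form for the remaining entries."""
    return f"{exp.position} @ {exp.company}, {exp.period}"
```

With plain substring matching, slash-separated entries such as `NRR / ARR` only match if the JD contains that exact string; splitting them into separate terms is one option.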

**UpGuard NDA rule** (stated explicitly in the prompt): refer to UpGuard generically as "an enterprise security vendor"; name UpGuard directly only if the role has a strong security/compliance focus.

---

### 3. LLM Context Block Structure

```
## Role Context
{title} at {company}

## Job Description
{JD text, up to 2500 chars}

## Alex's Matched Experience
[Top 2 scored experience entries — full detail]

Also in Alex's background: [remaining entries as one-liners]

## Matched Skills & Keywords
Skills matching this JD: {matched_keywords joined}

## Live Company Data
- CEO: {name}
- HQ: {location}
- LinkedIn: {url}

## News & Press
[snippets]

## Funding & Investors
[snippets]

## Tech Stack
[snippets]

## Competitors
[snippets]

## Culture & Employee Signals
[snippets]
```
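One way to assemble this block is sketched below, assuming the search results arrive as a dict keyed by hypothetical intent labels and the experience entries are already formatted as strings. The `- title: content (url)` snippet format and the `(no results)` placeholder are assumptions; the template lists no heading for CEO-press results, so the sketch leaves them out.

```python
from __future__ import annotations

JD_CHAR_LIMIT = 2500

# Hypothetical intent labels mapped to the template's snippet headings.
SNIPPET_HEADINGS = {
    "news": "News & Press",
    "funding": "Funding & Investors",
    "tech": "Tech Stack",
    "competitors": "Competitors",
    "culture": "Culture & Employee Signals",
}


def format_snippets(snippets: list[dict[str, str]]) -> str:
    """Render snippets as bullets; placeholder text when a query returned nothing."""
    if not snippets:
        return "(no results)"
    return "\n".join(f"- {s['title']}: {s['content']} ({s['url']})" for s in snippets)


def build_context(
    title: str,
    company: str,
    jd_text: str,
    full_experience: list[str],
    one_liners: list[str],
    matched_keywords: list[str],
    live: dict[str, str],
    search_results: dict[str, list[dict[str, str]]],
) -> str:
    """Join the context sections in template order, truncating the JD."""
    parts = [
        f"## Role Context\n{title} at {company}",
        f"## Job Description\n{jd_text[:JD_CHAR_LIMIT]}",
        "## Alex's Matched Experience\n"
        + "\n\n".join(full_experience)
        + "\n\nAlso in Alex's background: "
        + "; ".join(one_liners),
        "## Matched Skills & Keywords\nSkills matching this JD: " + ", ".join(matched_keywords),
        "## Live Company Data\n" + "\n".join(f"- {k}: {v}" for k, v in live.items()),
    ]
    for label, heading in SNIPPET_HEADINGS.items():
        parts.append(f"## {heading}\n{format_snippets(search_results.get(label, []))}")
    return "\n\n".join(parts)
```

Emitting every heading even when empty keeps the prompt shape stable, so the LLM can say a category had no data instead of guessing.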

---

### 4. Output Sections (7, up from 4)

| Section header | Purpose |
|---|---|
| `## Company Overview` | What they do, business model, size/stage, market position |
| `## Leadership & Culture` | CEO background, leadership team, philosophy |
| `## Tech Stack & Product` | What they build, relevant technology, product direction |
| `## Funding & Market Position` | Stage, investors, recent rounds, competitor landscape |
| `## Recent Developments` | News, launches, pivots, exec moves |
| `## Red Flags & Watch-outs` | Culture issues, layoffs, exec departures, financial stress |
| `## Talking Points for Alex` | 5 role-matched, resume-grounded, UpGuard-aware talking points ready to speak aloud |

The talking-points prompt instructs the LLM to cite the specific matched experience by name, reference matched skills, apply the UpGuard NDA rule, and frame each point as a ready-to-speak sentence.
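Because all seven sections come back in one LLM response, they have to be split by header before storage or rendering. A minimal parser is sketched here, assuming the model emits each header exactly as listed in the table; the function names are hypothetical, and text before the first header is discarded.

```python
from __future__ import annotations

import re

SECTION_HEADERS = [
    "Company Overview",
    "Leadership & Culture",
    "Tech Stack & Product",
    "Funding & Market Position",
    "Recent Developments",
    "Red Flags & Watch-outs",
    "Talking Points for Alex",
]

# Matches "## Header" but not "### Subheader" (whitespace must follow "##").
_HEADER_RE = re.compile(r"^##\s+(.+?)\s*$")


def parse_sections(text: str) -> dict[str, str]:
    """Split LLM output into {header: body}; text before the first header is dropped."""
    sections: dict[str, str] = {}
    current: str | None = None
    buf: list[str] = []
    for line in text.splitlines():
        m = _HEADER_RE.match(line)
        if m:
            if current is not None:
                sections[current] = "\n".join(buf).strip()
            current, buf = m.group(1), []
        elif current is not None:
            buf.append(line)
    if current is not None:
        sections[current] = "\n".join(buf).strip()
    return sections


def missing_sections(sections: dict[str, str]) -> list[str]:
    """Expected headers the model did not produce."""
    return [h for h in SECTION_HEADERS if h not in sections]
```

Reporting missing headers (rather than failing) fits the best-effort style of the search phase; the raw response is still kept in `raw_output`.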

---

### 5. DB Schema Changes

Add columns to `company_research` table:

```sql
ALTER TABLE company_research ADD COLUMN tech_brief TEXT;
ALTER TABLE company_research ADD COLUMN funding_brief TEXT;
ALTER TABLE company_research ADD COLUMN competitors_brief TEXT;
ALTER TABLE company_research ADD COLUMN red_flags TEXT;
```

Existing columns (`company_brief`, `ceo_brief`, `talking_points`, `raw_output`) remain unchanged.
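Assuming `scripts/db.py` uses SQLite (the syntax above suggests it, but the document does not say), re-running `ALTER TABLE ... ADD COLUMN` on a database that already has the column raises an error. A hypothetical idempotent migration checks `PRAGMA table_info` first:

```python
import sqlite3

NEW_COLUMNS = ("tech_brief", "funding_brief", "competitors_brief", "red_flags")


def migrate_company_research(conn: sqlite3.Connection) -> list[str]:
    """Add any missing research columns; return the names actually added."""
    # Row index 1 of PRAGMA table_info is the column name.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(company_research)")}
    added = []
    for col in NEW_COLUMNS:
        if col not in existing:
            # Column names come from the fixed tuple above, so formatting them in is safe.
            conn.execute(f"ALTER TABLE company_research ADD COLUMN {col} TEXT")
            added.append(col)
    conn.commit()
    return added
```

Calling this on every startup is harmless once the columns exist, which avoids a separate migration step for existing databases.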

---

### 6. Settings UI — Skills & Keywords Tab

New tab in `app/pages/2_Settings.py`:
- One expander or subheader per category (Skills, Domains, Keywords)
- Tag chips rendered with `st.pills` or columns of `st.badge`-style buttons with ×
- Inline text input + Add button per category
- Each add/remove saves immediately to `config/resume_keywords.yaml`
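The add/remove/save behaviour can be kept out of the Streamlit page as plain helpers, sketched below. The function names and the case-insensitive duplicate check are assumptions; `save_config` uses PyYAML's `yaml.safe_dump`, imported lazily so the pure helpers work without it. A page callback would call `add_term` or `remove_term` and then `save_config` whenever the call returns `True`.

```python
from __future__ import annotations

from pathlib import Path

KEYWORDS_PATH = Path("config/resume_keywords.yaml")
CATEGORIES = ("skills", "domains", "keywords")


def add_term(config: dict[str, list[str]], category: str, term: str) -> bool:
    """Append a trimmed term unless it is blank or a case-insensitive duplicate."""
    term = term.strip()
    terms = config.setdefault(category, [])
    if term and term.lower() not in {t.lower() for t in terms}:
        terms.append(term)
        return True
    return False


def remove_term(config: dict[str, list[str]], category: str, term: str) -> bool:
    """Remove an exact term (as rendered on its chip); report whether anything changed."""
    terms = config.get(category, [])
    if term in terms:
        terms.remove(term)
        return True
    return False


def save_config(config: dict[str, list[str]], path: Path = KEYWORDS_PATH) -> None:
    """Write all three categories back to YAML in a fixed order."""
    import yaml  # PyYAML, imported lazily so the helpers above have no dependency

    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        yaml.safe_dump(
            {c: config.get(c, []) for c in CATEGORIES}, f, sort_keys=False, allow_unicode=True
        )
```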

---

### 7. Interview Prep UI Changes

`app/pages/6_Interview_Prep.py` — render new sections alongside existing ones:
- Tech Stack & Product (new panel)
- Funding & Market Position (new panel)
- Red Flags & Watch-outs (new panel, visually distinct — e.g. orange/amber)
- Talking Points promoted to top (most useful during a live call)

---

## Files Affected

| File | Change |
|---|---|
| `scripts/company_research.py` | Parallel search queries, resume matching, expanded prompt + sections |
| `scripts/db.py` | Add 4 new columns to `company_research`; update `save_research` / `get_research` |
| `config/resume_keywords.yaml` | New file |
| `config/resume_keywords.yaml.example` | New committed template |
| `app/pages/2_Settings.py` | New Skills & Keywords tab |
| `app/pages/6_Interview_Prep.py` | Render new sections |
| `tests/test_db.py` | Tests for new columns |
| `tests/test_company_research.py` | New test file for matching logic + section parsing |