peregrine/docs/plans/2026-02-20-job-seeker-implementation.md
pyr0ball f11a38eb0b chore: seed Peregrine from personal job-seeker (pre-generalization)
App: Peregrine
Company: Circuit Forge LLC
Source: github.com/pyr0ball/job-seeker (personal fork, not linked)
2026-02-24 18:25:39 -08:00

# Job Seeker Platform — Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
**Goal:** Stand up a job discovery pipeline (JobSpy → Notion) with LLM routing, resume matching, and automated LinkedIn application support for Alex Rivera.
**Architecture:** JobSpy scrapes listings from multiple boards and pushes deduplicated results into a Notion database. A local LLM router with 5-backend fallback chain powers AIHawk's application answer generation. Resume Matcher scores each listing against Alex's resume and writes keyword gaps back to Notion.
**Tech Stack:** Python 3.12, conda env `job-seeker`, `python-jobspy`, `notion-client`, `openai` SDK, `anthropic` SDK, `pyyaml`, `pandas`, Resume-Matcher (cloned), Auto_Jobs_Applier_AIHawk (cloned), pytest, pytest-mock
**Priority order:** Discovery (Tasks 1-5) must be running before Match or AIHawk setup.
**Document storage rule:** Resumes and cover letters live in `/Library/Documents/JobSearch/` — never committed to this repo.
---
## Task 1: Conda Environment + Project Scaffold
**Files:**
- Create: `/devl/job-seeker/environment.yml`
- Create: `/devl/job-seeker/.gitignore`
- Create: `/devl/job-seeker/tests/__init__.py`
**Step 1: Write environment.yml**
```yaml
# /devl/job-seeker/environment.yml
name: job-seeker
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.12
  - pip
  - pip:
      - python-jobspy
      - notion-client
      - openai
      - anthropic
      - pyyaml
      - pandas
      - requests
      - pytest
      - pytest-mock
```
**Step 2: Create the conda env**
```bash
conda env create -f /devl/job-seeker/environment.yml
```
Expected: env `job-seeker` created with no errors.
**Step 3: Verify the env**
```bash
conda run -n job-seeker python -c "import jobspy, notion_client, openai, anthropic; print('all good')"
```
Expected: `all good`
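If the one-liner fails, it only reports the first import that breaks. A minimal sketch of a check that lists every missing module at once, assuming the import names below match the packages in `environment.yml` (the helper name `missing_modules` is illustrative, not part of the plan's scripts):

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names assumed for the packages installed in Step 1.
required = ["jobspy", "notion_client", "openai", "anthropic", "yaml", "pandas"]
missing = missing_modules(required)
print("all good" if not missing else "missing: " + ", ".join(missing))
```

Run it inside the env with `conda run -n job-seeker python <file>`.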
**Step 4: Write .gitignore**
```gitignore
# /devl/job-seeker/.gitignore
.env
# config/notion.yaml contains the Notion token
config/notion.yaml
__pycache__/
*.pyc
.pytest_cache/
output/
aihawk/
resume_matcher/
```
Note: `aihawk/` and `resume_matcher/` are cloned externally — don't commit them.
**Step 5: Create tests directory**
```bash
mkdir -p /devl/job-seeker/tests
touch /devl/job-seeker/tests/__init__.py
```
**Step 6: Commit**
```bash
cd /devl/job-seeker
git add environment.yml .gitignore tests/__init__.py
git commit -m "feat: add conda env spec and project scaffold"
```
---
## Task 2: Config Files
**Files:**
- Create: `config/search_profiles.yaml`
- Create: `config/llm.yaml`
- Create: `config/notion.yaml.example` (the real `notion.yaml` is gitignored)
**Step 1: Write search_profiles.yaml**
```yaml
# config/search_profiles.yaml
profiles:
  - name: cs_leadership
    titles:
      - "Customer Success Manager"
      - "Director of Customer Success"
      - "VP Customer Success"
      - "Head of Customer Success"
      - "Technical Account Manager"
      - "Revenue Operations Manager"
      - "Customer Experience Lead"
    locations:
      - "Remote"
      - "San Francisco Bay Area, CA"
    boards:
      - linkedin
      - indeed
      - glassdoor
      - zip_recruiter
    results_per_board: 25
    hours_old: 72
```
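For orientation, here is a rough sketch of how a profile like this expands into searches: one search per location, with all titles joined into a single quoted `OR` expression (the same expression `scripts/discover.py` builds in Task 5). The helper names `build_search_term` and `expand_profile` are hypothetical.

```python
def build_search_term(titles):
    """Quote each title and join with OR so a board can match any of them."""
    return " OR ".join(f'"{t}"' for t in titles)


def expand_profile(profile):
    """Return one (location, search_term) pair per location in the profile."""
    term = build_search_term(profile["titles"])
    return [(location, term) for location in profile["locations"]]


# Trimmed-down copy of the profile above, as a plain dict.
profile = {
    "name": "cs_leadership",
    "titles": ["Customer Success Manager", "Technical Account Manager"],
    "locations": ["Remote", "San Francisco Bay Area, CA"],
}

for location, term in expand_profile(profile):
    print(f"{location}: {term}")
```

Adding a title widens every search in the profile; adding a location adds one more scrape per run.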
**Step 2: Write llm.yaml**
```yaml
# config/llm.yaml
fallback_order:
  - claude_code
  - ollama
  - vllm
  - github_copilot
  - anthropic
backends:
  claude_code:
    type: openai_compat
    base_url: http://localhost:3009/v1
    model: claude-code-terminal
    api_key: "any"
  ollama:
    type: openai_compat
    base_url: http://localhost:11434/v1
    model: llama3.2
    api_key: "ollama"
  vllm:
    type: openai_compat
    base_url: http://localhost:8000/v1
    model: __auto__
    api_key: ""
  github_copilot:
    type: openai_compat
    base_url: http://localhost:3010/v1
    model: gpt-4o
    api_key: "any"
  anthropic:
    type: anthropic
    model: claude-sonnet-4-6
    api_key_env: ANTHROPIC_API_KEY
```
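A typo in `fallback_order` or a missing key would only surface when the router reaches that backend. A small sketch of an up-front check, assuming the config has already been loaded into a dict (for example with `yaml.safe_load`); `validate_llm_config` is a hypothetical helper, and the required keys are inferred from the two backend types shown above:

```python
def validate_llm_config(cfg):
    """Return a list of human-readable problems; an empty list means the config looks sane."""
    problems = []
    backends = cfg.get("backends", {})
    for name in cfg.get("fallback_order", []):
        backend = backends.get(name)
        if backend is None:
            problems.append(f"{name}: listed in fallback_order but not defined in backends")
            continue
        kind = backend.get("type")
        if kind == "openai_compat":
            required = ("base_url", "model")
        elif kind == "anthropic":
            required = ("model", "api_key_env")
        else:
            problems.append(f"{name}: unknown type {kind!r}")
            continue
        for key in required:
            if key not in backend:
                problems.append(f"{name}: missing {key}")
    return problems


# Two-backend excerpt of the config above, plus one deliberate typo.
cfg = {
    "fallback_order": ["ollama", "anthropic", "olama"],
    "backends": {
        "ollama": {"type": "openai_compat", "base_url": "http://localhost:11434/v1", "model": "llama3.2"},
        "anthropic": {"type": "anthropic", "model": "claude-sonnet-4-6", "api_key_env": "ANTHROPIC_API_KEY"},
    },
}
for problem in validate_llm_config(cfg):
    print(problem)
```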
**Step 3: Write notion.yaml.example**
```yaml
# config/notion.yaml.example
# Copy to config/notion.yaml and fill in your values.
# notion.yaml is gitignored — never commit it.
token: "secret_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
database_id: "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
```
**Step 4: Commit**
```bash
cd /devl/job-seeker
git add config/search_profiles.yaml config/llm.yaml config/notion.yaml.example
git commit -m "feat: add search profiles, LLM config, and Notion config template"
```
---
## Task 3: Create Notion Database
This task creates the Notion DB that all scripts write to. Do it once manually.
**Step 1: Open Notion and create a new database**
Create a full-page database called **"Alex's Job Search"** in whatever Notion workspace you use for tracking.
**Step 2: Add the required properties**
Delete the default properties and create exactly these (type matters):
| Property Name | Type |
|----------------|----------|
| Job Title | Title |
| Company | Text |
| Location | Text |
| Remote | Checkbox |
| URL | URL |
| Source | Select |
| Status | Select |
| Match Score | Number |
| Keyword Gaps | Text |
| Salary | Text |
| Date Found | Date |
| Notes | Text |
For the **Status** select, add these options in order:
`New`, `Reviewing`, `Applied`, `Interview`, `Offer`, `Rejected`
For the **Source** select, add:
`Linkedin`, `Indeed`, `Glassdoor`, `Zip_Recruiter`
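The odd capitalisation is deliberate: `push_to_notion` in Task 5 derives the option name from the board name with `str.title()`. A quick sketch that checks the board names from `search_profiles.yaml` against the options listed here (both lists are copied from this plan; `source_option` is an illustrative name):

```python
boards = ["linkedin", "indeed", "glassdoor", "zip_recruiter"]
select_options = {"Linkedin", "Indeed", "Glassdoor", "Zip_Recruiter"}


def source_option(site):
    """Map a board name to its Source option using the same str.title() rule as Task 5."""
    return str(site).title()


# Any board listed here would end up with an option name that is not defined above.
unmatched = [b for b in boards if source_option(b) not in select_options]
print(unmatched)
```

If a new board is added to a search profile, add its title-cased name to the **Source** select as well.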
**Step 3: Get the database ID**
Open the database as a full page. The URL will look like:
`https://www.notion.so/YourWorkspace/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX?v=...`
The 32-character hex string before the `?` is the database ID.
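If copying the ID by hand is error-prone, the extraction can be sketched as below. This assumes the URL shape shown above, with the ID as the last path segment; `database_id_from_url` is a hypothetical helper and the example URL is made up.

```python
import re
from urllib.parse import urlparse


def database_id_from_url(url):
    """Return the 32-character hex ID at the end of the URL path, or None if absent."""
    last_segment = urlparse(url).path.rstrip("/").split("/")[-1]
    # Dashes are dropped so a dashed UUID form is also accepted.
    match = re.search(r"([0-9a-f]{32})$", last_segment.replace("-", ""))
    return match.group(1) if match else None


example = "https://www.notion.so/YourWorkspace/0123456789abcdef0123456789abcdef?v=fedcba"
print(database_id_from_url(example))
```

The query string (`?v=...`) identifies the view, not the database, so it is ignored.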
**Step 4: Get your Notion integration token**
Go to https://www.notion.so/my-integrations → create integration (or use existing) →
copy the "Internal Integration Token" (starts with `secret_`).
Connect the integration to your database: open the database → `...` menu →
Add connections → select your integration.
**Step 5: Write config/notion.yaml**
```bash
cp /devl/job-seeker/config/notion.yaml.example /devl/job-seeker/config/notion.yaml
# Edit notion.yaml and fill in your token and database_id
```
**Step 6: Verify connection**
```bash
conda run -n job-seeker python3 -c "
from notion_client import Client
import yaml
cfg = yaml.safe_load(open('/devl/job-seeker/config/notion.yaml'))
n = Client(auth=cfg['token'])
db = n.databases.retrieve(cfg['database_id'])
print('Connected to:', db['title'][0]['plain_text'])
"
```
Expected: `Connected to: Alex's Job Search`
---
## Task 4: LLM Router
**Files:**
- Create: `scripts/llm_router.py`
- Create: `tests/test_llm_router.py`
**Step 1: Write the failing tests**
```python
# tests/test_llm_router.py
import pytest
from unittest.mock import patch, MagicMock
from pathlib import Path
import yaml

# Point tests at the real config
CONFIG_PATH = Path(__file__).parent.parent / "config" / "llm.yaml"


def test_config_loads():
    """Config file is valid YAML with required keys."""
    cfg = yaml.safe_load(CONFIG_PATH.read_text())
    assert "fallback_order" in cfg
    assert "backends" in cfg
    assert len(cfg["fallback_order"]) >= 1


def test_router_uses_first_reachable_backend(tmp_path):
    """Router skips unreachable backends and uses the first that responds."""
    from scripts.llm_router import LLMRouter

    router = LLMRouter(CONFIG_PATH)
    mock_response = MagicMock()
    mock_response.choices[0].message.content = "hello"
    with patch.object(router, "_is_reachable", side_effect=[False, True, True, True, True]), \
         patch("scripts.llm_router.OpenAI") as MockOpenAI:
        instance = MockOpenAI.return_value
        instance.chat.completions.create.return_value = mock_response
        # Also mock models.list for __auto__ case
        mock_model = MagicMock()
        mock_model.id = "test-model"
        instance.models.list.return_value.data = [mock_model]
        result = router.complete("say hello")
    assert result == "hello"


def test_router_raises_when_all_backends_fail(monkeypatch):
    """Router raises RuntimeError when every backend is unreachable or errors."""
    from scripts.llm_router import LLMRouter

    # Without this, a real ANTHROPIC_API_KEY in the shell would trigger a live API call.
    monkeypatch.delenv("ANTHROPIC_API_KEY", raising=False)
    router = LLMRouter(CONFIG_PATH)
    with patch.object(router, "_is_reachable", return_value=False):
        with pytest.raises(RuntimeError, match="All LLM backends exhausted"):
            router.complete("say hello")


def test_is_reachable_returns_false_on_connection_error():
    """_is_reachable returns False when the health endpoint is unreachable."""
    from scripts.llm_router import LLMRouter
    import requests

    router = LLMRouter(CONFIG_PATH)
    with patch("scripts.llm_router.requests.get", side_effect=requests.ConnectionError):
        result = router._is_reachable("http://localhost:9999/v1")
    assert result is False
```
**Step 2: Run tests to verify they fail**
```bash
cd /devl/job-seeker
conda run -n job-seeker pytest tests/test_llm_router.py -v
```
Expected: `ImportError`, because `scripts.llm_router` doesn't exist yet.
**Step 3: Create scripts/__init__.py**
```bash
touch /devl/job-seeker/scripts/__init__.py
```
**Step 4: Write scripts/llm_router.py**
```python
# scripts/llm_router.py
"""
LLM abstraction layer with priority fallback chain.
Reads config/llm.yaml. Tries backends in order; falls back on any error.
"""
import os
import yaml
import requests
from pathlib import Path
from openai import OpenAI

CONFIG_PATH = Path(__file__).parent.parent / "config" / "llm.yaml"


class LLMRouter:
    def __init__(self, config_path: Path = CONFIG_PATH):
        with open(config_path) as f:
            self.config = yaml.safe_load(f)

    def _is_reachable(self, base_url: str) -> bool:
        """Quick health-check ping. Returns True if backend is up."""
        health_url = base_url.rstrip("/").removesuffix("/v1") + "/health"
        try:
            resp = requests.get(health_url, timeout=2)
            return resp.status_code < 500
        except Exception:
            return False

    def _resolve_model(self, client: OpenAI, model: str) -> str:
        """Resolve __auto__ to the first model served by vLLM."""
        if model != "__auto__":
            return model
        models = client.models.list()
        return models.data[0].id

    def complete(self, prompt: str, system: str | None = None) -> str:
        """
        Generate a completion. Tries each backend in fallback_order.
        Raises RuntimeError if all backends are exhausted.
        """
        for name in self.config["fallback_order"]:
            backend = self.config["backends"][name]
            if backend["type"] == "openai_compat":
                if not self._is_reachable(backend["base_url"]):
                    print(f"[LLMRouter] {name}: unreachable, skipping")
                    continue
                try:
                    client = OpenAI(
                        base_url=backend["base_url"],
                        api_key=backend.get("api_key", "any"),
                    )
                    model = self._resolve_model(client, backend["model"])
                    messages = []
                    if system:
                        messages.append({"role": "system", "content": system})
                    messages.append({"role": "user", "content": prompt})
                    resp = client.chat.completions.create(
                        model=model, messages=messages
                    )
                    print(f"[LLMRouter] Used backend: {name} ({model})")
                    return resp.choices[0].message.content
                except Exception as e:
                    print(f"[LLMRouter] {name}: error — {e}, trying next")
                    continue
            elif backend["type"] == "anthropic":
                api_key = os.environ.get(backend["api_key_env"], "")
                if not api_key:
                    print(f"[LLMRouter] {name}: {backend['api_key_env']} not set, skipping")
                    continue
                try:
                    import anthropic as _anthropic

                    client = _anthropic.Anthropic(api_key=api_key)
                    kwargs: dict = {
                        "model": backend["model"],
                        "max_tokens": 4096,
                        "messages": [{"role": "user", "content": prompt}],
                    }
                    if system:
                        kwargs["system"] = system
                    msg = client.messages.create(**kwargs)
                    print(f"[LLMRouter] Used backend: {name}")
                    return msg.content[0].text
                except Exception as e:
                    print(f"[LLMRouter] {name}: error — {e}, trying next")
                    continue
        raise RuntimeError("All LLM backends exhausted")


# Module-level singleton for convenience
_router: LLMRouter | None = None


def complete(prompt: str, system: str | None = None) -> str:
    global _router
    if _router is None:
        _router = LLMRouter()
    return _router.complete(prompt, system)
```
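Stripped of the SDK details, the router is a "first success wins" loop. The sketch below shows that pattern in isolation, with plain callables standing in for backends; `first_successful`, `down`, and `echo` are illustrative names and not part of `scripts/llm_router.py`.

```python
def first_successful(backends, prompt):
    """Try each (name, callable) in order and return (name, result) from the first that works."""
    errors = []
    for name, call in backends:
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure means "try the next backend"
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All LLM backends exhausted: " + "; ".join(errors))


def down(prompt):
    """Stand-in for a backend that is not running."""
    raise ConnectionError("connection refused")


def echo(prompt):
    """Stand-in for a backend that answers."""
    return prompt.upper()


name, answer = first_successful([("claude_code", down), ("ollama", echo)], "say hello")
print(name, answer)
```

Collecting the per-backend errors into the final exception is a small addition over the plan's router, which only prints them as it goes.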
**Step 5: Run tests to verify they pass**
```bash
conda run -n job-seeker pytest tests/test_llm_router.py -v
```
Expected: 4 tests PASS.
**Step 6: Smoke-test against live Ollama**
```bash
conda run -n job-seeker python3 -c "
from scripts.llm_router import complete
print(complete('Say: job-seeker LLM router is working'))
"
```
Expected: A short response from Ollama (or next reachable backend).
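Which backend answers depends on the reachability probe in `_is_reachable`. The sketch below reproduces just its URL and status logic so the behaviour can be checked without a running server; the function names are illustrative, and `str.removesuffix` needs Python 3.9 or later.

```python
def health_url(base_url):
    """Derive the probe URL the same way _is_reachable does: drop a trailing /v1, add /health."""
    return base_url.rstrip("/").removesuffix("/v1") + "/health"


def counts_as_reachable(status_code):
    """Any response below 500 is treated as 'up', including a 404 from a server with no /health route."""
    return status_code < 500


for base in ("http://localhost:3009/v1", "http://localhost:11434/v1/"):
    print(base, "->", health_url(base))
```

Because a 404 still counts as reachable, the probe only tells you that something is listening on the port, not that it can serve completions; a failed completion then falls through to the next backend.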
**Step 7: Commit**
```bash
cd /devl/job-seeker
git add scripts/__init__.py scripts/llm_router.py tests/test_llm_router.py
git commit -m "feat: add LLM router with 5-backend fallback chain"
```
---
## Task 5: Job Discovery (discover.py) — PRIORITY
**Files:**
- Create: `scripts/discover.py`
- Create: `tests/test_discover.py`
**Step 1: Write the failing tests**
```python
# tests/test_discover.py
import pytest
from unittest.mock import patch, MagicMock, call
import pandas as pd
from pathlib import Path

SAMPLE_JOB = {
    "title": "Customer Success Manager",
    "company": "Acme Corp",
    "location": "Remote",
    "is_remote": True,
    "job_url": "https://linkedin.com/jobs/view/123456",
    "site": "linkedin",
    "salary_source": "$90,000 - $120,000",
}


def make_jobs_df(jobs=None):
    return pd.DataFrame(jobs or [SAMPLE_JOB])


def test_get_existing_urls_returns_set():
    """get_existing_urls returns a set of URL strings from Notion pages."""
    from scripts.discover import get_existing_urls

    mock_notion = MagicMock()
    mock_notion.databases.query.return_value = {
        "results": [
            {"properties": {"URL": {"url": "https://example.com/job/1"}}},
            {"properties": {"URL": {"url": "https://example.com/job/2"}}},
        ],
        "has_more": False,
        "next_cursor": None,
    }
    urls = get_existing_urls(mock_notion, "fake-db-id")
    assert urls == {"https://example.com/job/1", "https://example.com/job/2"}


def test_discover_skips_duplicate_urls():
    """discover does not push a job whose URL is already in Notion."""
    from scripts.discover import run_discovery

    existing = {"https://linkedin.com/jobs/view/123456"}
    with patch("scripts.discover.scrape_jobs", return_value=make_jobs_df()), \
         patch("scripts.discover.get_existing_urls", return_value=existing), \
         patch("scripts.discover.push_to_notion") as mock_push, \
         patch("scripts.discover.Client"):
        run_discovery()
    mock_push.assert_not_called()


def test_discover_pushes_new_jobs():
    """discover pushes jobs whose URLs are not already in Notion."""
    from scripts.discover import run_discovery

    with patch("scripts.discover.scrape_jobs", return_value=make_jobs_df()), \
         patch("scripts.discover.get_existing_urls", return_value=set()), \
         patch("scripts.discover.push_to_notion") as mock_push, \
         patch("scripts.discover.Client"):
        run_discovery()
    assert mock_push.call_count == 1


def test_push_to_notion_sets_status_new():
    """push_to_notion always sets Status to 'New'."""
    from scripts.discover import push_to_notion

    mock_notion = MagicMock()
    push_to_notion(mock_notion, "fake-db-id", SAMPLE_JOB)
    call_kwargs = mock_notion.pages.create.call_args[1]
    status = call_kwargs["properties"]["Status"]["select"]["name"]
    assert status == "New"
```
**Step 2: Run tests to verify they fail**
```bash
conda run -n job-seeker pytest tests/test_discover.py -v
```
Expected: `ImportError`, because `scripts.discover` doesn't exist yet.
**Step 3: Write scripts/discover.py**
```python
# scripts/discover.py
"""
JobSpy → Notion discovery pipeline.
Scrapes job boards, deduplicates against existing Notion records,
and pushes new listings with Status=New.

Usage:
    conda run -n job-seeker python scripts/discover.py
"""
import yaml
from datetime import datetime
from pathlib import Path
import pandas as pd
from jobspy import scrape_jobs
from notion_client import Client

CONFIG_DIR = Path(__file__).parent.parent / "config"
NOTION_CFG = CONFIG_DIR / "notion.yaml"
PROFILES_CFG = CONFIG_DIR / "search_profiles.yaml"


def load_config() -> tuple[dict, dict]:
    profiles = yaml.safe_load(PROFILES_CFG.read_text())
    notion_cfg = yaml.safe_load(NOTION_CFG.read_text())
    return profiles, notion_cfg


def get_existing_urls(notion: Client, db_id: str) -> set[str]:
    """Return the set of all job URLs already tracked in Notion."""
    existing: set[str] = set()
    has_more = True
    start_cursor = None
    while has_more:
        kwargs: dict = {"database_id": db_id, "page_size": 100}
        if start_cursor:
            kwargs["start_cursor"] = start_cursor
        resp = notion.databases.query(**kwargs)
        for page in resp["results"]:
            url = page["properties"].get("URL", {}).get("url")
            if url:
                existing.add(url)
        has_more = resp.get("has_more", False)
        start_cursor = resp.get("next_cursor")
    return existing


def push_to_notion(notion: Client, db_id: str, job: dict) -> None:
    """Create a new page in the Notion jobs database for a single listing."""
    notion.pages.create(
        parent={"database_id": db_id},
        properties={
            "Job Title": {"title": [{"text": {"content": str(job.get("title", "Unknown"))}}]},
            "Company": {"rich_text": [{"text": {"content": str(job.get("company", ""))}}]},
            "Location": {"rich_text": [{"text": {"content": str(job.get("location", ""))}}]},
            "Remote": {"checkbox": bool(job.get("is_remote", False))},
            "URL": {"url": str(job.get("job_url", ""))},
            "Source": {"select": {"name": str(job.get("site", "unknown")).title()}},
            "Status": {"select": {"name": "New"}},
            "Salary": {"rich_text": [{"text": {"content": str(job.get("salary_source") or "")}}]},
            "Date Found": {"date": {"start": datetime.now().isoformat()[:10]}},
        },
    )


def run_discovery() -> None:
    profiles_cfg, notion_cfg = load_config()
    notion = Client(auth=notion_cfg["token"])
    db_id = notion_cfg["database_id"]
    existing_urls = get_existing_urls(notion, db_id)
    print(f"[discover] {len(existing_urls)} existing listings in Notion")
    new_count = 0
    for profile in profiles_cfg["profiles"]:
        print(f"\n[discover] Profile: {profile['name']}")
        for location in profile["locations"]:
            print(f"  Scraping: {location}")
            jobs: pd.DataFrame = scrape_jobs(
                site_name=profile["boards"],
                search_term=" OR ".join(f'"{t}"' for t in profile["titles"]),
                location=location,
                results_wanted=profile.get("results_per_board", 25),
                hours_old=profile.get("hours_old", 72),
                linkedin_fetch_description=True,
            )
            for _, job in jobs.iterrows():
                url = str(job.get("job_url", ""))
                if not url or url in existing_urls:
                    continue
                push_to_notion(notion, db_id, job.to_dict())
                existing_urls.add(url)
                new_count += 1
                print(f"  + {job.get('title')} @ {job.get('company')}")
    print(f"\n[discover] Done — {new_count} new listings pushed to Notion.")


if __name__ == "__main__":
    run_discovery()
```
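The tests in Step 1 only exercise a single-page response. To see the cursor handling in `get_existing_urls` work across pages, here is a self-contained sketch with a fake query function in place of the Notion client; `collect_urls`, `fake_query`, and the `PAGES` data are all made up for illustration.

```python
def collect_urls(query):
    """Follow has_more/next_cursor until every page of results has been read."""
    urls = set()
    cursor = None
    while True:
        resp = query(start_cursor=cursor)
        for page in resp["results"]:
            url = page["properties"].get("URL", {}).get("url")
            if url:  # rows with an empty URL property are ignored
                urls.add(url)
        if not resp.get("has_more"):
            return urls
        cursor = resp.get("next_cursor")


# Two fake result pages keyed by the cursor that requests them.
PAGES = {
    None: {
        "results": [{"properties": {"URL": {"url": "https://example.com/job/1"}}}],
        "has_more": True,
        "next_cursor": "cursor-2",
    },
    "cursor-2": {
        "results": [
            {"properties": {"URL": {"url": "https://example.com/job/2"}}},
            {"properties": {"URL": {"url": None}}},
        ],
        "has_more": False,
        "next_cursor": None,
    },
}


def fake_query(start_cursor=None):
    """Stand-in for a paginated database query."""
    return PAGES[start_cursor]


print(sorted(collect_urls(fake_query)))
```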
**Step 4: Run tests to verify they pass**
```bash
conda run -n job-seeker pytest tests/test_discover.py -v
```
Expected: 4 tests PASS.
**Step 5: Run a live discovery (requires notion.yaml to be set up from Task 3)**
```bash
conda run -n job-seeker python scripts/discover.py
```
Expected: listings printed and pushed to Notion. Check the Notion DB to confirm rows appear with Status=New.
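While checking the rows, watch for fields that contain the literal text `nan`. pandas represents missing cells as NaN, and `str(float("nan"))` is `"nan"`, so any field the scraper leaves empty could be written that way (whether a given board leaves fields empty is an assumption to verify against real output). A minimal sketch of a cleaning helper that could wrap the `str(...)` calls in `push_to_notion`; `clean_text` is a hypothetical name:

```python
import math


def clean_text(value, default=""):
    """Return value as a string, mapping None and NaN to the default."""
    if value is None:
        return default
    if isinstance(value, float) and math.isnan(value):
        return default
    return str(value)


row = {"title": "Customer Success Manager", "company": float("nan"), "location": None}
print({key: clean_text(val, "Unknown") for key, val in row.items()})
```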
**Step 6: Commit**
```bash
cd /devl/job-seeker
git add scripts/discover.py tests/test_discover.py
git commit -m "feat: add JobSpy discovery pipeline with Notion deduplication"
```
---
## Task 6: Clone and Configure Resume Matcher
**Step 1: Clone Resume Matcher**
```bash
cd /devl/job-seeker
git clone https://github.com/srbhr/Resume-Matcher.git resume_matcher
```
**Step 2: Install Resume Matcher dependencies into the job-seeker env**
```bash
conda run -n job-seeker pip install -r /devl/job-seeker/resume_matcher/requirements.txt
```
If there are conflicts, install only the core matching libraries:
```bash
conda run -n job-seeker pip install sentence-transformers streamlit qdrant-client pypdf2
```
**Step 3: Verify it launches**
```bash
conda run -n job-seeker streamlit run /devl/job-seeker/resume_matcher/streamlit_app.py --server.port 8501
```
Expected: Streamlit opens on http://localhost:8501 (port confirmed clear).
Stop it with Ctrl+C — we'll run it on-demand.
**Step 4: Note the resume path to use**
The ATS-clean resume to use with Resume Matcher:
```
/Library/Documents/JobSearch/Alex_Rivera_Resume_02-19-2025.pdf
```
---
## Task 7: Resume Match Script (match.py)
**Files:**
- Create: `scripts/match.py`
- Create: `tests/test_match.py`
**Step 1: Write the failing tests**
```python
# tests/test_match.py
import pytest
from unittest.mock import patch, MagicMock


def test_extract_job_description_from_url():
    """extract_job_description fetches and returns text from a URL."""
    from scripts.match import extract_job_description

    with patch("scripts.match.requests.get") as mock_get:
        mock_get.return_value.text = "<html><body><p>We need a CSM with Salesforce.</p></body></html>"
        mock_get.return_value.raise_for_status = MagicMock()
        result = extract_job_description("https://example.com/job/123")
    assert "CSM" in result
    assert "Salesforce" in result


def test_score_is_between_0_and_100():
    """match_score returns a float in [0, 100]."""
    from scripts.match import match_score

    # Provide minimal inputs that the scorer can handle
    score, gaps = match_score(
        resume_text="Customer Success Manager with Salesforce experience",
        job_text="Looking for a Customer Success Manager who knows Salesforce and Gainsight",
    )
    assert 0 <= score <= 100
    assert isinstance(gaps, list)


def test_write_score_to_notion():
    """write_match_to_notion updates the Notion page with score and gaps."""
    from scripts.match import write_match_to_notion

    mock_notion = MagicMock()
    write_match_to_notion(mock_notion, "page-id-abc", 85.5, ["Gainsight", "Churnzero"])
    mock_notion.pages.update.assert_called_once()
    call_kwargs = mock_notion.pages.update.call_args[1]
    assert call_kwargs["page_id"] == "page-id-abc"
    score_val = call_kwargs["properties"]["Match Score"]["number"]
    assert score_val == 85.5
```
**Step 2: Run tests to verify they fail**
```bash
conda run -n job-seeker pytest tests/test_match.py -v
```
Expected: `ImportError`, because `scripts.match` doesn't exist yet.
**Step 3: Write scripts/match.py**
```python
# scripts/match.py
"""
Resume Matcher integration: score a Notion job listing against Alex's resume.
Writes Match Score and Keyword Gaps back to the Notion page.

Usage:
    conda run -n job-seeker python scripts/match.py <notion-page-url-or-id>
"""
import re
import sys
from pathlib import Path
import requests
import yaml
from bs4 import BeautifulSoup
from notion_client import Client

CONFIG_DIR = Path(__file__).parent.parent / "config"
RESUME_PATH = Path("/Library/Documents/JobSearch/Alex_Rivera_Resume_02-19-2025.pdf")


def load_notion() -> tuple[Client, str]:
    cfg = yaml.safe_load((CONFIG_DIR / "notion.yaml").read_text())
    return Client(auth=cfg["token"]), cfg["database_id"]


def extract_page_id(url_or_id: str) -> str:
    """Extract 32-char Notion page ID from a URL or return as-is."""
    match = re.search(r"[0-9a-f]{32}", url_or_id.replace("-", ""))
    if match:
        return match.group(0)
    return url_or_id.strip()


def get_job_url_from_notion(notion: Client, page_id: str) -> str:
    page = notion.pages.retrieve(page_id)
    return page["properties"]["URL"]["url"]


def extract_job_description(url: str) -> str:
    """Fetch a job listing URL and return its visible text."""
    resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    for tag in soup(["script", "style", "nav", "header", "footer"]):
        tag.decompose()
    return " ".join(soup.get_text(separator=" ").split())


def read_resume_text() -> str:
    """Extract text from the ATS-clean PDF resume."""
    try:
        import pypdf

        reader = pypdf.PdfReader(str(RESUME_PATH))
        return " ".join(page.extract_text() or "" for page in reader.pages)
    except ImportError:
        import PyPDF2

        with open(RESUME_PATH, "rb") as f:
            reader = PyPDF2.PdfReader(f)
            return " ".join(p.extract_text() or "" for p in reader.pages)


def match_score(resume_text: str, job_text: str) -> tuple[float, list[str]]:
    """
    Score resume against job description using TF-IDF keyword overlap.
    Returns (score 0-100, list of keywords in job not found in resume).
    """
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity
    import numpy as np

    vectorizer = TfidfVectorizer(stop_words="english", max_features=200)
    tfidf = vectorizer.fit_transform([resume_text, job_text])
    score = float(cosine_similarity(tfidf[0:1], tfidf[1:2])[0][0]) * 100
    # Keyword gap: top-weighted job terms not present in resume (lowercased)
    resume_terms = set(resume_text.lower().split())
    feature_names = vectorizer.get_feature_names_out()
    job_tfidf = tfidf[1].toarray()[0]
    top_indices = np.argsort(job_tfidf)[::-1][:30]
    top_job_terms = [feature_names[i] for i in top_indices if job_tfidf[i] > 0]
    gaps = [t for t in top_job_terms if t not in resume_terms][:10]
    return round(score, 1), gaps


def write_match_to_notion(notion: Client, page_id: str, score: float, gaps: list[str]) -> None:
    notion.pages.update(
        page_id=page_id,
        properties={
            "Match Score": {"number": score},
            "Keyword Gaps": {"rich_text": [{"text": {"content": ", ".join(gaps)}}]},
        },
    )


def run_match(page_url_or_id: str) -> None:
    notion, _ = load_notion()
    page_id = extract_page_id(page_url_or_id)
    print(f"[match] Page ID: {page_id}")
    job_url = get_job_url_from_notion(notion, page_id)
    print(f"[match] Fetching job description from: {job_url}")
    job_text = extract_job_description(job_url)
    resume_text = read_resume_text()
    score, gaps = match_score(resume_text, job_text)
    print(f"[match] Score: {score}/100")
    print(f"[match] Keyword gaps: {', '.join(gaps) or 'none'}")
    write_match_to_notion(notion, page_id, score, gaps)
    print("[match] Written to Notion.")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python scripts/match.py <notion-page-url-or-id>")
        sys.exit(1)
    run_match(sys.argv[1])
```
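To build intuition for what the score means, here is a dependency-free sketch of the same idea. Note the swap: it uses cosine similarity over raw term counts rather than the TF-IDF weighting in `match_score`, so its numbers will differ from the real script. `tokenize` and `overlap_score` are illustrative names.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(resume_text, job_text, max_gaps=10):
    """Return (cosine similarity of term counts scaled to 0-100, job terms missing from the resume)."""
    resume_counts = Counter(tokenize(resume_text))
    job_counts = Counter(tokenize(job_text))
    dot = sum(resume_counts[t] * job_counts[t] for t in job_counts)
    norm = math.sqrt(sum(c * c for c in resume_counts.values())) * math.sqrt(
        sum(c * c for c in job_counts.values())
    )
    score = 100.0 * dot / norm if norm else 0.0
    # Most frequent job terms first, keeping only those the resume never mentions.
    gaps = [t for t, _ in job_counts.most_common() if t not in resume_counts][:max_gaps]
    return round(score, 1), gaps


score, gaps = overlap_score(
    "Customer Success Manager with Salesforce experience",
    "Looking for a Customer Success Manager who knows Salesforce and Gainsight",
)
print(score, gaps)
```

Unlike the TF-IDF version, this sketch does not drop stop words, so filler words such as "for" and "a" show up in the gap list. That is one reason the real script passes `stop_words="english"`.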
**Step 4: Install scikit-learn, BeautifulSoup, and pypdf (needed by match.py)**
```bash
conda run -n job-seeker pip install scikit-learn beautifulsoup4 pypdf
```
**Step 5: Run tests**
```bash
conda run -n job-seeker pytest tests/test_match.py -v
```
Expected: 3 tests PASS.
**Step 6: Commit**
```bash
cd /devl/job-seeker
git add scripts/match.py tests/test_match.py
git commit -m "feat: add resume match scoring with Notion write-back"
```
---
## Task 8: Clone and Configure AIHawk
**Step 1: Clone AIHawk**
```bash
cd /devl/job-seeker
git clone https://github.com/feder-cr/Auto_Jobs_Applier_AIHawk.git aihawk
```
**Step 2: Install AIHawk dependencies**
```bash
conda run -n job-seeker pip install -r /devl/job-seeker/aihawk/requirements.txt
```
**Step 3: Install Playwright browsers (AIHawk uses Playwright for browser automation)**
```bash
conda run -n job-seeker playwright install chromium
```
**Step 4: Create AIHawk personal info config**
AIHawk reads personal details from `plain_text_resume.yaml` in its data directory. Back up the template before editing it:
```bash
cp /devl/job-seeker/aihawk/data_folder/plain_text_resume.yaml \
/devl/job-seeker/aihawk/data_folder/plain_text_resume.yaml.bak
```
Edit `/devl/job-seeker/aihawk/data_folder/plain_text_resume.yaml` with Alex's info.
Key fields to fill:
- `personal_information`: name, email, phone, linkedin, github (leave blank), location
- `work_experience`: pull from the SVG content already extracted
- `education`: Texas State University, Mass Communications & PR, 2012-2015
- `skills`: Zendesk, Intercom, Asana, Jira, etc.
**Step 5: Configure AIHawk to use the LLM router**
AIHawk's config (`aihawk/data_folder/config.yaml`) has an `llm_model_type` and `llm_model` field.
Set it to use the local OpenAI-compatible endpoint:
```yaml
# In aihawk/data_folder/config.yaml
llm_model_type: openai
llm_model: claude-code-terminal
openai_api_url: http://localhost:3009/v1 # or whichever backend is running
```
If 3009 is down, change to `http://localhost:11434/v1` (Ollama).
**Step 6: Review AIHawk's flags before a first dry run**
```bash
conda run -n job-seeker python /devl/job-seeker/aihawk/main.py --help
```
Review the flags. Start with a test run before enabling real submissions.
**Step 7: Commit the environment update**
```bash
cd /devl/job-seeker
conda env export -n job-seeker > environment.yml
git add environment.yml
git commit -m "chore: update environment.yml with all installed packages"
```
---
## Task 9: End-to-End Smoke Test
**Step 1: Run full test suite**
```bash
conda run -n job-seeker pytest tests/ -v
```
Expected: all tests PASS.
**Step 2: Run discovery**
```bash
conda run -n job-seeker python scripts/discover.py
```
Expected: new listings appear in Notion with Status=New.
**Step 3: Run match on one listing**
Copy the URL of a Notion page from the DB and run:
```bash
conda run -n job-seeker python scripts/match.py "https://www.notion.so/..."
```
Expected: Match Score and Keyword Gaps written back to that Notion page.
**Step 4: Commit anything left**
```bash
cd /devl/job-seeker
git status
git add -p # stage only code/config, not secrets
git commit -m "chore: final smoke test cleanup"
```
---
## Quick Reference
| Command | What it does |
|---|---|
| `conda run -n job-seeker python scripts/discover.py` | Scrape boards → push new listings to Notion |
| `conda run -n job-seeker python scripts/match.py <url>` | Score a listing → write back to Notion |
| `conda run -n job-seeker streamlit run resume_matcher/streamlit_app.py --server.port 8501` | Open Resume Matcher UI |
| `conda run -n job-seeker pytest tests/ -v` | Run all tests |
| `cd "/Library/Documents/Post Fight Processing" && ./manage.sh start` | Start Claude Code pipeline (port 3009) |
| `cd "/Library/Documents/Post Fight Processing" && ./manage-copilot.sh start` | Start Copilot wrapper (port 3010) |