peregrine/docs/plans/2026-02-20-job-seeker-implementation.md

# Job Seeker Platform — Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Stand up a job discovery pipeline (JobSpy → Notion) with LLM routing, resume matching, and automated LinkedIn application support for Alex Rivera.

**Architecture:** JobSpy scrapes listings from multiple boards and pushes deduplicated results into a Notion database. A local LLM router with 5-backend fallback chain powers AIHawk's application answer generation. Resume Matcher scores each listing against Alex's resume and writes keyword gaps back to Notion.

**Tech Stack:** Python 3.12, conda env `job-seeker`, `python-jobspy`, `notion-client`, `openai` SDK, `anthropic` SDK, `pyyaml`, `pandas`, Resume-Matcher (cloned), Auto_Jobs_Applier_AIHawk (cloned), pytest, pytest-mock

**Priority order:** Discovery (Tasks 1–5) must be running before Match or AIHawk setup.

**Document storage rule:** Resumes and cover letters live in `/Library/Documents/JobSearch/` — never committed to this repo.

---

## Task 1: Conda Environment + Project Scaffold

**Files:**
- Create: `/devl/job-seeker/environment.yml`
- Create: `/devl/job-seeker/.gitignore`
- Create: `/devl/job-seeker/tests/__init__.py`

**Step 1: Write environment.yml**

```yaml
# /devl/job-seeker/environment.yml
name: job-seeker
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.12
  - pip
  - pip:
    - python-jobspy
    - notion-client
    - openai
    - anthropic
    - pyyaml
    - pandas
    - requests
    - pytest
    - pytest-mock
```

**Step 2: Create the conda env**

```bash
conda env create -f /devl/job-seeker/environment.yml
```

Expected: env `job-seeker` created with no errors.

**Step 3: Verify the env**

```bash
conda run -n job-seeker python -c "import jobspy, notion_client, openai, anthropic; print('all good')"
```

Expected: `all good`

**Step 4: Write .gitignore**

```gitignore
# /devl/job-seeker/.gitignore
.env
config/notion.yaml          # contains Notion token
__pycache__/
*.pyc
.pytest_cache/
output/
aihawk/
resume_matcher/
```

Note: `aihawk/` and `resume_matcher/` are cloned externally — don't commit them.

**Step 5: Create tests directory**

```bash
mkdir -p /devl/job-seeker/tests
touch /devl/job-seeker/tests/__init__.py
```

**Step 6: Commit**

```bash
cd /devl/job-seeker
git add environment.yml .gitignore tests/__init__.py
git commit -m "feat: add conda env spec and project scaffold"
```

---

## Task 2: Config Files

**Files:**
- Create: `config/search_profiles.yaml`
- Create: `config/llm.yaml`
- Create: `config/notion.yaml.example` (the real `notion.yaml` is gitignored)

**Step 1: Write search_profiles.yaml**

```yaml
# config/search_profiles.yaml
profiles:
  - name: cs_leadership
    titles:
      - "Customer Success Manager"
      - "Director of Customer Success"
      - "VP Customer Success"
      - "Head of Customer Success"
      - "Technical Account Manager"
      - "Revenue Operations Manager"
      - "Customer Experience Lead"
    locations:
      - "Remote"
      - "San Francisco Bay Area, CA"
    boards:
      - linkedin
      - indeed
      - glassdoor
      - zip_recruiter
    results_per_board: 25
    hours_old: 72
```

**Step 2: Write llm.yaml**

```yaml
# config/llm.yaml
fallback_order:
  - claude_code
  - ollama
  - vllm
  - github_copilot
  - anthropic

backends:
  claude_code:
    type: openai_compat
    base_url: http://localhost:3009/v1
    model: claude-code-terminal
    api_key: "any"

  ollama:
    type: openai_compat
    base_url: http://localhost:11434/v1
    model: llama3.2
    api_key: "ollama"

  vllm:
    type: openai_compat
    base_url: http://localhost:8000/v1
    model: __auto__
    api_key: ""

  github_copilot:
    type: openai_compat
    base_url: http://localhost:3010/v1
    model: gpt-4o
    api_key: "any"

  anthropic:
    type: anthropic
    model: claude-sonnet-4-6
    api_key_env: ANTHROPIC_API_KEY
```

**Step 3: Write notion.yaml.example**

```yaml
# config/notion.yaml.example
# Copy to config/notion.yaml and fill in your values.
# notion.yaml is gitignored — never commit it.
token: "secret_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
database_id: "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
```

**Step 4: Commit**

```bash
cd /devl/job-seeker
git add config/search_profiles.yaml config/llm.yaml config/notion.yaml.example
git commit -m "feat: add search profiles, LLM config, and Notion config template"
```

---

## Task 3: Create Notion Database

This task creates the Notion DB that all scripts write to. Do it once manually.

**Step 1: Open Notion and create a new database**

Create a full-page database called **"Alex's Job Search"** in whatever Notion workspace you use for tracking.

**Step 2: Add the required properties**

Delete the default properties and create exactly these (type matters):

| Property Name  | Type     |
|----------------|----------|
| Job Title      | Title    |
| Company        | Text     |
| Location       | Text     |
| Remote         | Checkbox |
| URL            | URL      |
| Source         | Select   |
| Status         | Select   |
| Match Score    | Number   |
| Keyword Gaps   | Text     |
| Salary         | Text     |
| Date Found     | Date     |
| Notes          | Text     |

For the **Status** select, add these options in order:
`New`, `Reviewing`, `Applied`, `Interview`, `Offer`, `Rejected`

For the **Source** select, add:
`Linkedin`, `Indeed`, `Glassdoor`, `Zip_Recruiter`

**Step 3: Get the database ID**

Open the database as a full page. The URL will look like:
`https://www.notion.so/YourWorkspace/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX?v=...`

The 32-character hex string before the `?` is the database ID.

**Step 4: Get your Notion integration token**

Go to https://www.notion.so/my-integrations → create integration (or use existing) →
copy the "Internal Integration Token" (starts with `secret_`).

Connect the integration to your database: open the database → `...` menu →
Add connections → select your integration.

**Step 5: Write config/notion.yaml**

```bash
cp /devl/job-seeker/config/notion.yaml.example /devl/job-seeker/config/notion.yaml
# Edit notion.yaml and fill in your token and database_id
```

**Step 6: Verify connection**

```bash
conda run -n job-seeker python3 -c "
from notion_client import Client
import yaml
cfg = yaml.safe_load(open('/devl/job-seeker/config/notion.yaml'))
n = Client(auth=cfg['token'])
db = n.databases.retrieve(cfg['database_id'])
print('Connected to:', db['title'][0]['plain_text'])
"
```

Expected: `Connected to: Alex's Job Search`

---

## Task 4: LLM Router

**Files:**
- Create: `scripts/llm_router.py`
- Create: `tests/test_llm_router.py`

**Step 1: Write the failing tests**

```python
# tests/test_llm_router.py
import pytest
from unittest.mock import patch, MagicMock
from pathlib import Path
import yaml

# Point tests at the real config
CONFIG_PATH = Path(__file__).parent.parent / "config" / "llm.yaml"


def test_config_loads():
    """Config file is valid YAML with required keys."""
    cfg = yaml.safe_load(CONFIG_PATH.read_text())
    assert "fallback_order" in cfg
    assert "backends" in cfg
    assert len(cfg["fallback_order"]) >= 1


def test_router_uses_first_reachable_backend(tmp_path):
    """Router skips unreachable backends and uses the first that responds."""
    from scripts.llm_router import LLMRouter

    router = LLMRouter(CONFIG_PATH)

    mock_response = MagicMock()
    mock_response.choices[0].message.content = "hello"

    with patch.object(router, "_is_reachable", side_effect=[False, True, True, True, True]), \
         patch("scripts.llm_router.OpenAI") as MockOpenAI:
        instance = MockOpenAI.return_value
        instance.chat.completions.create.return_value = mock_response
        # Also mock models.list for __auto__ case
        mock_model = MagicMock()
        mock_model.id = "test-model"
        instance.models.list.return_value.data = [mock_model]

        result = router.complete("say hello")

    assert result == "hello"


def test_router_raises_when_all_backends_fail():
    """Router raises RuntimeError when every backend is unreachable or errors."""
    from scripts.llm_router import LLMRouter

    router = LLMRouter(CONFIG_PATH)

    with patch.object(router, "_is_reachable", return_value=False):
        with pytest.raises(RuntimeError, match="All LLM backends exhausted"):
            router.complete("say hello")


def test_is_reachable_returns_false_on_connection_error():
    """_is_reachable returns False when the health endpoint is unreachable."""
    from scripts.llm_router import LLMRouter
    import requests

    router = LLMRouter(CONFIG_PATH)

    with patch("scripts.llm_router.requests.get", side_effect=requests.ConnectionError):
        result = router._is_reachable("http://localhost:9999/v1")

    assert result is False
```

**Step 2: Run tests to verify they fail**

```bash
cd /devl/job-seeker
conda run -n job-seeker pytest tests/test_llm_router.py -v
```

Expected: `ImportError` — `scripts.llm_router` doesn't exist yet.

**Step 3: Create scripts/__init__.py**

```bash
touch /devl/job-seeker/scripts/__init__.py
```

**Step 4: Write scripts/llm_router.py**

```python
# scripts/llm_router.py
"""
LLM abstraction layer with priority fallback chain.
Reads config/llm.yaml. Tries backends in order; falls back on any error.
"""
import os
import yaml
import requests
from pathlib import Path
from openai import OpenAI

CONFIG_PATH = Path(__file__).parent.parent / "config" / "llm.yaml"


class LLMRouter:
    def __init__(self, config_path: Path = CONFIG_PATH):
        with open(config_path) as f:
            self.config = yaml.safe_load(f)

    def _is_reachable(self, base_url: str) -> bool:
        """Quick health-check ping. Returns True if backend is up."""
        health_url = base_url.rstrip("/").removesuffix("/v1") + "/health"
        try:
            resp = requests.get(health_url, timeout=2)
            return resp.status_code < 500
        except Exception:
            return False

    def _resolve_model(self, client: OpenAI, model: str) -> str:
        """Resolve __auto__ to the first model served by vLLM."""
        if model != "__auto__":
            return model
        models = client.models.list()
        return models.data[0].id

    def complete(self, prompt: str, system: str | None = None) -> str:
        """
        Generate a completion. Tries each backend in fallback_order.
        Raises RuntimeError if all backends are exhausted.
        """
        for name in self.config["fallback_order"]:
            backend = self.config["backends"][name]

            if backend["type"] == "openai_compat":
                if not self._is_reachable(backend["base_url"]):
                    print(f"[LLMRouter] {name}: unreachable, skipping")
                    continue
                try:
                    client = OpenAI(
                        base_url=backend["base_url"],
                        api_key=backend.get("api_key", "any"),
                    )
                    model = self._resolve_model(client, backend["model"])
                    messages = []
                    if system:
                        messages.append({"role": "system", "content": system})
                    messages.append({"role": "user", "content": prompt})

                    resp = client.chat.completions.create(
                        model=model, messages=messages
                    )
                    print(f"[LLMRouter] Used backend: {name} ({model})")
                    return resp.choices[0].message.content

                except Exception as e:
                    print(f"[LLMRouter] {name}: error — {e}, trying next")
                    continue

            elif backend["type"] == "anthropic":
                api_key = os.environ.get(backend["api_key_env"], "")
                if not api_key:
                    print(f"[LLMRouter] {name}: {backend['api_key_env']} not set, skipping")
                    continue
                try:
                    import anthropic as _anthropic
                    client = _anthropic.Anthropic(api_key=api_key)
                    kwargs: dict = {
                        "model": backend["model"],
                        "max_tokens": 4096,
                        "messages": [{"role": "user", "content": prompt}],
                    }
                    if system:
                        kwargs["system"] = system
                    msg = client.messages.create(**kwargs)
                    print(f"[LLMRouter] Used backend: {name}")
                    return msg.content[0].text
                except Exception as e:
                    print(f"[LLMRouter] {name}: error — {e}, trying next")
                    continue

        raise RuntimeError("All LLM backends exhausted")


# Module-level singleton for convenience
_router: LLMRouter | None = None


def complete(prompt: str, system: str | None = None) -> str:
    global _router
    if _router is None:
        _router = LLMRouter()
    return _router.complete(prompt, system)
```

**Step 5: Run tests to verify they pass**

```bash
conda run -n job-seeker pytest tests/test_llm_router.py -v
```

Expected: 4 tests PASS.

**Step 6: Smoke-test against live Ollama**

```bash
conda run -n job-seeker python3 -c "
from scripts.llm_router import complete
print(complete('Say: job-seeker LLM router is working'))
"
```

Expected: A short response from Ollama (or next reachable backend).

**Step 7: Commit**

```bash
cd /devl/job-seeker
git add scripts/__init__.py scripts/llm_router.py tests/test_llm_router.py
git commit -m "feat: add LLM router with 5-backend fallback chain"
```

---

## Task 5: Job Discovery (discover.py) — PRIORITY

**Files:**
- Create: `scripts/discover.py`
- Create: `tests/test_discover.py`

**Step 1: Write the failing tests**

```python
# tests/test_discover.py
import pytest
from unittest.mock import patch, MagicMock, call
import pandas as pd
from pathlib import Path


SAMPLE_JOB = {
    "title": "Customer Success Manager",
    "company": "Acme Corp",
    "location": "Remote",
    "is_remote": True,
    "job_url": "https://linkedin.com/jobs/view/123456",
    "site": "linkedin",
    "salary_source": "$90,000 - $120,000",
}


def make_jobs_df(jobs=None):
    return pd.DataFrame(jobs or [SAMPLE_JOB])


def test_get_existing_urls_returns_set():
    """get_existing_urls returns a set of URL strings from Notion pages."""
    from scripts.discover import get_existing_urls

    mock_notion = MagicMock()
    mock_notion.databases.query.return_value = {
        "results": [
            {"properties": {"URL": {"url": "https://example.com/job/1"}}},
            {"properties": {"URL": {"url": "https://example.com/job/2"}}},
        ],
        "has_more": False,
        "next_cursor": None,
    }

    urls = get_existing_urls(mock_notion, "fake-db-id")
    assert urls == {"https://example.com/job/1", "https://example.com/job/2"}


def test_discover_skips_duplicate_urls():
    """discover does not push a job whose URL is already in Notion."""
    from scripts.discover import run_discovery

    existing = {"https://linkedin.com/jobs/view/123456"}

    with patch("scripts.discover.scrape_jobs", return_value=make_jobs_df()), \
         patch("scripts.discover.get_existing_urls", return_value=existing), \
         patch("scripts.discover.push_to_notion") as mock_push, \
         patch("scripts.discover.Client"):
        run_discovery()

    mock_push.assert_not_called()


def test_discover_pushes_new_jobs():
    """discover pushes jobs whose URLs are not already in Notion."""
    from scripts.discover import run_discovery

    with patch("scripts.discover.scrape_jobs", return_value=make_jobs_df()), \
         patch("scripts.discover.get_existing_urls", return_value=set()), \
         patch("scripts.discover.push_to_notion") as mock_push, \
         patch("scripts.discover.Client"):
        run_discovery()

    assert mock_push.call_count == 1


def test_push_to_notion_sets_status_new():
    """push_to_notion always sets Status to 'New'."""
    from scripts.discover import push_to_notion

    mock_notion = MagicMock()
    push_to_notion(mock_notion, "fake-db-id", SAMPLE_JOB)

    call_kwargs = mock_notion.pages.create.call_args[1]
    status = call_kwargs["properties"]["Status"]["select"]["name"]
    assert status == "New"
```

**Step 2: Run tests to verify they fail**

```bash
conda run -n job-seeker pytest tests/test_discover.py -v
```

Expected: `ImportError` — `scripts.discover` doesn't exist yet.

**Step 3: Write scripts/discover.py**

```python
# scripts/discover.py
"""
JobSpy → Notion discovery pipeline.
Scrapes job boards, deduplicates against existing Notion records,
and pushes new listings with Status=New.

Usage:
    conda run -n job-seeker python scripts/discover.py
"""
import yaml
from datetime import datetime
from pathlib import Path

import pandas as pd
from jobspy import scrape_jobs
from notion_client import Client

CONFIG_DIR = Path(__file__).parent.parent / "config"
NOTION_CFG = CONFIG_DIR / "notion.yaml"
PROFILES_CFG = CONFIG_DIR / "search_profiles.yaml"


def load_config() -> tuple[dict, dict]:
    profiles = yaml.safe_load(PROFILES_CFG.read_text())
    notion_cfg = yaml.safe_load(NOTION_CFG.read_text())
    return profiles, notion_cfg


def get_existing_urls(notion: Client, db_id: str) -> set[str]:
    """Return the set of all job URLs already tracked in Notion."""
    existing: set[str] = set()
    has_more = True
    start_cursor = None

    while has_more:
        kwargs: dict = {"database_id": db_id, "page_size": 100}
        if start_cursor:
            kwargs["start_cursor"] = start_cursor
        resp = notion.databases.query(**kwargs)

        for page in resp["results"]:
            url = page["properties"].get("URL", {}).get("url")
            if url:
                existing.add(url)

        has_more = resp.get("has_more", False)
        start_cursor = resp.get("next_cursor")

    return existing


def push_to_notion(notion: Client, db_id: str, job: dict) -> None:
    """Create a new page in the Notion jobs database for a single listing."""
    notion.pages.create(
        parent={"database_id": db_id},
        properties={
            "Job Title": {"title": [{"text": {"content": str(job.get("title", "Unknown"))}}]},
            "Company":   {"rich_text": [{"text": {"content": str(job.get("company", ""))}}]},
            "Location":  {"rich_text": [{"text": {"content": str(job.get("location", ""))}}]},
            "Remote":    {"checkbox": bool(job.get("is_remote", False))},
            "URL":       {"url": str(job.get("job_url", ""))},
            "Source":    {"select": {"name": str(job.get("site", "unknown")).title()}},
            "Status":    {"select": {"name": "New"}},
            "Salary":    {"rich_text": [{"text": {"content": str(job.get("salary_source") or "")}}]},
            "Date Found": {"date": {"start": datetime.now().isoformat()[:10]}},
        },
    )


def run_discovery() -> None:
    profiles_cfg, notion_cfg = load_config()
    notion = Client(auth=notion_cfg["token"])
    db_id = notion_cfg["database_id"]

    existing_urls = get_existing_urls(notion, db_id)
    print(f"[discover] {len(existing_urls)} existing listings in Notion")

    new_count = 0

    for profile in profiles_cfg["profiles"]:
        print(f"\n[discover] Profile: {profile['name']}")
        for location in profile["locations"]:
            print(f"  Scraping: {location}")
            jobs: pd.DataFrame = scrape_jobs(
                site_name=profile["boards"],
                search_term=" OR ".join(f'"{t}"' for t in profile["titles"]),
                location=location,
                results_wanted=profile.get("results_per_board", 25),
                hours_old=profile.get("hours_old", 72),
                linkedin_fetch_description=True,
            )

            for _, job in jobs.iterrows():
                url = str(job.get("job_url", ""))
                if not url or url in existing_urls:
                    continue
                push_to_notion(notion, db_id, job.to_dict())
                existing_urls.add(url)
                new_count += 1
                print(f"  + {job.get('title')} @ {job.get('company')}")

    print(f"\n[discover] Done — {new_count} new listings pushed to Notion.")


if __name__ == "__main__":
    run_discovery()
```

**Step 4: Run tests to verify they pass**

```bash
conda run -n job-seeker pytest tests/test_discover.py -v
```

Expected: 4 tests PASS.

**Step 5: Run a live discovery (requires notion.yaml to be set up from Task 3)**

```bash
conda run -n job-seeker python scripts/discover.py
```

Expected: listings printed and pushed to Notion. Check the Notion DB to confirm rows appear with Status=New.

**Step 6: Commit**

```bash
cd /devl/job-seeker
git add scripts/discover.py tests/test_discover.py
git commit -m "feat: add JobSpy discovery pipeline with Notion deduplication"
```

---

## Task 6: Clone and Configure Resume Matcher

**Step 1: Clone Resume Matcher**

```bash
cd /devl/job-seeker
git clone https://github.com/srbhr/Resume-Matcher.git resume_matcher
```

**Step 2: Install Resume Matcher dependencies into the job-seeker env**

```bash
conda run -n job-seeker pip install -r /devl/job-seeker/resume_matcher/requirements.txt
```

If there are conflicts, install only the core matching library:
```bash
conda run -n job-seeker pip install sentence-transformers streamlit qdrant-client pypdf2
```

**Step 3: Verify it launches**

```bash
conda run -n job-seeker streamlit run /devl/job-seeker/resume_matcher/streamlit_app.py --server.port 8501
```

Expected: Streamlit opens on http://localhost:8501 (port confirmed clear).
Stop it with Ctrl+C — we'll run it on-demand.

**Step 4: Note the resume path to use**

The ATS-clean resume to use with Resume Matcher:
```
/Library/Documents/JobSearch/Alex_Rivera_Resume_02-19-2025.pdf
```

---

## Task 7: Resume Match Script (match.py)

**Files:**
- Create: `scripts/match.py`
- Create: `tests/test_match.py`

**Step 1: Write the failing tests**

```python
# tests/test_match.py
import pytest
from unittest.mock import patch, MagicMock


def test_extract_job_description_from_url():
    """extract_job_description fetches and returns text from a URL."""
    from scripts.match import extract_job_description

    with patch("scripts.match.requests.get") as mock_get:
        mock_get.return_value.text = "<html><body><p>We need a CSM with Salesforce.</p></body></html>"
        mock_get.return_value.raise_for_status = MagicMock()
        result = extract_job_description("https://example.com/job/123")

    assert "CSM" in result
    assert "Salesforce" in result


def test_score_is_between_0_and_100():
    """match_score returns a float in [0, 100]."""
    from scripts.match import match_score

    # Provide minimal inputs that the scorer can handle
    score, gaps = match_score(
        resume_text="Customer Success Manager with Salesforce experience",
        job_text="Looking for a Customer Success Manager who knows Salesforce and Gainsight",
    )
    assert 0 <= score <= 100
    assert isinstance(gaps, list)


def test_write_score_to_notion():
    """write_match_to_notion updates the Notion page with score and gaps."""
    from scripts.match import write_match_to_notion

    mock_notion = MagicMock()
    write_match_to_notion(mock_notion, "page-id-abc", 85.5, ["Gainsight", "Churnzero"])

    mock_notion.pages.update.assert_called_once()
    call_kwargs = mock_notion.pages.update.call_args[1]
    assert call_kwargs["page_id"] == "page-id-abc"
    score_val = call_kwargs["properties"]["Match Score"]["number"]
    assert score_val == 85.5
```

**Step 2: Run tests to verify they fail**

```bash
conda run -n job-seeker pytest tests/test_match.py -v
```

Expected: `ImportError` — `scripts.match` doesn't exist.

**Step 3: Write scripts/match.py**

```python
# scripts/match.py
"""
Resume Matcher integration: score a Notion job listing against Alex's resume.
Writes Match Score and Keyword Gaps back to the Notion page.

Usage:
    conda run -n job-seeker python scripts/match.py <notion-page-url-or-id>
"""
import re
import sys
from pathlib import Path

import requests
import yaml
from bs4 import BeautifulSoup
from notion_client import Client

CONFIG_DIR = Path(__file__).parent.parent / "config"
RESUME_PATH = Path("/Library/Documents/JobSearch/Alex_Rivera_Resume_02-19-2025.pdf")


def load_notion() -> tuple[Client, str]:
    cfg = yaml.safe_load((CONFIG_DIR / "notion.yaml").read_text())
    return Client(auth=cfg["token"]), cfg["database_id"]


def extract_page_id(url_or_id: str) -> str:
    """Extract 32-char Notion page ID from a URL or return as-is."""
    match = re.search(r"[0-9a-f]{32}", url_or_id.replace("-", ""))
    if match:
        return match.group(0)
    return url_or_id.strip()


def get_job_url_from_notion(notion: Client, page_id: str) -> str:
    page = notion.pages.retrieve(page_id)
    return page["properties"]["URL"]["url"]


def extract_job_description(url: str) -> str:
    """Fetch a job listing URL and return its visible text."""
    resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    for tag in soup(["script", "style", "nav", "header", "footer"]):
        tag.decompose()
    return " ".join(soup.get_text(separator=" ").split())


def read_resume_text() -> str:
    """Extract text from the ATS-clean PDF resume."""
    try:
        import pypdf
        reader = pypdf.PdfReader(str(RESUME_PATH))
        return " ".join(page.extract_text() or "" for page in reader.pages)
    except ImportError:
        import PyPDF2
        with open(RESUME_PATH, "rb") as f:
            reader = PyPDF2.PdfReader(f)
            return " ".join(p.extract_text() or "" for p in reader.pages)


def match_score(resume_text: str, job_text: str) -> tuple[float, list[str]]:
    """
    Score resume against job description using TF-IDF keyword overlap.
    Returns (score 0-100, list of keywords in job not found in resume).
    """
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity
    import numpy as np

    vectorizer = TfidfVectorizer(stop_words="english", max_features=200)
    tfidf = vectorizer.fit_transform([resume_text, job_text])
    score = float(cosine_similarity(tfidf[0:1], tfidf[1:2])[0][0]) * 100

    # Keyword gap: terms in job description not present in resume (lowercased)
    job_terms = set(job_text.lower().split())
    resume_terms = set(resume_text.lower().split())
    feature_names = vectorizer.get_feature_names_out()
    job_tfidf = tfidf[1].toarray()[0]
    top_indices = np.argsort(job_tfidf)[::-1][:30]
    top_job_terms = [feature_names[i] for i in top_indices if job_tfidf[i] > 0]
    gaps = [t for t in top_job_terms if t not in resume_terms][:10]

    return round(score, 1), gaps


def write_match_to_notion(notion: Client, page_id: str, score: float, gaps: list[str]) -> None:
    notion.pages.update(
        page_id=page_id,
        properties={
            "Match Score": {"number": score},
            "Keyword Gaps": {"rich_text": [{"text": {"content": ", ".join(gaps)}}]},
        },
    )


def run_match(page_url_or_id: str) -> None:
    notion, _ = load_notion()
    page_id = extract_page_id(page_url_or_id)

    print(f"[match] Page ID: {page_id}")
    job_url = get_job_url_from_notion(notion, page_id)
    print(f"[match] Fetching job description from: {job_url}")

    job_text = extract_job_description(job_url)
    resume_text = read_resume_text()

    score, gaps = match_score(resume_text, job_text)
    print(f"[match] Score: {score}/100")
    print(f"[match] Keyword gaps: {', '.join(gaps) or 'none'}")

    write_match_to_notion(notion, page_id, score, gaps)
    print("[match] Written to Notion.")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python scripts/match.py <notion-page-url-or-id>")
        sys.exit(1)
    run_match(sys.argv[1])
```

**Step 4: Install sklearn (needed by match.py)**

```bash
conda run -n job-seeker pip install scikit-learn beautifulsoup4 pypdf
```

**Step 5: Run tests**

```bash
conda run -n job-seeker pytest tests/test_match.py -v
```

Expected: 3 tests PASS.

**Step 6: Commit**

```bash
cd /devl/job-seeker
git add scripts/match.py tests/test_match.py
git commit -m "feat: add resume match scoring with Notion write-back"
```

---

## Task 8: Clone and Configure AIHawk

**Step 1: Clone AIHawk**

```bash
cd /devl/job-seeker
git clone https://github.com/feder-cr/Auto_Jobs_Applier_AIHawk.git aihawk
```

**Step 2: Install AIHawk dependencies**

```bash
conda run -n job-seeker pip install -r /devl/job-seeker/aihawk/requirements.txt
```

**Step 3: Install Playwright browsers (AIHawk uses Playwright for browser automation)**

```bash
conda run -n job-seeker playwright install chromium
```

**Step 4: Create AIHawk personal info config**

AIHawk reads a `personal_info.yaml`. Create it in AIHawk's data directory:

```bash
cp /devl/job-seeker/aihawk/data_folder/plain_text_resume.yaml \
   /devl/job-seeker/aihawk/data_folder/plain_text_resume.yaml.bak
```

Edit `/devl/job-seeker/aihawk/data_folder/plain_text_resume.yaml` with Alex's info.
Key fields to fill:
- `personal_information`: name, email, phone, linkedin, github (leave blank), location
- `work_experience`: pull from the SVG content already extracted
- `education`: Texas State University, Mass Communications & PR, 2012-2015
- `skills`: Zendesk, Intercom, Asana, Jira, etc.

**Step 5: Configure AIHawk to use the LLM router**

AIHawk's config (`aihawk/data_folder/config.yaml`) has an `llm_model_type` and `llm_model` field.
Set it to use the local OpenAI-compatible endpoint:

```yaml
# In aihawk/data_folder/config.yaml
llm_model_type: openai
llm_model: claude-code-terminal
openai_api_url: http://localhost:3009/v1   # or whichever backend is running
```

If 3009 is down, change to `http://localhost:11434/v1` (Ollama).

**Step 6: Run AIHawk in dry-run mode first**

```bash
conda run -n job-seeker python /devl/job-seeker/aihawk/main.py --help
```

Review the flags. Start with a test run before enabling real submissions.

**Step 7: Commit the environment update**

```bash
cd /devl/job-seeker
conda env export -n job-seeker > environment.yml
git add environment.yml
git commit -m "chore: update environment.yml with all installed packages"
```

---

## Task 9: End-to-End Smoke Test

**Step 1: Run full test suite**

```bash
conda run -n job-seeker pytest tests/ -v
```

Expected: all tests PASS.

**Step 2: Run discovery**

```bash
conda run -n job-seeker python scripts/discover.py
```

Expected: new listings appear in Notion with Status=New.

**Step 3: Run match on one listing**

Copy the URL of a Notion page from the DB and run:

```bash
conda run -n job-seeker python scripts/match.py "https://www.notion.so/..."
```

Expected: Match Score and Keyword Gaps written back to that Notion page.

**Step 4: Commit anything left**

```bash
cd /devl/job-seeker
git status
git add -p   # stage only code/config, not secrets
git commit -m "chore: final smoke test cleanup"
```

---

## Quick Reference

| Command | What it does |
|---|---|
| `conda run -n job-seeker python scripts/discover.py` | Scrape boards → push new listings to Notion |
| `conda run -n job-seeker python scripts/match.py <url>` | Score a listing → write back to Notion |
| `conda run -n job-seeker streamlit run resume_matcher/streamlit_app.py --server.port 8501` | Open Resume Matcher UI |
| `conda run -n job-seeker pytest tests/ -v` | Run all tests |
| `cd "/Library/Documents/Post Fight Processing" && ./manage.sh start` | Start Claude Code pipeline (port 3009) |
| `cd "/Library/Documents/Post Fight Processing" && ./manage-copilot.sh start` | Start Copilot wrapper (port 3010) |