feat: add Jobgether recruiter framing to cover letter generation

When source == "jobgether", build_prompt() injects a recruiter context note directing the LLM to address the Jobgether recruiter using "Your client [at {company}] will appreciate..." framing rather than addressing the employer directly. generate() and task_runner both thread the is_jobgether flag through automatically.
feat: add Jobgether URL detection and scraper to scrape_url.py
2026-03-15 09:45:51 -07:00 · 2026-03-15 09:45:50 -07:00 · 2026-03-15 09:45:50 -07:00 · 2026-03-15 09:45:50 -07:00 · 2026-03-15 09:45:50 -07:00 · 2026-03-15 09:45:50 -07:00
10 changed files with 1050 additions and 3 deletions
--- a/config/blocklist.yaml
+++ b/config/blocklist.yaml
@ -3,7 +3,8 @@

 # Company name blocklist — partial case-insensitive match on the company field.
 # e.g. "Amazon" blocks any listing where company contains "amazon".
-companies: []
+companies:
+  - jobgether

 # Industry/content blocklist — blocked if company name OR job description contains any keyword.
 # Use this for industries you will never work in regardless of company.
--- a/docs/superpowers/plans/2026-03-15-jobgether-integration.md
+++ b/docs/superpowers/plans/2026-03-15-jobgether-integration.md
@ -0,0 +1,700 @@
+# Jobgether Integration Implementation Plan
+
+> **For agentic workers:** REQUIRED: Use superpowers:subagent-driven-development (if subagents available) or superpowers:executing-plans to implement this plan. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Filter Jobgether listings out of all other scrapers, add a dedicated Jobgether scraper and URL scraper (Playwright-based), and add recruiter-aware cover letter framing for Jobgether jobs.
+
+**Architecture:** Blocklist config handles filtering with zero code changes. A new `_scrape_jobgether()` in `scrape_url.py` handles manual URL imports via Playwright with URL slug fallback. A new `scripts/custom_boards/jobgether.py` handles discovery. Cover letter framing is an `is_jobgether` flag threaded from `task_runner.py` → `generate()` → `build_prompt()`.
+
+**Tech Stack:** Python, Playwright (already installed), SQLite, PyTest, YAML config
+
+**Spec:** `/Library/Development/CircuitForge/peregrine/docs/superpowers/specs/2026-03-15-jobgether-integration-design.md`
+
+---
+
+## Worktree Setup
+
+- [ ] **Create worktree for this feature**
+
+```bash
+cd /Library/Development/CircuitForge/peregrine
+git worktree add .worktrees/jobgether-integration -b feature/jobgether-integration
+```
+
+All implementation work happens in `/Library/Development/CircuitForge/peregrine/.worktrees/jobgether-integration/`.
+
+---
+
+## Chunk 1: Blocklist filter + scrape_url.py
+
+### Task 1: Add Jobgether to blocklist
+
+**Files:**
+- Modify: `/Library/Development/CircuitForge/peregrine/config/blocklist.yaml`
+
+- [ ] **Step 1: Edit blocklist.yaml**
+
+```yaml
+companies:
+  - jobgether
+```
+
+- [ ] **Step 2: Verify the existing `_is_blocklisted` test passes (or write one)**
+
+Check `/Library/Development/CircuitForge/peregrine/tests/test_discover.py` for existing blocklist tests. If none cover company matching, add:
+
+```python
+def test_is_blocklisted_jobgether():
+    from scripts.discover import _is_blocklisted
+    blocklist = {"companies": ["jobgether"], "industries": [], "locations": []}
+    assert _is_blocklisted({"company": "Jobgether", "location": "", "description": ""}, blocklist)
+    assert _is_blocklisted({"company": "jobgether inc", "location": "", "description": ""}, blocklist)
+    assert not _is_blocklisted({"company": "Acme Corp", "location": "", "description": ""}, blocklist)
+```
+
+Run: `conda run -n job-seeker python -m pytest tests/test_discover.py -v -k "blocklist"`
+Expected: PASS
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add config/blocklist.yaml tests/test_discover.py
+git commit -m "feat: filter Jobgether listings via blocklist"
+```
+
+---
+
+### Task 2: Add Jobgether detection to scrape_url.py
+
+**Files:**
+- Modify: `/Library/Development/CircuitForge/peregrine/scripts/scrape_url.py`
+- Modify: `/Library/Development/CircuitForge/peregrine/tests/test_scrape_url.py`
+
+- [ ] **Step 1: Write failing tests**
+
+In `/Library/Development/CircuitForge/peregrine/tests/test_scrape_url.py`, add:
+
+```python
+def test_detect_board_jobgether():
+    from scripts.scrape_url import _detect_board
+    assert _detect_board("https://jobgether.com/offer/69b42d9d24d79271ee0618e8-csm---resware") == "jobgether"
+    assert _detect_board("https://www.jobgether.com/offer/abc-role---company") == "jobgether"
+
+
+def test_jobgether_slug_company_extraction():
+    from scripts.scrape_url import _company_from_jobgether_url
+    assert _company_from_jobgether_url(
+        "https://jobgether.com/offer/69b42d9d24d79271ee0618e8-customer-success-manager---resware"
+    ) == "Resware"
+    assert _company_from_jobgether_url(
+        "https://jobgether.com/offer/abc123-director-of-cs---acme-corp"
+    ) == "Acme Corp"
+    assert _company_from_jobgether_url(
+        "https://jobgether.com/offer/abc123-no-separator-here"
+    ) == ""
+
+
+def test_scrape_jobgether_no_playwright(tmp_path):
+    """When Playwright is unavailable, _scrape_jobgether falls back to URL slug for company."""
+    # Patch playwright.sync_api to None in sys.modules so the local import inside
+    # _scrape_jobgether raises ImportError at call time (local imports run at call time,
+    # not at module load time — so no reload needed).
+    import sys
+    import unittest.mock as mock
+
+    url = "https://jobgether.com/offer/69b42d9d24d79271ee0618e8-customer-success-manager---resware"
+    with mock.patch.dict(sys.modules, {"playwright": None, "playwright.sync_api": None}):
+        from scripts.scrape_url import _scrape_jobgether
+        result = _scrape_jobgether(url)
+
+    assert result.get("company") == "Resware"
+    assert result.get("source") == "jobgether"
+```
+
+Run: `conda run -n job-seeker python -m pytest tests/test_scrape_url.py::test_detect_board_jobgether tests/test_scrape_url.py::test_jobgether_slug_company_extraction tests/test_scrape_url.py::test_scrape_jobgether_no_playwright -v`
+Expected: FAIL (functions not yet defined)
+
+- [ ] **Step 2: Add `_company_from_jobgether_url()` to scrape_url.py**
+
+Add after the `_STRIP_PARAMS` block (around line 34):
+
+```python
+def _company_from_jobgether_url(url: str) -> str:
+    """Extract company name from Jobgether offer URL slug.
+
+    Slug format: /offer/{24-hex-hash}-{title-slug}---{company-slug}
+    Triple-dash separator delimits title from company.
+    Returns title-cased company name, or "" if pattern not found.
+    """
+    m = re.search(r"---([^/?]+)$", urlparse(url).path)
+    if not m:
+        print(f"[scrape_url] Jobgether URL slug: no company separator found in {url}")
+        return ""
+    return m.group(1).replace("-", " ").title()
+```
+
+- [ ] **Step 3: Add `"jobgether"` branch to `_detect_board()`**
+
+In `/Library/Development/CircuitForge/peregrine/scripts/scrape_url.py`, modify `_detect_board()` (add before `return "generic"`):
+
+```python
+    if "jobgether.com" in url_lower:
+        return "jobgether"
+```
+
+- [ ] **Step 4: Add `_scrape_jobgether()` function**
+
+Add after `_scrape_glassdoor()` (around line 137):
+
+```python
+def _scrape_jobgether(url: str) -> dict:
+    """Scrape a Jobgether offer page using Playwright to bypass 403.
+
+    Falls back to URL slug for company name when Playwright is unavailable.
+    Does not use requests — no raise_for_status().
+    """
+    try:
+        from playwright.sync_api import sync_playwright
+    except ImportError:
+        company = _company_from_jobgether_url(url)
+        if company:
+            print(f"[scrape_url] Jobgether: Playwright not installed, using slug fallback → {company}")
+        return {"company": company, "source": "jobgether"} if company else {}
+
+    try:
+        with sync_playwright() as p:
+            browser = p.chromium.launch(headless=True)
+            try:
+                ctx = browser.new_context(user_agent=_HEADERS["User-Agent"])
+                page = ctx.new_page()
+                page.goto(url, timeout=30_000)
+                page.wait_for_load_state("networkidle", timeout=20_000)
+
+                result = page.evaluate("""() => {
+                    const title = document.querySelector('h1')?.textContent?.trim() || '';
+                    const company = document.querySelector('[class*="company"], [class*="employer"], [data-testid*="company"]')
+                        ?.textContent?.trim() || '';
+                    const location = document.querySelector('[class*="location"], [data-testid*="location"]')
+                        ?.textContent?.trim() || '';
+                    const desc = document.querySelector('[class*="description"], [class*="job-desc"], article')
+                        ?.innerText?.trim() || '';
+                    return { title, company, location, description: desc };
+                }""")
+            finally:
+                browser.close()
+
+        # Fall back to slug for company if DOM extraction missed it
+        if not result.get("company"):
+            result["company"] = _company_from_jobgether_url(url)
+
+        result["source"] = "jobgether"
+        return {k: v for k, v in result.items() if v}
+
+    except Exception as exc:
+        print(f"[scrape_url] Jobgether Playwright error for {url}: {exc}")
+        # Last resort: slug fallback
+        company = _company_from_jobgether_url(url)
+        return {"company": company, "source": "jobgether"} if company else {}
+```
+
+> ⚠️ **The CSS selectors in the `page.evaluate()` call are placeholders.** Before committing, inspect `https://jobgether.com/offer/` in a browser to find the actual class names for title, company, location, and description. Update the selectors accordingly.
+
+- [ ] **Step 5: Add dispatch branch in `scrape_job_url()`**
+
+In the `if board == "linkedin":` dispatch chain (around line 208), add before the `else`:
+
+```python
+        elif board == "jobgether":
+            fields = _scrape_jobgether(url)
+```
+
+- [ ] **Step 6: Run tests to verify they pass**
+
+Run: `conda run -n job-seeker python -m pytest tests/test_scrape_url.py -v`
+Expected: All PASS (including pre-existing tests)
+
+- [ ] **Step 7: Commit**
+
+```bash
+git add scripts/scrape_url.py tests/test_scrape_url.py
+git commit -m "feat: add Jobgether URL detection and scraper to scrape_url.py"
+```
+
+---
+
+## Chunk 2: Jobgether custom board scraper
+
+> ⚠️ **Pre-condition:** Before writing the scraper, inspect `https://jobgether.com/remote-jobs` live to determine the actual URL/filter param format and DOM card selectors. Use the Playwright MCP browser tool or Chrome devtools. Record: (1) the query param for job title search, (2) the job card CSS selectors for title, company, URL, location, salary.
+
+### Task 3: Inspect Jobgether search live
+
+**Files:** None (research step)
+
+- [ ] **Step 1: Navigate to Jobgether remote jobs and inspect search params**
+
+Using browser devtools or Playwright network capture, navigate to `https://jobgether.com/remote-jobs`, search for "Customer Success Manager", and capture:
+- The resulting URL (query params)
+- Network requests (XHR/fetch) if the page uses API calls
+- CSS selectors for job card elements
+
+Record findings here before proceeding.
+
+- [ ] **Step 2: Test a Playwright page.evaluate() extraction manually**
+
+```python
+# Run interactively to validate selectors
+from playwright.sync_api import sync_playwright
+with sync_playwright() as p:
+    browser = p.chromium.launch(headless=False)  # headless=False to see the page
+    page = browser.new_page()
+    page.goto("https://jobgether.com/remote-jobs")
+    page.wait_for_load_state("networkidle")
+    # Test your selectors here
+    cards = page.query_selector_all("[YOUR_CARD_SELECTOR]")
+    print(len(cards))
+    browser.close()
+```
+
+---
+
+### Task 4: Write jobgether.py scraper
+
+**Files:**
+- Create: `/Library/Development/CircuitForge/peregrine/scripts/custom_boards/jobgether.py`
+- Modify: `/Library/Development/CircuitForge/peregrine/tests/test_discover.py` (or create `tests/test_jobgether.py`)
+
+- [ ] **Step 1: Write failing test**
+
+In `/Library/Development/CircuitForge/peregrine/tests/test_discover.py` (or a new `tests/test_jobgether.py`):
+
+```python
+def test_jobgether_scraper_returns_empty_on_missing_playwright(monkeypatch):
+    """Graceful fallback when Playwright is unavailable."""
+    import scripts.custom_boards.jobgether as jg
+    monkeypatch.setattr("scripts.custom_boards.jobgether.sync_playwright", None)
+    result = jg.scrape({"titles": ["Customer Success Manager"]}, "Remote", results_wanted=5)
+    assert result == []
+
+
+def test_jobgether_scraper_respects_results_wanted(monkeypatch):
+    """Scraper caps results at results_wanted."""
+    import scripts.custom_boards.jobgether as jg
+
+    fake_jobs = [
+        {"title": f"CSM {i}", "href": f"/offer/abc{i}-csm---acme", "company": f"Acme {i}",
+         "location": "Remote", "is_remote": True, "salary": ""}
+        for i in range(20)
+    ]
+
+    class FakePage:
+        def goto(self, *a, **kw): pass
+        def wait_for_load_state(self, *a, **kw): pass
+        def evaluate(self, _): return fake_jobs
+
+    class FakeCtx:
+        def new_page(self): return FakePage()
+
+    class FakeBrowser:
+        def new_context(self, **kw): return FakeCtx()
+        def close(self): pass
+
+    class FakeChromium:
+        def launch(self, **kw): return FakeBrowser()
+
+    class FakeP:
+        chromium = FakeChromium()
+        def __enter__(self): return self
+        def __exit__(self, *a): pass
+
+    monkeypatch.setattr("scripts.custom_boards.jobgether.sync_playwright", lambda: FakeP())
+    result = jg.scrape({"titles": ["CSM"]}, "Remote", results_wanted=5)
+    assert len(result) <= 5
+```
+
+Run: `conda run -n job-seeker python -m pytest tests/ -v -k "jobgether"`
+Expected: FAIL (module not found)
+
+- [ ] **Step 2: Create `scripts/custom_boards/jobgether.py`**
+
+```python
+"""Jobgether scraper — Playwright-based (requires chromium installed).
+
+Jobgether (jobgether.com) is a remote-work job aggregator. It blocks plain
+requests with 403, so we use Playwright to render the page and extract cards.
+
+Install Playwright: conda run -n job-seeker pip install playwright &&
+                   conda run -n job-seeker python -m playwright install chromium
+
+Returns a list of dicts compatible with scripts.db.insert_job().
+"""
+from __future__ import annotations
+
+import re
+import time
+from typing import Any
+
+_BASE = "https://jobgether.com"
+_SEARCH_PATH = "/remote-jobs"
+
+# TODO: Replace with confirmed query param key after live inspection (Task 3)
+_QUERY_PARAM = "search"
+
+# Module-level import so tests can monkeypatch scripts.custom_boards.jobgether.sync_playwright
+try:
+    from playwright.sync_api import sync_playwright
+except ImportError:
+    sync_playwright = None
+
+
+def scrape(profile: dict, location: str, results_wanted: int = 50) -> list[dict]:
+    """
+    Scrape job listings from Jobgether using Playwright.
+
+    Args:
+        profile: Search profile dict (uses 'titles').
+        location: Location string — Jobgether is remote-focused; location used
+                  only if the site exposes a location filter.
+        results_wanted: Maximum results to return across all titles.
+
+    Returns:
+        List of job dicts with keys: title, company, url, source, location,
+        is_remote, salary, description.
+    """
+    if sync_playwright is None:
+        print(
+            "    [jobgether] playwright not installed.\n"
+            "    Install: conda run -n job-seeker pip install playwright && "
+            "conda run -n job-seeker python -m playwright install chromium"
+        )
+        return []
+
+    results: list[dict] = []
+    seen_urls: set[str] = set()
+
+    with sync_playwright() as p:
+        browser = p.chromium.launch(headless=True)
+        ctx = browser.new_context(
+            user_agent=(
+                "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
+                "(KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36"
+            )
+        )
+        page = ctx.new_page()
+
+        for title in profile.get("titles", []):
+            if len(results) >= results_wanted:
+                break
+
+            # TODO: Confirm URL param format from live inspection (Task 3)
+            url = f"{_BASE}{_SEARCH_PATH}?{_QUERY_PARAM}={title.replace(' ', '+')}"
+
+            try:
+                page.goto(url, timeout=30_000)
+                page.wait_for_load_state("networkidle", timeout=20_000)
+            except Exception as exc:
+                print(f"    [jobgether] Page load error for '{title}': {exc}")
+                continue
+
+            # TODO: Replace JS selector with confirmed card selector from Task 3
+            try:
+                raw_jobs: list[dict[str, Any]] = page.evaluate(_extract_jobs_js())
+            except Exception as exc:
+                print(f"    [jobgether] JS extract error for '{title}': {exc}")
+                continue
+
+            if not raw_jobs:
+                print(f"    [jobgether] No cards found for '{title}' — selector may need updating")
+                continue
+
+            for job in raw_jobs:
+                href = job.get("href", "")
+                if not href:
+                    continue
+                full_url = _BASE + href if href.startswith("/") else href
+                if full_url in seen_urls:
+                    continue
+                seen_urls.add(full_url)
+
+                results.append({
+                    "title":       job.get("title", ""),
+                    "company":     job.get("company", ""),
+                    "url":         full_url,
+                    "source":      "jobgether",
+                    "location":    job.get("location") or "Remote",
+                    "is_remote":   True,  # Jobgether is remote-focused
+                    "salary":      job.get("salary") or "",
+                    "description": "",  # not in card view; scrape_url fills in
+                })
+
+                if len(results) >= results_wanted:
+                    break
+
+            time.sleep(1)  # polite pacing between titles
+
+        browser.close()
+
+    return results[:results_wanted]
+
+
+def _extract_jobs_js() -> str:
+    """JS to run in page context — extracts job data from rendered card elements.
+
+    TODO: Replace selectors with confirmed values from Task 3 live inspection.
+    """
+    return """() => {
+        // TODO: replace '[class*=job-card]' with confirmed card selector
+        const cards = document.querySelectorAll('[class*="job-card"], [data-testid*="job"]');
+        return Array.from(cards).map(card => {
+            // TODO: replace these selectors with confirmed values
+            const titleEl = card.querySelector('h2, h3, [class*="title"]');
+            const companyEl = card.querySelector('[class*="company"], [class*="employer"]');
+            const linkEl = card.querySelector('a');
+            const salaryEl = card.querySelector('[class*="salary"]');
+            const locationEl = card.querySelector('[class*="location"]');
+            return {
+                title: titleEl ? titleEl.textContent.trim() : null,
+                company: companyEl ? companyEl.textContent.trim() : null,
+                href: linkEl ? linkEl.getAttribute('href') : null,
+                salary: salaryEl ? salaryEl.textContent.trim() : null,
+                location: locationEl ? locationEl.textContent.trim() : null,
+                is_remote: true,
+            };
+        }).filter(j => j.title && j.href);
+    }"""
+```
+
+- [ ] **Step 3: Run tests**
+
+Run: `conda run -n job-seeker python -m pytest tests/ -v -k "jobgether"`
+Expected: PASS
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add scripts/custom_boards/jobgether.py tests/test_discover.py
+git commit -m "feat: add Jobgether custom board scraper (selectors pending live inspection)"
+```
+
+---
+
+## Chunk 3: Registration, config, cover letter framing
+
+### Task 5: Register scraper in discover.py + update search_profiles.yaml
+
+**Files:**
+- Modify: `/Library/Development/CircuitForge/peregrine/scripts/discover.py`
+- Modify: `/Library/Development/CircuitForge/peregrine/config/search_profiles.yaml`
+- Modify: `/Library/Development/CircuitForge/peregrine/config/search_profiles.yaml.example` (if it exists)
+
+- [ ] **Step 1: Add import to discover.py import block (lines 20–22)**
+
+`jobgether.py` absorbs the Playwright `ImportError` internally (module-level `try/except`), so it always imports successfully. Match the existing pattern exactly:
+
+```python
+from scripts.custom_boards import jobgether as _jobgether
+```
+
+- [ ] **Step 2: Add to CUSTOM_SCRAPERS dict literal (lines 30–34)**
+
+```python
+CUSTOM_SCRAPERS: dict[str, object] = {
+    "adzuna":     _adzuna.scrape,
+    "theladders": _theladders.scrape,
+    "craigslist": _craigslist.scrape,
+    "jobgether":  _jobgether.scrape,
+}
+```
+
+When Playwright is absent, `_jobgether.scrape()` returns `[]` gracefully — no special guard needed in `discover.py`.
+
+- [ ] **Step 3: Add `jobgether` to remote-eligible profiles in search_profiles.yaml**
+
+Add `- jobgether` to the `custom_boards` list for every profile that has `Remote` in its `locations`. Based on the current file, that means: `cs_leadership`, `music_industry`, `animal_welfare`, `education`. Do NOT add it to `default` (locations: San Francisco CA only).
+
+- [ ] **Step 4: Run discover tests**
+
+Run: `conda run -n job-seeker python -m pytest tests/test_discover.py -v`
+Expected: All PASS
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add scripts/discover.py config/search_profiles.yaml
+git commit -m "feat: register Jobgether scraper and add to remote search profiles"
+```
+
+---
+
+### Task 6: Cover letter recruiter framing
+
+**Files:**
+- Modify: `/Library/Development/CircuitForge/peregrine/scripts/generate_cover_letter.py`
+- Modify: `/Library/Development/CircuitForge/peregrine/scripts/task_runner.py`
+- Modify: `/Library/Development/CircuitForge/peregrine/tests/test_match.py` or add `tests/test_cover_letter.py`
+
+- [ ] **Step 1: Write failing test**
+
+Create or add to `/Library/Development/CircuitForge/peregrine/tests/test_cover_letter.py`:
+
+```python
+def test_build_prompt_jobgether_framing_unknown_company():
+    from scripts.generate_cover_letter import build_prompt
+    prompt = build_prompt(
+        title="Customer Success Manager",
+        company="Jobgether",
+        description="CSM role at an undisclosed company.",
+        examples=[],
+        is_jobgether=True,
+    )
+    assert "Your client" in prompt
+    assert "recruiter" in prompt.lower() or "jobgether" in prompt.lower()
+
+
+def test_build_prompt_jobgether_framing_known_company():
+    from scripts.generate_cover_letter import build_prompt
+    prompt = build_prompt(
+        title="Customer Success Manager",
+        company="Resware",
+        description="CSM role at Resware.",
+        examples=[],
+        is_jobgether=True,
+    )
+    assert "Your client at Resware" in prompt
+
+
+def test_build_prompt_no_jobgether_framing_by_default():
+    from scripts.generate_cover_letter import build_prompt
+    prompt = build_prompt(
+        title="Customer Success Manager",
+        company="Acme Corp",
+        description="CSM role.",
+        examples=[],
+    )
+    assert "Your client" not in prompt
+```
+
+Run: `conda run -n job-seeker python -m pytest tests/test_cover_letter.py -v`
+Expected: FAIL
+
+- [ ] **Step 2: Add `is_jobgether` to `build_prompt()` in generate_cover_letter.py**
+
+Modify the `build_prompt()` signature (line 186):
+
+```python
+def build_prompt(
+    title: str,
+    company: str,
+    description: str,
+    examples: list[dict],
+    mission_hint: str | None = None,
+    is_jobgether: bool = False,
+) -> str:
+```
+
+Add the recruiter hint block after the `mission_hint` block (after line 203):
+
+```python
+    if is_jobgether:
+        if company and company.lower() != "jobgether":
+            recruiter_note = (
+                f"🤝 Recruiter context: This listing is posted by Jobgether on behalf of "
+                f"{company}. Address the cover letter to the Jobgether recruiter, not directly "
+                f"to the hiring company. Use framing like 'Your client at {company} will "
+                f"appreciate...' rather than addressing {company} directly. The role "
+                f"requirements are those of the actual employer."
+            )
+        else:
+            recruiter_note = (
+                "🤝 Recruiter context: This listing is posted by Jobgether on behalf of an "
+                "undisclosed employer. Address the cover letter to the Jobgether recruiter. "
+                "Use framing like 'Your client will appreciate...' rather than addressing "
+                "the company directly."
+            )
+        parts.append(f"{recruiter_note}\n")
+```
+
+- [ ] **Step 3: Add `is_jobgether` to `generate()` signature**
+
+Modify `generate()` (line 233):
+
+```python
+def generate(
+    title: str,
+    company: str,
+    description: str = "",
+    previous_result: str = "",
+    feedback: str = "",
+    is_jobgether: bool = False,
+    _router=None,
+) -> str:
+```
+
+Pass it through to `build_prompt()` (line 254):
+
+```python
+    prompt = build_prompt(title, company, description, examples,
+                          mission_hint=mission_hint, is_jobgether=is_jobgether)
+```
+
+- [ ] **Step 4: Pass `is_jobgether` from task_runner.py**
+
+In `/Library/Development/CircuitForge/peregrine/scripts/task_runner.py`, modify the `generate()` call inside the `cover_letter` task block (`elif task_type == "cover_letter":` starts at line 152; the `generate()` call is at ~line 156):
+
+```python
+        elif task_type == "cover_letter":
+            import json as _json
+            p = _json.loads(params or "{}")
+            from scripts.generate_cover_letter import generate
+            result = generate(
+                job.get("title", ""),
+                job.get("company", ""),
+                job.get("description", ""),
+                previous_result=p.get("previous_result", ""),
+                feedback=p.get("feedback", ""),
+                is_jobgether=job.get("source") == "jobgether",
+            )
+            update_cover_letter(db_path, job_id, result)
+```
+
+- [ ] **Step 5: Run tests**
+
+Run: `conda run -n job-seeker python -m pytest tests/test_cover_letter.py -v`
+Expected: All PASS
+
+- [ ] **Step 6: Run full test suite**
+
+Run: `conda run -n job-seeker python -m pytest tests/ -v`
+Expected: All PASS
+
+- [ ] **Step 7: Commit**
+
+```bash
+git add scripts/generate_cover_letter.py scripts/task_runner.py tests/test_cover_letter.py
+git commit -m "feat: add Jobgether recruiter framing to cover letter generation"
+```
+
+---
+
+## Final: Merge
+
+- [ ] **Merge worktree branch to main**
+
+```bash
+cd /Library/Development/CircuitForge/peregrine
+git merge feature/jobgether-integration
+git worktree remove .worktrees/jobgether-integration
+```
+
+- [ ] **Push to remote**
+
+```bash
+git push origin main
+```
+
+---
+
+## Manual verification after merge
+
+1. Add the stuck Jobgether manual import (job 2286) — delete the old stuck row and re-add the URL via "Add Jobs by URL" in the Home page. Verify the scraper resolves company = "Resware".
+2. Run a short discovery (`discover.py` with `results_per_board: 5`) and confirm no `company="Jobgether"` rows appear in `staging.db`.
+3. Generate a cover letter for a Jobgether-sourced job and confirm recruiter framing appears.
--- a/docs/superpowers/specs/2026-03-15-jobgether-integration-design.md
+++ b/docs/superpowers/specs/2026-03-15-jobgether-integration-design.md
@ -0,0 +1,173 @@
+# Jobgether Integration Design
+
+**Date:** 2026-03-15
+**Status:** Approved
+**Scope:** Peregrine — discovery pipeline + manual URL import
+
+---
+
+## Problem
+
+Jobgether is a job aggregator that posts listings on LinkedIn and other boards with `company = "Jobgether"` rather than the actual employer. This causes two problems:
+
+1. **Misleading listings** — Jobs appear to be at "Jobgether" rather than the real hiring company. Meg sees "Jobgether" as employer throughout the pipeline (Job Review, cover letters, company research).
+2. **Broken manual import** — Direct `jobgether.com` URLs return HTTP 403 when scraped with plain `requests`, leaving jobs stuck as `title = "Importing…"`.
+
+**Evidence from DB:** 29+ Jobgether-sourced LinkedIn listings with `company = "Jobgether"`. Actual employer is intentionally withheld by Jobgether's business model ("on behalf of a partner company").
+
+---
+
+## Decision: Option A — Filter + Dedicated Scraper
+
+Drop Jobgether listings from other scrapers entirely and replace with a direct Jobgether scraper that retrieves accurate company names. Existing Jobgether-via-LinkedIn listings in the DB are left as-is for manual review/rejection.
+
+**Why not Option B (follow-through):** LinkedIn→Jobgether→employer is a two-hop chain where the employer is deliberately hidden. Jobgether blocks `requests`. Not worth the complexity for unreliable data.
+
+---
+
+## Components
+
+### 1. Jobgether company filter — `config/blocklist.yaml`
+
+Add `"jobgether"` to the `companies` list in `config/blocklist.yaml`. The existing `_is_blocklisted()` function in `discover.py` already performs a partial case-insensitive match on the company field and applies to all scrapers (JobSpy boards + all custom boards). No code change required.
+
+```yaml
+companies:
+  - jobgether
+```
+
+This is the correct mechanism — it is user-visible, config-driven, and applies uniformly. Log output already reports blocklisted jobs per run.
+
+### 2. URL handling in `scrape_url.py`
+
+Three changes required:
+
+**a) `_detect_board()`** — add `"jobgether"` branch returning `"jobgether"` when `"jobgether.com"` is in the URL. Must be added before the `return "generic"` fallback.
+
+**b) dispatch block in `scrape_job_url()`** — add `elif board == "jobgether": fields = _scrape_jobgether(url)` to the `if/elif` chain (lines 208–215). Without this, the new `_detect_board()` branch silently falls through to `_scrape_generic()`.
+
+**c) `_scrape_jobgether(url)`** — Playwright-based scraper to bypass 403. Extracts:
+- `title` — job title from page heading
+- `company` — actual employer name (visible on Jobgether offer pages)
+- `location` — remote/location info
+- `description` — full job description
+- `source = "jobgether"`
+
+Playwright errors (`playwright.sync_api.Error`, `TimeoutError`) are not subclasses of `requests.RequestException` but are caught by the existing broad `except Exception` handler in `scrape_job_url()` — no changes needed to the error handling block.
+
+**URL slug fallback for company name (manual import path only):** Jobgether offer URLs follow the pattern:
+```
+https://jobgether.com/offer/{24-hex-hash}-{title-slug}---{company-slug}
+```
+When Playwright is unavailable, parse `company-slug` using:
+```python
+m = re.search(r'---([^/?]+)$', parsed_path)
+company = m.group(1).replace("-", " ").title() if m else ""
+```
+Example: `/offer/69b42d9d24d79271ee0618e8-customer-success-manager---resware` → `"Resware"`.
+
+This fallback is scoped to `_scrape_jobgether()` in `scrape_url.py` only; the discovery scraper always gets company name from the rendered DOM. `_scrape_jobgether()` does not make any `requests` calls — there is no `raise_for_status()` — so the `requests.RequestException` handler in `scrape_job_url()` is irrelevant to this path; only the broad `except Exception` applies.
+
+**Pre-implementation checkpoint:** Confirm that Jobgether offer URLs have no tracking query params beyond UTM (already covered by `_STRIP_PARAMS`). No `canonicalize_url()` changes are expected but verify before implementation.
+
+### 3. `scripts/custom_boards/jobgether.py`
+
+Playwright-based search scraper following the same interface as `theladders.py`:
+
+```python
+def scrape(profile: dict, location: str, results_wanted: int = 50) -> list[dict]
+```
+
+- Base URL: `https://jobgether.com/remote-jobs`
+- Search strategy: iterate over `profile["titles"]`, apply search/filter params
+- **Pre-condition — do not begin implementation of this file until live URL inspection is complete.** Use browser dev tools or a Playwright `page.on("request")` capture to determine the actual query parameter format for title/location filtering. Jobgether may use URL query params, path segments, or JS-driven state — this cannot be assumed from the URL alone.
+- Extraction: job cards from rendered DOM (Playwright `page.evaluate()`)
+- Returns standard job dicts: `title, company, url, source, location, is_remote, salary, description`
+- `source = "jobgether"`
+- Graceful `ImportError` handling if Playwright not installed (same pattern as `theladders.py`)
+- Polite pacing: 1s sleep between title iterations
+- Company name comes from DOM; URL slug parse is not needed in this path
+
+### 4. Registration + config
+
+**`discover.py` — import block (lines 20–22):**
+```python
+from scripts.custom_boards import jobgether as _jobgether
+```
+
+**`discover.py` — `CUSTOM_SCRAPERS` dict literal (lines 30–34):**
+```python
+CUSTOM_SCRAPERS: dict[str, object] = {
+    "adzuna":     _adzuna.scrape,
+    "theladders": _theladders.scrape,
+    "craigslist": _craigslist.scrape,
+    "jobgether":  _jobgether.scrape,   # ← add this line
+}
+```
+
+**`config/search_profiles.yaml` (and `.example`):**
+Add `jobgether` to `custom_boards` for any profile that includes `Remote` in its `locations` list. Jobgether is a remote-work-focused aggregator; adding it to location-specific non-remote profiles is not useful. Do not add a `custom_boards` key to profiles that don't already have one unless they are remote-eligible.
+```yaml
+custom_boards:
+  - jobgether
+```
+
+---
+
+## Data Flow
+
+```
+discover.py
+  ├── JobSpy boards       → _is_blocklisted(company="jobgether") → drop → DB insert
+  ├── custom: adzuna      → _is_blocklisted(company="jobgether") → drop → DB insert
+  ├── custom: theladders  → _is_blocklisted(company="jobgether") → drop → DB insert
+  ├── custom: craigslist  → _is_blocklisted(company="jobgether") → drop → DB insert
+  └── custom: jobgether   → (company = real employer, never "jobgether") → DB insert
+
+scrape_url.py
+  └── jobgether.com URL → _detect_board() = "jobgether"
+                        → _scrape_jobgether()
+                          ├── Playwright available → full job fields from page
+                          └── Playwright unavailable → company from URL slug only
+```
+
+---
+
+## Implementation Notes
+
+- **Slug fallback None-guard:** The regex `r'---([^/?]+)$'` returns a wrong value (not `None`) if the URL slug doesn't follow the expected format. Add a logged warning and return `""` rather than title-casing garbage.
+- **Import guard in `discover.py`:** Wrap the `jobgether` import with `try/except ImportError`, setting `_jobgether = None`, and gate the `CUSTOM_SCRAPERS` registration with `if _jobgether is not None`. This ensures the graceful ImportError in `jobgether.py` (for missing Playwright) propagates cleanly to the caller rather than crashing discovery.
+
+### 5. Cover letter recruiter framing — `scripts/generate_cover_letter.py`
+
+When `source = "jobgether"`, inject a system hint that shifts the cover letter addressee from the employer to the Jobgether recruiter. Use Policy A: recruiter framing applies for all Jobgether-sourced jobs regardless of whether the real company name was resolved.
+
+- If company is known (e.g. "Resware"): *"Your client at Resware will appreciate..."*
+- If company is unknown: *"Your client will appreciate..."*
+
+The real company name is always stored in the DB as resolved by the scraper — this is internal knowledge only. The framing shift is purely in the generated letter text, not in how the job is stored or displayed.
+
+Implementation: add an `is_jobgether` flag to the cover letter prompt context (same pattern as `mission_hint` injection). Add a conditional block in the system prompt / Para 1 instructions when the flag is true.
+
+---
+
+## Out of Scope
+
+- Retroactively fixing existing `company = "Jobgether"` rows in the DB (left for manual review/rejection)
+- Jobgether discovery scraper — **decided against during implementation (2026-03-15)**: Cloudflare Turnstile blocks all headless browsers on all Jobgether pages; `filter-api.jobgether.com` requires auth; `robots.txt` blocks all bots. The email digest → manual URL paste → slug company extraction flow covers the actual use case.
+- Jobgether authentication / logged-in scraping
+- Pagination
+- Dedup between Jobgether and other boards (existing URL dedup handles this)
+
+---
+
+## Files Changed
+
+| File | Change |
+|------|--------|
+| `config/blocklist.yaml` | Add `"jobgether"` to `companies` list |
+| `scripts/discover.py` | Add import + entry in `CUSTOM_SCRAPERS` dict literal |
+| `scripts/scrape_url.py` | Add `_detect_board` branch, dispatch branch, `_scrape_jobgether()` |
+| `scripts/custom_boards/jobgether.py` | New file — Playwright search scraper |
+| `config/search_profiles.yaml` | Add `jobgether` to `custom_boards` |
+| `config/search_profiles.yaml.example` | Same |
--- a/scripts/generate_cover_letter.py
+++ b/scripts/generate_cover_letter.py
@ -189,6 +189,7 @@ def build_prompt(
    description: str,
    examples: list[dict],
    mission_hint: str | None = None,
+    is_jobgether: bool = False,
 ) -> str:
    parts = [SYSTEM_CONTEXT.strip(), ""]
    if examples:
@ -202,6 +203,24 @@ def build_prompt(
    if mission_hint:
        parts.append(f"⭐ Mission alignment note (for Para 3): {mission_hint}\n")

+    if is_jobgether:
+        if company and company.lower() != "jobgether":
+            recruiter_note = (
+                f"🤝 Recruiter context: This listing is posted by Jobgether on behalf of "
+                f"{company}. Address the cover letter to the Jobgether recruiter, not directly "
+                f"to the hiring company. Use framing like 'Your client at {company} will "
+                f"appreciate...' rather than addressing {company} directly. The role "
+                f"requirements are those of the actual employer."
+            )
+        else:
+            recruiter_note = (
+                "🤝 Recruiter context: This listing is posted by Jobgether on behalf of an "
+                "undisclosed employer. Address the cover letter to the Jobgether recruiter. "
+                "Use framing like 'Your client will appreciate...' rather than addressing "
+                "the company directly."
+            )
+        parts.append(f"{recruiter_note}\n")
+
    parts.append(f"Now write a new cover letter for:")
    parts.append(f"  Role: {title}")
    parts.append(f"  Company: {company}")
@ -236,6 +255,7 @@ def generate(
    description: str = "",
    previous_result: str = "",
    feedback: str = "",
+    is_jobgether: bool = False,
    _router=None,
 ) -> str:
    """Generate a cover letter and return it as a string.
@ -251,7 +271,8 @@ def generate(
    mission_hint = detect_mission_alignment(company, description)
    if mission_hint:
        print(f"[cover-letter] Mission alignment detected for {company}", file=sys.stderr)
-    prompt = build_prompt(title, company, description, examples, mission_hint=mission_hint)
+    prompt = build_prompt(title, company, description, examples,
+                          mission_hint=mission_hint, is_jobgether=is_jobgether)

    if previous_result:
        prompt += f"\n\n---\nPrevious draft:\n{previous_result}"
--- a/scripts/scrape_url.py
+++ b/scripts/scrape_url.py
@ -33,6 +33,20 @@ _STRIP_PARAMS = {
    "eid", "otpToken", "ssid", "fmid",
 }

+def _company_from_jobgether_url(url: str) -> str:
+    """Extract company name from Jobgether offer URL slug.
+
+    Slug format: /offer/{24-hex-hash}-{title-slug}---{company-slug}
+    Triple-dash separator delimits title from company.
+    Returns title-cased company name, or "" if pattern not found.
+    """
+    m = re.search(r"---([^/?]+)$", urlparse(url).path)
+    if not m:
+        print(f"[scrape_url] Jobgether URL slug: no company separator found in {url}")
+        return ""
+    return m.group(1).replace("-", " ").title()
+
+
 _HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
@ -51,6 +65,8 @@ def _detect_board(url: str) -> str:
        return "indeed"
    if "glassdoor.com" in url_lower:
        return "glassdoor"
+    if "jobgether.com" in url_lower:
+        return "jobgether"
    return "generic"


@ -136,6 +152,55 @@ def _scrape_glassdoor(url: str) -> dict:
        return {}


+def _scrape_jobgether(url: str) -> dict:
+    """Scrape a Jobgether offer page using Playwright to bypass 403.
+
+    Falls back to URL slug for company name when Playwright is unavailable.
+    Does not use requests — no raise_for_status().
+    """
+    try:
+        from playwright.sync_api import sync_playwright
+    except ImportError:
+        company = _company_from_jobgether_url(url)
+        if company:
+            print(f"[scrape_url] Jobgether: Playwright not installed, using slug fallback → {company}")
+        return {"company": company, "source": "jobgether"} if company else {}
+
+    try:
+        with sync_playwright() as p:
+            browser = p.chromium.launch(headless=True)
+            try:
+                ctx = browser.new_context(user_agent=_HEADERS["User-Agent"])
+                page = ctx.new_page()
+                page.goto(url, timeout=30_000)
+                page.wait_for_load_state("networkidle", timeout=20_000)
+
+                result = page.evaluate("""() => {
+                    const title = document.querySelector('h1')?.textContent?.trim() || '';
+                    const company = document.querySelector('[class*="company"], [class*="employer"], [data-testid*="company"]')
+                        ?.textContent?.trim() || '';
+                    const location = document.querySelector('[class*="location"], [data-testid*="location"]')
+                        ?.textContent?.trim() || '';
+                    const desc = document.querySelector('[class*="description"], [class*="job-desc"], article')
+                        ?.innerText?.trim() || '';
+                    return { title, company, location, description: desc };
+                }""")
+            finally:
+                browser.close()
+
+        # Fall back to slug for company if DOM extraction missed it
+        if not result.get("company"):
+            result["company"] = _company_from_jobgether_url(url)
+
+        result["source"] = "jobgether"
+        return {k: v for k, v in result.items() if v}
+
+    except Exception as exc:
+        print(f"[scrape_url] Jobgether Playwright error for {url}: {exc}")
+        company = _company_from_jobgether_url(url)
+        return {"company": company, "source": "jobgether"} if company else {}
+
+
 def _parse_json_ld_or_og(html: str) -> dict:
    """Extract job fields from JSON-LD structured data, then og: meta tags."""
    soup = BeautifulSoup(html, "html.parser")
@ -211,6 +276,8 @@ def scrape_job_url(db_path: Path = DEFAULT_DB, job_id: int = None) -> dict:
            fields = _scrape_indeed(url)
        elif board == "glassdoor":
            fields = _scrape_glassdoor(url)
+        elif board == "jobgether":
+            fields = _scrape_jobgether(url)
        else:
            fields = _scrape_generic(url)
    except requests.RequestException as exc:
--- a/scripts/task_runner.py
+++ b/scripts/task_runner.py
@ -169,6 +169,7 @@ def _run_task(db_path: Path, task_id: int, task_type: str, job_id: int,
                job.get("description", ""),
                previous_result=p.get("previous_result", ""),
                feedback=p.get("feedback", ""),
+                is_jobgether=job.get("source") == "jobgether",
            )
            update_cover_letter(db_path, job_id, result)

--- a/tests/test_cover_letter.py
+++ b/tests/test_cover_letter.py
@ -115,3 +115,41 @@ def test_generate_calls_llm_router():

    mock_router.complete.assert_called_once()
    assert "Alex Rivera" in result
+
+
+# ── Jobgether recruiter framing tests ─────────────────────────────────────────
+
+def test_build_prompt_jobgether_framing_unknown_company():
+    from scripts.generate_cover_letter import build_prompt
+    prompt = build_prompt(
+        title="Customer Success Manager",
+        company="Jobgether",
+        description="CSM role at an undisclosed company.",
+        examples=[],
+        is_jobgether=True,
+    )
+    assert "Your client" in prompt
+    assert "jobgether" in prompt.lower()
+
+
+def test_build_prompt_jobgether_framing_known_company():
+    from scripts.generate_cover_letter import build_prompt
+    prompt = build_prompt(
+        title="Customer Success Manager",
+        company="Resware",
+        description="CSM role at Resware.",
+        examples=[],
+        is_jobgether=True,
+    )
+    assert "Your client at Resware" in prompt
+
+
+def test_build_prompt_no_jobgether_framing_by_default():
+    from scripts.generate_cover_letter import build_prompt
+    prompt = build_prompt(
+        title="Customer Success Manager",
+        company="Acme Corp",
+        description="CSM role.",
+        examples=[],
+    )
+    assert "Your client" not in prompt
--- a/tests/test_cover_letter_refinement.py
+++ b/tests/test_cover_letter_refinement.py
@ -79,10 +79,12 @@ class TestTaskRunnerCoverLetterParams:
        """Invoke _run_task for cover_letter and return captured generate() kwargs."""
        captured = {}

-        def mock_generate(title, company, description="", previous_result="", feedback="", _router=None):
+        def mock_generate(title, company, description="", previous_result="", feedback="",
+                          is_jobgether=False, _router=None):
            captured.update({
                "title": title, "company": company,
                "previous_result": previous_result, "feedback": feedback,
+                "is_jobgether": is_jobgether,
            })
            return "Generated letter"

--- a/tests/test_discover.py
+++ b/tests/test_discover.py
@ -183,3 +183,14 @@ def test_discover_custom_board_deduplicates(tmp_path):

    assert count == 0  # duplicate skipped
    assert len(get_jobs_by_status(db_path, "pending")) == 1
+
+
+# ── Blocklist integration ─────────────────────────────────────────────────────
+
+def test_is_blocklisted_jobgether():
+    """_is_blocklisted filters jobs from Jobgether (case-insensitive)."""
+    from scripts.discover import _is_blocklisted
+    blocklist = {"companies": ["jobgether"], "industries": [], "locations": []}
+    assert _is_blocklisted({"company": "Jobgether", "location": "", "description": ""}, blocklist)
+    assert _is_blocklisted({"company": "jobgether inc", "location": "", "description": ""}, blocklist)
+    assert not _is_blocklisted({"company": "Acme Corp", "location": "", "description": ""}, blocklist)
--- a/tests/test_scrape_url.py
+++ b/tests/test_scrape_url.py
@ -133,3 +133,36 @@ def test_scrape_url_graceful_on_http_error(tmp_path):
    row = conn.execute("SELECT id FROM jobs WHERE id=?", (job_id,)).fetchone()
    conn.close()
    assert row is not None
+
+
+def test_detect_board_jobgether():
+    from scripts.scrape_url import _detect_board
+    assert _detect_board("https://jobgether.com/offer/69b42d9d24d79271ee0618e8-csm---resware") == "jobgether"
+    assert _detect_board("https://www.jobgether.com/offer/abc-role---company") == "jobgether"
+
+
+def test_jobgether_slug_company_extraction():
+    from scripts.scrape_url import _company_from_jobgether_url
+    assert _company_from_jobgether_url(
+        "https://jobgether.com/offer/69b42d9d24d79271ee0618e8-customer-success-manager---resware"
+    ) == "Resware"
+    assert _company_from_jobgether_url(
+        "https://jobgether.com/offer/abc123-director-of-cs---acme-corp"
+    ) == "Acme Corp"
+    assert _company_from_jobgether_url(
+        "https://jobgether.com/offer/abc123-no-separator-here"
+    ) == ""
+
+
+def test_scrape_jobgether_no_playwright(tmp_path):
+    """When Playwright is unavailable, _scrape_jobgether falls back to URL slug for company."""
+    import sys
+    import unittest.mock as mock
+
+    url = "https://jobgether.com/offer/69b42d9d24d79271ee0618e8-customer-success-manager---resware"
+    with mock.patch.dict(sys.modules, {"playwright": None, "playwright.sync_api": None}):
+        from scripts.scrape_url import _scrape_jobgether
+        result = _scrape_jobgether(url)
+
+    assert result.get("company") == "Resware"
+    assert result.get("source") == "jobgether"
Author	SHA1	Message	Date
pyr0ball	94fd2ea332	feat: add Jobgether recruiter framing to cover letter generation Some checks failed CI / test (push) Failing after 32s Details When source == "jobgether", build_prompt() injects a recruiter context note directing the LLM to address the Jobgether recruiter using "Your client [at {company}] will appreciate..." framing rather than addressing the employer directly. generate() and task_runner both thread the is_jobgether flag through automatically.	2026-03-15 09:45:51 -07:00
pyr0ball	e6257249cb	feat: add Jobgether URL detection and scraper to scrape_url.py	2026-03-15 09:45:50 -07:00
pyr0ball	4028667c62	feat: filter Jobgether listings via blocklist	2026-03-15 09:45:50 -07:00
pyr0ball	a005397d5d	docs: update spec — Jobgether discovery scraper not viable (Cloudflare + robots.txt)	2026-03-15 09:45:50 -07:00
pyr0ball	17f7baae3c	docs: add Jobgether integration implementation plan	2026-03-15 09:45:50 -07:00
pyr0ball	ef9cb29518	docs: add cover letter recruiter framing to Jobgether spec	2026-03-15 09:45:50 -07:00
pyr0ball	fe2f27c2a2	docs: add Jobgether integration design spec	2026-03-15 09:45:50 -07:00