pyr0ball fc6ef88a05 docs: add Jobgether integration implementation plan

2026-03-15 09:45:50 -07:00

25 KiB

Raw Blame History

Jobgether Integration Implementation Plan

For agentic workers: REQUIRED: Use superpowers:subagent-driven-development (if subagents available) or superpowers:executing-plans to implement this plan. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Filter Jobgether listings out of all other scrapers, add a dedicated Jobgether scraper and URL scraper (Playwright-based), and add recruiter-aware cover letter framing for Jobgether jobs.

Architecture: Blocklist config handles filtering with zero code changes. A new _scrape_jobgether() in scrape_url.py handles manual URL imports via Playwright with URL slug fallback. A new scripts/custom_boards/jobgether.py handles discovery. Cover letter framing is an is_jobgether flag threaded from task_runner.py → generate() → build_prompt().

Tech Stack: Python, Playwright (already installed), SQLite, PyTest, YAML config

Spec: /Library/Development/CircuitForge/peregrine/docs/superpowers/specs/2026-03-15-jobgether-integration-design.md

Worktree Setup

Create worktree for this feature

cd /Library/Development/CircuitForge/peregrine
git worktree add .worktrees/jobgether-integration -b feature/jobgether-integration

All implementation work happens in /Library/Development/CircuitForge/peregrine/.worktrees/jobgether-integration/.

Chunk 1: Blocklist filter + scrape_url.py

Task 1: Add Jobgether to blocklist

Files:

Modify: /Library/Development/CircuitForge/peregrine/config/blocklist.yaml
Step 1: Edit blocklist.yaml

companies:
  - jobgether

Step 2: Verify the existing _is_blocklisted test passes (or write one)

Check /Library/Development/CircuitForge/peregrine/tests/test_discover.py for existing blocklist tests. If none cover company matching, add:

def test_is_blocklisted_jobgether():
    from scripts.discover import _is_blocklisted
    blocklist = {"companies": ["jobgether"], "industries": [], "locations": []}
    assert _is_blocklisted({"company": "Jobgether", "location": "", "description": ""}, blocklist)
    assert _is_blocklisted({"company": "jobgether inc", "location": "", "description": ""}, blocklist)
    assert not _is_blocklisted({"company": "Acme Corp", "location": "", "description": ""}, blocklist)

Run: conda run -n job-seeker python -m pytest tests/test_discover.py -v -k "blocklist" Expected: PASS

Step 3: Commit

git add config/blocklist.yaml tests/test_discover.py
git commit -m "feat: filter Jobgether listings via blocklist"

Task 2: Add Jobgether detection to scrape_url.py

Files:

Modify: /Library/Development/CircuitForge/peregrine/scripts/scrape_url.py
Modify: /Library/Development/CircuitForge/peregrine/tests/test_scrape_url.py
Step 1: Write failing tests

In /Library/Development/CircuitForge/peregrine/tests/test_scrape_url.py, add:

def test_detect_board_jobgether():
    from scripts.scrape_url import _detect_board
    assert _detect_board("https://jobgether.com/offer/69b42d9d24d79271ee0618e8-csm---resware") == "jobgether"
    assert _detect_board("https://www.jobgether.com/offer/abc-role---company") == "jobgether"


def test_jobgether_slug_company_extraction():
    from scripts.scrape_url import _company_from_jobgether_url
    assert _company_from_jobgether_url(
        "https://jobgether.com/offer/69b42d9d24d79271ee0618e8-customer-success-manager---resware"
    ) == "Resware"
    assert _company_from_jobgether_url(
        "https://jobgether.com/offer/abc123-director-of-cs---acme-corp"
    ) == "Acme Corp"
    assert _company_from_jobgether_url(
        "https://jobgether.com/offer/abc123-no-separator-here"
    ) == ""


def test_scrape_jobgether_no_playwright(tmp_path):
    """When Playwright is unavailable, _scrape_jobgether falls back to URL slug for company."""
    # Patch playwright.sync_api to None in sys.modules so the local import inside
    # _scrape_jobgether raises ImportError at call time (local imports run at call time,
    # not at module load time — so no reload needed).
    import sys
    import unittest.mock as mock

    url = "https://jobgether.com/offer/69b42d9d24d79271ee0618e8-customer-success-manager---resware"
    with mock.patch.dict(sys.modules, {"playwright": None, "playwright.sync_api": None}):
        from scripts.scrape_url import _scrape_jobgether
        result = _scrape_jobgether(url)

    assert result.get("company") == "Resware"
    assert result.get("source") == "jobgether"

Run: conda run -n job-seeker python -m pytest tests/test_scrape_url.py::test_detect_board_jobgether tests/test_scrape_url.py::test_jobgether_slug_company_extraction tests/test_scrape_url.py::test_scrape_jobgether_no_playwright -v Expected: FAIL (functions not yet defined)

Step 2: Add _company_from_jobgether_url() to scrape_url.py

Add after the _STRIP_PARAMS block (around line 34):

def _company_from_jobgether_url(url: str) -> str:
    """Extract company name from Jobgether offer URL slug.

    Slug format: /offer/{24-hex-hash}-{title-slug}---{company-slug}
    Triple-dash separator delimits title from company.
    Returns title-cased company name, or "" if pattern not found.
    """
    m = re.search(r"---([^/?]+)$", urlparse(url).path)
    if not m:
        print(f"[scrape_url] Jobgether URL slug: no company separator found in {url}")
        return ""
    return m.group(1).replace("-", " ").title()

Step 3: Add "jobgether" branch to _detect_board()

In /Library/Development/CircuitForge/peregrine/scripts/scrape_url.py, modify _detect_board() (add before return "generic"):

    if "jobgether.com" in url_lower:
        return "jobgether"

Step 4: Add _scrape_jobgether() function

Add after _scrape_glassdoor() (around line 137):

def _scrape_jobgether(url: str) -> dict:
    """Scrape a Jobgether offer page using Playwright to bypass 403.

    Falls back to URL slug for company name when Playwright is unavailable.
    Does not use requests — no raise_for_status().
    """
    try:
        from playwright.sync_api import sync_playwright
    except ImportError:
        company = _company_from_jobgether_url(url)
        if company:
            print(f"[scrape_url] Jobgether: Playwright not installed, using slug fallback → {company}")
        return {"company": company, "source": "jobgether"} if company else {}

    try:
        with sync_playwright() as p:
            browser = p.chromium.launch(headless=True)
            try:
                ctx = browser.new_context(user_agent=_HEADERS["User-Agent"])
                page = ctx.new_page()
                page.goto(url, timeout=30_000)
                page.wait_for_load_state("networkidle", timeout=20_000)

                result = page.evaluate("""() => {
                    const title = document.querySelector('h1')?.textContent?.trim() || '';
                    const company = document.querySelector('[class*="company"], [class*="employer"], [data-testid*="company"]')
                        ?.textContent?.trim() || '';
                    const location = document.querySelector('[class*="location"], [data-testid*="location"]')
                        ?.textContent?.trim() || '';
                    const desc = document.querySelector('[class*="description"], [class*="job-desc"], article')
                        ?.innerText?.trim() || '';
                    return { title, company, location, description: desc };
                }""")
            finally:
                browser.close()

        # Fall back to slug for company if DOM extraction missed it
        if not result.get("company"):
            result["company"] = _company_from_jobgether_url(url)

        result["source"] = "jobgether"
        return {k: v for k, v in result.items() if v}

    except Exception as exc:
        print(f"[scrape_url] Jobgether Playwright error for {url}: {exc}")
        # Last resort: slug fallback
        company = _company_from_jobgether_url(url)
        return {"company": company, "source": "jobgether"} if company else {}

⚠️ The CSS selectors in the page.evaluate() call are placeholders. Before committing, inspect https://jobgether.com/offer/ in a browser to find the actual class names for title, company, location, and description. Update the selectors accordingly.

Step 5: Add dispatch branch in scrape_job_url()

In the if board == "linkedin": dispatch chain (around line 208), add before the else:

        elif board == "jobgether":
            fields = _scrape_jobgether(url)

Step 6: Run tests to verify they pass

Run: conda run -n job-seeker python -m pytest tests/test_scrape_url.py -v Expected: All PASS (including pre-existing tests)

Step 7: Commit

git add scripts/scrape_url.py tests/test_scrape_url.py
git commit -m "feat: add Jobgether URL detection and scraper to scrape_url.py"

Chunk 2: Jobgether custom board scraper

⚠️ Pre-condition: Before writing the scraper, inspect https://jobgether.com/remote-jobs live to determine the actual URL/filter param format and DOM card selectors. Use the Playwright MCP browser tool or Chrome devtools. Record: (1) the query param for job title search, (2) the job card CSS selectors for title, company, URL, location, salary.

Task 3: Inspect Jobgether search live

Files: None (research step)

Step 1: Navigate to Jobgether remote jobs and inspect search params

Using browser devtools or Playwright network capture, navigate to https://jobgether.com/remote-jobs, search for "Customer Success Manager", and capture:

The resulting URL (query params)
Network requests (XHR/fetch) if the page uses API calls
CSS selectors for job card elements

Record findings here before proceeding.

Step 2: Test a Playwright page.evaluate() extraction manually

# Run interactively to validate selectors
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)  # headless=False to see the page
    page = browser.new_page()
    page.goto("https://jobgether.com/remote-jobs")
    page.wait_for_load_state("networkidle")
    # Test your selectors here
    cards = page.query_selector_all("[YOUR_CARD_SELECTOR]")
    print(len(cards))
    browser.close()

Task 4: Write jobgether.py scraper

Files:

Create: /Library/Development/CircuitForge/peregrine/scripts/custom_boards/jobgether.py
Modify: /Library/Development/CircuitForge/peregrine/tests/test_discover.py (or create tests/test_jobgether.py)
Step 1: Write failing test

In /Library/Development/CircuitForge/peregrine/tests/test_discover.py (or a new tests/test_jobgether.py):

def test_jobgether_scraper_returns_empty_on_missing_playwright(monkeypatch):
    """Graceful fallback when Playwright is unavailable."""
    import scripts.custom_boards.jobgether as jg
    monkeypatch.setattr("scripts.custom_boards.jobgether.sync_playwright", None)
    result = jg.scrape({"titles": ["Customer Success Manager"]}, "Remote", results_wanted=5)
    assert result == []


def test_jobgether_scraper_respects_results_wanted(monkeypatch):
    """Scraper caps results at results_wanted."""
    import scripts.custom_boards.jobgether as jg

    fake_jobs = [
        {"title": f"CSM {i}", "href": f"/offer/abc{i}-csm---acme", "company": f"Acme {i}",
         "location": "Remote", "is_remote": True, "salary": ""}
        for i in range(20)
    ]

    class FakePage:
        def goto(self, *a, **kw): pass
        def wait_for_load_state(self, *a, **kw): pass
        def evaluate(self, _): return fake_jobs

    class FakeCtx:
        def new_page(self): return FakePage()

    class FakeBrowser:
        def new_context(self, **kw): return FakeCtx()
        def close(self): pass

    class FakeChromium:
        def launch(self, **kw): return FakeBrowser()

    class FakeP:
        chromium = FakeChromium()
        def __enter__(self): return self
        def __exit__(self, *a): pass

    monkeypatch.setattr("scripts.custom_boards.jobgether.sync_playwright", lambda: FakeP())
    result = jg.scrape({"titles": ["CSM"]}, "Remote", results_wanted=5)
    assert len(result) <= 5

Run: conda run -n job-seeker python -m pytest tests/ -v -k "jobgether" Expected: FAIL (module not found)

Step 2: Create scripts/custom_boards/jobgether.py

"""Jobgether scraper — Playwright-based (requires chromium installed).

Jobgether (jobgether.com) is a remote-work job aggregator. It blocks plain
requests with 403, so we use Playwright to render the page and extract cards.

Install Playwright: conda run -n job-seeker pip install playwright &&
                   conda run -n job-seeker python -m playwright install chromium

Returns a list of dicts compatible with scripts.db.insert_job().
"""
from __future__ import annotations

import re
import time
from typing import Any

_BASE = "https://jobgether.com"
_SEARCH_PATH = "/remote-jobs"

# TODO: Replace with confirmed query param key after live inspection (Task 3)
_QUERY_PARAM = "search"

# Module-level import so tests can monkeypatch scripts.custom_boards.jobgether.sync_playwright
try:
    from playwright.sync_api import sync_playwright
except ImportError:
    sync_playwright = None


def scrape(profile: dict, location: str, results_wanted: int = 50) -> list[dict]:
    """
    Scrape job listings from Jobgether using Playwright.

    Args:
        profile: Search profile dict (uses 'titles').
        location: Location string — Jobgether is remote-focused; location used
                  only if the site exposes a location filter.
        results_wanted: Maximum results to return across all titles.

    Returns:
        List of job dicts with keys: title, company, url, source, location,
        is_remote, salary, description.
    """
    if sync_playwright is None:
        print(
            "    [jobgether] playwright not installed.\n"
            "    Install: conda run -n job-seeker pip install playwright && "
            "conda run -n job-seeker python -m playwright install chromium"
        )
        return []

    results: list[dict] = []
    seen_urls: set[str] = set()

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        ctx = browser.new_context(
            user_agent=(
                "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                "(KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36"
            )
        )
        page = ctx.new_page()

        for title in profile.get("titles", []):
            if len(results) >= results_wanted:
                break

            # TODO: Confirm URL param format from live inspection (Task 3)
            url = f"{_BASE}{_SEARCH_PATH}?{_QUERY_PARAM}={title.replace(' ', '+')}"

            try:
                page.goto(url, timeout=30_000)
                page.wait_for_load_state("networkidle", timeout=20_000)
            except Exception as exc:
                print(f"    [jobgether] Page load error for '{title}': {exc}")
                continue

            # TODO: Replace JS selector with confirmed card selector from Task 3
            try:
                raw_jobs: list[dict[str, Any]] = page.evaluate(_extract_jobs_js())
            except Exception as exc:
                print(f"    [jobgether] JS extract error for '{title}': {exc}")
                continue

            if not raw_jobs:
                print(f"    [jobgether] No cards found for '{title}' — selector may need updating")
                continue

            for job in raw_jobs:
                href = job.get("href", "")
                if not href:
                    continue
                full_url = _BASE + href if href.startswith("/") else href
                if full_url in seen_urls:
                    continue
                seen_urls.add(full_url)

                results.append({
                    "title":       job.get("title", ""),
                    "company":     job.get("company", ""),
                    "url":         full_url,
                    "source":      "jobgether",
                    "location":    job.get("location") or "Remote",
                    "is_remote":   True,  # Jobgether is remote-focused
                    "salary":      job.get("salary") or "",
                    "description": "",  # not in card view; scrape_url fills in
                })

                if len(results) >= results_wanted:
                    break

            time.sleep(1)  # polite pacing between titles

        browser.close()

    return results[:results_wanted]


def _extract_jobs_js() -> str:
    """JS to run in page context — extracts job data from rendered card elements.

    TODO: Replace selectors with confirmed values from Task 3 live inspection.
    """
    return """() => {
        // TODO: replace '[class*=job-card]' with confirmed card selector
        const cards = document.querySelectorAll('[class*="job-card"], [data-testid*="job"]');
        return Array.from(cards).map(card => {
            // TODO: replace these selectors with confirmed values
            const titleEl = card.querySelector('h2, h3, [class*="title"]');
            const companyEl = card.querySelector('[class*="company"], [class*="employer"]');
            const linkEl = card.querySelector('a');
            const salaryEl = card.querySelector('[class*="salary"]');
            const locationEl = card.querySelector('[class*="location"]');
            return {
                title: titleEl ? titleEl.textContent.trim() : null,
                company: companyEl ? companyEl.textContent.trim() : null,
                href: linkEl ? linkEl.getAttribute('href') : null,
                salary: salaryEl ? salaryEl.textContent.trim() : null,
                location: locationEl ? locationEl.textContent.trim() : null,
                is_remote: true,
            };
        }).filter(j => j.title && j.href);
    }"""

Step 3: Run tests

Run: conda run -n job-seeker python -m pytest tests/ -v -k "jobgether" Expected: PASS

Step 4: Commit

git add scripts/custom_boards/jobgether.py tests/test_discover.py
git commit -m "feat: add Jobgether custom board scraper (selectors pending live inspection)"

Chunk 3: Registration, config, cover letter framing

Task 5: Register scraper in discover.py + update search_profiles.yaml

Files:

Modify: /Library/Development/CircuitForge/peregrine/scripts/discover.py
Modify: /Library/Development/CircuitForge/peregrine/config/search_profiles.yaml
Modify: /Library/Development/CircuitForge/peregrine/config/search_profiles.yaml.example (if it exists)
Step 1: Add import to discover.py import block (lines 20–22)

jobgether.py absorbs the Playwright ImportError internally (module-level try/except), so it always imports successfully. Match the existing pattern exactly:

from scripts.custom_boards import jobgether as _jobgether

Step 2: Add to CUSTOM_SCRAPERS dict literal (lines 30–34)

CUSTOM_SCRAPERS: dict[str, object] = {
    "adzuna":     _adzuna.scrape,
    "theladders": _theladders.scrape,
    "craigslist": _craigslist.scrape,
    "jobgether":  _jobgether.scrape,
}

When Playwright is absent, _jobgether.scrape() returns [] gracefully — no special guard needed in discover.py.

Step 3: Add jobgether to remote-eligible profiles in search_profiles.yaml

Add - jobgether to the custom_boards list for every profile that has Remote in its locations. Based on the current file, that means: cs_leadership, music_industry, animal_welfare, education. Do NOT add it to default (locations: San Francisco CA only).

Step 4: Run discover tests

Run: conda run -n job-seeker python -m pytest tests/test_discover.py -v Expected: All PASS

Step 5: Commit

git add scripts/discover.py config/search_profiles.yaml
git commit -m "feat: register Jobgether scraper and add to remote search profiles"

Task 6: Cover letter recruiter framing

Files:

Modify: /Library/Development/CircuitForge/peregrine/scripts/generate_cover_letter.py
Modify: /Library/Development/CircuitForge/peregrine/scripts/task_runner.py
Modify: /Library/Development/CircuitForge/peregrine/tests/test_match.py or add tests/test_cover_letter.py
Step 1: Write failing test

Create or add to /Library/Development/CircuitForge/peregrine/tests/test_cover_letter.py:

def test_build_prompt_jobgether_framing_unknown_company():
    from scripts.generate_cover_letter import build_prompt
    prompt = build_prompt(
        title="Customer Success Manager",
        company="Jobgether",
        description="CSM role at an undisclosed company.",
        examples=[],
        is_jobgether=True,
    )
    assert "Your client" in prompt
    assert "recruiter" in prompt.lower() or "jobgether" in prompt.lower()


def test_build_prompt_jobgether_framing_known_company():
    from scripts.generate_cover_letter import build_prompt
    prompt = build_prompt(
        title="Customer Success Manager",
        company="Resware",
        description="CSM role at Resware.",
        examples=[],
        is_jobgether=True,
    )
    assert "Your client at Resware" in prompt


def test_build_prompt_no_jobgether_framing_by_default():
    from scripts.generate_cover_letter import build_prompt
    prompt = build_prompt(
        title="Customer Success Manager",
        company="Acme Corp",
        description="CSM role.",
        examples=[],
    )
    assert "Your client" not in prompt

Run: conda run -n job-seeker python -m pytest tests/test_cover_letter.py -v Expected: FAIL

Step 2: Add is_jobgether to build_prompt() in generate_cover_letter.py

Modify the build_prompt() signature (line 186):

def build_prompt(
    title: str,
    company: str,
    description: str,
    examples: list[dict],
    mission_hint: str | None = None,
    is_jobgether: bool = False,
) -> str:

Add the recruiter hint block after the mission_hint block (after line 203):

    if is_jobgether:
        if company and company.lower() != "jobgether":
            recruiter_note = (
                f"🤝 Recruiter context: This listing is posted by Jobgether on behalf of "
                f"{company}. Address the cover letter to the Jobgether recruiter, not directly "
                f"to the hiring company. Use framing like 'Your client at {company} will "
                f"appreciate...' rather than addressing {company} directly. The role "
                f"requirements are those of the actual employer."
            )
        else:
            recruiter_note = (
                "🤝 Recruiter context: This listing is posted by Jobgether on behalf of an "
                "undisclosed employer. Address the cover letter to the Jobgether recruiter. "
                "Use framing like 'Your client will appreciate...' rather than addressing "
                "the company directly."
            )
        parts.append(f"{recruiter_note}\n")

Step 3: Add is_jobgether to generate() signature

Modify generate() (line 233):

def generate(
    title: str,
    company: str,
    description: str = "",
    previous_result: str = "",
    feedback: str = "",
    is_jobgether: bool = False,
    _router=None,
) -> str:

Pass it through to build_prompt() (line 254):

    prompt = build_prompt(title, company, description, examples,
                          mission_hint=mission_hint, is_jobgether=is_jobgether)

Step 4: Pass is_jobgether from task_runner.py

In /Library/Development/CircuitForge/peregrine/scripts/task_runner.py, modify the generate() call inside the cover_letter task block (elif task_type == "cover_letter": starts at line 152; the generate() call is at ~line 156):

        elif task_type == "cover_letter":
            import json as _json
            p = _json.loads(params or "{}")
            from scripts.generate_cover_letter import generate
            result = generate(
                job.get("title", ""),
                job.get("company", ""),
                job.get("description", ""),
                previous_result=p.get("previous_result", ""),
                feedback=p.get("feedback", ""),
                is_jobgether=job.get("source") == "jobgether",
            )
            update_cover_letter(db_path, job_id, result)

Step 5: Run tests

Run: conda run -n job-seeker python -m pytest tests/test_cover_letter.py -v Expected: All PASS

Step 6: Run full test suite

Run: conda run -n job-seeker python -m pytest tests/ -v Expected: All PASS

Step 7: Commit

git add scripts/generate_cover_letter.py scripts/task_runner.py tests/test_cover_letter.py
git commit -m "feat: add Jobgether recruiter framing to cover letter generation"

Final: Merge

Merge worktree branch to main

cd /Library/Development/CircuitForge/peregrine
git merge feature/jobgether-integration
git worktree remove .worktrees/jobgether-integration

Push to remote

git push origin main

Manual verification after merge

Add the stuck Jobgether manual import (job 2286) — delete the old stuck row and re-add the URL via "Add Jobs by URL" in the Home page. Verify the scraper resolves company = "Resware".
Run a short discovery (discover.py with results_per_board: 5) and confirm no company="Jobgether" rows appear in staging.db.
Generate a cover letter for a Jobgether-sourced job and confirm recruiter framing appears.

25 KiB Raw Blame History Unescape Escape

Jobgether Integration Implementation Plan

Worktree Setup

Chunk 1: Blocklist filter + scrape_url.py

Task 1: Add Jobgether to blocklist

Task 2: Add Jobgether detection to scrape_url.py

Chunk 2: Jobgether custom board scraper

Task 3: Inspect Jobgether search live

Task 4: Write jobgether.py scraper

Chunk 3: Registration, config, cover letter framing

Task 5: Register scraper in discover.py + update search_profiles.yaml

Task 6: Cover letter recruiter framing

Final: Merge

Manual verification after merge

25 KiB

Raw Blame History