peregrine/docs/plans/2026-02-20-job-seeker-design.md
pyr0ball c368c7a977 chore: seed Peregrine from personal job-seeker (pre-generalization)
App: Peregrine
Company: Circuit Forge LLC
Source: github.com/pyr0ball/job-seeker (personal fork, not linked)
2026-02-24 18:25:39 -08:00

6.9 KiB
Raw Blame History

Job Seeker Platform — Design Document

Date: 2026-02-20 Status: Approved Candidate: Meghan McCann


Overview

A monorepo project at /devl/job-seeker/ that integrates three FOSS tools into a cohesive job search pipeline: automated discovery (JobSpy), resume-to-listing keyword matching (Resume Matcher), and automated application submission (AIHawk). Job listings and interactive documents are tracked in Notion; source documents live in /Library/Documents/JobSearch/.


Project Structure

/devl/job-seeker/
├── config/
│   ├── search_profiles.yaml    # JobSpy queries (titles, locations, boards)
│   ├── llm.yaml                # LLM router: backends + fallback order
│   └── notion.yaml             # Notion DB IDs and field mappings
├── aihawk/                     # git clone — Auto_Jobs_Applier_AIHawk
├── resume_matcher/             # git clone — Resume-Matcher
├── scripts/
│   ├── discover.py             # JobSpy → deduplicate → push to Notion
│   ├── match.py                # Notion job URL → Resume Matcher → write score back
│   └── llm_router.py           # LLM abstraction layer with priority fallback chain
├── docs/plans/                 # Design and implementation docs (no resume files)
├── environment.yml             # conda env spec (env name: job-seeker)
└── .gitignore

Document storage rule: Resumes, cover letters, and any interactable documents live in /Library/Documents/JobSearch/ or Notion — never committed to this repo.


Architecture

Data Flow

JobSpy (LinkedIn / Indeed / Glassdoor / ZipRecruiter)
    └─▶ discover.py
            ├─ deduplicate by URL against existing Notion records
            └─▶ Notion DB  (Status: "New")

Notion DB  (daily review — decide what to pursue)
    └─▶ match.py <notion-page-url>
            ├─ fetch job description from listing URL
            ├─ run Resume Matcher vs. /Library/Documents/JobSearch/Meghan_McCann_Resume_02-19-2025.pdf
            └─▶ write Match Score + Keyword Gaps back to Notion page

AIHawk  (when ready to apply)
    ├─ reads config pointing to same resume + personal_info.yaml
    ├─ llm_router.py → best available LLM backend
    ├─ submits LinkedIn Easy Apply
    └─▶ Notion status → "Applied"

Notion Database Schema

Field Type Notes
Job Title Title Primary identifier
Company Text
Location Text
Remote Checkbox
URL URL Deduplication key
Source Select LinkedIn / Indeed / Glassdoor / ZipRecruiter
Status Select New → Reviewing → Applied → Interview → Offer → Rejected
Match Score Number 0100, written by match.py
Keyword Gaps Text Comma-separated missing keywords from Resume Matcher
Salary Text If listed
Date Found Date Set at discovery time
Notes Text Manual field

LLM Router (scripts/llm_router.py)

Single complete(prompt, system=None) interface. On each call: health-check each backend in configured order, use the first that responds. Falls back silently on connection error, timeout, or 5xx. Logs which backend was used.

All backends except Anthropic use the openai Python package (OpenAI-compatible endpoints). Anthropic uses the anthropic package.

config/llm.yaml

fallback_order:
  - claude_code      # port 3009 — Claude via local pipeline (highest quality)
  - ollama           # port 11434 — local, always-on
  - vllm             # port 8000 — start when needed
  - github_copilot   # port 3010 — Copilot via gh token
  - anthropic        # cloud fallback, burns API credits

backends:
  claude_code:
    type: openai_compat
    base_url: http://localhost:3009/v1
    model: claude-code-terminal
    api_key: "any"

  ollama:
    type: openai_compat
    base_url: http://localhost:11434/v1
    model: llama3.2
    api_key: "ollama"

  vllm:
    type: openai_compat
    base_url: http://localhost:8000/v1
    model: __auto__
    api_key: ""

  github_copilot:
    type: openai_compat
    base_url: http://localhost:3010/v1
    model: gpt-4o
    api_key: "any"

  anthropic:
    type: anthropic
    model: claude-sonnet-4-6
    api_key_env: ANTHROPIC_API_KEY

Job Search Profile

config/search_profiles.yaml (initial)

profiles:
  - name: cs_leadership
    titles:
      - "Customer Success Manager"
      - "Director of Customer Success"
      - "VP Customer Success"
      - "Head of Customer Success"
      - "Technical Account Manager"
      - "Revenue Operations Manager"
      - "Customer Experience Lead"
    locations:
      - "Remote"
      - "San Francisco Bay Area, CA"
    boards:
      - linkedin
      - indeed
      - glassdoor
      - zip_recruiter
    results_per_board: 25
    remote_only: false          # remote preferred but Bay Area in-person ok
    hours_old: 72               # listings posted in last 3 days

Conda Environment

New dedicated env job-seeker (not base). Core packages:

  • python-jobspy — job scraping
  • notion-client — Notion API
  • openai — OpenAI-compatible calls (Ollama, vLLM, Copilot, Claude pipeline)
  • anthropic — Anthropic API fallback
  • pyyaml — config parsing
  • pandas — CSV handling and dedup
  • Resume Matcher dependencies (sentence-transformers, streamlit — installed from clone)

Resume Matcher Streamlit UI runs on port 8501 (confirmed clear).


Port Map

Port Service Status
3009 Claude Code OpenAI wrapper Start via manage.sh in Post Fight Processing
3010 GitHub Copilot wrapper Start via manage-copilot.sh
11434 Ollama Running
8000 vLLM Start when needed
8501 Resume Matcher (Streamlit) Start when needed

Out of Scope (this phase)

  • Scheduled/cron automation (run discover.py manually for now)
  • Email/SMS alerts for new listings
  • ATS resume rebuild (separate task)
  • Applications to non-LinkedIn platforms via AIHawk