App: Peregrine · Company: Circuit Forge LLC · Source: github.com/pyr0ball/job-seeker (personal fork, not linked)
# Job Seeker Platform — Claude Context
## Project

Automated job discovery + resume matching + application pipeline for Meghan McCann.

Full pipeline:

```
JobSpy → discover.py → SQLite (staging.db) → match.py → Job Review UI
  → Apply Workspace (cover letter + PDF) → Interviews kanban
  → phone_screen → interviewing → offer → hired
        ↓
  Notion DB (synced via sync.py)
```
## Environment

- Python env: `conda run -n job-seeker <cmd>` — always use this, never bare `python`
- Run tests: `/devl/miniconda3/envs/job-seeker/bin/pytest tests/ -v` (use the direct binary — `conda run pytest` can spawn runaway processes)
- Run discovery: `conda run -n job-seeker python scripts/discover.py`
- Recreate env: `conda env create -f environment.yml`
- `pytest.ini` scopes test collection to `tests/` only — never widen this
## ⚠️ AIHawk env isolation — CRITICAL

- NEVER `pip install -r aihawk/requirements.txt` into the job-seeker env
- AIHawk pulls torch + CUDA (~7 GB), which causes OOM during test runs
- AIHawk must run in its own env: `conda create -n aihawk-env python=3.12`
- The job-seeker env must stay lightweight (no torch, no sentence-transformers, no CUDA)
## Web UI (Streamlit)

- Run: `bash scripts/manage-ui.sh start` → http://localhost:8501
- Manage: `start | stop | restart | status | logs`
- Direct binary: `/devl/miniconda3/envs/job-seeker/bin/streamlit run app/app.py`
- Entry point: `app/app.py` (uses `st.navigation()` — do NOT run `app/Home.py` directly)
- `staging.db` is gitignored — SQLite staging layer between discovery and Notion
## Pages

| Page | File | Purpose |
|---|---|---|
| Home | `app/Home.py` | Dashboard, discovery trigger, danger-zone purge |
| Job Review | `app/pages/1_Job_Review.py` | Batch approve/reject with sorting |
| Settings | `app/pages/2_Settings.py` | LLM backends, search profiles, Notion, services |
| Resume Profile | Settings → Resume Profile tab | Edit AIHawk YAML profile (was standalone `3_Resume_Editor.py`) |
| Apply Workspace | `app/pages/4_Apply.py` | Cover letter gen + PDF export + mark applied + reject listing |
| Interviews | `app/pages/5_Interviews.py` | Kanban: phone_screen → interviewing → offer → hired |
| Interview Prep | `app/pages/6_Interview_Prep.py` | Live reference sheet during calls + Practice Q&A |
| Survey Assistant | `app/pages/7_Survey.py` | Culture-fit survey help: text paste + screenshot (moondream2) |
## Job Status Pipeline

```
pending      → approved/rejected  (Job Review)
approved     → applied            (Apply Workspace — mark applied)
approved     → rejected           (Apply Workspace — reject listing button)
applied      → survey             (Interviews — "📋 Survey" button; pre-kanban section)
applied      → phone_screen       (Interviews — triggers company research)
survey       → phone_screen       (Interviews — after survey completed)
phone_screen → interviewing
interviewing → offer
offer        → hired
any stage    → rejected           (rejection_stage captured for analytics)
applied/approved → synced         (sync.py → Notion)
```
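The transition graph above can be captured as a small lookup table. This is a minimal illustrative sketch, not the actual `db.py` API; the `ALLOWED` dict and `can_transition` helper are invented here:

```python
# Hypothetical sketch of the status graph above — not the real db.py code.
# Any active stage may also go to "rejected" (rejection_stage is logged there);
# the separate "synced" flag from sync.py is omitted for simplicity.
ALLOWED = {
    "pending": {"approved", "rejected"},
    "approved": {"applied", "rejected"},
    "applied": {"survey", "phone_screen", "rejected"},
    "survey": {"phone_screen", "rejected"},
    "phone_screen": {"interviewing", "rejected"},
    "interviewing": {"offer", "rejected"},
    "offer": {"hired", "rejected"},
}

def can_transition(current: str, new: str) -> bool:
    """Return True if the kanban allows moving a job from `current` to `new`."""
    return new in ALLOWED.get(current, set())
```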
## SQLite Schema (staging.db)

### jobs table — key columns

- Standard: `id, title, company, url, source, location, is_remote, salary, description`
- Scores: `match_score, keyword_gaps`
- Dates: `date_found, applied_at, survey_at, phone_screen_at, interviewing_at, offer_at, hired_at`
- Interview: `interview_date, rejection_stage`
- Content: `cover_letter, notion_page_id`

### Additional tables

- `job_contacts` — email thread log per job (direction, subject, from/to, body, received_at)
- `company_research` — LLM-generated brief per job (company_brief, ceo_brief, talking_points, raw_output, accessibility_brief)
- `background_tasks` — async LLM task queue (task_type, job_id, status: queued/running/completed/failed)
- `survey_responses` — per-job Q&A pairs (survey_name, received_at, source, raw_input, image_path, mode, llm_output, reported_score)
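For orientation, here is a throwaway `sqlite3` session against a toy subset of the jobs table (column subset only; the real `staging.db` schema is richer and lives behind `scripts/db.py`):

```python
import sqlite3

# Toy subset of the jobs table for illustration — not the full staging.db schema.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE jobs (
    id INTEGER PRIMARY KEY, title TEXT, company TEXT,
    status TEXT DEFAULT 'pending', match_score REAL)""")
con.execute("INSERT INTO jobs (title, company, match_score) VALUES (?, ?, ?)",
            ("Staff Engineer", "Acme", 0.87))
# Promote strong matches, mimicking a batch-approve in the Job Review UI.
con.execute("UPDATE jobs SET status = 'approved' WHERE match_score >= 0.8")
row = con.execute("SELECT company, status FROM jobs").fetchone()
print(row)  # ('Acme', 'approved')
```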
## Scripts

| Script | Purpose |
|---|---|
| `scripts/discover.py` | JobSpy + custom board scrape → SQLite insert |
| `scripts/custom_boards/adzuna.py` | Adzuna Jobs API (app_id + app_key in `config/adzuna.yaml`) |
| `scripts/custom_boards/theladders.py` | The Ladders scraper via `curl_cffi` + `__NEXT_DATA__` SSR parse |
| `scripts/match.py` | Resume keyword matching → `match_score` |
| `scripts/sync.py` | Push approved/applied jobs to Notion |
| `scripts/llm_router.py` | LLM fallback chain (reads `config/llm.yaml`) |
| `scripts/generate_cover_letter.py` | Cover letter via LLM; detects mission-aligned companies (music/animal welfare/education) and injects a Para 3 hint |
| `scripts/company_research.py` | Pre-interview brief via LLM + optional SearXNG scrape; includes Inclusion & Accessibility section |
| `scripts/prepare_training_data.py` | Extract cover letter JSONL for fine-tuning |
| `scripts/finetune_local.py` | Unsloth QLoRA fine-tune on local GPU |
| `scripts/db.py` | All SQLite helpers (single source of truth) |
| `scripts/task_runner.py` | Background thread executor — `submit_task(db, type, job_id)` dispatches daemon threads for LLM jobs |
| `scripts/vision_service/main.py` | FastAPI moondream2 inference on port 8002; `manage-vision.sh` lifecycle |
## LLM Router

- Config: `config/llm.yaml`
- Cover letter fallback order: `claude_code → ollama (meghan-cover-writer:latest) → vllm → copilot → anthropic`
- Research fallback order: `claude_code → vllm (__auto__, ouroboros) → ollama_research (llama3.1:8b) → ...`
- `meghan-cover-writer:latest` is cover-letter only — it doesn't follow structured markdown prompts for research
- `LLMRouter.complete()` accepts a `fallback_order=` override for per-task routing
- `LLMRouter.complete()` accepts `images: list[str]` (base64) — vision backends only; non-vision backends are skipped when images are present
- Vision fallback order config key: `vision_fallback_order: [vision_service, claude_code, anthropic]`
- `vision_service` backend type: POST to `/analyze`; skipped automatically when no images provided
- Claude Code wrapper: `/Library/Documents/Post Fight Processing/server-openai-wrapper-v2.js`
- Copilot wrapper: `/Library/Documents/Post Fight Processing/manage-copilot.sh start`
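The fallback behaviour can be pictured as a loop over the configured chain. This is a hedged sketch, not the real `LLMRouter` internals; the backend-dict shape (`name`, `vision`, `call`) is invented for illustration:

```python
def route(backends, prompt, images=None):
    """Try each backend in order; skip non-vision backends when images are present.

    Sketch only — the real router reads config/llm.yaml and honors enabled flags.
    """
    last_err = None
    for backend in backends:
        if images and not backend["vision"]:
            continue  # this backend cannot accept image input
        try:
            return backend["call"](prompt)
        except Exception as err:
            last_err = err  # record the failure and fall through to the next backend
    raise RuntimeError(f"all backends failed (last error: {last_err!r})")
```

A failing first backend falls through to the next one; passing `images` routes past every text-only backend.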
## Fine-Tuned Model

- Model: `meghan-cover-writer:latest` registered in Ollama
- Base: `unsloth/Llama-3.2-3B-Instruct` (QLoRA, rank 16, 10 epochs)
- Training data: 62 cover letters from `/Library/Documents/JobSearch/`
- JSONL: `/Library/Documents/JobSearch/training_data/cover_letters.jsonl`
- Adapter: `/Library/Documents/JobSearch/training_data/finetune_output/adapter/`
- Merged: `/Library/Documents/JobSearch/training_data/gguf/meghan-cover-writer/`
- Re-train: `conda run -n ogma python scripts/finetune_local.py` (uses the `ogma` env with unsloth + trl; pin to GPU 0 with `CUDA_VISIBLE_DEVICES=0`)
## Background Tasks

- Cover letter gen and company research run as daemon threads via `scripts/task_runner.py`
- Tasks survive page navigation; results are written to existing tables when done
- On server restart, `app.py` startup clears any stuck `running`/`queued` rows to `failed`
- Dedup: only one queued/running task per `(task_type, job_id)` at a time
- Sidebar indicator (`app/app.py`) polls every 3 s via `@st.fragment(run_every=3)`
- ⚠️ Streamlit fragment + sidebar: use `with st.sidebar: _fragment()` — the sidebar context must WRAP the call, not sit inside the fragment body
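The `(task_type, job_id)` dedup rule boils down to a guarded insert. A hypothetical sketch, not the actual `task_runner.py` code (the real `submit_task` also spawns the daemon thread):

```python
import sqlite3

# Hypothetical sketch of the dedup guard in task_runner.py — insert only when
# no queued/running row already exists for this (task_type, job_id) pair.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE background_tasks (
    id INTEGER PRIMARY KEY, task_type TEXT, job_id INTEGER, status TEXT)""")

def submit_task(con, task_type, job_id):
    """Queue a task unless one is already live for this (task_type, job_id)."""
    dup = con.execute(
        "SELECT 1 FROM background_tasks WHERE task_type=? AND job_id=? "
        "AND status IN ('queued', 'running')", (task_type, job_id)).fetchone()
    if dup:
        return False  # dedup: at most one live task per pair
    con.execute(
        "INSERT INTO background_tasks (task_type, job_id, status) "
        "VALUES (?, ?, 'queued')", (task_type, job_id))
    return True
```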
## Vision Service

- Script: `scripts/vision_service/main.py` (FastAPI, port 8002)
- Model: `vikhyatk/moondream2` revision `2025-01-09` — lazy-loaded on first `/analyze` (~1.8 GB download)
- GPU: 4-bit quantization when CUDA available (~1.5 GB VRAM); CPU fallback
- Conda env: `job-seeker-vision` — separate from job-seeker (torch + transformers live here)
- Create env: `conda env create -f scripts/vision_service/environment.yml`
- Manage: `bash scripts/manage-vision.sh start|stop|restart|status|logs`
- Survey page degrades gracefully to text-only when the vision service is down
- ⚠️ Never install vision deps (torch, bitsandbytes, transformers) into the job-seeker env
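Calling the service is an HTTP POST with a base64 image. A sketch of building that request: the host/port and `/analyze` path come from above, but the JSON field names (`image`, `prompt`) are assumptions — check `main.py` for the real request schema:

```python
import base64
import json
import urllib.request

def build_analyze_request(image_bytes: bytes, prompt: str,
                          url: str = "http://localhost:8002/analyze"):
    """Build a POST to the vision service. Field names are assumed, not confirmed."""
    payload = {
        "image": base64.b64encode(image_bytes).decode("ascii"),  # screenshot as base64
        "prompt": prompt,                                        # question about the image
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
```

Send with `urllib.request.urlopen(req)` once the service is up (`manage-vision.sh start`).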
## Company Research

- Script: `scripts/company_research.py`
- Auto-triggered when a job moves to `phone_screen` in the Interviews kanban
- Three-phase: (1) SearXNG company scrape → (1b) SearXNG news snippets → (2) LLM synthesis
- SearXNG scraper: `/Library/Development/scrapers/companyScraper.py`
- SearXNG Docker: run `docker compose up -d` from `/Library/Development/scrapers/SearXNG/` (port 8888)
- `beautifulsoup4` and `fake-useragent` are installed in the job-seeker env (required for the scraper)
- News search hits `/search?format=json` — JSON format must be enabled in `searxng-config/settings.yml`
- ⚠️ `settings.yml` is owned by UID 977 (container user) — use `docker cp` to update, not direct writes
- ⚠️ `settings.yml` requires `use_default_settings: true` at the top or SearXNG fails schema validation
- `companyScraper` calls `sys.exit()` on missing deps — use `except BaseException`, not `except Exception`
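The `sys.exit()` gotcha in one self-contained example: `SystemExit` subclasses `BaseException`, not `Exception`, so a plain `except Exception` lets it propagate. The wrapper function here is illustrative, not code from the repo:

```python
import sys

def call_scraper_safely(fn):
    """Run a scraper that may call sys.exit() without killing the caller.

    SystemExit inherits from BaseException, so `except Exception` would miss it.
    (Note: except BaseException also swallows KeyboardInterrupt — acceptable
    for a short scraper call, but keep the scope tight.)
    """
    try:
        return fn()
    except BaseException as err:  # catches SystemExit too
        return f"scraper aborted: {err!r}"

def flaky_scraper():
    sys.exit(2)  # what companyScraper does when a dependency is missing
```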
## Email Classifier Labels

Six labels: `interview_request`, `rejection`, `offer`, `follow_up`, `survey_received`, `other`

- `survey_received` — links or requests to complete a culture-fit survey/assessment
## Services (managed via Settings → Services tab)

| Service | Port | Notes |
|---|---|---|
| Streamlit UI | 8501 | `bash scripts/manage-ui.sh start` |
| Ollama | 11434 | `sudo systemctl start ollama` |
| Claude Code Wrapper | 3009 | `manage-services.sh start` in Post Fight Processing |
| GitHub Copilot Wrapper | 3010 | `manage-copilot.sh start` in Post Fight Processing |
| vLLM Server | 8000 | Manual start only |
| SearXNG | 8888 | `docker compose up -d` in `scrapers/SearXNG/` |
| Vision Service | 8002 | `bash scripts/manage-vision.sh start` — moondream2 survey screenshot analysis |
## Notion

- DB: "Tracking Job Applications" (ID: `1bd75cff-7708-8007-8c00-f1de36620a0a`)
- `config/notion.yaml` is gitignored (live token); `.example` is committed
- Field names are non-obvious — always read from `field_map` in `config/notion.yaml`
- "Salary" = Notion title property (unusual — it's the page title field)
- "Job Source" = `multi_select` type
- "Role Link" = URL field
- "Status of Application" = status field; new listings use "Application Submitted"
- Sync pushes `approved` + `applied` jobs; marks them `synced` after
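Given those property names, the `field_map` might look roughly like this. This is an illustrative sketch only — the left-hand keys are guesses; the committed `config/notion.yaml.example` is the source of truth:

```yaml
# Illustrative field_map sketch — left-hand keys are assumed, see notion.yaml.example.
field_map:
  title: "Salary"                    # unusual: the page-title property is named Salary
  source: "Job Source"               # multi_select
  url: "Role Link"                   # url
  status: "Status of Application"    # status; new listings use "Application Submitted"
```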
## Key Config Files

- `config/notion.yaml` — gitignored, has token + field_map
- `config/notion.yaml.example` — committed template
- `config/search_profiles.yaml` — titles, locations, boards, custom_boards, exclude_keywords, mission_tags (per profile)
- `config/llm.yaml` — LLM backend priority chain + enabled flags
- `config/tokens.yaml` — gitignored, stores HF token (chmod 600)
- `config/adzuna.yaml` — gitignored, Adzuna API app_id + app_key
- `config/adzuna.yaml.example` — committed template
## Custom Job Board Scrapers

- `scripts/custom_boards/adzuna.py` — Adzuna Jobs API; credentials in `config/adzuna.yaml`
- `scripts/custom_boards/theladders.py` — The Ladders SSR scraper; needs `curl_cffi` installed
- Scrapers registered in the `CUSTOM_SCRAPERS` dict in `discover.py`
- Activated per-profile via `custom_boards: [adzuna, theladders]` in `search_profiles.yaml`
- `enrich_all_descriptions()` in `enrich_descriptions.py` covers all sources (not just Glassdoor)
- Home page "Fill Missing Descriptions" button dispatches the `enrich_descriptions` task
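The registry pattern is likely a plain name→callable map. A sketch of how `CUSTOM_SCRAPERS` plus per-profile `custom_boards` could fit together; the `scrape_*` function names and dispatch helper are invented, not the actual `discover.py` code:

```python
# Hypothetical sketch of the CUSTOM_SCRAPERS registry — function names invented.
def scrape_adzuna(profile):
    """Would hit the Adzuna Jobs API with creds from config/adzuna.yaml."""
    return [{"title": "Example role", "company": "Acme", "source": "adzuna"}]

def scrape_theladders(profile):
    """Would fetch via curl_cffi and parse the __NEXT_DATA__ SSR payload."""
    return []

CUSTOM_SCRAPERS = {"adzuna": scrape_adzuna, "theladders": scrape_theladders}

def run_custom_boards(profile, boards):
    """Dispatch only the boards a profile activates via custom_boards: [...]."""
    jobs = []
    for name in boards:
        jobs.extend(CUSTOM_SCRAPERS[name](profile))
    return jobs
```

An unknown board name raising `KeyError` is a reasonable failure mode: a typo in `search_profiles.yaml` should fail loudly rather than silently skip.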
## Mission Alignment & Accessibility

- Preferred industries: music, animal welfare, children's education (hardcoded in `generate_cover_letter.py`)
- `detect_mission_alignment(company, description)` injects a Para 3 hint into cover letters for aligned companies
- Company research includes an "Inclusion & Accessibility" section (8th section) in every brief
- Accessibility search query in `_SEARCH_QUERIES` hits SearXNG for ADA/ERG/disability signals
- `accessibility_brief` column in the `company_research` table; shown in Interview Prep under the ♿ section
- This info is for personal decision-making ONLY — never disclosed in applications
- In generalization: these become `profile.mission_industries` + `profile.accessibility_priority` in `user.yaml`
## Document Rule

Resumes and cover letters live in `/Library/Documents/JobSearch/` or Notion — never committed to this repo.
## AIHawk (LinkedIn Easy Apply)

- Cloned to `aihawk/` (gitignored)
- Config: `aihawk/data_folder/plain_text_resume.yaml` — search `FILL_IN` for gaps
- Self-ID: non-binary, pronouns any, no disability/drug-test disclosure
- Run: `conda run -n job-seeker python aihawk/main.py`
- Playwright: `conda run -n job-seeker python -m playwright install chromium`
## Git Remote

- Forgejo self-hosted at https://git.opensourcesolarpunk.com (username: pyr0ball)
- `git remote add origin https://git.opensourcesolarpunk.com/pyr0ball/job-seeker.git`
## Subagents

Use the general-purpose subagent type (not Bash) when tasks require file writes.