# Job Seeker Platform — Claude Context ## Project Automated job discovery + resume matching + application pipeline for Meghan McCann. Full pipeline: ``` JobSpy → discover.py → SQLite (staging.db) → match.py → Job Review UI → Apply Workspace (cover letter + PDF) → Interviews kanban → phone_screen → interviewing → offer → hired ↓ Notion DB (synced via sync.py) ``` ## Environment - Python env: `conda run -n job-seeker ` — always use this, never bare python - Run tests: `/devl/miniconda3/envs/job-seeker/bin/pytest tests/ -v` (use direct binary — `conda run pytest` can spawn runaway processes) - Run discovery: `conda run -n job-seeker python scripts/discover.py` - Recreate env: `conda env create -f environment.yml` - pytest.ini scopes test collection to `tests/` only — never widen this ## ⚠️ AIHawk env isolation — CRITICAL - NEVER `pip install -r aihawk/requirements.txt` into the job-seeker env - AIHawk pulls torch + CUDA (~7GB) which causes OOM during test runs - AIHawk must run in its own env: `conda create -n aihawk-env python=3.12` - job-seeker env must stay lightweight (no torch, no sentence-transformers, no CUDA) ## Web UI (Streamlit) - Run: `bash scripts/manage-ui.sh start` → http://localhost:8501 - Manage: `start | stop | restart | status | logs` - Direct binary: `/devl/miniconda3/envs/job-seeker/bin/streamlit run app/app.py` - Entry point: `app/app.py` (uses `st.navigation()` — do NOT run `app/Home.py` directly) - `staging.db` is gitignored — SQLite staging layer between discovery and Notion ### Pages | Page | File | Purpose | |------|------|---------| | Home | `app/Home.py` | Dashboard, discovery trigger, danger-zone purge | | Job Review | `app/pages/1_Job_Review.py` | Batch approve/reject with sorting | | Settings | `app/pages/2_Settings.py` | LLM backends, search profiles, Notion, services | | Resume Profile | Settings → Resume Profile tab | Edit AIHawk YAML profile (was standalone `3_Resume_Editor.py`) | | Apply Workspace | `app/pages/4_Apply.py` | Cover letter gen + PDF export + mark applied + reject listing | | Interviews | `app/pages/5_Interviews.py` | Kanban: phone_screen→interviewing→offer→hired | | Interview Prep | `app/pages/6_Interview_Prep.py` | Live reference sheet during calls + Practice Q&A | | Survey Assistant | `app/pages/7_Survey.py` | Culture-fit survey help: text paste + screenshot (moondream2) | ## Job Status Pipeline ``` pending → approved/rejected (Job Review) approved → applied (Apply Workspace — mark applied) approved → rejected (Apply Workspace — reject listing button) applied → survey (Interviews — "📋 Survey" button; pre-kanban section) applied → phone_screen (Interviews — triggers company research) survey → phone_screen (Interviews — after survey completed) phone_screen → interviewing interviewing → offer offer → hired any stage → rejected (rejection_stage captured for analytics) applied/approved → synced (sync.py → Notion) ``` ## SQLite Schema (`staging.db`) ### `jobs` table key columns - Standard: `id, title, company, url, source, location, is_remote, salary, description` - Scores: `match_score, keyword_gaps` - Dates: `date_found, applied_at, survey_at, phone_screen_at, interviewing_at, offer_at, hired_at` - Interview: `interview_date, rejection_stage` - Content: `cover_letter, notion_page_id` ### Additional tables - `job_contacts` — email thread log per job (direction, subject, from/to, body, received_at) - `company_research` — LLM-generated brief per job (company_brief, ceo_brief, talking_points, raw_output, accessibility_brief) - `background_tasks` — async LLM task queue (task_type, job_id, status: queued/running/completed/failed) - `survey_responses` — per-job Q&A pairs (survey_name, received_at, source, raw_input, image_path, mode, llm_output, reported_score) ## Scripts | Script | Purpose | |--------|---------| | `scripts/discover.py` | JobSpy + custom board scrape → SQLite insert | | `scripts/custom_boards/adzuna.py` | Adzuna Jobs API (app_id + app_key in config/adzuna.yaml) | | `scripts/custom_boards/theladders.py` | The Ladders scraper via curl_cffi + __NEXT_DATA__ SSR parse | | `scripts/match.py` | Resume keyword matching → match_score | | `scripts/sync.py` | Push approved/applied jobs to Notion | | `scripts/llm_router.py` | LLM fallback chain (reads config/llm.yaml) | | `scripts/generate_cover_letter.py` | Cover letter via LLM; detects mission-aligned companies (music/animal welfare/education) and injects Para 3 hint | | `scripts/company_research.py` | Pre-interview brief via LLM + optional SearXNG scrape; includes Inclusion & Accessibility section | | `scripts/prepare_training_data.py` | Extract cover letter JSONL for fine-tuning | | `scripts/finetune_local.py` | Unsloth QLoRA fine-tune on local GPU | | `scripts/db.py` | All SQLite helpers (single source of truth) | | `scripts/task_runner.py` | Background thread executor — `submit_task(db, type, job_id)` dispatches daemon threads for LLM jobs | | `scripts/vision_service/main.py` | FastAPI moondream2 inference on port 8002; `manage-vision.sh` lifecycle | ## LLM Router - Config: `config/llm.yaml` - Cover letter fallback order: `claude_code → ollama (meghan-cover-writer:latest) → vllm → copilot → anthropic` - Research fallback order: `claude_code → vllm (__auto__, ouroboros) → ollama_research (llama3.1:8b) → ...` - `meghan-cover-writer:latest` is cover-letter only — it doesn't follow structured markdown prompts for research - `LLMRouter.complete()` accepts `fallback_order=` override for per-task routing - `LLMRouter.complete()` accepts `images: list[str]` (base64) — vision backends only; non-vision backends skipped when images present - Vision fallback order config key: `vision_fallback_order: [vision_service, claude_code, anthropic]` - `vision_service` backend type: POST to `/analyze`; skipped automatically when no images provided - Claude Code wrapper: `/Library/Documents/Post Fight Processing/server-openai-wrapper-v2.js` - Copilot wrapper: `/Library/Documents/Post Fight Processing/manage-copilot.sh start` ## Fine-Tuned Model - Model: `meghan-cover-writer:latest` registered in Ollama - Base: `unsloth/Llama-3.2-3B-Instruct` (QLoRA, rank 16, 10 epochs) - Training data: 62 cover letters from `/Library/Documents/JobSearch/` - JSONL: `/Library/Documents/JobSearch/training_data/cover_letters.jsonl` - Adapter: `/Library/Documents/JobSearch/training_data/finetune_output/adapter/` - Merged: `/Library/Documents/JobSearch/training_data/gguf/meghan-cover-writer/` - Re-train: `conda run -n ogma python scripts/finetune_local.py` (uses `ogma` env with unsloth + trl; pin to GPU 0 with `CUDA_VISIBLE_DEVICES=0`) ## Background Tasks - Cover letter gen and company research run as daemon threads via `scripts/task_runner.py` - Tasks survive page navigation; results written to existing tables when done - On server restart, `app.py` startup clears any stuck `running`/`queued` rows to `failed` - Dedup: only one queued/running task per `(task_type, job_id)` at a time - Sidebar indicator (`app/app.py`) polls every 3s via `@st.fragment(run_every=3)` - ⚠️ Streamlit fragment + sidebar: use `with st.sidebar: _fragment()` — sidebar context must WRAP the call, not be inside the fragment body ## Vision Service - Script: `scripts/vision_service/main.py` (FastAPI, port 8002) - Model: `vikhyatk/moondream2` revision `2025-01-09` — lazy-loaded on first `/analyze` (~1.8GB download) - GPU: 4-bit quantization when CUDA available (~1.5GB VRAM); CPU fallback - Conda env: `job-seeker-vision` — separate from job-seeker (torch + transformers live here) - Create env: `conda env create -f scripts/vision_service/environment.yml` - Manage: `bash scripts/manage-vision.sh start|stop|restart|status|logs` - Survey page degrades gracefully to text-only when vision service is down - ⚠️ Never install vision deps (torch, bitsandbytes, transformers) into the job-seeker env ## Company Research - Script: `scripts/company_research.py` - Auto-triggered when a job moves to `phone_screen` in the Interviews kanban - Three-phase: (1) SearXNG company scrape → (1b) SearXNG news snippets → (2) LLM synthesis - SearXNG scraper: `/Library/Development/scrapers/companyScraper.py` - SearXNG Docker: run `docker compose up -d` from `/Library/Development/scrapers/SearXNG/` (port 8888) - `beautifulsoup4` and `fake-useragent` are installed in job-seeker env (required for scraper) - News search hits `/search?format=json` — JSON format must be enabled in `searxng-config/settings.yml` - ⚠️ `settings.yml` owned by UID 977 (container user) — use `docker cp` to update, not direct writes - ⚠️ `settings.yml` requires `use_default_settings: true` at the top or SearXNG fails schema validation - `companyScraper` calls `sys.exit()` on missing deps — use `except BaseException` not `except Exception` ## Email Classifier Labels Six labels: `interview_request`, `rejection`, `offer`, `follow_up`, `survey_received`, `other` - `survey_received` — links or requests to complete a culture-fit survey/assessment ## Services (managed via Settings → Services tab) | Service | Port | Notes | |---------|------|-------| | Streamlit UI | 8501 | `bash scripts/manage-ui.sh start` | | Ollama | 11434 | `sudo systemctl start ollama` | | Claude Code Wrapper | 3009 | `manage-services.sh start` in Post Fight Processing | | GitHub Copilot Wrapper | 3010 | `manage-copilot.sh start` in Post Fight Processing | | vLLM Server | 8000 | Manual start only | | SearXNG | 8888 | `docker compose up -d` in scrapers/SearXNG/ | | Vision Service | 8002 | `bash scripts/manage-vision.sh start` — moondream2 survey screenshot analysis | ## Notion - DB: "Tracking Job Applications" (ID: `1bd75cff-7708-8007-8c00-f1de36620a0a`) - `config/notion.yaml` is gitignored (live token); `.example` is committed - Field names are non-obvious — always read from `field_map` in `config/notion.yaml` - "Salary" = Notion title property (unusual — it's the page title field) - "Job Source" = `multi_select` type - "Role Link" = URL field - "Status of Application" = status field; new listings use "Application Submitted" - Sync pushes `approved` + `applied` jobs; marks them `synced` after ## Key Config Files - `config/notion.yaml` — gitignored, has token + field_map - `config/notion.yaml.example` — committed template - `config/search_profiles.yaml` — titles, locations, boards, custom_boards, exclude_keywords, mission_tags (per profile) - `config/llm.yaml` — LLM backend priority chain + enabled flags - `config/tokens.yaml` — gitignored, stores HF token (chmod 600) - `config/adzuna.yaml` — gitignored, Adzuna API app_id + app_key - `config/adzuna.yaml.example` — committed template ## Custom Job Board Scrapers - `scripts/custom_boards/adzuna.py` — Adzuna Jobs API; credentials in `config/adzuna.yaml` - `scripts/custom_boards/theladders.py` — The Ladders SSR scraper; needs `curl_cffi` installed - Scrapers registered in `CUSTOM_SCRAPERS` dict in `discover.py` - Activated per-profile via `custom_boards: [adzuna, theladders]` in `search_profiles.yaml` - `enrich_all_descriptions()` in `enrich_descriptions.py` covers all sources (not just Glassdoor) - Home page "Fill Missing Descriptions" button dispatches `enrich_descriptions` task ## Mission Alignment & Accessibility - Preferred industries: music, animal welfare, children's education (hardcoded in `generate_cover_letter.py`) - `detect_mission_alignment(company, description)` injects a Para 3 hint into cover letters for aligned companies - Company research includes an "Inclusion & Accessibility" section (8th section of the brief) in every brief - Accessibility search query in `_SEARCH_QUERIES` hits SearXNG for ADA/ERG/disability signals - `accessibility_brief` column in `company_research` table; shown in Interview Prep under ♿ section - This info is for personal decision-making ONLY — never disclosed in applications - In generalization: these become `profile.mission_industries` + `profile.accessibility_priority` in `user.yaml` ## Document Rule Resumes and cover letters live in `/Library/Documents/JobSearch/` or Notion — never committed to this repo. ## AIHawk (LinkedIn Easy Apply) - Cloned to `aihawk/` (gitignored) - Config: `aihawk/data_folder/plain_text_resume.yaml` — search FILL_IN for gaps - Self-ID: non-binary, pronouns any, no disability/drug-test disclosure - Run: `conda run -n job-seeker python aihawk/main.py` - Playwright: `conda run -n job-seeker python -m playwright install chromium` ## Git Remote - Forgejo self-hosted at https://git.opensourcesolarpunk.com (username: pyr0ball) - `git remote add origin https://git.opensourcesolarpunk.com/pyr0ball/job-seeker.git` ## Subagents Use `general-purpose` subagent type (not `Bash`) when tasks require file writes.