refactor: replace sidebar LLM generate panel with inline field buttons

Removed the dropdown-based sidebar panel in favour of ✨ Generate buttons placed directly below Career Summary, Voice & Personality, and each Mission & Values row. Prompts now incorporate the live field value as a draft to improve, plus resume experience bullets as context for Career Summary.
feat: searchable tag UI for skills/domains/keywords
2026-02-26 13:40:52 -08:00 · 2026-02-26 13:14:55 -08:00 · 2026-02-26 13:09:32 -08:00 · 2026-02-26 12:32:28 -08:00
15 changed files with 908 additions and 745 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -19,6 +19,7 @@ unsloth_compiled_cache/
 data/survey_screenshots/*
 !data/survey_screenshots/.gitkeep
 config/user.yaml
+config/plain_text_resume.yaml
 config/.backup-*
 config/integrations/*.yaml
 !config/integrations/*.yaml.example
@@ -30,3 +31,7 @@ scrapers/raw_scrapes/

 compose.override.yml
 config/license.json
+config/user.yaml.working
+
+# Claude context files — kept out of version control
+CLAUDE.md
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,212 +0,0 @@
-# Job Seeker Platform — Claude Context
-
-## Project
-Automated job discovery + resume matching + application pipeline for Meghan McCann.
-
-Full pipeline:
-```
-JobSpy → discover.py → SQLite (staging.db) → match.py → Job Review UI
-→ Apply Workspace (cover letter + PDF) → Interviews kanban
-→ phone_screen → interviewing → offer → hired
-         ↓
-      Notion DB (synced via sync.py)
-```
-
-## Environment
- Python env: `conda run -n job-seeker <cmd>` — always use this, never bare python
- Run tests: `/devl/miniconda3/envs/job-seeker/bin/pytest tests/ -v`
-  (use direct binary — `conda run pytest` can spawn runaway processes)
- Run discovery: `conda run -n job-seeker python scripts/discover.py`
- Recreate env: `conda env create -f environment.yml`
- pytest.ini scopes test collection to `tests/` only — never widen this
-
-## ⚠️ AIHawk env isolation — CRITICAL
- NEVER `pip install -r aihawk/requirements.txt` into the job-seeker env
- AIHawk pulls torch + CUDA (~7GB) which causes OOM during test runs
- AIHawk must run in its own env: `conda create -n aihawk-env python=3.12`
- job-seeker env must stay lightweight (no torch, no sentence-transformers, no CUDA)
-
-## Web UI (Streamlit)
- Run: `bash scripts/manage-ui.sh start` → http://localhost:8501
- Manage: `start | stop | restart | status | logs`
- Direct binary: `/devl/miniconda3/envs/job-seeker/bin/streamlit run app/app.py`
- Entry point: `app/app.py` (uses `st.navigation()` — do NOT run `app/Home.py` directly)
- `staging.db` is gitignored — SQLite staging layer between discovery and Notion
-
-### Pages
-| Page | File | Purpose |
-|------|------|---------|
-| Home | `app/Home.py` | Dashboard, discovery trigger, danger-zone purge |
-| Job Review | `app/pages/1_Job_Review.py` | Batch approve/reject with sorting |
-| Settings | `app/pages/2_Settings.py` | LLM backends, search profiles, Notion, services |
-| Resume Profile | Settings → Resume Profile tab | Edit AIHawk YAML profile (was standalone `3_Resume_Editor.py`) |
-| Apply Workspace | `app/pages/4_Apply.py` | Cover letter gen + PDF export + mark applied + reject listing |
-| Interviews | `app/pages/5_Interviews.py` | Kanban: phone_screen→interviewing→offer→hired |
-| Interview Prep | `app/pages/6_Interview_Prep.py` | Live reference sheet during calls + Practice Q&A |
-| Survey Assistant | `app/pages/7_Survey.py` | Culture-fit survey help: text paste + screenshot (moondream2) |
-
-## Job Status Pipeline
-```
-pending → approved/rejected          (Job Review)
-approved → applied                   (Apply Workspace — mark applied)
-approved → rejected                  (Apply Workspace — reject listing button)
-applied → survey                     (Interviews — "📋 Survey" button; pre-kanban section)
-applied → phone_screen               (Interviews — triggers company research)
-survey → phone_screen                (Interviews — after survey completed)
-phone_screen → interviewing
-interviewing → offer
-offer → hired
-any stage → rejected (rejection_stage captured for analytics)
-applied/approved → synced            (sync.py → Notion)
-```
-
-## SQLite Schema (`staging.db`)
-### `jobs` table key columns
- Standard: `id, title, company, url, source, location, is_remote, salary, description`
- Scores: `match_score, keyword_gaps`
- Dates: `date_found, applied_at, survey_at, phone_screen_at, interviewing_at, offer_at, hired_at`
- Interview: `interview_date, rejection_stage`
- Content: `cover_letter, notion_page_id`
-
-### Additional tables
- `job_contacts` — email thread log per job (direction, subject, from/to, body, received_at)
- `company_research` — LLM-generated brief per job (company_brief, ceo_brief, talking_points, raw_output, accessibility_brief)
- `background_tasks` — async LLM task queue (task_type, job_id, status: queued/running/completed/failed)
- `survey_responses` — per-job Q&A pairs (survey_name, received_at, source, raw_input, image_path, mode, llm_output, reported_score)
-
-## Scripts
-| Script | Purpose |
-|--------|---------|
-| `scripts/discover.py` | JobSpy + custom board scrape → SQLite insert |
-| `scripts/custom_boards/adzuna.py` | Adzuna Jobs API (app_id + app_key in config/adzuna.yaml) |
-| `scripts/custom_boards/theladders.py` | The Ladders scraper via curl_cffi + __NEXT_DATA__ SSR parse |
-| `scripts/match.py` | Resume keyword matching → match_score |
-| `scripts/sync.py` | Push approved/applied jobs to Notion |
-| `scripts/llm_router.py` | LLM fallback chain (reads config/llm.yaml) |
-| `scripts/generate_cover_letter.py` | Cover letter via LLM; detects mission-aligned companies (music/animal welfare/education) and injects Para 3 hint |
-| `scripts/company_research.py` | Pre-interview brief via LLM + optional SearXNG scrape; includes Inclusion & Accessibility section |
-| `scripts/prepare_training_data.py` | Extract cover letter JSONL for fine-tuning |
-| `scripts/finetune_local.py` | Unsloth QLoRA fine-tune on local GPU |
-| `scripts/db.py` | All SQLite helpers (single source of truth) |
-| `scripts/task_runner.py` | Background thread executor — `submit_task(db, type, job_id)` dispatches daemon threads for LLM jobs |
-| `scripts/vision_service/main.py` | FastAPI moondream2 inference on port 8002; `manage-vision.sh` lifecycle |
-
-## LLM Router
- Config: `config/llm.yaml`
- Cover letter fallback order: `claude_code → ollama (meghan-cover-writer:latest) → vllm → copilot → anthropic`
- Research fallback order: `claude_code → vllm (__auto__, ouroboros) → ollama_research (llama3.1:8b) → ...`
- `meghan-cover-writer:latest` is cover-letter only — it doesn't follow structured markdown prompts for research
- `LLMRouter.complete()` accepts `fallback_order=` override for per-task routing
- `LLMRouter.complete()` accepts `images: list[str]` (base64) — vision backends only; non-vision backends skipped when images present
- Vision fallback order config key: `vision_fallback_order: [vision_service, claude_code, anthropic]`
- `vision_service` backend type: POST to `/analyze`; skipped automatically when no images provided
- Claude Code wrapper: `/Library/Documents/Post Fight Processing/server-openai-wrapper-v2.js`
- Copilot wrapper: `/Library/Documents/Post Fight Processing/manage-copilot.sh start`
-
-## Fine-Tuned Model
- Model: `meghan-cover-writer:latest` registered in Ollama
- Base: `unsloth/Llama-3.2-3B-Instruct` (QLoRA, rank 16, 10 epochs)
- Training data: 62 cover letters from `/Library/Documents/JobSearch/`
- JSONL: `/Library/Documents/JobSearch/training_data/cover_letters.jsonl`
- Adapter: `/Library/Documents/JobSearch/training_data/finetune_output/adapter/`
- Merged: `/Library/Documents/JobSearch/training_data/gguf/meghan-cover-writer/`
- Re-train: `conda run -n ogma python scripts/finetune_local.py`
-  (uses `ogma` env with unsloth + trl; pin to GPU 0 with `CUDA_VISIBLE_DEVICES=0`)
-
-## Background Tasks
- Cover letter gen and company research run as daemon threads via `scripts/task_runner.py`
- Tasks survive page navigation; results written to existing tables when done
- On server restart, `app.py` startup clears any stuck `running`/`queued` rows to `failed`
- Dedup: only one queued/running task per `(task_type, job_id)` at a time
- Sidebar indicator (`app/app.py`) polls every 3s via `@st.fragment(run_every=3)`
- ⚠️ Streamlit fragment + sidebar: use `with st.sidebar: _fragment()` — sidebar context must WRAP the call, not be inside the fragment body
-
-## Vision Service
- Script: `scripts/vision_service/main.py` (FastAPI, port 8002)
- Model: `vikhyatk/moondream2` revision `2025-01-09` — lazy-loaded on first `/analyze` (~1.8GB download)
- GPU: 4-bit quantization when CUDA available (~1.5GB VRAM); CPU fallback
- Conda env: `job-seeker-vision` — separate from job-seeker (torch + transformers live here)
- Create env: `conda env create -f scripts/vision_service/environment.yml`
- Manage: `bash scripts/manage-vision.sh start|stop|restart|status|logs`
- Survey page degrades gracefully to text-only when vision service is down
- ⚠️ Never install vision deps (torch, bitsandbytes, transformers) into the job-seeker env
-
-## Company Research
- Script: `scripts/company_research.py`
- Auto-triggered when a job moves to `phone_screen` in the Interviews kanban
- Three-phase: (1) SearXNG company scrape → (1b) SearXNG news snippets → (2) LLM synthesis
- SearXNG scraper: `/Library/Development/scrapers/companyScraper.py`
- SearXNG Docker: run `docker compose up -d` from `/Library/Development/scrapers/SearXNG/` (port 8888)
- `beautifulsoup4` and `fake-useragent` are installed in job-seeker env (required for scraper)
- News search hits `/search?format=json` — JSON format must be enabled in `searxng-config/settings.yml`
- ⚠️ `settings.yml` owned by UID 977 (container user) — use `docker cp` to update, not direct writes
- ⚠️ `settings.yml` requires `use_default_settings: true` at the top or SearXNG fails schema validation
- `companyScraper` calls `sys.exit()` on missing deps — use `except BaseException` not `except Exception`
-
-## Email Classifier Labels
-Six labels: `interview_request`, `rejection`, `offer`, `follow_up`, `survey_received`, `other`
- `survey_received` — links or requests to complete a culture-fit survey/assessment
-
-## Services (managed via Settings → Services tab)
-| Service | Port | Notes |
-|---------|------|-------|
-| Streamlit UI | 8501 | `bash scripts/manage-ui.sh start` |
-| Ollama | 11434 | `sudo systemctl start ollama` |
-| Claude Code Wrapper | 3009 | `manage-services.sh start` in Post Fight Processing |
-| GitHub Copilot Wrapper | 3010 | `manage-copilot.sh start` in Post Fight Processing |
-| vLLM Server | 8000 | Manual start only |
-| SearXNG | 8888 | `docker compose up -d` in scrapers/SearXNG/ |
-| Vision Service | 8002 | `bash scripts/manage-vision.sh start` — moondream2 survey screenshot analysis |
-
-## Notion
- DB: "Tracking Job Applications" (ID: `1bd75cff-7708-8007-8c00-f1de36620a0a`)
- `config/notion.yaml` is gitignored (live token); `.example` is committed
- Field names are non-obvious — always read from `field_map` in `config/notion.yaml`
- "Salary" = Notion title property (unusual — it's the page title field)
- "Job Source" = `multi_select` type
- "Role Link" = URL field
- "Status of Application" = status field; new listings use "Application Submitted"
- Sync pushes `approved` + `applied` jobs; marks them `synced` after
-
-## Key Config Files
- `config/notion.yaml` — gitignored, has token + field_map
- `config/notion.yaml.example` — committed template
- `config/search_profiles.yaml` — titles, locations, boards, custom_boards, exclude_keywords, mission_tags (per profile)
- `config/llm.yaml` — LLM backend priority chain + enabled flags
- `config/tokens.yaml` — gitignored, stores HF token (chmod 600)
- `config/adzuna.yaml` — gitignored, Adzuna API app_id + app_key
- `config/adzuna.yaml.example` — committed template
-
-## Custom Job Board Scrapers
- `scripts/custom_boards/adzuna.py` — Adzuna Jobs API; credentials in `config/adzuna.yaml`
- `scripts/custom_boards/theladders.py` — The Ladders SSR scraper; needs `curl_cffi` installed
- Scrapers registered in `CUSTOM_SCRAPERS` dict in `discover.py`
- Activated per-profile via `custom_boards: [adzuna, theladders]` in `search_profiles.yaml`
- `enrich_all_descriptions()` in `enrich_descriptions.py` covers all sources (not just Glassdoor)
- Home page "Fill Missing Descriptions" button dispatches `enrich_descriptions` task
-
-## Mission Alignment & Accessibility
- Preferred industries: music, animal welfare, children's education (hardcoded in `generate_cover_letter.py`)
- `detect_mission_alignment(company, description)` injects a Para 3 hint into cover letters for aligned companies
- Company research includes an "Inclusion & Accessibility" section (8th section of the brief) in every brief
- Accessibility search query in `_SEARCH_QUERIES` hits SearXNG for ADA/ERG/disability signals
- `accessibility_brief` column in `company_research` table; shown in Interview Prep under ♿ section
- This info is for personal decision-making ONLY — never disclosed in applications
- In generalization: these become `profile.mission_industries` + `profile.accessibility_priority` in `user.yaml`
-
-## Document Rule
-Resumes and cover letters live in `/Library/Documents/JobSearch/` or Notion — never committed to this repo.
-
-## AIHawk (LinkedIn Easy Apply)
- Cloned to `aihawk/` (gitignored)
- Config: `aihawk/data_folder/plain_text_resume.yaml` — search FILL_IN for gaps
- Self-ID: non-binary, pronouns any, no disability/drug-test disclosure
- Run: `conda run -n job-seeker python aihawk/main.py`
- Playwright: `conda run -n job-seeker python -m playwright install chromium`
-
-## Git Remote
- Forgejo self-hosted at https://git.opensourcesolarpunk.com (username: pyr0ball)
- `git remote add origin https://git.opensourcesolarpunk.com/pyr0ball/job-seeker.git`
-
-## Subagents
-Use `general-purpose` subagent type (not `Bash`) when tasks require file writes.
--- a/app/pages/0_Setup.py
+++ b/app/pages/0_Setup.py
@@ -405,7 +405,7 @@ elif step == 4:
        if errs:
            st.error("\n".join(errs))
        else:
-            resume_yaml_path = _ROOT / "aihawk" / "data_folder" / "plain_text_resume.yaml"
+            resume_yaml_path = _ROOT / "config" / "plain_text_resume.yaml"
            resume_yaml_path.parent.mkdir(parents=True, exist_ok=True)
            resume_data = {**parsed, "experience": experience} if parsed else {"experience": experience}
            resume_yaml_path.write_text(
--- a/app/pages/2_Settings.py
+++ b/app/pages/2_Settings.py
--- a/app/pages/4_Apply.py
+++ b/app/pages/4_Apply.py
@@ -28,7 +28,7 @@ from scripts.db import (
 from scripts.task_runner import submit_task

 DOCS_DIR = _profile.docs_dir if _profile else Path.home() / "Documents" / "JobSearch"
-RESUME_YAML = Path(__file__).parent.parent.parent / "aihawk" / "data_folder" / "plain_text_resume.yaml"
+RESUME_YAML = Path(__file__).parent.parent.parent / "config" / "plain_text_resume.yaml"

 st.title("🚀 Apply Workspace")

--- a/config/search_profiles.yaml
+++ b/config/search_profiles.yaml
@@ -1,4 +1,15 @@
 profiles:
+- boards:
+  - linkedin
+  - indeed
+  - glassdoor
+  - zip_recruiter
+  job_titles:
+  - Customer Service Specialist
+  locations:
+  - San Francisco CA
+  name: default
+  remote_only: false
 - boards:
  - linkedin
  - indeed
--- a/config/skills_suggestions.yaml
+++ b/config/skills_suggestions.yaml
@@ -0,0 +1,193 @@
+# skills_suggestions.yaml — Bundled tag suggestions for the Skills & Keywords UI.
+# Shown as searchable options in the multiselect. Users can add custom tags beyond these.
+# Future: community aggregate (paid tier) will supplement this list from anonymised installs.
+
+skills:
+  # ── Customer Success & Account Management ──
+  - Customer Success
+  - Technical Account Management
+  - Account Management
+  - Customer Onboarding
+  - Renewal Management
+  - Churn Prevention
+  - Expansion Revenue
+  - Executive Relationship Management
+  - Escalation Management
+  - QBR Facilitation
+  - Customer Advocacy
+  - Voice of the Customer
+  - Customer Health Scoring
+  - Success Planning
+  - Customer Education
+  - Implementation Management
+  # ── Revenue & Operations ──
+  - Revenue Operations
+  - Sales Operations
+  - Pipeline Management
+  - Forecasting
+  - Contract Negotiation
+  - Upsell & Cross-sell
+  - ARR / MRR Management
+  - NRR Optimization
+  - Quota Attainment
+  # ── Leadership & Management ──
+  - Team Leadership
+  - People Management
+  - Cross-functional Collaboration
+  - Change Management
+  - Stakeholder Management
+  - Executive Presentation
+  - Strategic Planning
+  - OKR Setting
+  - Hiring & Recruiting
+  - Coaching & Mentoring
+  - Performance Management
+  # ── Project & Program Management ──
+  - Project Management
+  - Program Management
+  - Agile / Scrum
+  - Kanban
+  - Risk Management
+  - Resource Planning
+  - Process Improvement
+  - SOP Development
+  # ── Technical Skills ──
+  - SQL
+  - Python
+  - Data Analysis
+  - Tableau
+  - Looker
+  - Power BI
+  - Excel / Google Sheets
+  - REST APIs
+  - Salesforce
+  - HubSpot
+  - Gainsight
+  - Totango
+  - ChurnZero
+  - Zendesk
+  - Intercom
+  - Jira
+  - Confluence
+  - Notion
+  - Slack
+  - Zoom
+  # ── Communications & Writing ──
+  - Executive Communication
+  - Technical Writing
+  - Proposal Writing
+  - Presentation Skills
+  - Public Speaking
+  - Stakeholder Communication
+  # ── Compliance & Security ──
+  - Compliance
+  - Risk Assessment
+  - SOC 2
+  - ISO 27001
+  - GDPR
+  - Security Awareness
+  - Vendor Management
+
+domains:
+  # ── Software & Tech ──
+  - B2B SaaS
+  - Enterprise Software
+  - Cloud Infrastructure
+  - Developer Tools
+  - Cybersecurity
+  - Data & Analytics
+  - AI / ML Platform
+  - FinTech
+  - InsurTech
+  - LegalTech
+  - HR Tech
+  - MarTech
+  - AdTech
+  - DevOps / Platform Engineering
+  - Open Source
+  # ── Industry Verticals ──
+  - Healthcare / HealthTech
+  - Education / EdTech
+  - Non-profit / Social Impact
+  - Government / GovTech
+  - E-commerce / Retail
+  - Manufacturing
+  - Financial Services
+  - Media & Entertainment
+  - Music Industry
+  - Logistics & Supply Chain
+  - Real Estate / PropTech
+  - Energy / CleanTech
+  - Hospitality & Travel
+  # ── Market Segments ──
+  - Enterprise
+  - Mid-Market
+  - SMB / SME
+  - Startup
+  - Fortune 500
+  - Public Sector
+  - International / Global
+  # ── Business Models ──
+  - Subscription / SaaS
+  - Marketplace
+  - Usage-based Pricing
+  - Professional Services
+  - Self-serve / PLG
+
+keywords:
+  # ── CS Metrics & Outcomes ──
+  - NPS
+  - CSAT
+  - CES
+  - Churn Rate
+  - Net Revenue Retention
+  - Gross Revenue Retention
+  - Logo Retention
+  - Time-to-Value
+  - Product Adoption
+  - Feature Utilisation
+  - Health Score
+  - Customer Lifetime Value
+  # ── Sales & Growth ──
+  - ARR
+  - MRR
+  - GRR
+  - NRR
+  - Expansion ARR
+  - Pipeline Coverage
+  - Win Rate
+  - Average Contract Value
+  - Land & Expand
+  - Multi-threading
+  # ── Process & Delivery ──
+  - Onboarding
+  - Implementation
+  - Knowledge Transfer
+  - Escalation
+  - SLA
+  - Root Cause Analysis
+  - Post-mortem
+  - Runbook
+  - Playbook Development
+  - Feedback Loop
+  - Product Roadmap Input
+  # ── Team & Culture ──
+  - Cross-functional
+  - Distributed Team
+  - Remote-first
+  - High-growth
+  - Fast-paced
+  - Autonomous
+  - Data-driven
+  - Customer-centric
+  - Empathetic Leadership
+  - Inclusive Culture
+  # ── Job-seeker Keywords ──
+  - Strategic
+  - Proactive
+  - Hands-on
+  - Scalable Processes
+  - Operational Excellence
+  - Business Impact
+  - Executive Visibility
+  - Player-Coach
--- a/environment.yml
+++ b/environment.yml
@@ -28,7 +28,7 @@ dependencies:
    - fake-useragent      # company scraper rotation

    # ── LLM / AI backends ─────────────────────────────────────────────────────
-    - openai>=1.0         # used for OpenAI-compat backends (ollama, vllm, wrappers)
+    - openai>=1.55.0,<2.0.0  # >=1.55 required for httpx 0.28 compat; <2.0 for langchain-openai
    - anthropic>=0.80     # direct Anthropic API fallback
    - ollama              # Python client for Ollama management
    - langchain>=0.2
@@ -54,6 +54,9 @@ dependencies:
    - pyyaml>=6.0
    - python-dotenv

+    # ── Auth / licensing ──────────────────────────────────────────────────────
+    - PyJWT>=2.8
+
    # ── Utilities ─────────────────────────────────────────────────────────────
    - sqlalchemy
    - tqdm
--- a/requirements.txt
+++ b/requirements.txt
@@ -22,7 +22,7 @@ curl_cffi
 fake-useragent

 # ── LLM / AI backends ─────────────────────────────────────────────────────
-openai>=1.0
+openai>=1.55.0,<2.0.0  # >=1.55 required for httpx 0.28 compat; <2.0 for langchain-openai
 anthropic>=0.80
 ollama
 langchain>=0.2
@@ -51,6 +51,9 @@ json-repair
 pyyaml>=6.0
 python-dotenv

+# ── Auth / licensing ──────────────────────────────────────────────────────
+PyJWT>=2.8
+
 # ── Utilities ─────────────────────────────────────────────────────────────
 sqlalchemy
 tqdm
--- a/scripts/company_research.py
+++ b/scripts/company_research.py
@@ -193,7 +193,7 @@ def _parse_sections(text: str) -> dict[str, str]:
    return sections


-_RESUME_YAML = Path(__file__).parent.parent / "aihawk" / "data_folder" / "plain_text_resume.yaml"
+_RESUME_YAML = Path(__file__).parent.parent / "config" / "plain_text_resume.yaml"
 _KEYWORDS_YAML = Path(__file__).parent.parent / "config" / "resume_keywords.yaml"


--- a/scripts/generate_cover_letter.py
+++ b/scripts/generate_cover_letter.py
@@ -26,11 +26,19 @@ LETTERS_DIR = _profile.docs_dir if _profile else Path.home() / "Documents" / "Jo
 LETTER_GLOB = "*Cover Letter*.md"

 # Background injected into every prompt so the model has the candidate's facts
-SYSTEM_CONTEXT = (
-    f"You are writing cover letters for {_profile.name}. {_profile.career_summary}"
-    if _profile else
-    "You are a professional cover letter writer. Write in first person."
-)
+def _build_system_context() -> str:
+    if not _profile:
+        return "You are a professional cover letter writer. Write in first person."
+    parts = [f"You are writing cover letters for {_profile.name}. {_profile.career_summary}"]
+    if _profile.candidate_voice:
+        parts.append(
+            f"Voice and personality: {_profile.candidate_voice} "
+            "Write in a way that reflects these authentic traits — not as a checklist, "
+            "but as a natural expression of who this person is."
+        )
+    return " ".join(parts)
+
+SYSTEM_CONTEXT = _build_system_context()


 # ── Mission-alignment detection ───────────────────────────────────────────────
@@ -58,6 +66,13 @@ _MISSION_SIGNALS: dict[str, list[str]] = {
        "instructure", "canvas lms", "clever", "district", "teacher",
        "k-12", "k12", "grade", "pedagogy",
    ],
+    "social_impact": [
+        "nonprofit", "non-profit", "501(c)", "social impact", "mission-driven",
+        "public benefit", "community", "underserved", "equity", "justice",
+        "humanitarian", "advocacy", "charity", "foundation", "ngo",
+        "social good", "civic", "public health", "mental health", "food security",
+        "housing", "homelessness", "poverty", "workforce development",
+    ],
 }

 _candidate = _profile.name if _profile else "the candidate"
@@ -79,6 +94,11 @@ _MISSION_DEFAULTS: dict[str, str] = {
        f"{_candidate}'s values. Para 3 should reflect this authentic connection specifically "
        "and warmly."
    ),
+    "social_impact": (
+        f"This organization is mission-driven / social impact focused — exactly the kind of "
+        f"cause {_candidate} cares deeply about. Para 3 should warmly reflect their genuine "
+        "desire to apply their skills to work that makes a real difference in people's lives."
+    ),
 }
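
The hunks above add a `social_impact` entry to both `_MISSION_SIGNALS` and `_MISSION_DEFAULTS`, but the lookup that ties them together is outside this diff. The following is a minimal sketch of how such a lookup could work, assuming plain case-insensitive substring matching. `MISSION_SIGNALS` and `detect_category` are hypothetical names with a trimmed signal list; they are not the project's `detect_mission_alignment()`.

```python
from __future__ import annotations

# Hypothetical, trimmed-down signal table for illustration only.
# The real _MISSION_SIGNALS in generate_cover_letter.py has more categories and terms.
MISSION_SIGNALS: dict[str, list[str]] = {
    "education": ["k-12", "teacher", "pedagogy"],
    "social_impact": ["nonprofit", "non-profit", "mission-driven", "food security"],
}


def detect_category(company: str, description: str) -> str | None:
    """Return the first category with a signal present in the text, else None."""
    text = f"{company} {description}".lower()
    for category, signals in MISSION_SIGNALS.items():
        # Assumption: plain substring matching, first matching category wins.
        if any(signal in text for signal in signals):
            return category
    return None


print(detect_category("Helping Hands", "A nonprofit fighting food insecurity"))
print(detect_category("Acme", "B2B SaaS vendor"))
```

If matching really is done by substring, as in this sketch, broad terms from the new list such as `equity` or `community` would also fire on phrases like "private equity", so word-boundary matching may be worth considering.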


--- a/scripts/migrate.py
+++ b/scripts/migrate.py
@@ -84,9 +84,9 @@ def _extract_career_summary(source: Path) -> str:

 def _extract_personal_info(source: Path) -> dict:
    """Extract personal info from aihawk resume yaml."""
-    resume = source / "aihawk" / "data_folder" / "plain_text_resume.yaml"
+    resume = source / "config" / "plain_text_resume.yaml"
    if not resume.exists():
-        resume = source / "config" / "plain_text_resume.yaml"
+        resume = source / "aihawk" / "data_folder" / "plain_text_resume.yaml"
    if not resume.exists():
        return {}
    data = _load_yaml(resume)
@@ -197,8 +197,10 @@ def _copy_configs(source: Path, dest: Path, apply: bool) -> None:

 def _copy_aihawk_resume(source: Path, dest: Path, apply: bool) -> None:
    print("\n── Copying AIHawk resume profile")
-    src = source / "aihawk" / "data_folder" / "plain_text_resume.yaml"
-    dst = dest / "aihawk" / "data_folder" / "plain_text_resume.yaml"
+    src = source / "config" / "plain_text_resume.yaml"
+    if not src.exists():
+        src = source / "aihawk" / "data_folder" / "plain_text_resume.yaml"
+    dst = dest / "config" / "plain_text_resume.yaml"
    _copy_file(src, dst, apply)
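
Both migrate.py hunks repeat the same lookup order: try the new `config/` location first, then fall back to the legacy `aihawk/data_folder/` location. A small sketch of that order as one helper follows; `resolve_resume_yaml` is a hypothetical name, not a function in the repository.

```python
from pathlib import Path


def resolve_resume_yaml(root: Path) -> Path:
    """Return the resume YAML path, preferring config/ over the legacy aihawk/ location."""
    candidates = [
        root / "config" / "plain_text_resume.yaml",                  # canonical path after this change
        root / "aihawk" / "data_folder" / "plain_text_resume.yaml",  # legacy path
    ]
    for path in candidates:
        if path.exists():
            return path
    # Neither file exists: return the canonical path so callers can create it there.
    return candidates[0]
```

Returning the canonical path when nothing exists is a design choice of this sketch; the code in the diff instead returns `{}` or hands the missing path to `_copy_file`.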


--- a/scripts/resume_parser.py
+++ b/scripts/resume_parser.py
@@ -92,6 +92,18 @@ def _find_column_split(page) -> float | None:
    return split_x if split_x and best_gap > page.width * 0.03 else None


+_CID_BULLETS = {127, 149, 183}  # common bullet CIDs across ATS-reembedded fonts
+
+def _clean_cid(text: str) -> str:
+    """Replace (cid:NNN) glyph references emitted by pdfplumber when a PDF font
+    lacks a ToUnicode map.  Known bullet CIDs become '•'; everything else is
+    stripped so downstream section parsing sees clean text."""
+    def _replace(m: re.Match) -> str:
+        n = int(m.group(1))
+        return "•" if n in _CID_BULLETS else ""
+    return re.sub(r"\(cid:(\d+)\)", _replace, text)
+
+
 def extract_text_from_pdf(file_bytes: bytes) -> str:
    """Extract text from PDF, handling two-column layouts via gutter detection.

@@ -116,12 +128,12 @@ def extract_text_from_pdf(file_bytes: bytes) -> str:
                    pages.append("\n".join(filter(None, [header_text, left_text, right_text])))
                    continue
            pages.append(page.extract_text() or "")
-    return "\n".join(pages)
+    return _clean_cid("\n".join(pages))


 def extract_text_from_docx(file_bytes: bytes) -> str:
    doc = Document(io.BytesIO(file_bytes))
-    return "\n".join(p.text for p in doc.paragraphs if p.text.strip())
+    return _clean_cid("\n".join(p.text for p in doc.paragraphs if p.text.strip()))


 def extract_text_from_odt(file_bytes: bytes) -> str:
@@ -139,7 +151,7 @@ def extract_text_from_odt(file_bytes: bytes) -> str:
            text = "".join(elem.itertext()).strip()
            if text:
                lines.append(text)
-    return "\n".join(lines)
+    return _clean_cid("\n".join(lines))


 # ── Section splitter ──────────────────────────────────────────────────────────
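
To show what the new `_clean_cid()` pass does to extracted text, here is a standalone copy of the same regex logic run on a made-up line. The names `CID_BULLETS` and `clean_cid` and the sample string are illustrative; the CID values are the ones in the diff.

```python
import re

CID_BULLETS = {127, 149, 183}  # same values as _CID_BULLETS in the diff


def clean_cid(text: str) -> str:
    """Turn known bullet CIDs into '•' and drop every other (cid:NNN) reference."""
    def _replace(m: re.Match) -> str:
        return "•" if int(m.group(1)) in CID_BULLETS else ""
    return re.sub(r"\(cid:(\d+)\)", _replace, text)


# Made-up sample: one known bullet CID and one unknown CID.
sample = "(cid:127) Led onboarding for 40 accounts(cid:3)"
print(clean_cid(sample))  # • Led onboarding for 40 accounts
```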
--- a/scripts/skills_utils.py
+++ b/scripts/skills_utils.py
@@ -0,0 +1,67 @@
+"""
+skills_utils.py — Content filter and suggestion loader for the skills tagging system.
+
+load_suggestions(category)  → list[str]   bundled suggestions for a category
+filter_tag(tag)             → str | None   cleaned tag, or None if rejected
+"""
+from __future__ import annotations
+import re
+from pathlib import Path
+
+_SUGGESTIONS_FILE = Path(__file__).parent.parent / "config" / "skills_suggestions.yaml"
+
+# ── Content filter ─────────────────────────────────────────────────────────────
+# Tags must be short, human-readable skill/domain labels. No URLs, no abuse.
+
+_BLOCKED = {
+    # profanity placeholder — extend as needed
+    "fuck", "shit", "ass", "bitch", "cunt", "dick", "bastard", "damn",
+}
+
+_URL_RE = re.compile(r"https?://|www\.|\.com\b|\.net\b|\.org\b", re.I)
+_ALLOWED_CHARS = re.compile(r"^[\w\s\-\.\+\#\/\&\(\)]+$", re.UNICODE)
+
+
+def filter_tag(raw: str) -> str | None:
+    """Return a cleaned tag string, or None if the tag should be rejected.
+
+    Rejection criteria:
+    - Blank after stripping
+    - Too short (< 2 chars) or too long (> 60 chars)
+    - Contains a URL pattern
+    - Contains disallowed characters
+    - Matches a blocked term (case-insensitive, whole-word)
+    - Repeated character run (e.g. 'aaaaa')
+    """
+    tag = " ".join(raw.strip().split())  # normalise whitespace
+    if not tag or len(tag) < 2:
+        return None
+    if len(tag) > 60:
+        return None
+    if _URL_RE.search(tag):
+        return None
+    if not _ALLOWED_CHARS.match(tag):
+        return None
+    lower = tag.lower()
+    for blocked in _BLOCKED:
+        if re.search(rf"\b{re.escape(blocked)}\b", lower):
+            return None
+    if re.search(r"(.)\1{4,}", lower):  # 5+ repeated chars
+        return None
+    return tag
+
+
+# ── Suggestion loader ──────────────────────────────────────────────────────────
+
+def load_suggestions(category: str) -> list[str]:
+    """Return the bundled suggestion list for a category ('skills'|'domains'|'keywords').
+    Returns an empty list if the file is missing or the category is not found.
+    """
+    if not _SUGGESTIONS_FILE.exists():
+        return []
+    try:
+        import yaml
+        data = yaml.safe_load(_SUGGESTIONS_FILE.read_text()) or {}
+        return list(data.get(category, []))
+    except Exception:
+        return []
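
The Settings page diff is not shown here, so the way the UI calls `filter_tag()` is not visible. As a rough sketch of the "validate, then append" step the commit message describes, the helper below takes the filter as a parameter. `add_custom_tag` and `demo_filter` are hypothetical, and the case-insensitive duplicate check is an assumption rather than documented behaviour.

```python
from __future__ import annotations

from typing import Callable


def add_custom_tag(
    existing: list[str],
    raw: str,
    filter_fn: Callable[[str], str | None],
) -> list[str]:
    """Return the tag list with the cleaned tag appended, or unchanged if rejected."""
    tag = filter_fn(raw)
    if tag is None:
        return existing  # rejected by the content filter
    if tag.lower() in {t.lower() for t in existing}:
        return existing  # assumption: duplicates are ignored case-insensitively
    return [*existing, tag]


def demo_filter(raw: str) -> str | None:
    """Stand-in for skills_utils.filter_tag(): whitespace, length, and URL checks only."""
    tag = " ".join(raw.split())
    if not (2 <= len(tag) <= 60) or "http" in tag.lower():
        return None
    return tag


print(add_custom_tag(["SQL"], "  Data   Analysis ", demo_filter))
```

In the application the real `filter_tag` from `scripts/skills_utils.py` would presumably be passed in place of `demo_filter`.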
--- a/scripts/user_profile.py
+++ b/scripts/user_profile.py
@@ -15,6 +15,7 @@ _DEFAULTS = {
    "phone": "",
    "linkedin": "",
    "career_summary": "",
+    "candidate_voice": "",
    "nda_companies": [],
    "docs_dir": "~/Documents/JobSearch",
    "ollama_models_dir": "~/models/ollama",
@@ -61,6 +62,7 @@ class UserProfile:
        self.phone: str = data["phone"]
        self.linkedin: str = data["linkedin"]
        self.career_summary: str = data["career_summary"]
+        self.candidate_voice: str = data.get("candidate_voice", "")
        self.nda_companies: list[str] = [c.lower() for c in data["nda_companies"]]
        self.docs_dir: Path = Path(data["docs_dir"]).expanduser().resolve()
        self.ollama_models_dir: Path = Path(data["ollama_models_dir"]).expanduser().resolve()
Commits

pyr0ball · d9f2b452e8 · 2026-02-26 13:40:52 -08:00
refactor: replace sidebar LLM generate panel with inline field buttons

Removed the dropdown-based sidebar panel in favour of ✨ Generate buttons placed directly below Career Summary, Voice & Personality, and each Mission & Values row. Prompts now incorporate the live field value as a draft to improve, plus resume experience bullets as context for Career Summary.

pyr0ball · fedb558b1e · 2026-02-26 13:14:55 -08:00
feat: searchable tag UI for skills/domains/keywords

Replace chip-button tag management with st.multiselect backed by bundled suggestions. Existing user tags are preserved as custom options alongside the suggestion list. Custom tag input validates through filter_tag() before adding — rejects URLs, profanity, overlong strings, and bad characters. Changes auto-save on multiselect interaction; custom tags append on + click.

pyr0ball · 15c2a1d4ef · 2026-02-26 13:09:32 -08:00
feat: bundled skills suggestion list and content filter utility

- config/skills_suggestions.yaml: 168 curated tags across skills (77), domains (40), keywords (51) covering CS/TAM/ops and common tech roles; structured for future community aggregate (paid tier backlog)
- scripts/skills_utils.py: filter_tag() rejects blanks, URLs, profanity, overlong strings, disallowed chars, and repeated-char runs; load_suggestions() reads bundled YAML per category

pyr0ball · d5cf02096b · 2026-02-26 12:32:28 -08:00
fix: resume CID glyphs, resume YAML path, PyJWT dep, candidate voice & mission UI

- resume_parser: add _clean_cid() to strip (cid:NNN) glyph refs from ATS PDFs; CIDs 127/149/183 become bullets, unknowns are stripped; applied to PDF/DOCX/ODT
- resume YAML: canonicalize plain_text_resume.yaml path to config/ across all references (Settings, Apply, Setup, company_research, migrate); was pointing at unmounted aihawk/data_folder/ in Docker
- requirements/environment: add PyJWT>=2.8 (was missing; broke Settings page)
- user_profile: add candidate_voice field
- generate_cover_letter: inject candidate_voice into SYSTEM_CONTEXT; add social_impact mission signal category (nonprofit, community, equity, etc.)
- Settings: add Voice & Personality textarea to Identity expander; add Mission & Values expander with editable fields for all 4 mission categories
- .gitignore: exclude CLAUDE.md, config/plain_text_resume.yaml, config/user.yaml.working
- search_profiles: add default profile
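
The refactor commit says the inline Generate prompts now include the live field value as a draft to improve, plus resume experience bullets for Career Summary, but the Settings page hunks are not included in this diff. The sketch below is one plausible shape for that prompt assembly. `build_generate_prompt`, its parameters, and the prompt wording are all assumptions for illustration, not the code in `app/pages/2_Settings.py`.

```python
from __future__ import annotations


def build_generate_prompt(
    field_label: str,
    draft: str,
    experience_bullets: list[str] | None = None,
) -> str:
    """Assemble a prompt from the field name, the current draft, and optional resume bullets."""
    parts = [f"Write the candidate's {field_label}."]
    if draft.strip():
        # The live field value is passed along as a draft to refine.
        parts.append("Improve this existing draft rather than starting over:\n" + draft.strip())
    if experience_bullets:
        # Per the commit message, bullets are context for Career Summary only.
        bullets = "\n".join(f"- {b}" for b in experience_bullets)
        parts.append("Resume experience for context:\n" + bullets)
    return "\n\n".join(parts)


print(build_generate_prompt("Career Summary", "10 years in CS.", ["Led renewals"]))
```

An empty field simply yields the bare instruction, so the same helper would cover both the first-time and the refine-a-draft cases.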