peregrine/docs/plans/2026-02-24-expanded-wizard-plan.md

2623 lines
94 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Expanded First-Run Wizard — Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
**Goal:** Replace the 5-step surface-level wizard with a comprehensive onboarding flow covering resume upload/parsing, guided config walkthroughs, LLM-assisted generation, and free/paid/premium feature gating.
**Architecture:** `app/wizard/` package holds all step logic; `scripts/integrations/` registry holds all integration drivers; `app/pages/0_Setup.py` becomes a thin orchestrator. `wizard_complete` flag in `user.yaml` gates the main app. Each mandatory step writes immediately to `user.yaml` so partial progress survives a crash or browser close.
**Tech Stack:** Streamlit, pdfminer.six, python-docx, PyYAML, existing task_runner.py + llm_router.py, pytest with unittest.mock.
**Design doc:** `docs/plans/2026-02-24-expanded-wizard-design.md`
---
## Before You Start
```bash
# Verify tests pass baseline
conda run -n job-seeker python -m pytest tests/ -v
# Confirm current wizard exists
ls app/pages/0_Setup.py app/wizard/ 2>/dev/null || echo "wizard/ not yet created"
```
---
## Task 1: UserProfile — wizard fields + DB params column
**Files:**
- Modify: `scripts/user_profile.py`
- Modify: `config/user.yaml.example`
- Modify: `scripts/db.py` (init_db + insert_task + update_task_stage)
- Test: `tests/test_user_profile.py` (add cases)
- Test: `tests/test_db.py` (add cases)
New fields needed in `user.yaml`:
```yaml
tier: free # free | paid | premium
dev_tier_override: null # overrides tier for local testing; set to free|paid|premium
wizard_complete: false # flipped true only when all mandatory steps pass + Finish
wizard_step: 0 # last completed step number (1-6); 0 = not started
dismissed_banners: [] # list of banner keys the user has dismissed on Home
```
New column needed in `background_tasks`: `params TEXT NULL` (JSON for wizard_generate tasks).
**Step 1: Add test cases for new UserProfile fields**
```python
# tests/test_user_profile.py — add to existing file
def test_wizard_defaults(tmp_path):
p = tmp_path / "user.yaml"
p.write_text("name: Test\nemail: t@t.com\ncareer_summary: x\n")
u = UserProfile(p)
assert u.wizard_complete is False
assert u.wizard_step == 0
assert u.tier == "free"
assert u.dev_tier_override is None
assert u.dismissed_banners == []
def test_effective_tier_override(tmp_path):
p = tmp_path / "user.yaml"
p.write_text("name: T\nemail: t@t.com\ncareer_summary: x\ntier: free\ndev_tier_override: premium\n")
u = UserProfile(p)
assert u.effective_tier == "premium"
def test_effective_tier_no_override(tmp_path):
p = tmp_path / "user.yaml"
p.write_text("name: T\nemail: t@t.com\ncareer_summary: x\ntier: paid\n")
u = UserProfile(p)
assert u.effective_tier == "paid"
```
**Step 2: Run — expect FAIL**
```bash
conda run -n job-seeker python -m pytest tests/test_user_profile.py -k "wizard" -v
```
Expected: `AttributeError: 'UserProfile' object has no attribute 'wizard_complete'`
**Step 3: Add fields to `_DEFAULTS` and `UserProfile.__init__` in `scripts/user_profile.py`**
In `_DEFAULTS`, add:
```python
"tier": "free",
"dev_tier_override": None,
"wizard_complete": False,
"wizard_step": 0,
"dismissed_banners": [],
```
In `__init__`, add after existing field assignments:
```python
self.tier: str = data.get("tier", "free")
self.dev_tier_override: str | None = data.get("dev_tier_override") or None
self.wizard_complete: bool = bool(data.get("wizard_complete", False))
self.wizard_step: int = int(data.get("wizard_step", 0))
self.dismissed_banners: list[str] = list(data.get("dismissed_banners", []))
```
Add `effective_tier` property:
```python
@property
def effective_tier(self) -> str:
"""Returns dev_tier_override if set, otherwise tier."""
return self.dev_tier_override or self.tier
```
**Step 4: Update `config/user.yaml.example`** — add after `candidate_lgbtq_focus`:
```yaml
tier: free # free | paid | premium
dev_tier_override: null # overrides tier locally (for testing only)
wizard_complete: false
wizard_step: 0
dismissed_banners: []
```
**Step 5: Add insert_task params test**
```python
# tests/test_db.py — add after existing insert_task tests
def test_insert_task_with_params(tmp_path):
db = tmp_path / "t.db"
init_db(db)
import json
params = json.dumps({"section": "career_summary"})
task_id, is_new = insert_task(db, "wizard_generate", 0, params=params)
assert is_new is True
# Second call with same params = dedup
task_id2, is_new2 = insert_task(db, "wizard_generate", 0, params=params)
assert is_new2 is False
assert task_id == task_id2
# Different section = new task
params2 = json.dumps({"section": "job_titles"})
task_id3, is_new3 = insert_task(db, "wizard_generate", 0, params=params2)
assert is_new3 is True
```
**Step 6: Run — expect FAIL**
```bash
conda run -n job-seeker python -m pytest tests/test_db.py -k "params" -v
```
Expected: `TypeError: insert_task() got unexpected keyword argument 'params'`
**Step 7: Add `params` column to `background_tasks` in `scripts/db.py`**
In `init_db`, add `params TEXT` to the CREATE TABLE statement for `background_tasks`:
```sql
CREATE TABLE IF NOT EXISTS background_tasks (
id INTEGER PRIMARY KEY AUTOINCREMENT,
task_type TEXT NOT NULL,
job_id INTEGER DEFAULT 0,
params TEXT,
status TEXT DEFAULT 'queued',
stage TEXT,
error TEXT,
created_at TEXT DEFAULT (datetime('now')),
updated_at TEXT DEFAULT (datetime('now')),
finished_at TEXT
)
```
Also add a migration for existing DBs (after CREATE TABLE):
```python
# Migrate: add params column if missing
try:
conn.execute("ALTER TABLE background_tasks ADD COLUMN params TEXT")
except Exception:
pass # column already exists
```
Update `insert_task` signature and dedup query:
```python
def insert_task(db_path: Path, task_type: str, job_id: int,
params: str | None = None) -> tuple[int, bool]:
"""Insert a task row if no identical active task exists.
Dedup key: (task_type, job_id) when params is None;
(task_type, job_id, params) when params is provided.
"""
conn = sqlite3.connect(db_path)
try:
if params is not None:
existing = conn.execute(
"SELECT id FROM background_tasks WHERE task_type=? AND job_id=? "
"AND params=? AND status IN ('queued','running')",
(task_type, job_id, params)
).fetchone()
else:
existing = conn.execute(
"SELECT id FROM background_tasks WHERE task_type=? AND job_id=? "
"AND status IN ('queued','running')",
(task_type, job_id)
).fetchone()
if existing:
return existing[0], False
cur = conn.execute(
"INSERT INTO background_tasks (task_type, job_id, params) VALUES (?,?,?)",
(task_type, job_id, params)
)
conn.commit()
return cur.lastrowid, True
finally:
conn.close()
```
Update `submit_task` in `scripts/task_runner.py` to accept and pass params:
```python
def submit_task(db_path: Path = DEFAULT_DB, task_type: str = "",
job_id: int = None, params: str | None = None) -> tuple[int, bool]:
task_id, is_new = insert_task(db_path, task_type, job_id or 0, params=params)
if is_new:
t = threading.Thread(
target=_run_task,
args=(db_path, task_id, task_type, job_id or 0, params),
daemon=True,
)
t.start()
return task_id, is_new
```
Update `_run_task` signature: `def _run_task(db_path, task_id, task_type, job_id, params=None)`
**Step 8: Run tests**
```bash
conda run -n job-seeker python -m pytest tests/test_user_profile.py tests/test_db.py tests/test_task_runner.py -v
```
Expected: all pass (existing tests unaffected, new tests pass)
**Step 9: Commit**
```bash
git add scripts/user_profile.py scripts/db.py scripts/task_runner.py config/user.yaml.example tests/test_user_profile.py tests/test_db.py
git commit -m "feat: wizard fields in UserProfile + params column in background_tasks"
```
---
## Task 2: Tier system (`app/wizard/tiers.py`)
**Files:**
- Create: `app/wizard/__init__.py`
- Create: `app/wizard/tiers.py`
- Create: `tests/test_wizard_tiers.py`
**Step 1: Write failing tests**
```python
# tests/test_wizard_tiers.py
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent))
from app.wizard.tiers import can_use, tier_label, TIERS, FEATURES
def test_tiers_list():
assert TIERS == ["free", "paid", "premium"]
def test_can_use_free_feature_always():
# google_drive is free (not in FEATURES dict = available to all)
assert can_use("free", "google_drive_sync") is True
def test_can_use_paid_feature_free_tier():
assert can_use("free", "company_research") is False
def test_can_use_paid_feature_paid_tier():
assert can_use("paid", "company_research") is True
def test_can_use_paid_feature_premium_tier():
assert can_use("premium", "company_research") is True
def test_can_use_premium_feature_paid_tier():
assert can_use("paid", "model_fine_tuning") is False
def test_can_use_premium_feature_premium_tier():
assert can_use("premium", "model_fine_tuning") is True
def test_can_use_unknown_feature_always_true():
# Unknown features are not gated
assert can_use("free", "nonexistent_feature") is True
def test_tier_label_paid():
label = tier_label("company_research")
assert "Paid" in label or "paid" in label.lower()
def test_tier_label_premium():
label = tier_label("model_fine_tuning")
assert "Premium" in label or "premium" in label.lower()
def test_tier_label_free_feature():
# Free features have no lock label
label = tier_label("unknown_free_feature")
assert label == ""
```
**Step 2: Run — expect FAIL**
```bash
conda run -n job-seeker python -m pytest tests/test_wizard_tiers.py -v
```
Expected: `ModuleNotFoundError: No module named 'app.wizard'`
**Step 3: Create `app/wizard/__init__.py`** (empty)
**Step 4: Create `app/wizard/tiers.py`**
```python
"""
Tier definitions and feature gates for Peregrine.
Tiers: free < paid < premium
FEATURES maps feature key → minimum tier required.
Features not in FEATURES are available to all tiers.
"""
from __future__ import annotations
TIERS = ["free", "paid", "premium"]
# Maps feature key → minimum tier string required.
# Features absent from this dict are free (available to all).
FEATURES: dict[str, str] = {
# Wizard LLM generation
"llm_career_summary": "paid",
"llm_expand_bullets": "paid",
"llm_suggest_skills": "paid",
"llm_voice_guidelines": "premium",
"llm_job_titles": "paid",
"llm_keywords_blocklist": "paid",
"llm_mission_notes": "paid",
# App features
"company_research": "paid",
"interview_prep": "paid",
"email_classifier": "paid",
"survey_assistant": "paid",
"model_fine_tuning": "premium",
"shared_cover_writer_model": "paid",
"multi_user": "premium",
# Integrations (paid)
"notion_sync": "paid",
"google_sheets_sync": "paid",
"airtable_sync": "paid",
"google_calendar_sync": "paid",
"apple_calendar_sync": "paid",
"slack_notifications": "paid",
}
# Free integrations (not in FEATURES):
# google_drive_sync, dropbox_sync, onedrive_sync, mega_sync,
# nextcloud_sync, discord_notifications, home_assistant
def can_use(tier: str, feature: str) -> bool:
"""Return True if the given tier has access to the feature."""
required = FEATURES.get(feature)
if required is None:
return True # not gated
try:
return TIERS.index(tier) >= TIERS.index(required)
except ValueError:
return False
def tier_label(feature: str) -> str:
"""Return a display label for a locked feature, or '' if free."""
required = FEATURES.get(feature)
if required is None:
return ""
return "🔒 Paid" if required == "paid" else "⭐ Premium"
```
**Step 5: Run tests**
```bash
conda run -n job-seeker python -m pytest tests/test_wizard_tiers.py -v
```
Expected: all 11 tests pass.
**Step 6: Commit**
```bash
git add app/wizard/__init__.py app/wizard/tiers.py tests/test_wizard_tiers.py
git commit -m "feat: tier system with FEATURES gate + can_use() + tier_label()"
```
---
## Task 3: Step validate functions — hardware, tier, identity, resume, inference, search
Each step module exports only `validate(data: dict) -> list[str]` and constants. The Streamlit render function is in a later task (Task 16 — orchestrator). This task builds the pure-logic layer that is fully testable without Streamlit.
**Files:**
- Create: `app/wizard/step_hardware.py`
- Create: `app/wizard/step_tier.py`
- Create: `app/wizard/step_identity.py`
- Create: `app/wizard/step_resume.py`
- Create: `app/wizard/step_inference.py`
- Create: `app/wizard/step_search.py`
- Create: `tests/test_wizard_steps.py`
**Step 1: Write all failing tests**
```python
# tests/test_wizard_steps.py
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent))
# ── Hardware ───────────────────────────────────────────────────────────────────
from app.wizard.step_hardware import validate as hw_validate, PROFILES
def test_hw_valid():
assert hw_validate({"inference_profile": "remote"}) == []
def test_hw_missing():
assert hw_validate({}) != []
def test_hw_invalid():
assert hw_validate({"inference_profile": "turbo"}) != []
def test_hw_all_profiles():
for p in PROFILES:
assert hw_validate({"inference_profile": p}) == []
# ── Tier ───────────────────────────────────────────────────────────────────────
from app.wizard.step_tier import validate as tier_validate
def test_tier_valid():
assert tier_validate({"tier": "free"}) == []
def test_tier_missing():
assert tier_validate({}) != []
def test_tier_invalid():
assert tier_validate({"tier": "enterprise"}) != []
# ── Identity ───────────────────────────────────────────────────────────────────
from app.wizard.step_identity import validate as id_validate
def test_id_all_required_fields():
d = {"name": "Alice", "email": "a@b.com", "career_summary": "10 years of stuff."}
assert id_validate(d) == []
def test_id_missing_name():
d = {"name": "", "email": "a@b.com", "career_summary": "x"}
assert any("name" in e.lower() for e in id_validate(d))
def test_id_missing_email():
d = {"name": "Alice", "email": "", "career_summary": "x"}
assert any("email" in e.lower() for e in id_validate(d))
def test_id_missing_summary():
d = {"name": "Alice", "email": "a@b.com", "career_summary": ""}
assert any("summary" in e.lower() or "career" in e.lower() for e in id_validate(d))
# ── Resume ─────────────────────────────────────────────────────────────────────
from app.wizard.step_resume import validate as resume_validate
def test_resume_no_experience():
assert resume_validate({"experience": []}) != []
def test_resume_one_entry():
d = {"experience": [{"company": "Acme", "title": "Engineer", "bullets": ["did stuff"]}]}
assert resume_validate(d) == []
def test_resume_missing_experience_key():
assert resume_validate({}) != []
# ── Inference ──────────────────────────────────────────────────────────────────
from app.wizard.step_inference import validate as inf_validate
def test_inference_not_confirmed():
assert inf_validate({"endpoint_confirmed": False}) != []
def test_inference_confirmed():
assert inf_validate({"endpoint_confirmed": True}) == []
def test_inference_missing():
assert inf_validate({}) != []
# ── Search ─────────────────────────────────────────────────────────────────────
from app.wizard.step_search import validate as search_validate
def test_search_valid():
d = {"job_titles": ["Software Engineer"], "locations": ["Remote"]}
assert search_validate(d) == []
def test_search_missing_titles():
d = {"job_titles": [], "locations": ["Remote"]}
assert any("title" in e.lower() for e in search_validate(d))
def test_search_missing_locations():
d = {"job_titles": ["SWE"], "locations": []}
assert any("location" in e.lower() for e in search_validate(d))
def test_search_missing_both():
assert len(search_validate({})) == 2
```
**Step 2: Run — expect FAIL (modules don't exist)**
```bash
conda run -n job-seeker python -m pytest tests/test_wizard_steps.py -v
```
**Step 3: Create the six step modules**
`app/wizard/step_hardware.py`:
```python
"""Step 1 — Hardware detection and inference profile selection."""
PROFILES = ["remote", "cpu", "single-gpu", "dual-gpu"]
def validate(data: dict) -> list[str]:
errors = []
profile = data.get("inference_profile", "")
if not profile:
errors.append("Inference profile is required.")
elif profile not in PROFILES:
errors.append(f"Invalid inference profile '{profile}'. Choose: {', '.join(PROFILES)}.")
return errors
```
`app/wizard/step_tier.py`:
```python
"""Step 2 — Tier selection (free / paid / premium)."""
from app.wizard.tiers import TIERS
def validate(data: dict) -> list[str]:
errors = []
tier = data.get("tier", "")
if not tier:
errors.append("Tier selection is required.")
elif tier not in TIERS:
errors.append(f"Invalid tier '{tier}'. Choose: {', '.join(TIERS)}.")
return errors
```
`app/wizard/step_identity.py`:
```python
"""Step 3 — Identity (name, email, phone, linkedin, career_summary)."""
def validate(data: dict) -> list[str]:
errors = []
if not (data.get("name") or "").strip():
errors.append("Full name is required.")
if not (data.get("email") or "").strip():
errors.append("Email address is required.")
if not (data.get("career_summary") or "").strip():
errors.append("Career summary is required.")
return errors
```
`app/wizard/step_resume.py`:
```python
"""Step 4 — Resume (upload or guided form builder)."""
def validate(data: dict) -> list[str]:
errors = []
experience = data.get("experience", [])
if not experience:
errors.append("At least one work experience entry is required.")
return errors
```
`app/wizard/step_inference.py`:
```python
"""Step 5 — LLM inference backend configuration and key entry."""
def validate(data: dict) -> list[str]:
errors = []
if not data.get("endpoint_confirmed"):
errors.append("At least one working LLM endpoint must be confirmed.")
return errors
```
`app/wizard/step_search.py`:
```python
"""Step 6 — Job search preferences (titles, locations, boards, keywords)."""
def validate(data: dict) -> list[str]:
errors = []
titles = data.get("job_titles") or []
locations = data.get("locations") or []
if not titles:
errors.append("At least one job title is required.")
if not locations:
errors.append("At least one location is required.")
return errors
```
**Step 4: Run tests**
```bash
conda run -n job-seeker python -m pytest tests/test_wizard_steps.py -v
```
Expected: all 22 tests pass.
**Step 5: Commit**
```bash
git add app/wizard/step_hardware.py app/wizard/step_tier.py app/wizard/step_identity.py \
app/wizard/step_resume.py app/wizard/step_inference.py app/wizard/step_search.py \
tests/test_wizard_steps.py
git commit -m "feat: wizard step validate() functions — all six mandatory steps"
```
---
## Task 4: Resume parser (`scripts/resume_parser.py`)
Parses PDF and DOCX files to raw text, then calls the LLM to structure the text into `plain_text_resume.yaml` fields.
**Files:**
- Create: `scripts/resume_parser.py`
- Create: `tests/test_resume_parser.py`
**Step 1: Write failing tests**
```python
# tests/test_resume_parser.py
import sys
from pathlib import Path
from unittest.mock import patch, MagicMock
sys.path.insert(0, str(Path(__file__).parent.parent))
from scripts.resume_parser import extract_text_from_pdf, extract_text_from_docx, structure_resume
def test_extract_pdf_returns_string():
mock_pages = [MagicMock()]
mock_pages[0].get_text.return_value = "Jane Doe\nSoftware Engineer"
with patch("scripts.resume_parser.pdfplumber") as mock_pdf:
mock_pdf.open.return_value.__enter__.return_value.pages = mock_pages
result = extract_text_from_pdf(b"%PDF-fake")
assert "Jane Doe" in result
def test_extract_docx_returns_string():
mock_doc = MagicMock()
mock_doc.paragraphs = [MagicMock(text="Alice Smith"), MagicMock(text="Senior Developer")]
with patch("scripts.resume_parser.Document", return_value=mock_doc):
result = extract_text_from_docx(b"PK fake docx bytes")
assert "Alice Smith" in result
def test_structure_resume_returns_dict():
raw_text = "Jane Doe\nSoftware Engineer at Acme 2020-2023"
mock_llm = MagicMock(return_value='{"name": "Jane Doe", "experience": [{"company": "Acme"}]}')
with patch("scripts.resume_parser._llm_structure", mock_llm):
result = structure_resume(raw_text)
assert "experience" in result
assert isinstance(result["experience"], list)
def test_structure_resume_invalid_json_returns_empty():
with patch("scripts.resume_parser._llm_structure", return_value="not json at all"):
result = structure_resume("some text")
# Should return empty dict rather than crash
assert isinstance(result, dict)
```
**Step 2: Run — expect FAIL**
```bash
conda run -n job-seeker python -m pytest tests/test_resume_parser.py -v
```
**Step 3: Create `scripts/resume_parser.py`**
```python
"""
Resume parser — extract text from PDF/DOCX and structure via LLM.
Fast path: file bytes → raw text → LLM structures into resume dict.
Result dict keys mirror plain_text_resume.yaml sections.
"""
from __future__ import annotations
import io
import json
import re
from pathlib import Path
def extract_text_from_pdf(file_bytes: bytes) -> str:
"""Extract raw text from PDF bytes using pdfplumber."""
import pdfplumber
with pdfplumber.open(io.BytesIO(file_bytes)) as pdf:
pages = [page.get_text() or "" for page in pdf.pages]
return "\n".join(pages)
def extract_text_from_docx(file_bytes: bytes) -> str:
"""Extract raw text from DOCX bytes using python-docx."""
from docx import Document
doc = Document(io.BytesIO(file_bytes))
return "\n".join(p.text for p in doc.paragraphs if p.text.strip())
def _llm_structure(raw_text: str) -> str:
"""Call LLM to convert raw resume text to JSON. Returns raw LLM output string."""
from scripts.llm_router import LLMRouter
prompt = f"""You are a resume parser. Convert the following resume text into a JSON object.
Required JSON keys:
- name (string)
- email (string, may be empty)
- phone (string, may be empty)
- career_summary (string: 2-4 sentence professional summary)
- experience (list of objects with: company, title, start_date, end_date, bullets list of strings)
- education (list of objects with: institution, degree, field, graduation_year)
- skills (list of strings)
- achievements (list of strings, may be empty)
Return ONLY valid JSON. No markdown, no explanation.
Resume text:
{raw_text[:6000]}"""
router = LLMRouter()
return router.complete(prompt)
def structure_resume(raw_text: str) -> dict:
"""Convert raw resume text to a structured dict via LLM.
Returns an empty dict on parse failure — caller should fall back to form builder.
"""
try:
raw = _llm_structure(raw_text)
# Strip markdown code fences if present
raw = re.sub(r"^```(?:json)?\s*", "", raw.strip())
raw = re.sub(r"\s*```$", "", raw)
return json.loads(raw)
except Exception:
return {}
```
**Step 4: Run tests**
```bash
conda run -n job-seeker python -m pytest tests/test_resume_parser.py -v
```
Expected: all 4 tests pass.
**Step 5: Commit**
```bash
git add scripts/resume_parser.py tests/test_resume_parser.py
git commit -m "feat: resume parser — PDF/DOCX extraction + LLM structuring"
```
---
## Task 5: Integration base class and registry
**Files:**
- Create: `scripts/integrations/__init__.py`
- Create: `scripts/integrations/base.py`
- Create: `tests/test_integrations.py`
**Step 1: Write failing tests**
```python
# tests/test_integrations.py
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent))
def test_registry_loads():
from scripts.integrations import REGISTRY
assert isinstance(REGISTRY, dict)
assert len(REGISTRY) > 0
def test_all_registry_entries_are_integration_base():
from scripts.integrations import REGISTRY
from scripts.integrations.base import IntegrationBase
for name, cls in REGISTRY.items():
assert issubclass(cls, IntegrationBase), f"{name} must subclass IntegrationBase"
def test_each_integration_has_required_attributes():
from scripts.integrations import REGISTRY
for name, cls in REGISTRY.items():
assert hasattr(cls, "name"), f"{name} missing .name"
assert hasattr(cls, "label"), f"{name} missing .label"
assert hasattr(cls, "tier"), f"{name} missing .tier"
def test_fields_returns_list_of_dicts():
from scripts.integrations import REGISTRY
for name, cls in REGISTRY.items():
instance = cls()
fields = instance.fields()
assert isinstance(fields, list), f"{name}.fields() must return list"
for f in fields:
assert "key" in f, f"{name} field missing 'key'"
assert "label" in f, f"{name} field missing 'label'"
assert "type" in f, f"{name} field missing 'type'"
def test_notion_in_registry():
from scripts.integrations import REGISTRY
assert "notion" in REGISTRY
def test_discord_in_registry():
from scripts.integrations import REGISTRY
assert "discord" in REGISTRY
```
**Step 2: Run — expect FAIL**
```bash
conda run -n job-seeker python -m pytest tests/test_integrations.py -v
```
**Step 3: Create `scripts/integrations/base.py`**
```python
"""Base class for all Peregrine integrations."""
from __future__ import annotations
from abc import ABC, abstractmethod
from pathlib import Path
import yaml
class IntegrationBase(ABC):
"""All integrations inherit from this class.
Subclasses declare class-level:
name : str — machine key, matches yaml filename (e.g. "notion")
label : str — display name (e.g. "Notion")
tier : str — minimum tier required: "free" | "paid" | "premium"
"""
name: str
label: str
tier: str
@abstractmethod
def fields(self) -> list[dict]:
"""Return form field definitions for the wizard connection card.
Each dict: {"key": str, "label": str, "type": "text"|"password"|"url"|"checkbox",
"placeholder": str, "required": bool, "help": str}
"""
@abstractmethod
def connect(self, config: dict) -> bool:
"""Store config in memory, return True (actual validation happens in test())."""
@abstractmethod
def test(self) -> bool:
"""Verify the stored credentials actually work. Returns True on success."""
def sync(self, jobs: list[dict]) -> int:
"""Push jobs to the external service. Returns count synced. Override if applicable."""
return 0
@classmethod
def config_path(cls, config_dir: Path) -> Path:
return config_dir / "integrations" / f"{cls.name}.yaml"
@classmethod
def is_configured(cls, config_dir: Path) -> bool:
return cls.config_path(config_dir).exists()
def save_config(self, config: dict, config_dir: Path) -> None:
"""Write config to config/integrations/<name>.yaml (only after test() passes)."""
path = self.config_path(config_dir)
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(yaml.dump(config, default_flow_style=False, allow_unicode=True))
def load_config(self, config_dir: Path) -> dict:
path = self.config_path(config_dir)
if not path.exists():
return {}
return yaml.safe_load(path.read_text()) or {}
```
**Step 4: Create `scripts/integrations/__init__.py`**
```python
"""Integration registry — auto-discovers all IntegrationBase subclasses."""
from __future__ import annotations
from scripts.integrations.base import IntegrationBase
# Import all integration modules to trigger subclass registration
from scripts.integrations import ( # noqa: F401
notion, google_drive, google_sheets, airtable,
dropbox, onedrive, mega, nextcloud,
google_calendar, apple_calendar,
slack, discord, home_assistant,
)
REGISTRY: dict[str, type[IntegrationBase]] = {
cls.name: cls
for cls in IntegrationBase.__subclasses__()
}
```
**Step 5: Run tests** — will still fail because integration modules don't exist yet. That's expected — proceed to Task 6.
---
## Task 6: Integration implementations (all 13)
Create all 13 integration stub modules. Each has: class-level name/label/tier, `fields()`, `connect()`, `test()`. For v1, `test()` does a real HTTP/API call where possible; complex OAuth flows are stubbed with a clear `# TODO: OAuth` comment and return True after config write.
**Files:**
- Create: `scripts/integrations/notion.py`
- Create: `scripts/integrations/google_drive.py`
- Create: `scripts/integrations/google_sheets.py`
- Create: `scripts/integrations/airtable.py`
- Create: `scripts/integrations/dropbox.py`
- Create: `scripts/integrations/onedrive.py`
- Create: `scripts/integrations/mega.py`
- Create: `scripts/integrations/nextcloud.py`
- Create: `scripts/integrations/google_calendar.py`
- Create: `scripts/integrations/apple_calendar.py`
- Create: `scripts/integrations/slack.py`
- Create: `scripts/integrations/discord.py`
- Create: `scripts/integrations/home_assistant.py`
- Create: `config/integrations/` (directory with .yaml.example files)
**Step 1: Create `scripts/integrations/notion.py`** (has real test())
```python
from scripts.integrations.base import IntegrationBase
class NotionIntegration(IntegrationBase):
name = "notion"
label = "Notion"
tier = "paid"
def __init__(self):
self._token = ""
self._database_id = ""
def fields(self) -> list[dict]:
return [
{"key": "token", "label": "Integration Token", "type": "password",
"placeholder": "secret_…", "required": True,
"help": "Settings → Connections → Develop or manage integrations → New integration"},
{"key": "database_id", "label": "Database ID", "type": "text",
"placeholder": "32-character ID from Notion URL", "required": True,
"help": "Open your Notion database → Share → Copy link → extract the ID"},
]
def connect(self, config: dict) -> bool:
self._token = config.get("token", "")
self._database_id = config.get("database_id", "")
return bool(self._token and self._database_id)
def test(self) -> bool:
try:
from notion_client import Client
db = Client(auth=self._token).databases.retrieve(self._database_id)
return bool(db)
except Exception:
return False
```
**Step 2: Create file storage integrations**`google_drive.py`, `dropbox.py`, `onedrive.py`, `mega.py`, `nextcloud.py`
Pattern (show google_drive, others follow same structure with different name/label/fields):
```python
# scripts/integrations/google_drive.py
from scripts.integrations.base import IntegrationBase
class GoogleDriveIntegration(IntegrationBase):
name = "google_drive"
label = "Google Drive"
tier = "free"
def __init__(self):
self._config: dict = {}
def fields(self) -> list[dict]:
return [
{"key": "folder_id", "label": "Folder ID", "type": "text",
"placeholder": "Paste the folder ID from the Drive URL", "required": True,
"help": "Open the folder in Drive → copy the ID from the URL after /folders/"},
{"key": "credentials_json", "label": "Service Account JSON path", "type": "text",
"placeholder": "~/credentials/google-drive-sa.json", "required": True,
"help": "Download from Google Cloud Console → Service Accounts → Keys"},
]
def connect(self, config: dict) -> bool:
self._config = config
return bool(config.get("folder_id") and config.get("credentials_json"))
def test(self) -> bool:
# TODO: use google-api-python-client to list the folder
# For v1, verify the credentials file exists
import os
creds = os.path.expanduser(self._config.get("credentials_json", ""))
return os.path.exists(creds)
```
Create similarly for:
- `dropbox.py` — name="dropbox", label="Dropbox", tier="free", fields: access_token + folder_path; test: GET /files/list_folder (requests)
- `onedrive.py` — name="onedrive", label="OneDrive", tier="free", fields: client_id + client_secret + folder_path; test: TODO OAuth
- `mega.py` — name="mega", label="MEGA", tier="free", fields: email + password + folder_path; test: TODO (mega.py SDK)
- `nextcloud.py` — name="nextcloud", label="Nextcloud", tier="free", fields: host + username + password + folder_path; test: WebDAV PROPFIND
**Step 3: Create tracker integrations**`google_sheets.py`, `airtable.py`
```python
# scripts/integrations/google_sheets.py
from scripts.integrations.base import IntegrationBase
class GoogleSheetsIntegration(IntegrationBase):
name = "google_sheets"
label = "Google Sheets"
tier = "paid"
def __init__(self): self._config: dict = {}
def fields(self) -> list[dict]:
return [
{"key": "spreadsheet_id", "label": "Spreadsheet ID", "type": "text",
"placeholder": "From the URL: /d/<ID>/edit", "required": True, "help": ""},
{"key": "sheet_name", "label": "Sheet name", "type": "text",
"placeholder": "Jobs", "required": True, "help": "Name of the tab to write to"},
{"key": "credentials_json", "label": "Service Account JSON path", "type": "text",
"placeholder": "~/credentials/google-sheets-sa.json", "required": True, "help": ""},
]
def connect(self, config: dict) -> bool:
self._config = config
return bool(config.get("spreadsheet_id") and config.get("credentials_json"))
def test(self) -> bool:
import os
creds = os.path.expanduser(self._config.get("credentials_json", ""))
return os.path.exists(creds) # TODO: gspread open_by_key()
```
```python
# scripts/integrations/airtable.py
from scripts.integrations.base import IntegrationBase
class AirtableIntegration(IntegrationBase):
name = "airtable"
label = "Airtable"
tier = "paid"
def __init__(self): self._config: dict = {}
def fields(self) -> list[dict]:
return [
{"key": "api_key", "label": "Personal Access Token", "type": "password",
"placeholder": "patXXX…", "required": True,
"help": "airtable.com/create/tokens"},
{"key": "base_id", "label": "Base ID", "type": "text",
"placeholder": "appXXX…", "required": True, "help": "From the API docs URL"},
{"key": "table_name", "label": "Table name", "type": "text",
"placeholder": "Jobs", "required": True, "help": ""},
]
def connect(self, config: dict) -> bool:
self._config = config
return bool(config.get("api_key") and config.get("base_id"))
def test(self) -> bool:
try:
import requests
r = requests.get(
f"https://api.airtable.com/v0/{self._config['base_id']}/{self._config['table_name']}",
headers={"Authorization": f"Bearer {self._config['api_key']}"},
params={"maxRecords": 1}, timeout=8,
)
return r.status_code == 200
except Exception:
return False
```
**Step 4: Create calendar integrations**`google_calendar.py`, `apple_calendar.py`
```python
# scripts/integrations/google_calendar.py
from scripts.integrations.base import IntegrationBase
class GoogleCalendarIntegration(IntegrationBase):
name = "google_calendar"
label = "Google Calendar"
tier = "paid"
def __init__(self): self._config: dict = {}
def fields(self) -> list[dict]:
return [
{"key": "calendar_id", "label": "Calendar ID", "type": "text",
"placeholder": "primary or xxxxx@group.calendar.google.com", "required": True,
"help": "Settings → Calendars → [name] → Integrate calendar → Calendar ID"},
{"key": "credentials_json", "label": "Service Account JSON path", "type": "text",
"placeholder": "~/credentials/google-calendar-sa.json", "required": True, "help": ""},
]
def connect(self, config: dict) -> bool:
self._config = config
return bool(config.get("calendar_id") and config.get("credentials_json"))
def test(self) -> bool:
import os
creds = os.path.expanduser(self._config.get("credentials_json", ""))
return os.path.exists(creds) # TODO: google-api-python-client calendars().get()
```
```python
# scripts/integrations/apple_calendar.py
from scripts.integrations.base import IntegrationBase
class AppleCalendarIntegration(IntegrationBase):
name = "apple_calendar"
label = "Apple Calendar (CalDAV)"
tier = "paid"
def __init__(self): self._config: dict = {}
def fields(self) -> list[dict]:
return [
{"key": "caldav_url", "label": "CalDAV URL", "type": "url",
"placeholder": "https://caldav.icloud.com/", "required": True,
"help": "iCloud: https://caldav.icloud.com/ | self-hosted: your server URL"},
{"key": "username", "label": "Apple ID / username", "type": "text",
"placeholder": "you@icloud.com", "required": True, "help": ""},
{"key": "app_password", "label": "App-Specific Password", "type": "password",
"placeholder": "xxxx-xxxx-xxxx-xxxx", "required": True,
"help": "appleid.apple.com → Security → App-Specific Passwords → Generate"},
{"key": "calendar_name", "label": "Calendar name", "type": "text",
"placeholder": "Interviews", "required": True, "help": ""},
]
def connect(self, config: dict) -> bool:
self._config = config
return bool(config.get("caldav_url") and config.get("username") and config.get("app_password"))
def test(self) -> bool:
try:
import caldav
client = caldav.DAVClient(
url=self._config["caldav_url"],
username=self._config["username"],
password=self._config["app_password"],
)
principal = client.principal()
return principal is not None
except Exception:
return False
```
**Step 5: Create notification integrations**`slack.py`, `discord.py`, `home_assistant.py`
```python
# scripts/integrations/slack.py
from scripts.integrations.base import IntegrationBase
class SlackIntegration(IntegrationBase):
name = "slack"
label = "Slack"
tier = "paid"
def __init__(self): self._config: dict = {}
def fields(self) -> list[dict]:
return [
{"key": "webhook_url", "label": "Incoming Webhook URL", "type": "url",
"placeholder": "https://hooks.slack.com/services/…", "required": True,
"help": "api.slack.com → Your Apps → Incoming Webhooks → Add"},
{"key": "channel", "label": "Channel (optional)", "type": "text",
"placeholder": "#job-alerts", "required": False,
"help": "Leave blank to use the webhook's default channel"},
]
def connect(self, config: dict) -> bool:
self._config = config
return bool(config.get("webhook_url"))
def test(self) -> bool:
try:
import requests
r = requests.post(
self._config["webhook_url"],
json={"text": "Peregrine connected successfully."},
timeout=8,
)
return r.status_code == 200
except Exception:
return False
```
```python
# scripts/integrations/discord.py
from scripts.integrations.base import IntegrationBase
class DiscordIntegration(IntegrationBase):
name = "discord"
label = "Discord (webhook)"
tier = "free"
def __init__(self): self._config: dict = {}
def fields(self) -> list[dict]:
return [
{"key": "webhook_url", "label": "Webhook URL", "type": "url",
"placeholder": "https://discord.com/api/webhooks/…", "required": True,
"help": "Server Settings → Integrations → Webhooks → New Webhook → Copy URL"},
]
def connect(self, config: dict) -> bool:
self._config = config
return bool(config.get("webhook_url"))
def test(self) -> bool:
try:
import requests
r = requests.post(
self._config["webhook_url"],
json={"content": "Peregrine connected successfully."},
timeout=8,
)
return r.status_code in (200, 204)
except Exception:
return False
```
```python
# scripts/integrations/home_assistant.py
from scripts.integrations.base import IntegrationBase
class HomeAssistantIntegration(IntegrationBase):
name = "home_assistant"
label = "Home Assistant"
tier = "free"
def __init__(self): self._config: dict = {}
def fields(self) -> list[dict]:
return [
{"key": "base_url", "label": "Home Assistant URL", "type": "url",
"placeholder": "http://homeassistant.local:8123", "required": True, "help": ""},
{"key": "token", "label": "Long-Lived Access Token", "type": "password",
"placeholder": "eyJ0eXAiOiJKV1Qi…", "required": True,
"help": "Profile → Long-Lived Access Tokens → Create Token"},
{"key": "notification_service", "label": "Notification service", "type": "text",
"placeholder": "notify.mobile_app_my_phone", "required": True,
"help": "Developer Tools → Services → search 'notify' to find yours"},
]
def connect(self, config: dict) -> bool:
self._config = config
return bool(config.get("base_url") and config.get("token"))
def test(self) -> bool:
try:
import requests
r = requests.get(
f"{self._config['base_url'].rstrip('/')}/api/",
headers={"Authorization": f"Bearer {self._config['token']}"},
timeout=8,
)
return r.status_code == 200
except Exception:
return False
```
**Step 6: Create `config/integrations/` directory and `.yaml.example` files**
```bash
mkdir -p /Library/Development/devl/peregrine/config/integrations
```
Create `config/integrations/notion.yaml.example`:
```yaml
token: "secret_..."
database_id: "32-character-notion-db-id"
```
Create one `.yaml.example` per integration (notion, google_drive, google_sheets, airtable, dropbox, onedrive, mega, nextcloud, google_calendar, apple_calendar, slack, discord, home_assistant).
Add to `.gitignore`:
```
config/integrations/*.yaml
!config/integrations/*.yaml.example
```
**Step 7: Run integration tests**
```bash
conda run -n job-seeker python -m pytest tests/test_integrations.py -v
```
Expected: all 6 tests pass.
**Step 8: Commit**
```bash
git add scripts/integrations/ config/integrations/ tests/test_integrations.py .gitignore
git commit -m "feat: integration base class + registry + 13 integration implementations"
```
---
## Task 7: `wizard_generate` task type in task_runner
**Files:**
- Modify: `scripts/task_runner.py`
- Modify: `tests/test_task_runner.py`
The `wizard_generate` task accepts `params` JSON with `{"section": "...", "input": {...}}`, calls the LLM, and stores the result as JSON in `background_tasks.error`.
Supported sections: `career_summary`, `expand_bullets`, `suggest_skills`, `voice_guidelines`, `job_titles`, `keywords`, `blocklist`, `mission_notes`
**Step 1: Add tests**
```python
# tests/test_task_runner.py — add to existing file
import json
def test_wizard_generate_career_summary(tmp_path):
"""wizard_generate with career_summary section calls LLM and stores result."""
db = tmp_path / "t.db"
from scripts.db import init_db, get_task_status
init_db(db)
params = json.dumps({
"section": "career_summary",
"input": {"resume_text": "10 years Python dev"}
})
with patch("scripts.task_runner._run_wizard_generate") as mock_gen:
mock_gen.return_value = "Experienced Python developer."
from scripts.task_runner import submit_task
task_id, is_new = submit_task(db, "wizard_generate", 0, params=params)
assert is_new is True
def test_wizard_generate_unknown_section(tmp_path):
"""wizard_generate with unknown section marks task failed."""
db = tmp_path / "t.db"
from scripts.db import init_db, update_task_status
init_db(db)
params = json.dumps({"section": "nonexistent", "input": {}})
# Run inline (don't spawn thread — call _run_task directly)
from scripts.task_runner import _run_task
from scripts.db import insert_task
task_id, _ = insert_task(db, "wizard_generate", 0, params=params)
_run_task(db, task_id, "wizard_generate", 0, params=params)
import sqlite3
conn = sqlite3.connect(db)
row = conn.execute("SELECT status FROM background_tasks WHERE id=?", (task_id,)).fetchone()
conn.close()
assert row[0] == "failed"
```
**Step 2: Run — expect FAIL**
```bash
conda run -n job-seeker python -m pytest tests/test_task_runner.py -k "wizard_generate" -v
```
**Step 3: Add wizard_generate handler to `scripts/task_runner.py`**
Add helper function before `_run_task`:
```python
_WIZARD_PROMPTS = {
"career_summary": (
"Based on the following resume text, write a concise 2-4 sentence professional "
"career summary in first person. Focus on years of experience, key skills, and "
"what makes this person distinctive. Return only the summary text.\n\nResume:\n{resume_text}"
),
"expand_bullets": (
"Rewrite these rough responsibility notes as polished STAR-format bullet points "
"(Situation/Task, Action, Result). Each bullet should start with a strong action verb. "
"Return a JSON array of bullet strings.\n\nNotes:\n{bullet_notes}"
),
"suggest_skills": (
"Based on these work experience descriptions, suggest additional skills to add to "
"a resume. Return a JSON array of skill strings only — no explanations.\n\n"
"Experience:\n{experience_text}"
),
"voice_guidelines": (
"Analyze the writing style and tone of this resume and cover letter corpus. "
"Return 3-5 concise guidelines for maintaining this person's authentic voice in "
"future cover letters (e.g. 'Uses direct, confident statements', 'Avoids buzzwords'). "
"Return a JSON array of guideline strings.\n\nContent:\n{content}"
),
"job_titles": (
"Given these job titles and resume, suggest 5-8 additional job title variations "
"this person should search for. Return a JSON array of title strings only.\n\n"
"Current titles: {current_titles}\nResume summary: {resume_text}"
),
"keywords": (
"Based on this resume and target job titles, suggest important keywords and phrases "
"to include in applications. Return a JSON array of keyword strings.\n\n"
"Titles: {titles}\nResume: {resume_text}"
),
"blocklist": (
"Based on this resume and job search context, suggest companies or keywords to "
"blocklist (avoid in job search). Return a JSON array of strings.\n\n"
"Context: {resume_text}"
),
"mission_notes": (
"Based on this resume, write a short personal note (1-2 sentences) about why this "
"person might care about each of these industries: music, animal_welfare, education. "
"Return a JSON object with industry keys and note values. If the resume shows no "
"connection to an industry, set its value to empty string.\n\nResume: {resume_text}"
),
}
def _run_wizard_generate(section: str, input_data: dict) -> str:
"""Run LLM generation for a wizard section. Returns result string."""
template = _WIZARD_PROMPTS.get(section)
if template is None:
raise ValueError(f"Unknown wizard_generate section: {section!r}")
prompt = template.format(**{k: str(v) for k, v in input_data.items()})
from scripts.llm_router import LLMRouter
return LLMRouter().complete(prompt)
```
In `_run_task`, add the `wizard_generate` branch inside the `try` block:
```python
elif task_type == "wizard_generate":
import json as _json
p = _json.loads(params or "{}")
section = p.get("section", "")
input_data = p.get("input", {})
result = _run_wizard_generate(section, input_data)
# Store result in error field (used as result payload for wizard polling)
update_task_status(
db_path, task_id, "completed",
error=_json.dumps({"section": section, "result": result})
)
return
```
**Step 4: Run tests**
```bash
conda run -n job-seeker python -m pytest tests/test_task_runner.py -v
```
Expected: all pass (new cases + existing unaffected).
**Step 5: Commit**
```bash
git add scripts/task_runner.py tests/test_task_runner.py
git commit -m "feat: wizard_generate task type — 8 LLM generation sections"
```
---
## Task 8: Step integrations module + step_integrations validate
**Files:**
- Create: `app/wizard/step_integrations.py`
- Modify: `tests/test_wizard_steps.py`
The integrations step is optional (never blocks Finish), so `validate()` always returns `[]`. The step module also provides helper functions used by the orchestrator.
**Step 1: Add test**
```python
# tests/test_wizard_steps.py — add at end
from app.wizard.step_integrations import validate as int_validate
def test_integrations_always_passes():
assert int_validate({}) == []
assert int_validate({"connected": ["notion", "slack"]}) == []
```
**Step 2: Create `app/wizard/step_integrations.py`**
```python
"""Step 7 — Optional integrations (cloud storage, calendars, notifications)."""
from __future__ import annotations
from pathlib import Path
def validate(data: dict) -> list[str]:
"""Integrations step is always optional — never blocks Finish."""
return []
def get_available(tier: str) -> list[str]:
"""Return list of integration names available for the given tier."""
from scripts.integrations import REGISTRY
from app.wizard.tiers import can_use
return [
name for name, cls in REGISTRY.items()
if can_use(tier, f"{name}_sync") or can_use(tier, f"{name}_notifications") or cls.tier == "free"
]
def is_connected(name: str, config_dir: Path) -> bool:
"""Return True if an integration config file exists for this name."""
return (config_dir / "integrations" / f"{name}.yaml").exists()
```
**Step 3: Run tests**
```bash
conda run -n job-seeker python -m pytest tests/test_wizard_steps.py -v
```
Expected: all 24 tests pass.
**Step 4: Commit**
```bash
git add app/wizard/step_integrations.py tests/test_wizard_steps.py
git commit -m "feat: step_integrations module with validate() + tier-filtered available list"
```
---
## Task 9: Wizard orchestrator — rewrite `app/pages/0_Setup.py`
This is the largest UI task. The orchestrator drives all 6 mandatory steps plus the optional integrations step. It reads/writes `user.yaml` on each "Next" for crash recovery and renders LLM generation polling via `@st.fragment`.
**Files:**
- Rewrite: `app/pages/0_Setup.py`
- Modify: `tests/test_wizard_flow.py` (create new)
**Step 1: Write flow tests (no Streamlit)**
```python
# tests/test_wizard_flow.py
"""
Tests for wizard orchestration logic — no Streamlit dependency.
Tests the _write_step_to_yaml() and _load_wizard_state() helpers.
"""
import sys
from pathlib import Path
import yaml
sys.path.insert(0, str(Path(__file__).parent.parent))
def _make_profile_yaml(tmp_path, extra: dict = None) -> Path:
data = {
"name": "Test User", "email": "t@t.com",
"career_summary": "10 years testing.", "wizard_complete": False
}
if extra:
data.update(extra)
p = tmp_path / "user.yaml"
p.write_text(yaml.dump(data))
return p
def test_all_mandatory_steps_validate():
"""Validate functions for all 6 mandatory steps accept minimal valid data."""
from app.wizard.step_hardware import validate as hw
from app.wizard.step_tier import validate as tier
from app.wizard.step_identity import validate as ident
from app.wizard.step_resume import validate as resume
from app.wizard.step_inference import validate as inf
from app.wizard.step_search import validate as search
assert hw({"inference_profile": "remote"}) == []
assert tier({"tier": "free"}) == []
assert ident({"name": "A", "email": "a@b.com", "career_summary": "x"}) == []
assert resume({"experience": [{"company": "X", "title": "T", "bullets": []}]}) == []
assert inf({"endpoint_confirmed": True}) == []
assert search({"job_titles": ["SWE"], "locations": ["Remote"]}) == []
def test_wizard_state_inferred_from_yaml(tmp_path):
"""Wizard resumes at the right step based on wizard_step field in user.yaml."""
p = _make_profile_yaml(tmp_path, {"wizard_step": 3})
data = yaml.safe_load(p.read_text())
# Step stored is last *completed* step; wizard should show step 4
assert data["wizard_step"] == 3
assert data["wizard_complete"] is False
def test_wizard_complete_flag(tmp_path):
"""wizard_complete: true is written at Finish."""
p = _make_profile_yaml(tmp_path)
data = yaml.safe_load(p.read_text())
data["wizard_complete"] = True
data.pop("wizard_step", None)
p.write_text(yaml.dump(data))
reloaded = yaml.safe_load(p.read_text())
assert reloaded["wizard_complete"] is True
assert "wizard_step" not in reloaded
```
**Step 2: Run — confirm logic tests pass even before orchestrator rewrite**
```bash
conda run -n job-seeker python -m pytest tests/test_wizard_flow.py -v
```
Expected: all pass (tests only use validate functions + yaml, no Streamlit).
**Step 3: Rewrite `app/pages/0_Setup.py`**
Key design points:
- Each `render_step_N()` function renders the Streamlit UI and updates `st.session_state.wizard_data` + `wizard_step`
- On "Next", calls `validate()` → if errors, shows them; if clean, writes to `user.yaml` and advances step
- On "Back", decrements step (no write)
- LLM generation buttons submit `wizard_generate` task and show inline fragment polling
- Finish writes `wizard_complete: true` and clears `wizard_step`
```python
"""
First-run setup wizard orchestrator.
Shown by app.py when user.yaml is absent OR wizard_complete is False.
Drives 6 mandatory steps + 1 optional integrations step.
All step logic lives in app/wizard/; this file only orchestrates.
"""
from __future__ import annotations
import json
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent))
import streamlit as st
import yaml
CONFIG_DIR = Path(__file__).parent.parent.parent / "config"
USER_YAML = CONFIG_DIR / "user.yaml"
STEPS = 6
STEP_LABELS = [
"Hardware", "Tier", "Identity", "Resume", "Inference", "Search"
]
# ── Helpers ────────────────────────────────────────────────────────────────────
def _load_yaml() -> dict:
if USER_YAML.exists():
return yaml.safe_load(USER_YAML.read_text()) or {}
return {}
def _save_yaml(updates: dict) -> None:
existing = _load_yaml()
existing.update(updates)
CONFIG_DIR.mkdir(parents=True, exist_ok=True)
USER_YAML.write_text(yaml.dump(existing, default_flow_style=False, allow_unicode=True))
def _detect_gpus() -> list[str]:
import subprocess
try:
out = subprocess.check_output(
["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
text=True, timeout=5
)
return [l.strip() for l in out.strip().splitlines() if l.strip()]
except Exception:
return []
def _suggest_profile(gpus: list[str]) -> str:
if len(gpus) >= 2: return "dual-gpu"
if len(gpus) == 1: return "single-gpu"
return "remote"
def _submit_wizard_task(section: str, input_data: dict) -> int:
"""Submit a wizard_generate background task. Returns task_id."""
from scripts.db import DEFAULT_DB
from scripts.task_runner import submit_task
params = json.dumps({"section": section, "input": input_data})
task_id, _ = submit_task(DEFAULT_DB, "wizard_generate", 0, params=params)
return task_id
def _poll_wizard_task(section: str) -> dict | None:
"""Return most recent wizard_generate task for a section, or None."""
from scripts.db import DEFAULT_DB
import sqlite3
params_match = json.dumps({"section": section}).rstrip("}") # prefix match
conn = sqlite3.connect(DEFAULT_DB)
conn.row_factory = sqlite3.Row
row = conn.execute(
"SELECT * FROM background_tasks WHERE task_type='wizard_generate' "
"AND params LIKE ? ORDER BY id DESC LIMIT 1",
(f'%"section": "{section}"%',)
).fetchone()
conn.close()
return dict(row) if row else None
# ── Wizard state init ──────────────────────────────────────────────────────────
if "wizard_step" not in st.session_state:
saved = _load_yaml()
st.session_state.wizard_step = min(saved.get("wizard_step", 0) + 1, STEPS)
st.session_state.wizard_data = {}
step = st.session_state.wizard_step
data = st.session_state.wizard_data
# Load tier for feature gating
_saved_yaml = _load_yaml()
_tier = _saved_yaml.get("dev_tier_override") or _saved_yaml.get("tier", "free")
from app.wizard.tiers import can_use, tier_label
st.title("👋 Welcome to Peregrine")
st.caption("Complete the setup to start your job search. All fields are saved as you go.")
st.progress(min(step / STEPS, 1.0), text=f"Step {min(step, STEPS)} of {STEPS}")
st.divider()
# ── Step 1: Hardware ───────────────────────────────────────────────────────────
if step == 1:
from app.wizard.step_hardware import validate, PROFILES
st.subheader("Step 1 — Hardware Detection")
gpus = _detect_gpus()
suggested = _suggest_profile(gpus)
if gpus:
st.success(f"Found {len(gpus)} GPU(s): {', '.join(gpus)}")
else:
st.info("No NVIDIA GPUs detected. Recommend 'remote' or 'cpu' mode.")
profile = st.selectbox("Inference mode", PROFILES, index=PROFILES.index(suggested),
help="Controls which Docker services start. Change later in Settings.")
if profile in ("single-gpu", "dual-gpu") and not gpus:
st.warning("No GPUs detected — GPU profiles require NVIDIA Container Toolkit.")
if st.button("Next →", type="primary"):
errs = validate({"inference_profile": profile})
if errs:
st.error("\n".join(errs))
else:
_save_yaml({"inference_profile": profile, "wizard_step": 1})
st.session_state.wizard_step = 2
st.session_state.wizard_data["inference_profile"] = profile
st.rerun()
# ── Step 2: Tier ───────────────────────────────────────────────────────────────
elif step == 2:
from app.wizard.step_tier import validate
st.subheader("Step 2 — Choose Your Plan")
st.caption("Free is fully functional for local self-hosted use. Paid/Premium unlock LLM-assisted features.")
tier_opts = {
"free": "**Free** — Local discovery, apply workspace, interviews kanban",
"paid": "**Paid** — + AI career summary, company research, email classifier, calendar sync",
"premium": "**Premium** — + Voice guidelines, model fine-tuning, multi-user",
}
selected_tier = st.radio("Plan", list(tier_opts.keys()),
format_func=lambda x: tier_opts[x],
index=0)
col_back, col_next = st.columns([1, 4])
if col_back.button("← Back"):
st.session_state.wizard_step = 1
st.rerun()
if col_next.button("Next →", type="primary"):
errs = validate({"tier": selected_tier})
if errs:
st.error("\n".join(errs))
else:
_save_yaml({"tier": selected_tier, "wizard_step": 2})
st.session_state.wizard_data["tier"] = selected_tier
st.session_state.wizard_step = 3
st.rerun()
# ── Step 3: Identity ───────────────────────────────────────────────────────────
elif step == 3:
from app.wizard.step_identity import validate
st.subheader("Step 3 — Your Identity")
st.caption("Used in cover letter PDFs, LLM prompts, and the app header.")
saved = _load_yaml()
c1, c2 = st.columns(2)
name = c1.text_input("Full Name *", saved.get("name", ""))
email = c1.text_input("Email *", saved.get("email", ""))
phone = c2.text_input("Phone", saved.get("phone", ""))
linkedin = c2.text_input("LinkedIn URL", saved.get("linkedin", ""))
summary_default = saved.get("career_summary", "")
summary = st.text_area("Career Summary *", summary_default, height=120,
placeholder="Experienced professional with X years in [field].")
# LLM generation button (paid only)
if can_use(_tier, "llm_career_summary"):
gen_col, _ = st.columns([2, 8])
if gen_col.button("✨ Generate from resume"):
resume_text = saved.get("_raw_resume_text", "")
if resume_text:
_submit_wizard_task("career_summary", {"resume_text": resume_text})
st.rerun()
else:
st.info("Complete Step 4 (Resume) first to use AI generation.")
else:
st.caption(f"{tier_label('llm_career_summary')} Generate career summary with AI")
# Poll for completed generation
@st.fragment(run_every=3)
def _poll_career_summary():
task = _poll_wizard_task("career_summary")
if not task:
return
if task["status"] == "completed":
payload = json.loads(task.get("error") or "{}")
result = payload.get("result", "")
if result and result != st.session_state.get("_career_summary_gen"):
st.session_state["_career_summary_gen"] = result
st.info(f"✨ Suggested summary (click to use):\n\n{result}")
_poll_career_summary()
col_back, col_next = st.columns([1, 4])
if col_back.button("← Back"):
st.session_state.wizard_step = 2
st.rerun()
if col_next.button("Next →", type="primary"):
errs = validate({"name": name, "email": email, "career_summary": summary})
if errs:
st.error("\n".join(errs))
else:
_save_yaml({
"name": name, "email": email, "phone": phone,
"linkedin": linkedin, "career_summary": summary,
"wizard_complete": False, "wizard_step": 3,
})
st.session_state.wizard_step = 4
st.rerun()
# ── Step 4: Resume ─────────────────────────────────────────────────────────────
elif step == 4:
from app.wizard.step_resume import validate
st.subheader("Step 4 — Resume")
st.caption("Upload your resume for fast parsing, or build it section by section.")
tab_upload, tab_builder = st.tabs(["📎 Upload Resume", "📝 Build Resume"])
saved = _load_yaml()
with tab_upload:
uploaded = st.file_uploader("Upload PDF or DOCX", type=["pdf", "docx"])
if uploaded:
if st.button("Parse Resume", type="primary"):
from scripts.resume_parser import extract_text_from_pdf, extract_text_from_docx, structure_resume
file_bytes = uploaded.read()
ext = uploaded.name.rsplit(".", 1)[-1].lower()
raw_text = extract_text_from_pdf(file_bytes) if ext == "pdf" else extract_text_from_docx(file_bytes)
with st.spinner("Parsing…"):
parsed = structure_resume(raw_text)
if parsed:
st.session_state["_parsed_resume"] = parsed
st.session_state["_raw_resume_text"] = raw_text
_save_yaml({"_raw_resume_text": raw_text[:8000]}) # for career_summary generation
st.success("Resume parsed! Review below.")
else:
st.warning("Couldn't auto-parse — switch to the Build tab.")
if "parsed" in st.session_state.get("_parsed_resume", {}):
st.json(st.session_state["_parsed_resume"])
with tab_builder:
st.caption("Add your work experience entries manually.")
experience = st.session_state.get("_experience", saved.get("experience", []))
for i, entry in enumerate(experience):
with st.expander(f"{entry.get('title', 'Entry')} at {entry.get('company', '?')}", expanded=False):
entry["company"] = st.text_input("Company", entry.get("company", ""), key=f"co_{i}")
entry["title"] = st.text_input("Title", entry.get("title", ""), key=f"ti_{i}")
raw_bullets = st.text_area("Responsibilities (one per line)",
"\n".join(entry.get("bullets", [])),
key=f"bu_{i}", height=80)
entry["bullets"] = [b.strip() for b in raw_bullets.splitlines() if b.strip()]
if st.button("Remove", key=f"rm_{i}"):
experience.pop(i)
st.session_state["_experience"] = experience
st.rerun()
if st.button(" Add Entry"):
experience.append({"company": "", "title": "", "bullets": []})
st.session_state["_experience"] = experience
st.rerun()
col_back, col_next = st.columns([1, 4])
if col_back.button("← Back"):
st.session_state.wizard_step = 3
st.rerun()
if col_next.button("Next →", type="primary"):
# Resolve experience from upload parse or builder
parsed = st.session_state.get("_parsed_resume", {})
experience = parsed.get("experience") or st.session_state.get("_experience", [])
errs = validate({"experience": experience})
if errs:
st.error("\n".join(errs))
else:
# Write resume yaml
resume_yaml_path = Path(__file__).parent.parent.parent / "aihawk" / "data_folder" / "plain_text_resume.yaml"
resume_yaml_path.parent.mkdir(parents=True, exist_ok=True)
resume_data = {**parsed, "experience": experience} if parsed else {"experience": experience}
resume_yaml_path.write_text(yaml.dump(resume_data, default_flow_style=False, allow_unicode=True))
_save_yaml({"wizard_step": 4})
st.session_state.wizard_step = 5
st.rerun()
# ── Step 5: Inference ──────────────────────────────────────────────────────────
elif step == 5:
from app.wizard.step_inference import validate
st.subheader("Step 5 — Inference & API Keys")
saved = _load_yaml()
profile = saved.get("inference_profile", "remote")
if profile == "remote":
st.info("Remote mode: at least one external API key is required.")
anthropic_key = st.text_input("Anthropic API Key", type="password", placeholder="sk-ant-…")
openai_url = st.text_input("OpenAI-compatible endpoint (optional)", placeholder="https://api.together.xyz/v1")
openai_key = st.text_input("Endpoint API Key (optional)", type="password") if openai_url else ""
else:
st.info(f"Local mode ({profile}): Ollama provides inference.")
anthropic_key = ""
openai_url = ""
openai_key = ""
st.divider()
with st.expander("Advanced — Service Ports & Hosts"):
st.caption("Change only if services run on non-default ports or remote hosts.")
svc = saved.get("services", {})
for svc_name, default_host, default_port in [
("ollama", "localhost", 11434),
("vllm", "localhost", 8000),
("searxng","localhost", 8888),
]:
c1, c2, c3 = st.columns([2, 1, 1])
svc[f"{svc_name}_host"] = c1.text_input(f"{svc_name} host", svc.get(f"{svc_name}_host", default_host), key=f"h_{svc_name}")
svc[f"{svc_name}_port"] = int(c2.number_input("port", value=int(svc.get(f"{svc_name}_port", default_port)), step=1, key=f"p_{svc_name}"))
svc[f"{svc_name}_ssl"] = c3.checkbox("SSL", svc.get(f"{svc_name}_ssl", False), key=f"ssl_{svc_name}")
confirmed = False
if profile == "remote":
if st.button("🔌 Test LLM connection"):
from scripts.llm_router import LLMRouter
try:
r = LLMRouter().complete("Say 'OK' and nothing else.")
if r and len(r.strip()) > 0:
st.success("LLM responding.")
confirmed = True
st.session_state["_inf_confirmed"] = True
except Exception as e:
st.error(f"LLM test failed: {e}")
else:
# Local profile: Ollama availability is tested
if st.button("🔌 Test Ollama connection"):
import requests
ollama_url = f"http://{svc.get('ollama_host','localhost')}:{svc.get('ollama_port',11434)}"
try:
requests.get(f"{ollama_url}/api/tags", timeout=5)
st.success("Ollama is running.")
st.session_state["_inf_confirmed"] = True
except Exception:
st.warning("Ollama not responding — you can skip and configure later in Settings.")
st.session_state["_inf_confirmed"] = True # allow skip
confirmed = st.session_state.get("_inf_confirmed", False)
col_back, col_next = st.columns([1, 4])
if col_back.button("← Back"):
st.session_state.wizard_step = 4
st.rerun()
if col_next.button("Next →", type="primary", disabled=not confirmed):
errs = validate({"endpoint_confirmed": confirmed})
if errs:
st.error("\n".join(errs))
else:
# Write API keys to .env
env_path = CONFIG_DIR.parent / ".env"
env_lines = env_path.read_text().splitlines() if env_path.exists() else []
def _set_env(lines, key, val):
for i, l in enumerate(lines):
if l.startswith(f"{key}="):
lines[i] = f"{key}={val}"; return lines
lines.append(f"{key}={val}"); return lines
if anthropic_key: env_lines = _set_env(env_lines, "ANTHROPIC_API_KEY", anthropic_key)
if openai_url: env_lines = _set_env(env_lines, "OPENAI_COMPAT_URL", openai_url)
if openai_key: env_lines = _set_env(env_lines, "OPENAI_COMPAT_KEY", openai_key)
if anthropic_key or openai_url:
env_path.write_text("\n".join(env_lines) + "\n")
_save_yaml({"services": svc, "wizard_step": 5})
st.session_state.wizard_step = 6
st.rerun()
# ── Step 6: Search ─────────────────────────────────────────────────────────────
elif step == 6:
from app.wizard.step_search import validate
st.subheader("Step 6 — Job Search Preferences")
saved = _load_yaml()
_tier_now = saved.get("dev_tier_override") or saved.get("tier", "free")
titles = st.session_state.get("_titles", [])
locations = st.session_state.get("_locations", [])
c1, c2 = st.columns(2)
with c1:
st.markdown("**Job Titles**")
for i, t in enumerate(titles):
col_t, col_rm = st.columns([4, 1])
col_t.text(t)
if col_rm.button("×", key=f"rmtitle_{i}"):
titles.pop(i); st.session_state["_titles"] = titles; st.rerun()
new_title = st.text_input("Add title", key="new_title_wiz", placeholder="Software Engineer…")
tc1, tc2 = st.columns([3, 1])
if tc2.button("", key="add_title"):
if new_title.strip() and new_title.strip() not in titles:
titles.append(new_title.strip()); st.session_state["_titles"] = titles; st.rerun()
if can_use(_tier_now, "llm_job_titles"):
if tc1.button("✨ Suggest titles"):
resume_text = saved.get("_raw_resume_text", "")
_submit_wizard_task("job_titles", {"resume_text": resume_text, "current_titles": titles})
st.rerun()
else:
st.caption(f"{tier_label('llm_job_titles')} AI title suggestions")
with c2:
st.markdown("**Locations**")
for i, l in enumerate(locations):
lc1, lc2 = st.columns([4, 1])
lc1.text(l)
if lc2.button("×", key=f"rmloc_{i}"):
locations.pop(i); st.session_state["_locations"] = locations; st.rerun()
new_loc = st.text_input("Add location", key="new_loc_wiz", placeholder="Remote, New York NY…")
ll1, ll2 = st.columns([3, 1])
if ll2.button("", key="add_loc"):
if new_loc.strip():
locations.append(new_loc.strip()); st.session_state["_locations"] = locations; st.rerun()
# Poll job titles suggestion
@st.fragment(run_every=3)
def _poll_titles():
task = _poll_wizard_task("job_titles")
if task and task["status"] == "completed":
payload = json.loads(task.get("error") or "{}")
result = payload.get("result", "")
st.info(f"✨ Suggested titles:\n\n{result}")
_poll_titles()
col_back, col_next = st.columns([1, 4])
if col_back.button("← Back"):
st.session_state.wizard_step = 5
st.rerun()
if col_next.button("Next →", type="primary"):
errs = validate({"job_titles": titles, "locations": locations})
if errs:
st.error("\n".join(errs))
else:
# Write search profile
import datetime
search_profile = {
"profiles": [{
"name": "default",
"job_titles": titles,
"locations": locations,
"remote_only": False,
"boards": ["linkedin", "indeed", "glassdoor", "zip_recruiter"],
}]
}
(CONFIG_DIR / "search_profiles.yaml").write_text(
yaml.dump(search_profile, default_flow_style=False, allow_unicode=True)
)
_save_yaml({"wizard_step": 6})
st.session_state.wizard_step = 7 # integrations (optional)
st.rerun()
# ── Step 7: Integrations (optional) ───────────────────────────────────────────
elif step == 7:
st.subheader("Step 7 — Integrations (Optional)")
st.caption("Connect cloud services, calendars, and notification tools. Skip to finish setup.")
saved = _load_yaml()
_tier_now = saved.get("dev_tier_override") or saved.get("tier", "free")
from scripts.integrations import REGISTRY
from app.wizard.tiers import can_use
for name, cls in sorted(REGISTRY.items(), key=lambda x: (x[1].tier != "free", x[0])):
is_accessible = can_use(_tier_now, f"{name}_sync") or can_use(_tier_now, f"{name}_notifications") or cls.tier == "free"
is_conn = (CONFIG_DIR / "integrations" / f"{name}.yaml").exists()
with st.expander(f"{'✅' if is_conn else '○'} {cls.label} {'🔒 Paid' if cls.tier == 'paid' else '⭐ Premium' if cls.tier == 'premium' else ''}"):
if not is_accessible:
st.caption(f"Upgrade to {cls.tier} to unlock {cls.label}.")
else:
inst = cls()
config = {}
for field in inst.fields():
val = st.text_input(field["label"],
type="password" if field["type"] == "password" else "default",
placeholder=field.get("placeholder", ""),
help=field.get("help", ""),
key=f"int_{name}_{field['key']}")
config[field["key"]] = val
if st.button(f"Connect {cls.label}", key=f"conn_{name}",
disabled=not all(config.get(f["key"]) for f in inst.fields() if f.get("required"))):
inst.connect(config)
with st.spinner("Testing connection…"):
if inst.test():
inst.save_config(config, CONFIG_DIR)
st.success(f"{cls.label} connected!")
st.rerun()
else:
st.error(f"Connection test failed. Check your credentials for {cls.label}.")
st.divider()
col_skip, col_finish = st.columns([1, 3])
if col_skip.button("← Back"):
st.session_state.wizard_step = 6
st.rerun()
if col_finish.button("🎉 Finish Setup", type="primary"):
# Apply service URLs to llm.yaml and set wizard_complete
from scripts.user_profile import UserProfile
from scripts.generate_llm_config import apply_service_urls
profile_obj = UserProfile(USER_YAML)
from scripts.db import DEFAULT_DB
apply_service_urls(profile_obj, CONFIG_DIR / "llm.yaml")
_save_yaml({"wizard_complete": True})
# Remove wizard_step so it doesn't interfere on next load
data_clean = yaml.safe_load(USER_YAML.read_text()) or {}
data_clean.pop("wizard_step", None)
USER_YAML.write_text(yaml.dump(data_clean, default_flow_style=False, allow_unicode=True))
st.session_state.clear()
st.success("Setup complete! Loading Peregrine…")
st.rerun()
```
**Step 4: Run flow tests**
```bash
conda run -n job-seeker python -m pytest tests/test_wizard_flow.py -v
```
Expected: all 3 tests pass.
**Step 5: Commit**
```bash
git add app/pages/0_Setup.py tests/test_wizard_flow.py
git commit -m "feat: wizard orchestrator — 6 mandatory steps + optional integrations + LLM generation polling"
```
---
## Task 10: Update `app/app.py` — `wizard_complete` gate
**Files:**
- Modify: `app/app.py`
- Modify: `tests/test_app_gating.py`
**Step 1: Add test cases**
```python
# tests/test_app_gating.py — add to existing file
def test_wizard_incomplete_triggers_wizard(tmp_path):
"""wizard_complete: false should be treated as 'wizard not done'."""
p = tmp_path / "user.yaml"
p.write_text("name: T\nemail: t@t.com\ncareer_summary: x\nwizard_complete: false\n")
from scripts.user_profile import UserProfile
u = UserProfile(p)
assert u.wizard_complete is False
def test_wizard_complete_does_not_trigger(tmp_path):
p = tmp_path / "user.yaml"
p.write_text("name: T\nemail: t@t.com\ncareer_summary: x\nwizard_complete: true\n")
from scripts.user_profile import UserProfile
u = UserProfile(p)
assert u.wizard_complete is True
```
**Step 2: Run — should pass already (UserProfile already has wizard_complete)**
```bash
conda run -n job-seeker python -m pytest tests/test_app_gating.py -v
```
**Step 3: Update the gate in `app/app.py`**
Replace:
```python
if not _UserProfile.exists(_USER_YAML):
_setup_page = st.Page("pages/0_Setup.py", title="Setup", icon="👋")
st.navigation({"": [_setup_page]}).run()
st.stop()
```
With:
```python
_show_wizard = (
not _UserProfile.exists(_USER_YAML)
or not _UserProfile(_USER_YAML).wizard_complete
)
if _show_wizard:
_setup_page = st.Page("pages/0_Setup.py", title="Setup", icon="👋")
st.navigation({"": [_setup_page]}).run()
st.stop()
```
**Step 4: Also add `wizard_generate` to the sidebar task label map in `app/app.py`**
In the `_task_indicator` fragment, add:
```python
elif task_type == "wizard_generate":
label = "Wizard generation"
```
**Step 5: Run full test suite**
```bash
conda run -n job-seeker python -m pytest tests/ -v
```
Expected: all tests pass.
**Step 6: Commit**
```bash
git add app/app.py tests/test_app_gating.py
git commit -m "feat: app.py checks wizard_complete flag to gate main app"
```
---
## Task 11: Home page — dismissible setup banners
After wizard completion, the Home page shows contextual setup prompts for remaining optional tasks. Each is dismissible; dismissed state persists in `user.yaml`.
**Files:**
- Modify: `app/Home.py`
- Modify: `scripts/user_profile.py` (save_dismissed_banner helper)
- Create: `tests/test_home_banners.py`
**Step 1: Write failing tests**
```python
# tests/test_home_banners.py
import sys
from pathlib import Path
import yaml
sys.path.insert(0, str(Path(__file__).parent.parent))
_USER_YAML = Path(__file__).parent.parent / "config" / "user.yaml"
def test_banner_config_is_complete():
"""All banner keys are strings and all have link destinations."""
from app.Home import _SETUP_BANNERS
for b in _SETUP_BANNERS:
assert "key" in b
assert "text" in b
assert "link_label" in b
def test_banner_dismissed_persists(tmp_path):
"""Dismissing a banner writes to dismissed_banners in user.yaml."""
p = tmp_path / "user.yaml"
p.write_text("name: T\nemail: t@t.com\ncareer_summary: x\nwizard_complete: true\n")
data = yaml.safe_load(p.read_text()) or {}
data.setdefault("dismissed_banners", [])
data["dismissed_banners"].append("connect_cloud")
p.write_text(yaml.dump(data))
reloaded = yaml.safe_load(p.read_text())
assert "connect_cloud" in reloaded["dismissed_banners"]
```
**Step 2: Run — expect FAIL on _SETUP_BANNERS import**
```bash
conda run -n job-seeker python -m pytest tests/test_home_banners.py -v
```
**Step 3: Add banners to `app/Home.py`**
Near the top (after imports), add the banner config list:
```python
_SETUP_BANNERS = [
{"key": "connect_cloud", "text": "Connect a cloud service for resume/cover letter storage",
"link_label": "Settings → Integrations"},
{"key": "setup_email", "text": "Set up email sync to catch recruiter outreach",
"link_label": "Settings → Email"},
{"key": "setup_email_labels", "text": "Set up email label filters for auto-classification",
"link_label": "Settings → Email (label guide)"},
{"key": "tune_mission", "text": "Tune your mission preferences for better cover letters",
"link_label": "Settings → My Profile"},
{"key": "configure_keywords", "text": "Configure keywords and blocklist for smarter search",
"link_label": "Settings → Search"},
{"key": "upload_corpus", "text": "Upload your cover letter corpus for voice fine-tuning",
"link_label": "Settings → Fine-Tune"},
{"key": "configure_linkedin", "text": "Configure LinkedIn Easy Apply automation",
"link_label": "Settings → AIHawk"},
{"key": "setup_searxng", "text": "Set up company research with SearXNG",
"link_label": "Settings → Services"},
{"key": "target_companies", "text": "Build a target company list for focused outreach",
"link_label": "Settings → Search"},
{"key": "setup_notifications", "text": "Set up notifications for stage changes",
"link_label": "Settings → Integrations"},
{"key": "tune_model", "text": "Tune a custom cover letter model on your writing",
"link_label": "Settings → Fine-Tune"},
{"key": "review_training", "text": "Review and curate training data for model tuning",
"link_label": "Settings → Fine-Tune"},
{"key": "setup_calendar", "text": "Set up calendar sync to track interview dates",
"link_label": "Settings → Integrations"},
]
```
After existing dashboard content, add the banner render block:
```python
# ── Setup banners ─────────────────────────────────────────────────────────────
if _profile and _profile.wizard_complete:
_dismissed = set(_profile.dismissed_banners)
_pending_banners = [b for b in _SETUP_BANNERS if b["key"] not in _dismissed]
if _pending_banners:
st.divider()
st.markdown("#### Finish setting up Peregrine")
for banner in _pending_banners:
_bcol, _bdismiss = st.columns([10, 1])
with _bcol:
st.info(f"💡 {banner['text']} → _{banner['link_label']}_")
with _bdismiss:
st.write("")
if st.button("✕", key=f"dismiss_banner_{banner['key']}", help="Dismiss"):
# Write dismissed_banners back to user.yaml
_data = yaml.safe_load(USER_YAML.read_text()) if USER_YAML.exists() else {} # type: ignore[name-defined]
_data.setdefault("dismissed_banners", [])
if banner["key"] not in _data["dismissed_banners"]:
_data["dismissed_banners"].append(banner["key"])
USER_YAML.write_text(yaml.dump(_data, default_flow_style=False, allow_unicode=True)) # type: ignore[name-defined]
st.rerun()
```
Add `import yaml` to `app/Home.py` imports.
Add `_USER_YAML = Path(__file__).parent.parent / "config" / "user.yaml"` near the top if not already present.
**Step 4: Run tests**
```bash
conda run -n job-seeker python -m pytest tests/test_home_banners.py tests/ -v
```
Expected: all pass.
**Step 5: Commit**
```bash
git add app/Home.py tests/test_home_banners.py
git commit -m "feat: dismissible setup banners on Home page (13 contextual prompts)"
```
---
## Task 12: Developer tab in Settings
The Developer tab enables tier override for testing and a wizard reset button. Visible when `dev_tier_override` is set in `user.yaml` OR `DEV_MODE=true` in `.env`.
**Files:**
- Modify: `app/pages/2_Settings.py`
- Create: `tests/test_dev_tab.py`
**Step 1: Write failing tests**
```python
# tests/test_dev_tab.py
import sys
from pathlib import Path
import yaml
sys.path.insert(0, str(Path(__file__).parent.parent))
def test_dev_tab_visible_when_override_set(tmp_path):
p = tmp_path / "user.yaml"
p.write_text("name: T\nemail: t@t.com\ncareer_summary: x\ndev_tier_override: premium\n")
from scripts.user_profile import UserProfile
u = UserProfile(p)
assert u.dev_tier_override == "premium"
assert u.effective_tier == "premium"
def test_dev_tab_not_visible_without_override(tmp_path):
p = tmp_path / "user.yaml"
p.write_text("name: T\nemail: t@t.com\ncareer_summary: x\ntier: free\n")
from scripts.user_profile import UserProfile
u = UserProfile(p)
assert u.dev_tier_override is None
assert u.effective_tier == "free"
def test_can_use_uses_effective_tier(tmp_path):
p = tmp_path / "user.yaml"
p.write_text("name: T\nemail: t@t.com\ncareer_summary: x\ntier: free\ndev_tier_override: premium\n")
from scripts.user_profile import UserProfile
from app.wizard.tiers import can_use
u = UserProfile(p)
assert can_use(u.effective_tier, "model_fine_tuning") is True
assert can_use(u.tier, "model_fine_tuning") is False
```
**Step 2: Run — some should pass already**
```bash
conda run -n job-seeker python -m pytest tests/test_dev_tab.py -v
```
**Step 3: Add Developer tab to `app/pages/2_Settings.py`**
The Settings page uses tabs. Find where tabs are defined and add "Developer" tab. The tab should only render if `DEV_MODE=true` in env OR `dev_tier_override` is set:
```python
import os as _os
_dev_mode = _os.getenv("DEV_MODE", "").lower() in ("true", "1", "yes")
_show_dev_tab = _dev_mode or bool(_u.get("dev_tier_override"))
```
In the tab list, conditionally append:
```python
tab_names = ["LLM", "Search", "Email", "My Profile", "Services", "Integrations", "AIHawk", "Fine-Tune"]
if _show_dev_tab:
tab_names.append("Developer")
tabs = st.tabs(tab_names)
```
Developer tab content (in the last tab when `_show_dev_tab`):
```python
with tabs[-1]: # Developer tab
st.subheader("Developer Settings")
st.caption("These settings are for local testing only and are never used in production.")
st.markdown("**Tier Override**")
st.caption("Instantly switches effective tier without changing your billing tier.")
from app.wizard.tiers import TIERS
current_override = _u.get("dev_tier_override") or ""
override_opts = ["(none — use real tier)"] + TIERS
override_idx = (TIERS.index(current_override) + 1) if current_override in TIERS else 0
new_override = st.selectbox("dev_tier_override", override_opts, index=override_idx)
new_override_val = None if new_override.startswith("(none") else new_override
if st.button("Apply tier override", key="apply_tier_override"):
_u["dev_tier_override"] = new_override_val
_save_user(_u) # uses existing save helper in Settings page
st.success(f"Tier override set to: {new_override_val or 'none'}. Page will reload.")
st.rerun()
st.divider()
st.markdown("**Wizard Reset**")
st.caption("Sets `wizard_complete: false` to re-enter the wizard without deleting your config.")
if st.button("↩ Reset wizard", key="reset_wizard"):
_u["wizard_complete"] = False
_u["wizard_step"] = 0
_save_user(_u)
st.success("Wizard reset. Reload the app to re-run setup.")
```
**Step 4: Run all tests**
```bash
conda run -n job-seeker python -m pytest tests/ -v
```
Expected: all tests pass.
**Step 5: Commit**
```bash
git add app/pages/2_Settings.py tests/test_dev_tab.py
git commit -m "feat: Developer tab in Settings — tier override + wizard reset button"
```
---
## Task 13: Final integration test pass
**Step 1: Run full test suite**
```bash
conda run -n job-seeker python -m pytest tests/ -v --tb=short
```
Fix any failures before proceeding.
**Step 2: Manual smoke test — trigger the wizard**
In Settings → Developer tab: click "Reset wizard". Reload app.
Verify:
- Wizard shows with progress bar "Step 1 of 6"
- Step 1 auto-detects GPU (or shows "None detected")
- Each "Next →" advances the step
- "← Back" returns to previous step
- Step 3 identity validates name/email/summary before advancing
- Step 4 resume upload parses PDF
- Step 5 inference test button works
- Step 6 search requires at least one title + location
- Step 7 integrations can be skipped
- "Finish Setup" sets `wizard_complete: true`, redirects to main app
- Home page shows setup banners
**Step 3: Verify tier gating**
In Developer tab: set override to "free". Confirm ✨ buttons are hidden/disabled for paid features.
Set override to "paid". Confirm ✨ buttons appear for career_summary, job_titles, etc.
Set override to "premium". Confirm voice_guidelines becomes available.
**Step 4: Final commit**
```bash
git add -A
git commit -m "feat: expanded first-run wizard — complete implementation"
```
---
## Appendix: New Dependencies
Add to `requirements.txt` / `environment.yml` if not already present:
```
pdfplumber # PDF text extraction (alternative to pdfminer.six — simpler API)
python-docx # DOCX text extraction
caldav # Apple Calendar CalDAV support (Task 6)
```
Check with:
```bash
conda run -n job-seeker pip show pdfplumber python-docx caldav
```
Install if missing:
```bash
conda run -n job-seeker pip install pdfplumber python-docx caldav
```
---
## Appendix: File Tree Summary
```
app/
app.py ← modified: wizard_complete gate
Home.py ← modified: setup banners
pages/
0_Setup.py ← rewritten: thin orchestrator, 7 step renders
2_Settings.py ← modified: Developer tab
wizard/
__init__.py ← new (empty)
tiers.py ← new: FEATURES, can_use(), tier_label()
step_hardware.py ← new: validate()
step_tier.py ← new: validate()
step_identity.py ← new: validate()
step_resume.py ← new: validate()
step_inference.py ← new: validate()
step_search.py ← new: validate()
step_integrations.py ← new: validate(), get_available()
scripts/
user_profile.py ← modified: tier, dev_tier_override, wizard_complete, wizard_step, dismissed_banners, effective_tier
db.py ← modified: params column + insert_task update
task_runner.py ← modified: params arg + wizard_generate handler
resume_parser.py ← new: extract_text_from_pdf/docx, structure_resume
integrations/
__init__.py ← new: REGISTRY auto-discovery
base.py ← new: IntegrationBase ABC
notion.py ← new (13 total integrations)
... (12 more)
config/
user.yaml.example ← modified: tier/wizard_complete/dismissed_banners fields
integrations/
*.yaml.example ← new (13 files)
tests/
test_wizard_tiers.py ← new
test_wizard_steps.py ← new
test_wizard_flow.py ← new
test_resume_parser.py ← new
test_integrations.py ← new
test_home_banners.py ← new
test_dev_tab.py ← new
test_user_profile.py ← modified (additions)
test_db.py ← modified (additions)
test_task_runner.py ← modified (additions)
test_app_gating.py ← modified (additions)
```