docs: expanded first-run wizard design

Architecture: wizard module system, mandatory 6-step flow, optional
home banners, tier gating (free/paid/premium + dev_tier_override),
resume upload/parse/builder, LLM generation via background tasks,
integrations registry pattern with 14 v1 services.
This commit is contained in:
pyr0ball 2026-02-24 21:30:05 -08:00
parent 3d3f81c252
commit ec2f35380a

View file

@ -0,0 +1,291 @@
# Expanded First-Run Wizard — Design
**Date:** 2026-02-24
**Status:** Approved
---
## Goal
Replace the current 5-step surface-level wizard with a comprehensive onboarding flow that covers resume upload/parsing/building, guided config walkthroughs, LLM-assisted generation for key sections, and tier-based feature gating — while enforcing a minimum viable setup before the user can access the main app.
---
## Architecture
`0_Setup.py` becomes a thin orchestrator. All step logic moves into a new `app/wizard/` package. Resume parsing moves into `scripts/resume_parser.py`.
```
app/
app.py # gate: user.yaml exists AND wizard_complete: true
wizard/
tiers.py # tier definitions, feature gates, can_use() helper
step_hardware.py # Step 1: GPU detection → profile recommendation
step_tier.py # Step 2: free/paid/premium + dev_tier_override
step_identity.py # Step 3: name/email/phone/linkedin/career_summary
step_resume.py # Step 4: upload→parse OR guided form builder
step_inference.py # Step 5: LLM backend config + API keys
step_search.py # Step 6: job titles, locations, boards, keywords
step_integrations.py # Step 7: optional cloud/calendar/notification services
pages/
0_Setup.py # imports steps, drives progress state
scripts/
resume_parser.py # PDF/DOCX text extraction → LLM structuring
integrations/
__init__.py # registry: {name: IntegrationBase subclass}
base.py # IntegrationBase: connect(), test(), sync(), fields()
notion.py
google_drive.py
google_sheets.py
airtable.py
dropbox.py
onedrive.py
mega.py
nextcloud.py
google_calendar.py
apple_calendar.py # CalDAV
slack.py
discord.py # webhook only
home_assistant.py
config/
integrations/ # one gitignored yaml per connected service
notion.yaml.example
google_drive.yaml.example
...
```
---
## Gate Logic
`app.py` gate changes from a single existence check to:
```python
if not UserProfile.exists(_USER_YAML):
show_wizard()
elif not _profile.wizard_complete:
show_wizard() # resumes at last incomplete mandatory step
```
`wizard_complete: false` is written to `user.yaml` at the start of Step 3 (identity). It is only flipped to `true` when all mandatory steps pass validation on the final Finish action.
---
## Mandatory Steps
The wizard cannot be exited until all six mandatory steps pass validation.
| Step | File | Minimum to pass |
|------|------|----------------|
| 1. Hardware | `step_hardware.py` | Profile selected (auto-detected default accepted) |
| 2. Tier | `step_tier.py` | Tier selected (free is valid) |
| 3. Identity | `step_identity.py` | name + email + career_summary non-empty |
| 4. Resume | `step_resume.py` | At least one work experience entry |
| 5. Inference | `step_inference.py` | At least one working LLM endpoint confirmed |
| 6. Search | `step_search.py` | At least one job title + one location |
Each mandatory step's module exports `validate(data: dict) -> list[str]` — an errors list; empty = pass. These are pure functions, fully testable without Streamlit.
---
## Tier System
### `app/wizard/tiers.py`
```python
TIERS = ["free", "paid", "premium"]
FEATURES = {
# Wizard LLM generation
"llm_career_summary": "paid",
"llm_expand_bullets": "paid",
"llm_suggest_skills": "paid",
"llm_voice_guidelines": "premium",
"llm_job_titles": "paid",
"llm_keywords_blocklist": "paid",
"llm_mission_notes": "paid",
# App features
"company_research": "paid",
"interview_prep": "paid",
"email_classifier": "paid",
"survey_assistant": "paid",
"model_fine_tuning": "premium",
"shared_cover_writer_model": "paid",
"multi_user": "premium",
"search_profiles_limit": {free: 1, paid: 5, premium: None},
# Integrations
"notion_sync": "paid",
"google_sheets_sync": "paid",
"airtable_sync": "paid",
"google_calendar_sync": "paid",
"apple_calendar_sync": "paid",
"slack_notifications": "paid",
}
# Free-tier integrations: google_drive, dropbox, onedrive, mega,
# nextcloud, discord, home_assistant
```
### Storage in `user.yaml`
```yaml
tier: free # free | paid | premium
dev_tier_override: premium # overrides tier locally — for testing only
```
### Dev override UI
Settings → Developer tab (visible when `dev_tier_override` is set or `DEV_MODE=true` in `.env`). Single selectbox to switch tier instantly — page reruns, all gates re-evaluate, no restart needed. Also exposes a "Reset wizard" button that sets `wizard_complete: false` to re-enter the wizard without deleting existing config.
### Gated UI behaviour
Paid/premium features show a muted `tier_label()` badge (`🔒 Paid` / `⭐ Premium`) and a disabled state rather than being hidden entirely — free users see what they're missing. Clicking a locked `✨` button opens an upsell tooltip, not an error.
---
## Resume Handling (Step 4)
### Fast path — upload
1. PDF → `pdfminer.six` extracts raw text
2. DOCX → `python-docx` extracts paragraphs
3. Raw text → LLM structures into `plain_text_resume.yaml` fields via background task
4. Populated form rendered for review/correction
### Fallback — guided form builder
Walks through `plain_text_resume.yaml` section by section:
- Personal info (pre-filled from Step 3)
- Work experience (add/remove entries)
- Education
- Skills
- Achievements (optional)
Both paths converge on the same review form before saving. `career_summary` from the resume is fed back to populate Step 3 if not already set.
### Outputs
- `aihawk/data_folder/plain_text_resume.yaml`
- `career_summary` written back to `user.yaml`
---
## LLM Generation Map
All `✨` actions submit a background task via `task_runner.py` using task type `wizard_generate` with a `section` parameter. The wizard step polls via `@st.fragment(run_every=3)` and shows inline status stages. Results land in `session_state` keyed by section and auto-populate the field on completion.
**Status stages for all wizard generation tasks:**
`Queued → Analyzing → Generating → Done`
| Step | Action | Tier | Input | Output |
|------|--------|------|-------|--------|
| Identity | ✨ Generate career summary | Paid | Resume text | `career_summary` in user.yaml |
| Resume | ✨ Expand bullet points | Paid | Rough responsibility notes | Polished STAR-format bullets |
| Resume | ✨ Suggest skills | Paid | Experience descriptions | Skills list additions |
| Resume | ✨ Infer voice guidelines | Premium | Resume + uploaded cover letters | Voice/tone hints in user.yaml |
| Search | ✨ Suggest job titles | Paid | Resume + current titles | Additional title suggestions |
| Search | ✨ Suggest keywords | Paid | Resume + titles | `resume_keywords.yaml` additions |
| Search | ✨ Suggest blocklist | Paid | Resume + titles | `blocklist.yaml` additions |
| My Profile (post-wizard) | ✨ Suggest mission notes | Paid | Resume + LinkedIn URL | `mission_preferences` notes |
---
## Optional Steps — Home Banners
After wizard completion, dismissible banners on the Home page surface remaining setup. Dismissed state stored as `dismissed_banners: [...]` in `user.yaml`.
| Banner | Links to |
|--------|---------|
| Connect a cloud service | Settings → Integrations |
| Set up email sync | Settings → Email |
| Set up email labels | Settings → Email (label guide) |
| Tune your mission preferences | Settings → My Profile |
| Configure keywords & blocklist | Settings → Search |
| Upload cover letter corpus | Settings → Fine-Tune |
| Configure LinkedIn Easy Apply | Settings → AIHawk |
| Set up company research | Settings → Services (SearXNG) |
| Build a target company list | Settings → Search |
| Set up notifications | Settings → Integrations |
| Tune a model | Settings → Fine-Tune |
| Review training data | Settings → Fine-Tune |
| Set up calendar sync | Settings → Integrations |
---
## Integrations Architecture
The registry pattern means adding a new integration requires one file in `scripts/integrations/` and one `.yaml.example` in `config/integrations/` — the wizard and Settings tab auto-discover it.
```python
class IntegrationBase:
name: str
label: str
tier: str
def connect(self, config: dict) -> bool: ...
def test(self) -> bool: ...
def sync(self, jobs: list[dict]) -> int: ...
def fields(self) -> list[dict]: ... # form field definitions for wizard card
```
Integration configs written to `config/integrations/<name>.yaml` only after a successful `test()` — never on partial input.
### v1 Integration List
| Integration | Purpose | Tier |
|-------------|---------|------|
| Notion | Job tracking DB sync | Paid |
| Notion Calendar | Covered by Notion integration | Paid |
| Google Sheets | Simpler tracker alternative | Paid |
| Airtable | Alternative tracker | Paid |
| Google Drive | Resume/cover letter storage | Free |
| Dropbox | Document storage | Free |
| OneDrive | Document storage | Free |
| MEGA | Document storage (privacy-first, cross-platform) | Free |
| Nextcloud | Self-hosted document storage | Free |
| Google Calendar | Write interview dates | Paid |
| Apple Calendar | Write interview dates (CalDAV) | Paid |
| Slack | Stage change notifications | Paid |
| Discord | Stage change notifications (webhook) | Free |
| Home Assistant | Notifications + automations (self-hosted) | Free |
---
## Data Flow
```
Wizard step → Written to
──────────────────────────────────────────────────────────────
Hardware → user.yaml (inference_profile)
Tier → user.yaml (tier, dev_tier_override)
Identity → user.yaml (name, email, phone, linkedin,
career_summary, wizard_complete: false)
Resume (upload) → aihawk/data_folder/plain_text_resume.yaml
Resume (builder) → aihawk/data_folder/plain_text_resume.yaml
Inference → user.yaml (services block)
.env (ANTHROPIC_API_KEY, OPENAI_COMPAT_URL/KEY)
Search → config/search_profiles.yaml
config/resume_keywords.yaml
config/blocklist.yaml
Finish → user.yaml (wizard_complete: true)
config/llm.yaml (via apply_service_urls())
Integrations → config/integrations/<name>.yaml (per service,
only after successful test())
Background tasks → staging.db background_tasks table
LLM results → session_state[section] → field → user saves step
```
**Key rules:**
- Each mandatory step writes immediately on "Next" — partial progress survives crash or browser close
- `apply_service_urls()` called once at Finish, not per-step
- Integration configs never written on partial input — only after `test()` passes
---
## Testing
- **Tier switching:** Settings → Developer tab selectbox — instant rerun, no restart
- **Wizard re-entry:** Settings → Developer "Reset wizard" button sets `wizard_complete: false`
- **Unit tests:** `validate(data) -> list[str]` on each step module — pure functions, no Streamlit
- **Integration tests:** `tests/test_wizard_flow.py` — full step sequence with mock LLM router and mock file writes
- **`DEV_MODE=true`** in `.env` makes Developer tab always visible regardless of `dev_tier_override`