Architecture: wizard module system, mandatory 6-step flow, optional home banners, tier gating (free/paid/premium + dev_tier_override), resume upload/parse/builder, LLM generation via background tasks, integrations registry pattern with 14 v1 services.
Expanded First-Run Wizard — Design
Date: 2026-02-24 Status: Approved
Goal
Replace the current 5-step surface-level wizard with a comprehensive onboarding flow that covers resume upload/parsing/building, guided config walkthroughs, LLM-assisted generation for key sections, and tier-based feature gating — while enforcing a minimum viable setup before the user can access the main app.
Architecture
0_Setup.py becomes a thin orchestrator. All step logic moves into a new app/wizard/ package. Resume parsing moves into scripts/resume_parser.py.
app/
    app.py  # gate: user.yaml exists AND wizard_complete: true
    wizard/
        tiers.py  # tier definitions, feature gates, can_use() helper
        step_hardware.py  # Step 1: GPU detection → profile recommendation
        step_tier.py  # Step 2: free/paid/premium + dev_tier_override
        step_identity.py  # Step 3: name/email/phone/linkedin/career_summary
        step_resume.py  # Step 4: upload→parse OR guided form builder
        step_inference.py  # Step 5: LLM backend config + API keys
        step_search.py  # Step 6: job titles, locations, boards, keywords
        step_integrations.py  # Step 7: optional cloud/calendar/notification services
    pages/
        0_Setup.py  # imports steps, drives progress state
scripts/
    resume_parser.py  # PDF/DOCX text extraction → LLM structuring
    integrations/
        __init__.py  # registry: {name: IntegrationBase subclass}
        base.py  # IntegrationBase: connect(), test(), sync(), fields()
        notion.py
        google_drive.py
        google_sheets.py
        airtable.py
        dropbox.py
        onedrive.py
        mega.py
        nextcloud.py
        google_calendar.py
        apple_calendar.py  # CalDAV
        slack.py
        discord.py  # webhook only
        home_assistant.py
config/
    integrations/  # one gitignored yaml per connected service
        notion.yaml.example
        google_drive.yaml.example
        ...
Gate Logic
app.py gate changes from a single existence check to:
if not UserProfile.exists(_USER_YAML):
    show_wizard()
elif not _profile.wizard_complete:
    show_wizard()  # resumes at last incomplete mandatory step
wizard_complete: false is written to user.yaml at the start of Step 3 (identity). It is only flipped to true when all mandatory steps pass validation on the final Finish action.
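The gate decision can also be written as a small pure function, which keeps it testable outside Streamlit. This is a minimal sketch, assuming the parsed user.yaml is available as a plain dict (or None when the file is missing); `needs_wizard` is a hypothetical name, not part of the design.

```python
from typing import Optional


def needs_wizard(profile: Optional[dict]) -> bool:
    """Return True when the wizard must be shown instead of the main app.

    `profile` is the parsed user.yaml as a dict, or None if the file
    does not exist. Hypothetical helper, for illustration only.
    """
    if profile is None:
        return True  # no user.yaml yet: first run
    # A missing key is treated the same as wizard_complete: false (assumption).
    return profile.get("wizard_complete") is not True
```

Treating a missing `wizard_complete` key as incomplete means a user.yaml created before this change would also be routed into the wizard.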
Mandatory Steps
The wizard cannot be exited until all six mandatory steps pass validation.
| Step | File | Minimum to pass |
|---|---|---|
| 1. Hardware | step_hardware.py | Profile selected (auto-detected default accepted) |
| 2. Tier | step_tier.py | Tier selected (free is valid) |
| 3. Identity | step_identity.py | name + email + career_summary non-empty |
| 4. Resume | step_resume.py | At least one work experience entry |
| 5. Inference | step_inference.py | At least one working LLM endpoint confirmed |
| 6. Search | step_search.py | At least one job title + one location |
Each mandatory step's module exports validate(data: dict) -> list[str] — an errors list; empty = pass. These are pure functions, fully testable without Streamlit.
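As an illustration of the validate() contract, the sketch below shows two validators and a helper that finds the step the wizard should resume at. The field names `titles` and `locations`, the per-step layout of `data`, and `first_incomplete_step` are assumptions made for this example.

```python
from typing import Callable, Optional


def validate_identity(data: dict) -> list[str]:
    """Step 3: name, email and career_summary must be non-empty."""
    errors = []
    for field in ("name", "email", "career_summary"):
        if not str(data.get(field) or "").strip():
            errors.append(f"{field} is required")
    return errors


def validate_search(data: dict) -> list[str]:
    """Step 6: at least one job title and one location."""
    errors = []
    if not data.get("titles"):
        errors.append("at least one job title is required")
    if not data.get("locations"):
        errors.append("at least one location is required")
    return errors


Validator = Callable[[dict], list[str]]


def first_incomplete_step(steps: list[tuple[str, Validator]],
                          data: dict) -> Optional[str]:
    """Return the name of the first step whose validator reports errors.

    `data` is assumed to hold one sub-dict per step, keyed by step name.
    """
    for name, validate in steps:
        if validate(data.get(name, {})):
            return name
    return None
```

Because the validators take and return plain Python values, they can be exercised with ordinary unit tests and reused by both the "Next" button and the final Finish check.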
Tier System
app/wizard/tiers.py
TIERS = ["free", "paid", "premium"]
FEATURES = {
    # Wizard LLM generation
    "llm_career_summary": "paid",
    "llm_expand_bullets": "paid",
    "llm_suggest_skills": "paid",
    "llm_voice_guidelines": "premium",
    "llm_job_titles": "paid",
    "llm_keywords_blocklist": "paid",
    "llm_mission_notes": "paid",
    # App features
    "company_research": "paid",
    "interview_prep": "paid",
    "email_classifier": "paid",
    "survey_assistant": "paid",
    "model_fine_tuning": "premium",
    "shared_cover_writer_model": "paid",
    "multi_user": "premium",
    "search_profiles_limit": {"free": 1, "paid": 5, "premium": None},
    # Integrations
    "notion_sync": "paid",
    "google_sheets_sync": "paid",
    "airtable_sync": "paid",
    "google_calendar_sync": "paid",
    "apple_calendar_sync": "paid",
    "slack_notifications": "paid",
}
# Free-tier integrations: google_drive, dropbox, onedrive, mega,
# nextcloud, discord, home_assistant
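One way the can_use() helper could work against this table is sketched below. The signature, the handling of unlisted features, and the separate `feature_limit` helper are assumptions; only a small excerpt of the feature table is repeated so the block runs on its own.

```python
from typing import Optional

TIERS = ["free", "paid", "premium"]  # ordered from lowest to highest

# Excerpt of the feature table, for illustration only.
FEATURES = {
    "llm_career_summary": "paid",
    "model_fine_tuning": "premium",
    "search_profiles_limit": {"free": 1, "paid": 5, "premium": None},
}


def can_use(tier: str, feature: str) -> bool:
    """True if `tier` meets the minimum tier required by `feature`."""
    required = FEATURES.get(feature)
    if required is None:
        return True  # unlisted features are treated as free (assumption)
    if isinstance(required, dict):
        return True  # per-tier limit rather than an on/off gate
    return TIERS.index(tier) >= TIERS.index(required)


def feature_limit(tier: str, feature: str) -> Optional[int]:
    """Per-tier numeric limit; None is read here as 'no limit' (assumption)."""
    return FEATURES[feature][tier]
```

Comparing list positions keeps the check to one line and means a future tier only needs to be inserted at the right place in TIERS.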
Storage in user.yaml
tier: free # free | paid | premium
dev_tier_override: premium # overrides tier locally — for testing only
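Resolving which tier is actually in force could look like the following sketch. `effective_tier` is a hypothetical helper, and falling back to free for unknown or missing values is an assumption rather than something the design specifies.

```python
VALID_TIERS = ("free", "paid", "premium")


def effective_tier(profile: dict) -> str:
    """Return the tier that feature gates should evaluate against.

    A valid dev_tier_override wins over tier; anything unrecognised
    falls back to "free" (assumption).
    """
    override = profile.get("dev_tier_override")
    if override in VALID_TIERS:
        return override
    tier = profile.get("tier", "free")
    return tier if tier in VALID_TIERS else "free"
```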
Dev override UI
Settings → Developer tab (visible when dev_tier_override is set or DEV_MODE=true in .env). Single selectbox to switch tier instantly — page reruns, all gates re-evaluate, no restart needed. Also exposes a "Reset wizard" button that sets wizard_complete: false to re-enter the wizard without deleting existing config.
Gated UI behaviour
Paid/premium features show a muted tier_label() badge (🔒 Paid / ⭐ Premium) and a disabled state rather than being hidden entirely — free users see what they're missing. Clicking a locked ✨ button opens an upsell tooltip, not an error.
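The badge and disabled state can be derived from the user's tier and the feature's required tier. The sketch below is illustrative only: the tier_label() signature is assumed, and `button_state` is a hypothetical helper.

```python
TIERS = ["free", "paid", "premium"]
BADGES = {"paid": "🔒 Paid", "premium": "⭐ Premium"}


def tier_label(required_tier: str) -> str:
    """Badge text for a gated feature; empty for free features."""
    return BADGES.get(required_tier, "")


def button_state(user_tier: str, required_tier: str) -> tuple[bool, str]:
    """Return (disabled, badge) for a gated control.

    The control stays visible either way; only locked controls get a badge.
    """
    locked = TIERS.index(user_tier) < TIERS.index(required_tier)
    return locked, tier_label(required_tier) if locked else ""
```

In a Streamlit page the first value would typically feed a widget's `disabled` argument and the second would be rendered next to it.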
Resume Handling (Step 4)
Fast path — upload
- PDF → pdfminer.six extracts raw text
- DOCX → python-docx extracts paragraphs
- Raw text → LLM structures into plain_text_resume.yaml fields via background task
- Populated form rendered for review/correction
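The extraction half of the upload path might be sketched as below, assuming pdfminer.six and python-docx are installed. `extract_resume_text` is a hypothetical name; the LLM structuring step is left out.

```python
from pathlib import Path


def extract_resume_text(path: str) -> str:
    """Return raw text from a PDF or DOCX resume (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        # pdfminer.six offers a one-call helper for whole-document extraction.
        from pdfminer.high_level import extract_text
        return extract_text(path)
    if suffix == ".docx":
        # The python-docx package is imported under the name `docx`.
        from docx import Document
        return "\n".join(p.text for p in Document(path).paragraphs)
    raise ValueError(f"unsupported resume format: {suffix or path}")
```

Importing the third-party libraries inside each branch keeps the module importable when only one of them is available.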
Fallback — guided form builder
Walks through plain_text_resume.yaml section by section:
- Personal info (pre-filled from Step 3)
- Work experience (add/remove entries)
- Education
- Skills
- Achievements (optional)
Both paths converge on the same review form before saving. career_summary from the resume is fed back to populate Step 3 if not already set.
Outputs
- aihawk/data_folder/plain_text_resume.yaml
- career_summary written back to user.yaml
LLM Generation Map
All ✨ actions submit a background task via task_runner.py using task type wizard_generate with a section parameter. The wizard step polls via @st.fragment(run_every=3) and shows inline status stages. Results land in session_state keyed by section and auto-populate the field on completion.
Status stages for all wizard generation tasks:
Queued → Analyzing → Generating → Done
| Step | Action | Tier | Input | Output |
|---|---|---|---|---|
| Identity | ✨ Generate career summary | Paid | Resume text | career_summary in user.yaml |
| Resume | ✨ Expand bullet points | Paid | Rough responsibility notes | Polished STAR-format bullets |
| Resume | ✨ Suggest skills | Paid | Experience descriptions | Skills list additions |
| Resume | ✨ Infer voice guidelines | Premium | Resume + uploaded cover letters | Voice/tone hints in user.yaml |
| Search | ✨ Suggest job titles | Paid | Resume + current titles | Additional title suggestions |
| Search | ✨ Suggest keywords | Paid | Resume + titles | resume_keywords.yaml additions |
| Search | ✨ Suggest blocklist | Paid | Resume + titles | blocklist.yaml additions |
| My Profile (post-wizard) | ✨ Suggest mission notes | Paid | Resume + LinkedIn URL | mission_preferences notes |
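The polling step described above can be reduced to a pure function that takes one task snapshot and updates a state mapping. This is a sketch under assumptions: the shape of the `task` dict and the name `apply_task_update` are invented for illustration, and `state` stands in for Streamlit's session_state.

```python
STAGES = ["Queued", "Analyzing", "Generating", "Done"]


def apply_task_update(state: dict, task: dict) -> str:
    """Record a polled task snapshot and return the stage label to display.

    `task` is assumed to look like
    {"section": "career_summary", "stage": "Generating", "result": None}.
    The result is copied into `state[section]` only once the task is Done.
    """
    stage = task["stage"]
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    if stage == "Done" and task.get("result") is not None:
        state[task["section"]] = task["result"]
    return stage
```

Inside the wizard this would be called from the function decorated with `@st.fragment(run_every=3)`, passing `st.session_state` as `state`.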
Optional Steps — Home Banners
After wizard completion, dismissible banners on the Home page surface remaining setup. Dismissed state stored as dismissed_banners: [...] in user.yaml.
| Banner | Links to |
|---|---|
| Connect a cloud service | Settings → Integrations |
| Set up email sync | Settings → Email |
| Set up email labels | Settings → Email (label guide) |
| Tune your mission preferences | Settings → My Profile |
| Configure keywords & blocklist | Settings → Search |
| Upload cover letter corpus | Settings → Fine-Tune |
| Configure LinkedIn Easy Apply | Settings → AIHawk |
| Set up company research | Settings → Services (SearXNG) |
| Build a target company list | Settings → Search |
| Set up notifications | Settings → Integrations |
| Tune a model | Settings → Fine-Tune |
| Review training data | Settings → Fine-Tune |
| Set up calendar sync | Settings → Integrations |
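Filtering banners against dismissed_banners might look like the sketch below. The banner keys are hypothetical (the design does not name them); the labels and targets mirror a few rows of the table.

```python
# (key, label, target) triples; keys are invented for this example.
BANNERS = [
    ("connect_cloud", "Connect a cloud service", "Settings → Integrations"),
    ("email_sync", "Set up email sync", "Settings → Email"),
    ("calendar_sync", "Set up calendar sync", "Settings → Integrations"),
]


def visible_banners(profile: dict) -> list[tuple[str, str, str]]:
    """Banners the Home page should still show for this profile."""
    dismissed = set(profile.get("dismissed_banners") or [])
    return [banner for banner in BANNERS if banner[0] not in dismissed]


def dismiss_banner(profile: dict, key: str) -> None:
    """Add `key` to dismissed_banners, keeping the list free of duplicates."""
    dismissed = list(profile.get("dismissed_banners") or [])
    if key not in dismissed:
        dismissed.append(key)
    profile["dismissed_banners"] = dismissed
```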
Integrations Architecture
The registry pattern means adding a new integration requires one file in scripts/integrations/ and one .yaml.example in config/integrations/ — the wizard and Settings tab auto-discover it.
class IntegrationBase:
    name: str
    label: str
    tier: str

    def connect(self, config: dict) -> bool: ...
    def test(self) -> bool: ...
    def sync(self, jobs: list[dict]) -> int: ...
    def fields(self) -> list[dict]: ...  # form field definitions for wizard card
Integration configs written to config/integrations/<name>.yaml only after a successful test() — never on partial input.
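A self-registering base class is one way to get the auto-discovery behaviour, combined with a save step that only runs after test() passes. In this sketch `REGISTRY`, `connect_and_save`, the `save` callback, and the `DemoWebhook` class are all assumptions made for illustration; the base class is trimmed to the two methods the example needs.

```python
from typing import Callable

REGISTRY: dict[str, type] = {}


class IntegrationBase:
    name: str = ""
    label: str = ""
    tier: str = "free"

    def __init_subclass__(cls, **kwargs):
        # Every subclass with a name registers itself when it is defined.
        super().__init_subclass__(**kwargs)
        if cls.name:
            REGISTRY[cls.name] = cls

    def connect(self, config: dict) -> bool:
        raise NotImplementedError

    def test(self) -> bool:
        raise NotImplementedError


class DemoWebhook(IntegrationBase):
    """Stand-in integration used only for this example."""
    name = "demo_webhook"
    label = "Demo webhook"

    def connect(self, config: dict) -> bool:
        self.url = config.get("webhook_url", "")
        return bool(self.url)

    def test(self) -> bool:
        return self.url.startswith("https://")


def connect_and_save(name: str, config: dict,
                     save: Callable[[str, dict], None]) -> bool:
    """Persist `config` via `save` only if connect() and test() both succeed."""
    integration = REGISTRY[name]()
    if not (integration.connect(config) and integration.test()):
        return False  # nothing is written on partial or failing input
    save(name, config)
    return True
```

With this approach the package's `__init__.py` would still need to import each module in scripts/integrations/ (for example with `pkgutil.iter_modules`) so that the subclasses get defined and registered.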
v1 Integration List
| Integration | Purpose | Tier |
|---|---|---|
| Notion | Job tracking DB sync | Paid |
| Notion Calendar | Covered by Notion integration | Paid |
| Google Sheets | Simpler tracker alternative | Paid |
| Airtable | Alternative tracker | Paid |
| Google Drive | Resume/cover letter storage | Free |
| Dropbox | Document storage | Free |
| OneDrive | Document storage | Free |
| MEGA | Document storage (privacy-first, cross-platform) | Free |
| Nextcloud | Self-hosted document storage | Free |
| Google Calendar | Write interview dates | Paid |
| Apple Calendar | Write interview dates (CalDAV) | Paid |
| Slack | Stage change notifications | Paid |
| Discord | Stage change notifications (webhook) | Free |
| Home Assistant | Notifications + automations (self-hosted) | Free |
Data Flow
Wizard step        → Written to
──────────────────────────────────────────────────────────────
Hardware           → user.yaml (inference_profile)
Tier               → user.yaml (tier, dev_tier_override)
Identity           → user.yaml (name, email, phone, linkedin,
                     career_summary, wizard_complete: false)
Resume (upload)    → aihawk/data_folder/plain_text_resume.yaml
Resume (builder)   → aihawk/data_folder/plain_text_resume.yaml
Inference          → user.yaml (services block)
                     .env (ANTHROPIC_API_KEY, OPENAI_COMPAT_URL/KEY)
Search             → config/search_profiles.yaml
                     config/resume_keywords.yaml
                     config/blocklist.yaml
Finish             → user.yaml (wizard_complete: true)
                     config/llm.yaml (via apply_service_urls())
Integrations       → config/integrations/<name>.yaml (per service,
                     only after successful test())
Background tasks   → staging.db background_tasks table
LLM results        → session_state[section] → field → user saves step
Key rules:
- Each mandatory step writes immediately on "Next"; partial progress survives crash or browser close
- apply_service_urls() called once at Finish, not per-step
- Integration configs never written on partial input; only after test() passes
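The write-on-"Next" rule can be sketched as a merge followed by an atomic file replace, so an interrupted write never leaves a half-written file. `save_step` is a hypothetical helper; the `load` and `dump` callables are parameters so the sketch has no third-party dependency (in the project they would presumably be YAML load and dump functions).

```python
import os
import tempfile
from typing import Callable


def save_step(path: str, updates: dict,
              load: Callable[[str], dict],
              dump: Callable[[dict], str]) -> dict:
    """Merge one step's fields into the file at `path` and write atomically."""
    current = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            current = load(fh.read()) or {}
    current.update(updates)
    # Write to a temp file in the same directory, then swap it into place.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(dump(current))
    os.replace(tmp, path)  # atomic when source and target share a filesystem
    return current
```

Keys written by earlier steps are preserved because each call merges into the existing content instead of overwriting the file.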
Testing
- Tier switching: Settings → Developer tab selectbox; instant rerun, no restart
- Wizard re-entry: Settings → Developer "Reset wizard" button sets wizard_complete: false
- Unit tests: validate(data) -> list[str] on each step module; pure functions, no Streamlit
- Integration tests: tests/test_wizard_flow.py, full step sequence with mock LLM router and mock file writes
- DEV_MODE=true in .env makes Developer tab always visible regardless of dev_tier_override