Compare commits

...

227 commits

Author SHA1 Message Date
b4116e8bae feat: add pre-commit sensitive file blocker and support request issue template
Completes issue #7 (public mirror setup):
- .githooks/pre-commit: blocks sensitive filenames (.env, *.key, *.pem,
  id_rsa, credentials.json, etc.) and credential content patterns (private
  key headers, AWS keys, GitHub tokens, Stripe secret keys, generic API
  key assignments) from being committed
- .github/ISSUE_TEMPLATE/support_request.md: third issue template for
  usage questions alongside existing bug report and feature request
2026-03-16 11:30:11 -07:00
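The blocking logic this commit describes can be sketched as a pure function — a minimal illustration, not the hook's actual code; the glob list is abbreviated from the message and the content regexes are representative stand-ins for the patterns it names:

```python
import fnmatch
import re

# Filename globs listed in the commit message (the real hook's list is longer).
BLOCKED_GLOBS = [".env", "*.key", "*.pem", "id_rsa", "credentials.json"]

# Representative credential-content patterns (illustrative, not the hook's exact regexes).
CONTENT_PATTERNS = [
    re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    re.compile(r"AKIA[0-9A-Z]{16}"),          # AWS access key ID shape
    re.compile(r"ghp_[A-Za-z0-9]{36}"),       # GitHub personal access token shape
    re.compile(r"sk_live_[A-Za-z0-9]{24,}"),  # Stripe live secret key shape
]

def check_staged_file(path: str, content: str) -> list:
    """Return reasons this file should block the commit (empty list = allowed)."""
    reasons = []
    name = path.rsplit("/", 1)[-1]
    if any(fnmatch.fnmatch(name, g) for g in BLOCKED_GLOBS):
        reasons.append("sensitive filename: " + name)
    for pat in CONTENT_PATTERNS:
        if pat.search(content):
            reasons.append("credential pattern matched")
    return reasons
```

A pre-commit hook built this way would run `check_staged_file` over each staged path and exit non-zero if any reasons accumulate.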
00a567768b fix: get_config_dir had one extra .parent, resolved to /config not /app/config 2026-03-15 17:14:48 -07:00
1ce283bb79 Merge pull request 'feat: LLM queue optimizer — resource-aware batch scheduler (closes #2)' (#15) from feature/llm-queue-optimizer into main 2026-03-15 16:48:37 -07:00
ab564741f4 fix: _trim_to_letter_end matches full name when no profile set
When _profile is None the fallback pattern \w+ only matched the first
word of a two-word sign-off (e.g. 'Alex' from 'Alex Rivera'), silently
dropping the last name. Switch fallback to \w+(?:\s+\w+)? so a full
first+last sign-off is preserved in no-config environments (CI, first run).
2026-03-15 16:43:27 -07:00
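The regex change in this fix is small enough to demonstrate directly — the two fallback patterns from the message, applied to a two-word sign-off:

```python
import re

# Fallback sign-off patterns from the commit message.
OLD_FALLBACK = re.compile(r"\w+")              # matches only the first word
NEW_FALLBACK = re.compile(r"\w+(?:\s+\w+)?")   # optionally takes a second word

signoff = "Alex Rivera"
old = OLD_FALLBACK.match(signoff).group()   # 'Alex' — last name silently dropped
new = NEW_FALLBACK.match(signoff).group()   # 'Alex Rivera' — full sign-off kept
```

A single-word sign-off still matches under the new pattern, since the second word group is optional.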
869cb2f197 ci: apt-get update before installing libsqlcipher-dev 2026-03-15 16:37:46 -07:00
27d6fc01fc ci: install libsqlcipher-dev before pip install 2026-03-15 16:36:50 -07:00
e034a07509 ci: re-trigger after actions enabled 2026-03-15 15:54:27 -07:00
2b9a6c8a22 ci: enable forgejo actions 2026-03-15 15:48:35 -07:00
e62548a22e ci: trigger runner 2026-03-15 15:39:45 -07:00
3267a895b0 docs: add Jobgether non-headless Playwright scraper to backlog
Xvfb-based Playwright can bypass Cloudflare Turnstile on jobgether.com.
Live inspection confirmed selectors; deferred pending Xvfb integration.
2026-03-15 11:59:48 -07:00
522534d28e feat: add Jobgether recruiter framing to cover letter generation
When source == "jobgether", build_prompt() injects a recruiter context
note directing the LLM to address the Jobgether recruiter using
"Your client [at {company}] will appreciate..." framing rather than
addressing the employer directly. generate() and task_runner both
thread the is_jobgether flag through automatically.
2026-03-15 09:45:51 -07:00
37119cb332 feat: add Jobgether URL detection and scraper to scrape_url.py 2026-03-15 09:45:50 -07:00
8d9e17d749 feat: filter Jobgether listings via blocklist 2026-03-15 09:45:50 -07:00
4d08e64acf docs: update spec — Jobgether discovery scraper not viable (Cloudflare + robots.txt) 2026-03-15 09:45:50 -07:00
fc6ef88a05 docs: add Jobgether integration implementation plan 2026-03-15 09:45:50 -07:00
952b21377f docs: add cover letter recruiter framing to Jobgether spec 2026-03-15 09:45:50 -07:00
9c87ed1cf2 docs: add Jobgether integration design spec 2026-03-15 09:45:50 -07:00
a1a1141616 Merge pull request 'feat: LLM queue optimizer — resource-aware batch scheduler (closes #2)' (#13) from feature/llm-queue-optimizer into main
Reviewed-on: #13
2026-03-15 05:11:29 -07:00
27d4b0e732 feat: LLM queue optimizer complete — closes #2
Resource-aware batch scheduler for LLM tasks:
- scripts/task_scheduler.py (new): TaskScheduler singleton with VRAM-aware
  batch scheduling, durability, thread-safe singleton, memory safety
- scripts/task_runner.py: submit_task() routes LLM types through scheduler
- scripts/db.py: reset_running_tasks() for durable restart behavior
- app/app.py: _startup() preserves queued tasks on restart
- config/llm.yaml.example: scheduler VRAM budget config documented
- tests/test_task_scheduler.py (new): 24 tests covering all behaviors

Pre-existing failure: test_generate_calls_llm_router (issue #12, unrelated)
2026-03-15 05:01:24 -07:00
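The VRAM-aware batching the scheduler performs can be sketched as a pure planning function — a greedy illustration under assumed names (`TaskSpec` fields and the per-task VRAM estimate are simplifications, not the module's real API):

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class TaskSpec:
    task_id: int
    task_type: str       # e.g. "cover_letter", "company_research"
    est_vram_mb: int     # assumed per-task VRAM estimate

def plan_batches(queue, budget_mb):
    """Greedy sketch: group tasks by type, then cut each group into
    batches whose summed VRAM estimate fits the budget."""
    by_type = defaultdict(list)
    for t in queue:
        by_type[t.task_type].append(t)
    batches = []
    for tasks in by_type.values():
        current, used = [], 0
        for t in tasks:
            if current and used + t.est_vram_mb > budget_mb:
                batches.append(current)   # budget exceeded — close the batch
                current, used = [], 0
            current.append(t)
            used += t.est_vram_mb
        if current:
            batches.append(current)
    return batches
```

Batching by type first means a loaded model serves a whole batch before the next model is swapped in, which is the point of type-grouped scheduling on a fixed VRAM budget.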
95378c106e feat(app): use reset_running_tasks() on startup to preserve queued tasks 2026-03-15 04:57:49 -07:00
07c627cdb0 feat(task_runner): route LLM tasks through scheduler in submit_task()
Replaces the spawn-per-task model for LLM task types with scheduler
routing: cover_letter, company_research, and wizard_generate are now
enqueued via the TaskScheduler singleton for VRAM-aware batching.
Non-LLM tasks (discovery, email_sync, etc.) continue to spawn daemon
threads directly. Adds autouse clean_scheduler fixture to
test_task_runner.py to prevent singleton cross-test contamination.
2026-03-15 04:52:42 -07:00
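The routing split this commit describes reduces to a set-membership dispatch — a sketch with the two destinations injected as callables so the shape is visible without the real scheduler or threading code:

```python
# LLM task types named in the commit; everything else keeps the old spawn path.
LLM_TASK_TYPES = {"cover_letter", "company_research", "wizard_generate"}

def submit_task(task_type, enqueue, spawn_thread):
    """Sketch of the routing shim: `enqueue` stands in for the TaskScheduler
    singleton, `spawn_thread` for the direct daemon-thread path."""
    if task_type in LLM_TASK_TYPES:
        enqueue(task_type)        # batched, VRAM-aware scheduling
        return "scheduled"
    spawn_thread(task_type)       # discovery, email_sync, etc. unchanged
    return "spawned"
```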
bcd918fb67 feat(scheduler): add durability — re-queue surviving LLM tasks on startup 2026-03-15 04:24:11 -07:00
207d3816b3 feat(scheduler): implement thread-safe singleton get_scheduler/reset_scheduler 2026-03-15 04:19:23 -07:00
3984a9c743 feat(scheduler): implement scheduler loop and batch worker with VRAM-aware scheduling 2026-03-15 04:14:56 -07:00
4d055f6bcd feat(scheduler): implement enqueue() with depth guard and ghost-row cleanup 2026-03-15 04:05:22 -07:00
28e66001a3 refactor(scheduler): use module-level _get_gpus directly in __init__ 2026-03-15 04:01:01 -07:00
535c0ae9e0 feat(scheduler): implement TaskScheduler.__init__ with budget loading and VRAM detection 2026-03-15 03:32:11 -07:00
3d7f6f7ff1 feat(scheduler): add task_scheduler.py skeleton with constants and TaskSpec 2026-03-15 03:28:43 -07:00
52470759a4 docs(config): add scheduler VRAM budget config to llm.yaml.example 2026-03-15 03:28:26 -07:00
d51066e8c2 refactor(tests): remove unused imports from test_task_scheduler 2026-03-15 03:27:17 -07:00
905db2f147 feat(db): add reset_running_tasks() for durable scheduler restart 2026-03-15 03:22:45 -07:00
eef2478948 docs: add LLM queue optimizer implementation plan
11-task TDD plan across 3 reviewed chunks. Covers:
- reset_running_tasks() db helper
- TaskScheduler skeleton + __init__ + enqueue + loop + workers
- Thread-safe singleton, durability, submit_task routing shim
- app.py startup change + full suite verification
2026-03-14 17:11:49 -07:00
beb1553821 docs: revise queue optimizer spec after review
Addresses 16 review findings across two passes:
- Clarify _active.pop/double-decrement non-issue
- Fix app.py change target (inline SQL, not kill_stuck_tasks)
- Scope durability to LLM types only
- Add _budgets to state table with load logic
- Fix singleton safety explanation (lock, not GIL)
- Ghost row fix: mark dropped tasks failed in DB
- Document static _available_vram as known limitation
- Fix test_llm_tasks_batch_by_type description
- Eliminate circular import via routing split in submit_task()
- Add missing budget warning at construction
2026-03-14 16:46:38 -07:00
61dc2122e4 docs: add LLM queue optimizer design spec
Resource-aware batch scheduler for LLM tasks. Closes #2.
2026-03-14 16:38:47 -07:00
0f80b698ff chore: add .worktrees/ to .gitignore
Prevents worktree directories from being tracked.
2026-03-14 16:30:38 -07:00
097def4bba fix(linkedin): update selectors for 2025 public DOM; surface login-wall limitation in UI
LinkedIn's unauthenticated public profile only exposes name, summary (truncated),
current employer name, and certifications. Past roles, education, and skills are
blurred server-side behind a login wall — not a scraper limitation.

- Update selectors: data-section='summary' (was 'about'), .profile-section-card
  for certs, .visible-list for current experience entry
- Strip login-wall noise injected into summary text after 'see more'
- Skip aria-hidden blurred placeholder experience items
- Add info callout in UI directing users to data export zip for full history
2026-03-13 19:47:21 -07:00
1a50bc1392 chore: update changelog for v0.4.0 release 2026-03-13 11:28:03 -07:00
d1fb4abd56 docs: update backlog with LinkedIn import follow-up items 2026-03-13 11:24:55 -07:00
6c7499752c fix(cloud): use per-user config dir for wizard gate; redirect on invalid session
- app.py: wizard gate now reads get_config_dir()/user.yaml instead of
  hardcoded repo-level config/ — fixes perpetual onboarding loop in
  cloud mode where per-user wizard_complete was never seen
- app.py: page title corrected to "Peregrine"
- cloud_session.py: add get_config_dir() returning per-user config path
  in cloud mode, repo config/ locally
- cloud_session.py: replace st.error() with JS redirect on missing/invalid
  session token so users land on login page instead of error screen
- Home.py, 4_Apply.py, migrate.py: remove remaining AIHawk UI references
2026-03-13 11:24:42 -07:00
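The `get_config_dir()` behaviour this fix depends on can be sketched as a mode switch — the env var check and the per-user directory layout below are assumptions for illustration, not the app's actual paths:

```python
import os
from pathlib import Path

def get_config_dir(app_root, user_id=None):
    """Sketch: per-user config dir in cloud mode, repo-level config/ locally.
    The /data/users layout is a hypothetical stand-in."""
    if os.environ.get("CLOUD_MODE", "").lower() == "true" and user_id:
        return Path("/data/users") / user_id / "config"
    return app_root / "config"
```

The earlier `get_config_dir` fix in this log (one extra `.parent` resolving to `/config` instead of `/app/config`) shows why this helper is worth a test: `Path("/app/config").parent` is `/app`, and one `.parent` too many silently walks out of the app root.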
42f0e6261c fix(linkedin): conservative settings merge, mkdir guard, split dockerfile playwright layer 2026-03-13 10:58:58 -07:00
1e12da45f1 fix(linkedin): move session state pop before tabs; add rerun after settings merge
- Pop _linkedin_extracted before st.tabs() so tab_builder sees the
  freshly populated _parsed_resume in the same render pass (no extra rerun needed)
- Fix tab label capitalisation: "Build Manually" (capital M) per spec
- Add st.rerun() after LinkedIn merge in Settings so form fields
  refresh immediately to show the newly applied data
2026-03-13 10:55:25 -07:00
b80e4de050 feat(linkedin): install Playwright Chromium in Docker image 2026-03-13 10:44:03 -07:00
7489c1c12a feat(linkedin): add LinkedIn import expander to Settings Resume Profile tab 2026-03-13 10:44:02 -07:00
97ab8b94e5 feat(linkedin): add LinkedIn tab to wizard resume step 2026-03-13 10:43:53 -07:00
bd0e9240eb feat(linkedin): add shared LinkedIn import Streamlit widget 2026-03-13 10:32:23 -07:00
5344dc8e7a feat(linkedin): add staging file parser with re-parse support 2026-03-13 10:18:01 -07:00
fba6796b8a fix(linkedin): improve scraper error handling, current-job date range, add missing tests 2026-03-13 06:02:03 -07:00
f759f5fbc0 feat(linkedin): add scraper (Playwright + export zip) with URL validation 2026-03-13 01:06:39 -07:00
530f4346d1 feat(linkedin): add HTML parser utils with fixture tests 2026-03-13 01:01:05 -07:00
db26b9aaf9 feat(cloud): add Heimdall tier resolution to cloud_session
Calls /admin/cloud/resolve after JWT validation to inject the user's
current subscription tier (free/paid/premium/ultra) into session_state
as cloud_tier. Cached 5 minutes via st.cache_data to avoid Heimdall
spam on every Streamlit rerun. Degrades gracefully to free on timeout
or missing token.

New env vars: HEIMDALL_URL, HEIMDALL_ADMIN_TOKEN (added to .env.example
and compose.cloud.yml). HEIMDALL_URL defaults to http://cf-license:8000
for internal Docker network access.

New helper: get_cloud_tier() — returns tier string in cloud mode, "local"
in local-first mode, so pages can distinguish self-hosted from cloud.
2026-03-10 12:31:14 -07:00
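The degrade-gracefully behaviour can be isolated as a small function — a sketch where `fetch` stands in for the Heimdall `/admin/cloud/resolve` call (the caching layer and HTTP client are omitted):

```python
def resolve_tier(fetch, default="free"):
    """Return the user's tier, falling back to `default` on any failure
    or unrecognised value — mirroring the graceful degradation described
    in the commit message."""
    valid = {"free", "paid", "premium", "ultra"}
    try:
        tier = fetch()
    except Exception:
        return default          # timeout, missing token, network error
    return tier if tier in valid else default
```

Swallowing every exception is deliberate here: tier resolution must never take the page down, only downgrade the experience.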
97b695c3e3 fix(cloud): extract cf_session cookie by name from X-CF-Session header 2026-03-10 09:22:08 -07:00
72320315e2 docs: add cloud architecture + cloud-deployment.md
architecture.md: updated Docker Compose table (3 compose files), database
layer (Postgres platform + SQLite-per-user), cloud session middleware,
telemetry system, and cloud design decisions.

cloud-deployment.md (new): full operational runbook — env vars, data root
layout, GDPR deletion, platform DB queries, telemetry, backup/restore,
Caddy routing, demo instance, and onboarding a new app to the cloud.
2026-03-09 23:02:29 -07:00
37dcdec754 feat(cloud): fix backup/restore for cloud mode — SQLCipher encrypt/decrypt
T13: Three fixes:
1. backup.py: _decrypt_db_to_bytes() decrypts SQLCipher DB before archiving
   so the zip is portable to any local Docker install (plain SQLite).
2. backup.py: _encrypt_db_from_bytes() re-encrypts on restore in cloud mode
   so the app can open the restored DB normally.
3. 2_Settings.py: _base_dir uses get_db_path().parent in cloud mode (user's
   per-tenant data dir) instead of the hardcoded app root; db_key wired
   through both create_backup() and restore_backup() calls.

6 new cloud backup tests + 2 unit tests for SQLCipher helpers (pysqlcipher3
mocked — not available in the local conda test env). 419/419 total passing.
2026-03-09 22:41:44 -07:00
ce19e00cfe feat(cloud): Privacy & Telemetry tab in Settings + update_consent()
T11: Add CLOUD_MODE-gated Privacy tab to Settings with full telemetry
consent UI — hard kill switch, anonymous usage toggle, de-identified
content sharing toggle, and time-limited support access grant. All changes
persist to telemetry_consent table via new update_consent() in telemetry.py.

Tab and all DB calls are completely no-op in local mode (CLOUD_MODE=false).
2026-03-09 22:14:22 -07:00
8f9955fa96 feat(cloud): add compose.cloud.yml and telemetry consent middleware
T8: compose.cloud.yml — multi-tenant cloud stack on port 8505, CLOUD_MODE=true,
per-user encrypted data at /devl/menagerie-data, joins caddy-proxy_caddy-internal
network; .env.example extended with five cloud-only env vars.

T10: app/telemetry.py — log_usage_event() is the ONLY entry point to usage_events
table; hard kill switch (all_disabled) checked before any DB write; complete no-op
in local mode; swallows all exceptions so telemetry never crashes the app;
psycopg2-binary added to requirements.txt. Event calls wired into 4_Apply.py at
cover_letter_generated and job_applied. 5 tests, 413/413 total passing.
2026-03-09 22:10:18 -07:00
5a1fceda84 feat(peregrine): wire cloud_session into pages for multi-tenant db path routing
resolve_session() is a no-op in local mode — no behavior change for existing users.
In cloud mode, injects user-scoped db_path into st.session_state at page load.
2026-03-09 20:22:17 -07:00
634e31968f feat(peregrine): add cloud_session middleware + SQLCipher get_connection()
cloud_session.py: no-op in local mode; in cloud mode resolves Directus JWT
from X-CF-Session header to per-user db_path in st.session_state.

get_connection() in scripts/db.py: transparent SQLCipher/sqlite3 switch —
uses encrypted driver when CLOUD_MODE=true and key provided, vanilla sqlite3
otherwise. libsqlcipher-dev added to Dockerfile for Docker builds.

6 new cloud_session tests + 1 new get_connection test — 34/34 db tests pass.
2026-03-09 19:43:42 -07:00
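The transparent driver switch can be sketched as below — the SQLCipher branch is an assumption-laden illustration (pysqlcipher3 is not assumed installed, and a real implementation would parameterise the key rather than interpolate it):

```python
import os
import sqlite3

def get_connection(db_path, key=None):
    """Sketch of the switch: encrypted driver only when CLOUD_MODE is on
    and a key is supplied; plain sqlite3 otherwise."""
    if os.environ.get("CLOUD_MODE", "").lower() == "true" and key:
        # Hypothetical encrypted branch — import deferred so local installs
        # never need the library. NOTE: string-interpolated PRAGMA shown for
        # brevity only; do not do this with untrusted input.
        from pysqlcipher3 import dbapi2 as sqlcipher
        conn = sqlcipher.connect(db_path)
        conn.execute("PRAGMA key = '%s'" % key)
        return conn
    return sqlite3.connect(db_path)
```

Deferring the `pysqlcipher3` import inside the cloud branch is what lets the same module run in local mode without `libsqlcipher-dev` present.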
2fdf6f725e fix(peregrine): correct port comment in compose.demo.yml, update CLAUDE.md 2026-03-09 15:22:10 -07:00
fbd47368ff chore(peregrine): rename compose.menagerie.yml to compose.demo.yml
Public demo instances moving to demo.circuitforge.tech;
menagerie.circuitforge.tech reserved for cloud-hosted managed instances.
2026-03-09 14:55:38 -07:00
2124b24e3d docs: update features table to reflect BYOK tier policy
AI features (cover letter gen, research, interview prep, survey assistant)
are now correctly shown as unlockable at the free tier with any local LLM
or user-supplied API key. Paid tier value prop is managed cloud inference
+ integrations + email sync, not AI feature gating.

Also fixes circuitforge.io → circuitforge.tech throughout.
2026-03-07 22:17:18 -08:00
88f28c2b41 chore: move internal plans to circuitforge-plans repo
All docs/plans/ files migrated to pyr0ball/circuitforge-plans.
Keeping docs/ for future user-facing documentation.
2026-03-07 15:38:47 -08:00
28cc03ba70 chore: expand peregrine .gitleaks.toml allowlists for history scan
Suppress false positives found during pre-push history scan:
- Path allowlists: docs/plans/*, tests/*, Streamlit app files,
  SearXNG default config, apple_calendar.py placeholder
- Regex allowlists: Unix epoch timestamps, localhost ports,
  555-area-code variants, CFG-* example license key patterns
- All 164 history commits now scan clean
2026-03-07 13:24:18 -08:00
7de630e065 chore: activate circuitforge-hooks, add peregrine .gitleaks.toml
- Wire core.hooksPath → circuitforge-hooks/hooks via install.sh
- Add .gitleaks.toml extending shared base config with Peregrine-specific
  allowlists (Craigslist/LinkedIn IDs, localhost port patterns)
- Remove .githooks/pre-commit (superseded by gitleaks hook)
- Update setup.sh activate_git_hooks() to call circuitforge-hooks/install.sh
  with .githooks/ as fallback if hooks repo not present
2026-03-07 13:20:52 -08:00
1cf6e370b1 docs: circuitforge-hooks implementation plan (8 tasks, TDD) 2026-03-07 12:27:47 -08:00
9d2ed1d00d docs: circuitforge-hooks design — gitleaks-based secret + PII scanning
Centralised pre-commit/pre-push hook repo design covering the token leak
root causes: unactivated hooksPath and insufficient regex coverage.
2026-03-07 12:23:54 -08:00
1b500b9f26 docs: update changelog for v0.3.0 release
- Add v0.3.0 section: feedback button, BYOK warning, LLM suggest,
  backup/restore, privacy scrub
- Retroactively document v0.2.0 (was in [Unreleased])
- Clear [Unreleased] for future work
2026-03-06 16:04:28 -08:00
d1c5c89da7 feat: merge feedback-button branch — BYOK warning, PII scrub, LLM suggest, sidebar indicator
Key changes in this branch:
- BYOK cloud backend detection (scripts/byok_guard.py) with full test coverage
- Sidebar amber badge when any cloud LLM backend is active
- Activation warning + acknowledgment required when enabling cloud backend in Settings
- Privacy policy reference doc added
- Suggest search terms, resume keywords, and LLM suggest button in Settings
- Test suite anonymized: real personal data replaced with fictional Alex Rivera
- Full PII scrub from git history (name, email, phone number)
- Digest email parser design doc
- Settings widget crash fixes, Docker service controls, backup/restore script
2026-03-06 16:01:44 -08:00
bf8eee8a62 test: anonymize real personal data — use fictional Alex Rivera throughout test suite 2026-03-06 15:35:04 -08:00
d3f86f2143 fix: remove dead byok_cloud_acknowledged scalar key — list is the authority 2026-03-06 15:17:26 -08:00
8da36f251c docs: clarify byok acknowledgment semantics and double-read intent 2026-03-06 15:14:26 -08:00
89f11b0cae feat: byok activation warning — require acknowledgment when enabling cloud LLM 2026-03-06 15:09:43 -08:00
84862b8ab8 fix: use explicit utf-8 encoding when reading llm.yaml in sidebar 2026-03-06 14:52:22 -08:00
5827386789 feat: sidebar cloud LLM indicator — amber badge when any cloud backend active 2026-03-06 14:48:20 -08:00
7ca348b97f test: add missing base_url edge case + clarify 0.0.0.0 marker intent
Document defensive behavior: openai_compat with no base_url returns True
(cloud) because unknown destination is assumed cloud. Add explanatory
comment to LOCAL_URL_MARKERS for the 0.0.0.0 bind-address case.
2026-03-06 14:43:45 -08:00
329baf013f feat: byok_guard — cloud backend detection with full test coverage 2026-03-06 14:40:06 -08:00
67634d459a docs: digest parsers implementation plan (TDD, 6 tasks) 2026-03-05 22:41:40 -08:00
5124d18770 docs: add privacy policy reference 2026-03-05 20:59:01 -08:00
92e0ea0ba1 feat: add LLM suggest button to Skills & Keywords section
Places a  Suggest button inline with the Skills & Keywords subheader.
On click, calls suggest_resume_keywords() and stores results in session
state. Suggestions render as per-category chip panels (skills, domains,
keywords); clicking a chip appends it to the YAML and removes it from
the panel. A ✕ Clear button dismisses the panel entirely.
2026-03-05 15:13:57 -08:00
0e30096a88 feat: wire enhanced suggest_search_terms into Search tab (three-angle excludes)
- Remove old inline _suggest_search_terms (no blocklist/profile awareness)
- Replace with import shim delegating to scripts/suggest_helpers.py
- Call site now loads blocklist.yaml + user.yaml and passes them through
- Update button help text to reflect blocklist, mission values, career background
2026-03-05 15:08:07 -08:00
2bae1a92ed feat: add suggest_resume_keywords for skills/domains/keywords gap analysis
Replaces NotImplementedError stub with full LLM-backed implementation.
Builds a prompt from the last 3 resume positions plus already-selected
skills/domains/keywords, calls LLMRouter, and returns de-duped suggestions
in all three categories.
2026-03-05 15:00:53 -08:00
dbcd2710ae fix: guard mission_preferences values against non-string types in suggest_search_terms 2026-03-05 13:40:53 -08:00
5f1c372c0a feat: add suggest_search_terms with three-angle exclude analysis
Replaces NotImplementedError stub with a real LLMRouter-backed implementation
that builds a structured prompt covering blocklist alias expansion, values
misalignment, and role-type filtering, then parses the JSON response into
suggested_titles and suggested_excludes lists.

Moves LLMRouter import to module level so tests can patch it at
scripts.suggest_helpers.LLMRouter.
2026-03-05 13:15:25 -08:00
efe71150e3 docs: digest email parser design — LinkedIn/Adzuna/Ladders registry + Avocet bucket 2026-03-05 12:56:53 -08:00
8166204c05 fix: Settings widget crash, stale setup banners, Docker service controls
- Settings → Search: add-title (+) and Import buttons crashed with
  StreamlitAPIException when writing to _sp_titles_multi after it was
  already instantiated. Fix: pending-key pattern (_sp_titles_pending /
  _sp_locs_pending) applied before widget renders on next pass.

- Home setup banners: fired for email/notion/keywords even when those
  features were already configured. Add 'done' condition callables
  (_email_configured, _notion_configured, _keywords_configured) to
  suppress banners automatically when config files are present.

- Services tab start/stop buttons: docker CLI was unavailable inside
  the container so _docker_available was False and buttons never showed.
  Bind-mount host /usr/bin/docker (ro) + /var/run/docker.sock into the
  app container so it can control sibling containers via DooD pattern.
2026-03-04 12:11:23 -08:00
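The pending-key pattern from the first fix above can be shown with a plain dict standing in for `st.session_state` (the `_sp_titles_*` keys come from the commit; the function names are illustrative): writes destined for an already-instantiated widget key are parked under a pending key and applied at the top of the next render pass, before the widget is created.

```python
def queue_titles(state, new_titles):
    """Park the update — never write the live widget key mid-render."""
    state["_sp_titles_pending"] = new_titles

def begin_render_pass(state):
    """Run before any widget renders: apply the parked value if present."""
    pending = state.pop("_sp_titles_pending", None)
    if pending is not None:
        state["_sp_titles_multi"] = pending   # safe: widget not yet instantiated
```

This is the standard workaround for Streamlit's rule that a widget's state key cannot be assigned after the widget has been instantiated in the same run.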
11997f8a13 fix: DEFAULT_DB respects STAGING_DB env var — was ignoring Docker-set path 2026-03-04 11:47:59 -08:00
e5d606ab4b feat: backup/restore script with multi-instance and legacy support
- create_backup() / restore_backup() / list_backup_contents() public API
- --base-dir PATH flag: targets any instance root (default: this repo)
  --base-dir /devl/job-seeker backs up the legacy Conda install
- _DB_CANDIDATES fallback: data/staging.db (Peregrine) or staging.db root (legacy)
- Manifest records source label (dir name), source_path, created_at, files, includes_db
- Added config/resume_keywords.yaml and config/server.yaml to backup lists
- 21 tests covering create, list, restore, legacy DB path, overwrite, roundtrip
2026-03-04 10:52:51 -08:00
db3dff268a fix: save form data to non-widget state on Next, fix disabled timing, pass page title 2026-03-03 15:17:45 -08:00
e9b389feb6 fix: llm_backend reads fallback_order, logs tee'd to data/.streamlit.log in Docker 2026-03-03 15:04:18 -08:00
483ca00f1a feat: paste/drag-drop image component, remove server-side Playwright capture button 2026-03-03 14:40:47 -08:00
ecad32cd6f fix: remove st.rerun() from dialog nav buttons — caused dialog to close on Next/Back 2026-03-03 13:28:26 -08:00
d05cb91401 fix: pass FORGEJO env vars into app container 2026-03-03 13:17:37 -08:00
3d17122334 fix: lazy-import playwright in screenshot_page, fix SQLite connection leak in collect_listings 2026-03-03 12:45:39 -08:00
2ab396bad0 feat: wire feedback button into app.py sidebar 2026-03-03 12:38:53 -08:00
199daebb87 feat: floating feedback button + two-step dialog (Streamlit shell) 2026-03-03 12:20:27 -08:00
f7f438df70 feat: feedback_api — screenshot_page with Playwright (graceful fallback) 2026-03-03 12:14:33 -08:00
e1f65d8fe9 feat: feedback_api — Forgejo label management + issue filing + attachment upload 2026-03-03 12:09:11 -08:00
20f9933e99 feat: feedback_api — build_issue_body 2026-03-03 12:00:01 -08:00
60dab647f2 feat: feedback_api — collect_logs + collect_listings 2026-03-03 11:56:35 -08:00
cad7b9ba35 chore: remove unused imports from feedback_api (will be re-added in later tasks) 2026-03-03 11:45:14 -08:00
5f466fa107 feat: feedback_api — mask_pii + collect_context 2026-03-03 11:43:35 -08:00
c3dc05fe34 chore: add playwright dep and Forgejo env config for feedback button 2026-03-03 11:38:14 -08:00
1efb033b6f docs: feedback button implementation plan (8 tasks, TDD) 2026-03-03 11:31:19 -08:00
2d9b8d10f9 docs: feedback button design (floating button, Forgejo integration, PII masking, screenshot support) 2026-03-03 11:22:20 -08:00
791e11d5d5 ci: add GitHub Actions pytest workflow 2026-03-02 20:44:33 -08:00
86613d0218 docs: add canonical-source banner and CI badge to README 2026-03-02 20:44:23 -08:00
5254212cb4 feat: issue templates, PR template, security redirect 2026-03-02 19:35:06 -08:00
435f2e71fd docs: add CONTRIBUTING.md with BSL policy and CLA note 2026-03-02 19:26:25 -08:00
0d6aa5975e docs: add SECURITY.md — responsible disclosure policy 2026-03-02 19:26:23 -08:00
476ede4267 feat: setup.sh activates .githooks on clone 2026-03-02 19:17:05 -08:00
a2f4102d78 feat: commit-msg hook enforces conventional commit format 2026-03-02 19:14:31 -08:00
0306b3716d feat: pre-commit hook blocks sensitive files and key patterns 2026-03-02 19:12:14 -08:00
adc3526470 docs: public mirror strategy design (GitHub + Codeberg + git hooks) 2026-03-02 18:49:03 -08:00
75499bc250 docs: update tier-system reference with BYOK policy + demo user.yaml
docs/reference/tier-system.md:
  - Rewritten tier table: free tier now described as "AI unlocks with BYOK"
  - New BYOK section explaining the policy and rationale
  - Feature gate table gains BYOK-unlocks? column
  - API reference updated: can_use, tier_label, has_configured_llm with examples
  - "Adding a new feature gate" guide updated to cover BYOK_UNLOCKABLE

demo/config/user.yaml:
  - Reformatted by YAML linter; added dismissed_banners for demo UX
2026-03-02 13:22:10 -08:00
1e5d354209 feat: BYOK unlocks LLM features regardless of tier
BYOK policy: if a user supplies any LLM backend (local ollama/vllm or
their own API key), they get full access to AI generation features.
Charging for the UI around a service they already pay for is bad UX.

app/wizard/tiers.py:
  - BYOK_UNLOCKABLE frozenset: pure LLM-call features that unlock with
    any configured backend (llm_career_summary, company_research,
    interview_prep, survey_assistant, voice guidelines, etc.)
  - has_configured_llm(): checks llm.yaml for any enabled non-vision
    backend; local + external API keys both count
  - can_use(tier, feature, has_byok=False): BYOK_UNLOCKABLE features
    return True when has_byok=True regardless of tier
  - tier_label(feature, has_byok=False): suppresses lock icon for
    BYOK_UNLOCKABLE features when BYOK is active

Still gated (require CF infrastructure, not just an LLM call):
  llm_keywords_blocklist, email_classifier, model_fine_tuning,
  shared_cover_writer_model, multi_user, all integrations

app/pages/2_Settings.py:
  - Compute _byok = has_configured_llm() once at page load
  - Pass has_byok=_byok to can_use() for _gen_panel_active
  - Update caption to mention BYOK as an alternative to paid tier

app/pages/0_Setup.py:
  - Wizard generation widget passes has_byok=has_configured_llm()
    to can_use() and tier_label()

tests/test_wizard_tiers.py:
  - 6 new BYOK-specific tests covering unlock, non-unlock, and
    label suppression cases
2026-03-02 11:34:36 -08:00
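The gate logic reduces to a short predicate — feature names below come from the commit; the simplified "everything else needs a paid tier" rule is an assumption for illustration, not the real tier table:

```python
# Pure LLM-call features that unlock with any configured backend (subset
# of the BYOK_UNLOCKABLE frozenset described in the commit).
BYOK_UNLOCKABLE = frozenset({
    "llm_career_summary", "company_research",
    "interview_prep", "survey_assistant",
})

def can_use(tier, feature, has_byok=False):
    if has_byok and feature in BYOK_UNLOCKABLE:
        return True           # user already pays for their own LLM
    return tier != "free"     # infra-backed features stay tier-gated (simplified)
```

Note the asymmetry the commit calls out: BYOK bypasses the tier check only for features that are nothing but an LLM call; features requiring CF infrastructure remain gated regardless of `has_byok`.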
bc7e3c8952 feat: DEMO_MODE — isolated public menagerie demo instance
Adds a fully neutered public demo for menagerie.circuitforge.tech/peregrine
that shows the Peregrine UI without exposing any personal data or real LLM inference.

scripts/llm_router.py:
  - Block all inference when DEMO_MODE env var is set (1/true/yes)
  - Raises RuntimeError with a user-friendly "public demo" message

app/app.py:
  - IS_DEMO constant from DEMO_MODE env var
  - Wizard gate bypassed in demo mode (demo/config/user.yaml pre-seeds a fake profile)
  - Demo banner in sidebar: explains read-only status + links to circuitforge.tech

compose.menagerie.yml (new):
  - Separate Docker Compose project (peregrine-demo) on host port 8504
  - Mounts demo/config/ and demo/data/ — isolated from personal instance
  - DEMO_MODE=true, no API keys, no /docs mount
  - Project name: peregrine-demo (run alongside personal instance)

demo/config/user.yaml:
  - Generic "Demo User" profile, wizard_complete=true, no real personal info

demo/config/llm.yaml:
  - All backends disabled (belt-and-suspenders alongside DEMO_MODE block)

demo/data/.gitkeep:
  - staging.db is auto-created on first run, gitignored via demo/data/*.db

.gitignore: add demo/data/*.db

Caddy routes menagerie.circuitforge.tech/peregrine* → 8504 (demo instance).
Personal Peregrine remains on 8502, unchanged.
2026-03-02 11:22:38 -08:00
044b25e838 feat: add reverse-proxy basepath support (Streamlit MIME fix)
- compose.yml: pass STREAMLIT_SERVER_BASE_URL_PATH from .env into container
  Streamlit prefixes all asset URLs with the path so Caddy handle_path routing works.
  Without this, /static/* requests skip the /peregrine* route → 503 text/plain MIME error.
- config/server.yaml.example: document base_url_path + server_port settings
- .gitignore: ignore config/server.yaml (local gitignored instance of server.yaml.example)
- app/pages/2_Settings.py: add Deployment/Server expander under System tab
  Shows active base URL path from env; saves edits to config/server.yaml + .env;
  prompts user to run ./manage.sh restart to apply.

Refs: https://docs.streamlit.io/develop/api-reference/configuration/config.toml#server.baseUrlPath
2026-03-01 22:49:29 -08:00
43bf30fac5 feat: discard button — removes email from queue without writing to score file 2026-02-27 15:48:47 -08:00
39e8194679 fix: RerankerAdapter falls back to label name when no LABEL_DESCRIPTIONS entry 2026-02-27 14:54:31 -08:00
7dab560938 feat: label_tool — 9 labels, wildcard Other, InvalidCharacterError fix; sync with avocet canonical 2026-02-27 14:34:24 -08:00
30a2962797 feat: add health mission category, trim-to-sign-off, max_tokens cap for cover letters
- _MISSION_SIGNALS: add health category (pharma, clinical, patient care, etc.)
  listed last so music/animals/education/social_impact take priority
- _MISSION_DEFAULTS: health note steers toward people-first framing, not
  industry enthusiasm — focuses on patients navigating rare/invisible journeys
- _trim_to_letter_end(): cuts output at first sign-off + first name to prevent
  fine-tuned models from looping into repetitive garbage after completing letter
- generate(): pass max_tokens=1200 to router (prevents runaway output)
- user.yaml.example: add health + social_impact to mission_preferences,
  add candidate_voice field for per-user voice/personality context
2026-02-27 12:31:06 -08:00
9b24599832 feat: dual-GPU DUAL_GPU_MODE complete — ollama/vllm/mixed GPU 1 selection 2026-02-27 06:20:57 -08:00
7e96e57d92 feat: benchmark_classifier — MODEL_REGISTRY, --list-models, --score, --compare modes 2026-02-27 06:19:32 -08:00
6febea216e feat: inject DUAL_GPU_MODE sub-profile in Makefile; update manage.sh help 2026-02-27 06:18:34 -08:00
207fbdbb69 feat: add ollama_research service and update profiles for dual-gpu sub-profiles 2026-02-27 06:16:17 -08:00
ca1e4b062a feat: assign ollama_research to GPU 1 in Docker and Podman GPU overlays 2026-02-27 06:16:04 -08:00
88908ceca2 feat: add DUAL_GPU_MODE default, VRAM warning, and download size report to preflight
- Add _mixed_mode_vram_warning() to flag low VRAM on GPU 1 in mixed mode
- Wire download size report block into main() before closing border line
- Wire mixed-mode VRAM warning into report if triggered
- Write DUAL_GPU_MODE=ollama default to .env for new 2-GPU setups (no override if already set)
- Promote import os to top-level (was local import inside get_cpu_cores)
2026-02-27 00:17:00 -08:00
be28aba07f feat: add _download_size_mb() pure function for preflight size warning 2026-02-27 00:15:26 -08:00
637e8379b6 feat: add ollama_research to preflight service table and LLM backend map 2026-02-27 00:14:04 -08:00
128ab11763 test: add failing tests for dual-gpu preflight additions 2026-02-27 00:11:39 -08:00
efc7a1f0bc feat: ZeroShotAdapter, GLiClassAdapter, RerankerAdapter with full mock test coverage 2026-02-27 00:10:43 -08:00
e4b6456bc9 feat: ClassifierAdapter ABC + compute_metrics() with full test coverage 2026-02-27 00:09:45 -08:00
488fa71891 feat: add vllm_research backend and update research_fallback_order 2026-02-27 00:09:00 -08:00
ea708321e4 feat: add scoring JSONL example and gitignore for benchmark data files 2026-02-26 23:46:29 -08:00
85f0f648b0 feat: add job-seeker-classifiers conda env for HF classifier benchmark 2026-02-26 23:43:41 -08:00
2df61eedd2 docs: email classifier benchmark implementation plan — 10 tasks, TDD, 9-model registry 2026-02-26 23:20:04 -08:00
a7fe4d9ff4 docs: email classifier benchmark design — adapter pattern, 9-model registry, compare+eval modes 2026-02-26 22:56:11 -08:00
ae7c985fab fix: remove lib-resume-builder-aihawk from Docker requirements
The package is never imported in the app — it was pulling torch + CUDA
(~7GB) into the main app container for no reason. AIHawk runs in its own
conda env (aihawk-env) outside Docker per design.
2026-02-26 22:16:28 -08:00
6dd89a0863 fix: auto-configure git safe.directory in setup.sh for /opt-style installs
Git 2.35.2+ rejects repos where directory owner != current user, which
is the common case when cloned as root into /opt. setup.sh now detects
this and calls git config --global --add safe.directory automatically.
When run via sudo, it writes into SUDO_USER's config rather than root's.
README updated with both fixes: git safe.directory and chown for preflight.
2026-02-26 22:07:39 -08:00
c287392c39 docs: add install notes for /opt ownership, Podman rootless, Docker group 2026-02-26 21:15:42 -08:00
b4f7a7317d fix: skip --profile for remote profile; fixes podman-compose compat
podman-compose 1.0.6 has no --profile flag, causing a fatal parse error.
'remote' profile means base services only — no service in compose.yml is
tagged 'remote', so --profile remote was always a no-op with Docker too.
Introduce PROFILE_ARG that only adds --profile for cpu/gpu profiles where
it actually activates optional services.
2026-02-26 21:12:12 -08:00
2fe0e0e2f2 fix: render banner link as clickable page_link instead of italic text 2026-02-26 20:53:54 -08:00
657f9c4060 fix: install make in setup.sh; guard manage.sh against missing make
setup.sh now installs make (via apt/dnf/pacman/brew) before git and
Docker so that manage.sh commands work out of the box on minimal server
installs. manage.sh adds a preflight guard that catches a missing make
early and redirects the user to ./manage.sh setup. Also fixes the
post-setup next-steps hint to use ./manage.sh instead of bare make.
2026-02-26 20:51:34 -08:00
3b2870ddf1 feat: show version tag in sidebar footer 2026-02-26 14:39:47 -08:00
bef92d667e feat: multiselect tags for job titles & locations; remove duplicate Notion section; docker detection for services panel
- Job titles and locations: replaced text_area with st.multiselect plus an add button and a paste-list expander
- Suggest now populates the titles dropdown (not auto-selected) — user picks what they want
- Suggested exclusions still use click-to-add chip buttons
- Removed duplicate Notion expander from System Settings (handled by Integrations tab)
- Services panel: show host terminal copy-paste command when docker CLI unavailable (app runs inside container)
2026-02-26 14:26:58 -08:00
de8fb1ddc7 fix: add address field to Resume Profile — was hidden, triggering false FILL_IN banner 2026-02-26 14:03:55 -08:00
fe09e23f4c fix: port drift on restart — down before preflight, read port from .env
Makefile restart target now runs compose down before preflight so ports
are free when preflight assigns them; previously preflight ran first while
the old container still held 8502, causing it to bump to 8503.

manage.sh start/restart/open now read STREAMLIT_PORT from .env instead
of re-running preflight after startup (which would see the live container
and bump the reported port again).
2026-02-26 13:57:12 -08:00
8caf7b6356 feat: resume upload in Settings + improved config hints
- Resume Profile tab: upload widget replaces error+stop when YAML missing;
  collapsed "Replace Resume" expander when profile exists; saves parsed
  data and raw text (for LLM context) in one step
- FILL_IN banner with clickable link to Setup wizard when incomplete fields detected
- Ollama not reachable hint references Services section below
- Fine-tune hint clarifies "My Profile tab above" with inference profile names
- vLLM no-models hint links to Fine-Tune tab
2026-02-26 13:53:01 -08:00
8887955e7d refactor: replace sidebar LLM generate panel with inline field buttons
Removed the dropdown-based sidebar panel in favour of Generate buttons
placed directly below Career Summary, Voice & Personality, and each Mission
& Values row. Prompts now incorporate the live field value as a draft to
improve, plus resume experience bullets as context for Career Summary.
2026-02-26 13:40:52 -08:00
d13505e760 feat: searchable tag UI for skills/domains/keywords
Replace chip-button tag management with st.multiselect backed by bundled
suggestions. Existing user tags are preserved as custom options alongside
the suggestion list. Custom tag input validates through filter_tag() before
adding — rejects URLs, profanity, overlong strings, and bad characters.
Changes auto-save on multiselect interaction; custom tags append on + click.
2026-02-26 13:14:55 -08:00
64487a6abb feat: bundled skills suggestion list and content filter utility
- config/skills_suggestions.yaml: 168 curated tags across skills (77),
  domains (40), keywords (51) covering CS/TAM/ops and common tech roles;
  structured for future community aggregate (paid tier backlog)
- scripts/skills_utils.py: filter_tag() rejects blanks, URLs, profanity,
  overlong strings, disallowed chars, and repeated-char runs;
  load_suggestions() reads bundled YAML per category
2026-02-26 13:09:32 -08:00
84b9490f46 fix: resume CID glyphs, resume YAML path, PyJWT dep, candidate voice & mission UI
- resume_parser: add _clean_cid() to strip (cid:NNN) glyph refs from ATS PDFs;
  CIDs 127/149/183 become bullets, unknowns are stripped; applied to PDF/DOCX/ODT
- resume YAML: canonicalize plain_text_resume.yaml path to config/ across all
  references (Settings, Apply, Setup, company_research, migrate); was pointing at
  unmounted aihawk/data_folder/ in Docker
- requirements/environment: add PyJWT>=2.8 (was missing; broke Settings page)
- user_profile: add candidate_voice field
- generate_cover_letter: inject candidate_voice into SYSTEM_CONTEXT; add
  social_impact mission signal category (nonprofit, community, equity, etc.)
- Settings: add Voice & Personality textarea to Identity expander; add
  Mission & Values expander with editable fields for all 4 mission categories
- .gitignore: exclude CLAUDE.md, config/plain_text_resume.yaml,
  config/user.yaml.working
- search_profiles: add default profile
2026-02-26 12:32:28 -08:00
e54208fc14 feat: ODT support, two-column PDF column-split extraction, title/company layout detection hardening 2026-02-26 10:33:28 -08:00
01a341e4c5 fix: harden resume section detection — anchor patterns to full line, expand header synonyms, fix name heuristic for hyphenated/middle-initial names, add parse diagnostics UI 2026-02-26 09:28:31 -08:00
d6545cf496 refactor: replace LLM-based resume parser with section regex parser
Primary parse path is now fully deterministic — no LLM, no token limits,
no JSON generation. Handles two-column experience headers, institution-before-
or-after-degree education layouts, and header bleed prevention via
looks_like_header detection.

LLM path retained as optional career_summary enhancement only (1500 chars,
falls back silently). structure_resume() now returns tuple[dict, str].
Tests updated to match the new API.
2026-02-26 07:34:25 -08:00
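The "anchor patterns to full line" hardening and looks_like_header detection mentioned in the two commits above can be sketched as a full-line-anchored regex over header synonyms. The synonym list here is an assumption — the shipped parser's set is larger:

```python
import re

# Assumed subset of the parser's section-header synonyms
_SECTION_HEADERS = [
    r"(?:work|professional)?\s*experience",
    r"employment\s+history",
    r"education",
    r"skills?",
]
_HEADER_RE = re.compile(
    r"^\s*(?:" + "|".join(_SECTION_HEADERS) + r")\s*:?\s*$",
    re.IGNORECASE,
)

def looks_like_header(line: str) -> bool:
    """Match the whole line only, so 'experience with Docker' inside a
    bullet does not accidentally start a new section."""
    return bool(_HEADER_RE.match(line))
```

Anchoring with `^…$` is what prevents header bleed: a substring match would split sections on any bullet containing the word "experience".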
9fb207c15c fix: resume parser — max_tokens, json-repair fallback, logging, PYTHONUNBUFFERED 2026-02-26 00:00:23 -08:00
f35fec33e9 fix: add python-docx to container requirements 2026-02-25 23:43:30 -08:00
35056161d7 fix: add /v1 prefix to all license server API paths 2026-02-25 23:35:58 -08:00
8ff134addd feat: License tab in Settings (activate/deactivate UI) + startup refresh 2026-02-25 23:08:20 -08:00
5739d1935b feat: wire license.effective_tier into tiers.py; add dev_override priority 2026-02-25 23:05:55 -08:00
52f912f938 feat: license.py client — verify_local, effective_tier, activate, refresh, report_usage 2026-02-25 22:53:11 -08:00
124b950ca3 fix: GPU detection + pdfplumber + pass GPU env vars into app container
- preflight.py now writes PEREGRINE_GPU_COUNT and PEREGRINE_GPU_NAMES to
  .env so the app container gets GPU info without needing nvidia-smi access
- compose.yml passes PEREGRINE_GPU_COUNT, PEREGRINE_GPU_NAMES, and
  RECOMMENDED_PROFILE as env vars to the app service
- 0_Setup.py _detect_gpus() reads PEREGRINE_GPU_NAMES env var first;
  falls back to nvidia-smi (bare / GPU-passthrough environments)
- 0_Setup.py _suggest_profile() reads RECOMMENDED_PROFILE env var first
- requirements.txt: add pdfplumber (needed for resume PDF parsing)
2026-02-25 21:58:28 -08:00
c3f3fa97a7 fix: add app/__init__.py so wizard submodule is importable inside Docker
Without __init__.py, Python treats app/ as a namespace package that
doesn't resolve correctly when running from WORKDIR /app inside the
container. 'from app.wizard.step_hardware import ...' raises
ModuleNotFoundError: No module named 'app.wizard'; 'app' is not a package
2026-02-25 21:41:09 -08:00
26fc97dfe5 fix: stub-port adoption — stubs bind free ports, app routes to external via host.docker.internal
Three inter-related fixes for the service adoption flow:
- preflight: stub_port field — adopted services get a free port for their
  no-op container (avoids binding conflict with external service on real port)
  while update_llm_yaml still uses the real external port for host.docker.internal URLs
- preflight: write_env now uses stub_port (not resolved) for adopted services
  so SEARXNG_PORT etc point to the stub's harmless port, not the occupied one
- preflight: stub containers use sleep infinity + CMD true healthcheck so
  depends_on: service_healthy is satisfied without holding any real port
- Makefile: finetune profile changed from [cpu,single-gpu,dual-gpu] to [finetune]
  so the pytorch/cuda base image is not built during make start
2026-02-25 21:38:23 -08:00
8e3f58cf46 fix: ollama docker_owned=True; finetune gets own profile to avoid build on start
- preflight: ollama was incorrectly marked docker_owned=False — Docker does
  define an ollama service, so external detection now correctly disables it
  via compose.override.yml when host Ollama is already running
- compose.yml: finetune moves from [cpu,single-gpu,dual-gpu] profiles to
  [finetune] profile so it is never built during 'make start' (pytorch/cuda
  base is 3.7GB+ and unnecessary for the UI)
- compose.yml: remove depends_on ollama from finetune — it reaches Ollama
  via OLLAMA_URL env var which works whether Ollama is Docker or host
- Makefile: finetune target uses --profile finetune + compose.gpu.yml overlay
2026-02-25 21:24:33 -08:00
2662bab1e6 feat: smart service adoption in preflight — use external services instead of conflicting
preflight.py now detects when a managed service (ollama, vllm, vision,
searxng) is already running on its configured port and adopts it rather
than reassigning or conflicting:

- Generates compose.override.yml disabling Docker containers for adopted
  services (profiles: [_external_] — a profile never passed via --profile)
- Rewrites config/llm.yaml base_url entries to host.docker.internal:<port>
  so the app container can reach host-side services through Docker's
  host-gateway mapping
- compose.yml: adds extra_hosts host.docker.internal:host-gateway to the
  app service (required on Linux; no-op on macOS Docker Desktop)
- .gitignore: excludes compose.override.yml (auto-generated, host-specific)

Only streamlit is non-adoptable and continues to reassign on conflict.
2026-02-25 19:23:02 -08:00
0174a5396d docs: use ./manage.sh setup in quickstart 2026-02-25 17:18:03 -08:00
d0371e8525 docs: update README — manage.sh CLI reference + correct Forgejo clone URL 2026-02-25 16:59:34 -08:00
3aac7b167f feat: add manage.sh — single CLI entry point for beta testers 2026-02-25 16:51:30 -08:00
5e63cd731c fix: fix dual-gpu port conflict + move GPU config to overlay files
- Remove ollama-gpu service (was colliding with ollama on port 11434)
- Strip inline deploy.resources GPU blocks from vision and vllm
- Add compose.gpu.yml: Docker NVIDIA overlay for ollama (GPU 0),
  vision (GPU 0), vllm (GPU 1), finetune (GPU 0)
- Fix compose.podman-gpu.yml: rename ollama-gpu → ollama to match
  service name after removal of ollama-gpu
- Update Makefile: apply compose.gpu.yml for Docker + GPU profiles
  (was only applying podman-gpu.yml for Podman + GPU profiles)
2026-02-25 16:44:59 -08:00
946924524d feat: wire fine-tune UI end-to-end + harden setup.sh
- setup.sh: replace docker-image-based NVIDIA test with nvidia-ctk validate
  (faster, no 100MB pull, no daemon required); add check_docker_running()
  to auto-start the Docker service on Linux or warn on macOS
- prepare_training_data.py: also scan training_data/uploads/*.{md,txt}
  so web-uploaded letters are included in training data
- task_runner.py: add prepare_training task type (calls build_records +
  write_jsonl inline; reports pair count in task result)
- Settings fine-tune tab: Step 1 accepts .md/.txt uploads; Step 2 Extract
  button submits prepare_training background task + shows status; Step 3
  shows make finetune command + live Ollama model status poller
2026-02-25 16:31:53 -08:00
feb7bab43e feat: containerize fine-tune pipeline (Dockerfile.finetune + make finetune)
- Dockerfile.finetune: PyTorch 2.3/CUDA 12.1 base + unsloth + training stack
- finetune_local.py: auto-register model via Ollama HTTP API after GGUF
  export; path-translate between finetune container mount and Ollama's view;
  update config/llm.yaml automatically; DOCS_DIR env override for Docker
- prepare_training_data.py: DOCS_DIR env override so make prepare-training
  works correctly inside the app container
- compose.yml: add finetune service (cpu/single-gpu/dual-gpu profiles);
  DOCS_DIR=/docs injected into app + finetune containers
- compose.podman-gpu.yml: CDI device override for finetune service
- Makefile: make prepare-training + make finetune targets
2026-02-25 16:22:48 -08:00
e94695ef1a feat: prompt for model weights directory during install
Interactive prompt lets users with split-drive setups point Ollama and
vLLM model dirs at a dedicated storage drive. Reads current .env value
as default so re-runs are idempotent. Skips prompts in non-interactive
(piped) mode. Creates the target directory immediately and updates .env
in-place via portable awk (Linux + macOS). Also simplifies next-steps
output since model paths are now configured at install time.
2026-02-25 16:08:14 -08:00
4e1748ca62 fix: repair beta installer path for Docker-first deployment
- llm.yaml + example: replace localhost URLs with Docker service names
  (ollama:11434, vllm:8000, vision:8002); replace personal model names
  (alex-cover-writer, llama3.1:8b) with llama3.2:3b
- user.yaml.example: update service hosts to Docker names (ollama, vllm,
  searxng) and searxng port from 8888 (host-mapped) to 8080 (internal)
- wizard step 5: fix hardcoded localhost defaults — wizard runs inside
  Docker, so service name defaults are required for connection tests to pass
- scrapers/companyScraper.py: bundle scraper so Dockerfile COPY succeeds
- setup.sh: remove host Ollama install (conflicts with Docker Ollama on
  port 11434); Docker entrypoint handles model download automatically
- README + setup.sh banner: add Circuit Forge mission statement
2026-02-25 16:03:10 -08:00
67aaf7c0b7 feat: add Ollama install + service start + model pull to setup.sh 2026-02-25 15:42:56 -08:00
11662dde4a feat: Podman support — auto-detect COMPOSE, CDI GPU override, podman-compose in setup.sh 2026-02-25 15:36:36 -08:00
f26f948377 docs: fix license server paths — dev under CircuitForge/, live at /devl/ 2026-02-25 15:28:32 -08:00
6258b9e34d docs: CircuitForge license server implementation plan (11 tasks) 2026-02-25 15:27:39 -08:00
bd326162f1 docs: CircuitForge license server design doc
RS256 JWT, FastAPI + SQLite, multi-product schema, offline-capable
client integration. Covers server, Peregrine client, deployment,
admin workflow, and testing strategy.
2026-02-25 15:21:07 -08:00
f08f1b16d0 docs: mark cover letter refinement complete in backlog + changelog 2026-02-25 14:44:50 -08:00
bdbbc06702 feat: cover letter iterative refinement — feedback UI + backend params
- generate() accepts previous_result + feedback; appends both to LLM prompt
- task_runner cover_letter handler parses params JSON, passes fields through
- Apply Workspace: "Refine with Feedback" expander with text area + Regenerate
  button; only shown when a draft exists; clears feedback after submitting
- 8 new tests (TestGenerateRefinement + TestTaskRunnerCoverLetterParams)
2026-02-25 14:44:20 -08:00
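The refinement flow above — generate() appending previous_result and feedback to the prompt — can be sketched as a small prompt builder (function and parameter names are assumptions based on the commit message):

```python
from typing import Optional

def build_prompt(base_prompt: str,
                 previous_result: Optional[str] = None,
                 feedback: Optional[str] = None) -> str:
    """Append refinement context only when the user asked for a revision."""
    parts = [base_prompt]
    if previous_result:
        parts.append("Previous draft:\n" + previous_result)
    if feedback:
        parts.append("Reviewer feedback to address:\n" + feedback)
    return "\n\n".join(parts)
```

A first-time generation passes neither optional field, so the prompt is unchanged and the handler needs no branching at the call site.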
46d10f5daa docs: finalise Circuit Forge product suite naming + product brief 2026-02-25 14:16:56 -08:00
d8348e4906 docs: backlog — Circuit Forge product expansion (heinous tasks platform) 2026-02-25 14:02:07 -08:00
a149b65d5d docs: mark email sync test checklist complete 2026-02-25 13:56:55 -08:00
f9e974a957 test: complete email sync test coverage — 44 new tests across all checklist sections 2026-02-25 13:55:55 -08:00
f78ac24657 chore: mkdocs deps, CHANGELOG, remove dead Resume Editor page, backlog gap items 2026-02-25 13:51:13 -08:00
41019269a2 docs: LICENSE-MIT + LICENSE-BSL + updated README for 7-step wizard and current feature set 2026-02-25 12:06:28 -08:00
41c7954b9d docs: mkdocs wiki — installation, user guide, developer guide, reference
Adds a full MkDocs documentation site under docs/ with Material theme.

Getting Started: installation walkthrough, 7-step first-run wizard guide,
Docker Compose profile reference with GPU memory guidance and preflight.py
description.

User Guide: job discovery (search profiles, custom boards, enrichment),
job review (sorting, match scores, batch actions), apply workspace (cover
letter gen, PDF export, mark applied), interviews (kanban stages, company
research auto-trigger, survey assistant), email sync (IMAP, Gmail App
Password, classification labels, stage auto-updates), integrations (all 13
drivers with tier requirements), settings (every tab documented).

Developer Guide: contributing (dev env setup, code style, branch naming, PR
checklist), architecture (ASCII layer diagram, design decisions), adding
scrapers (full scrape() interface, registration, search profile config,
test patterns), adding integrations (IntegrationBase full interface, auto-
discovery, tier gating, test patterns), testing (patterns, fixtures, what
not to test).

Reference: tier system (full FEATURES table, can_use/tier_label API, dev
override, adding gates), LLM router (backend types, complete() signature,
fallback chains, vision routing, __auto__ resolution, adding backends),
config files (every file with field-level docs and gitignore status).

Also adds CONTRIBUTING.md at repo root pointing to the docs site.
2026-02-25 12:05:49 -08:00
85e8034093 docs: backlog — Ultra tier managed applications concept 2026-02-25 11:40:55 -08:00
09a4b38a99 feat: Integrations tab in Settings — connect/test/disconnect all 12 integration drivers 2026-02-25 11:30:44 -08:00
e1cc0e9210 refactor: move HF token to Developer tab — hidden from standard user UI 2026-02-25 11:04:13 -08:00
7efbf95840 feat: expanded first-run wizard — complete implementation
13-task implementation covering:
- UserProfile wizard fields (wizard_complete, wizard_step, tier, dev_tier_override,
  dismissed_banners, effective_tier) + params column in background_tasks
- Tier system: FEATURES gate, can_use(), tier_label() (app/wizard/tiers.py)
- Six pure validate() step modules (hardware, tier, identity, resume, inference, search)
- Resume parser: PDF (pdfplumber) + DOCX (python-docx) extraction + LLM structuring
- Integration base class + auto-discovery registry (scripts/integrations/)
- 13 integration drivers (Notion, Google Sheets, Airtable, Google Drive, Dropbox,
  OneDrive, MEGA, Nextcloud, Google Calendar, Apple Calendar, Slack, Discord,
  Home Assistant) + config/integrations/*.yaml.example files
- wizard_generate task type with 8 LLM generation sections + iterative refinement
  (previous_result + feedback support)
- step_integrations module: validate(), get_available(), is_connected()
- Wizard orchestrator rewrite (0_Setup.py): 7 steps, crash recovery, LLM polling
- app.py gate: checks wizard_complete flag in addition to file existence
- Home page: 13 dismissible contextual setup banners (wizard_complete-gated)
- Settings: Developer tab — tier override selectbox + wizard reset button

219 tests passing.
2026-02-25 10:54:24 -08:00
350591bc48 feat: Developer tab in Settings — tier override + wizard reset button 2026-02-25 10:50:14 -08:00
ca17994e00 feat: dismissible setup banners on Home page (13 contextual prompts) 2026-02-25 09:53:34 -08:00
fd215a22f6 feat: app.py checks wizard_complete flag to gate main app 2026-02-25 09:43:53 -08:00
1a74793804 feat: wizard orchestrator — 7 steps, LLM generation polling, crash recovery
Replaces the old 5-step wizard with a 7-step orchestrator that uses the
step modules built in Tasks 2-8. Steps 1-6 are mandatory (hardware, tier,
identity, resume, inference, search); step 7 (integrations) is optional.
Each Next click validates, writes wizard_step to user.yaml for crash recovery,
and resumes at the correct step on page reload. LLM generation buttons
submit wizard_generate tasks and poll via @st.fragment(run_every=3). Finish
sets wizard_complete=True, removes wizard_step, and calls apply_service_urls.

Adds tests/test_wizard_flow.py (7 tests) covering validate() chain, yaml
persistence helpers, and wizard state inference.
2026-02-25 09:10:51 -08:00
4c7f74c669 feat: step_integrations module with validate() + tier-filtered available list 2026-02-25 08:35:16 -08:00
4748cd3672 docs: backlog — cover letter iterative refinement feedback loop 2026-02-25 08:30:24 -08:00
51e48f8eee feat: wizard_generate — feedback + previous_result support for iterative refinement 2026-02-25 08:29:56 -08:00
9b0ca6457a feat: wizard_generate task type — 8 LLM generation sections 2026-02-25 08:25:17 -08:00
3f85c00359 docs: backlog — Podman support + FastAPI migration path 2026-02-25 08:22:24 -08:00
beb32e576d feat: 13 integration implementations + config examples
Add all 13 integration modules (Notion, Google Drive, Google Sheets,
Airtable, Dropbox, OneDrive, MEGA, Nextcloud, Google Calendar, Apple
Calendar/CalDAV, Slack, Discord, Home Assistant) with fields(), connect(),
and test() implementations. Add config/integrations/*.yaml.example files
and gitignore rules for live config files. Add 5 new registry/schema
tests bringing total to 193 passing.
2026-02-25 08:18:45 -08:00
d3b941134e feat: integration base class + auto-discovery registry 2026-02-25 08:13:14 -08:00
27112c7ed2 feat: resume parser — PDF/DOCX extraction + LLM structuring 2026-02-25 08:04:48 -08:00
0546c0e289 feat: wizard step validate() functions — all six mandatory steps 2026-02-25 08:00:18 -08:00
1dbb91dc31 feat: tier system with FEATURES gate + can_use() + tier_label() 2026-02-25 07:55:47 -08:00
edb169959a feat: wizard fields in UserProfile + params column in background_tasks
- Add tier, dev_tier_override, wizard_complete, wizard_step, dismissed_banners
  fields to UserProfile with defaults and effective_tier property
- Add params TEXT column to background_tasks table (CREATE + migration)
- Update insert_task() to accept params with params-aware dedup logic
- Update submit_task() and _run_task() to thread params through
- Add test_wizard_defaults, test_effective_tier_override,
  test_effective_tier_no_override, and test_insert_task_with_params
2026-02-25 07:27:14 -08:00
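The effective_tier property added above can be sketched in a few lines — the override-wins priority is what the later tiers.py wiring relies on (a minimal sketch, not the shipped UserProfile):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class UserProfile:
    tier: str = "free"
    dev_tier_override: Optional[str] = None  # set from the Developer tab

    @property
    def effective_tier(self) -> str:
        # The dev override wins when set; otherwise the licensed tier applies
        return self.dev_tier_override or self.tier
```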
eac747d999 docs: expanded wizard implementation plan — 13 tasks, TDD throughout 2026-02-25 06:29:23 -08:00
5d2428f1b9 docs: expanded first-run wizard design
Architecture: wizard module system, mandatory 6-step flow, optional
home banners, tier gating (free/paid/premium + dev_tier_override),
resume upload/parse/builder, LLM generation via background tasks,
integrations registry pattern with 14 v1 services.
2026-02-24 21:30:05 -08:00
dc770d151b chore: add backlog.md + gitignore config/.backup-* dirs 2026-02-24 20:54:12 -08:00
e332b8a069 feat: startup preflight — port collision avoidance + resource checks
scripts/preflight.py (stdlib-only, no psutil):
- Port probing: owned services auto-reassign to next free port; external
  services (Ollama) show ✓ reachable / ⚠ not responding
- System resources: CPU cores, RAM (total + available), GPU VRAM via
  nvidia-smi; works on Linux + macOS
- Profile recommendation: remote / cpu / single-gpu / dual-gpu
- vLLM KV cache offload: calculates CPU_OFFLOAD_GB when VRAM < 10 GB
  free and RAM headroom > 4 GB (uses up to 25% of available headroom)
- Writes resolved values to .env for docker compose; single-service mode
  (--service streamlit) for scripted port queries
- Exit 0 unless an owned port genuinely can't be resolved

scripts/manage-ui.sh:
- Calls preflight.py --service streamlit before bind; falls back to
  pure-bash port scan if Python/yaml unavailable

compose.yml:
- vllm command: adds --cpu-offload-gb ${CPU_OFFLOAD_GB:-0}

Makefile:
- start / restart depend on preflight target
- PYTHON variable for env portability
- test target uses PYTHON variable
2026-02-24 20:36:16 -08:00
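Two pieces of the preflight commit above can be sketched with the stdlib alone, as the commit stresses (no psutil). Function names are assumptions; the offload thresholds (10 GB VRAM, 4 GB headroom, 25%) come straight from the commit message:

```python
import socket

def port_free(port: int, host: str = "127.0.0.1") -> bool:
    """True if nothing is listening on the port (stdlib-only probe)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) != 0

def next_free_port(start: int, limit: int = 20) -> int:
    """Auto-reassign: first free port at or after `start`."""
    for port in range(start, start + limit):
        if port_free(port):
            return port
    raise RuntimeError(f"no free port in {start}-{start + limit - 1}")

def cpu_offload_gb(free_vram_gb: float, ram_headroom_gb: float) -> int:
    """KV-cache offload rule per the commit: only when free VRAM is
    under 10 GB and RAM headroom exceeds 4 GB; use up to 25% of the
    available headroom."""
    if free_vram_gb < 10 and ram_headroom_gb > 4:
        return int(ram_headroom_gb * 0.25)
    return 0
```

The computed value would feed `--cpu-offload-gb ${CPU_OFFLOAD_GB:-0}` in the vllm compose command, defaulting to zero when the rule doesn't trigger.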
c7fb9a00f1 feat: migration tool + portable startup scripts
scripts/migrate.py:
- dry-run by default; --apply writes files; --copy-db migrates staging.db
- generates config/user.yaml from source repo's resume + cover letter scripts
- copies gitignored configs (notion, email, adzuna, craigslist, search profiles,
  resume keywords, blocklist, aihawk resume)
- merges fine-tuned model name from source llm.yaml into dest llm.yaml

scripts/manage-ui.sh:
- STREAMLIT_BIN no longer hardcoded; auto-resolves via conda env or PATH;
  override with STREAMLIT_BIN env var

scripts/manage-vllm.sh:
- VLLM_BIN and MODEL_DIR now read from env vars with portable defaults
2026-02-24 20:25:54 -08:00
7abf753469 feat: LGBTQIA+ focus + Phase 2/3 audit fixes
LGBTQIA+ inclusion section in research briefs:
- user_profile.py: add candidate_lgbtq_focus bool accessor
- user.yaml.example: add candidate_lgbtq_focus flag (default false)
- company_research.py: gate new LGBTQIA+ section behind flag; section
  count now dynamic (7 base + 1 per opt-in section, max 9)
- 2_Settings.py: add "Research Brief Preferences" expander with
  checkboxes for both accessibility and LGBTQIA+ focus flags;
  mission_preferences now round-trips through save (no silent drop)

Phase 2 fixes:
- manage-vllm.sh: MODEL_DIR and VLLM_BIN now read from env vars
  (VLLM_MODELS_DIR, VLLM_BIN) with portable defaults
- search_profiles.yaml: replace personal CS/TAM/Bay Area profiles
  with a documented generic starter profile

Phase 3 fix:
- llm.yaml: rename alex-cover-writer:latest → llama3.2:3b with
  inline comment for users to substitute their fine-tuned model;
  fix model-exclusion comment
2026-02-24 20:02:03 -08:00
cf185dfbaf fix: remove hardcoded personal values — Phase 1 audit findings
- 3_Resume_Editor.py: replace "Alex's" in docstring and caption
- user_profile.py: expose mission_preferences and candidate_accessibility_focus
- user.yaml.example: add mission_preferences section + candidate_accessibility_focus flag
- generate_cover_letter.py: build _MISSION_NOTES from user profile instead of
  hardcoded personal passion notes; falls back to generic defaults when not set
- company_research.py: gate "Inclusion & Accessibility" section behind
  candidate_accessibility_focus flag; section count adjusts (7 or 8) accordingly
2026-02-24 19:57:03 -08:00
633a7f2d1c feat: add cross-platform dependency installer and Makefile for Linux/macOS 2026-02-24 19:47:06 -08:00
af5237e3c2 feat: complete generalization — smoke tests, README, all personal refs extracted
- UserProfile class drives all personal data
- First-run wizard gates app until user.yaml exists
- Docker Compose stack: remote/cpu/single-gpu/dual-gpu profiles
- Vision service containerized (single-gpu/dual-gpu)
- All Alex/Library references removed from app and scripts
- Circuit Forge LLC / Peregrine branding throughout
2026-02-24 19:41:09 -08:00
f13c49d5f1 feat: add vision service to compose stack and fine-tune wizard tab to Settings
- Add moondream2 vision service to compose.yml (single-gpu + dual-gpu profiles)
- Create scripts/vision_service/Dockerfile for the vision container
- Add VISION_PORT, VISION_MODEL, VISION_REVISION vars to .env.example
- Add Vision Service entry to SERVICES list in Settings (hidden unless gpu profile active)
- Add Fine-Tune Wizard tab (Task 10) to Settings with 3-step upload→preview→train flow
- Tab is always rendered; shows info message when non-GPU profile is active
2026-02-24 19:37:55 -08:00
1a68b07076 feat: services tab uses docker compose commands and SSL-aware health checks
Replace hardcoded systemd/shell-script service commands with docker compose
profile-aware commands. Add inference_profile-based filtering (hidden flag
removes Ollama on remote profile, vLLM unless dual-gpu). Replace TCP socket
health check with HTTP-based _port_open() that accepts host/ssl/verify params
for remote/TLS-terminated service support.
2026-02-24 19:34:44 -08:00
aacde4f623 feat: add Docker Compose stack with remote/cpu/single-gpu/dual-gpu profiles 2026-02-24 19:31:57 -08:00
bb656194e1 fix: persist API keys to .env and write notion.yaml with field_map defaults in wizard 2026-02-24 19:24:51 -08:00
e40128e289 feat: first-run setup wizard gates app until user.yaml is created 2026-02-24 19:20:35 -08:00
46790a64d3 feat: add My Profile tab to Settings with full user.yaml editing and URL auto-generation 2026-02-24 19:16:31 -08:00
306c90c9da test: add ollama_research URL assertion to llm config generation test 2026-02-24 19:14:33 -08:00
33d3994fb8 feat: auto-generate llm.yaml base_url values from user profile services config 2026-02-24 19:10:54 -08:00
a8fa1eb115 feat: extract hard-coded personal references from all app pages via UserProfile 2026-02-24 19:00:47 -08:00
f28d91d4d7 fix: thread searxng URL through research functions via _SEARXNG_URL constant
- Add module-level _SEARXNG_URL derived from UserProfile.searxng_url (or default localhost:8888)
- Update all _searxng_running() call sites to pass _SEARXNG_URL explicitly
- Replace hardcoded "http://localhost:8888/" in _scrape_company() with _SEARXNG_URL + "/"
- Replace hardcoded "http://localhost:8888/search" in _run_search_query() with f"{_SEARXNG_URL}/search"
- Guard _profile.name.split() against empty string in finetune_local.py OLLAMA_NAME
2026-02-24 18:52:10 -08:00
af41d14241 feat: extract hard-coded personal references from all scripts via UserProfile
Replace hard-coded paths (/Library/Documents/JobSearch), names (Alex Rivera),
NDA sets (_NDA_COMPANIES), and the scraper path with UserProfile-driven lookups.
Update tests to be profile-agnostic (no user.yaml in peregrine config dir).
2026-02-24 18:45:39 -08:00
6493cf5c5b feat: add UserProfile class with service URL generation and NDA helpers 2026-02-24 18:29:45 -08:00
217 changed files with 24581 additions and 13480 deletions

20
.dockerignore Normal file

@@ -0,0 +1,20 @@
.git
__pycache__
*.pyc
*.pyo
staging.db
config/user.yaml
config/notion.yaml
config/email.yaml
config/tokens.yaml
config/craigslist.yaml
.streamlit.pid
.streamlit.log
aihawk/
docs/
tests/
.env
data/
log/
unsloth_compiled_cache/
resume_matcher/

38
.env.example Normal file

@@ -0,0 +1,38 @@
# .env.example — copy to .env
# Auto-generated by the setup wizard, or fill in manually.
# NEVER commit .env to git.
STREAMLIT_PORT=8501
OLLAMA_PORT=11434
VLLM_PORT=8000
SEARXNG_PORT=8888
VISION_PORT=8002
VISION_MODEL=vikhyatk/moondream2
VISION_REVISION=2025-01-09
DOCS_DIR=~/Documents/JobSearch
OLLAMA_MODELS_DIR=~/models/ollama
VLLM_MODELS_DIR=~/models/vllm
VLLM_MODEL=Ouro-1.4B
OLLAMA_DEFAULT_MODEL=llama3.2:3b
# API keys (required for remote profile)
ANTHROPIC_API_KEY=
OPENAI_COMPAT_URL=
OPENAI_COMPAT_KEY=
# Feedback button — Forgejo issue filing
FORGEJO_API_TOKEN=
FORGEJO_REPO=pyr0ball/peregrine
FORGEJO_API_URL=https://git.opensourcesolarpunk.com/api/v1
# GITHUB_TOKEN= # future — enable when public mirror is active
# GITHUB_REPO= # future
# Cloud multi-tenancy (compose.cloud.yml only — do not set for local installs)
CLOUD_MODE=false
CLOUD_DATA_ROOT=/devl/menagerie-data
DIRECTUS_JWT_SECRET= # must match website/.env DIRECTUS_SECRET value
CF_SERVER_SECRET= # random 64-char hex — generate: openssl rand -hex 32
PLATFORM_DB_URL=postgresql://cf_platform:<password>@host.docker.internal:5433/circuitforge_platform
HEIMDALL_URL=http://cf-license:8000 # internal Docker URL; override for external access
HEIMDALL_ADMIN_TOKEN= # must match ADMIN_TOKEN in circuitforge-license .env
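A minimal sketch of how port variables like these are typically consumed on the Python side. The `env_port` helper is hypothetical; the app's actual config loader is not part of this diff.

```python
import os

def env_port(name: str, default: int) -> int:
    """Read a port from the environment, falling back to the documented default.

    Uses `or` rather than a plain .get default so an empty-string value
    (common with templated .env files) also falls back instead of crashing int().
    """
    return int(os.environ.get(name) or default)

STREAMLIT_PORT = env_port("STREAMLIT_PORT", 8501)
SEARXNG_PORT = env_port("SEARXNG_PORT", 8888)
```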

@@ -0,0 +1,30 @@
---
name: Bug report
about: Something isn't working correctly
labels: bug
---
## Describe the bug
<!-- A clear description of what went wrong. -->
## Steps to reproduce
1.
2.
3.
## Expected behaviour
## Actual behaviour
<!-- Paste relevant log output below (redact any API keys or personal info): -->
```
```
## Environment
- Peregrine version: <!-- output of `./manage.sh status` or git tag -->
- OS:
- Runtime: Docker / conda-direct
- GPU profile: remote / cpu / single-gpu / dual-gpu

@@ -0,0 +1,26 @@
---
name: Feature request
about: Suggest an improvement or new capability
labels: enhancement
---
## Problem statement
<!-- What are you trying to do that's currently hard or impossible? -->
## Proposed solution
## Alternatives considered
## Which tier would this belong to?
- [ ] Free
- [ ] Paid
- [ ] Premium
- [ ] Ultra (human-in-the-loop)
- [ ] Not sure
## Would you be willing to contribute a PR?
- [ ] Yes
- [ ] No

.githooks/commit-msg Executable file
@@ -0,0 +1,32 @@
#!/usr/bin/env bash
# .githooks/commit-msg — enforces conventional commit format
# Format: type: description OR type(scope): description
set -euo pipefail
RED='\033[0;31m'; YELLOW='\033[1;33m'; NC='\033[0m'
VALID_TYPES="feat|fix|docs|chore|test|refactor|perf|ci|build"
MSG_FILE="$1"
MSG=$(head -1 "$MSG_FILE")
if [[ -z "${MSG// }" ]]; then
echo -e "${RED}Commit rejected:${NC} Commit message is empty."
exit 1
fi
if ! echo "$MSG" | grep -qE "^($VALID_TYPES)(\(.+\))?: .+"; then
echo -e "${RED}Commit rejected:${NC} Message does not follow conventional commit format."
echo ""
echo -e " Required: ${YELLOW}type: description${NC} or ${YELLOW}type(scope): description${NC}"
echo -e " Valid types: ${YELLOW}$VALID_TYPES${NC}"
echo ""
echo -e " Your message: ${YELLOW}$MSG${NC}"
echo ""
echo -e " Examples:"
echo -e " ${YELLOW}feat: add cover letter refinement${NC}"
echo -e " ${YELLOW}fix(wizard): handle missing user.yaml gracefully${NC}"
echo -e " ${YELLOW}docs: update tier system reference${NC}"
exit 1
fi
exit 0
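The hook's acceptance rule can be exercised directly in a shell. The regex is copied from the hook above; `check` is a throwaway helper, not part of the repo.

```shell
# Try the hook's conventional-commit regex against sample messages.
VALID_TYPES="feat|fix|docs|chore|test|refactor|perf|ci|build"
check() {
  echo "$1" | grep -qE "^($VALID_TYPES)(\(.+\))?: .+" && echo OK || echo REJECTED
}
check "feat: add cover letter refinement"       # OK
check "fix(wizard): handle missing user.yaml"   # OK (optional scope in parentheses)
check "update stuff"                            # REJECTED (no type prefix)
```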

.githooks/pre-commit Executable file
@@ -0,0 +1,84 @@
#!/usr/bin/env bash
# .githooks/pre-commit — blocks sensitive files and credential patterns from being committed
set -euo pipefail
RED='\033[0;31m'; YELLOW='\033[1;33m'; BOLD='\033[1m'; NC='\033[0m'
BLOCKED=0
STAGED=$(git diff --cached --name-only --diff-filter=ACM 2>/dev/null)
if [[ -z "$STAGED" ]]; then
exit 0
fi
# ── Blocked filenames ──────────────────────────────────────────────────────────
BLOCKED_FILES=(
".env"
".env.local"
".env.production"
".env.staging"
"*.pem"
"*.key"
"*.p12"
"*.pfx"
"id_rsa"
"id_ecdsa"
"id_ed25519"
"id_dsa"
"*.ppk"
"secrets.yml"
"secrets.yaml"
"credentials.json"
"service-account*.json"
"*.keystore"
"htpasswd"
".htpasswd"
)
while IFS= read -r file; do
filename="$(basename "$file")"
for pattern in "${BLOCKED_FILES[@]}"; do
# shellcheck disable=SC2254
case "$filename" in
$pattern)
echo -e "${RED}BLOCKED:${NC} ${BOLD}$file${NC} matches blocked filename pattern '${YELLOW}$pattern${NC}'"
BLOCKED=1
;;
esac
done
done <<< "$STAGED"
# ── Blocked content patterns ───────────────────────────────────────────────────
declare -A CONTENT_PATTERNS=(
["RSA/EC private key header"]="-----BEGIN (RSA|EC|DSA|OPENSSH) PRIVATE KEY"
["AWS access key"]="AKIA[0-9A-Z]{16}"
["GitHub token"]="ghp_[A-Za-z0-9]{36}"
["Generic API key assignment"]="(api_key|API_KEY|secret_key|SECRET_KEY)\s*=\s*['\"][A-Za-z0-9_\-]{16,}"
["Stripe secret key"]="sk_(live|test)_[A-Za-z0-9]{24,}"
# Note: any bare 40-char hex string matches this (including git commit SHAs
# quoted in docs or messages), so expect false positives from this entry.
["Forgejo/Gitea token (40 hex chars)"]="[a-f0-9]{40}"
)
while IFS= read -r file; do
# Skip binary files
if git diff --cached -- "$file" | grep -qP "^\+.*\x00"; then
continue
fi
for label in "${!CONTENT_PATTERNS[@]}"; do
pattern="${CONTENT_PATTERNS[$label]}"
matches=$(git diff --cached -- "$file" | grep "^+" | grep -cP "$pattern" 2>/dev/null || true)
if [[ "$matches" -gt 0 ]]; then
echo -e "${RED}BLOCKED:${NC} ${BOLD}$file${NC} contains pattern matching '${YELLOW}$label${NC}'"
BLOCKED=1
fi
done
done <<< "$STAGED"
# ── Result ─────────────────────────────────────────────────────────────────────
if [[ "$BLOCKED" -eq 1 ]]; then
echo ""
echo -e "${RED}Commit rejected.${NC} Remove sensitive files/content before committing."
echo -e "To bypass in an emergency: ${YELLOW}git commit --no-verify${NC} (use with extreme caution)"
exit 1
fi
exit 0
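Hooks in `.githooks/` only fire once git is pointed at that directory (per CONTRIBUTING, `setup.sh` reportedly does this for you). The one-time wiring step, demonstrated in a throwaway repo:

```shell
# Point git at a repo-local hooks directory (shown in a temporary repo).
set -eu
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" config core.hooksPath .githooks
git -C "$repo" config core.hooksPath    # prints: .githooks
```

In a real clone you would run the `git config` line at the repo root and ensure the hook scripts are executable (`chmod +x .githooks/*`).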

.github/ISSUE_TEMPLATE/bug_report.md vendored Normal file
@@ -0,0 +1,30 @@
---
name: Bug report
about: Something isn't working correctly
labels: bug
---
## Describe the bug
<!-- A clear description of what went wrong. -->
## Steps to reproduce
1.
2.
3.
## Expected behaviour
## Actual behaviour
<!-- Paste relevant log output below (redact any API keys or personal info): -->
```
```
## Environment
- Peregrine version: <!-- output of `./manage.sh status` or git tag -->
- OS:
- Runtime: Docker / conda-direct
- GPU profile: remote / cpu / single-gpu / dual-gpu

.github/ISSUE_TEMPLATE/config.yml vendored Normal file
@@ -0,0 +1,5 @@
blank_issues_enabled: false
contact_links:
- name: Security vulnerability
url: mailto:security@circuitforge.tech
about: Do not open a public issue for security vulnerabilities. Email us instead.

.github/ISSUE_TEMPLATE/feature_request.md vendored Normal file
@@ -0,0 +1,26 @@
---
name: Feature request
about: Suggest an improvement or new capability
labels: enhancement
---
## Problem statement
<!-- What are you trying to do that's currently hard or impossible? -->
## Proposed solution
## Alternatives considered
## Which tier would this belong to?
- [ ] Free
- [ ] Paid
- [ ] Premium
- [ ] Ultra (human-in-the-loop)
- [ ] Not sure
## Would you be willing to contribute a PR?
- [ ] Yes
- [ ] No

.github/ISSUE_TEMPLATE/support_request.md vendored Normal file
@@ -0,0 +1,26 @@
---
name: Support Request
about: Ask a question or get help using Peregrine
title: '[Support] '
labels: question
assignees: ''
---
## What are you trying to do?
<!-- Describe what you're trying to accomplish -->
## What have you tried?
<!-- Steps you've already taken, docs you've read, etc. -->
## Environment
- OS: <!-- e.g. Ubuntu 22.04, macOS 14 -->
- Install method: <!-- Docker / Podman / source -->
- Peregrine version: <!-- run `./manage.sh status` or check the UI footer -->
- LLM backend: <!-- Ollama / vLLM / OpenAI / other -->
## Logs or screenshots
<!-- Paste relevant output from `./manage.sh logs` or attach a screenshot -->

.github/pull_request_template.md vendored Normal file
@@ -0,0 +1,27 @@
## Summary
<!-- What does this PR do? -->
## Related issue(s)
Closes #
## Type of change
- [ ] feat — new feature
- [ ] fix — bug fix
- [ ] docs — documentation only
- [ ] chore — tooling, deps, refactor
- [ ] test — test coverage
## Testing
<!-- What did you run to verify this works? -->
```bash
pytest tests/ -v
```
## CLA
- [ ] I agree that my contribution is licensed under the project's [BSL 1.1](./LICENSE-BSL) terms.

.github/workflows/ci.yml vendored Normal file
@@ -0,0 +1,29 @@
name: CI
on:
push:
branches: [main]
pull_request:
branches: [main]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install system dependencies
run: sudo apt-get update -q && sudo apt-get install -y libsqlcipher-dev
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.11"
cache: pip
- name: Install dependencies
run: pip install -r requirements.txt
- name: Run tests
run: pytest tests/ -v --tb=short

.gitignore vendored
@@ -18,3 +18,33 @@ log/
unsloth_compiled_cache/
data/survey_screenshots/*
!data/survey_screenshots/.gitkeep
config/user.yaml
config/plain_text_resume.yaml
config/.backup-*
config/integrations/*.yaml
!config/integrations/*.yaml.example
# companyScraper runtime artifacts
scrapers/.cache/
scrapers/.debug/
scrapers/raw_scrapes/
compose.override.yml
config/license.json
config/user.yaml.working
# Claude context files — kept out of version control
CLAUDE.md
data/email_score.jsonl
data/email_label_queue.jsonl
data/email_compare_sample.jsonl
config/label_tool.yaml
config/server.yaml
demo/data/*.db
demo/seed_demo.py
# Git worktrees
.worktrees/

.gitleaks.toml Normal file
@@ -0,0 +1,32 @@
# peregrine/.gitleaks.toml — per-repo allowlists extending the shared base config
[extend]
path = "/Library/Development/CircuitForge/circuitforge-hooks/gitleaks.toml"
[allowlist]
description = "Peregrine-specific allowlists"
paths = [
'docs/plans/.*', # plan docs contain example tokens and placeholders
'docs/reference/.*', # reference docs (globally excluded in base config)
'tests/.*', # test fixtures use fake phone numbers as job IDs
'scripts/integrations/apple_calendar\.py', # you@icloud.com is a placeholder comment
# Streamlit app files: key= params are widget identifiers, not secrets
'app/feedback\.py',
'app/pages/2_Settings\.py',
'app/pages/7_Survey\.py',
# SearXNG default config: change-me-in-production is a well-known public placeholder
'docker/searxng/settings\.yml',
]
regexes = [
# Job listing numeric IDs (look like phone numbers to the phone rule)
'\d{10}\.html', # Craigslist listing IDs
'\d{10}\/', # LinkedIn job IDs in URLs
# Localhost port patterns (look like phone numbers)
'localhost:\d{4,5}',
# Unix epoch timestamps in the 2025-2026 range (10-digit, look like phone numbers)
'174\d{7}',
# Example / placeholder license key patterns
'CFG-[A-Z]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}',
# Phone number false positives: 555 area code variants not caught by base allowlist
'555\) \d{3}-\d{4}',
'555-\d{3}-\d{4}',
]

CHANGELOG.md Normal file
@@ -0,0 +1,129 @@
# Changelog
All notable changes to Peregrine are documented here.
Format follows [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
---
## [Unreleased]
---
## [0.4.0] — 2026-03-13
### Added
- **LinkedIn profile import** — one-click import from a public LinkedIn profile URL
(Playwright headless Chrome, no login required) or from a LinkedIn data export zip.
Staged to `linkedin_stage.json` so the profile is parsed once and reused across
sessions without repeated network requests. Available on all tiers including Free.
- `scripts/linkedin_utils.py` — HTML parser with ordered CSS selector fallbacks;
extracts name, experience, education, skills, certifications, summary
- `scripts/linkedin_scraper.py` — Playwright URL scraper + export zip CSV parser;
atomic staging file write; URL validation; robust error handling
- `scripts/linkedin_parser.py` — staging file reader; re-runs HTML parser on stored
raw HTML so selector improvements apply without re-scraping
- `app/components/linkedin_import.py` — shared Streamlit widget (status bar, preview,
URL import, advanced zip upload) used by both wizard and Settings
- Wizard step 3: new "🔗 LinkedIn" tab alongside Upload and Build Manually
- Settings → Resume Profile: collapsible "Import from LinkedIn" expander
- Dockerfile: Playwright Chromium install added to Docker image
### Fixed
- **Cloud mode perpetual onboarding loop** — wizard gate in `app.py` now reads
`get_config_dir()/user.yaml` (per-user in cloud, repo-level locally) instead of a
hardcoded repo path; completing the wizard now correctly exits it in cloud mode
- **Cloud resume YAML path** — wizard step 3 writes resume to per-user `CONFIG_DIR`
instead of the shared repo `config/` (would have merged all cloud users' data)
- **Cloud session redirect** — missing/invalid session token now JS-redirects to
`circuitforge.tech/login` instead of showing a raw error message
- Removed remaining AIHawk UI references (`Home.py`, `4_Apply.py`, `migrate.py`)
---
## [0.3.0] — 2026-03-06
### Added
- **Feedback button** — in-app issue reporting with screenshot paste support; posts
directly to Forgejo as structured issues; available from sidebar on all pages
(`app/feedback.py`, `scripts/feedback_api.py`, `app/components/paste_image.py`)
- **BYOK cloud backend detection** — `scripts/byok_guard.py`: pure Python detection
engine with full unit test coverage (18 tests); classifies backends as cloud or local
based on type, `base_url` heuristic, and opt-out `local: true` flag
- **BYOK activation warning** — one-time acknowledgment required in Settings when a
new cloud LLM backend is enabled; shows data inventory (what leaves your machine,
what stays local), provider policy links; ack state persisted to `config/user.yaml`
under `byok_acknowledged_backends`
- **Sidebar cloud LLM indicator** — amber badge on every page when any cloud backend
is active; links to Settings; disappears when reverted to local-only config
- **LLM suggest: search terms** — three-angle analysis from resume (job titles,
skills keywords, and exclude terms to filter irrelevant listings)
- **LLM suggest: resume keywords** — skills gap analysis against job descriptions
- **LLM Suggest button** in Settings → Search → Skills & Keywords section
- **Backup/restore script** (`scripts/backup.py`) — multi-instance and legacy support
- `PRIVACY.md` — short-form privacy notice linked from Settings
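The cloud-vs-local heuristic described for `scripts/byok_guard.py` above can be sketched roughly as follows. This is a hypothetical reconstruction from the changelog wording (backend type, `base_url` heuristic, opt-out `local: true` flag); the real module's API and host lists are not shown in this diff.

```python
# Hypothetical sketch of the BYOK cloud-vs-local classification described above.
# Backend dict shape, CLOUD_TYPES, and LOCAL_HOSTS are illustrative assumptions.
from urllib.parse import urlparse

LOCAL_HOSTS = {"localhost", "127.0.0.1", "0.0.0.0", "host.docker.internal"}
CLOUD_TYPES = {"anthropic", "openai"}   # backend types that are always remote

def is_cloud_backend(backend: dict) -> bool:
    if backend.get("local"):            # explicit opt-out flag wins
        return False
    if backend.get("type") in CLOUD_TYPES:
        return True
    # Otherwise classify by where base_url points.
    host = urlparse(backend.get("base_url", "")).hostname or ""
    return bool(host) and host not in LOCAL_HOSTS
```

Checking the opt-out flag first means a self-hosted OpenAI-compatible endpoint on a LAN hostname can still be declared local by the user, which matches the changelog's description of the flag.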
### Changed
- Settings save button for LLM Backends now gates on cloud acknowledgment before
writing `config/llm.yaml`
### Fixed
- Settings widget crash on certain rerun paths
- Docker service controls in Settings → System tab
- `DEFAULT_DB` now respects `STAGING_DB` environment variable (was silently ignoring it)
- `generate()` in cover letter refinement now correctly passes `max_tokens` kwarg
### Security / Privacy
- Full test suite anonymized — fictional "Alex Rivera" replaces all real personal data
in test fixtures (`tests/test_cover_letter.py`, `test_imap_sync.py`,
`test_classifier_adapters.py`, `test_db.py`)
- Complete PII scrub from git history: real name, email address, and phone number
removed from all 161 commits across both branches via `git filter-repo`
---
## [0.2.0] — 2026-02-26
### Added
- Cover letter iterative refinement: "Refine with Feedback" expander in Apply Workspace;
`generate()` accepts `previous_result`/`feedback`; task params passed through `submit_task`
- Expanded first-run wizard: 7-step onboarding with GPU detection, tier selection,
resume upload/parsing, LLM inference test, search profile builder, integration cards
- Tier system: free / paid / premium feature gates (`app/wizard/tiers.py`)
- 13 integration drivers: Notion, Google Sheets, Airtable, Google Drive, Dropbox,
OneDrive, MEGA, Nextcloud, Google Calendar, Apple Calendar, Slack, Discord,
Home Assistant — with auto-discovery registry
- Resume parser: PDF (pdfplumber) and DOCX (python-docx) + LLM structuring
- `wizard_generate` background task type with iterative refinement (feedback loop)
- Dismissible setup banners on Home page (13 contextual prompts)
- Developer tab in Settings: tier override selectbox and wizard reset button
- Integrations tab in Settings: connect / test / disconnect all 12 non-Notion drivers
- HuggingFace token moved to Developer tab
- `params` column in `background_tasks` for wizard task payloads
- `wizard_complete`, `wizard_step`, `tier`, `dev_tier_override`, `dismissed_banners`,
`effective_tier` added to UserProfile
- MkDocs documentation site (Material theme, 20 pages)
- `LICENSE-MIT` and `LICENSE-BSL`, `CONTRIBUTING.md`, `CHANGELOG.md`
### Changed
- `app.py` wizard gate now checks `wizard_complete` flag in addition to file existence
- Settings tabs reorganised: Integrations tab added, Developer tab conditionally shown
- HF token removed from Services tab (now Developer-only)
### Removed
- Dead `app/pages/3_Resume_Editor.py` (functionality lives in Settings → Resume Profile)
---
## [0.1.0] — 2026-02-01
### Added
- Initial release: JobSpy discovery pipeline, SQLite staging, Streamlit UI
- Job Review, Apply Workspace, Interviews kanban, Interview Prep, Survey Assistant
- LLM router with fallback chain (Ollama, vLLM, Claude Code wrapper, Anthropic)
- Notion sync, email sync with IMAP classifier, company research with SearXNG
- Background task runner with daemon threads
- Vision service (moondream2) for survey screenshot analysis
- Adzuna, The Ladders, and Craigslist custom board scrapers
- Docker Compose profiles: remote, cpu, single-gpu, dual-gpu
- `setup.sh` cross-platform dependency installer
- `scripts/preflight.py` and `scripts/migrate.py`

CLAUDE.md
@@ -1,212 +0,0 @@
# Job Seeker Platform — Claude Context
## Project
Automated job discovery + resume matching + application pipeline for Alex Rivera.
Full pipeline:
```
JobSpy → discover.py → SQLite (staging.db) → match.py → Job Review UI
→ Apply Workspace (cover letter + PDF) → Interviews kanban
→ phone_screen → interviewing → offer → hired
Notion DB (synced via sync.py)
```
## Environment
- Python env: `conda run -n job-seeker <cmd>` — always use this, never bare python
- Run tests: `/devl/miniconda3/envs/job-seeker/bin/pytest tests/ -v`
(use direct binary — `conda run pytest` can spawn runaway processes)
- Run discovery: `conda run -n job-seeker python scripts/discover.py`
- Recreate env: `conda env create -f environment.yml`
- pytest.ini scopes test collection to `tests/` only — never widen this
## ⚠️ AIHawk env isolation — CRITICAL
- NEVER `pip install -r aihawk/requirements.txt` into the job-seeker env
- AIHawk pulls torch + CUDA (~7GB) which causes OOM during test runs
- AIHawk must run in its own env: `conda create -n aihawk-env python=3.12`
- job-seeker env must stay lightweight (no torch, no sentence-transformers, no CUDA)
## Web UI (Streamlit)
- Run: `bash scripts/manage-ui.sh start` → http://localhost:8501
- Manage: `start | stop | restart | status | logs`
- Direct binary: `/devl/miniconda3/envs/job-seeker/bin/streamlit run app/app.py`
- Entry point: `app/app.py` (uses `st.navigation()` — do NOT run `app/Home.py` directly)
- `staging.db` is gitignored — SQLite staging layer between discovery and Notion
### Pages
| Page | File | Purpose |
|------|------|---------|
| Home | `app/Home.py` | Dashboard, discovery trigger, danger-zone purge |
| Job Review | `app/pages/1_Job_Review.py` | Batch approve/reject with sorting |
| Settings | `app/pages/2_Settings.py` | LLM backends, search profiles, Notion, services |
| Resume Profile | Settings → Resume Profile tab | Edit AIHawk YAML profile (was standalone `3_Resume_Editor.py`) |
| Apply Workspace | `app/pages/4_Apply.py` | Cover letter gen + PDF export + mark applied + reject listing |
| Interviews | `app/pages/5_Interviews.py` | Kanban: phone_screen→interviewing→offer→hired |
| Interview Prep | `app/pages/6_Interview_Prep.py` | Live reference sheet during calls + Practice Q&A |
| Survey Assistant | `app/pages/7_Survey.py` | Culture-fit survey help: text paste + screenshot (moondream2) |
## Job Status Pipeline
```
pending → approved/rejected (Job Review)
approved → applied (Apply Workspace — mark applied)
approved → rejected (Apply Workspace — reject listing button)
applied → survey (Interviews — "📋 Survey" button; pre-kanban section)
applied → phone_screen (Interviews — triggers company research)
survey → phone_screen (Interviews — after survey completed)
phone_screen → interviewing
interviewing → offer
offer → hired
any stage → rejected (rejection_stage captured for analytics)
applied/approved → synced (sync.py → Notion)
```
## SQLite Schema (`staging.db`)
### `jobs` table key columns
- Standard: `id, title, company, url, source, location, is_remote, salary, description`
- Scores: `match_score, keyword_gaps`
- Dates: `date_found, applied_at, survey_at, phone_screen_at, interviewing_at, offer_at, hired_at`
- Interview: `interview_date, rejection_stage`
- Content: `cover_letter, notion_page_id`
### Additional tables
- `job_contacts` — email thread log per job (direction, subject, from/to, body, received_at)
- `company_research` — LLM-generated brief per job (company_brief, ceo_brief, talking_points, raw_output, accessibility_brief)
- `background_tasks` — async LLM task queue (task_type, job_id, status: queued/running/completed/failed)
- `survey_responses` — per-job Q&A pairs (survey_name, received_at, source, raw_input, image_path, mode, llm_output, reported_score)
## Scripts
| Script | Purpose |
|--------|---------|
| `scripts/discover.py` | JobSpy + custom board scrape → SQLite insert |
| `scripts/custom_boards/adzuna.py` | Adzuna Jobs API (app_id + app_key in config/adzuna.yaml) |
| `scripts/custom_boards/theladders.py` | The Ladders scraper via curl_cffi + __NEXT_DATA__ SSR parse |
| `scripts/match.py` | Resume keyword matching → match_score |
| `scripts/sync.py` | Push approved/applied jobs to Notion |
| `scripts/llm_router.py` | LLM fallback chain (reads config/llm.yaml) |
| `scripts/generate_cover_letter.py` | Cover letter via LLM; detects mission-aligned companies (music/animal welfare/education) and injects Para 3 hint |
| `scripts/company_research.py` | Pre-interview brief via LLM + optional SearXNG scrape; includes Inclusion & Accessibility section |
| `scripts/prepare_training_data.py` | Extract cover letter JSONL for fine-tuning |
| `scripts/finetune_local.py` | Unsloth QLoRA fine-tune on local GPU |
| `scripts/db.py` | All SQLite helpers (single source of truth) |
| `scripts/task_runner.py` | Background thread executor — `submit_task(db, type, job_id)` dispatches daemon threads for LLM jobs |
| `scripts/vision_service/main.py` | FastAPI moondream2 inference on port 8002; `manage-vision.sh` lifecycle |
## LLM Router
- Config: `config/llm.yaml`
- Cover letter fallback order: `claude_code → ollama (alex-cover-writer:latest) → vllm → copilot → anthropic`
- Research fallback order: `claude_code → vllm (__auto__, ouroboros) → ollama_research (llama3.1:8b) → ...`
- `alex-cover-writer:latest` is cover-letter only — it doesn't follow structured markdown prompts for research
- `LLMRouter.complete()` accepts `fallback_order=` override for per-task routing
- `LLMRouter.complete()` accepts `images: list[str]` (base64) — vision backends only; non-vision backends skipped when images present
- Vision fallback order config key: `vision_fallback_order: [vision_service, claude_code, anthropic]`
- `vision_service` backend type: POST to `/analyze`; skipped automatically when no images provided
- Claude Code wrapper: `/Library/Documents/Post Fight Processing/server-openai-wrapper-v2.js`
- Copilot wrapper: `/Library/Documents/Post Fight Processing/manage-copilot.sh start`
## Fine-Tuned Model
- Model: `alex-cover-writer:latest` registered in Ollama
- Base: `unsloth/Llama-3.2-3B-Instruct` (QLoRA, rank 16, 10 epochs)
- Training data: 62 cover letters from `/Library/Documents/JobSearch/`
- JSONL: `/Library/Documents/JobSearch/training_data/cover_letters.jsonl`
- Adapter: `/Library/Documents/JobSearch/training_data/finetune_output/adapter/`
- Merged: `/Library/Documents/JobSearch/training_data/gguf/alex-cover-writer/`
- Re-train: `conda run -n ogma python scripts/finetune_local.py`
(uses `ogma` env with unsloth + trl; pin to GPU 0 with `CUDA_VISIBLE_DEVICES=0`)
## Background Tasks
- Cover letter gen and company research run as daemon threads via `scripts/task_runner.py`
- Tasks survive page navigation; results written to existing tables when done
- On server restart, `app.py` startup clears any stuck `running`/`queued` rows to `failed`
- Dedup: only one queued/running task per `(task_type, job_id)` at a time
- Sidebar indicator (`app/app.py`) polls every 3s via `@st.fragment(run_every=3)`
- ⚠️ Streamlit fragment + sidebar: use `with st.sidebar: _fragment()` — sidebar context must WRAP the call, not be inside the fragment body
## Vision Service
- Script: `scripts/vision_service/main.py` (FastAPI, port 8002)
- Model: `vikhyatk/moondream2` revision `2025-01-09` — lazy-loaded on first `/analyze` (~1.8GB download)
- GPU: 4-bit quantization when CUDA available (~1.5GB VRAM); CPU fallback
- Conda env: `job-seeker-vision` — separate from job-seeker (torch + transformers live here)
- Create env: `conda env create -f scripts/vision_service/environment.yml`
- Manage: `bash scripts/manage-vision.sh start|stop|restart|status|logs`
- Survey page degrades gracefully to text-only when vision service is down
- ⚠️ Never install vision deps (torch, bitsandbytes, transformers) into the job-seeker env
## Company Research
- Script: `scripts/company_research.py`
- Auto-triggered when a job moves to `phone_screen` in the Interviews kanban
- Three-phase: (1) SearXNG company scrape → (1b) SearXNG news snippets → (2) LLM synthesis
- SearXNG scraper: `/Library/Development/scrapers/companyScraper.py`
- SearXNG Docker: run `docker compose up -d` from `/Library/Development/scrapers/SearXNG/` (port 8888)
- `beautifulsoup4` and `fake-useragent` are installed in job-seeker env (required for scraper)
- News search hits `/search?format=json` — JSON format must be enabled in `searxng-config/settings.yml`
- ⚠️ `settings.yml` owned by UID 977 (container user) — use `docker cp` to update, not direct writes
- ⚠️ `settings.yml` requires `use_default_settings: true` at the top or SearXNG fails schema validation
- `companyScraper` calls `sys.exit()` on missing deps — use `except BaseException` not `except Exception`
## Email Classifier Labels
Six labels: `interview_request`, `rejection`, `offer`, `follow_up`, `survey_received`, `other`
- `survey_received` — links or requests to complete a culture-fit survey/assessment
## Services (managed via Settings → Services tab)
| Service | Port | Notes |
|---------|------|-------|
| Streamlit UI | 8501 | `bash scripts/manage-ui.sh start` |
| Ollama | 11434 | `sudo systemctl start ollama` |
| Claude Code Wrapper | 3009 | `manage-services.sh start` in Post Fight Processing |
| GitHub Copilot Wrapper | 3010 | `manage-copilot.sh start` in Post Fight Processing |
| vLLM Server | 8000 | Manual start only |
| SearXNG | 8888 | `docker compose up -d` in scrapers/SearXNG/ |
| Vision Service | 8002 | `bash scripts/manage-vision.sh start` — moondream2 survey screenshot analysis |
## Notion
- DB: "Tracking Job Applications" (ID: `1bd75cff-7708-8007-8c00-f1de36620a0a`)
- `config/notion.yaml` is gitignored (live token); `.example` is committed
- Field names are non-obvious — always read from `field_map` in `config/notion.yaml`
- "Salary" = Notion title property (unusual — it's the page title field)
- "Job Source" = `multi_select` type
- "Role Link" = URL field
- "Status of Application" = status field; new listings use "Application Submitted"
- Sync pushes `approved` + `applied` jobs; marks them `synced` after
## Key Config Files
- `config/notion.yaml` — gitignored, has token + field_map
- `config/notion.yaml.example` — committed template
- `config/search_profiles.yaml` — titles, locations, boards, custom_boards, exclude_keywords, mission_tags (per profile)
- `config/llm.yaml` — LLM backend priority chain + enabled flags
- `config/tokens.yaml` — gitignored, stores HF token (chmod 600)
- `config/adzuna.yaml` — gitignored, Adzuna API app_id + app_key
- `config/adzuna.yaml.example` — committed template
## Custom Job Board Scrapers
- `scripts/custom_boards/adzuna.py` — Adzuna Jobs API; credentials in `config/adzuna.yaml`
- `scripts/custom_boards/theladders.py` — The Ladders SSR scraper; needs `curl_cffi` installed
- Scrapers registered in `CUSTOM_SCRAPERS` dict in `discover.py`
- Activated per-profile via `custom_boards: [adzuna, theladders]` in `search_profiles.yaml`
- `enrich_all_descriptions()` in `enrich_descriptions.py` covers all sources (not just Glassdoor)
- Home page "Fill Missing Descriptions" button dispatches `enrich_descriptions` task
## Mission Alignment & Accessibility
- Preferred industries: music, animal welfare, children's education (hardcoded in `generate_cover_letter.py`)
- `detect_mission_alignment(company, description)` injects a Para 3 hint into cover letters for aligned companies
- Company research includes an "Inclusion & Accessibility" section (8th section of the brief) in every brief
- Accessibility search query in `_SEARCH_QUERIES` hits SearXNG for ADA/ERG/disability signals
- `accessibility_brief` column in `company_research` table; shown in Interview Prep under ♿ section
- This info is for personal decision-making ONLY — never disclosed in applications
- In generalization: these become `profile.mission_industries` + `profile.accessibility_priority` in `user.yaml`
## Document Rule
Resumes and cover letters live in `/Library/Documents/JobSearch/` or Notion — never committed to this repo.
## AIHawk (LinkedIn Easy Apply)
- Cloned to `aihawk/` (gitignored)
- Config: `aihawk/data_folder/plain_text_resume.yaml` — search FILL_IN for gaps
- Self-ID: non-binary, pronouns any, no disability/drug-test disclosure
- Run: `conda run -n job-seeker python aihawk/main.py`
- Playwright: `conda run -n job-seeker python -m playwright install chromium`
## Git Remote
- Forgejo self-hosted at https://git.opensourcesolarpunk.com (username: pyr0ball)
- `git remote add origin https://git.opensourcesolarpunk.com/pyr0ball/job-seeker.git`
## Subagents
Use `general-purpose` subagent type (not `Bash`) when tasks require file writes.

CONTRIBUTING.md Normal file
@@ -0,0 +1,83 @@
# Contributing to Peregrine
Thanks for your interest. Peregrine is developed primarily at
[git.opensourcesolarpunk.com](https://git.opensourcesolarpunk.com/pyr0ball/peregrine).
GitHub and Codeberg are push mirrors — issues and PRs are welcome on either platform.
---
## License
Peregrine is licensed under **[BSL 1.1](./LICENSE-BSL)** — Business Source License.
What this means for you:
| Use case | Allowed? |
|----------|----------|
| Personal self-hosting, non-commercial | ✅ Free |
| Contributing code, fixing bugs, writing docs | ✅ Free |
| Commercial SaaS / hosted service | 🔒 Requires a paid license |
| After 4 years from each release date | ✅ Converts to MIT |
**By submitting a pull request you agree that your contribution is licensed under the
project's BSL 1.1 terms.** The PR template includes this as a checkbox.
---
## Dev Setup
See [`docs/getting-started/installation.md`](docs/getting-started/installation.md) for
full instructions.
**Quick start (Docker — recommended):**
```bash
git clone https://git.opensourcesolarpunk.com/pyr0ball/peregrine.git
cd peregrine
./setup.sh # installs deps, activates git hooks
./manage.sh start
```
**Conda (no Docker):**
```bash
conda run -n job-seeker pip install -r requirements.txt
streamlit run app/app.py
```
---
## Commit Format
Hooks enforce [Conventional Commits](https://www.conventionalcommits.org/):
```
type: short description
type(scope): short description
```
Valid types: `feat` `fix` `docs` `chore` `test` `refactor` `perf` `ci` `build`
The hook will tell you exactly what went wrong if your message is rejected.
---
## Pull Request Process
1. Fork and branch from `main`
2. Write tests first (we use `pytest`)
3. Run `pytest tests/ -v` — all tests must pass
4. Open a PR on GitHub or Codeberg
5. PRs are reviewed and cherry-picked to Forgejo (the canonical repo) — you don't need a Forgejo account
---
## Reporting Issues
Use the issue templates:
- **Bug** — steps to reproduce, version, OS, Docker or conda, logs
- **Feature** — problem statement, proposed solution, which tier it belongs to
**Security issues:** Do **not** open a public issue. Email `security@circuitforge.tech`.
See [SECURITY.md](./SECURITY.md).

Dockerfile Normal file
@@ -0,0 +1,30 @@
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
# System deps for companyScraper (beautifulsoup4, fake-useragent, lxml) and PDF gen
# libsqlcipher-dev: required to build pysqlcipher3 (SQLCipher AES-256 encryption for cloud mode)
RUN apt-get update && apt-get install -y --no-install-recommends \
gcc libffi-dev curl libsqlcipher-dev \
&& rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Install Playwright browser (cached separately from Python deps so requirements
# changes don't bust the ~600900 MB Chromium layer and vice versa)
RUN playwright install chromium && playwright install-deps chromium
# Bundle companyScraper (company research web scraper)
COPY scrapers/ /app/scrapers/
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app/app.py", \
"--server.port=8501", \
"--server.headless=true", \
"--server.fileWatcherType=none"]

38
Dockerfile.finetune Normal file
View file

@ -0,0 +1,38 @@
# Dockerfile.finetune — Cover letter LoRA fine-tuner (QLoRA via unsloth)
# Large image (~12-15 GB after build). Built once, cached on rebuilds.
# GPU strongly recommended. CPU fallback works but training is very slow.
#
# Tested base: pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime
# If your GPU requires a different CUDA version, change the FROM line and
# reinstall bitsandbytes for the matching CUDA (e.g. bitsandbytes-cuda121).
FROM pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime
WORKDIR /app
# Build tools needed by bitsandbytes CUDA kernels and unsloth
RUN apt-get update && apt-get install -y --no-install-recommends \
gcc g++ git libgomp1 \
&& rm -rf /var/lib/apt/lists/*
# Install training stack.
# unsloth detects CUDA version automatically from the base image.
RUN pip install --no-cache-dir \
"unsloth @ git+https://github.com/unslothai/unsloth.git" \
"datasets>=2.18" "trl>=0.8" peft transformers \
"bitsandbytes>=0.43.0" accelerate sentencepiece \
requests pyyaml
COPY scripts/ /app/scripts/
COPY config/ /app/config/
ENV PYTHONUNBUFFERED=1
# Pin to GPU 0; overridable at runtime with --env CUDA_VISIBLE_DEVICES=
ENV CUDA_VISIBLE_DEVICES=0
# Runtime env vars injected by compose.yml:
# OLLAMA_URL — Ollama API base (default: http://ollama:11434)
# OLLAMA_MODELS_MOUNT — finetune container's mount path for ollama models volume
# OLLAMA_MODELS_OLLAMA_PATH — Ollama container's mount path for same volume
# DOCS_DIR — cover letters + training data root (default: /docs)
ENTRYPOINT ["python", "scripts/finetune_local.py"]

26
LICENSE-BSL Normal file
View file

@ -0,0 +1,26 @@
Business Source License 1.1
Licensor: Circuit Forge LLC
Licensed Work: Peregrine — AI-powered job search pipeline
Copyright (c) 2026 Circuit Forge LLC
Additional Use Grant: You may use the Licensed Work for personal,
non-commercial job searching purposes only.
Change Date: 2030-01-01
Change License: MIT License
For the full Business Source License 1.1 text, see:
https://mariadb.com/bsl11/
---
This license applies to the following components of Peregrine:
- scripts/llm_router.py
- scripts/generate_cover_letter.py
- scripts/company_research.py
- scripts/task_runner.py
- scripts/resume_parser.py
- scripts/imap_sync.py
- scripts/vision_service/
- scripts/integrations/
- app/

35
LICENSE-MIT Normal file
View file

@ -0,0 +1,35 @@
MIT License
Copyright (c) 2026 Circuit Forge LLC
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
---
This license applies to the following components of Peregrine:
- scripts/discover.py
- scripts/custom_boards/
- scripts/match.py
- scripts/db.py
- scripts/migrate.py
- scripts/preflight.py
- scripts/user_profile.py
- setup.sh
- Makefile

84
Makefile Normal file
View file

@ -0,0 +1,84 @@
# Makefile — Peregrine convenience targets
# Usage: make <target>
.PHONY: setup preflight start stop restart logs test prepare-training finetune clean help
PROFILE ?= remote
PYTHON ?= python3
# Auto-detect container engine: prefer docker compose, fall back to podman
COMPOSE ?= $(shell \
command -v docker >/dev/null 2>&1 && docker compose version >/dev/null 2>&1 \
&& echo "docker compose" \
|| (command -v podman >/dev/null 2>&1 \
&& podman compose version >/dev/null 2>&1 \
&& echo "podman compose" \
|| echo "podman-compose"))
# GPU profiles require an overlay for NVIDIA device reservations.
# Docker uses deploy.resources (compose.gpu.yml); Podman uses CDI device specs (compose.podman-gpu.yml).
# Generate CDI spec for Podman first: sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
#
# NOTE: When explicit -f flags are used, Docker Compose does NOT auto-detect
# compose.override.yml. We must include it explicitly when present.
OVERRIDE_FILE := $(wildcard compose.override.yml)
COMPOSE_OVERRIDE := $(if $(OVERRIDE_FILE),-f compose.override.yml,)
DUAL_GPU_MODE ?= $(shell grep -m1 '^DUAL_GPU_MODE=' .env 2>/dev/null | cut -d= -f2 || echo ollama)
COMPOSE_FILES := -f compose.yml $(COMPOSE_OVERRIDE)
ifneq (,$(findstring podman,$(COMPOSE)))
ifneq (,$(findstring gpu,$(PROFILE)))
COMPOSE_FILES := -f compose.yml $(COMPOSE_OVERRIDE) -f compose.podman-gpu.yml
endif
else
ifneq (,$(findstring gpu,$(PROFILE)))
COMPOSE_FILES := -f compose.yml $(COMPOSE_OVERRIDE) -f compose.gpu.yml
endif
endif
ifeq ($(PROFILE),dual-gpu)
COMPOSE_FILES += --profile dual-gpu-$(DUAL_GPU_MODE)
endif
# 'remote' means base services only — no services are tagged 'remote' in compose.yml,
# so --profile remote is a no-op with Docker and a fatal error on old podman-compose.
# Only pass --profile for profiles that actually activate optional services.
PROFILE_ARG := $(if $(filter remote,$(PROFILE)),,--profile $(PROFILE))
setup: ## Install dependencies (Docker or Podman + NVIDIA toolkit)
@bash setup.sh
preflight: ## Check ports + system resources; write .env
@$(PYTHON) scripts/preflight.py
start: preflight ## Preflight check then start Peregrine (PROFILE=remote|cpu|single-gpu|dual-gpu)
$(COMPOSE) $(COMPOSE_FILES) $(PROFILE_ARG) up -d
stop: ## Stop all Peregrine services
$(COMPOSE) down
restart: ## Stop services, re-run preflight (ports now free), then start
$(COMPOSE) down
@$(PYTHON) scripts/preflight.py
$(COMPOSE) $(COMPOSE_FILES) $(PROFILE_ARG) up -d
logs: ## Tail app logs
$(COMPOSE) logs -f app
test: ## Run the test suite
@$(PYTHON) -m pytest tests/ -v
prepare-training: ## Scan docs_dir for cover letters and build training JSONL
$(COMPOSE) $(COMPOSE_FILES) run --rm app python scripts/prepare_training_data.py
finetune: ## Fine-tune your personal cover letter model (run prepare-training first)
@echo "Starting fine-tune (30-90 min on GPU, much longer on CPU)..."
$(COMPOSE) $(COMPOSE_FILES) -f compose.gpu.yml --profile finetune run --rm finetune
clean: ## Remove containers, images, and data volumes (DESTRUCTIVE)
@echo "WARNING: This will delete all Peregrine containers and data."
@read -p "Type 'yes' to confirm: " confirm && [ "$$confirm" = "yes" ]
$(COMPOSE) down --rmi local --volumes
help: ## Show this help
@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | \
awk 'BEGIN {FS = ":.*?## "}; {printf " \033[36m%-12s\033[0m %s\n", $$1, $$2}'

7
PRIVACY.md Normal file
View file

@ -0,0 +1,7 @@
# Privacy Policy
CircuitForge LLC's privacy policy applies to this product and is published at:
**<https://circuitforge.tech/privacy>**
Last reviewed: March 2026.

206
README.md Normal file
View file

@ -0,0 +1,206 @@
# Peregrine
> **Primary development** happens at [git.opensourcesolarpunk.com](https://git.opensourcesolarpunk.com/pyr0ball/peregrine) — GitHub and Codeberg are push mirrors. Issues and PRs are welcome on either platform.
[![License: BSL 1.1](https://img.shields.io/badge/License-BSL_1.1-blue.svg)](./LICENSE-BSL)
[![CI](https://github.com/CircuitForge/peregrine/actions/workflows/ci.yml/badge.svg)](https://github.com/CircuitForge/peregrine/actions/workflows/ci.yml)
**AI-powered job search pipeline — by [Circuit Forge LLC](https://circuitforge.tech)**
> *"Don't be evil, for real and forever."*
Automates the full job search lifecycle: discovery → matching → cover letters → applications → interview prep.
Privacy-first, local-first. Your data never leaves your machine.
---
## Quick Start
**1. Clone and install dependencies** (Docker, NVIDIA toolkit if needed):
```bash
git clone https://git.opensourcesolarpunk.com/pyr0ball/peregrine
cd peregrine
./manage.sh setup
```
**2. Start Peregrine:**
```bash
./manage.sh start # remote profile (API-only, no GPU)
./manage.sh start --profile cpu # local Ollama (CPU, or Metal GPU on Apple Silicon — see below)
./manage.sh start --profile single-gpu # Ollama + Vision on GPU 0 (NVIDIA only)
./manage.sh start --profile dual-gpu # Ollama + Vision + vLLM (GPU 0 + 1) (NVIDIA only)
```
Or use `make` directly:
```bash
make start # remote profile
make start PROFILE=single-gpu
```
**3.** Open http://localhost:8501 — the setup wizard guides you through the rest.
> **macOS / Apple Silicon:** Docker Desktop must be running. For Metal GPU-accelerated inference, install Ollama natively before starting — `setup.sh` will prompt you to do this. See [Apple Silicon GPU](#apple-silicon-gpu) below.
> **Windows:** Not supported — use WSL2 with Ubuntu.
### Installing to `/opt` or other system directories
If you clone into a root-owned directory (e.g. `sudo git clone ... /opt/peregrine`), two things need fixing:
**1. Git ownership warning** (`fatal: detected dubious ownership`) — `./manage.sh setup` fixes this automatically. If you need git to work *before* running setup:
```bash
git config --global --add safe.directory /opt/peregrine
```
**2. Preflight write access** — preflight writes `.env` and `compose.override.yml` into the repo directory. Fix ownership once:
```bash
sudo chown -R $USER:$USER /opt/peregrine
```
After that, run everything without `sudo`.
### Podman
Podman is rootless by default — **no `sudo` needed.** `./manage.sh setup` will configure `podman-compose` if it isn't already present.
### Docker
After `./manage.sh setup`, log out and back in for docker group membership to take effect. Until then, prefix commands with `sudo`. After re-login, `sudo` is no longer required.
---
## Inference Profiles
| Profile | Services started | Use case |
|---------|-----------------|----------|
| `remote` | app + searxng | No GPU; LLM calls go to Anthropic / OpenAI |
| `cpu` | app + ollama + searxng | No GPU; local models on CPU. On Apple Silicon, use with native Ollama for Metal acceleration — see below. |
| `single-gpu` | app + ollama + vision + searxng | One **NVIDIA** GPU: cover letters, research, vision |
| `dual-gpu` | app + ollama + vllm + vision + searxng | Two **NVIDIA** GPUs: GPU 0 = Ollama, GPU 1 = vLLM |
### Apple Silicon GPU
Docker Desktop on macOS runs in a Linux VM — it cannot access the Apple GPU. Metal-accelerated inference requires Ollama to run **natively** on the host.
`setup.sh` handles this automatically: it offers to install Ollama via Homebrew, starts it as a background service, and explains what happens next. If Ollama is running on port 11434 when you start Peregrine, preflight detects it, stubs out the Docker Ollama container, and routes inference through the native process — which uses Metal automatically.
To do it manually:
```bash
brew install ollama
brew services start ollama # starts at login, uses Metal GPU
./manage.sh start --profile cpu # preflight adopts native Ollama; Docker container is skipped
```
The `cpu` profile label is a slight misnomer in this context — Ollama will be running on the GPU. `single-gpu` and `dual-gpu` profiles are NVIDIA-specific and not applicable on Mac.
---
## First-Run Wizard
On first launch the setup wizard walks through seven steps:
1. **Hardware** — detects NVIDIA GPUs (Linux) or Apple Silicon GPU (macOS) and recommends a profile
2. **Tier** — choose free, paid, or premium (or use `dev_tier_override` for local testing)
3. **Identity** — name, email, phone, LinkedIn, career summary
4. **Resume** — upload a PDF/DOCX for LLM parsing, or use the guided form builder
5. **Inference** — configure LLM backends and API keys
6. **Search** — job titles, locations, boards, keywords, blocklist
7. **Integrations** — optional cloud storage, calendar, and notification services
Wizard state is saved after each step — a crash or browser close resumes where you left off.
Re-enter the wizard any time via **Settings → Developer → Reset wizard**.
---
## Features
| Feature | Tier |
|---------|------|
| Job discovery (JobSpy + custom boards) | Free |
| Resume keyword matching & gap analysis | Free |
| Document storage sync (Google Drive, Dropbox, OneDrive, MEGA, Nextcloud) | Free |
| Webhook notifications (Discord, Home Assistant) | Free |
| **Cover letter generation** | Free with LLM¹ |
| **Company research briefs** | Free with LLM¹ |
| **Interview prep & practice Q&A** | Free with LLM¹ |
| **Survey assistant** (culture-fit Q&A, screenshot analysis) | Free with LLM¹ |
| **AI wizard helpers** (career summary, bullet expansion, skill suggestions) | Free with LLM¹ |
| Managed cloud LLM (no API key needed) | Paid |
| Email sync & auto-classification | Paid |
| Job tracking integrations (Notion, Airtable, Google Sheets) | Paid |
| Calendar sync (Google, Apple) | Paid |
| Slack notifications | Paid |
| CircuitForge shared cover-letter model | Paid |
| Cover letter model fine-tuning (your writing, your model) | Premium |
| Multi-user support | Premium |
¹ **BYOK unlock:** configure any LLM backend — a local [Ollama](https://ollama.com) or vLLM instance,
or your own API key (Anthropic, OpenAI-compatible) — and all AI features marked **Free with LLM**
unlock at no charge. The paid tier earns its price by providing managed cloud inference so you
don't need a key at all, plus integrations and email sync.
---
## Email Sync
Monitors your inbox for job-related emails and automatically updates job stages (interview requests, rejections, survey links, offers).
Configure in **Settings → Email**. Requires IMAP access and, for Gmail, an App Password.
---
## Integrations
Connect external services in **Settings → Integrations**:
- **Job tracking:** Notion, Airtable, Google Sheets
- **Document storage:** Google Drive, Dropbox, OneDrive, MEGA, Nextcloud
- **Calendar:** Google Calendar, Apple Calendar (CalDAV)
- **Notifications:** Slack, Discord (webhook), Home Assistant
---
## CLI Reference (`manage.sh`)
`manage.sh` is the single entry point for all common operations — no need to remember Make targets or Docker commands.
```
./manage.sh setup Install Docker/Podman + NVIDIA toolkit
./manage.sh start [--profile P] Preflight check then start services
./manage.sh stop Stop all services
./manage.sh restart Restart all services
./manage.sh status Show running containers
./manage.sh logs [service] Tail logs (default: app)
./manage.sh update Pull latest images + rebuild app container
./manage.sh preflight Check ports + resources; write .env
./manage.sh test Run test suite
./manage.sh prepare-training Scan docs for cover letters → training JSONL
./manage.sh finetune Run LoRA fine-tune (needs --profile single-gpu+)
./manage.sh open Open the web UI in your browser
./manage.sh clean Remove containers, images, volumes (asks to confirm)
```
---
## Developer Docs
Full documentation at: https://docs.circuitforge.tech/peregrine
- [Installation guide](https://docs.circuitforge.tech/peregrine/getting-started/installation/)
- [Adding a custom job board scraper](https://docs.circuitforge.tech/peregrine/developer-guide/adding-scrapers/)
- [Adding an integration](https://docs.circuitforge.tech/peregrine/developer-guide/adding-integrations/)
- [Contributing](https://docs.circuitforge.tech/peregrine/developer-guide/contributing/)
---
## License
Core discovery pipeline: [MIT](LICENSE-MIT)
AI features (cover letter generation, company research, interview prep, UI): [BSL 1.1](LICENSE-BSL)
© 2026 Circuit Forge LLC

26
SECURITY.md Normal file
View file

@ -0,0 +1,26 @@
# Security Policy
## Reporting a Vulnerability
**Do not open a GitHub or Codeberg issue for security vulnerabilities.**
Email: `security@circuitforge.tech`
Include:
- A description of the vulnerability
- Steps to reproduce
- Potential impact
- Any suggested fix (optional)
**Response target:** 72 hours for acknowledgement, 14 days for triage.
We follow responsible disclosure — we will coordinate a fix and release before any
public disclosure and will credit you in the release notes unless you prefer to remain
anonymous.
## Supported Versions
| Version | Supported |
|---------|-----------|
| Latest release | ✅ |
| Older releases | ❌ — please upgrade |

View file

@ -8,15 +8,81 @@ import sys
from pathlib import Path
import streamlit as st
import yaml
sys.path.insert(0, str(Path(__file__).parent.parent))
from scripts.db import DEFAULT_DB, init_db, get_job_counts, purge_jobs, purge_email_data, \
from scripts.user_profile import UserProfile
_USER_YAML = Path(__file__).parent.parent / "config" / "user.yaml"
_profile = UserProfile(_USER_YAML) if UserProfile.exists(_USER_YAML) else None
_name = _profile.name if _profile else "Job Seeker"
from scripts.db import init_db, get_job_counts, purge_jobs, purge_email_data, \
purge_non_remote, archive_jobs, kill_stuck_tasks, get_task_for_job, get_active_tasks, \
insert_job, get_existing_urls
from scripts.task_runner import submit_task
from app.cloud_session import resolve_session, get_db_path
init_db(DEFAULT_DB)
resolve_session("peregrine")
init_db(get_db_path())
def _email_configured() -> bool:
_e = Path(__file__).parent.parent / "config" / "email.yaml"
if not _e.exists():
return False
import yaml as _yaml
_cfg = _yaml.safe_load(_e.read_text()) or {}
return bool(_cfg.get("username") or _cfg.get("user") or _cfg.get("imap_host"))
def _notion_configured() -> bool:
_n = Path(__file__).parent.parent / "config" / "notion.yaml"
if not _n.exists():
return False
import yaml as _yaml
_cfg = _yaml.safe_load(_n.read_text()) or {}
return bool(_cfg.get("token"))
def _keywords_configured() -> bool:
_k = Path(__file__).parent.parent / "config" / "resume_keywords.yaml"
if not _k.exists():
return False
import yaml as _yaml
_cfg = _yaml.safe_load(_k.read_text()) or {}
return bool(_cfg.get("keywords") or _cfg.get("required") or _cfg.get("preferred"))
_SETUP_BANNERS = [
{"key": "connect_cloud", "text": "Connect a cloud service for resume/cover letter storage",
"link_label": "Settings → Integrations",
"done": _notion_configured},
{"key": "setup_email", "text": "Set up email sync to catch recruiter outreach",
"link_label": "Settings → Email",
"done": _email_configured},
{"key": "setup_email_labels", "text": "Set up email label filters for auto-classification",
"link_label": "Settings → Email (label guide)",
"done": _email_configured},
{"key": "tune_mission", "text": "Tune your mission preferences for better cover letters",
"link_label": "Settings → My Profile"},
{"key": "configure_keywords", "text": "Configure keywords and blocklist for smarter search",
"link_label": "Settings → Search",
"done": _keywords_configured},
{"key": "upload_corpus", "text": "Upload your cover letter corpus for voice fine-tuning",
"link_label": "Settings → Fine-Tune"},
{"key": "configure_linkedin", "text": "Configure LinkedIn Easy Apply automation",
"link_label": "Settings → Integrations"},
{"key": "setup_searxng", "text": "Set up company research with SearXNG",
"link_label": "Settings → Services"},
{"key": "target_companies", "text": "Build a target company list for focused outreach",
"link_label": "Settings → Search"},
{"key": "setup_notifications", "text": "Set up notifications for stage changes",
"link_label": "Settings → Integrations"},
{"key": "tune_model", "text": "Tune a custom cover letter model on your writing",
"link_label": "Settings → Fine-Tune"},
{"key": "review_training", "text": "Review and curate training data for model tuning",
"link_label": "Settings → Fine-Tune"},
{"key": "setup_calendar", "text": "Set up calendar sync to track interview dates",
"link_label": "Settings → Integrations"},
]
def _dismissible(key: str, status: str, msg: str) -> None:
@ -64,7 +130,7 @@ def _queue_url_imports(db_path: Path, urls: list) -> int:
return queued
st.title("🔍 Alex's Job Search")
st.title(f"🔍 {_name}'s Job Search")
st.caption("Discover → Review → Sync to Notion")
st.divider()
@ -72,7 +138,7 @@ st.divider()
@st.fragment(run_every=10)
def _live_counts():
counts = get_job_counts(DEFAULT_DB)
counts = get_job_counts(get_db_path())
col1, col2, col3, col4, col5 = st.columns(5)
col1.metric("Pending Review", counts.get("pending", 0))
col2.metric("Approved", counts.get("approved", 0))
@ -91,18 +157,18 @@ with left:
st.subheader("Find New Jobs")
st.caption("Scrapes all configured boards and adds new listings to your review queue.")
_disc_task = get_task_for_job(DEFAULT_DB, "discovery", 0)
_disc_task = get_task_for_job(get_db_path(), "discovery", 0)
_disc_running = _disc_task and _disc_task["status"] in ("queued", "running")
if st.button("🚀 Run Discovery", use_container_width=True, type="primary",
disabled=bool(_disc_running)):
submit_task(DEFAULT_DB, "discovery", 0)
submit_task(get_db_path(), "discovery", 0)
st.rerun()
if _disc_running:
@st.fragment(run_every=4)
def _disc_status():
t = get_task_for_job(DEFAULT_DB, "discovery", 0)
t = get_task_for_job(get_db_path(), "discovery", 0)
if t and t["status"] in ("queued", "running"):
lbl = "Queued…" if t["status"] == "queued" else "Scraping job boards… this may take a minute"
st.info(f"{lbl}")
@ -120,18 +186,18 @@ with enrich_col:
st.subheader("Enrich Descriptions")
st.caption("Re-fetch missing descriptions for any listing (LinkedIn, Indeed, Glassdoor, Adzuna, The Ladders, generic).")
_enrich_task = get_task_for_job(DEFAULT_DB, "enrich_descriptions", 0)
_enrich_task = get_task_for_job(get_db_path(), "enrich_descriptions", 0)
_enrich_running = _enrich_task and _enrich_task["status"] in ("queued", "running")
if st.button("🔍 Fill Missing Descriptions", use_container_width=True, type="primary",
disabled=bool(_enrich_running)):
submit_task(DEFAULT_DB, "enrich_descriptions", 0)
submit_task(get_db_path(), "enrich_descriptions", 0)
st.rerun()
if _enrich_running:
@st.fragment(run_every=4)
def _enrich_status():
t = get_task_for_job(DEFAULT_DB, "enrich_descriptions", 0)
t = get_task_for_job(get_db_path(), "enrich_descriptions", 0)
if t and t["status"] in ("queued", "running"):
st.info("⏳ Fetching descriptions…")
else:
@ -146,10 +212,10 @@ with enrich_col:
with mid:
unscored = sum(1 for j in __import__("scripts.db", fromlist=["get_jobs_by_status"])
.get_jobs_by_status(DEFAULT_DB, "pending")
.get_jobs_by_status(get_db_path(), "pending")
if j.get("match_score") is None and j.get("description"))
st.subheader("Score Listings")
st.caption(f"Run TF-IDF match scoring against Alex's resume. {unscored} pending job{'s' if unscored != 1 else ''} unscored.")
st.caption(f"Run TF-IDF match scoring against {_name}'s resume. {unscored} pending job{'s' if unscored != 1 else ''} unscored.")
if st.button("📊 Score All Unscored Jobs", use_container_width=True, type="primary",
disabled=unscored == 0):
with st.spinner("Scoring…"):
@ -167,7 +233,7 @@ with mid:
st.rerun()
with right:
approved_count = get_job_counts(DEFAULT_DB).get("approved", 0)
approved_count = get_job_counts(get_db_path()).get("approved", 0)
st.subheader("Send to Notion")
st.caption("Push all approved jobs to your Notion tracking database.")
if approved_count == 0:
@ -179,7 +245,7 @@ with right:
):
with st.spinner("Syncing to Notion…"):
from scripts.sync import sync_to_notion
count = sync_to_notion(DEFAULT_DB)
count = sync_to_notion(get_db_path())
st.success(f"Synced {count} job{'s' if count != 1 else ''} to Notion!")
st.rerun()
@ -194,18 +260,18 @@ with email_left:
"New recruiter outreach is added to your Job Review queue.")
with email_right:
_email_task = get_task_for_job(DEFAULT_DB, "email_sync", 0)
_email_task = get_task_for_job(get_db_path(), "email_sync", 0)
_email_running = _email_task and _email_task["status"] in ("queued", "running")
if st.button("📧 Sync Emails", use_container_width=True, type="primary",
disabled=bool(_email_running)):
submit_task(DEFAULT_DB, "email_sync", 0)
submit_task(get_db_path(), "email_sync", 0)
st.rerun()
if _email_running:
@st.fragment(run_every=4)
def _email_status():
t = get_task_for_job(DEFAULT_DB, "email_sync", 0)
t = get_task_for_job(get_db_path(), "email_sync", 0)
if t and t["status"] in ("queued", "running"):
st.info("⏳ Syncing emails…")
else:
@ -240,7 +306,7 @@ with url_tab:
disabled=not (url_text or "").strip()):
_urls = [u.strip() for u in url_text.strip().splitlines() if u.strip().startswith("http")]
if _urls:
_n = _queue_url_imports(DEFAULT_DB, _urls)
_n = _queue_url_imports(get_db_path(), _urls)
if _n:
st.success(f"Queued {_n} job{'s' if _n != 1 else ''} for import. Check Job Review shortly.")
else:
@ -263,7 +329,7 @@ with csv_tab:
if _csv_urls:
st.caption(f"Found {len(_csv_urls)} URL(s) in CSV.")
if st.button("📥 Import CSV Jobs", key="add_csv_btn", use_container_width=True):
_n = _queue_url_imports(DEFAULT_DB, _csv_urls)
_n = _queue_url_imports(get_db_path(),_csv_urls)
st.success(f"Queued {_n} job{'s' if _n != 1 else ''} for import.")
st.rerun()
else:
@ -273,7 +339,7 @@ with csv_tab:
@st.fragment(run_every=3)
def _scrape_status():
import sqlite3 as _sq
conn = _sq.connect(DEFAULT_DB)
conn = _sq.connect(get_db_path())
conn.row_factory = _sq.Row
rows = conn.execute(
"""SELECT bt.status, bt.error, j.title, j.company, j.url
@ -320,7 +386,7 @@ with st.expander("⚠️ Danger Zone", expanded=False):
st.warning("Are you sure? This cannot be undone.")
c1, c2 = st.columns(2)
if c1.button("Yes, purge", type="primary", use_container_width=True):
deleted = purge_jobs(DEFAULT_DB, statuses=["pending", "rejected"])
deleted = purge_jobs(get_db_path(), statuses=["pending", "rejected"])
st.success(f"Purged {deleted} jobs.")
st.session_state.pop("confirm_purge", None)
st.rerun()
@ -338,7 +404,7 @@ with st.expander("⚠️ Danger Zone", expanded=False):
st.warning("This deletes all email contacts and email-sourced jobs. Cannot be undone.")
c1, c2 = st.columns(2)
if c1.button("Yes, purge emails", type="primary", use_container_width=True):
contacts, jobs = purge_email_data(DEFAULT_DB)
contacts, jobs = purge_email_data(get_db_path())
st.success(f"Purged {contacts} email contacts, {jobs} email jobs.")
st.session_state.pop("confirm_purge", None)
st.rerun()
@ -347,11 +413,11 @@ with st.expander("⚠️ Danger Zone", expanded=False):
st.rerun()
with tasks_col:
_active = get_active_tasks(DEFAULT_DB)
_active = get_active_tasks(get_db_path())
st.markdown("**Kill stuck tasks**")
st.caption(f"Force-fail all queued/running background tasks. Currently **{len(_active)}** active.")
if st.button("⏹ Kill All Tasks", use_container_width=True, disabled=len(_active) == 0):
killed = kill_stuck_tasks(DEFAULT_DB)
killed = kill_stuck_tasks(get_db_path())
st.success(f"Killed {killed} task(s).")
st.rerun()
@ -365,8 +431,8 @@ with st.expander("⚠️ Danger Zone", expanded=False):
st.warning("This will delete ALL pending, approved, and rejected jobs, then re-scrape. Applied and synced records are kept.")
c1, c2 = st.columns(2)
if c1.button("Yes, wipe + scrape", type="primary", use_container_width=True):
purge_jobs(DEFAULT_DB, statuses=["pending", "approved", "rejected"])
submit_task(DEFAULT_DB, "discovery", 0)
purge_jobs(get_db_path(), statuses=["pending", "approved", "rejected"])
submit_task(get_db_path(), "discovery", 0)
st.session_state.pop("confirm_purge", None)
st.rerun()
if c2.button("Cancel ", use_container_width=True):
@ -387,7 +453,7 @@ with st.expander("⚠️ Danger Zone", expanded=False):
st.warning("Deletes all pending jobs. Rejected jobs are kept. Cannot be undone.")
c1, c2 = st.columns(2)
if c1.button("Yes, purge pending", type="primary", use_container_width=True):
deleted = purge_jobs(DEFAULT_DB, statuses=["pending"])
deleted = purge_jobs(get_db_path(), statuses=["pending"])
st.success(f"Purged {deleted} pending jobs.")
st.session_state.pop("confirm_purge", None)
st.rerun()
@ -405,7 +471,7 @@ with st.expander("⚠️ Danger Zone", expanded=False):
st.warning("Deletes all non-remote jobs not yet applied to. Cannot be undone.")
c1, c2 = st.columns(2)
if c1.button("Yes, purge on-site", type="primary", use_container_width=True):
deleted = purge_non_remote(DEFAULT_DB)
deleted = purge_non_remote(get_db_path())
st.success(f"Purged {deleted} non-remote jobs.")
st.session_state.pop("confirm_purge", None)
st.rerun()
@ -423,7 +489,7 @@ with st.expander("⚠️ Danger Zone", expanded=False):
st.warning("Deletes all approved-but-not-applied jobs. Cannot be undone.")
c1, c2 = st.columns(2)
if c1.button("Yes, purge approved", type="primary", use_container_width=True):
deleted = purge_jobs(DEFAULT_DB, statuses=["approved"])
deleted = purge_jobs(get_db_path(), statuses=["approved"])
st.success(f"Purged {deleted} approved jobs.")
st.session_state.pop("confirm_purge", None)
st.rerun()
@ -448,7 +514,7 @@ with st.expander("⚠️ Danger Zone", expanded=False):
st.info("Jobs will be archived (not deleted) — URLs are kept for dedup.")
c1, c2 = st.columns(2)
if c1.button("Yes, archive", type="primary", use_container_width=True):
archived = archive_jobs(DEFAULT_DB, statuses=["pending", "rejected"])
archived = archive_jobs(get_db_path(), statuses=["pending", "rejected"])
st.success(f"Archived {archived} jobs.")
st.session_state.pop("confirm_purge", None)
st.rerun()
@ -466,10 +532,38 @@ with st.expander("⚠️ Danger Zone", expanded=False):
st.info("Approved jobs will be archived (not deleted).")
c1, c2 = st.columns(2)
if c1.button("Yes, archive approved", type="primary", use_container_width=True):
archived = archive_jobs(DEFAULT_DB, statuses=["approved"])
archived = archive_jobs(get_db_path(), statuses=["approved"])
st.success(f"Archived {archived} approved jobs.")
st.session_state.pop("confirm_purge", None)
st.rerun()
if c2.button("Cancel ", use_container_width=True):
st.session_state.pop("confirm_purge", None)
st.rerun()
# ── Setup banners ─────────────────────────────────────────────────────────────
if _profile and _profile.wizard_complete:
_dismissed = set(_profile.dismissed_banners)
_pending_banners = [
b for b in _SETUP_BANNERS
if b["key"] not in _dismissed and not b.get("done", lambda: False)()
]
if _pending_banners:
st.divider()
st.markdown("#### Finish setting up Peregrine")
for banner in _pending_banners:
_bcol, _bdismiss = st.columns([10, 1])
with _bcol:
_ic, _lc = st.columns([3, 1])
_ic.info(f"💡 {banner['text']}")
with _lc:
st.write("")
st.page_link("pages/2_Settings.py", label=banner['link_label'], icon="⚙️")
with _bdismiss:
st.write("")
if st.button("", key=f"dismiss_banner_{banner['key']}", help="Dismiss"):
_data = yaml.safe_load(_USER_YAML.read_text()) if _USER_YAML.exists() else {}
_data.setdefault("dismissed_banners", [])
if banner["key"] not in _data["dismissed_banners"]:
_data["dismissed_banners"].append(banner["key"])
_USER_YAML.write_text(yaml.dump(_data, default_flow_style=False, allow_unicode=True))
st.rerun()

0
app/__init__.py Normal file
View file

View file

@ -7,22 +7,32 @@ a "System" section so it doesn't crowd the navigation.
Run: streamlit run app/app.py
bash scripts/manage-ui.sh start
"""
import logging
import os
import subprocess
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent))
logging.basicConfig(level=logging.WARNING, format="%(name)s %(levelname)s: %(message)s")
IS_DEMO = os.environ.get("DEMO_MODE", "").lower() in ("1", "true", "yes")
import streamlit as st
from scripts.db import DEFAULT_DB, init_db, get_active_tasks
from app.feedback import inject_feedback_button
from app.cloud_session import resolve_session, get_db_path, get_config_dir
import sqlite3
st.set_page_config(
- page_title="Job Seeker",
+ page_title="Peregrine",
page_icon="💼",
layout="wide",
)
- init_db(DEFAULT_DB)
+ resolve_session("peregrine")
+ init_db(get_db_path())
# ── Startup cleanup — runs once per server process via cache_resource ──────────
@st.cache_resource
@@ -32,12 +42,12 @@ def _startup() -> None:
2. Auto-queues re-runs for any research generated without SearXNG data,
if SearXNG is now reachable.
"""
- conn = sqlite3.connect(DEFAULT_DB)
- conn.execute(
- "UPDATE background_tasks SET status='failed', error='Interrupted by server restart',"
- " finished_at=datetime('now') WHERE status IN ('queued','running')"
- )
- conn.commit()
+ # Reset only in-flight tasks — queued tasks survive for the scheduler to resume.
+ # MUST run before any submit_task() call in this function.
+ from scripts.db import reset_running_tasks
+ reset_running_tasks(get_db_path())
+ conn = sqlite3.connect(get_db_path())
# Auto-recovery: re-run LLM-only research when SearXNG is available
try:
@@ -53,7 +63,7 @@ def _startup() -> None:
_ACTIVE_STAGES,
).fetchall()
for (job_id,) in rows:
- submit_task(str(DEFAULT_DB), "company_research", job_id)
+ submit_task(str(get_db_path()), "company_research", job_id)
except Exception:
pass # never block startup
@@ -61,6 +71,26 @@ def _startup() -> None:
_startup()
# Silent license refresh on startup — no-op if unreachable
try:
from scripts.license import refresh_if_needed as _refresh_license
_refresh_license()
except Exception:
pass
# ── First-run wizard gate ───────────────────────────────────────────────────────
from scripts.user_profile import UserProfile as _UserProfile
_USER_YAML = get_config_dir() / "user.yaml"
_show_wizard = not IS_DEMO and (
not _UserProfile.exists(_USER_YAML)
or not _UserProfile(_USER_YAML).wizard_complete
)
if _show_wizard:
_setup_page = st.Page("pages/0_Setup.py", title="Setup", icon="👋")
st.navigation({"": [_setup_page]}).run()
st.stop()
# ── Navigation ─────────────────────────────────────────────────────────────────
# st.navigation() must be called before any sidebar writes so it can establish
# the navigation structure first; sidebar additions come after.
@@ -85,7 +115,7 @@ pg = st.navigation(pages)
# The sidebar context WRAPS the fragment call — do not write to st.sidebar inside it.
@st.fragment(run_every=3)
def _task_indicator():
- tasks = get_active_tasks(DEFAULT_DB)
+ tasks = get_active_tasks(get_db_path())
if not tasks:
return
st.divider()
@@ -105,6 +135,8 @@ def _task_indicator():
label = "Enriching"
elif task_type == "scrape_url":
label = "Scraping URL"
elif task_type == "wizard_generate":
label = "Wizard generation"
elif task_type == "enrich_craigslist":
label = "Enriching listing"
else:
@@ -113,7 +145,47 @@
detail = f" · {stage}" if stage else (f"{t.get('company')}" if t.get("company") else "")
st.caption(f"{icon} {label}{detail}")
@st.cache_resource
def _get_version() -> str:
try:
return subprocess.check_output(
["git", "describe", "--tags", "--always"],
cwd=Path(__file__).parent.parent,
text=True,
).strip()
except Exception:
return "dev"
with st.sidebar:
if IS_DEMO:
st.info(
"**Public demo** — read-only sample data. "
"AI features and data saves are disabled.\n\n"
"[Get your own instance →](https://circuitforge.tech/software/peregrine)",
icon="🔒",
)
_task_indicator()
# Cloud LLM indicator — shown whenever any cloud backend is active
_llm_cfg_path = Path(__file__).parent.parent / "config" / "llm.yaml"
try:
import yaml as _yaml
from scripts.byok_guard import cloud_backends as _cloud_backends
_active_cloud = _cloud_backends(_yaml.safe_load(_llm_cfg_path.read_text(encoding="utf-8")) or {})
except Exception:
_active_cloud = []
if _active_cloud:
_provider_names = ", ".join(b.replace("_", " ").title() for b in _active_cloud)
st.warning(
f"**Cloud LLM active**\n\n"
f"{_provider_names}\n\n"
"AI features send content to this provider. "
"[Change in Settings](2_Settings)",
icon="🔓",
)
st.divider()
st.caption(f"Peregrine {_get_version()}")
inject_feedback_button(page=pg.title)
pg.run()

app/cloud_session.py (Normal file, 187 lines)
@@ -0,0 +1,187 @@
# peregrine/app/cloud_session.py
"""
Cloud session middleware for multi-tenant Peregrine deployment.
In local-first mode (CLOUD_MODE unset or false), all functions are no-ops.
In cloud mode (CLOUD_MODE=true), resolves the Directus session JWT from the
X-CF-Session header, validates it, and injects user_id + db_path into
st.session_state.
All Peregrine pages call get_db_path() instead of DEFAULT_DB directly to
transparently support both local and cloud deployments.
"""
import logging
import os
import re
import hmac
import hashlib
from pathlib import Path
import requests
import streamlit as st
from scripts.db import DEFAULT_DB
log = logging.getLogger(__name__)
CLOUD_MODE: bool = os.environ.get("CLOUD_MODE", "").lower() in ("1", "true", "yes")
CLOUD_DATA_ROOT: Path = Path(os.environ.get("CLOUD_DATA_ROOT", "/devl/menagerie-data"))
DIRECTUS_JWT_SECRET: str = os.environ.get("DIRECTUS_JWT_SECRET", "")
SERVER_SECRET: str = os.environ.get("CF_SERVER_SECRET", "")
# Heimdall license server — internal URL preferred when running on the same host
HEIMDALL_URL: str = os.environ.get("HEIMDALL_URL", "https://license.circuitforge.tech")
HEIMDALL_ADMIN_TOKEN: str = os.environ.get("HEIMDALL_ADMIN_TOKEN", "")
def _extract_session_token(cookie_header: str) -> str:
"""Extract cf_session value from a Cookie header string."""
m = re.search(r'(?:^|;)\s*cf_session=([^;]+)', cookie_header)
return m.group(1).strip() if m else ""
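As a quick check of the cookie regex, here is a standalone copy of the helper exercised against hypothetical cookie strings (the token values are made up):

```python
import re

# Standalone copy of _extract_session_token, for illustration only.
def extract_session_token(cookie_header: str) -> str:
    m = re.search(r'(?:^|;)\s*cf_session=([^;]+)', cookie_header)
    return m.group(1).strip() if m else ""

assert extract_session_token("cf_session=tok123") == "tok123"            # sole cookie
assert extract_session_token("a=1; cf_session=tok123; b=2") == "tok123"  # mid-header
assert extract_session_token("not_cf_session=oops") == ""                # no suffix match
```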
@st.cache_data(ttl=300, show_spinner=False)
def _fetch_cloud_tier(user_id: str, product: str) -> str:
"""Call Heimdall to resolve the current cloud tier for this user.
Cached per (user_id, product) for 5 minutes to avoid hammering Heimdall
on every Streamlit rerun. Returns "free" on any error so the app degrades
gracefully rather than blocking the user.
"""
if not HEIMDALL_ADMIN_TOKEN:
log.warning("HEIMDALL_ADMIN_TOKEN not set — defaulting tier to free")
return "free"
try:
resp = requests.post(
f"{HEIMDALL_URL}/admin/cloud/resolve",
json={"user_id": user_id, "product": product},
headers={"Authorization": f"Bearer {HEIMDALL_ADMIN_TOKEN}"},
timeout=5,
)
if resp.status_code == 200:
return resp.json().get("tier", "free")
if resp.status_code == 404:
# No cloud key yet — user signed up before provision ran; return free.
return "free"
log.warning("Heimdall resolve returned %s — defaulting tier to free", resp.status_code)
except Exception as exc:
log.warning("Heimdall tier resolve failed: %s — defaulting to free", exc)
return "free"
def validate_session_jwt(token: str) -> str:
"""Validate a Directus session JWT and return the user UUID. Raises on failure."""
import jwt # PyJWT — lazy import so local mode never needs it
payload = jwt.decode(token, DIRECTUS_JWT_SECRET, algorithms=["HS256"])
user_id = payload.get("id") or payload.get("sub")
if not user_id:
raise ValueError("JWT missing user id claim")
return user_id
def _user_data_path(user_id: str, app: str) -> Path:
return CLOUD_DATA_ROOT / user_id / app
def derive_db_key(user_id: str) -> str:
"""Derive a per-user SQLCipher encryption key from the server secret."""
return hmac.new(
SERVER_SECRET.encode(),
user_id.encode(),
hashlib.sha256,
).hexdigest()
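A self-contained sketch of the derivation (with a placeholder secret) shows the two properties the multi-tenant design needs: the key is deterministic for a given user and differs between users.

```python
import hashlib
import hmac

# Standalone copy of derive_db_key; the server secret here is a placeholder.
def derive_db_key(user_id: str, server_secret: str) -> str:
    return hmac.new(
        server_secret.encode(), user_id.encode(), hashlib.sha256
    ).hexdigest()

k = derive_db_key("uuid-user-a", "example-server-secret")
assert k == derive_db_key("uuid-user-a", "example-server-secret")  # deterministic
assert k != derive_db_key("uuid-user-b", "example-server-secret")  # per-user
assert len(k) == 64  # hex-encoded SHA-256 digest
```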
def _render_auth_wall(message: str = "Please sign in to continue.") -> None:
"""Render a branded sign-in prompt and halt the page."""
st.markdown(
"""
<style>
[data-testid="stSidebar"] { display: none; }
[data-testid="collapsedControl"] { display: none; }
</style>
""",
unsafe_allow_html=True,
)
col = st.columns([1, 2, 1])[1]
with col:
st.markdown("## 🦅 Peregrine")
st.info(message, icon="🔒")
st.link_button(
"Sign in to CircuitForge",
url=f"https://circuitforge.tech/login?next=/peregrine",
use_container_width=True,
)
def resolve_session(app: str = "peregrine") -> None:
"""
Call at the top of each Streamlit page.
In local mode: no-op.
In cloud mode: reads X-CF-Session header, validates JWT, creates user
data directory on first visit, and sets st.session_state keys:
- user_id: str
- db_path: Path
- db_key: str (SQLCipher key for this user)
- cloud_tier: str (free | paid | premium | ultra resolved from Heimdall)
Idempotent: skips if user_id is already in session_state.
"""
if not CLOUD_MODE:
return
if st.session_state.get("user_id"):
return
cookie_header = st.context.headers.get("x-cf-session", "")
session_jwt = _extract_session_token(cookie_header)
if not session_jwt:
_render_auth_wall("Please sign in to access Peregrine.")
st.stop()
try:
user_id = validate_session_jwt(session_jwt)
except Exception:
_render_auth_wall("Your session has expired. Please sign in again.")
st.stop()
user_path = _user_data_path(user_id, app)
user_path.mkdir(parents=True, exist_ok=True)
(user_path / "config").mkdir(exist_ok=True)
(user_path / "data").mkdir(exist_ok=True)
st.session_state["user_id"] = user_id
st.session_state["db_path"] = user_path / "staging.db"
st.session_state["db_key"] = derive_db_key(user_id)
st.session_state["cloud_tier"] = _fetch_cloud_tier(user_id, app)
def get_db_path() -> Path:
"""
Return the active db_path for this session.
Cloud: user-scoped path from session_state.
Local: DEFAULT_DB (from STAGING_DB env var or repo default).
"""
return st.session_state.get("db_path", DEFAULT_DB)
def get_config_dir() -> Path:
"""
Return the config directory for this session.
Cloud: per-user path (<data_root>/<user_id>/peregrine/config/) so each
user's YAML files (user.yaml, plain_text_resume.yaml, etc.) are
isolated and never shared across tenants.
Local: repo-level config/ directory.
"""
if CLOUD_MODE and st.session_state.get("db_path"):
return Path(st.session_state["db_path"]).parent / "config"
return Path(__file__).parent.parent / "config"
def get_cloud_tier() -> str:
"""
Return the current user's cloud tier.
Cloud mode: resolved from Heimdall at session start (cached 5 min).
Local mode: always returns "local" so pages can distinguish self-hosted from cloud.
"""
if not CLOUD_MODE:
return "local"
return st.session_state.get("cloud_tier", "free")

app/components/__init__.py (Normal file, 1 line)
@@ -0,0 +1 @@
# app/components/__init__.py

app/components/linkedin_import.py (Normal file, 192 lines)
@@ -0,0 +1,192 @@
# app/components/linkedin_import.py
"""
Shared LinkedIn import widget.
Usage in a page:
from app.components.linkedin_import import render_linkedin_tab
# At top of page render — check for pending import:
_li_data = st.session_state.pop("_linkedin_extracted", None)
if _li_data:
st.session_state["_parsed_resume"] = _li_data
st.rerun()
# Inside the LinkedIn tab:
with tab_linkedin:
render_linkedin_tab(config_dir=CONFIG_DIR, tier=tier)
"""
from __future__ import annotations
import json
import re
from datetime import datetime, timezone
from pathlib import Path
import streamlit as st
_LINKEDIN_PROFILE_RE = re.compile(r"https?://(www\.)?linkedin\.com/in/", re.I)
def _stage_path(config_dir: Path) -> Path:
return config_dir / "linkedin_stage.json"
def _load_stage(config_dir: Path) -> dict | None:
path = _stage_path(config_dir)
if not path.exists():
return None
try:
return json.loads(path.read_text())
except Exception:
return None
def _days_ago(iso_ts: str) -> str:
try:
dt = datetime.fromisoformat(iso_ts)
delta = datetime.now(timezone.utc) - dt
days = delta.days
if days == 0:
return "today"
if days == 1:
return "yesterday"
return f"{days} days ago"
except Exception:
return "unknown"
def _do_scrape(url: str, config_dir: Path) -> None:
"""Validate URL, run scrape, update state."""
if not _LINKEDIN_PROFILE_RE.match(url):
st.error("Please enter a LinkedIn profile URL (linkedin.com/in/…)")
return
with st.spinner("Fetching LinkedIn profile… (10–20 seconds)"):
try:
from scripts.linkedin_scraper import scrape_profile
scrape_profile(url, _stage_path(config_dir))
st.success("Profile imported successfully.")
st.rerun()
except ValueError as e:
st.error(str(e))
except RuntimeError as e:
st.warning(str(e))
except Exception as e:
st.error(f"Unexpected error: {e}")
def render_linkedin_tab(config_dir: Path, tier: str) -> None:
"""
Render the LinkedIn import UI.
When the user clicks "Use this data", writes the extracted dict to
st.session_state["_linkedin_extracted"] and calls st.rerun().
Caller reads: data = st.session_state.pop("_linkedin_extracted", None)
"""
stage = _load_stage(config_dir)
# ── Staged data status bar ────────────────────────────────────────────────
if stage:
scraped_at = stage.get("scraped_at", "")
source_label = "LinkedIn export" if stage.get("source") == "export_zip" else "LinkedIn profile"
col_info, col_refresh = st.columns([4, 1])
col_info.caption(f"Last imported from {source_label}: {_days_ago(scraped_at)}")
if col_refresh.button("🔄 Refresh", key="li_refresh"):
url = stage.get("url")
if url:
_do_scrape(url, config_dir)
else:
st.info("Original URL not available — paste the URL below to re-import.")
# ── URL import ────────────────────────────────────────────────────────────
st.markdown("**Import from LinkedIn profile URL**")
url_input = st.text_input(
"LinkedIn profile URL",
placeholder="https://linkedin.com/in/your-name",
label_visibility="collapsed",
key="li_url_input",
)
if st.button("🔗 Import from LinkedIn", key="li_import_btn", type="primary"):
if not url_input.strip():
st.warning("Please enter your LinkedIn profile URL.")
else:
_do_scrape(url_input.strip(), config_dir)
st.caption(
"Imports from your public LinkedIn profile. No login or credentials required. "
"Scraping typically takes 10–20 seconds."
)
st.info(
"**LinkedIn limits public profile data.** Without logging in, LinkedIn only "
"exposes your name, About summary, current employer, and certifications — "
"past roles, education, and skills are hidden behind their login wall. "
"For your full career history use the **data export zip** option below.",
icon="",
)
# ── Section preview + use button ─────────────────────────────────────────
if stage:
from scripts.linkedin_parser import parse_stage
extracted, err = parse_stage(_stage_path(config_dir))
if err:
st.warning(f"Could not read staged data: {err}")
else:
st.divider()
st.markdown("**Preview**")
col1, col2, col3 = st.columns(3)
col1.metric("Experience entries", len(extracted.get("experience", [])))
col2.metric("Skills", len(extracted.get("skills", [])))
col3.metric("Certifications", len(extracted.get("achievements", [])))
if extracted.get("career_summary"):
with st.expander("Summary"):
st.write(extracted["career_summary"])
if extracted.get("experience"):
with st.expander(f"Experience ({len(extracted['experience'])} entries)"):
for exp in extracted["experience"]:
st.markdown(f"**{exp.get('title')}** @ {exp.get('company')} · {exp.get('date_range', '')}")
if extracted.get("education"):
with st.expander("Education"):
for edu in extracted["education"]:
st.markdown(f"**{edu.get('school')}** — {edu.get('degree')} {edu.get('field', '')}".strip())
if extracted.get("skills"):
with st.expander("Skills"):
st.write(", ".join(extracted["skills"]))
st.divider()
if st.button("✅ Use this data", key="li_use_btn", type="primary"):
st.session_state["_linkedin_extracted"] = extracted
st.rerun()
# ── Advanced: data export ─────────────────────────────────────────────────
with st.expander("⬇️ Import from LinkedIn data export (advanced)", expanded=False):
st.caption(
"Download your LinkedIn data: **Settings & Privacy → Data Privacy → "
"Get a copy of your data → Request archive → Fast file**. "
"The Fast file is available immediately and contains your profile, "
"experience, education, and skills."
)
zip_file = st.file_uploader(
"Upload LinkedIn export zip", type=["zip"], key="li_zip_upload"
)
if zip_file is not None:
if st.button("📦 Parse export", key="li_parse_zip"):
with st.spinner("Parsing export archive…"):
try:
from scripts.linkedin_scraper import parse_export_zip
extracted = parse_export_zip(
zip_file.read(), _stage_path(config_dir)
)
st.success(
f"Imported {len(extracted.get('experience', []))} experience entries, "
f"{len(extracted.get('skills', []))} skills. "
"Click 'Use this data' above to apply."
)
st.rerun()
except Exception as e:
st.error(f"Failed to parse export: {e}")

app/components/paste_image.py (Normal file, 31 lines)
@@ -0,0 +1,31 @@
"""
Paste-from-clipboard / drag-and-drop image component.
Uses st.components.v1.declare_component so JS can return image bytes to Python
(st.components.v1.html() is one-way only). No build step required; the
frontend is a single index.html file.
"""
from __future__ import annotations
import base64
from pathlib import Path
import streamlit.components.v1 as components
_FRONTEND = Path(__file__).parent / "paste_image_ui"
_paste_image = components.declare_component("paste_image", path=str(_FRONTEND))
def paste_image_component(key: str | None = None) -> bytes | None:
"""
Render the paste/drop zone. Returns PNG/JPEG bytes when an image is
pasted or dropped, or None if nothing has been submitted yet.
"""
result = _paste_image(key=key)
if result:
try:
return base64.b64decode(result)
except Exception:
return None
return None
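The JS side posts the image back as a base64 string, so the decode branch above can be exercised on its own (the sample bytes here are synthetic):

```python
import base64

# Mirrors the decode branch of paste_image_component.
def decode_component_value(result):
    if result:
        try:
            return base64.b64decode(result)
        except Exception:
            return None
    return None

png_magic = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of any PNG file
encoded = base64.b64encode(png_magic).decode()
assert decode_component_value(encoded) == png_magic
assert decode_component_value(None) is None  # nothing submitted yet
```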

app/components/paste_image_ui/index.html (Normal file, 142 lines)
@@ -0,0 +1,142 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<style>
* { box-sizing: border-box; margin: 0; padding: 0; }
body {
font-family: -apple-system, BlinkMacSystemFont, "Source Sans Pro", sans-serif;
background: transparent;
}
.zone {
width: 100%;
min-height: 72px;
border: 2px dashed var(--border, #ccc);
border-radius: 8px;
display: flex;
align-items: center;
justify-content: center;
flex-direction: column;
gap: 6px;
padding: 12px 16px;
cursor: pointer;
outline: none;
transition: border-color 0.15s, background 0.15s;
color: var(--text-muted, #888);
font-size: 13px;
text-align: center;
user-select: none;
}
.zone:focus { border-color: var(--primary, #ff4b4b); background: var(--primary-faint, rgba(255,75,75,0.06)); }
.zone.dragover { border-color: var(--primary, #ff4b4b); background: var(--primary-faint, rgba(255,75,75,0.06)); }
.zone.done { border-style: solid; border-color: #00c853; color: #00c853; }
.icon { font-size: 22px; line-height: 1; }
.hint { font-size: 11px; opacity: 0.7; }
.status { margin-top: 5px; font-size: 11px; text-align: center; color: var(--text-muted, #888); min-height: 16px; }
</style>
</head>
<body>
<div class="zone" id="zone" tabindex="0" role="button"
aria-label="Click to focus, then paste with Ctrl+V, or drag and drop an image">
<span class="icon">📋</span>
<span id="mainMsg"><strong>Click here</strong>, then <strong>Ctrl+V</strong> to paste</span>
<span class="hint" id="hint">or drag &amp; drop an image file</span>
</div>
<div class="status" id="status"></div>
<script>
const zone = document.getElementById('zone');
const status = document.getElementById('status');
const mainMsg = document.getElementById('mainMsg');
const hint = document.getElementById('hint');
// ── Streamlit handshake ─────────────────────────────────────────────────
window.parent.postMessage({ type: "streamlit:componentReady", apiVersion: 1 }, "*");
function setHeight() {
const h = document.body.scrollHeight + 4;
window.parent.postMessage({ type: "streamlit:setFrameHeight", height: h }, "*");
}
setHeight();
// ── Theme ───────────────────────────────────────────────────────────────
window.addEventListener("message", (e) => {
if (e.data && e.data.type === "streamlit:render") {
const t = e.data.args && e.data.args.theme;
if (!t) return;
const r = document.documentElement;
r.style.setProperty("--primary", t.primaryColor || "#ff4b4b");
r.style.setProperty("--primary-faint", (t.primaryColor || "#ff4b4b") + "10");
r.style.setProperty("--text-muted", t.textColor ? t.textColor + "99" : "#888");
r.style.setProperty("--border", t.textColor ? t.textColor + "33" : "#ccc");
document.body.style.background = t.backgroundColor || "transparent";
}
});
// ── Image handling ──────────────────────────────────────────────────────
function markDone() {
zone.classList.add('done');
// Clear children and rebuild with safe DOM methods
while (zone.firstChild) zone.removeChild(zone.firstChild);
const icon = document.createElement('span');
icon.className = 'icon';
icon.textContent = '\u2705';
const msg = document.createElement('span');
msg.textContent = 'Image ready \u2014 remove or replace below';
zone.appendChild(icon);
zone.appendChild(msg);
setHeight();
}
function sendImage(blob) {
const reader = new FileReader();
reader.onload = function(ev) {
const dataUrl = ev.target.result;
const b64 = dataUrl.slice(dataUrl.indexOf(',') + 1);
window.parent.postMessage({ type: "streamlit:setComponentValue", value: b64 }, "*");
markDone();
};
reader.readAsDataURL(blob);
}
function findImageItem(items) {
if (!items) return null;
for (let i = 0; i < items.length; i++) {
if (items[i].type && items[i].type.indexOf('image/') === 0) return items[i];
}
return null;
}
// Ctrl+V paste (works over HTTP — uses paste event, not Clipboard API)
document.addEventListener('paste', function(e) {
const item = findImageItem(e.clipboardData && e.clipboardData.items);
if (item) { sendImage(item.getAsFile()); e.preventDefault(); }
});
// Drag and drop
zone.addEventListener('dragover', function(e) {
e.preventDefault();
zone.classList.add('dragover');
});
zone.addEventListener('dragleave', function() {
zone.classList.remove('dragover');
});
zone.addEventListener('drop', function(e) {
e.preventDefault();
zone.classList.remove('dragover');
const files = e.dataTransfer && e.dataTransfer.files;
if (files && files.length) {
for (let i = 0; i < files.length; i++) {
if (files[i].type.indexOf('image/') === 0) { sendImage(files[i]); return; }
}
}
// Fallback: dataTransfer items (e.g. dragged from browser)
const item = findImageItem(e.dataTransfer && e.dataTransfer.items);
if (item) sendImage(item.getAsFile());
});
// Click to focus so Ctrl+V lands in this iframe
zone.addEventListener('click', function() { zone.focus(); });
</script>
</body>
</html>

app/feedback.py (Normal file, 247 lines)
@@ -0,0 +1,247 @@
"""
Floating feedback button + dialog; a thin Streamlit shell.
All business logic lives in scripts/feedback_api.py.
"""
from __future__ import annotations
import os
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent))
import streamlit as st
# ── CSS: float the button to the bottom-right corner ─────────────────────────
# Targets the button by its aria-label (set via `help=` parameter).
_FLOAT_CSS = """
<style>
button[aria-label="Send feedback or report a bug"] {
position: fixed !important;
bottom: 2rem !important;
right: 2rem !important;
z-index: 9999 !important;
border-radius: 25px !important;
padding: 0.5rem 1.25rem !important;
box-shadow: 0 4px 16px rgba(0,0,0,0.25) !important;
font-size: 0.9rem !important;
}
</style>
"""
@st.dialog("Send Feedback", width="large")
def _feedback_dialog(page: str) -> None:
"""Two-step feedback dialog: form → consent/attachments → submit."""
from scripts.feedback_api import (
collect_context, collect_logs, collect_listings,
build_issue_body, create_forgejo_issue, upload_attachment,
)
from scripts.db import DEFAULT_DB
# ── Initialise step counter ───────────────────────────────────────────────
if "fb_step" not in st.session_state:
st.session_state.fb_step = 1
# ═════════════════════════════════════════════════════════════════════════
# STEP 1 — Form
# ═════════════════════════════════════════════════════════════════════════
if st.session_state.fb_step == 1:
st.subheader("What's on your mind?")
fb_type = st.selectbox(
"Type", ["Bug", "Feature Request", "Other"], key="fb_type"
)
fb_title = st.text_input(
"Title", placeholder="Short summary of the issue or idea", key="fb_title"
)
fb_desc = st.text_area(
"Description",
placeholder="Describe what happened or what you'd like to see...",
key="fb_desc",
)
if fb_type == "Bug":
st.text_area(
"Reproduction steps",
placeholder="1. Go to...\n2. Click...\n3. See error",
key="fb_repro",
)
col_cancel, _, col_next = st.columns([1, 3, 1])
with col_cancel:
if st.button("Cancel"):
_clear_feedback_state()
st.rerun() # intentionally closes the dialog
with col_next:
if st.button("Next →", type="primary"):
# Read widget values NOW (same rerun as the click — values are
# available here even on first click). Copy to non-widget keys
# so they survive step 2's render (Streamlit removes widget
# state for widgets that are no longer rendered).
title = fb_title.strip()
desc = fb_desc.strip()
if not title or not desc:
st.error("Please fill in both Title and Description.")
else:
st.session_state.fb_data_type = fb_type
st.session_state.fb_data_title = title
st.session_state.fb_data_desc = desc
st.session_state.fb_data_repro = st.session_state.get("fb_repro", "")
st.session_state.fb_step = 2
# ═════════════════════════════════════════════════════════════════════════
# STEP 2 — Consent + attachments
# ═════════════════════════════════════════════════════════════════════════
elif st.session_state.fb_step == 2:
st.subheader("Optional: attach diagnostic data")
# ── Diagnostic data toggle + preview ─────────────────────────────────
include_diag = st.toggle(
"Include diagnostic data (logs + recent listings)", key="fb_diag"
)
if include_diag:
with st.expander("Preview what will be sent", expanded=True):
st.caption("**App logs (last 100 lines, PII masked):**")
st.code(collect_logs(100), language=None)
st.caption("**Recent listings (title / company / URL only):**")
for j in collect_listings(DEFAULT_DB, 5):
st.write(f"- {j['title']} @ {j['company']} — {j['url']}")
# ── Screenshot ────────────────────────────────────────────────────────
st.divider()
st.caption("**Screenshot** (optional)")
from app.components.paste_image import paste_image_component
# Keyed so we can reset the component when the user removes the image
if "fb_paste_key" not in st.session_state:
st.session_state.fb_paste_key = 0
pasted = paste_image_component(key=f"fb_paste_{st.session_state.fb_paste_key}")
if pasted:
st.session_state.fb_screenshot = pasted
st.caption("or upload a file:")
uploaded = st.file_uploader(
"Upload screenshot",
type=["png", "jpg", "jpeg"],
label_visibility="collapsed",
key="fb_upload",
)
if uploaded:
st.session_state.fb_screenshot = uploaded.read()
if st.session_state.get("fb_screenshot"):
st.image(
st.session_state["fb_screenshot"],
caption="Screenshot preview — this will be attached to the issue",
use_container_width=True,
)
if st.button("🗑 Remove screenshot"):
st.session_state.pop("fb_screenshot", None)
st.session_state.fb_paste_key = st.session_state.get("fb_paste_key", 0) + 1
# no st.rerun() — button click already re-renders the dialog
# ── Attribution consent ───────────────────────────────────────────────
st.divider()
submitter: str | None = None
try:
import yaml
_ROOT = Path(__file__).parent.parent
user = yaml.safe_load((_ROOT / "config" / "user.yaml").read_text()) or {}
name = (user.get("name") or "").strip()
email = (user.get("email") or "").strip()
if name or email:
label = f"Include my name & email in the report: **{name}** ({email})"
if st.checkbox(label, key="fb_attr"):
submitter = f"{name} <{email}>"
except Exception:
pass
# ── Navigation ────────────────────────────────────────────────────────
col_back, _, col_submit = st.columns([1, 3, 2])
with col_back:
if st.button("← Back"):
st.session_state.fb_step = 1
# no st.rerun() — button click already re-renders the dialog
with col_submit:
if st.button("Submit Feedback", type="primary"):
_submit(page, include_diag, submitter, collect_context,
collect_logs, collect_listings, build_issue_body,
create_forgejo_issue, upload_attachment, DEFAULT_DB)
def _submit(page, include_diag, submitter, collect_context, collect_logs,
collect_listings, build_issue_body, create_forgejo_issue,
upload_attachment, db_path) -> None:
"""Handle form submission: build body, file issue, upload screenshot."""
with st.spinner("Filing issue…"):
context = collect_context(page)
attachments: dict = {}
if include_diag:
attachments["logs"] = collect_logs(100)
attachments["listings"] = collect_listings(db_path, 5)
if submitter:
attachments["submitter"] = submitter
fb_type = st.session_state.get("fb_data_type", "Other")
type_key = {"Bug": "bug", "Feature Request": "feature", "Other": "other"}.get(
fb_type, "other"
)
labels = ["beta-feedback", "needs-triage"]
labels.append(
{"bug": "bug", "feature": "feature-request"}.get(type_key, "question")
)
form = {
"type": type_key,
"description": st.session_state.get("fb_data_desc", ""),
"repro": st.session_state.get("fb_data_repro", "") if type_key == "bug" else "",
}
body = build_issue_body(form, context, attachments)
try:
result = create_forgejo_issue(
st.session_state.get("fb_data_title", "Feedback"), body, labels
)
screenshot = st.session_state.get("fb_screenshot")
if screenshot:
upload_attachment(result["number"], screenshot)
_clear_feedback_state()
st.success(f"Issue filed! [View on Forgejo]({result['url']})")
st.balloons()
except Exception as exc:
st.error(f"Failed to file issue: {exc}")
def _clear_feedback_state() -> None:
for key in [
"fb_step",
"fb_type", "fb_title", "fb_desc", "fb_repro", # widget keys
"fb_data_type", "fb_data_title", "fb_data_desc", "fb_data_repro", # saved data
"fb_diag", "fb_upload", "fb_attr", "fb_screenshot", "fb_paste_key",
]:
st.session_state.pop(key, None)
def inject_feedback_button(page: str = "Unknown") -> None:
"""
Inject the floating feedback button. Call once per page render in app.py.
Hidden automatically in DEMO_MODE.
"""
if os.environ.get("DEMO_MODE", "").lower() in ("1", "true", "yes"):
return
if not os.environ.get("FORGEJO_API_TOKEN"):
return # silently skip if not configured
st.markdown(_FLOAT_CSS, unsafe_allow_html=True)
if st.button(
"💬 Feedback",
key="__feedback_floating_btn__",
help="Send feedback or report a bug",
):
_feedback_dialog(page)

app/pages/0_Setup.py (Normal file, 744 lines)
@@ -0,0 +1,744 @@
"""
First-run setup wizard orchestrator.
Shown by app.py when user.yaml is absent OR wizard_complete is False.
Seven steps: hardware → tier → resume → identity → inference → search → integrations (optional).
Steps 1-6 are mandatory; step 7 is optional and can be skipped.
Each step writes to user.yaml on "Next" for crash recovery.
"""
from __future__ import annotations
import json
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent))
import streamlit as st
import yaml
from app.cloud_session import resolve_session, get_db_path, get_config_dir
resolve_session("peregrine")
_ROOT = Path(__file__).parent.parent.parent
CONFIG_DIR = get_config_dir() # per-user dir in cloud; repo config/ locally
USER_YAML = CONFIG_DIR / "user.yaml"
STEPS = 6 # mandatory steps
STEP_LABELS = ["Hardware", "Tier", "Resume", "Identity", "Inference", "Search"]
# ── Helpers ────────────────────────────────────────────────────────────────────
def _load_yaml() -> dict:
if USER_YAML.exists():
return yaml.safe_load(USER_YAML.read_text()) or {}
return {}
def _save_yaml(updates: dict) -> None:
existing = _load_yaml()
existing.update(updates)
CONFIG_DIR.mkdir(parents=True, exist_ok=True)
USER_YAML.write_text(
yaml.dump(existing, default_flow_style=False, allow_unicode=True)
)
def _detect_gpus() -> list[str]:
"""Detect GPUs. Prefers env vars written by preflight (works inside Docker)."""
import os
import subprocess
# Preflight writes PEREGRINE_GPU_NAMES to .env; compose passes it to the container.
# This is the reliable path when running inside Docker without nvidia-smi access.
env_names = os.environ.get("PEREGRINE_GPU_NAMES", "").strip()
if env_names:
return [n.strip() for n in env_names.split(",") if n.strip()]
# Fallback: try nvidia-smi directly (works when running bare or with GPU passthrough)
try:
out = subprocess.check_output(
["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
text=True, timeout=5,
)
return [l.strip() for l in out.strip().splitlines() if l.strip()]
except Exception:
return []
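The env-var path above is plain comma-splitting of the preflight value; a quick sketch with made-up GPU names:

```python
import os

# PEREGRINE_GPU_NAMES is normally written by preflight; this value is made up.
os.environ["PEREGRINE_GPU_NAMES"] = "NVIDIA RTX 3090, NVIDIA RTX 3060"

names = [
    n.strip()
    for n in os.environ["PEREGRINE_GPU_NAMES"].split(",")
    if n.strip()
]
assert names == ["NVIDIA RTX 3090", "NVIDIA RTX 3060"]
```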
def _suggest_profile(gpus: list[str]) -> str:
import os
# If preflight already ran and wrote a profile recommendation, use it.
recommended = os.environ.get("RECOMMENDED_PROFILE", "").strip()
if recommended:
return recommended
if len(gpus) >= 2:
return "dual-gpu"
if len(gpus) == 1:
return "single-gpu"
return "remote"
def _submit_wizard_task(section: str, input_data: dict) -> int:
"""Submit a wizard_generate background task. Returns task_id."""
from scripts.task_runner import submit_task
params = json.dumps({"section": section, "input": input_data})
task_id, _ = submit_task(get_db_path(), "wizard_generate", 0, params=params)
return task_id
def _poll_wizard_task(section: str) -> dict | None:
"""Return the most recent wizard_generate task row for a given section, or None."""
import sqlite3
conn = sqlite3.connect(get_db_path())
conn.row_factory = sqlite3.Row
row = conn.execute(
"SELECT * FROM background_tasks "
"WHERE task_type='wizard_generate' AND params LIKE ? "
"ORDER BY id DESC LIMIT 1",
(f'%"section": "{section}"%',),
).fetchone()
conn.close()
return dict(row) if row else None
def _generation_widget(section: str, label: str, tier: str,
feature_key: str, input_data: dict) -> str | None:
"""Render a generation button + polling fragment.
Returns the generated result string if completed and not yet applied, else None.
Call this inside a step to add LLM generation support.
The caller decides whether to auto-populate a field with the result.
"""
from app.wizard.tiers import can_use, tier_label as tl, has_configured_llm
_has_byok = has_configured_llm()
if not can_use(tier, feature_key, has_byok=_has_byok):
st.caption(f"{tl(feature_key, has_byok=_has_byok)} {label}")
return None
col_btn, col_fb = st.columns([2, 5])
if col_btn.button(f"\u2728 {label}", key=f"gen_{section}"):
_submit_wizard_task(section, input_data)
st.rerun()
with st.expander("\u270f\ufe0f Request changes (optional)", expanded=False):
prev = st.session_state.get(f"_gen_result_{section}", "")
feedback = st.text_area(
"Describe what to change", key=f"_feedback_{section}",
placeholder="e.g. Make it shorter and emphasise leadership",
height=60,
)
if prev and st.button("\u21ba Regenerate with feedback", key=f"regen_{section}"):
_submit_wizard_task(section, {**input_data,
"previous_result": prev,
"feedback": feedback})
st.rerun()
# Polling fragment
result_key = f"_gen_result_{section}"
@st.fragment(run_every=3)
def _poll():
task = _poll_wizard_task(section)
if not task:
return
status = task.get("status")
if status in ("queued", "running"):
stage = task.get("stage") or "Queued"
st.info(f"\u23f3 {stage}\u2026")
elif status == "completed":
# task_runner reuses the error column to carry the JSON result payload on success
payload = json.loads(task.get("error") or "{}")
result = payload.get("result", "")
if result and result != st.session_state.get(result_key):
st.session_state[result_key] = result
st.rerun()
elif status == "failed":
st.warning(f"Generation failed: {task.get('error', 'unknown error')}")
_poll()
return st.session_state.get(result_key)
# ── Wizard state init ──────────────────────────────────────────────────────────
if "wizard_step" not in st.session_state:
saved = _load_yaml()
last_completed = saved.get("wizard_step", 0)
st.session_state.wizard_step = min(last_completed + 1, STEPS + 1) # resume at next step
step = st.session_state.wizard_step
saved_yaml = _load_yaml()
_tier = saved_yaml.get("dev_tier_override") or saved_yaml.get("tier", "free")
st.title("\U0001f44b Welcome to Peregrine")
st.caption("Complete the setup to start your job search. Progress saves automatically.")
st.progress(
min((step - 1) / STEPS, 1.0),
text=f"Step {min(step, STEPS)} of {STEPS}" if step <= STEPS else "Almost done!",
)
st.divider()
# ── Step 1: Hardware ───────────────────────────────────────────────────────────
if step == 1:
from app.cloud_session import CLOUD_MODE as _CLOUD_MODE
if _CLOUD_MODE:
# Cloud deployment: always single-gpu (Heimdall), skip hardware selection
_save_yaml({"inference_profile": "single-gpu", "wizard_step": 1})
st.session_state.wizard_step = 2
st.rerun()
from app.wizard.step_hardware import validate, PROFILES
st.subheader("Step 1 \u2014 Hardware Detection")
gpus = _detect_gpus()
suggested = _suggest_profile(gpus)
if gpus:
st.success(f"Detected {len(gpus)} GPU(s): {', '.join(gpus)}")
else:
st.info("No NVIDIA GPUs detected. 'Remote' or 'CPU' mode recommended.")
profile = st.selectbox(
"Inference mode", PROFILES, index=PROFILES.index(suggested),
help="Controls which Docker services start. Change later in Settings \u2192 Services.",
)
if profile in ("single-gpu", "dual-gpu") and not gpus:
st.warning(
"No GPUs detected \u2014 GPU profiles require the NVIDIA Container Toolkit. "
"See README for install instructions."
)
if st.button("Next \u2192", type="primary", key="hw_next"):
errs = validate({"inference_profile": profile})
if errs:
st.error("\n".join(errs))
else:
_save_yaml({"inference_profile": profile, "wizard_step": 1})
st.session_state.wizard_step = 2
st.rerun()
# ── Step 2: Tier ───────────────────────────────────────────────────────────────
elif step == 2:
from app.cloud_session import CLOUD_MODE as _CLOUD_MODE
if _CLOUD_MODE:
# Cloud mode: tier already resolved from Heimdall at session init
cloud_tier = st.session_state.get("cloud_tier", "free")
_save_yaml({"tier": cloud_tier, "wizard_step": 2})
st.session_state.wizard_step = 3
st.rerun()
from app.wizard.step_tier import validate
st.subheader("Step 2 \u2014 Choose Your Plan")
st.caption(
"**Free** is fully functional for self-hosted local use. "
"**Paid/Premium** unlock LLM-assisted features."
)
tier_options = {
"free": "\U0001f193 **Free** \u2014 Local discovery, apply workspace, interviews kanban",
"paid": "\U0001f4bc **Paid** \u2014 + AI career summary, company research, email classifier, calendar sync",
"premium": "\u2b50 **Premium** \u2014 + Voice guidelines, model fine-tuning, multi-user",
}
from app.wizard.tiers import TIERS
current_tier = saved_yaml.get("tier", "free")
selected_tier = st.radio(
"Plan",
list(tier_options.keys()),
format_func=lambda x: tier_options[x],
index=TIERS.index(current_tier) if current_tier in TIERS else 0,
)
col_back, col_next = st.columns([1, 4])
if col_back.button("\u2190 Back", key="tier_back"):
st.session_state.wizard_step = 1
st.rerun()
if col_next.button("Next \u2192", type="primary", key="tier_next"):
errs = validate({"tier": selected_tier})
if errs:
st.error("\n".join(errs))
else:
_save_yaml({"tier": selected_tier, "wizard_step": 2})
st.session_state.wizard_step = 3
st.rerun()
# ── Step 3: Resume ─────────────────────────────────────────────────────────────
elif step == 3:
from app.wizard.step_resume import validate
st.subheader("Step 3 \u2014 Resume")
st.caption("Upload your resume for fast parsing, or build it section by section.")
# Read LinkedIn import result before tabs render (spec: "at step render time")
_li_data = st.session_state.pop("_linkedin_extracted", None)
if _li_data:
st.session_state["_parsed_resume"] = _li_data
tab_upload, tab_builder, tab_linkedin = st.tabs([
"\U0001f4ce Upload", "\U0001f4dd Build Manually", "\U0001f517 LinkedIn"
])
with tab_upload:
uploaded = st.file_uploader("Upload PDF, DOCX, or ODT", type=["pdf", "docx", "odt"])
if uploaded and st.button("Parse Resume", type="primary", key="parse_resume"):
from scripts.resume_parser import (
extract_text_from_pdf, extract_text_from_docx,
extract_text_from_odt, structure_resume,
)
file_bytes = uploaded.read()
ext = uploaded.name.rsplit(".", 1)[-1].lower()
if ext == "pdf":
raw_text = extract_text_from_pdf(file_bytes)
elif ext == "odt":
raw_text = extract_text_from_odt(file_bytes)
else:
raw_text = extract_text_from_docx(file_bytes)
with st.spinner("Parsing\u2026"):
parsed, parse_err = structure_resume(raw_text)
# Diagnostic: show raw extraction + detected fields regardless of outcome
with st.expander("🔍 Parse diagnostics", expanded=not bool(parsed and any(
parsed.get(k) for k in ("name", "experience", "skills")
))):
st.caption("**Raw extracted text (first 800 chars)**")
st.code(raw_text[:800] if raw_text else "(empty)", language="text")
if parsed:
st.caption("**Detected fields**")
st.json({k: (v[:3] if isinstance(v, list) else v) for k, v in parsed.items()})
if parsed and any(parsed.get(k) for k in ("name", "experience", "skills")):
st.session_state["_parsed_resume"] = parsed
st.session_state["_raw_resume_text"] = raw_text
_save_yaml({"_raw_resume_text": raw_text[:8000]})
st.success("Parsed! Review the Build tab to edit entries.")
elif parsed:
# Parsed but empty — show what we got and let them proceed or build manually
st.session_state["_parsed_resume"] = parsed
st.warning("Resume text was extracted but no fields were recognised. "
"Check the diagnostics above — the section headers may use unusual labels. "
"You can still fill in the Build tab manually.")
else:
st.warning("Auto-parse failed \u2014 switch to the Build tab and add entries manually.")
if parse_err:
st.caption(f"Reason: {parse_err}")
with tab_builder:
parsed = st.session_state.get("_parsed_resume", {})
experience = st.session_state.get(
"_experience",
parsed.get("experience") or saved_yaml.get("experience", []),
)
for i, entry in enumerate(experience):
with st.expander(
f"{entry.get('title', 'Entry')} @ {entry.get('company', '?')}",
expanded=(i == len(experience) - 1),
):
entry["company"] = st.text_input("Company", entry.get("company", ""), key=f"co_{i}")
entry["title"] = st.text_input("Title", entry.get("title", ""), key=f"ti_{i}")
raw_bullets = st.text_area(
"Responsibilities (one per line)",
"\n".join(entry.get("bullets", [])),
key=f"bu_{i}", height=80,
)
entry["bullets"] = [b.strip() for b in raw_bullets.splitlines() if b.strip()]
if st.button("Remove entry", key=f"rm_{i}"):
experience.pop(i)
st.session_state["_experience"] = experience
st.rerun()
if st.button("\uff0b Add work experience entry", key="add_exp"):
experience.append({"company": "", "title": "", "bullets": []})
st.session_state["_experience"] = experience
st.rerun()
# Bullet expansion generation
if experience:
all_bullets = "\n".join(
b for e in experience for b in e.get("bullets", [])
)
_generation_widget(
section="expand_bullets",
label="Expand bullet points",
tier=_tier,
feature_key="llm_expand_bullets",
input_data={"bullet_notes": all_bullets},
)
with tab_linkedin:
from app.components.linkedin_import import render_linkedin_tab
render_linkedin_tab(config_dir=CONFIG_DIR, tier=_tier)
col_back, col_next = st.columns([1, 4])
if col_back.button("\u2190 Back", key="resume_back"):
st.session_state.wizard_step = 2
st.rerun()
if col_next.button("Next \u2192", type="primary", key="resume_next"):
parsed = st.session_state.get("_parsed_resume", {})
experience = (
parsed.get("experience") or
st.session_state.get("_experience", [])
)
errs = validate({"experience": experience})
if errs:
st.error("\n".join(errs))
else:
resume_yaml_path = CONFIG_DIR / "plain_text_resume.yaml"
resume_yaml_path.parent.mkdir(parents=True, exist_ok=True)
resume_data = {**parsed, "experience": experience} if parsed else {"experience": experience}
resume_yaml_path.write_text(
yaml.dump(resume_data, default_flow_style=False, allow_unicode=True)
)
_save_yaml({"wizard_step": 3})
st.session_state.wizard_step = 4
st.rerun()
# ── Step 4: Identity ───────────────────────────────────────────────────────────
elif step == 4:
from app.wizard.step_identity import validate
st.subheader("Step 4 \u2014 Your Identity")
st.caption("Used in cover letter PDFs, LLM prompts, and the app header.")
c1, c2 = st.columns(2)
name = c1.text_input("Full Name *", saved_yaml.get("name", ""))
email = c1.text_input("Email *", saved_yaml.get("email", ""))
phone = c2.text_input("Phone", saved_yaml.get("phone", ""))
linkedin = c2.text_input("LinkedIn URL", saved_yaml.get("linkedin", ""))
# Career summary with optional LLM generation — resume text available now (step 3 ran first)
summary_default = st.session_state.get("_gen_result_career_summary") or saved_yaml.get("career_summary", "")
summary = st.text_area(
"Career Summary *", value=summary_default, height=120,
placeholder="Experienced professional with X years in [field]. Specialise in [skills].",
help="Injected into cover letter and research prompts as your professional context.",
)
gen_result = _generation_widget(
section="career_summary",
label="Generate from resume",
tier=_tier,
feature_key="llm_career_summary",
input_data={"resume_text": saved_yaml.get("_raw_resume_text", "")},
)
if gen_result and gen_result != summary:
st.info(f"\u2728 Suggested summary \u2014 paste it above if it looks good:\n\n{gen_result}")
col_back, col_next = st.columns([1, 4])
if col_back.button("\u2190 Back", key="ident_back"):
st.session_state.wizard_step = 3
st.rerun()
if col_next.button("Next \u2192", type="primary", key="ident_next"):
errs = validate({"name": name, "email": email, "career_summary": summary})
if errs:
st.error("\n".join(errs))
else:
_save_yaml({
"name": name, "email": email, "phone": phone,
"linkedin": linkedin, "career_summary": summary,
"wizard_complete": False, "wizard_step": 4,
})
st.session_state.wizard_step = 5
st.rerun()
# ── Step 5: Inference ──────────────────────────────────────────────────────────
elif step == 5:
from app.cloud_session import CLOUD_MODE as _CLOUD_MODE
if _CLOUD_MODE:
# Cloud deployment: inference is managed server-side; skip this step
_save_yaml({"wizard_step": 5})
st.session_state.wizard_step = 6
st.rerun()
from app.wizard.step_inference import validate
st.subheader("Step 5 \u2014 Inference & API Keys")
profile = saved_yaml.get("inference_profile", "remote")
if profile == "remote":
st.info("Remote mode: at least one external API key is required.")
anthropic_key = st.text_input("Anthropic API Key", type="password", placeholder="sk-ant-\u2026")
openai_url = st.text_input("OpenAI-compatible endpoint (optional)",
placeholder="https://api.together.xyz/v1")
openai_key = st.text_input("Endpoint API Key (optional)", type="password",
key="oai_key") if openai_url else ""
else:
st.info(f"Local mode ({profile}): Ollama provides inference.")
anthropic_key = openai_url = openai_key = ""
with st.expander("Advanced \u2014 Service Ports & Hosts"):
st.caption("Change only if services run on non-default ports or remote hosts.")
svc = dict(saved_yaml.get("services", {}))
for svc_name, default_host, default_port in [
("ollama", "ollama", 11434), # Docker service name
("vllm", "vllm", 8000), # Docker service name
("searxng", "searxng", 8080), # Docker internal port (host-mapped: 8888)
]:
c1, c2 = st.columns([3, 1])
svc[f"{svc_name}_host"] = c1.text_input(
f"{svc_name} host",
svc.get(f"{svc_name}_host", default_host),
key=f"h_{svc_name}",
)
svc[f"{svc_name}_port"] = int(c2.number_input(
"port",
value=int(svc.get(f"{svc_name}_port", default_port)),
step=1, key=f"p_{svc_name}",
))
confirmed = st.session_state.get("_inf_confirmed", False)
test_label = "\U0001f50c Test Ollama connection" if profile != "remote" else "\U0001f50c Test LLM connection"
if st.button(test_label, key="inf_test"):
if profile == "remote":
from scripts.llm_router import LLMRouter
try:
r = LLMRouter().complete("Reply with only: OK")
if r and r.strip():
st.success("LLM responding.")
st.session_state["_inf_confirmed"] = True
confirmed = True
else:
st.error("LLM returned an empty response \u2014 check your API key.")
except Exception as e:
st.error(f"LLM test failed: {e}")
else:
import requests
ollama_url = f"http://{svc.get('ollama_host','localhost')}:{svc.get('ollama_port',11434)}"
try:
requests.get(f"{ollama_url}/api/tags", timeout=5)
st.success("Ollama is running.")
st.session_state["_inf_confirmed"] = True
confirmed = True
except Exception:
st.warning("Ollama not responding \u2014 you can skip this check and configure later.")
st.session_state["_inf_confirmed"] = True
confirmed = True
col_back, col_next = st.columns([1, 4])
if col_back.button("\u2190 Back", key="inf_back"):
st.session_state.wizard_step = 4
st.rerun()
if col_next.button("Next \u2192", type="primary", key="inf_next", disabled=not confirmed):
errs = validate({"endpoint_confirmed": confirmed})
if errs:
st.error("\n".join(errs))
else:
# Write API keys to .env
env_path = _ROOT / ".env"
env_lines = env_path.read_text().splitlines() if env_path.exists() else []
def _set_env(lines: list[str], key: str, val: str) -> list[str]:
for i, l in enumerate(lines):
if l.startswith(f"{key}="):
lines[i] = f"{key}={val}"
return lines
lines.append(f"{key}={val}")
return lines
if anthropic_key:
env_lines = _set_env(env_lines, "ANTHROPIC_API_KEY", anthropic_key)
if openai_url:
env_lines = _set_env(env_lines, "OPENAI_COMPAT_URL", openai_url)
if openai_key:
env_lines = _set_env(env_lines, "OPENAI_COMPAT_KEY", openai_key)
if anthropic_key or openai_url:
env_path.write_text("\n".join(env_lines) + "\n")
_save_yaml({"services": svc, "wizard_step": 5})
st.session_state.wizard_step = 6
st.rerun()
# ── Step 6: Search ─────────────────────────────────────────────────────────────
elif step == 6:
from app.wizard.step_search import validate
st.subheader("Step 6 \u2014 Job Search Preferences")
st.caption("Set up what to search for. You can refine these in Settings \u2192 Search later.")
titles = st.session_state.get("_titles", saved_yaml.get("_wiz_titles", []))
locations = st.session_state.get("_locations", saved_yaml.get("_wiz_locations", []))
c1, c2 = st.columns(2)
with c1:
st.markdown("**Job Titles**")
for i, t in enumerate(titles):
tc1, tc2 = st.columns([5, 1])
tc1.text(t)
if tc2.button("\u00d7", key=f"rmtitle_{i}"):
titles.pop(i)
st.session_state["_titles"] = titles
st.rerun()
new_title = st.text_input("Add title", key="new_title_wiz",
placeholder="Software Engineer, Product Manager\u2026")
ac1, ac2 = st.columns([4, 1])
if ac2.button("\uff0b", key="add_title"):
if new_title.strip() and new_title.strip() not in titles:
titles.append(new_title.strip())
st.session_state["_titles"] = titles
st.rerun()
# LLM title suggestions
_generation_widget(
section="job_titles",
label="Suggest job titles",
tier=_tier,
feature_key="llm_job_titles",
input_data={
"resume_text": saved_yaml.get("_raw_resume_text", ""),
"current_titles": str(titles),
},
)
with c2:
st.markdown("**Locations**")
for i, l in enumerate(locations):
lc1, lc2 = st.columns([5, 1])
lc1.text(l)
if lc2.button("\u00d7", key=f"rmloc_{i}"):
locations.pop(i)
st.session_state["_locations"] = locations
st.rerun()
new_loc = st.text_input("Add location", key="new_loc_wiz",
placeholder="Remote, New York NY, San Francisco CA\u2026")
ll1, ll2 = st.columns([4, 1])
if ll2.button("\uff0b", key="add_loc"):
if new_loc.strip():
locations.append(new_loc.strip())
st.session_state["_locations"] = locations
st.rerun()
col_back, col_next = st.columns([1, 4])
if col_back.button("\u2190 Back", key="search_back"):
st.session_state.wizard_step = 5
st.rerun()
if col_next.button("Next \u2192", type="primary", key="search_next"):
errs = validate({"job_titles": titles, "locations": locations})
if errs:
st.error("\n".join(errs))
else:
search_profile_path = CONFIG_DIR / "search_profiles.yaml"
existing_profiles = {}
if search_profile_path.exists():
existing_profiles = yaml.safe_load(search_profile_path.read_text()) or {}
profiles_list = existing_profiles.get("profiles", [])
# Update or create "default" profile
default_idx = next(
(i for i, p in enumerate(profiles_list) if p.get("name") == "default"), None
)
default_profile = {
"name": "default",
"job_titles": titles,
"locations": locations,
"remote_only": False,
"boards": ["linkedin", "indeed", "glassdoor", "zip_recruiter"],
}
if default_idx is not None:
profiles_list[default_idx] = default_profile
else:
profiles_list.insert(0, default_profile)
search_profile_path.write_text(
yaml.dump({"profiles": profiles_list},
default_flow_style=False, allow_unicode=True)
)
_save_yaml({"wizard_step": 6})
st.session_state.wizard_step = 7
st.rerun()
# ── Step 7: Integrations (optional) ───────────────────────────────────────────
elif step == 7:
st.subheader("Step 7 \u2014 Integrations (Optional)")
st.caption(
"Connect cloud services, calendars, and notification tools. "
"You can add or change these any time in Settings \u2192 Integrations."
)
from scripts.integrations import REGISTRY
from app.wizard.step_integrations import get_available, is_connected
from app.wizard.tiers import tier_label
available = get_available(_tier)
for name, cls in sorted(REGISTRY.items(), key=lambda x: (x[0] not in available, x[0])):
is_conn = is_connected(name, CONFIG_DIR)
icon = "\u2705" if is_conn else "\u25cb"
lock = tier_label(f"{name}_sync") or tier_label(f"{name}_notifications")
with st.expander(f"{icon} {cls.label} {lock}"):
if name not in available:
st.caption(f"Upgrade to {cls.tier} to unlock {cls.label}.")
continue
inst = cls()
config: dict = {}
for field in inst.fields():
val = st.text_input(
field["label"],
type="password" if field["type"] == "password" else "default",
placeholder=field.get("placeholder", ""),
help=field.get("help", ""),
key=f"int_{name}_{field['key']}",
)
config[field["key"]] = val
required_filled = all(
config.get(f["key"])
for f in inst.fields()
if f.get("required")
)
if st.button(f"Connect {cls.label}", key=f"conn_{name}",
disabled=not required_filled):
inst.connect(config)
with st.spinner(f"Testing {cls.label} connection\u2026"):
if inst.test():
inst.save_config(config, CONFIG_DIR)
st.success(f"{cls.label} connected!")
st.rerun()
else:
st.error(
f"Connection test failed for {cls.label}. "
"Double-check your credentials."
)
st.divider()
col_back, col_skip, col_finish = st.columns([1, 1, 3])
if col_back.button("\u2190 Back", key="int_back"):
st.session_state.wizard_step = 6
st.rerun()
if col_skip.button("Skip \u2192"):
st.session_state.wizard_step = 8 # trigger Finish
st.rerun()
if col_finish.button("\U0001f389 Finish Setup", type="primary", key="finish_btn"):
st.session_state.wizard_step = 8
st.rerun()
# ── Finish ─────────────────────────────────────────────────────────────────────
elif step >= 8:
with st.spinner("Finalising setup\u2026"):
from scripts.user_profile import UserProfile
from scripts.generate_llm_config import apply_service_urls
try:
profile_obj = UserProfile(USER_YAML)
if (CONFIG_DIR / "llm.yaml").exists():
apply_service_urls(profile_obj, CONFIG_DIR / "llm.yaml")
except Exception:
pass # don't block finish on llm.yaml errors
data = _load_yaml()
data["wizard_complete"] = True
data.pop("wizard_step", None)
USER_YAML.write_text(
yaml.dump(data, default_flow_style=False, allow_unicode=True)
)
st.success("\u2705 Setup complete! Loading Peregrine\u2026")
st.session_state.clear()
st.rerun()


@@ -1,191 +0,0 @@
# app/pages/3_Resume_Editor.py
"""
Resume Editor: form-based editor for Alex's AIHawk profile YAML.
FILL_IN fields highlighted in amber.
"""
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent))
import streamlit as st
import yaml
st.set_page_config(page_title="Resume Editor", page_icon="📝", layout="wide")
st.title("📝 Resume Editor")
st.caption("Edit Alex's application profile used by AIHawk for LinkedIn Easy Apply.")
RESUME_PATH = Path(__file__).parent.parent.parent / "aihawk" / "data_folder" / "plain_text_resume.yaml"
if not RESUME_PATH.exists():
st.error(f"Resume file not found at `{RESUME_PATH}`. Is AIHawk cloned?")
st.stop()
data = yaml.safe_load(RESUME_PATH.read_text()) or {}
def field(label: str, value: str, key: str, help: str = "", password: bool = False) -> str:
"""Render a text input, highlighted amber if value is FILL_IN or empty."""
needs_attention = not value or str(value).startswith("FILL_IN")
if needs_attention:
st.markdown(
'<p style="color:#F59E0B;font-size:0.8em;margin-bottom:2px">⚠️ Needs your attention</p>',
unsafe_allow_html=True,
)
return st.text_input(label, value=value or "", key=key, help=help,
type="password" if password else "default")
st.divider()
# ── Personal Info ─────────────────────────────────────────────────────────────
with st.expander("👤 Personal Information", expanded=True):
info = data.get("personal_information", {})
col1, col2 = st.columns(2)
with col1:
name = field("First Name", info.get("name", ""), "pi_name")
email = field("Email", info.get("email", ""), "pi_email")
phone = field("Phone", info.get("phone", ""), "pi_phone")
city = field("City", info.get("city", ""), "pi_city")
with col2:
surname = field("Last Name", info.get("surname", ""), "pi_surname")
linkedin = field("LinkedIn URL", info.get("linkedin", ""), "pi_linkedin")
zip_code = field("Zip Code", info.get("zip_code", ""), "pi_zip")
dob = field("Date of Birth", info.get("date_of_birth", ""), "pi_dob",
help="Format: MM/DD/YYYY")
# ── Education ─────────────────────────────────────────────────────────────────
with st.expander("🎓 Education"):
edu_list = data.get("education_details", [{}])
updated_edu = []
degree_options = ["Bachelor's Degree", "Master's Degree", "Some College",
"Associate's Degree", "High School", "Other"]
for i, edu in enumerate(edu_list):
st.markdown(f"**Entry {i+1}**")
col1, col2 = st.columns(2)
with col1:
inst = field("Institution", edu.get("institution", ""), f"edu_inst_{i}")
field_study = st.text_input("Field of Study", edu.get("field_of_study", ""), key=f"edu_field_{i}")
start = st.text_input("Start Year", edu.get("start_date", ""), key=f"edu_start_{i}")
with col2:
current_level = edu.get("education_level", "Some College")
level_idx = degree_options.index(current_level) if current_level in degree_options else 2
level = st.selectbox("Degree Level", degree_options, index=level_idx, key=f"edu_level_{i}")
end = st.text_input("Completion Year", edu.get("year_of_completion", ""), key=f"edu_end_{i}")
updated_edu.append({
"education_level": level, "institution": inst, "field_of_study": field_study,
"start_date": start, "year_of_completion": end,
"final_evaluation_grade": edu.get("final_evaluation_grade", ""), "exam": edu.get("exam", {}),
})
st.divider()
# ── Experience ────────────────────────────────────────────────────────────────
with st.expander("💼 Work Experience"):
exp_list = data.get("experience_details", [{}])
if "exp_count" not in st.session_state:
st.session_state.exp_count = len(exp_list)
if st.button("+ Add Experience Entry"):
st.session_state.exp_count += 1
exp_list.append({})
updated_exp = []
for i in range(st.session_state.exp_count):
exp = exp_list[i] if i < len(exp_list) else {}
st.markdown(f"**Position {i+1}**")
col1, col2 = st.columns(2)
with col1:
pos = field("Job Title", exp.get("position", ""), f"exp_pos_{i}")
company = field("Company", exp.get("company", ""), f"exp_co_{i}")
period = field("Employment Period", exp.get("employment_period", ""), f"exp_period_{i}",
help="e.g. 01/2022 - Present")
with col2:
location = st.text_input("Location", exp.get("location", ""), key=f"exp_loc_{i}")
industry = st.text_input("Industry", exp.get("industry", ""), key=f"exp_ind_{i}")
responsibilities = st.text_area(
"Key Responsibilities (one per line)",
value="\n".join(
next(iter(r.values()), "") if isinstance(r, dict) else str(r)
for r in exp.get("key_responsibilities", [])
),
key=f"exp_resp_{i}", height=100,
)
skills = st.text_input(
"Skills (comma-separated)",
value=", ".join(exp.get("skills_acquired", [])),
key=f"exp_skills_{i}",
)
_resp_lines = [r.strip() for r in responsibilities.splitlines() if r.strip()]
resp_list = [{f"responsibility_{j+1}": r} for j, r in enumerate(_resp_lines)]
skill_list = [s.strip() for s in skills.split(",") if s.strip()]
updated_exp.append({
"position": pos, "company": company, "employment_period": period,
"location": location, "industry": industry,
"key_responsibilities": resp_list, "skills_acquired": skill_list,
})
st.divider()
# ── Preferences ───────────────────────────────────────────────────────────────
with st.expander("⚙️ Preferences & Availability"):
wp = data.get("work_preferences", {})
sal = data.get("salary_expectations", {})
avail = data.get("availability", {})
col1, col2 = st.columns(2)
with col1:
salary_range = st.text_input("Salary Range (USD)", sal.get("salary_range_usd", ""),
key="pref_salary", help="e.g. 120000 - 180000")
notice = st.text_input("Notice Period", avail.get("notice_period", "2 weeks"), key="pref_notice")
with col2:
remote_work = st.checkbox("Open to Remote", value=wp.get("remote_work", "Yes") == "Yes", key="pref_remote")
relocation = st.checkbox("Open to Relocation", value=wp.get("open_to_relocation", "No") == "Yes", key="pref_reloc")
assessments = st.checkbox("Willing to complete assessments",
value=wp.get("willing_to_complete_assessments", "Yes") == "Yes", key="pref_assess")
bg_checks = st.checkbox("Willing to undergo background checks",
value=wp.get("willing_to_undergo_background_checks", "Yes") == "Yes", key="pref_bg")
drug_tests = st.checkbox("Willing to undergo drug tests",
value=wp.get("willing_to_undergo_drug_tests", "No") == "Yes", key="pref_drug")
# ── Self-ID ───────────────────────────────────────────────────────────────────
with st.expander("🏳️‍🌈 Self-Identification (optional)"):
sid = data.get("self_identification", {})
col1, col2 = st.columns(2)
with col1:
gender = st.text_input("Gender identity", sid.get("gender", "Non-binary"), key="sid_gender",
help="Select 'Non-binary' or 'Prefer not to say' when options allow")
pronouns = st.text_input("Pronouns", sid.get("pronouns", "Any"), key="sid_pronouns")
ethnicity = field("Ethnicity", sid.get("ethnicity", ""), "sid_ethnicity",
help="'Prefer not to say' is always an option")
with col2:
vet_options = ["No", "Yes", "Prefer not to say"]
_vet = sid.get("veteran", "No")
veteran = st.selectbox("Veteran status", vet_options,
index=vet_options.index(_vet) if _vet in vet_options else 0, key="sid_vet")
dis_options = ["Prefer not to say", "No", "Yes"]
_dis = sid.get("disability", "Prefer not to say")
disability = st.selectbox("Disability disclosure", dis_options,
index=dis_options.index(_dis) if _dis in dis_options else 0,
key="sid_dis")
st.divider()
# ── Save ──────────────────────────────────────────────────────────────────────
if st.button("💾 Save Resume Profile", type="primary", use_container_width=True):
data["personal_information"] = {
**data.get("personal_information", {}),
"name": name, "surname": surname, "email": email, "phone": phone,
"city": city, "zip_code": zip_code, "linkedin": linkedin, "date_of_birth": dob,
}
data["education_details"] = updated_edu
data["experience_details"] = updated_exp
data["salary_expectations"] = {"salary_range_usd": salary_range}
data["availability"] = {"notice_period": notice}
data["work_preferences"] = {
**data.get("work_preferences", {}),
"remote_work": "Yes" if remote_work else "No",
"open_to_relocation": "Yes" if relocation else "No",
"willing_to_complete_assessments": "Yes" if assessments else "No",
"willing_to_undergo_background_checks": "Yes" if bg_checks else "No",
"willing_to_undergo_drug_tests": "Yes" if drug_tests else "No",
}
data["self_identification"] = {
"gender": gender, "pronouns": pronouns, "veteran": veteran,
"disability": disability, "ethnicity": ethnicity,
}
RESUME_PATH.write_text(yaml.dump(data, default_flow_style=False, allow_unicode=True))
st.success("✅ Profile saved!")
st.balloons()


@@ -14,19 +14,28 @@ import streamlit as st
import streamlit.components.v1 as components
import yaml
from scripts.user_profile import UserProfile
+_USER_YAML = Path(__file__).parent.parent.parent / "config" / "user.yaml"
+_profile = UserProfile(_USER_YAML) if UserProfile.exists(_USER_YAML) else None
+_name = _profile.name if _profile else "Job Seeker"
from scripts.db import (
DEFAULT_DB, init_db, get_jobs_by_status,
update_cover_letter, mark_applied, update_job_status,
get_task_for_job,
)
from scripts.task_runner import submit_task
from app.cloud_session import resolve_session, get_db_path
from app.telemetry import log_usage_event
-DOCS_DIR = Path("/Library/Documents/JobSearch")
-RESUME_YAML = Path(__file__).parent.parent.parent / "aihawk" / "data_folder" / "plain_text_resume.yaml"
+DOCS_DIR = _profile.docs_dir if _profile else Path.home() / "Documents" / "JobSearch"
+RESUME_YAML = Path(__file__).parent.parent.parent / "config" / "plain_text_resume.yaml"
st.title("🚀 Apply Workspace")
-init_db(DEFAULT_DB)
+resolve_session("peregrine")
+init_db(get_db_path())
# ── PDF generation ─────────────────────────────────────────────────────────────
def _make_cover_letter_pdf(job: dict, cover_letter: str, output_dir: Path) -> Path:
@@ -70,13 +79,16 @@ def _make_cover_letter_pdf(job: dict, cover_letter: str, output_dir: Path) -> Pa
textColor=dark, leading=16, spaceAfter=12, alignment=TA_LEFT,
)
display_name = _profile.name.upper() if _profile else "YOUR NAME"
contact_line = " · ".join(filter(None, [
_profile.email if _profile else "",
_profile.phone if _profile else "",
_profile.linkedin if _profile else "",
]))
story = [
Paragraph("ALEX RIVERA", name_style),
Paragraph(
"alex@example.com · (555) 867-5309 · "
"linkedin.com/in/AlexMcCann · hirealexmccann.site",
contact_style,
),
Paragraph(display_name, name_style),
Paragraph(contact_line, contact_style),
HRFlowable(width="100%", thickness=1, color=teal, spaceBefore=8, spaceAfter=0),
Paragraph(datetime.now().strftime("%B %d, %Y"), date_style),
]
@ -88,7 +100,7 @@ def _make_cover_letter_pdf(job: dict, cover_letter: str, output_dir: Path) -> Pa
story += [
Spacer(1, 6),
Paragraph("Warm regards,<br/><br/>Alex Rivera", body_style),
Paragraph(f"Warm regards,<br/><br/>{_profile.name if _profile else 'Your Name'}", body_style),
]
doc.build(story)
@ -96,7 +108,7 @@ def _make_cover_letter_pdf(job: dict, cover_letter: str, output_dir: Path) -> Pa
# ── Application Q&A helper ─────────────────────────────────────────────────────
def _answer_question(job: dict, question: str) -> str:
"""Call the LLM to answer an application question in Alex's voice.
"""Call the LLM to answer an application question in the user's voice.
Uses research_fallback_order (claude_code → vllm → ollama_research)
rather than the default cover-letter order the fine-tuned cover letter
@ -106,21 +118,22 @@ def _answer_question(job: dict, question: str) -> str:
router = LLMRouter()
fallback = router.config.get("research_fallback_order") or router.config.get("fallback_order")
description_snippet = (job.get("description") or "")[:1200].strip()
prompt = f"""You are answering job application questions for Alex Rivera, a customer success leader.
_persona_summary = (
_profile.career_summary[:200] if _profile and _profile.career_summary
else "a professional with experience in their field"
)
prompt = f"""You are answering job application questions for {_name}.
Background:
- 6+ years in customer success, technical account management, and CS leadership
- Most recent role: led Americas Customer Success at UpGuard (cybersecurity SaaS), NPS consistently 95
- Also founder of M3 Consulting, a CS advisory practice for SaaS startups
- Based in SF Bay Area; open to remote/hybrid; pronouns: any
{_persona_summary}
Role she's applying to: {job.get("title", "")} at {job.get("company", "")}
Role they're applying to: {job.get("title", "")} at {job.get("company", "")}
{f"Job description excerpt:{chr(10)}{description_snippet}" if description_snippet else ""}
Application Question:
{question}
Answer in Alex's voice — specific, warm, and confident. If the question specifies a word or character limit, respect it. Answer only the question with no preamble or sign-off."""
Answer in {_name}'s voice — specific, warm, and confident. If the question specifies a word or character limit, respect it. Answer only the question with no preamble or sign-off."""
return router.complete(prompt, fallback_order=fallback).strip()
@ -146,7 +159,7 @@ def _copy_btn(text: str, label: str = "📋 Copy", done: str = "✅ Copied!", he
)
# ── Job selection ──────────────────────────────────────────────────────────────
approved = get_jobs_by_status(DEFAULT_DB, "approved")
approved = get_jobs_by_status(get_db_path(), "approved")
if not approved:
st.info("No approved jobs — head to Job Review to approve some listings first.")
st.stop()
@ -209,17 +222,17 @@ with col_tools:
if _cl_key not in st.session_state:
st.session_state[_cl_key] = job.get("cover_letter") or ""
_cl_task = get_task_for_job(DEFAULT_DB, "cover_letter", selected_id)
_cl_task = get_task_for_job(get_db_path(), "cover_letter", selected_id)
_cl_running = _cl_task and _cl_task["status"] in ("queued", "running")
if st.button("✨ Generate / Regenerate", use_container_width=True, disabled=bool(_cl_running)):
submit_task(DEFAULT_DB, "cover_letter", selected_id)
submit_task(get_db_path(), "cover_letter", selected_id)
st.rerun()
if _cl_running:
@st.fragment(run_every=3)
def _cl_status_fragment():
t = get_task_for_job(DEFAULT_DB, "cover_letter", selected_id)
t = get_task_for_job(get_db_path(), "cover_letter", selected_id)
if t and t["status"] in ("queued", "running"):
lbl = "Queued…" if t["status"] == "queued" else "Generating via LLM…"
st.info(f"{lbl}")
@ -245,6 +258,32 @@ with col_tools:
label_visibility="collapsed",
)
# ── Iterative refinement ──────────────────────
if cl_text and not _cl_running:
with st.expander("✏️ Refine with Feedback"):
st.caption("Describe what to change. The current draft is passed to the LLM as context.")
_fb_key = f"fb_{selected_id}"
feedback_text = st.text_area(
"Feedback",
placeholder="e.g. Shorten the second paragraph and add a line about cross-functional leadership.",
height=80,
key=_fb_key,
label_visibility="collapsed",
)
if st.button("✨ Regenerate with Feedback", use_container_width=True,
disabled=not (feedback_text or "").strip(),
key=f"cl_refine_{selected_id}"):
import json as _json
submit_task(
get_db_path(), "cover_letter", selected_id,
params=_json.dumps({
"previous_result": cl_text,
"feedback": feedback_text.strip(),
}),
)
st.session_state.pop(_fb_key, None)
st.rerun()
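The refinement button above serializes the current draft and the user's feedback into the task's `params` JSON before resubmitting. A minimal sketch of that payload round-trip, assuming the worker side decodes `params` with `json.loads` (the worker-side handling is an assumption, not shown in this diff):

```python
import json

# Sketch of the params payload the refinement button submits (field names
# taken from the snippet above; worker-side decoding is an assumption).
params = json.dumps({
    "previous_result": "Dear Hiring Manager, ...",  # current draft text
    "feedback": "Shorten the second paragraph.",    # user's refinement note
})

# A task worker would recover both fields before building the LLM prompt:
decoded = json.loads(params)
assert decoded["feedback"] == "Shorten the second paragraph."
```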
# Copy + Save row
c1, c2 = st.columns(2)
with c1:
@ -252,7 +291,7 @@ with col_tools:
_copy_btn(cl_text, label="📋 Copy Letter")
with c2:
if st.button("💾 Save draft", use_container_width=True):
update_cover_letter(DEFAULT_DB, selected_id, cl_text)
update_cover_letter(get_db_path(), selected_id, cl_text)
st.success("Saved!")
# PDF generation
@ -261,8 +300,10 @@ with col_tools:
with st.spinner("Generating PDF…"):
try:
pdf_path = _make_cover_letter_pdf(job, cl_text, DOCS_DIR)
update_cover_letter(DEFAULT_DB, selected_id, cl_text)
update_cover_letter(get_db_path(), selected_id, cl_text)
st.success(f"Saved: `{pdf_path.name}`")
if user_id := st.session_state.get("user_id"):
log_usage_event(user_id, "peregrine", "cover_letter_generated")
except Exception as e:
st.error(f"PDF error: {e}")
@ -276,13 +317,15 @@ with col_tools:
with c4:
if st.button("✅ Mark as Applied", use_container_width=True, type="primary"):
if cl_text:
update_cover_letter(DEFAULT_DB, selected_id, cl_text)
mark_applied(DEFAULT_DB, [selected_id])
update_cover_letter(get_db_path(), selected_id, cl_text)
mark_applied(get_db_path(), [selected_id])
st.success("Marked as applied!")
if user_id := st.session_state.get("user_id"):
log_usage_event(user_id, "peregrine", "job_applied")
st.rerun()
if st.button("🚫 Reject listing", use_container_width=True):
update_job_status(DEFAULT_DB, [selected_id], "rejected")
update_job_status(get_db_path(), [selected_id], "rejected")
# Advance selectbox to next job so list doesn't snap to first item
current_idx = ids.index(selected_id) if selected_id in ids else 0
if current_idx + 1 < len(ids):
@ -346,7 +389,7 @@ with col_tools:
st.markdown("---")
else:
st.warning("Resume YAML not found — check that AIHawk is cloned.")
st.warning("Resume profile not found — complete setup or upload a resume in Settings → Resume Profile.")
# ── Application Q&A ───────────────────────────────────────────────────────
with st.expander("💬 Answer Application Questions"):

View file

@ -22,6 +22,12 @@ sys.path.insert(0, str(Path(__file__).parent.parent.parent))
import streamlit as st
from scripts.user_profile import UserProfile
_USER_YAML = Path(__file__).parent.parent.parent / "config" / "user.yaml"
_profile = UserProfile(_USER_YAML) if UserProfile.exists(_USER_YAML) else None
_name = _profile.name if _profile else "Job Seeker"
from scripts.db import (
DEFAULT_DB, init_db,
get_interview_jobs, advance_to_stage, reject_at_stage,
@ -186,19 +192,21 @@ def _email_modal(job: dict) -> None:
with st.spinner("Drafting…"):
try:
from scripts.llm_router import complete
_persona = (
f"{_name} is a {_profile.career_summary[:120] if _profile and _profile.career_summary else 'professional'}"
)
draft = complete(
prompt=(
f"Draft a professional, warm reply to this email.\n\n"
f"From: {last.get('from_addr', '')}\n"
f"Subject: {last.get('subject', '')}\n\n"
f"{last.get('body', '')}\n\n"
f"Context: Alex Rivera is a Customer Success / "
f"Technical Account Manager applying for "
f"Context: {_persona} applying for "
f"{job.get('title')} at {job.get('company')}."
),
system=(
"You are Alex Rivera's professional email assistant. "
"Write concise, warm, and professional replies in her voice. "
f"You are {_name}'s professional email assistant. "
"Write concise, warm, and professional replies in their voice. "
"Keep it to 3-5 sentences unless more is needed."
),
)

View file

@ -13,6 +13,12 @@ sys.path.insert(0, str(Path(__file__).parent.parent.parent))
import streamlit as st
from scripts.user_profile import UserProfile
_USER_YAML = Path(__file__).parent.parent.parent / "config" / "user.yaml"
_profile = UserProfile(_USER_YAML) if UserProfile.exists(_USER_YAML) else None
_name = _profile.name if _profile else "Job Seeker"
from scripts.db import (
DEFAULT_DB, init_db,
get_interview_jobs, get_contacts, get_research,
@ -231,7 +237,7 @@ with col_prep:
system=(
f"You are a recruiter at {job.get('company')} conducting "
f"a phone screen for the {job.get('title')} role. "
f"Ask one question at a time. After Alex answers, give "
f"Ask one question at a time. After {_name} answers, give "
f"brief feedback (1-2 sentences), then ask your next question. "
f"Be professional but warm."
),
@ -253,7 +259,7 @@ with col_prep:
"content": (
f"You are a recruiter at {job.get('company')} conducting "
f"a phone screen for the {job.get('title')} role. "
f"Ask one question at a time. After Alex answers, give "
f"Ask one question at a time. After {_name} answers, give "
f"brief feedback (1-2 sentences), then ask your next question."
),
}
@ -265,7 +271,7 @@ with col_prep:
router = LLMRouter()
# Build prompt from history for single-turn backends
convo = "\n\n".join(
f"{'Interviewer' if m['role'] == 'assistant' else 'Alex'}: {m['content']}"
f"{'Interviewer' if m['role'] == 'assistant' else _name}: {m['content']}"
for m in history
)
response = router.complete(
@ -331,12 +337,12 @@ with col_context:
f"From: {last.get('from_addr', '')}\n"
f"Subject: {last.get('subject', '')}\n\n"
f"{last.get('body', '')}\n\n"
f"Context: Alex is a CS/TAM professional applying "
f"Context: {_name} is a professional applying "
f"for {job.get('title')} at {job.get('company')}."
),
system=(
"You are Alex Rivera's professional email assistant. "
"Write concise, warm, and professional replies in her voice."
f"You are {_name}'s professional email assistant. "
"Write concise, warm, and professional replies in their voice."
),
)
st.session_state[f"draft_{selected_id}"] = draft

127
app/telemetry.py Normal file
View file

@ -0,0 +1,127 @@
# peregrine/app/telemetry.py
"""
Usage event telemetry for cloud-hosted Peregrine.
In local-first mode (CLOUD_MODE unset/false), all functions are no-ops:
no network calls, no DB writes, no imports of psycopg2.
In cloud mode, events are written to the platform Postgres DB ONLY after
confirming the user's telemetry consent.
THE HARD RULE: if telemetry_consent.all_disabled is True for a user,
nothing is written, no exceptions. This function is the ONLY path to
usage_events; no feature may write there directly.
"""
import os
import json
from typing import Any
CLOUD_MODE: bool = os.environ.get("CLOUD_MODE", "").lower() in ("1", "true", "yes")
PLATFORM_DB_URL: str = os.environ.get("PLATFORM_DB_URL", "")
_platform_conn = None
def get_platform_conn():
"""Lazy psycopg2 connection to the platform Postgres DB. Reconnects if closed."""
global _platform_conn
if _platform_conn is None or _platform_conn.closed:
import psycopg2
_platform_conn = psycopg2.connect(PLATFORM_DB_URL)
return _platform_conn
def get_consent(user_id: str) -> dict:
"""
Fetch telemetry consent for the user.
Returns safe defaults if record doesn't exist yet:
- usage_events_enabled: True (new cloud users start opted-in, per onboarding disclosure)
- all_disabled: False
"""
conn = get_platform_conn()
with conn.cursor() as cur:
cur.execute(
"SELECT all_disabled, usage_events_enabled "
"FROM telemetry_consent WHERE user_id = %s",
(user_id,)
)
row = cur.fetchone()
if row is None:
return {"all_disabled": False, "usage_events_enabled": True}
return {"all_disabled": row[0], "usage_events_enabled": row[1]}
def log_usage_event(
user_id: str,
app: str,
event_type: str,
metadata: dict[str, Any] | None = None,
) -> None:
"""
Write a usage event to the platform DB if consent allows.
Silent no-op in local mode. Silent no-op if telemetry is disabled.
Swallows all exceptions; telemetry must never crash the app.
Args:
user_id: Directus user UUID (from st.session_state["user_id"])
app: App slug ('peregrine', 'falcon', etc.)
event_type: Snake_case event label ('cover_letter_generated', 'job_applied', etc.)
metadata: Optional JSON-serialisable dict; NO PII
"""
if not CLOUD_MODE:
return
try:
consent = get_consent(user_id)
if consent.get("all_disabled") or not consent.get("usage_events_enabled", True):
return
conn = get_platform_conn()
with conn.cursor() as cur:
cur.execute(
"INSERT INTO usage_events (user_id, app, event_type, metadata) "
"VALUES (%s, %s, %s, %s)",
(user_id, app, event_type, json.dumps(metadata) if metadata else None),
)
conn.commit()
except Exception:
# Telemetry must never crash the app
pass
def update_consent(user_id: str, **fields) -> None:
"""
UPSERT telemetry consent for a user.
Accepted keyword args (all optional, any subset may be provided):
all_disabled: bool
usage_events_enabled: bool
content_sharing_enabled: bool
support_access_enabled: bool
Safe to call in cloud mode only; no-op in local mode.
Swallows all exceptions so the Settings UI is never broken by a DB hiccup.
"""
if not CLOUD_MODE:
return
allowed = {"all_disabled", "usage_events_enabled", "content_sharing_enabled", "support_access_enabled"}
cols = {k: v for k, v in fields.items() if k in allowed}
if not cols:
return
try:
conn = get_platform_conn()
col_names = ", ".join(cols)
placeholders = ", ".join(["%s"] * len(cols))
set_clause = ", ".join(f"{k} = EXCLUDED.{k}" for k in cols)
col_vals = list(cols.values())
with conn.cursor() as cur:
cur.execute(
f"INSERT INTO telemetry_consent (user_id, {col_names}) "
f"VALUES (%s, {placeholders}) "
f"ON CONFLICT (user_id) DO UPDATE SET {set_clause}, updated_at = NOW()",
[user_id] + col_vals,
)
conn.commit()
except Exception:
pass
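The docstring's hard rule reduces to a pure predicate applied in order: local mode short-circuits, then `all_disabled` overrides everything, then the per-category flag decides. A self-contained sketch of that gate (not the module's actual code; names mirror the docstring):

```python
# Minimal sketch of the consent gate described above; not the module's code.
def should_log(cloud_mode: bool, consent: dict) -> bool:
    if not cloud_mode:               # local-first: always a no-op
        return False
    if consent.get("all_disabled"):  # THE HARD RULE: nothing is written
        return False
    return consent.get("usage_events_enabled", True)

# Safe defaults for a user with no consent row yet (opted in):
assert should_log(True, {"all_disabled": False, "usage_events_enabled": True})
# all_disabled wins even when usage_events_enabled is True:
assert not should_log(True, {"all_disabled": True, "usage_events_enabled": True})
# Local mode is a no-op regardless of consent:
assert not should_log(False, {})
```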

0
app/wizard/__init__.py Normal file
View file

View file

@ -0,0 +1,14 @@
"""Step 1 — Hardware detection and inference profile selection."""
PROFILES = ["remote", "cpu", "single-gpu", "dual-gpu"]
def validate(data: dict) -> list[str]:
"""Return list of validation errors. Empty list = step passes."""
errors = []
profile = data.get("inference_profile", "")
if not profile:
errors.append("Inference profile is required.")
elif profile not in PROFILES:
errors.append(f"Invalid inference profile '{profile}'. Choose: {', '.join(PROFILES)}.")
return errors

View file

@ -0,0 +1,13 @@
"""Step 3 — Identity (name, email, phone, linkedin, career_summary)."""
def validate(data: dict) -> list[str]:
"""Return list of validation errors. Empty list = step passes."""
errors = []
if not (data.get("name") or "").strip():
errors.append("Full name is required.")
if not (data.get("email") or "").strip():
errors.append("Email address is required.")
if not (data.get("career_summary") or "").strip():
errors.append("Career summary is required.")
return errors

View file

@ -0,0 +1,9 @@
"""Step 5 — LLM inference backend configuration and key entry."""
def validate(data: dict) -> list[str]:
"""Return list of validation errors. Empty list = step passes."""
errors = []
if not data.get("endpoint_confirmed"):
errors.append("At least one working LLM endpoint must be confirmed.")
return errors

View file

@ -0,0 +1,36 @@
"""Step 7 — Optional integrations (cloud storage, calendars, notifications).
This step is never mandatory; validate() always returns [].
Helper functions support the wizard UI for tier-filtered integration cards.
"""
from __future__ import annotations
from pathlib import Path
def validate(data: dict) -> list[str]:
"""Integrations step is optional — never blocks Finish."""
return []
def get_available(tier: str) -> list[str]:
"""Return list of integration names available for the given tier.
An integration is available if the user's tier meets or exceeds the
integration's minimum required tier (as declared by cls.tier).
"""
from scripts.integrations import REGISTRY
from app.wizard.tiers import TIERS
available = []
for name, cls in REGISTRY.items():
try:
if TIERS.index(tier) >= TIERS.index(cls.tier):
available.append(name)
except ValueError:
pass # unknown tier string — skip
return available
def is_connected(name: str, config_dir: Path) -> bool:
"""Return True if a live config file exists for this integration."""
return (config_dir / "integrations" / f"{name}.yaml").exists()
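`get_available` above is just an index comparison on the ordered `TIERS` list. A self-contained sketch with a stand-in registry (the integration names and their tiers here are hypothetical, mirroring `cls.tier` in the real `REGISTRY`):

```python
TIERS = ["free", "paid", "premium"]

# Hypothetical registry stand-in: integration name -> minimum tier.
REGISTRY_TIERS = {
    "discord_notifications": "free",
    "notion_sync": "paid",
    "model_lab": "premium",
}

def get_available(tier: str) -> list[str]:
    out = []
    for name, min_tier in REGISTRY_TIERS.items():
        try:
            if TIERS.index(tier) >= TIERS.index(min_tier):
                out.append(name)
        except ValueError:
            pass  # unknown tier string; skip
    return out

assert get_available("free") == ["discord_notifications"]
assert "notion_sync" in get_available("paid")
assert get_available("bogus") == []  # invalid tier unlocks nothing
```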

10
app/wizard/step_resume.py Normal file
View file

@ -0,0 +1,10 @@
"""Step 4 — Resume (upload or guided form builder)."""
def validate(data: dict) -> list[str]:
"""Return list of validation errors. Empty list = step passes."""
errors = []
experience = data.get("experience") or []
if not experience:
errors.append("At least one work experience entry is required.")
return errors

13
app/wizard/step_search.py Normal file
View file

@ -0,0 +1,13 @@
"""Step 6 — Job search preferences (titles, locations, boards, keywords)."""
def validate(data: dict) -> list[str]:
"""Return list of validation errors. Empty list = step passes."""
errors = []
titles = data.get("job_titles") or []
locations = data.get("locations") or []
if not titles:
errors.append("At least one job title is required.")
if not locations:
errors.append("At least one location is required.")
return errors

13
app/wizard/step_tier.py Normal file
View file

@ -0,0 +1,13 @@
"""Step 2 — Tier selection (free / paid / premium)."""
from app.wizard.tiers import TIERS
def validate(data: dict) -> list[str]:
"""Return list of validation errors. Empty list = step passes."""
errors = []
tier = data.get("tier", "")
if not tier:
errors.append("Tier selection is required.")
elif tier not in TIERS:
errors.append(f"Invalid tier '{tier}'. Choose: {', '.join(TIERS)}.")
return errors
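Every step module above exposes the same `validate(data) -> list[str]` contract, so a wizard driver can gate Next/Finish generically. A hypothetical driver sketch (the step list, names, and data shape are assumptions for illustration):

```python
# Hypothetical driver: aggregate the per-step validate() contract shown above.
def run_validation(steps, data: dict) -> dict[str, list[str]]:
    """Map step name -> its errors; an empty dict means every step passes."""
    failures = {}
    for name, validate in steps:
        errors = validate(data)
        if errors:
            failures[name] = errors
    return failures

def validate_tier(data):
    return [] if data.get("tier") in ("free", "paid", "premium") else ["Tier selection is required."]

def validate_identity(data):
    return [] if (data.get("name") or "").strip() else ["Full name is required."]

steps = [("tier", validate_tier), ("identity", validate_identity)]
assert run_validation(steps, {"tier": "paid", "name": "A"}) == {}
assert "identity" in run_validation(steps, {"tier": "free", "name": ""})
```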

160
app/wizard/tiers.py Normal file
View file

@ -0,0 +1,160 @@
"""
Tier definitions and feature gates for Peregrine.
Tiers: free < paid < premium
FEATURES maps feature key → minimum tier required.
Features not in FEATURES are available to all tiers (free).
BYOK policy
-----------
Features in BYOK_UNLOCKABLE are gated only because CircuitForge would otherwise
be providing the LLM compute. When a user has any configured LLM backend (local
ollama/vllm or their own API key), those features unlock regardless of tier.
Pass has_byok=has_configured_llm() to can_use() at call sites.
Features that stay gated even with BYOK:
- Integrations (Notion sync, calendars, etc.): infrastructure we run
- llm_keywords_blocklist: orchestration pipeline over background keyword data
- email_classifier: training pipeline, not a single LLM call
- shared_cover_writer_model: our fine-tuned model weights
- model_fine_tuning: GPU infrastructure
- multi_user: account infrastructure
"""
from __future__ import annotations
from pathlib import Path
TIERS = ["free", "paid", "premium"]
# Maps feature key → minimum tier string required.
# Features absent from this dict are free (available to all).
FEATURES: dict[str, str] = {
# Wizard LLM generation — BYOK-unlockable (pure LLM calls)
"llm_career_summary": "paid",
"llm_expand_bullets": "paid",
"llm_suggest_skills": "paid",
"llm_voice_guidelines": "premium",
"llm_job_titles": "paid",
"llm_mission_notes": "paid",
# Orchestration — stays gated (background data pipeline, not just an LLM call)
"llm_keywords_blocklist": "paid",
# App features — BYOK-unlockable (pure LLM calls over job/profile data)
"company_research": "paid",
"interview_prep": "paid",
"survey_assistant": "paid",
# Orchestration / infrastructure — stays gated
"email_classifier": "paid",
"model_fine_tuning": "premium",
"shared_cover_writer_model": "paid",
"multi_user": "premium",
# Integrations — stays gated (infrastructure CircuitForge operates)
"notion_sync": "paid",
"google_sheets_sync": "paid",
"airtable_sync": "paid",
"google_calendar_sync": "paid",
"apple_calendar_sync": "paid",
"slack_notifications": "paid",
}
# Features that unlock when the user supplies any LLM backend (local or BYOK).
# These are pure LLM-call features — the only reason they're behind a tier is
# because CircuitForge would otherwise be providing the compute.
BYOK_UNLOCKABLE: frozenset[str] = frozenset({
"llm_career_summary",
"llm_expand_bullets",
"llm_suggest_skills",
"llm_voice_guidelines",
"llm_job_titles",
"llm_mission_notes",
"company_research",
"interview_prep",
"survey_assistant",
})
# Free integrations (not in FEATURES):
# google_drive_sync, dropbox_sync, onedrive_sync, mega_sync,
# nextcloud_sync, discord_notifications, home_assistant
_LLM_CFG = Path(__file__).parent.parent.parent / "config" / "llm.yaml"
def has_configured_llm(config_path: Path | None = None) -> bool:
"""Return True if at least one non-vision LLM backend is enabled in llm.yaml.
Local backends (ollama, vllm) count; the policy is "you're providing the
compute", whether that's your own hardware or your own API key.
"""
import yaml
path = config_path or _LLM_CFG
try:
with open(path) as f:
cfg = yaml.safe_load(f) or {}
return any(
b.get("enabled", True) and b.get("type") != "vision_service"
for b in cfg.get("backends", {}).values()
)
except Exception:
return False
def can_use(tier: str, feature: str, has_byok: bool = False) -> bool:
"""Return True if the given tier has access to the feature.
has_byok: pass has_configured_llm() to unlock BYOK_UNLOCKABLE features
for users who supply their own LLM backend regardless of tier.
Returns True for unknown features (not gated).
Returns False for unknown/invalid tier strings.
"""
required = FEATURES.get(feature)
if required is None:
return True # not gated — available to all
if has_byok and feature in BYOK_UNLOCKABLE:
return True
try:
return TIERS.index(tier) >= TIERS.index(required)
except ValueError:
return False # invalid tier string
def tier_label(feature: str, has_byok: bool = False) -> str:
"""Return a display label for a locked feature, or '' if free/unlocked."""
if has_byok and feature in BYOK_UNLOCKABLE:
return ""
required = FEATURES.get(feature)
if required is None:
return ""
return "🔒 Paid" if required == "paid" else "⭐ Premium"
def effective_tier(
profile=None,
license_path=None,
public_key_path=None,
) -> str:
"""Return the effective tier for this installation.
Priority:
1. profile.dev_tier_override (developer mode override)
2. License JWT verification (offline RS256 check)
3. "free" (fallback)
license_path and public_key_path default to production paths when None.
Pass explicit paths in tests to avoid touching real files.
"""
if profile and getattr(profile, "dev_tier_override", None):
return profile.dev_tier_override
from scripts.license import effective_tier as _license_tier
from pathlib import Path as _Path
kwargs = {}
if license_path is not None:
kwargs["license_path"] = _Path(license_path)
if public_key_path is not None:
kwargs["public_key_path"] = _Path(public_key_path)
return _license_tier(**kwargs)
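`can_use` and the BYOK unlock compose into a single decision: ungated features pass, BYOK-unlockable features pass when the user supplies compute, and everything else falls back to the tier-index comparison. A self-contained restatement of that logic with a small subset of the constants declared above, for illustration:

```python
TIERS = ["free", "paid", "premium"]
FEATURES = {"company_research": "paid", "model_fine_tuning": "premium"}
BYOK_UNLOCKABLE = frozenset({"company_research"})

# Self-contained restatement of can_use() above, for illustration.
def can_use(tier: str, feature: str, has_byok: bool = False) -> bool:
    required = FEATURES.get(feature)
    if required is None:
        return True                  # ungated features are free
    if has_byok and feature in BYOK_UNLOCKABLE:
        return True                  # user supplies the compute
    try:
        return TIERS.index(tier) >= TIERS.index(required)
    except ValueError:
        return False                 # invalid tier string

assert can_use("free", "unknown_feature")                       # not gated
assert not can_use("free", "company_research")                  # paid-gated
assert can_use("free", "company_research", has_byok=True)       # BYOK unlock
assert not can_use("free", "model_fine_tuning", has_byok=True)  # stays gated
assert can_use("premium", "model_fine_tuning")
```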

57
compose.cloud.yml Normal file
View file

@ -0,0 +1,57 @@
# compose.cloud.yml — Multi-tenant cloud stack for menagerie.circuitforge.tech/peregrine
#
# Each authenticated user gets their own encrypted SQLite data tree at
# /devl/menagerie-data/<user-id>/peregrine/
#
# Caddy injects the Directus session cookie as X-CF-Session header before forwarding.
# cloud_session.py resolves user_id → per-user db_path at session init.
#
# Usage:
# docker compose -f compose.cloud.yml --project-name peregrine-cloud up -d
# docker compose -f compose.cloud.yml --project-name peregrine-cloud down
# docker compose -f compose.cloud.yml --project-name peregrine-cloud logs app -f
services:
app:
build: .
container_name: peregrine-cloud
ports:
- "8505:8501"
volumes:
- /devl/menagerie-data:/devl/menagerie-data # per-user data trees
environment:
- CLOUD_MODE=true
- CLOUD_DATA_ROOT=/devl/menagerie-data
- DIRECTUS_JWT_SECRET=${DIRECTUS_JWT_SECRET}
- CF_SERVER_SECRET=${CF_SERVER_SECRET}
- PLATFORM_DB_URL=${PLATFORM_DB_URL}
- HEIMDALL_URL=${HEIMDALL_URL:-http://cf-license:8000}
- HEIMDALL_ADMIN_TOKEN=${HEIMDALL_ADMIN_TOKEN}
- STAGING_DB=/devl/menagerie-data/cloud-default.db # fallback only — never used
- DOCS_DIR=/tmp/cloud-docs
- STREAMLIT_SERVER_BASE_URL_PATH=peregrine
- PYTHONUNBUFFERED=1
- DEMO_MODE=false
depends_on:
searxng:
condition: service_healthy
extra_hosts:
- "host.docker.internal:host-gateway"
restart: unless-stopped
searxng:
image: searxng/searxng:latest
volumes:
- ./docker/searxng:/etc/searxng:ro
healthcheck:
test: ["CMD", "wget", "-q", "--spider", "http://localhost:8080/"]
interval: 10s
timeout: 5s
retries: 3
restart: unless-stopped
# No host port — internal only
networks:
default:
external: true
name: caddy-proxy_caddy-internal

52
compose.demo.yml Normal file
View file

@ -0,0 +1,52 @@
# compose.demo.yml — Public demo stack for demo.circuitforge.tech/peregrine
#
# Runs a fully isolated, neutered Peregrine instance:
# - DEMO_MODE=true: blocks all LLM inference in llm_router.py
# - demo/config/: pre-seeded demo user profile, all backends disabled
# - demo/data/: isolated SQLite DB (no personal job data)
# - No personal documents mounted
# - Port 8504 (separate from the personal instance on 8502)
#
# Usage:
# docker compose -f compose.demo.yml --project-name peregrine-demo up -d
# docker compose -f compose.demo.yml --project-name peregrine-demo down
#
# Caddy demo.circuitforge.tech/peregrine* → host port 8504
services:
app:
build: .
ports:
- "8504:8501"
volumes:
- ./demo/config:/app/config
- ./demo/data:/app/data
# No /docs mount — demo has no personal documents
environment:
- DEMO_MODE=true
- STAGING_DB=/app/data/staging.db
- DOCS_DIR=/tmp/demo-docs
- STREAMLIT_SERVER_BASE_URL_PATH=peregrine
- PYTHONUNBUFFERED=1
- PYTHONLOGGING=WARNING
# No API keys — inference is blocked by DEMO_MODE before any key is needed
depends_on:
searxng:
condition: service_healthy
extra_hosts:
- "host.docker.internal:host-gateway"
restart: unless-stopped
searxng:
image: searxng/searxng:latest
volumes:
- ./docker/searxng:/etc/searxng:ro
healthcheck:
test: ["CMD", "wget", "-q", "--spider", "http://localhost:8080/"]
interval: 10s
timeout: 5s
retries: 3
restart: unless-stopped
# No host port published — internal only; demo app uses it for job description enrichment
# (non-AI scraping is allowed; only LLM inference is blocked)

55
compose.gpu.yml Normal file
View file

@ -0,0 +1,55 @@
# compose.gpu.yml — Docker NVIDIA GPU overlay
#
# Adds NVIDIA GPU reservations to Peregrine services.
# Applied automatically by `make start PROFILE=single-gpu|dual-gpu` when Docker is detected.
# Manual: docker compose -f compose.yml -f compose.gpu.yml --profile single-gpu up -d
#
# Prerequisites:
# sudo nvidia-ctk runtime configure --runtime=docker
# sudo systemctl restart docker
#
services:
ollama:
deploy:
resources:
reservations:
devices:
- driver: nvidia
device_ids: ["0"]
capabilities: [gpu]
ollama_research:
deploy:
resources:
reservations:
devices:
- driver: nvidia
device_ids: ["1"]
capabilities: [gpu]
vision:
deploy:
resources:
reservations:
devices:
- driver: nvidia
device_ids: ["0"]
capabilities: [gpu]
vllm:
deploy:
resources:
reservations:
devices:
- driver: nvidia
device_ids: ["1"]
capabilities: [gpu]
finetune:
deploy:
resources:
reservations:
devices:
- driver: nvidia
device_ids: ["0"]
capabilities: [gpu]

51
compose.podman-gpu.yml Normal file
View file

@ -0,0 +1,51 @@
# compose.podman-gpu.yml — Podman GPU override
#
# Replaces Docker-specific `driver: nvidia` reservations with CDI device specs
# for rootless Podman. Applied automatically via `make start PROFILE=single-gpu|dual-gpu`
# when podman/podman-compose is detected, or manually:
# podman-compose -f compose.yml -f compose.podman-gpu.yml --profile single-gpu up -d
#
# Prerequisites:
# sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
# (requires nvidia-container-toolkit >= 1.14)
#
services:
ollama:
devices:
- nvidia.com/gpu=0
deploy:
resources:
reservations:
devices: []
ollama_research:
devices:
- nvidia.com/gpu=1
deploy:
resources:
reservations:
devices: []
vision:
devices:
- nvidia.com/gpu=0
deploy:
resources:
reservations:
devices: []
vllm:
devices:
- nvidia.com/gpu=1
deploy:
resources:
reservations:
devices: []
finetune:
devices:
- nvidia.com/gpu=0
deploy:
resources:
reservations:
devices: []

127
compose.yml Normal file
View file

@ -0,0 +1,127 @@
# compose.yml — Peregrine by Circuit Forge LLC
# Profiles: remote | cpu | single-gpu | dual-gpu-ollama | dual-gpu-vllm | dual-gpu-mixed
services:
app:
build: .
command: >
bash -c "streamlit run app/app.py
--server.port=8501
--server.headless=true
--server.fileWatcherType=none
2>&1 | tee /app/data/.streamlit.log"
ports:
- "${STREAMLIT_PORT:-8501}:8501"
volumes:
- ./config:/app/config
- ./data:/app/data
- ${DOCS_DIR:-~/Documents/JobSearch}:/docs
- /var/run/docker.sock:/var/run/docker.sock
- /usr/bin/docker:/usr/bin/docker:ro
environment:
- STAGING_DB=/app/data/staging.db
- DOCS_DIR=/docs
- ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY:-}
- OPENAI_COMPAT_URL=${OPENAI_COMPAT_URL:-}
- OPENAI_COMPAT_KEY=${OPENAI_COMPAT_KEY:-}
- PEREGRINE_GPU_COUNT=${PEREGRINE_GPU_COUNT:-0}
- PEREGRINE_GPU_NAMES=${PEREGRINE_GPU_NAMES:-}
- RECOMMENDED_PROFILE=${RECOMMENDED_PROFILE:-remote}
- STREAMLIT_SERVER_BASE_URL_PATH=${STREAMLIT_BASE_URL_PATH:-}
- FORGEJO_API_TOKEN=${FORGEJO_API_TOKEN:-}
- FORGEJO_REPO=${FORGEJO_REPO:-}
- FORGEJO_API_URL=${FORGEJO_API_URL:-}
- PYTHONUNBUFFERED=1
- PYTHONLOGGING=WARNING
depends_on:
searxng:
condition: service_healthy
extra_hosts:
- "host.docker.internal:host-gateway"
restart: unless-stopped
searxng:
image: searxng/searxng:latest
ports:
- "${SEARXNG_PORT:-8888}:8080"
volumes:
- ./docker/searxng:/etc/searxng:ro
healthcheck:
test: ["CMD", "wget", "-q", "--spider", "http://localhost:8080/"]
interval: 10s
timeout: 5s
retries: 3
restart: unless-stopped
ollama:
image: ollama/ollama:latest
ports:
- "${OLLAMA_PORT:-11434}:11434"
volumes:
- ${OLLAMA_MODELS_DIR:-~/models/ollama}:/root/.ollama
- ./docker/ollama/entrypoint.sh:/entrypoint.sh
environment:
- OLLAMA_MODELS=/root/.ollama
- DEFAULT_OLLAMA_MODEL=${OLLAMA_DEFAULT_MODEL:-llama3.2:3b}
entrypoint: ["/bin/bash", "/entrypoint.sh"]
profiles: [cpu, single-gpu, dual-gpu-ollama, dual-gpu-vllm, dual-gpu-mixed]
restart: unless-stopped
ollama_research:
image: ollama/ollama:latest
ports:
- "${OLLAMA_RESEARCH_PORT:-11435}:11434"
volumes:
- ${OLLAMA_MODELS_DIR:-~/models/ollama}:/root/.ollama
- ./docker/ollama/entrypoint.sh:/entrypoint.sh
environment:
- OLLAMA_MODELS=/root/.ollama
- DEFAULT_OLLAMA_MODEL=${OLLAMA_RESEARCH_MODEL:-llama3.2:3b}
entrypoint: ["/bin/bash", "/entrypoint.sh"]
profiles: [dual-gpu-ollama, dual-gpu-mixed]
restart: unless-stopped
vision:
build:
context: .
dockerfile: scripts/vision_service/Dockerfile
ports:
- "${VISION_PORT:-8002}:8002"
environment:
- VISION_MODEL=${VISION_MODEL:-vikhyatk/moondream2}
- VISION_REVISION=${VISION_REVISION:-2025-01-09}
profiles: [single-gpu, dual-gpu-ollama, dual-gpu-vllm, dual-gpu-mixed]
restart: unless-stopped
vllm:
image: vllm/vllm-openai:latest
ports:
- "${VLLM_PORT:-8000}:8000"
volumes:
- ${VLLM_MODELS_DIR:-~/models/vllm}:/models
command: >
--model /models/${VLLM_MODEL:-Ouro-1.4B}
--trust-remote-code
--max-model-len 4096
--gpu-memory-utilization 0.75
--enforce-eager
--max-num-seqs 8
--cpu-offload-gb ${CPU_OFFLOAD_GB:-0}
profiles: [dual-gpu-vllm, dual-gpu-mixed]
restart: unless-stopped
finetune:
build:
context: .
dockerfile: Dockerfile.finetune
volumes:
- ${DOCS_DIR:-~/Documents/JobSearch}:/docs
- ${OLLAMA_MODELS_DIR:-~/models/ollama}:/ollama-models
- ./config:/app/config
environment:
- DOCS_DIR=/docs
- OLLAMA_URL=http://ollama:11434
- OLLAMA_MODELS_MOUNT=/ollama-models
- OLLAMA_MODELS_OLLAMA_PATH=/root/.ollama
profiles: [finetune]
restart: "no"

@@ -3,7 +3,8 @@
# Company name blocklist — partial case-insensitive match on the company field.
# e.g. "Amazon" blocks any listing where company contains "amazon".
companies: []
companies:
- jobgether
# Industry/content blocklist — blocked if company name OR job description contains any keyword.
# Use this for industries you will never work in regardless of company.
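The partial case-insensitive match described in the header comment can be sketched in a few lines of Python. The function name is illustrative only, not the actual filter in the discovery pipeline:

```python
def company_blocked(company: str, blocklist: list[str]) -> bool:
    """True if any blocklist entry appears, case-insensitively, in the company field."""
    haystack = company.lower()
    return any(entry.lower() in haystack for entry in blocklist)

# "Jobgether Staffing" is caught by the entry "jobgether"; unrelated companies pass.
print(company_blocked("Jobgether Staffing", ["jobgether"]))  # → True
print(company_blocked("Acme Corp", ["jobgether"]))           # → False
```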

@@ -0,0 +1,3 @@
api_key: "patXXX..."
base_id: "appXXX..."
table_name: "Jobs"

@@ -0,0 +1,4 @@
caldav_url: "https://caldav.icloud.com/"
username: "you@icloud.com"
app_password: "xxxx-xxxx-xxxx-xxxx"
calendar_name: "Interviews"

@@ -0,0 +1 @@
webhook_url: "https://discord.com/api/webhooks/..."

@@ -0,0 +1,2 @@
access_token: "sl...."
folder_path: "/Peregrine"

@@ -0,0 +1,2 @@
calendar_id: "primary"
credentials_json: "~/credentials/google-calendar-sa.json"

@@ -0,0 +1,2 @@
folder_id: "your-google-drive-folder-id"
credentials_json: "~/credentials/google-drive-sa.json"

@@ -0,0 +1,3 @@
spreadsheet_id: "your-spreadsheet-id"
sheet_name: "Jobs"
credentials_json: "~/credentials/google-sheets-sa.json"

@@ -0,0 +1,3 @@
base_url: "http://homeassistant.local:8123"
token: "eyJ0eXAiOiJKV1Qi..."
notification_service: "notify.mobile_app_my_phone"

@@ -0,0 +1,3 @@
email: "you@example.com"
password: "your-mega-password"
folder_path: "/Peregrine"

@@ -0,0 +1,4 @@
host: "https://nextcloud.example.com"
username: "your-username"
password: "your-app-password"
folder_path: "/Peregrine"

@@ -0,0 +1,2 @@
token: "secret_..."
database_id: "32-character-notion-db-id"

@@ -0,0 +1,3 @@
client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
client_secret: "your-client-secret"
folder_path: "/Peregrine"

@@ -0,0 +1,2 @@
webhook_url: "https://hooks.slack.com/services/..."
channel: "#job-alerts"

@@ -3,48 +3,55 @@ backends:
api_key_env: ANTHROPIC_API_KEY
enabled: false
model: claude-sonnet-4-6
type: anthropic
supports_images: true
type: anthropic
claude_code:
api_key: any
base_url: http://localhost:3009/v1
enabled: false
model: claude-code-terminal
type: openai_compat
supports_images: true
type: openai_compat
github_copilot:
api_key: any
base_url: http://localhost:3010/v1
enabled: false
model: gpt-4o
type: openai_compat
supports_images: false
type: openai_compat
ollama:
api_key: ollama
base_url: http://localhost:11434/v1
base_url: http://host.docker.internal:11434/v1
enabled: true
model: alex-cover-writer:latest
type: openai_compat
model: llama3.2:3b
supports_images: false
type: openai_compat
ollama_research:
api_key: ollama
base_url: http://localhost:11434/v1
base_url: http://host.docker.internal:11434/v1
enabled: true
model: llama3.1:8b
type: openai_compat
model: llama3.2:3b
supports_images: false
type: openai_compat
vision_service:
base_url: http://host.docker.internal:8002
enabled: true
supports_images: true
type: vision_service
vllm:
api_key: ''
base_url: http://localhost:8000/v1
base_url: http://host.docker.internal:8000/v1
enabled: true
model: __auto__
type: openai_compat
supports_images: false
vision_service:
base_url: http://localhost:8002
enabled: false
type: vision_service
supports_images: true
type: openai_compat
vllm_research:
api_key: ''
base_url: http://host.docker.internal:8000/v1
enabled: true
model: __auto__
supports_images: false
type: openai_compat
fallback_order:
- ollama
- claude_code
@@ -53,7 +60,7 @@ fallback_order:
- anthropic
research_fallback_order:
- claude_code
- vllm
- vllm_research
- ollama_research
- github_copilot
- anthropic
@@ -61,6 +68,3 @@ vision_fallback_order:
- vision_service
- claude_code
- anthropic
# Note: 'ollama' (alex-cover-writer) intentionally excluded — research
# must never use the fine-tuned writer model, and this also avoids evicting
# the writer from GPU memory while a cover letter task is in flight.

@@ -21,21 +21,21 @@ backends:
supports_images: false
ollama:
api_key: ollama
base_url: http://localhost:11434/v1
base_url: http://ollama:11434/v1 # Docker service name; use localhost:11434 outside Docker
enabled: true
model: alex-cover-writer:latest
model: llama3.2:3b
type: openai_compat
supports_images: false
ollama_research:
api_key: ollama
base_url: http://localhost:11434/v1
base_url: http://ollama:11434/v1 # Docker service name; use localhost:11434 outside Docker
enabled: true
model: llama3.1:8b
model: llama3.2:3b
type: openai_compat
supports_images: false
vllm:
api_key: ''
base_url: http://localhost:8000/v1
base_url: http://vllm:8000/v1 # Docker service name; use localhost:8000 outside Docker
enabled: true
model: __auto__
type: openai_compat
@@ -64,3 +64,14 @@ vision_fallback_order:
# Note: 'ollama' (alex-cover-writer) intentionally excluded — research
# must never use the fine-tuned writer model, and this also avoids evicting
# the writer from GPU memory while a cover letter task is in flight.
# ── Scheduler — LLM batch queue optimizer ─────────────────────────────────────
# The scheduler batches LLM tasks by model type to avoid GPU model switching.
# VRAM budgets are conservative peak estimates (GB) for each task type.
# Increase if your models are larger; decrease if tasks share GPU memory well.
scheduler:
  vram_budgets:
    cover_letter: 2.5 # alex-cover-writer:latest (~2GB GGUF + headroom)
    company_research: 5.0 # llama3.1:8b or vllm model
    wizard_generate: 2.5 # same model family as cover_letter
  max_queue_depth: 500 # max pending tasks per type before drops (with logged warning)
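As a rough illustration of the batching policy the comments describe (group queued tasks by type so the GPU loads each model once per batch, cap the queue depth, and carry the VRAM estimate with each batch), here is a pure-Python sketch. The names are hypothetical; the real scheduler lives elsewhere in the codebase:

```python
from collections import defaultdict

def batch_by_type(queue, vram_budgets, max_queue_depth=500):
    """Group pending (task_type, payload) pairs by task type."""
    batches = defaultdict(list)
    for task_type, payload in queue:
        if len(batches[task_type]) >= max_queue_depth:
            # The real scheduler logs a warning before dropping, per the comment above.
            print(f"warning: dropping {task_type} task, queue depth exceeded")
            continue
        batches[task_type].append(payload)
    # One tuple per task type, carrying its conservative peak-VRAM estimate (GB).
    return [(t, tasks, vram_budgets.get(t, 0.0)) for t, tasks in batches.items()]

queue = [("cover_letter", "job-1"), ("company_research", "acme"), ("cover_letter", "job-2")]
budgets = {"cover_letter": 2.5, "company_research": 5.0}
for task_type, tasks, budget in batch_by_type(queue, budgets):
    print(task_type, len(tasks), budget)
```

Processing one type at a time is what avoids the repeated model load/unload the header comment warns about.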

@@ -1,4 +1,15 @@
profiles:
  - boards:
      - linkedin
      - indeed
      - glassdoor
      - zip_recruiter
    job_titles:
      - Customer Service Specialist
    locations:
      - San Francisco CA
    name: default
    remote_only: false
  - boards:
      - linkedin
      - indeed

@@ -0,0 +1,14 @@
# config/server.yaml — Peregrine deployment / server settings
# Copy to config/server.yaml and edit. Gitignored — do not commit.
# Changes require restarting Peregrine to take effect (./manage.sh restart).
# base_url_path: URL prefix when serving Peregrine behind a reverse proxy.
# Leave empty ("") for direct access (http://localhost:8502).
# Set to "peregrine" when proxied at https://example.com/peregrine.
# Maps to STREAMLIT_BASE_URL_PATH in .env → STREAMLIT_SERVER_BASE_URL_PATH
# in the container. See: https://docs.streamlit.io/develop/api-reference/configuration/config.toml#server
base_url_path: ""
# server_port: Port Streamlit listens on inside the container (usually 8501).
# The external/host port is set via STREAMLIT_PORT in .env.
server_port: 8501

@@ -0,0 +1,193 @@
# skills_suggestions.yaml — Bundled tag suggestions for the Skills & Keywords UI.
# Shown as searchable options in the multiselect. Users can add custom tags beyond these.
# Future: community aggregate (paid tier) will supplement this list from anonymised installs.
skills:
# ── Customer Success & Account Management ──
- Customer Success
- Technical Account Management
- Account Management
- Customer Onboarding
- Renewal Management
- Churn Prevention
- Expansion Revenue
- Executive Relationship Management
- Escalation Management
- QBR Facilitation
- Customer Advocacy
- Voice of the Customer
- Customer Health Scoring
- Success Planning
- Customer Education
- Implementation Management
# ── Revenue & Operations ──
- Revenue Operations
- Sales Operations
- Pipeline Management
- Forecasting
- Contract Negotiation
- Upsell & Cross-sell
- ARR / MRR Management
- NRR Optimization
- Quota Attainment
# ── Leadership & Management ──
- Team Leadership
- People Management
- Cross-functional Collaboration
- Change Management
- Stakeholder Management
- Executive Presentation
- Strategic Planning
- OKR Setting
- Hiring & Recruiting
- Coaching & Mentoring
- Performance Management
# ── Project & Program Management ──
- Project Management
- Program Management
- Agile / Scrum
- Kanban
- Risk Management
- Resource Planning
- Process Improvement
- SOP Development
# ── Technical Skills ──
- SQL
- Python
- Data Analysis
- Tableau
- Looker
- Power BI
- Excel / Google Sheets
- REST APIs
- Salesforce
- HubSpot
- Gainsight
- Totango
- ChurnZero
- Zendesk
- Intercom
- Jira
- Confluence
- Notion
- Slack
- Zoom
# ── Communications & Writing ──
- Executive Communication
- Technical Writing
- Proposal Writing
- Presentation Skills
- Public Speaking
- Stakeholder Communication
# ── Compliance & Security ──
- Compliance
- Risk Assessment
- SOC 2
- ISO 27001
- GDPR
- Security Awareness
- Vendor Management
domains:
# ── Software & Tech ──
- B2B SaaS
- Enterprise Software
- Cloud Infrastructure
- Developer Tools
- Cybersecurity
- Data & Analytics
- AI / ML Platform
- FinTech
- InsurTech
- LegalTech
- HR Tech
- MarTech
- AdTech
- DevOps / Platform Engineering
- Open Source
# ── Industry Verticals ──
- Healthcare / HealthTech
- Education / EdTech
- Non-profit / Social Impact
- Government / GovTech
- E-commerce / Retail
- Manufacturing
- Financial Services
- Media & Entertainment
- Music Industry
- Logistics & Supply Chain
- Real Estate / PropTech
- Energy / CleanTech
- Hospitality & Travel
# ── Market Segments ──
- Enterprise
- Mid-Market
- SMB / SME
- Startup
- Fortune 500
- Public Sector
- International / Global
# ── Business Models ──
- Subscription / SaaS
- Marketplace
- Usage-based Pricing
- Professional Services
- Self-serve / PLG
keywords:
# ── CS Metrics & Outcomes ──
- NPS
- CSAT
- CES
- Churn Rate
- Net Revenue Retention
- Gross Revenue Retention
- Logo Retention
- Time-to-Value
- Product Adoption
- Feature Utilisation
- Health Score
- Customer Lifetime Value
# ── Sales & Growth ──
- ARR
- MRR
- GRR
- NRR
- Expansion ARR
- Pipeline Coverage
- Win Rate
- Average Contract Value
- Land & Expand
- Multi-threading
# ── Process & Delivery ──
- Onboarding
- Implementation
- Knowledge Transfer
- Escalation
- SLA
- Root Cause Analysis
- Post-mortem
- Runbook
- Playbook Development
- Feedback Loop
- Product Roadmap Input
# ── Team & Culture ──
- Cross-functional
- Distributed Team
- Remote-first
- High-growth
- Fast-paced
- Autonomous
- Data-driven
- Customer-centric
- Empathetic Leadership
- Inclusive Culture
# ── Job-seeker Keywords ──
- Strategic
- Proactive
- Hands-on
- Scalable Processes
- Operational Excellence
- Business Impact
- Executive Visibility
- Player-Coach

config/user.yaml.example (new file, 66 lines)
@@ -0,0 +1,66 @@
# config/user.yaml.example
# Copy to config/user.yaml and fill in your details.
# The first-run wizard will create this file automatically.
name: "Your Name"
email: "you@example.com"
phone: "555-000-0000"
linkedin: "linkedin.com/in/yourprofile"
career_summary: >
  Experienced professional with X years in [your field].
  Specialise in [key skills]. Known for [strength].
nda_companies: [] # e.g. ["FormerEmployer"] — masked in research briefs
# Optional: industries you genuinely care about.
# When a company/JD matches an industry, the cover letter prompt injects
# your personal note so Para 3 can reflect authentic alignment.
# Leave a value empty ("") to use a sensible generic default.
mission_preferences:
music: "" # e.g. "I've played in bands for 15 years and care deeply about how artists get paid"
animal_welfare: "" # e.g. "I volunteer at my local shelter every weekend"
education: "" # e.g. "I tutored underserved kids for 3 years and care deeply about literacy"
social_impact: "" # e.g. "I want my work to reach people who need help most"
health: "" # e.g. "I care about people navigating rare or poorly-understood health conditions"
# Note: if left empty, Para 3 defaults to focusing on the people the company
# serves — not the industry. Fill in for a more personal connection.
# Optional: how you write and communicate. Used to shape cover letter voice.
# e.g. "Warm and direct. Cares about people first. Finds rare and complex situations fascinating."
candidate_voice: ""
# Set to true to include optional identity-related sections in research briefs.
# Both are for your personal decision-making only — never included in applications.
# Adds a disability inclusion & accessibility section (ADA, ERGs, WCAG signals).
candidate_accessibility_focus: false
# Adds an LGBTQIA+ inclusion section (ERGs, non-discrimination policies, culture signals).
candidate_lgbtq_focus: false
tier: free # free | paid | premium
dev_tier_override: null # overrides tier locally (for testing only)
wizard_complete: false
wizard_step: 0
dismissed_banners: []
docs_dir: "~/Documents/JobSearch"
ollama_models_dir: "~/models/ollama"
vllm_models_dir: "~/models/vllm"
inference_profile: "remote" # remote | cpu | single-gpu | dual-gpu
services:
  streamlit_port: 8501
  ollama_host: ollama # Docker service name; use "localhost" if running outside Docker
  ollama_port: 11434
  ollama_ssl: false
  ollama_ssl_verify: true
  vllm_host: vllm # Docker service name; use "localhost" if running outside Docker
  vllm_port: 8000
  vllm_ssl: false
  vllm_ssl_verify: true
  searxng_host: searxng # Docker service name; use "localhost" if running outside Docker
  searxng_port: 8080 # internal Docker port; use 8888 for host-mapped access
  searxng_ssl: false
  searxng_ssl_verify: true
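Each host/port/ssl triple in the `services:` block resolves to a backend base URL. A tiny sketch of how such settings might be combined (the function name is hypothetical, not the app's actual config loader):

```python
def service_url(host: str, port: int, ssl: bool = False, path: str = "") -> str:
    """Build a service base URL from a host/port/ssl triple in the services block."""
    scheme = "https" if ssl else "http"
    return f"{scheme}://{host}:{port}{path}"

# Inside Docker the service name resolves over the compose network;
# outside Docker you would pass host="localhost" instead.
print(service_url("ollama", 11434, path="/v1"))  # → http://ollama:11434/v1
print(service_url("searxng", 8080))              # → http://searxng:8080
```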

@@ -0,0 +1,8 @@
{"subject": "Interview Invitation — Senior Engineer", "body": "Hi Alex, we'd love to schedule a 30-min phone screen. Are you available Thursday at 2pm? Please reply to confirm.", "label": "interview_scheduled"}
{"subject": "Your application to Acme Corp", "body": "Thank you for your interest in the Senior Engineer role. After careful consideration, we have decided to move forward with other candidates whose experience more closely matches our current needs.", "label": "rejected"}
{"subject": "Offer Letter — Product Manager at Initech", "body": "Dear Alex, we are thrilled to extend an offer of employment for the Product Manager position. Please find the attached offer letter outlining compensation and start date.", "label": "offer_received"}
{"subject": "Quick question about your background", "body": "Hi Alex, I came across your profile and would love to connect. We have a few roles that seem like a great match. Would you be open to a brief chat this week?", "label": "positive_response"}
{"subject": "Company Culture Survey — Acme Corp", "body": "Alex, as part of our evaluation process, we invite all candidates to complete our culture fit assessment. The survey takes approximately 15 minutes. Please click the link below.", "label": "survey_received"}
{"subject": "Application Received — DataCo", "body": "Thank you for submitting your application for the Data Engineer role at DataCo. We have received your materials and will be in touch if your qualifications match our needs.", "label": "neutral"}
{"subject": "Following up on your application", "body": "Hi Alex, I wanted to follow up on your recent application. Your background looks interesting and we'd like to learn more. Can we set up a quick call?", "label": "positive_response"}
{"subject": "We're moving forward with other candidates", "body": "Dear Alex, thank you for taking the time to interview with us. After thoughtful consideration, we have decided not to move forward with your candidacy at this time.", "label": "rejected"}

demo/config/llm.yaml (new file, 68 lines)
@@ -0,0 +1,68 @@
# Demo LLM config — all backends disabled.
# DEMO_MODE=true in the environment blocks the router before any backend is tried,
# so these values are never actually used. Kept for schema completeness.
backends:
  anthropic:
    api_key_env: ANTHROPIC_API_KEY
    enabled: false
    model: claude-sonnet-4-6
    supports_images: true
    type: anthropic
  claude_code:
    api_key: any
    base_url: http://localhost:3009/v1
    enabled: false
    model: claude-code-terminal
    supports_images: true
    type: openai_compat
  github_copilot:
    api_key: any
    base_url: http://localhost:3010/v1
    enabled: false
    model: gpt-4o
    supports_images: false
    type: openai_compat
  ollama:
    api_key: ollama
    base_url: http://localhost:11434/v1
    enabled: false
    model: llama3.2:3b
    supports_images: false
    type: openai_compat
  ollama_research:
    api_key: ollama
    base_url: http://localhost:11434/v1
    enabled: false
    model: llama3.2:3b
    supports_images: false
    type: openai_compat
  vision_service:
    base_url: http://localhost:8002
    enabled: false
    supports_images: true
    type: vision_service
  vllm:
    api_key: ''
    base_url: http://localhost:8000/v1
    enabled: false
    model: __auto__
    supports_images: false
    type: openai_compat
  vllm_research:
    api_key: ''
    base_url: http://localhost:8000/v1
    enabled: false
    model: __auto__
    supports_images: false
    type: openai_compat
fallback_order:
- ollama
- vllm
- anthropic
research_fallback_order:
- vllm_research
- ollama_research
- anthropic
vision_fallback_order:
- vision_service
- anthropic
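The `fallback_order` lists drive simple try-in-order routing across backends. A toy version of that policy (hypothetical signature; the real router also short-circuits on `DEMO_MODE` before any backend is tried, as the header comment notes):

```python
def pick_backend(fallback_order, backends):
    """Return the first enabled backend name in fallback order, or None."""
    for name in fallback_order:
        cfg = backends.get(name, {})
        if cfg.get("enabled"):
            return name
    return None  # every backend disabled, as in this demo config

backends = {"ollama": {"enabled": False}, "vllm": {"enabled": False}, "anthropic": {"enabled": False}}
print(pick_backend(["ollama", "vllm", "anthropic"], backends))  # → None

backends["vllm"]["enabled"] = True
print(pick_backend(["ollama", "vllm", "anthropic"], backends))  # → vllm
```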

demo/config/user.yaml (new file, 44 lines)
@@ -0,0 +1,44 @@
candidate_accessibility_focus: false
candidate_lgbtq_focus: false
candidate_voice: Clear, direct, and human. Focuses on impact over jargon.
career_summary: 'Experienced software engineer with a background in full-stack development,
  cloud infrastructure, and data pipelines. Passionate about building tools that help
  people navigate complex systems.
  '
dev_tier_override: null
dismissed_banners:
- connect_cloud
- setup_email
docs_dir: /docs
email: demo@circuitforge.tech
inference_profile: remote
linkedin: ''
mission_preferences:
  animal_welfare: ''
  education: ''
  health: ''
  music: ''
  social_impact: Want my work to reach people who need it most.
name: Demo User
nda_companies: []
ollama_models_dir: ~/models/ollama
phone: ''
services:
  ollama_host: localhost
  ollama_port: 11434
  ollama_ssl: false
  ollama_ssl_verify: true
  searxng_host: searxng
  searxng_port: 8080
  searxng_ssl: false
  searxng_ssl_verify: true
  streamlit_port: 8501
  vllm_host: localhost
  vllm_port: 8000
  vllm_ssl: false
  vllm_ssl_verify: true
tier: free
vllm_models_dir: ~/models/vllm
wizard_complete: true
wizard_step: 0

demo/data/.gitkeep (new empty file)

docker/ollama/entrypoint.sh (new executable file, 10 lines)
@@ -0,0 +1,10 @@
#!/usr/bin/env bash
# Start Ollama server and pull a default model if none are present
ollama serve &
sleep 5
if [ -z "$(ollama list 2>/dev/null | tail -n +2)" ]; then
MODEL="${DEFAULT_OLLAMA_MODEL:-llama3.2:3b}"
echo "No models found — pulling $MODEL..."
ollama pull "$MODEL"
fi
wait

@@ -0,0 +1,8 @@
use_default_settings: true
search:
  formats:
    - html
    - json
server:
  secret_key: "change-me-in-production"
  bind_address: "0.0.0.0:8080"

docs/.gitkeep (new empty file)

docs/backlog.md (new file, 197 lines)
@@ -0,0 +1,197 @@
# Peregrine — Feature Backlog
Unscheduled ideas and deferred features. Roughly grouped by area.
See also: `circuitforge-plans/shared/2026-03-07-launch-checklist.md` for pre-launch blockers
(legal docs, Stripe live keys, website deployment, demo DB ownership fix).
---
## Launch Blockers (tracked in shared launch checklist)
- **ToS + Refund Policy** — required before live Stripe charges. Files go in `website/content/legal/`.
- **Stripe live key rotation** — swap test keys to live in `website/.env` (zero code changes).
- **Website deployment to bastion** — Caddy route for Nuxt frontend at `circuitforge.tech`.
- **Demo DB ownership** — `demo/data/staging.db` is root-owned (Docker artifact); fix with `sudo chown alan:alan` then re-run `demo/seed_demo.py`.
---
## Post-Launch / Infrastructure
- **Accessibility Statement** — WCAG 2.1 conformance doc at `website/content/legal/accessibility.md`. High credibility value for ND audience.
- **Data deletion request process** — published procedure at `website/content/legal/data-deletion.md` (GDPR/CCPA; references `privacy@circuitforge.tech`).
- **Uptime Kuma monitors** — 6 monitors need to be added manually (website, Heimdall, demo, Directus, Forgejo, Peregrine container health).
- **Directus admin password rotation** — change from `changeme-set-via-ui-on-first-run` before website goes public.
---
## Discovery — Community Scraper Plugin System
Design doc: `circuitforge-plans/peregrine/2026-03-07-community-scraper-plugin-design.md`
**Summary:** Add a `scripts/plugins/` directory with auto-discovery and a documented MIT-licensed
plugin API. Separates CF-built custom scrapers (paid, BSL 1.1, in `scripts/custom_boards/`) from
community-contributed and CF-freebie scrapers (free, MIT, in `scripts/plugins/`).
**Implementation tasks:**
- [ ] Add `scripts/plugins/` with `__init__.py`, `README.md`, and `example_plugin.py`
- [ ] Add `config/plugins/` directory with `.gitkeep`; gitignore `config/plugins/*.yaml` (not `.example`)
- [ ] Update `discover.py`: `load_plugins()` auto-discovery + tier gate (`custom_boards` = paid, `plugins` = free)
- [ ] Update `search_profiles.yaml` schema: add `plugins:` list + `plugin_config:` block
- [ ] Migrate `scripts/custom_boards/craigslist.py` → `scripts/plugins/craigslist.py` (CF freebie)
- [ ] Settings UI: render `CONFIG_SCHEMA` fields for installed plugins (Settings → Search)
- [ ] Rewrite `docs/developer-guide/adding-scrapers.md` to document the plugin API
- [ ] Add `scripts/plugins/LICENSE` (MIT) to make the dual-license split explicit
**CF freebie candidates** (future, after plugin system ships):
- Dice.com (tech-focused, no API key)
- We Work Remotely (remote-only, clean HTML)
- Wellfound / AngelList (startup roles)
---
## Discovery — Jobgether Non-Headless Scraper
Design doc: `peregrine/docs/superpowers/specs/2026-03-15-jobgether-integration-design.md`
**Background:** Headless Playwright is blocked by Cloudflare Turnstile on all `jobgether.com` pages.
A non-headless Playwright instance backed by `Xvfb` (virtual framebuffer) renders as a real browser and
bypasses Turnstile. Heimdall already has Xvfb available.
**Live-inspection findings (2026-03-15):**
- Search URL: `https://jobgether.com/search-offers?keyword=<query>`
- Job cards: `div.new-opportunity` — one per listing
- Card URL: `div.new-opportunity > a[href*="/offer/"]` (`href` attr)
- Title: `#offer-body h3`
- Company: `#offer-body p.font-medium`
- Dedup: existing URL-based dedup in `discover.py` covers Jobgether↔other-board overlap
**Implementation tasks (blocked until Xvfb-Playwright integration is in place):**
- [ ] Add `Xvfb` launch helper to `scripts/custom_boards/` (shared util, or inline in scraper)
- [ ] Implement `scripts/custom_boards/jobgether.py` using `p.chromium.launch(headless=False)` with `DISPLAY=:99`
- [ ] Pre-launch `Xvfb :99 -screen 0 1280x720x24` (or assert `DISPLAY` is already set)
- [ ] Register `jobgether` in `discover.py` `CUSTOM_SCRAPERS` (currently omitted — no viable scraper)
- [ ] Add `jobgether` to `custom_boards` in remote-eligible profiles in `config/search_profiles.yaml`
- [ ] Remove or update the "Jobgether discovery scraper — decided against" note in the design spec
**Pre-condition:** Validate Xvfb approach manually (headless=False + `DISPLAY=:99`) before implementing.
The `filter-api.jobgether.com` endpoint still requires auth and `robots.txt` still blocks bots —
confirm Turnstile acceptance is the only remaining blocker before beginning.
---
## Settings / Data Management
- **Backup / Restore / Teleport** — Settings panel option to export a full config snapshot (user.yaml + all gitignored configs) as a zip, restore from a snapshot, and "teleport" (export + import to a new machine or Docker volume). Useful for migrations, multi-machine setups, and safe wizard testing.
- **Complete Google Drive integration test()** — `scripts/integrations/google_drive.py` `test()` currently only checks that the credentials file exists (TODO comment). Implement actual Google Drive API call using `google-api-python-client` to verify the token works.
---
## First-Run Wizard
- **Wire real LLM test in Step 5 (Inference)** — `app/wizard/step_inference.py` validates an `endpoint_confirmed` boolean flag only. Replace with an actual LLM call: submit a minimal prompt to the configured endpoint, show pass/fail, and only set `endpoint_confirmed: true` on success. Should test whichever backend the user selected (Ollama, vLLM, Anthropic, etc.).
---
## LinkedIn Import
Shipped in v0.4.0. Ongoing maintenance and known decisions:
- **Selector maintenance** — LinkedIn changes their DOM periodically. When import stops working, update
CSS selectors in `scripts/linkedin_utils.py` only (all other files import from there). Real `data-section`
attribute values (as of 2025 DOM): `summary`, `currentPositionsDetails`, `educationsDetails`,
`certifications`, `posts`, `volunteering`, `publications`, `projects`.
- **Data export zip is the recommended path for full history** — LinkedIn's unauthenticated public profile
page is server-side degraded: experience titles, past roles, education, and skills are blurred/omitted.
Only available without login: name, About summary (truncated), current employer name, certifications.
The "Import from LinkedIn data export zip" expander (Settings → Resume Profile and Wizard step 3) is the
correct path for full career history. The UI already shows a callout explaining this.
- **LinkedIn OAuth — decided: not viable** — LinkedIn's OAuth API is restricted to approved partner
programs. Even if approved, it only grants name + email (not career history, experience, or skills).
This is a deliberate LinkedIn platform restriction, not a technical gap. Do not pursue this path.
- **Selector test harness** (future) — A lightweight test that fetches a known-public LinkedIn profile
and asserts at least N fields non-empty would catch DOM breakage before users report it. Low priority
until selector breakage becomes a recurring support issue.
---
## Cover Letter / Resume Generation
- ~~**Iterative refinement feedback loop**~~ — ✅ Done (`94225c9`): `generate()` accepts `previous_result`/`feedback`; task_runner parses params JSON; Apply Workspace has "Refine with Feedback" expander. Same pattern available for wizard `expand_bullets` via `_run_wizard_generate`.
---
## Apply / Browser Integration
- **Browser autofill extension** — Chrome/Firefox extension that reads job application forms and auto-fills from the user's profile + generated cover letter; syncs submitted applications back into the pipeline automatically. (Phase 2 paid+ feature per business plan.)
---
## Ultra Tier — Managed Applications (White-Glove Service)
- **Concept** — A human-in-the-loop concierge tier where a trained operator submits applications on the user's behalf, powered by AI-generated artifacts (cover letter, company research, survey responses). AI handles ~80% of the work; operator handles form submission, CAPTCHAs, and complex custom questions.
- **Pricing model** — Per-application or bundle pricing rather than flat "X apps/month" — application complexity varies too much for flat pricing to be sustainable.
- **Operator interface** — Thin admin UI (separate from user-facing app) that reads from the same `staging.db`: shows candidate profile, job listing, generated cover letter, company brief, and a "Mark submitted" button. New job status `queued_for_operator` to represent the handoff.
- **Key unlock** — Browser autofill extension (above) becomes the operator's primary tool; pre-fills forms from profile + cover letter, operator reviews and submits.
- **Tier addition** — Add `"ultra"` to `TIERS` in `app/wizard/tiers.py`; gate `"managed_applications"` feature. The existing tier system is designed to accommodate this cleanly.
- **Quality / trust** — Each submission requires explicit per-job user approval before operator acts. Full audit trail (who submitted, when, what was sent). Clear ToS around representation.
- **Bootstrap strategy** — Waitlist + small trusted operator team initially to validate workflow before scaling or automating further. Don't build operator tooling until the manual flow is proven.
---
## Container Runtime
- ~~**Podman support**~~ — ✅ Done: `Makefile` auto-detects `docker compose` / `podman compose` / `podman-compose`; `compose.podman-gpu.yml` CDI override for GPU profiles; `setup.sh` detects existing Podman and skips Docker install.
- **FastAPI migration path** — When concurrent-user scale demands it: port Streamlit pages to FastAPI + React/HTMX, keep `scripts/` layer unchanged, replace daemon threads with Celery + Redis. The `scripts/` separation already makes this clean.
---
## Email Sync
See also: `docs/plans/email-sync-testing-checklist.md` for outstanding test coverage items.
---
## Circuit Forge LLC — Product Expansion ("Heinous Tasks" Platform)
The core insight: the Peregrine pipeline architecture (monitor → AI assist → human approval → execute) is domain-agnostic. Job searching is the proof-of-concept. The same pattern applies to any task that is high-stakes, repetitive, opaque, or just deeply unpleasant.
Each product ships as a **separate app** sharing the same underlying scaffold (pipeline engine, LLM router, background tasks, wizard, tier system, operator interface for Ultra tier). The business is Circuit Forge LLC; the brand positioning is: *"AI for the tasks you hate most."*
### Candidate products (rough priority order)
- **Falcon** — Government form assistance. Benefits applications, disability claims, FAFSA, immigration forms, small business permits. AI pre-fills from user profile, flags ambiguous questions, generates supporting statements. High value: mistakes here are costly and correction is slow.
- **Osprey** — Customer service queue management. Monitors hold queues, auto-navigates IVR trees via speech synthesis, escalates to human agent at the right moment, drafts complaint letters and dispute emails with the right tone and regulatory citations (CFPB, FCC, etc.). Tracks ticket status across cases.
- **Kestrel** — DMV / government appointment booking. Monitors appointment availability for DMV, passport offices, Social Security offices, USCIS biometrics, etc. Auto-books the moment a slot opens. Sends reminders with checklist of required documents.
- **Harrier** — Insurance navigation. Prior authorization tracking, claim dispute drafting, EOB reconciliation, appeal letters. High willingness-to-pay: a denied $50k claim is worth paying to fight.
- **Merlin** — Rental / housing applications. Monitors listings, auto-applies to matching properties, generates cover letters for competitive rental markets, tracks responses, flags lease red flags.
- **Ibis** — Healthcare coordination. The sacred ibis was the symbol of Thoth, Egyptian god of medicine — the name carries genuine medical heritage. Referral tracking, specialist waitlist monitoring, prescription renewal reminders, medical record request management, prior auth paper trails.
- **Tern** — Travel planning. The Arctic tern makes the longest migration of any animal (44,000 miles/year, pole to pole) — the ultimate traveler. Flight/hotel monitoring, itinerary generation, visa requirement research, travel insurance comparison, rebooking assistance on disruption.
- **Wren** — Contractor engagement. Wrens are legendary nest-builders — meticulous, structural, persistent. Contractor discovery, quote comparison, scope-of-work generation, milestone tracking, dispute documentation, lien waiver management.
- **Martin** — Car / home maintenance. The house martin nests on the exterior of buildings and returns to the same site every year to maintain it — almost too on-the-nose. Service scheduling, maintenance history tracking, recall monitoring, warranty tracking, finding trusted local providers.
### Shared architecture decisions
- **Separate repos, shared `circuitforge-core` package** — pipeline engine, LLM router, background task runner, wizard framework, tier system, operator interface all extracted into a private PyPI package that each product imports.
- **Same Docker Compose scaffold** — each product is a `compose.yml` away from deployment.
- **Same Ultra tier model** — operator interface reads from product's DB, human-in-the-loop for tasks that can't be automated (CAPTCHAs, phone calls, wet signatures).
- **Prove Peregrine first** — don't extract `circuitforge-core` until the second product is actively being built. Premature extraction is over-engineering.
### What makes this viable
- Each domain has the same pain profile: high-stakes, time-sensitive, opaque processes with inconsistent UX.
- Users are highly motivated to pay — the alternative is hours of their own time on hold or filling out forms.
- The human-in-the-loop (Ultra) model handles the hardest cases without requiring full automation.
- Regulatory moat: knowing which citations matter (CFPB for billing disputes, ADA for accommodation requests) is defensible knowledge that gets baked into prompts over time.
---

# Adding an Integration
Peregrine's integration system is auto-discovered — add a class and a config example, and it appears in the wizard and Settings automatically. No registration step is needed.
---
## Step 1 — Create the integration module
Create `scripts/integrations/myservice.py`:
```python
# scripts/integrations/myservice.py
import requests

from scripts.integrations.base import IntegrationBase


class MyServiceIntegration(IntegrationBase):
    name = "myservice"    # must be unique; matches config filename
    label = "My Service"  # display name shown in the UI
    tier = "free"         # "free" | "paid" | "premium"

    def fields(self) -> list[dict]:
        """Return form field definitions for the connection card in the wizard/Settings UI."""
        return [
            {
                "key": "api_key",
                "label": "API Key",
                "type": "password",  # "text" | "password" | "url" | "checkbox"
                "placeholder": "sk-...",
                "required": True,
                "help": "Get your key at myservice.com/settings/api",
            },
            {
                "key": "workspace_id",
                "label": "Workspace ID",
                "type": "text",
                "placeholder": "ws_abc123",
                "required": True,
                "help": "Found in your workspace URL",
            },
        ]

    def connect(self, config: dict) -> bool:
        """
        Store credentials in memory. Return True if all required fields are present.
        Does NOT verify credentials — call test() for that.
        """
        self._api_key = config.get("api_key", "").strip()
        self._workspace_id = config.get("workspace_id", "").strip()
        return bool(self._api_key and self._workspace_id)

    def test(self) -> bool:
        """
        Verify the stored credentials actually work.
        Returns True on success, False on any failure.
        """
        try:
            r = requests.get(
                "https://api.myservice.com/v1/ping",
                headers={"Authorization": f"Bearer {self._api_key}"},
                params={"workspace": self._workspace_id},
                timeout=5,
            )
            return r.ok
        except Exception:
            return False

    def sync(self, jobs: list[dict]) -> int:
        """
        Optional: push jobs to the external service.
        Return the count of successfully synced jobs.

        The default implementation in IntegrationBase returns 0 (no-op).
        Only override this if your integration supports job syncing
        (e.g. Notion, Airtable, Google Sheets).
        """
        synced = 0
        for job in jobs:
            try:
                self._push_job(job)
                synced += 1
            except Exception as e:
                print(f"[myservice] sync error for job {job.get('id')}: {e}")
        return synced

    def _push_job(self, job: dict) -> None:
        requests.post(
            "https://api.myservice.com/v1/records",
            headers={"Authorization": f"Bearer {self._api_key}"},
            json={
                "workspace": self._workspace_id,
                "title": job.get("title", ""),
                "company": job.get("company", ""),
                "status": job.get("status", "pending"),
                "url": job.get("url", ""),
            },
            timeout=10,
        ).raise_for_status()
```
---
## Step 2 — Create the config example file
Create `config/integrations/myservice.yaml.example`:
```yaml
# config/integrations/myservice.yaml.example
# Copy to config/integrations/myservice.yaml and fill in your credentials.
# This file is gitignored — never commit the live credentials.
api_key: ""
workspace_id: ""
```
The live credentials file (`config/integrations/myservice.yaml`) is gitignored automatically via the `config/integrations/` entry in `.gitignore`.
---
## Step 3 — Auto-discovery
No registration step is needed. The integration registry (`scripts/integrations/__init__.py`) imports all `.py` files in the `integrations/` directory and discovers subclasses of `IntegrationBase` automatically.
On next startup, `myservice` will appear in:
- The first-run wizard Step 7 (Integrations)
- **Settings → Integrations** with a connection card rendered from `fields()`
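A registry of this shape can be sketched with `__subclasses__()`. This is a minimal, self-contained sketch; the real `scripts/integrations/__init__.py` may differ in details (for example, it walks the package with `pkgutil` so every module gets imported first):

```python
# Minimal sketch of subclass-based auto-discovery (assumed shape; the real
# registry imports every module in scripts/integrations/ before collecting).
import importlib
import pkgutil


class IntegrationBase:
    name = "base"


def discover(package=None) -> dict[str, type]:
    """Import every module in the package, then collect subclasses by name."""
    if package is not None:
        # Importing each module triggers class definition, which is what
        # registers the subclass with IntegrationBase.__subclasses__().
        for mod in pkgutil.iter_modules(package.__path__):
            importlib.import_module(f"{package.__name__}.{mod.name}")
    return {cls.name: cls for cls in IntegrationBase.__subclasses__()}


# Defining a subclass anywhere is enough; no registration call is needed.
class MyServiceIntegration(IntegrationBase):
    name = "myservice"


registry = discover()
assert "myservice" in registry
```

This is why Step 1 has no "register it" counterpart: defining the class is the registration.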
---
## Step 4 — Tier-gate new features (optional)
If you want to gate a specific action (not just the integration itself) behind a tier, add an entry to `app/wizard/tiers.py`:
```python
FEATURES: dict[str, str] = {
    # ...existing entries...
    "myservice_sync": "paid",  # or "free" | "premium"
}
```
Then guard the action in the relevant UI page:
```python
import streamlit as st

from app.wizard.tiers import can_use
from scripts.user_profile import UserProfile

user = UserProfile()
if can_use(user.tier, "myservice_sync"):
    ...  # show the sync button
else:
    st.info("MyService sync requires a Paid plan.")
```
---
## Step 5 — Write a test
Create or add to `tests/test_integrations.py`:
```python
# tests/test_integrations.py (add to existing file)
from unittest.mock import MagicMock, patch

from scripts.integrations.myservice import MyServiceIntegration


def test_fields_returns_required_keys():
    integration = MyServiceIntegration()
    fields = integration.fields()
    assert len(fields) >= 1
    for field in fields:
        assert "key" in field
        assert "label" in field
        assert "type" in field
        assert "required" in field


def test_connect_returns_true_with_valid_config():
    integration = MyServiceIntegration()
    result = integration.connect({"api_key": "sk-abc", "workspace_id": "ws-123"})
    assert result is True


def test_connect_returns_false_with_missing_required_field():
    integration = MyServiceIntegration()
    result = integration.connect({"api_key": "", "workspace_id": "ws-123"})
    assert result is False


def test_test_returns_true_on_200():
    integration = MyServiceIntegration()
    integration.connect({"api_key": "sk-abc", "workspace_id": "ws-123"})
    mock_resp = MagicMock()
    mock_resp.ok = True
    with patch("scripts.integrations.myservice.requests.get", return_value=mock_resp):
        assert integration.test() is True


def test_test_returns_false_on_error():
    integration = MyServiceIntegration()
    integration.connect({"api_key": "sk-abc", "workspace_id": "ws-123"})
    with patch("scripts.integrations.myservice.requests.get", side_effect=Exception("timeout")):
        assert integration.test() is False


def test_is_configured_reflects_file_presence(tmp_path):
    config_dir = tmp_path / "config"
    config_dir.mkdir()
    (config_dir / "integrations").mkdir()
    assert MyServiceIntegration.is_configured(config_dir) is False
    (config_dir / "integrations" / "myservice.yaml").write_text("api_key: sk-abc\n")
    assert MyServiceIntegration.is_configured(config_dir) is True
```
---
## IntegrationBase Reference
All integrations inherit from `scripts/integrations/base.py`. Here is the full interface:
| Method / attribute | Required | Description |
|-------------------|----------|-------------|
| `name: str` | Yes | Machine key — must be unique. Matches the YAML config filename. |
| `label: str` | Yes | Human-readable display name for the UI. |
| `tier: str` | Yes | Minimum tier: `"free"`, `"paid"`, or `"premium"`. |
| `fields() -> list[dict]` | Yes | Returns form field definitions. Each dict: `key`, `label`, `type`, `placeholder`, `required`, `help`. |
| `connect(config: dict) -> bool` | Yes | Stores credentials in memory. Returns `True` if required fields are present. Does NOT verify credentials. |
| `test() -> bool` | Yes | Makes a real network call to verify stored credentials. Returns `True` on success. |
| `sync(jobs: list[dict]) -> int` | No | Pushes jobs to the external service. Returns count synced. Default is a no-op returning 0. |
| `config_path(config_dir: Path) -> Path` | Inherited | Returns `config_dir / "integrations" / f"{name}.yaml"`. |
| `is_configured(config_dir: Path) -> bool` | Inherited | Returns `True` if the config YAML file exists. |
| `save_config(config: dict, config_dir: Path)` | Inherited | Writes config dict to the YAML file. Call after `test()` returns `True`. |
| `load_config(config_dir: Path) -> dict` | Inherited | Loads and returns the YAML config, or `{}` if not configured. |
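The inherited helpers are small enough to sketch. This is an assumed implementation consistent with the table above, not necessarily the exact code in `scripts/integrations/base.py`:

```python
# Plausible sketch of the inherited config helpers (assumed implementation;
# the real scripts/integrations/base.py may differ in details).
from pathlib import Path

import yaml  # PyYAML


class IntegrationBase:
    name: str = "base"

    @classmethod
    def config_path(cls, config_dir: Path) -> Path:
        """config_dir / "integrations" / "<name>.yaml", per the table above."""
        return config_dir / "integrations" / f"{cls.name}.yaml"

    @classmethod
    def is_configured(cls, config_dir: Path) -> bool:
        return cls.config_path(config_dir).exists()

    @classmethod
    def save_config(cls, config: dict, config_dir: Path) -> None:
        path = cls.config_path(config_dir)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(yaml.safe_dump(config))

    @classmethod
    def load_config(cls, config_dir: Path) -> dict:
        if not cls.is_configured(config_dir):
            return {}
        return yaml.safe_load(cls.config_path(config_dir).read_text()) or {}
```

Because these are class-level helpers keyed only on `name`, the wizard can check `is_configured()` without instantiating the integration.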
### Field type values
| `type` value | UI widget rendered |
|-------------|-------------------|
| `"text"` | Plain text input |
| `"password"` | Password input (masked) |
| `"url"` | URL input |
| `"checkbox"` | Boolean checkbox |

# Adding a Custom Job Board Scraper
Peregrine supports pluggable custom job board scrapers. Standard boards use the JobSpy library. Custom scrapers handle boards with non-standard APIs, paywalls, or SSR-rendered pages.
This guide walks through adding a new scraper from scratch.
---
## Step 1 — Create the scraper module
Create `scripts/custom_boards/myboard.py`. Every custom scraper must implement one function:
```python
# scripts/custom_boards/myboard.py
from datetime import datetime

import requests


def scrape(profile: dict, db_path: str) -> list[dict]:
    """
    Scrape job listings from MyBoard for the given search profile.

    Args:
        profile: The active search profile dict from search_profiles.yaml.
                 Keys include: titles (list), locations (list),
                 hours_old (int), results_per_board (int).
        db_path: Absolute path to staging.db. Use this if you need to
                 check for existing URLs before returning.

    Returns:
        List of job dicts. Each dict must contain at minimum:
            title (str)       — job title
            company (str)     — company name
            url (str)         — canonical job URL (used as unique key)
            source (str)      — board identifier, e.g. "myboard"
            location (str)    — "Remote" or "City, State"
            is_remote (bool)  — True if remote
            salary (str)      — salary string or "" if unknown
            description (str) — full job description text or "" if unavailable
            date_found (str)  — ISO 8601 datetime string, e.g. "2026-02-25T12:00:00"
    """
    jobs = []
    for title in profile.get("titles", []):
        for location in profile.get("locations", []):
            results = _fetch_from_myboard(title, location, profile)
            jobs.extend(results)
    return jobs


def _fetch_from_myboard(title: str, location: str, profile: dict) -> list[dict]:
    """Internal helper — call the board's API and transform results."""
    params = {
        "q": title,
        "l": location,
        "limit": profile.get("results_per_board", 50),
    }
    try:
        resp = requests.get(
            "https://api.myboard.com/jobs",
            params=params,
            timeout=15,
        )
        resp.raise_for_status()
        data = resp.json()
    except Exception as e:
        print(f"[myboard] fetch error: {e}")
        return []
    jobs = []
    for item in data.get("results", []):
        jobs.append({
            "title": item.get("title", ""),
            "company": item.get("company", ""),
            "url": item.get("url", ""),
            "source": "myboard",
            "location": item.get("location", ""),
            "is_remote": "remote" in item.get("location", "").lower(),
            "salary": item.get("salary", ""),
            "description": item.get("description", ""),
            "date_found": datetime.utcnow().isoformat(),
        })
    return jobs
```
### Required fields
| Field | Type | Notes |
|-------|------|-------|
| `title` | str | Job title |
| `company` | str | Company name |
| `url` | str | **Unique key** — must be stable and canonical |
| `source` | str | Short board identifier, e.g. `"myboard"` |
| `location` | str | `"Remote"` or `"City, ST"` |
| `is_remote` | bool | `True` if remote |
| `salary` | str | Salary string or `""` |
| `description` | str | Full description text or `""` |
| `date_found` | str | ISO 8601 UTC datetime |
### Deduplication
`discover.py` deduplicates by `url` before inserting into the database. If a job with the same URL already exists, it is silently skipped. You do not need to handle deduplication inside your scraper.
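On the database side, URL-based dedup can be as simple as an `INSERT OR IGNORE` on the unique key. A sketch (assumed shape and a trimmed schema; the real `discover.py` may do this differently):

```python
# Sketch of URL-keyed dedup before insert (assumed shape; schema trimmed
# to two columns for illustration).
import sqlite3


def insert_new_jobs(db_path: str, jobs: list[dict]) -> int:
    """Insert only jobs whose url is not already in the jobs table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs (url TEXT PRIMARY KEY, title TEXT)"
    )
    inserted = 0
    for job in jobs:
        # INSERT OR IGNORE silently skips rows whose url already exists,
        # which is exactly the "silently skipped" behavior described above.
        cur = conn.execute(
            "INSERT OR IGNORE INTO jobs (url, title) VALUES (?, ?)",
            (job["url"], job.get("title", "")),
        )
        inserted += cur.rowcount
    conn.commit()
    conn.close()
    return inserted
```

This is why a stable, canonical `url` matters: a board that appends tracking parameters to its URLs will defeat the dedup and re-insert the same job every run.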
### Rate limiting
Be a good citizen:
- Add a `time.sleep(0.5)` between paginated requests
- Respect `Retry-After` headers
- Do not scrape faster than a human browsing the site
- If the site provides an official API, prefer that over scraping HTML
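Those habits combine into a fetch loop like this (hypothetical endpoint and pagination parameters; adapt to your board):

```python
# Polite pagination sketch: fixed delay between pages, Retry-After honored
# on 429. Endpoint and parameter names are hypothetical.
import time

import requests


def fetch_all_pages(base_url: str, params: dict, max_pages: int = 10) -> list[dict]:
    results: list[dict] = []
    for page in range(1, max_pages + 1):
        resp = requests.get(base_url, params={**params, "page": page}, timeout=15)
        if resp.status_code == 429:
            # Respect Retry-After, then retry the same page once.
            time.sleep(int(resp.headers.get("Retry-After", "5")))
            resp = requests.get(base_url, params={**params, "page": page}, timeout=15)
        resp.raise_for_status()
        batch = resp.json().get("results", [])
        if not batch:
            break  # empty page means we are past the last result
        results.extend(batch)
        time.sleep(0.5)  # pause between paginated requests
    return results
```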
### Credentials
If your scraper requires API keys or credentials:
- Create `config/myboard.yaml.example` as a template
- Create `config/myboard.yaml` (gitignored) for live credentials
- Read it in your scraper with `yaml.safe_load(open("config/myboard.yaml"))`
- Document the credential setup in comments at the top of your module
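A loader following that convention might look like this (`load_board_config` is a hypothetical helper name; the error message points the user at the `.example` template):

```python
# Credential loading sketch following the convention above. The helper
# name is hypothetical; write your own variant inside the scraper module.
from pathlib import Path

import yaml  # PyYAML


def load_board_config(name: str, config_dir: str = "config") -> dict:
    path = Path(config_dir) / f"{name}.yaml"
    if not path.exists():
        raise FileNotFoundError(
            f"{path} not found. Copy {path}.example and fill in credentials."
        )
    return yaml.safe_load(path.read_text()) or {}
```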
---
## Step 2 — Register the scraper
Open `scripts/discover.py` and add your scraper to the `CUSTOM_SCRAPERS` dict:
```python
from scripts.custom_boards import adzuna, theladders, craigslist, myboard

CUSTOM_SCRAPERS = {
    "adzuna": adzuna.scrape,
    "theladders": theladders.scrape,
    "craigslist": craigslist.scrape,
    "myboard": myboard.scrape,  # add this line
}
```
---
## Step 3 — Activate in a search profile
Open `config/search_profiles.yaml` and add `myboard` to `custom_boards` in any profile:
```yaml
profiles:
  - name: cs_leadership
    boards:
      - linkedin
      - indeed
    custom_boards:
      - adzuna
      - myboard  # add this line
    titles:
      - Customer Success Manager
    locations:
      - Remote
```
---
## Step 4 — Write a test
Create `tests/test_myboard.py`. Mock the HTTP call to avoid hitting the live API during tests:
```python
# tests/test_myboard.py
from unittest.mock import patch

from scripts.custom_boards.myboard import scrape

MOCK_RESPONSE = {
    "results": [
        {
            "title": "Customer Success Manager",
            "company": "Acme Corp",
            "url": "https://myboard.com/jobs/12345",
            "location": "Remote",
            "salary": "$80,000 - $100,000",
            "description": "We are looking for a CSM...",
        }
    ]
}


def test_scrape_returns_correct_shape():
    profile = {
        "titles": ["Customer Success Manager"],
        "locations": ["Remote"],
        "results_per_board": 10,
        "hours_old": 240,
    }
    with patch("scripts.custom_boards.myboard.requests.get") as mock_get:
        mock_get.return_value.ok = True
        mock_get.return_value.raise_for_status = lambda: None
        mock_get.return_value.json.return_value = MOCK_RESPONSE
        jobs = scrape(profile, db_path="nonexistent.db")
    assert len(jobs) == 1
    job = jobs[0]
    # Required fields
    for key in (
        "title", "company", "url", "source", "location",
        "is_remote", "salary", "description", "date_found",
    ):
        assert key in job
    assert job["source"] == "myboard"
    assert job["title"] == "Customer Success Manager"
    assert job["url"] == "https://myboard.com/jobs/12345"


def test_scrape_handles_http_error_gracefully():
    profile = {
        "titles": ["Customer Success Manager"],
        "locations": ["Remote"],
        "results_per_board": 10,
        "hours_old": 240,
    }
    with patch("scripts.custom_boards.myboard.requests.get") as mock_get:
        mock_get.side_effect = Exception("Connection refused")
        jobs = scrape(profile, db_path="nonexistent.db")
    assert jobs == []
```
---
## Existing Scrapers as Reference
| Scraper | Notes |
|---------|-------|
| `scripts/custom_boards/adzuna.py` | REST API with `app_id` + `app_key` authentication |
| `scripts/custom_boards/theladders.py` | SSR scraper using `curl_cffi` to parse `__NEXT_DATA__` JSON embedded in the page |
| `scripts/custom_boards/craigslist.py` | RSS feed scraper |

# Architecture
This page describes Peregrine's system structure, layer boundaries, and key design decisions.
---
## System Overview
### Pipeline
```mermaid
flowchart LR
    sources["JobSpy\nCustom Boards"]
    discover["discover.py"]
    db[("staging.db\nSQLite")]
    match["match.py\nScoring"]
    review["Job Review\nApprove / Reject"]
    apply["Apply Workspace\nCover letter + PDF"]
    kanban["Interviews\nphone_screen → hired"]
    sync["sync.py"]
    notion["Notion DB"]
    sources --> discover --> db --> match --> review --> apply --> kanban
    db --> sync --> notion
```
### Docker Compose Services
Three compose files serve different deployment contexts:
| File | Project name | Port | Purpose |
|------|-------------|------|---------|
| `compose.yml` | `peregrine` | 8502 | Local self-hosted install (default) |
| `compose.demo.yml` | `peregrine-demo` | 8504 | Public demo at `demo.circuitforge.tech/peregrine`; `DEMO_MODE=true`, no LLM |
| `compose.cloud.yml` | `peregrine-cloud` | 8505 | Cloud managed instance at `menagerie.circuitforge.tech/peregrine`; `CLOUD_MODE=true`, per-user data |
```mermaid
flowchart TB
    subgraph local["compose.yml (local)"]
        app_l["**app** :8502\nStreamlit UI"]
        ollama_l["**ollama**\nLocal LLM"]
        vllm_l["**vllm**\nvLLM"]
        vision_l["**vision**\nMoondream2"]
        searxng_l["**searxng**\nWeb Search"]
        db_l[("staging.db\nSQLite")]
    end
    subgraph cloud["compose.cloud.yml (cloud)"]
        app_c["**app** :8505\nStreamlit UI\nCLOUD_MODE=true"]
        searxng_c["**searxng**\nWeb Search"]
        db_c[("menagerie-data/\n&lt;user-id&gt;/staging.db\nSQLCipher")]
        pg[("Postgres\nplatform DB\n:5433")]
    end
```
Solid lines = always connected. Dashed lines = optional/profile-dependent backends.
### Streamlit App Layer
```mermaid
flowchart TD
    entry["app/app.py\nEntry point · navigation · sidebar task badge"]
    setup["0_Setup.py\nFirst-run wizard\n⚠ Gates everything"]
    review["1_Job_Review.py\nApprove / reject queue"]
    settings["2_Settings.py\nAll user configuration"]
    apply["4_Apply.py\nCover letter gen + PDF export"]
    interviews["5_Interviews.py\nKanban: phone_screen → hired"]
    prep["6_Interview_Prep.py\nResearch brief + practice Q&A"]
    survey["7_Survey.py\nCulture-fit survey assistant"]
    wizard["app/wizard/\nstep_hardware.py … step_integrations.py\ntiers.py — feature gate definitions"]
    entry --> setup
    entry --> review
    entry --> settings
    entry --> apply
    entry --> interviews
    entry --> prep
    entry --> survey
    setup <-.->|wizard steps| wizard
```
### Scripts Layer
Framework-independent — no Streamlit imports. Can be called from CLI, FastAPI, or background threads.
| Script | Purpose |
|--------|---------|
| `discover.py` | JobSpy + custom board orchestration |
| `match.py` | Resume keyword scoring |
| `db.py` | All SQLite helpers (single source of truth) |
| `llm_router.py` | LLM fallback chain |
| `generate_cover_letter.py` | Cover letter generation |
| `company_research.py` | Pre-interview research brief |
| `task_runner.py` | Background daemon thread executor |
| `imap_sync.py` | IMAP email fetch + classify |
| `sync.py` | Push to external integrations |
| `user_profile.py` | `UserProfile` wrapper for `user.yaml` |
| `preflight.py` | Port + resource check |
| `custom_boards/` | Per-board scrapers |
| `integrations/` | Per-service integration drivers |
| `vision_service/` | FastAPI Moondream2 inference server |
### Config Layer
Plain YAML files. Gitignored files contain secrets; `.example` files are committed as templates.
| File | Purpose |
|------|---------|
| `config/user.yaml` | Personal data + wizard state |
| `config/llm.yaml` | LLM backends + fallback chains |
| `config/search_profiles.yaml` | Job search configuration |
| `config/resume_keywords.yaml` | Scoring keywords |
| `config/blocklist.yaml` | Excluded companies/domains |
| `config/email.yaml` | IMAP credentials |
| `config/integrations/` | Per-integration credentials |
### Database Layer
**Local mode** — `staging.db`: SQLite, single file, gitignored.
**Cloud mode** — Hybrid:
- **Postgres (platform layer):** account data, subscriptions, telemetry consent. Shared across all users.
- **SQLite-per-user (content layer):** each user's job data in an isolated, SQLCipher-encrypted file at `/devl/menagerie-data/<user-id>/peregrine/staging.db`. Schema is identical to local — the app sees no difference.
#### Local SQLite tables
| Table | Purpose |
|-------|---------|
| `jobs` | Core pipeline — all job data |
| `job_contacts` | Email thread log per job |
| `company_research` | LLM-generated research briefs |
| `background_tasks` | Async task queue state |
| `survey_responses` | Culture-fit survey Q&A pairs |
#### Postgres platform tables (cloud only)
| Table | Purpose |
|-------|---------|
| `subscriptions` | User tier, license JWT, product |
| `usage_events` | Anonymous usage telemetry (consent-gated) |
| `telemetry_consent` | Per-user telemetry preferences + hard kill switch |
| `support_access_grants` | Time-limited support session grants |
---
### Cloud Session Middleware
`app/cloud_session.py` handles multi-tenant routing transparently:
```
Request → Caddy injects X-CF-Session header (from Directus session cookie)
→ resolve_session() validates JWT, derives db_path + db_key
→ all DB calls use get_db_path() instead of DEFAULT_DB
```
Key functions:
| Function | Purpose |
|----------|---------|
| `resolve_session(app)` | Called at top of every page — no-op in local mode |
| `get_db_path()` | Returns per-user `db_path` (cloud) or `DEFAULT_DB` (local) |
| `derive_db_key(user_id)` | `HMAC(SERVER_SECRET, user_id)` — deterministic per-user SQLCipher key |
The app code never branches on `CLOUD_MODE` except at the entry points (`resolve_session` and `get_db_path`). Everything downstream is transparent.
### Telemetry (cloud only)
`app/telemetry.py` is the **only** path to the `usage_events` table. No feature may write there directly.
```python
from app.telemetry import log_usage_event
log_usage_event(user_id, "peregrine", "cover_letter_generated", {"words": 350})
```
- Complete no-op when `CLOUD_MODE=false`
- Checks `telemetry_consent.all_disabled` first — if set, nothing is written, no exceptions
- Swallows all exceptions so telemetry never crashes the app
---
## Layer Boundaries
### App layer (app/)
The Streamlit UI layer. Its only responsibilities are:
- Reading from `scripts/db.py` helpers
- Calling `scripts/` functions directly or via `task_runner.submit_task()`
- Rendering results to the browser
The app layer does not contain business logic. Database queries, LLM calls, and integrations all live in `scripts/`.
### Scripts layer (scripts/)
This is the stable public API of Peregrine. Scripts are designed to be framework-independent — they do not import Streamlit and can be called from a CLI, FastAPI endpoint, or background thread without modification.
All personal data access goes through `scripts/user_profile.py` (`UserProfile` class). Scripts never read `config/user.yaml` directly.
All database access goes through `scripts/db.py`. No script does raw SQLite outside of `db.py`.
### Config layer (config/)
Plain YAML files. Gitignored files contain secrets; `.example` files are committed as templates.
---
## Background Tasks
`scripts/task_runner.py` provides a simple background thread executor for long-running LLM tasks.
```python
from scripts.task_runner import submit_task
# Queue a cover letter generation task
submit_task(db_path, task_type="cover_letter", job_id=42)
# Queue a company research task
submit_task(db_path, task_type="company_research", job_id=42)
```
Tasks are recorded in the `background_tasks` table with the following state machine:
```mermaid
stateDiagram-v2
    [*] --> queued : submit_task()
    queued --> running : daemon picks up
    running --> completed
    running --> failed
    queued --> failed : server restart clears stuck tasks
    completed --> [*]
    failed --> [*]
```
**Dedup rule:** Only one `queued` or `running` task per `(task_type, job_id)` pair is allowed at a time. Submitting a duplicate is a silent no-op.
**On startup:** `app/app.py` resets any `running` or `queued` rows to `failed` to clear tasks that were interrupted by a server restart.
**Sidebar indicator:** `app/app.py` polls the `background_tasks` table every 3 seconds via a Streamlit fragment and displays a badge in the sidebar.
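The dedup rule amounts to one existence check before insert. A sketch (assumed schema subset; the real `scripts/task_runner.py` may differ):

```python
# Sketch of the (task_type, job_id) dedup rule (assumed schema subset;
# the real background_tasks table has more columns).
import sqlite3


def submit_task(db_path: str, task_type: str, job_id: int) -> bool:
    """Queue a task unless one is already queued or running for this pair."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS background_tasks "
        "(id INTEGER PRIMARY KEY, task_type TEXT, job_id INTEGER, status TEXT)"
    )
    dup = conn.execute(
        "SELECT 1 FROM background_tasks WHERE task_type = ? AND job_id = ? "
        "AND status IN ('queued', 'running')",
        (task_type, job_id),
    ).fetchone()
    if dup:
        conn.close()
        return False  # silent no-op on duplicate
    conn.execute(
        "INSERT INTO background_tasks (task_type, job_id, status) "
        "VALUES (?, ?, 'queued')",
        (task_type, job_id),
    )
    conn.commit()
    conn.close()
    return True
```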
---
## LLM Router
`scripts/llm_router.py` provides a single `complete()` call that tries backends in priority order and falls back transparently. See [LLM Router](../reference/llm-router.md) for full documentation.
---
## Key Design Decisions
### scripts/ is framework-independent
The scripts layer was deliberately kept free of Streamlit imports. This means the full pipeline can be migrated to a FastAPI or Celery backend without rewriting business logic.
### All personal data via UserProfile
`scripts/user_profile.py` is the single source of truth for all user data. This makes it easy to swap the storage backend (e.g. from YAML to a database) without touching every script.
### SQLite as staging layer
`staging.db` acts as the staging layer between discovery and external integrations. This lets discovery, matching, and the UI all run independently without network dependencies. External integrations (Notion, Airtable, etc.) are push-only and optional.
### Tier system in app/wizard/tiers.py
`FEATURES` is a single dict that maps feature key → minimum tier. `can_use(tier, feature)` is the single gating function. New features are added to `FEATURES` in one place.
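With an ordered tier map, the gating function reduces to a comparison. A sketch (the tier ordering is assumed; the real `app/wizard/tiers.py` may differ):

```python
# Sketch of the single gating function (assumed tier ordering; the real
# app/wizard/tiers.py may name or order things differently).
TIER_ORDER = {"free": 0, "paid": 1, "premium": 2}

FEATURES: dict[str, str] = {
    "myservice_sync": "paid",  # feature key -> minimum tier
}


def can_use(user_tier: str, feature: str) -> bool:
    """True when the user's tier meets the feature's minimum tier."""
    required = FEATURES.get(feature, "free")  # unknown features default to free
    return TIER_ORDER.get(user_tier, 0) >= TIER_ORDER[required]
```

Keeping gating in one dict plus one function means a tier change is a one-line diff, and there is never a second place where a feature's minimum tier is encoded.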
### Vision service is a separate process
Moondream2 requires `torch` and `transformers`, which are incompatible with the lightweight main conda environment. The vision service runs as a separate FastAPI process in a separate conda environment (`job-seeker-vision`), keeping the main env free of GPU dependencies.
### Cloud mode is a transparent layer, not a fork
`CLOUD_MODE=true` activates two entry points (`resolve_session`, `get_db_path`) and the telemetry middleware. Every other line of app code is unchanged. There is no cloud branch, no conditional imports, no schema divergence. The local-first architecture is preserved end-to-end; the cloud layer sits on top of it.
### SQLite-per-user instead of shared Postgres
Each cloud user gets their own encrypted SQLite file. This means:
- No SQL migrations when the schema changes — new users get the latest schema, existing users keep their file as-is
- Zero risk of cross-user data leakage at the DB layer
- GDPR deletion is `rm -rf /devl/menagerie-data/<user-id>/` — auditable and complete
- The app can be tested locally with `CLOUD_MODE=false` without any Postgres dependency
The Postgres platform DB holds only account metadata (subscriptions, consent, telemetry) — never job search content.

# Cloud Deployment
This page covers operating the Peregrine cloud managed instance at `menagerie.circuitforge.tech/peregrine`.
---
## Architecture Overview
```
Browser → Caddy (bastion) → host:8505 → peregrine-cloud container
                                │
        ┌───────────────────────┼────────────────────────┐
        │                       │                        │
cloud_session.py      /devl/menagerie-data/       Postgres :5433
(session routing)     <user-id>/peregrine/        (platform DB)
                      staging.db (SQLCipher)
```
Caddy injects the Directus session cookie as `X-CF-Session`. `cloud_session.py` validates the JWT, derives the per-user db path and SQLCipher key, and injects both into `st.session_state`. All downstream DB calls are transparent — the app never knows it's multi-tenant.
---
## Compose File
```bash
# Start
docker compose -f compose.cloud.yml --project-name peregrine-cloud --env-file .env up -d
# Stop
docker compose -f compose.cloud.yml --project-name peregrine-cloud down
# Logs
docker compose -f compose.cloud.yml --project-name peregrine-cloud logs app -f
# Rebuild after code changes
docker compose -f compose.cloud.yml --project-name peregrine-cloud build app
docker compose -f compose.cloud.yml --project-name peregrine-cloud up -d
```
---
## Required Environment Variables
These must be present in `.env` (gitignored) before starting the cloud stack:
| Variable | Description | Where to find |
|----------|-------------|---------------|
| `CLOUD_MODE` | Must be `true` | Hardcoded in compose.cloud.yml |
| `CLOUD_DATA_ROOT` | Host path for per-user data trees | `/devl/menagerie-data` |
| `DIRECTUS_JWT_SECRET` | Directus signing secret — validates session JWTs | `website/.env` (as `DIRECTUS_SECRET`) |
| `CF_SERVER_SECRET` | Server secret for SQLCipher key derivation | Generate: `openssl rand -base64 32 \| tr -d '/=+' \| cut -c1-32` |
| `PLATFORM_DB_URL` | Postgres connection string for platform DB | `postgresql://cf_platform:<pass>@host.docker.internal:5433/circuitforge_platform` |
!!! warning "SECRET ROTATION"
`CF_SERVER_SECRET` is used to derive all per-user SQLCipher keys via `HMAC(secret, user_id)`. Rotating this secret renders all existing user databases unreadable. Do not rotate it without a migration plan.
---
## Data Root
User data lives at `/devl/menagerie-data/` on the host, bind-mounted into the container:
```
/devl/menagerie-data/
  <directus-user-uuid>/
    peregrine/
      staging.db   ← SQLCipher-encrypted (AES-256)
      config/      ← llm.yaml, server.yaml, user.yaml, etc.
      data/        ← documents, exports, attachments
```
The directory is created automatically on first login. The SQLCipher key for each user is derived deterministically: `HMAC-SHA256(CF_SERVER_SECRET, user_id)`.
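The stated derivation is straightforward to reproduce (the hex output encoding is an assumption; the real `derive_db_key` may encode or truncate differently):

```python
# HMAC-SHA256(CF_SERVER_SECRET, user_id) as a sketch. Deterministic: the
# same secret and user id always yield the same SQLCipher key, which is
# why rotating the secret orphans every existing database.
import hashlib
import hmac


def derive_db_key(server_secret: str, user_id: str) -> str:
    return hmac.new(
        server_secret.encode(), user_id.encode(), hashlib.sha256
    ).hexdigest()
```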
### GDPR / Data deletion
To fully delete a user's data:
```bash
# Remove all content data
rm -rf /devl/menagerie-data/<user-id>/
# Remove platform DB rows (cascades)
docker exec cf-platform-db psql -U cf_platform -d circuitforge_platform \
-c "DELETE FROM subscriptions WHERE user_id = '<user-id>';"
```
---
## Platform Database
The Postgres platform DB runs as `cf-platform-db` in the website compose stack (port 5433 on host).
```bash
# Connect (the commands below run inside the psql shell)
docker exec -it cf-platform-db psql -U cf_platform -d circuitforge_platform
# Check tables
\dt
# View telemetry consent for a user
SELECT * FROM telemetry_consent WHERE user_id = '<uuid>';
# View recent usage events
SELECT user_id, event_type, occurred_at FROM usage_events
  ORDER BY occurred_at DESC LIMIT 20;
```
The schema is initialised on container start from `platform-db/init.sql` in the website repo.
---
## Telemetry
`app/telemetry.py` is the **only** entry point to `usage_events`. Never write to that table directly.
```python
from app.telemetry import log_usage_event
# Fires in cloud mode only; no-op locally
log_usage_event(user_id, "peregrine", "cover_letter_generated", {"words": 350})
```
Events are blocked if:
1. `telemetry_consent.all_disabled = true` (hard kill switch, overrides all)
2. `telemetry_consent.usage_events_enabled = false`
The user controls both from Settings → 🔒 Privacy.
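The two checks, in that order, reduce to a small predicate (column names are taken from the `telemetry_consent` table; the helper name is hypothetical):

```python
# Sketch of the consent gate in the order stated above: hard kill switch
# first, then the per-category flag. Helper name is hypothetical.
def telemetry_allowed(consent: dict) -> bool:
    if consent.get("all_disabled"):  # hard kill switch overrides everything
        return False
    return bool(consent.get("usage_events_enabled", False))
```

Note the default-deny on a missing row: a user with no consent record gets no events written.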
---
## Backup / Restore (Cloud Mode)
The Settings → 💾 Data tab handles backup/restore transparently. In cloud mode:
- **Export:** the SQLCipher-encrypted DB is decrypted before zipping — the downloaded `.zip` is a portable plain SQLite archive, compatible with any local Docker install.
- **Import:** a plain SQLite backup is re-encrypted with the user's key on restore.
The user's `base_dir` in cloud mode is `get_db_path().parent` (`/devl/menagerie-data/<user-id>/peregrine/`), not the app root.
---
## Routing (Caddy)
`menagerie.circuitforge.tech` in `/devl/caddy-proxy/Caddyfile`:
```caddy
menagerie.circuitforge.tech {
    encode gzip zstd

    handle /peregrine* {
        reverse_proxy http://host.docker.internal:8505 {
            header_up X-CF-Session {header.Cookie}
        }
    }

    handle {
        respond "This app is not yet available in the managed cloud — check back soon." 503
    }

    log {
        output file /data/logs/menagerie.circuitforge.tech.log
        format json
    }
}
```
`header_up X-CF-Session {header.Cookie}` passes the full cookie header so `cloud_session.py` can extract the Directus session token.
!!! note "Caddy inode gotcha"
After editing the Caddyfile, run `docker restart caddy-proxy` — not `caddy reload`. The Edit tool creates a new inode; Docker bind mounts pin to the original inode and `caddy reload` re-reads the stale one.
---
## Demo Instance
The public demo at `demo.circuitforge.tech/peregrine` runs separately:
```bash
# Start demo
docker compose -f compose.demo.yml --project-name peregrine-demo up -d
# Rebuild after code changes
docker compose -f compose.demo.yml --project-name peregrine-demo build app
docker compose -f compose.demo.yml --project-name peregrine-demo up -d
```
`DEMO_MODE=true` blocks all LLM inference calls at `llm_router.py`. Discovery, job enrichment, and the UI work normally. Demo data lives in `demo/config/` and `demo/data/` — isolated from personal data.
---
## Adding a New App to the Cloud
To onboard a new menagerie app (e.g. `falcon`) to the cloud:
1. Add `resolve_session("falcon")` at the top of each page (calls `cloud_session.py` with the app slug)
2. Replace `DEFAULT_DB` references with `get_db_path()`
3. Add `app/telemetry.py` import and `log_usage_event()` calls at key action points
4. Create `compose.cloud.yml` following the Peregrine pattern (port, `CLOUD_MODE=true`, data mount)
5. Add a Caddy `handle /falcon*` block in `menagerie.circuitforge.tech`, routing to the new port
6. `cloud_session.py` automatically creates `<data_root>/<user-id>/falcon/` on first login

# Contributing
Thank you for your interest in contributing to Peregrine. This guide covers the development environment, code standards, test requirements, and pull request process.
!!! note "License"
Peregrine uses a dual licence. The discovery pipeline (`scripts/discover.py`, `scripts/match.py`, `scripts/db.py`, `scripts/custom_boards/`) is MIT. All AI features, the UI, and everything else is BSL 1.1.
Do not add `Co-Authored-By:` trailers or AI-attribution notices to commits — this is a commercial repository.
---
## Fork and Clone
```bash
git clone https://git.circuitforge.io/circuitforge/peregrine
cd peregrine
```
Create a feature branch from `main`:
```bash
git checkout -b feat/my-feature
```
---
## Dev Environment Setup
Peregrine's Python dependencies are managed with conda. The same `job-seeker` environment is used for both the legacy personal app and Peregrine.
```bash
# Create the environment from the lockfile
conda env create -f environment.yml
# Activate
conda activate job-seeker
```
Alternatively, install from `requirements.txt` into an existing Python 3.12 environment:
```bash
pip install -r requirements.txt
```
!!! warning "Keep the env lightweight"
Do not add `torch`, `sentence-transformers`, `bitsandbytes`, `transformers`, or any other CUDA/GPU package to the main environment. These live in separate conda environments (`job-seeker-vision` for the vision service, `ogma` for fine-tuning). Adding them to the main env causes out-of-memory failures during test runs.
---
## Running Tests
```bash
conda run -n job-seeker python -m pytest tests/ -v
```
Or with the direct binary (avoids runaway process spawning):
```bash
/path/to/miniconda3/envs/job-seeker/bin/pytest tests/ -v
```
The `pytest.ini` file scopes collection to the `tests/` directory only — do not widen this.
All tests must pass before submitting a PR. See [Testing](testing.md) for patterns and conventions.
---
## Code Style
- **PEP 8** for all Python code — use `flake8` or `ruff` to check
- **Type hints preferred** on function signatures — not required but strongly encouraged
- **Docstrings** on all public functions and classes
- **No print statements** in library code (`scripts/`); use Python's `logging` module or report status via return values. `print` is acceptable in one-off scripts and `discover.py`-style entry points.
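For example, library code in `scripts/` logs progress instead of printing — a sketch (`enrich_job` is an illustrative name, not a real Peregrine function):

```python
import logging

logger = logging.getLogger(__name__)

def enrich_job(job: dict) -> dict:
    """Library-code style: log progress, signal outcomes via return values."""
    logger.info("enriching job %s at %s", job.get("title"), job.get("company"))
    if not job.get("url"):
        # Signal the problem through a log line and a return value, not print().
        logger.warning("job has no URL; skipping enrichment")
        return job
    # ... enrichment work would go here ...
    return job
```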
---
## Branch Naming
| Prefix | Use for |
|--------|---------|
| `feat/` | New features |
| `fix/` | Bug fixes |
| `docs/` | Documentation only |
| `refactor/` | Code reorganisation without behaviour change |
| `test/` | Test additions or corrections |
| `chore/` | Dependency updates, CI, tooling |
Example: `feat/add-greenhouse-scraper`, `fix/email-imap-timeout`, `docs/add-integration-guide`
---
## PR Checklist
Before opening a pull request:
- [ ] All tests pass: `conda run -n job-seeker python -m pytest tests/ -v`
- [ ] New behaviour is covered by at least one test
- [ ] No new dependencies added to `environment.yml` or `requirements.txt` without a clear justification in the PR description
- [ ] Documentation updated if the PR changes user-visible behaviour (update the relevant page in `docs/`)
- [ ] Config file changes are reflected in the `.example` file
- [ ] No secrets, tokens, or personal data in any committed file
- [ ] Gitignored files (`config/*.yaml`, `staging.db`, `aihawk/`, `.env`) are not committed
---
## What NOT to Do
- Do not commit `config/user.yaml`, `config/notion.yaml`, `config/email.yaml`, `config/adzuna.yaml`, or any `config/integrations/*.yaml` — all are gitignored
- Do not commit `staging.db`
- Do not add `torch`, `bitsandbytes`, `transformers`, or `sentence-transformers` to the main environment
- Do not add `Co-Authored-By:` or AI-attribution lines to commit messages
- Do not force-push to `main`
---
## Getting Help
Open an issue on the repository with the `question` label. Include:
- Your OS and Docker version
- The `inference_profile` from your `config/user.yaml`
- Relevant log output from `make logs`

# Testing
Peregrine has a test suite covering the core scripts layer, LLM router, integrations, wizard steps, and database helpers.
---
## Running the Test Suite
```bash
conda run -n job-seeker python -m pytest tests/ -v
```
Or using the direct binary (recommended to avoid runaway process spawning):
```bash
/path/to/miniconda3/envs/job-seeker/bin/pytest tests/ -v
```
`pytest.ini` scopes test collection to `tests/` only:
```ini
[pytest]
testpaths = tests
```
Do not widen this — the `aihawk/` subtree has its own test files that pull in GPU dependencies.
---
## What Is Covered
The suite currently has approximately 219 tests covering:
| Module | What is tested |
|--------|---------------|
| `scripts/db.py` | CRUD helpers, status transitions, dedup logic |
| `scripts/llm_router.py` | Fallback chain, backend selection, vision routing, error handling |
| `scripts/match.py` | Keyword scoring, gap calculation |
| `scripts/imap_sync.py` | Email parsing, classification label mapping |
| `scripts/company_research.py` | Prompt construction, output parsing |
| `scripts/generate_cover_letter.py` | Mission alignment detection, prompt injection |
| `scripts/task_runner.py` | Task submission, dedup, status transitions |
| `scripts/user_profile.py` | Accessor methods, defaults, YAML round-trip |
| `scripts/integrations/` | Base class contract, per-driver `fields()` and `connect()` |
| `app/wizard/tiers.py` | `can_use()`, `tier_label()`, edge cases |
| `scripts/custom_boards/` | Scraper return shape, HTTP error handling |
---
## Test Structure
Tests live in `tests/`. File naming mirrors the module being tested:
```
tests/
test_db.py
test_llm_router.py
test_match.py
test_imap_sync.py
test_company_research.py
test_cover_letter.py
test_task_runner.py
test_user_profile.py
test_integrations.py
test_tiers.py
test_adzuna.py
test_theladders.py
```
---
## Key Patterns
### tmp_path for YAML files
Use pytest's built-in `tmp_path` fixture for any test that reads or writes YAML config files:
```python
def test_user_profile_reads_name(tmp_path):
config = tmp_path / "user.yaml"
config.write_text("name: Alice\nemail: alice@example.com\n")
from scripts.user_profile import UserProfile
profile = UserProfile(config_path=config)
assert profile.name == "Alice"
```
### Mocking LLM calls
Never make real LLM calls in tests. Patch `LLMRouter.complete`:
```python
from unittest.mock import patch
def test_cover_letter_calls_llm(tmp_path):
with patch("scripts.generate_cover_letter.LLMRouter") as MockRouter:
MockRouter.return_value.complete.return_value = "Dear Hiring Manager,\n..."
from scripts.generate_cover_letter import generate
result = generate(job={...}, user_profile={...})
assert "Dear Hiring Manager" in result
MockRouter.return_value.complete.assert_called_once()
```
### Mocking HTTP in scraper tests
```python
from unittest.mock import patch
def test_adzuna_returns_jobs():
with patch("scripts.custom_boards.adzuna.requests.get") as mock_get:
mock_get.return_value.ok = True
mock_get.return_value.raise_for_status = lambda: None
mock_get.return_value.json.return_value = {"results": [...]}
from scripts.custom_boards.adzuna import scrape
jobs = scrape(profile={...}, db_path="nonexistent.db")
assert len(jobs) > 0
```
### Temporary SQLite files for DB tests
```python
import tempfile, os
def test_insert_job():
with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as f:
db_path = f.name
try:
from scripts.db import init_db, insert_job
init_db(db_path)
insert_job(db_path, title="CSM", company="Acme", url="https://example.com/1", ...)
# assert...
finally:
os.unlink(db_path)
```
---
## What NOT to Test
- **Streamlit widget rendering** — Streamlit has no headless test support. Do not try to test `st.button()` or `st.text_input()` calls. Test the underlying script functions instead.
- **Real network calls** — always mock HTTP and LLM clients
- **Real GPU inference** — mock the vision service and LLM router
---
## Adding Tests for New Code
### New scraper
Create `tests/test_myboard.py`. Required test cases:
1. Happy path: mock HTTP returns valid data → correct job dict shape
2. HTTP error: mock raises `Exception` → function returns `[]` (does not raise)
3. Empty results: API returns `{"results": []}` → function returns `[]`
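The three cases can be sketched against a toy scraper. Here the HTTP call is replaced by an injected `fetch` callable so the sketch is self-contained; a real test would patch `requests.get` as in the mocking pattern above:

```python
def scrape(profile, db_path, fetch):
    """Toy scraper honouring the contract: return a list of job dicts, [] on any failure."""
    try:
        payload = fetch()  # stands in for the HTTP request a real scraper makes
        return [
            {"title": r["title"], "company": r["company"], "url": r["url"]}
            for r in payload.get("results", [])
        ]
    except Exception:
        return []

def test_happy_path():
    jobs = scrape({}, "unused.db", lambda: {"results": [
        {"title": "CSM", "company": "Acme", "url": "https://example.com/1"}]})
    assert jobs and jobs[0]["title"] == "CSM"

def test_http_error_returns_empty():
    def boom():
        raise RuntimeError("HTTP 500")
    assert scrape({}, "unused.db", boom) == []

def test_empty_results():
    assert scrape({}, "unused.db", lambda: {"results": []}) == []
```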
### New integration
Add to `tests/test_integrations.py`. Required test cases:
1. `fields()` returns list of dicts with required keys
2. `connect()` returns `True` with valid config, `False` with missing required field
3. `test()` returns `True` with mocked successful HTTP, `False` with exception
4. `is_configured()` reflects file presence in `tmp_path`
### New wizard step
Add to `tests/test_wizard_steps.py`. Test the step's pure-logic functions (validation, data extraction). Do not test the Streamlit rendering.
### New tier feature gate
Add to `tests/test_tiers.py`:
```python
from app.wizard.tiers import can_use
def test_my_new_feature_requires_paid():
assert can_use("free", "my_new_feature") is False
assert can_use("paid", "my_new_feature") is True
assert can_use("premium", "my_new_feature") is True
```

# Docker Profiles
Peregrine uses Docker Compose profiles to start only the services your hardware can support. Choose a profile with `make start PROFILE=<name>`.
---
## Profile Reference
| Profile | Services started | Use case |
|---------|----------------|----------|
| `remote` | `app`, `searxng` | No GPU. LLM calls go to an external API (Anthropic, OpenAI-compatible). |
| `cpu` | `app`, `ollama`, `searxng` | No GPU. Runs local models on CPU — functional but slow. |
| `single-gpu` | `app`, `ollama`, `vision`, `searxng` | One NVIDIA GPU. Covers cover letters, research, and vision (survey screenshots). |
| `dual-gpu` | `app`, `ollama`, `vllm`, `vision`, `searxng` | Two NVIDIA GPUs. GPU 0 = Ollama (cover letters), GPU 1 = vLLM (research). |
---
## Service Descriptions
| Service | Image / Source | Port | Purpose |
|---------|---------------|------|---------|
| `app` | `Dockerfile` (Streamlit) | 8501 | The main Peregrine UI |
| `ollama` | `ollama/ollama` | 11434 | Local model inference — cover letters and general tasks |
| `vllm` | `vllm/vllm-openai` | 8000 | High-throughput local inference — research tasks |
| `vision` | `scripts/vision_service/` | 8002 | Moondream2 — survey screenshot analysis |
| `searxng` | `searxng/searxng` | 8888 | Private meta-search engine — company research web scraping |
---
## Choosing a Profile
### remote
Use `remote` if:
- You have no NVIDIA GPU
- You plan to use Anthropic Claude or another API-hosted model exclusively
- You want the fastest startup (only two containers)
You must configure at least one external LLM backend in **Settings → LLM Backends**.
### cpu
Use `cpu` if:
- You have no GPU but want to run models locally (e.g. for privacy)
- Light use is acceptable — cover letter generation may take several minutes per request
Pull a model after the container starts:
```bash
docker exec -it peregrine-ollama-1 ollama pull llama3.1:8b
```
### single-gpu
Use `single-gpu` if you have one NVIDIA GPU with at least 8 GB VRAM. This is the recommended profile for most single-user installs.
The vision service (Moondream2) starts on the same GPU using 4-bit quantisation (~1.5 GB VRAM).
### dual-gpu
Use `dual-gpu` if you have two or more NVIDIA GPUs:
- GPU 0 handles Ollama (cover letters, quick tasks)
- GPU 1 handles vLLM (research, long-context tasks)
- The vision service shares GPU 0 with Ollama
---
## GPU Memory Guidance
| GPU VRAM | Recommended profile | Notes |
|----------|-------------------|-------|
| < 4 GB | `cpu` | GPU too small for practical model loading |
| 4–8 GB | `single-gpu` | Run smaller models (3B–8B parameters) |
| 8–16 GB | `single-gpu` | Run 8B–13B models comfortably |
| 16–24 GB | `single-gpu` | Run 13B–34B models |
| 24 GB+ | `single-gpu` or `dual-gpu` | 70B models with quantisation |
---
## How preflight.py Works
`make start` calls `scripts/preflight.py` before launching Docker. Preflight does the following:
1. **Port conflict detection** — checks whether `STREAMLIT_PORT`, `OLLAMA_PORT`, `VLLM_PORT`, `SEARXNG_PORT`, and `VISION_PORT` are already in use. Reports any conflicts and suggests alternatives.
2. **GPU enumeration** — queries `nvidia-smi` for GPU count and VRAM per card.
3. **RAM check** — reads `/proc/meminfo` (Linux) or `vm_stat` (macOS) to determine available system RAM.
4. **KV cache offload** — if GPU VRAM is less than 10 GB, preflight calculates `CPU_OFFLOAD_GB` (the amount of KV cache to spill to system RAM) and writes it to `.env`. The vLLM container picks this up via `--cpu-offload-gb`.
5. **Profile recommendation** — writes `RECOMMENDED_PROFILE` to `.env`. This is informational; `make start` uses the `PROFILE` variable you specify (defaulting to `remote`).
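Steps 1 and 4 reduce to a few lines each — a sketch, where `port_in_use` and `cpu_offload_gb` are illustrative names and the rounding rule is an assumption (the 10 GB threshold comes from the description above):

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """True if something is already listening on the port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0

def cpu_offload_gb(vram_gb: float, needed_gb: float = 10.0) -> int:
    """KV cache to spill to system RAM when VRAM is below the threshold."""
    if vram_gb >= needed_gb:
        return 0
    return int(round(needed_gb - vram_gb))
```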
You can run preflight independently:
```bash
make preflight
# or
python scripts/preflight.py
```
---
## Customising Ports
Edit `.env` before running `make start`:
```bash
STREAMLIT_PORT=8501
OLLAMA_PORT=11434
VLLM_PORT=8000
SEARXNG_PORT=8888
VISION_PORT=8002
```
All containers read from `.env` via the `env_file` directive in `compose.yml`.

# First-Run Wizard
When you open Peregrine for the first time, the setup wizard launches automatically. It walks through seven steps and saves your progress after each one — if your browser closes or the server restarts, it resumes where you left off.
---
## Step 1 — Hardware
Peregrine detects NVIDIA GPUs using `nvidia-smi` and reports:
- Number of GPUs found
- VRAM per GPU
- Available system RAM
Based on this, it recommends a Docker Compose profile:
| Recommendation | Condition |
|---------------|-----------|
| `remote` | No GPU detected |
| `cpu` | GPU detected but VRAM < 4 GB |
| `single-gpu` | One GPU with VRAM >= 4 GB |
| `dual-gpu` | Two or more GPUs |
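The recommendation logic reduces to a small function — a sketch of the table above (function name illustrative):

```python
def recommend_profile(gpu_count: int, vram_gb: float) -> str:
    """Map detected hardware to a Docker Compose profile, per the table."""
    if gpu_count == 0:
        return "remote"
    if gpu_count >= 2:
        return "dual-gpu"
    # One GPU: usable only if it has at least 4 GB VRAM.
    return "single-gpu" if vram_gb >= 4 else "cpu"
```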
You can override the recommendation and select any profile manually. The selection is written to `config/user.yaml` as `inference_profile`.
---
## Step 2 — Tier
Select your Peregrine tier:
| Tier | Description |
|------|-------------|
| **Free** | Job discovery, matching, and basic pipeline — no LLM features |
| **Paid** | Adds cover letters, company research, email sync, integrations, and all AI features |
| **Premium** | Adds fine-tuning and multi-user support |
Your tier is written to `config/user.yaml` as `tier`.
**Dev tier override** — for local testing without a paid licence, set `dev_tier_override: premium` in `config/user.yaml`. This is for development use only and has no effect on production deployments.
See [Tier System](../reference/tier-system.md) for the full feature gate table.
---
## Step 3 — Identity
Enter your personal details. These are stored locally in `config/user.yaml` and used to personalise cover letters and research briefs.
| Field | Description |
|-------|-------------|
| Name | Your full name |
| Email | Primary contact email |
| Phone | Contact phone number |
| LinkedIn | LinkedIn profile URL |
| Career summary | 2–4 sentence professional summary — used in cover letters and interview prep |
**LLM-assisted writing (Paid):** If you have a paid tier, the wizard offers to generate your career summary from a few bullet points using your configured LLM backend.
---
## Step 4 — Resume
Two paths are available:
### Upload PDF or DOCX
Upload your existing resume. The LLM parses it and extracts:
- Work experience (employer, title, dates, bullets)
- Education
- Skills
- Certifications
The extracted data is stored in `config/user.yaml` and used when generating cover letters.
### Guided form builder
Fill in each section manually using structured form fields. Useful if you do not have a digital resume file ready, or if the parser misses something important.
Both paths produce the same data structure. You can mix them — upload first, then edit the result in the form.
---
## Step 5 — Inference
Configure which LLM backends Peregrine uses. Backends are tried in priority order; if the first fails, Peregrine falls back to the next.
Available backend types:
| Type | Examples | Notes |
|------|---------|-------|
| `openai_compat` | Ollama, vLLM, Claude Code wrapper, Copilot wrapper | Any OpenAI-compatible API |
| `anthropic` | Claude via Anthropic API | Requires `ANTHROPIC_API_KEY` env var |
| `vision_service` | Moondream2 local service | Used for survey screenshot analysis only |
For each backend you want to enable:
1. Enter the base URL (e.g. `http://localhost:11434/v1` for Ollama)
2. Enter an API key if required (Anthropic, OpenAI)
3. Click **Test** — Peregrine pings the `/health` endpoint and attempts a short completion
The full backend configuration is written to `config/llm.yaml`. You can edit it directly later via **Settings → LLM Backends**.
!!! tip "Recommended minimum"
Enable at least Ollama with a general-purpose model (e.g. `llama3.1:8b`) for research tasks, and either Ollama or Anthropic for cover letter generation. The wizard will not block you if no backend is configured, but most features will not work.
---
## Step 6 — Search
Define what jobs to look for. Search configuration is written to `config/search_profiles.yaml`.
| Field | Description |
|-------|-------------|
| Profile name | A label for this search profile (e.g. `cs_leadership`) |
| Job titles | List of titles to search for (e.g. `Customer Success Manager`, `TAM`) |
| Locations | City/region strings or `Remote` |
| Boards | Standard boards: `linkedin`, `indeed`, `glassdoor`, `zip_recruiter`, `google` |
| Custom boards | Additional scrapers: `adzuna`, `theladders`, `craigslist` |
| Exclude keywords | Jobs containing these words in the title are dropped |
| Results per board | Max jobs to fetch per board per run |
| Hours old | Only fetch jobs posted within this many hours |
You can create multiple profiles (e.g. one for remote roles, one for a target industry). Run them all from the Home page or run a specific one.
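A profile using these fields might look like the sketch below — the `custom_boards` and `exclude_keywords` key names are assumptions; treat the repository's `.example` config file as authoritative:

```yaml
profiles:
  - name: remote_cs
    titles:
      - "Customer Success Manager"
      - "Technical Account Manager"
    locations:
      - "Remote"
    boards: [linkedin, indeed]
    custom_boards: [adzuna]       # key name assumed
    exclude_keywords: [intern]    # key name assumed
    results_per_board: 25
    hours_old: 72
```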
---
## Step 7 — Integrations
Connect optional external services. All integrations are optional — skip this step if you want to use Peregrine without external accounts.
Available integrations:
**Job tracking (Paid):** Notion, Airtable, Google Sheets
**Document storage (Free):** Google Drive, Dropbox, OneDrive, MEGA, Nextcloud
**Calendar (Paid):** Google Calendar, Apple Calendar (CalDAV)
**Notifications (Paid for Slack; Free for Discord and Home Assistant):** Slack, Discord, Home Assistant
Each integration has a connection card with the required credentials. Click **Test** to verify the connection before saving. Credentials are written to `config/integrations/<name>.yaml` (gitignored).
See [Integrations](../user-guide/integrations.md) for per-service details.
---
## Crash Recovery
The wizard saves your progress to `config/user.yaml` after each step is completed (`wizard_step` field). If anything goes wrong:
- Restart Peregrine and navigate to http://localhost:8501
- The wizard resumes at the last completed step
---
## Re-entering the Wizard
To go through the wizard again (e.g. to change your search profile or swap LLM backends):
1. Open **Settings**
2. Go to the **Developer** tab
3. Click **Reset wizard**
This sets `wizard_complete: false` and `wizard_step: 0` in `config/user.yaml`. Your previously entered data is preserved as defaults.

# Installation
This page walks through a full Peregrine installation from scratch.
---
## Prerequisites
- **Git** — to clone the repository
- **Internet connection** — `setup.sh` downloads Docker and other dependencies
- **Operating system**: Ubuntu/Debian, Fedora/RHEL, Arch Linux, or macOS (with Docker Desktop)
!!! warning "Windows"
Windows is not supported. Use [WSL2 with Ubuntu](https://docs.microsoft.com/windows/wsl/install) instead.
---
## Step 1 — Clone the repository
```bash
git clone https://git.circuitforge.io/circuitforge/peregrine
cd peregrine
```
---
## Step 2 — Run setup.sh
```bash
bash setup.sh
```
`setup.sh` performs the following automatically:
1. **Detects your platform** (Ubuntu/Debian, Fedora/RHEL, Arch, macOS)
2. **Installs Git** if not already present
3. **Installs Docker Engine** and the Docker Compose v2 plugin via the official Docker repositories
4. **Adds your user to the `docker` group** so you do not need `sudo` for docker commands (Linux only — log out and back in after this)
5. **Detects NVIDIA GPUs** — if `nvidia-smi` is present and working, installs the NVIDIA Container Toolkit and configures Docker to use it
6. **Creates `.env` from `.env.example`** — edit `.env` to customise ports and model storage paths before starting
!!! note "macOS"
`setup.sh` installs Docker Desktop via Homebrew (`brew install --cask docker`) then exits. Open Docker Desktop, start it, then re-run the script.
!!! note "GPU requirement"
For GPU support, `nvidia-smi` must return output before you run `setup.sh`. Install your NVIDIA driver first. The Container Toolkit installation will fail silently if the driver is not present.
---
## Step 3 — (Optional) Edit .env
The `.env` file controls ports and volume mount paths. The defaults work for most single-user installs:
```bash
# Default ports
STREAMLIT_PORT=8501
OLLAMA_PORT=11434
VLLM_PORT=8000
SEARXNG_PORT=8888
VISION_PORT=8002
```
Change `STREAMLIT_PORT` if 8501 is taken on your machine.
---
## Step 4 — Start Peregrine
Choose a profile based on your hardware:
```bash
make start # remote — no GPU, use API-only LLMs
make start PROFILE=cpu # cpu — local models on CPU (slow)
make start PROFILE=single-gpu # single-gpu — one NVIDIA GPU
make start PROFILE=dual-gpu # dual-gpu — GPU 0 = Ollama, GPU 1 = vLLM
```
`make start` runs `preflight.py` first, which checks for port conflicts and writes GPU/RAM recommendations back to `.env`. Then it calls `docker compose --profile <PROFILE> up -d`.
---
## Step 5 — Open the UI
Navigate to **http://localhost:8501** (or whatever `STREAMLIT_PORT` you set).
The first-run wizard launches automatically. See [First-Run Wizard](first-run-wizard.md) for a step-by-step guide through all seven steps.
---
## Supported Platforms
| Platform | Tested | Notes |
|----------|--------|-------|
| Ubuntu 22.04 / 24.04 | Yes | Primary target |
| Debian 12 | Yes | |
| Fedora 39/40 | Yes | |
| RHEL / Rocky / AlmaLinux | Yes | |
| Arch Linux / Manjaro | Yes | |
| macOS (Apple Silicon) | Yes | Docker Desktop required; no GPU support |
| macOS (Intel) | Yes | Docker Desktop required; no GPU support |
| Windows | No | Use WSL2 with Ubuntu |
---
## GPU Support
Only NVIDIA GPUs are supported. AMD ROCm is not currently supported.
Requirements:
- NVIDIA driver installed and `nvidia-smi` working before running `setup.sh`
- CUDA 12.x recommended (CUDA 11.x may work but is untested)
- Minimum 8 GB VRAM for `single-gpu` profile with default models
- For `dual-gpu`: GPU 0 is assigned to Ollama, GPU 1 to vLLM
If your GPU has less than 10 GB VRAM, `preflight.py` will calculate a `CPU_OFFLOAD_GB` value and write it to `.env`. The vLLM container picks this up via `--cpu-offload-gb` to overflow KV cache to system RAM.
---
## Stopping Peregrine
```bash
make stop # stop all containers
make restart # stop then start again (runs preflight first)
```
---
## Reinstalling / Clean State
```bash
make clean # removes containers, images, and data volumes (destructive)
```
You will be prompted to type `yes` to confirm.

# Peregrine
**AI-powered job search pipeline — by [Circuit Forge LLC](https://circuitforge.io)**
Peregrine automates the full job search lifecycle: discovery, matching, cover letter generation, application tracking, and interview preparation. It is privacy-first and local-first — your data never leaves your machine unless you configure an external integration.
---
## Quick Start
```bash
# 1. Clone and install dependencies
git clone https://git.circuitforge.io/circuitforge/peregrine
cd peregrine
bash setup.sh
# 2. Start Peregrine
make start # no GPU, API-only
make start PROFILE=single-gpu # one NVIDIA GPU
make start PROFILE=dual-gpu # dual GPU (Ollama + vLLM)
# 3. Open the UI
# http://localhost:8501
```
The first-run wizard guides you through hardware detection, tier selection, identity, resume, LLM configuration, search profiles, and integrations. See [Installation](getting-started/installation.md) for the full walkthrough.
---
## Feature Overview
| Feature | Free | Paid | Premium |
|---------|------|------|---------|
| Job discovery (JobSpy + custom boards) | Yes | Yes | Yes |
| Resume keyword matching | Yes | Yes | Yes |
| Cover letter generation | - | Yes | Yes |
| Company research briefs | - | Yes | Yes |
| Interview prep & practice Q&A | - | Yes | Yes |
| Email sync & auto-classification | - | Yes | Yes |
| Survey assistant (culture-fit Q&A) | - | Yes | Yes |
| Integration connectors (Notion, Airtable, etc.) | Partial | Yes | Yes |
| Calendar sync (Google, Apple) | - | Yes | Yes |
| Cover letter model fine-tuning | - | - | Yes |
| Multi-user support | - | - | Yes |
See [Tier System](reference/tier-system.md) for the full feature gate table.
---
## Documentation Sections
- **[Getting Started](getting-started/installation.md)** — Install, configure, and launch Peregrine
- **[User Guide](user-guide/job-discovery.md)** — How to use every feature in the UI
- **[Developer Guide](developer-guide/contributing.md)** — Add scrapers, integrations, and contribute code
- **[Reference](reference/tier-system.md)** — Tier system, LLM router, and config file schemas
---
## License
Core discovery pipeline: [MIT](https://git.circuitforge.io/circuitforge/peregrine/src/branch/main/LICENSE-MIT)
AI features (cover letter generation, company research, interview prep, UI): [BSL 1.1](https://git.circuitforge.io/circuitforge/peregrine/src/branch/main/LICENSE-BSL)
© 2026 Circuit Forge LLC

# Job Seeker Platform — Design Document
**Date:** 2026-02-20
**Status:** Approved
**Candidate:** Alex Rivera
---
## Overview
A monorepo project at `/devl/job-seeker/` that integrates three FOSS tools into a
cohesive job search pipeline: automated discovery (JobSpy), resume-to-listing keyword
matching (Resume Matcher), and automated application submission (AIHawk). Job listings
and interactive documents are tracked in Notion; source documents live in
`/Library/Documents/JobSearch/`.
---
## Project Structure
```
/devl/job-seeker/
├── config/
│ ├── search_profiles.yaml # JobSpy queries (titles, locations, boards)
│ ├── llm.yaml # LLM router: backends + fallback order
│ └── notion.yaml # Notion DB IDs and field mappings
├── aihawk/ # git clone — Auto_Jobs_Applier_AIHawk
├── resume_matcher/ # git clone — Resume-Matcher
├── scripts/
│ ├── discover.py # JobSpy → deduplicate → push to Notion
│ ├── match.py # Notion job URL → Resume Matcher → write score back
│ └── llm_router.py # LLM abstraction layer with priority fallback chain
├── docs/plans/ # Design and implementation docs (no resume files)
├── environment.yml # conda env spec (env name: job-seeker)
└── .gitignore
```
**Document storage rule:** Resumes, cover letters, and any interactable documents live
in `/Library/Documents/JobSearch/` or Notion — never committed to this repo.
---
## Architecture
### Data Flow
```
JobSpy (LinkedIn / Indeed / Glassdoor / ZipRecruiter)
└─▶ discover.py
├─ deduplicate by URL against existing Notion records
└─▶ Notion DB (Status: "New")
Notion DB (daily review — decide what to pursue)
└─▶ match.py <notion-page-url>
├─ fetch job description from listing URL
├─ run Resume Matcher vs. /Library/Documents/JobSearch/Alex_Rivera_Resume_02-19-2025.pdf
└─▶ write Match Score + Keyword Gaps back to Notion page
AIHawk (when ready to apply)
├─ reads config pointing to same resume + personal_info.yaml
├─ llm_router.py → best available LLM backend
├─ submits LinkedIn Easy Apply
└─▶ Notion status → "Applied"
```
---
## Notion Database Schema
| Field | Type | Notes |
|---------------|----------|------------------------------------------------------------|
| Job Title | Title | Primary identifier |
| Company | Text | |
| Location | Text | |
| Remote | Checkbox | |
| URL | URL | Deduplication key |
| Source | Select | LinkedIn / Indeed / Glassdoor / ZipRecruiter |
| Status | Select | New → Reviewing → Applied → Interview → Offer → Rejected |
| Match Score | Number | 0–100, written by match.py |
| Keyword Gaps | Text | Comma-separated missing keywords from Resume Matcher |
| Salary | Text | If listed |
| Date Found | Date | Set at discovery time |
| Notes | Text | Manual field |
---
## LLM Router (`scripts/llm_router.py`)
Single `complete(prompt, system=None)` interface. On each call: health-check each
backend in configured order, use the first that responds. Falls back silently on
connection error, timeout, or 5xx. Logs which backend was used.
All backends except Anthropic use the `openai` Python package (OpenAI-compatible
endpoints). Anthropic uses the `anthropic` package.
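A minimal sketch of the fallback loop — the real router is a class and health-checks each backend first, but the ordering and silent-fallback behaviour look like this:

```python
import logging

logger = logging.getLogger("llm_router")

class AllBackendsFailed(RuntimeError):
    pass

def complete(prompt, system=None, backends=()):
    """Try each backend in configured order; first successful response wins.

    `backends` is a list of (name, call) pairs, where call(prompt, system)
    returns text or raises on connection error, timeout, or 5xx.
    """
    for name, call in backends:
        try:
            text = call(prompt, system)
        except Exception as exc:
            logger.info("backend %s failed (%s); falling back", name, exc)
            continue
        logger.info("backend %s answered", name)  # log which backend was used
        return text
    raise AllBackendsFailed("no configured backend responded")
```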
### `config/llm.yaml`
```yaml
fallback_order:
- claude_code # port 3009 — Claude via local pipeline (highest quality)
- ollama # port 11434 — local, always-on
- vllm # port 8000 — start when needed
- github_copilot # port 3010 — Copilot via gh token
- anthropic # cloud fallback, burns API credits
backends:
claude_code:
type: openai_compat
base_url: http://localhost:3009/v1
model: claude-code-terminal
api_key: "any"
ollama:
type: openai_compat
base_url: http://localhost:11434/v1
model: llama3.2
api_key: "ollama"
vllm:
type: openai_compat
base_url: http://localhost:8000/v1
model: __auto__
api_key: ""
github_copilot:
type: openai_compat
base_url: http://localhost:3010/v1
model: gpt-4o
api_key: "any"
anthropic:
type: anthropic
model: claude-sonnet-4-6
api_key_env: ANTHROPIC_API_KEY
```
---
## Job Search Profile
### `config/search_profiles.yaml` (initial)
```yaml
profiles:
- name: cs_leadership
titles:
- "Customer Success Manager"
- "Director of Customer Success"
- "VP Customer Success"
- "Head of Customer Success"
- "Technical Account Manager"
- "Revenue Operations Manager"
- "Customer Experience Lead"
locations:
- "Remote"
- "San Francisco Bay Area, CA"
boards:
- linkedin
- indeed
- glassdoor
- zip_recruiter
results_per_board: 25
remote_only: false # remote preferred but Bay Area in-person ok
hours_old: 72 # listings posted in last 3 days
```
---
## Conda Environment
New dedicated env `job-seeker` (not base). Core packages:
- `python-jobspy` — job scraping
- `notion-client` — Notion API
- `openai` — OpenAI-compatible calls (Ollama, vLLM, Copilot, Claude pipeline)
- `anthropic` — Anthropic API fallback
- `pyyaml` — config parsing
- `pandas` — CSV handling and dedup
- Resume Matcher dependencies (sentence-transformers, streamlit — installed from clone)
Resume Matcher Streamlit UI runs on port **8501** (confirmed clear).
---
## Port Map
| Port | Service | Status |
|-------|--------------------------------|----------------|
| 3009 | Claude Code OpenAI wrapper | Start via manage.sh in Post Fight Processing |
| 3010 | GitHub Copilot wrapper | Start via manage-copilot.sh |
| 11434 | Ollama | Running |
| 8000 | vLLM | Start when needed |
| 8501 | Resume Matcher (Streamlit) | Start when needed |
---
## Out of Scope (this phase)
- Scheduled/cron automation (run discover.py manually for now)
- Email/SMS alerts for new listings
- ATS resume rebuild (separate task)
- Applications to non-LinkedIn platforms via AIHawk

# Job Seeker Platform — Web UI Design
**Date:** 2026-02-20
**Status:** Approved
## Overview
A Streamlit multi-page web UI that gives Alex (and her partner) a friendly interface to review scraped job listings, curate them before they hit Notion, edit search/LLM/Notion settings, and fill out her AIHawk application profile. Designed to be usable by anyone — no technical knowledge required.
---
## Architecture & Data Flow
```
discover.py → SQLite staging.db (status: pending)
Streamlit UI
review / approve / reject
"Sync N approved jobs" button
Notion DB (status: synced)
```
`discover.py` is modified to write to SQLite instead of directly to Notion.
A new `sync.py` handles the approved → Notion push.
`db.py` provides shared SQLite helpers used by both scripts and UI pages.
### SQLite Schema (`staging.db`, gitignored)
```sql
CREATE TABLE jobs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT,
    company TEXT,
    url TEXT UNIQUE,
    source TEXT,
    location TEXT,
    is_remote INTEGER,
    salary TEXT,
    description TEXT,
    match_score REAL,
    keyword_gaps TEXT,
    date_found TEXT,
    status TEXT DEFAULT 'pending',   -- pending / approved / rejected / synced
    notion_page_id TEXT
);
```
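The status lifecycle (pending → approved → rejected/synced) drives the whole curation flow. A minimal sketch of the approve-then-collect steps against this schema (DB path and helper names are illustrative, not the real `scripts/db.py` API):

```python
# Sketch: approve a batch of pending jobs, then collect everything approved
# for the Notion sync step. Column/table names come from the schema above;
# the function names are hypothetical illustrations.
import sqlite3

def approve_jobs(db_path: str, job_ids: list[int]) -> None:
    """Move pending jobs to approved; already-processed rows are left alone."""
    conn = sqlite3.connect(db_path)
    conn.executemany(
        "UPDATE jobs SET status='approved' WHERE id=? AND status='pending'",
        [(jid,) for jid in job_ids],
    )
    conn.commit()
    conn.close()

def approved_jobs(db_path: str) -> list[tuple]:
    """Rows waiting for the approved → Notion push."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT id, title, url FROM jobs WHERE status='approved'"
    ).fetchall()
    conn.close()
    return rows
```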
---
## Pages
### Home (Dashboard)
- Stat cards: Pending / Approved / Rejected / Synced counts
- "Run Discovery" button — runs `discover.py` as subprocess, streams output
- "Sync N approved jobs → Notion" button — visible only when approved count > 0
- Recent activity list (last 10 jobs found)
### Job Review
- Filterable table/card view of pending jobs
- Filters: source (LinkedIn/Indeed/etc), remote only toggle, minimum match score slider
- Checkboxes for batch selection
- "Approve Selected" / "Reject Selected" buttons
- Rejected jobs hidden by default, togglable
- Match score shown as colored badge (green ≥70, amber 40-69, red <40)
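The badge thresholds above fit in one small helper. A sketch (the function name is illustrative; "orange" stands in for amber):

```python
# Sketch: map a match score to the badge color from the spec
# (green >= 70, amber 40-69, red < 40).
def badge_color(score: float) -> str:
    if score >= 70:
        return "green"
    if score >= 40:
        return "orange"  # amber
    return "red"
```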
### Settings
Three tabs:
**Search** — edit `config/search_profiles.yaml`:
- Job titles (add/remove tags)
- Locations (add/remove)
- Boards checkboxes
- Hours old slider
- Results per board slider
**LLM Backends** — edit `config/llm.yaml`:
- Fallback order (drag or up/down arrows)
- Per-backend: URL, model name, enabled toggle
- "Test connection" button per backend
**Notion** — edit `config/notion.yaml`:
- Token field (masked, show/hide toggle)
- Database ID
- "Test connection" button
### Resume Editor
Sectioned form over `aihawk/data_folder/plain_text_resume.yaml`:
- **Personal Info** — name, email, phone, LinkedIn, city, zip
- **Education** — list of entries, add/remove buttons
- **Experience** — list of entries, add/remove buttons
- **Skills & Interests** — tag-style inputs
- **Preferences** — salary range, notice period, remote/relocation toggles
- **Self-Identification** — gender, pronouns, veteran, disability, ethnicity (with "prefer not to say" options)
- **Legal** — work authorization checkboxes
`FILL_IN` fields highlighted in amber with "Needs your attention" note.
Save button writes back to YAML. No raw YAML shown by default.
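Locating the `FILL_IN` placeholders to highlight is a simple recursive walk over the parsed YAML. A sketch (helper name and the sample field paths are illustrative, not the real resume schema):

```python
# Sketch: collect dotted paths of every literal "FILL_IN" value in a parsed
# resume dict, so the UI can flag them with the amber "Needs your attention"
# note. Hypothetical helper; field names below are illustrative.
def find_fill_ins(node, path: str = "") -> list[str]:
    hits = []
    if isinstance(node, dict):
        for key, val in node.items():
            hits += find_fill_ins(val, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for i, val in enumerate(node):
            hits += find_fill_ins(val, f"{path}[{i}]")
    elif node == "FILL_IN":
        hits.append(path)
    return hits
```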
---
## Theme & Styling
Central theme at `app/.streamlit/config.toml`:
- Dark base, accent color teal/green (job search = growth)
- Consistent font (Inter or system sans-serif)
- Responsive column layouts — usable on tablet/mobile
- No jargon — "Run Discovery" not "Execute scrape", "Sync to Notion" not "Push records"
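A minimal sketch of what that theme file could contain, using Streamlit's standard `[theme]` keys (the exact color values are illustrative placeholders, not decided):

```toml
# app/.streamlit/config.toml — central theme (color values illustrative)
[theme]
base = "dark"
primaryColor = "#2dd4a7"   # teal/green accent
font = "sans serif"
```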
---
## File Layout
```
app/
├── .streamlit/
│   └── config.toml          # central theme
├── Home.py                  # dashboard
└── pages/
    ├── 1_Job_Review.py
    ├── 2_Settings.py
    └── 3_Resume_Editor.py
scripts/
├── db.py                    # new: SQLite helpers
├── sync.py                  # new: approved → Notion push
├── discover.py              # modified: write to SQLite not Notion
├── match.py                 # unchanged
└── llm_router.py            # unchanged
```
Run: `conda run -n job-seeker streamlit run app/Home.py`
---
## New Dependencies
None — `streamlit` already installed via resume_matcher deps.
`sqlite3` is Python stdlib.
---
## Out of Scope
- Real-time collaboration
- Mobile native app
- Cover letter editor (handled separately via LoRA fine-tune task)
- AIHawk trigger from UI (run manually for now)

# Background Task Processing — Design
**Date:** 2026-02-21
**Status:** Approved
## Problem
Cover letter generation (`4_Apply.py`) and company research (`6_Interview_Prep.py`) call LLM scripts synchronously inside `st.spinner()`. If the user navigates away during generation, Streamlit abandons the in-progress call and the result is lost. Both results are already persisted to SQLite on completion, so if the task kept running in the background the result would be available on return.
## Solution Overview
Python threading + SQLite task table. When a user clicks Generate, a daemon thread is spawned immediately and the task is recorded in a new `background_tasks` table. The thread writes results to the existing tables (`jobs.cover_letter`, `company_research`) and marks itself complete/failed. All pages share a sidebar indicator that auto-refreshes while tasks are active. Individual pages show task-level status inline.
## SQLite Schema
New table `background_tasks` added in `scripts/db.py`:
```sql
CREATE TABLE IF NOT EXISTS background_tasks (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    task_type TEXT NOT NULL,                -- "cover_letter" | "company_research"
    job_id INTEGER NOT NULL,
    status TEXT NOT NULL DEFAULT 'queued',  -- queued | running | completed | failed
    error TEXT,
    created_at DATETIME DEFAULT (datetime('now')),
    started_at DATETIME,
    finished_at DATETIME
)
```
## Deduplication Rule
Before inserting a new task, check for an existing `queued` or `running` row with the same `(task_type, job_id)`. If one exists, reject the submission (return the existing task's id). Different task types for the same job (e.g. cover letter + research) are allowed to run concurrently. Different jobs of the same type are allowed concurrently.
## Components
### `scripts/task_runner.py` (new)
- `submit_task(db, task_type, job_id) -> int` — dedup check, insert row, spawn daemon thread, return task id
- `_run_task(db, task_id, task_type, job_id)` — thread body: mark running, call generator, save result, mark completed/failed
- `get_active_tasks(db) -> list[dict]` — all queued/running rows with job title+company joined
- `get_task_for_job(db, task_type, job_id) -> dict | None` — latest task row for a specific job+type
### `scripts/db.py` (modified)
- Add `init_background_tasks(conn)` called inside `init_db()`
- Add `insert_task`, `update_task_status`, `get_active_tasks`, `get_task_for_job` helpers
### `app/app.py` (modified)
- After `st.navigation()`, call `get_active_tasks()` and render sidebar indicator
- Use `st.fragment` with `time.sleep(3)` + `st.rerun(scope="fragment")` to poll while tasks are active
- Sidebar shows: `⏳ N task(s) running` count + per-task line (type + company name)
- Fragment polling stops when active task count reaches zero
### `app/pages/4_Apply.py` (modified)
- Generate button calls `submit_task(db, "cover_letter", job_id)` instead of running inline
- If a task is `queued`/`running` for the selected job, disable button and show inline status fragment (polls every 3s)
- On `completed`, load cover letter from `jobs` row (already saved by thread)
- On `failed`, show error message and re-enable button
### `app/pages/6_Interview_Prep.py` (modified)
- Generate/Refresh buttons call `submit_task(db, "company_research", job_id)` instead of running inline
- Same inline status fragment pattern as Apply page
## Data Flow
```
User clicks Generate
  → submit_task(db, type, job_id)
      → dedup check (reject if already queued/running for same type+job)
      → INSERT background_tasks row (status=queued)
      → spawn daemon thread
      → return task_id
  → page shows inline "⏳ Queued…" fragment

Thread runs
  → UPDATE status=running, started_at=now
  → call generate_cover_letter.generate() OR research_company()
  → write result to jobs.cover_letter OR company_research table
  → UPDATE status=completed, finished_at=now
    (on exception: UPDATE status=failed, error=str(e))

Sidebar fragment (every 3s while active tasks > 0)
  → get_active_tasks() → render count + list
  → st.rerun(scope="fragment")

Page fragment (every 3s while task for this job is running)
  → get_task_for_job() → render status
  → on completed: st.rerun() (full rerun to reload cover letter / research)
```
## What Is Not Changed
- `generate_cover_letter.generate()` and `research_company()` are called unchanged from the thread
- `update_cover_letter()` and `save_research()` DB helpers are reused unchanged
- No new Python packages required
- No separate worker process — daemon threads die with the Streamlit server, but results already written to SQLite survive

# Background Task Processing Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
**Goal:** Replace synchronous LLM calls in Apply and Interview Prep pages with background threads so cover letter and research generation survive page navigation.
**Architecture:** A new `background_tasks` SQLite table tracks task state. `scripts/task_runner.py` spawns daemon threads that call existing generator functions and write results via existing DB helpers. The Streamlit sidebar polls active tasks every 3s via `@st.fragment(run_every=3)`; individual pages show per-job status with the same pattern.
**Tech Stack:** Python `threading` (stdlib), SQLite, Streamlit `st.fragment` (≥1.33 — already installed)
---
## Task 1: Add background_tasks table and DB helpers
**Files:**
- Modify: `scripts/db.py`
- Test: `tests/test_db.py`
### Step 1: Write the failing tests
Add to `tests/test_db.py`:
```python
# ── background_tasks tests ────────────────────────────────────────────────────
def test_init_db_creates_background_tasks_table(tmp_path):
    """init_db creates a background_tasks table."""
    from scripts.db import init_db
    db_path = tmp_path / "test.db"
    init_db(db_path)
    import sqlite3
    conn = sqlite3.connect(db_path)
    cur = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name='background_tasks'"
    )
    assert cur.fetchone() is not None
    conn.close()

def test_insert_task_returns_id_and_true(tmp_path):
    """insert_task returns (task_id, True) for a new task."""
    from scripts.db import init_db, insert_job, insert_task
    db_path = tmp_path / "test.db"
    init_db(db_path)
    job_id = insert_job(db_path, {
        "title": "CSM", "company": "Acme", "url": "https://ex.com/1",
        "source": "linkedin", "location": "Remote", "is_remote": True,
        "salary": "", "description": "", "date_found": "2026-02-20",
    })
    task_id, is_new = insert_task(db_path, "cover_letter", job_id)
    assert isinstance(task_id, int) and task_id > 0
    assert is_new is True

def test_insert_task_deduplicates_active_task(tmp_path):
    """insert_task returns (existing_id, False) if a queued/running task already exists."""
    from scripts.db import init_db, insert_job, insert_task
    db_path = tmp_path / "test.db"
    init_db(db_path)
    job_id = insert_job(db_path, {
        "title": "CSM", "company": "Acme", "url": "https://ex.com/1",
        "source": "linkedin", "location": "Remote", "is_remote": True,
        "salary": "", "description": "", "date_found": "2026-02-20",
    })
    first_id, _ = insert_task(db_path, "cover_letter", job_id)
    second_id, is_new = insert_task(db_path, "cover_letter", job_id)
    assert second_id == first_id
    assert is_new is False

def test_insert_task_allows_different_types_same_job(tmp_path):
    """insert_task allows cover_letter and company_research for the same job concurrently."""
    from scripts.db import init_db, insert_job, insert_task
    db_path = tmp_path / "test.db"
    init_db(db_path)
    job_id = insert_job(db_path, {
        "title": "CSM", "company": "Acme", "url": "https://ex.com/1",
        "source": "linkedin", "location": "Remote", "is_remote": True,
        "salary": "", "description": "", "date_found": "2026-02-20",
    })
    _, cl_new = insert_task(db_path, "cover_letter", job_id)
    _, res_new = insert_task(db_path, "company_research", job_id)
    assert cl_new is True
    assert res_new is True

def test_update_task_status_running(tmp_path):
    """update_task_status('running') sets started_at."""
    from scripts.db import init_db, insert_job, insert_task, update_task_status
    import sqlite3
    db_path = tmp_path / "test.db"
    init_db(db_path)
    job_id = insert_job(db_path, {
        "title": "CSM", "company": "Acme", "url": "https://ex.com/1",
        "source": "linkedin", "location": "Remote", "is_remote": True,
        "salary": "", "description": "", "date_found": "2026-02-20",
    })
    task_id, _ = insert_task(db_path, "cover_letter", job_id)
    update_task_status(db_path, task_id, "running")
    conn = sqlite3.connect(db_path)
    row = conn.execute("SELECT status, started_at FROM background_tasks WHERE id=?", (task_id,)).fetchone()
    conn.close()
    assert row[0] == "running"
    assert row[1] is not None

def test_update_task_status_completed(tmp_path):
    """update_task_status('completed') sets finished_at."""
    from scripts.db import init_db, insert_job, insert_task, update_task_status
    import sqlite3
    db_path = tmp_path / "test.db"
    init_db(db_path)
    job_id = insert_job(db_path, {
        "title": "CSM", "company": "Acme", "url": "https://ex.com/1",
        "source": "linkedin", "location": "Remote", "is_remote": True,
        "salary": "", "description": "", "date_found": "2026-02-20",
    })
    task_id, _ = insert_task(db_path, "cover_letter", job_id)
    update_task_status(db_path, task_id, "completed")
    conn = sqlite3.connect(db_path)
    row = conn.execute("SELECT status, finished_at FROM background_tasks WHERE id=?", (task_id,)).fetchone()
    conn.close()
    assert row[0] == "completed"
    assert row[1] is not None

def test_update_task_status_failed_stores_error(tmp_path):
    """update_task_status('failed') stores error message and sets finished_at."""
    from scripts.db import init_db, insert_job, insert_task, update_task_status
    import sqlite3
    db_path = tmp_path / "test.db"
    init_db(db_path)
    job_id = insert_job(db_path, {
        "title": "CSM", "company": "Acme", "url": "https://ex.com/1",
        "source": "linkedin", "location": "Remote", "is_remote": True,
        "salary": "", "description": "", "date_found": "2026-02-20",
    })
    task_id, _ = insert_task(db_path, "cover_letter", job_id)
    update_task_status(db_path, task_id, "failed", error="LLM timeout")
    conn = sqlite3.connect(db_path)
    row = conn.execute("SELECT status, error, finished_at FROM background_tasks WHERE id=?", (task_id,)).fetchone()
    conn.close()
    assert row[0] == "failed"
    assert row[1] == "LLM timeout"
    assert row[2] is not None

def test_get_active_tasks_returns_only_active(tmp_path):
    """get_active_tasks returns only queued/running tasks with job info joined."""
    from scripts.db import init_db, insert_job, insert_task, update_task_status, get_active_tasks
    db_path = tmp_path / "test.db"
    init_db(db_path)
    job_id = insert_job(db_path, {
        "title": "CSM", "company": "Acme", "url": "https://ex.com/1",
        "source": "linkedin", "location": "Remote", "is_remote": True,
        "salary": "", "description": "", "date_found": "2026-02-20",
    })
    active_id, _ = insert_task(db_path, "cover_letter", job_id)
    done_id, _ = insert_task(db_path, "company_research", job_id)
    update_task_status(db_path, done_id, "completed")
    tasks = get_active_tasks(db_path)
    assert len(tasks) == 1
    assert tasks[0]["id"] == active_id
    assert tasks[0]["company"] == "Acme"
    assert tasks[0]["title"] == "CSM"

def test_get_task_for_job_returns_latest(tmp_path):
    """get_task_for_job returns the most recent task for the given type+job."""
    from scripts.db import init_db, insert_job, insert_task, update_task_status, get_task_for_job
    db_path = tmp_path / "test.db"
    init_db(db_path)
    job_id = insert_job(db_path, {
        "title": "CSM", "company": "Acme", "url": "https://ex.com/1",
        "source": "linkedin", "location": "Remote", "is_remote": True,
        "salary": "", "description": "", "date_found": "2026-02-20",
    })
    first_id, _ = insert_task(db_path, "cover_letter", job_id)
    update_task_status(db_path, first_id, "completed")
    second_id, _ = insert_task(db_path, "cover_letter", job_id)  # allowed since first is done
    task = get_task_for_job(db_path, "cover_letter", job_id)
    assert task is not None
    assert task["id"] == second_id

def test_get_task_for_job_returns_none_when_absent(tmp_path):
    """get_task_for_job returns None when no task exists for that job+type."""
    from scripts.db import init_db, insert_job, get_task_for_job
    db_path = tmp_path / "test.db"
    init_db(db_path)
    job_id = insert_job(db_path, {
        "title": "CSM", "company": "Acme", "url": "https://ex.com/1",
        "source": "linkedin", "location": "Remote", "is_remote": True,
        "salary": "", "description": "", "date_found": "2026-02-20",
    })
    assert get_task_for_job(db_path, "cover_letter", job_id) is None
```
### Step 2: Run tests to verify they fail
```bash
/devl/miniconda3/envs/job-seeker/bin/pytest tests/test_db.py -v -k "background_tasks or insert_task or update_task_status or get_active_tasks or get_task_for_job"
```
Expected: FAIL with `ImportError: cannot import name 'insert_task'`
### Step 3: Implement in scripts/db.py
Add the DDL constant after `CREATE_COMPANY_RESEARCH`:
```python
CREATE_BACKGROUND_TASKS = """
CREATE TABLE IF NOT EXISTS background_tasks (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    task_type TEXT NOT NULL,
    job_id INTEGER NOT NULL,
    status TEXT NOT NULL DEFAULT 'queued',
    error TEXT,
    created_at DATETIME DEFAULT (datetime('now')),
    started_at DATETIME,
    finished_at DATETIME
)
"""
```
Add `conn.execute(CREATE_BACKGROUND_TASKS)` inside `init_db()`, after the existing three `conn.execute()` calls:
```python
def init_db(db_path: Path = DEFAULT_DB) -> None:
    """Create tables if they don't exist, then run migrations."""
    conn = sqlite3.connect(db_path)
    conn.execute(CREATE_JOBS)
    conn.execute(CREATE_JOB_CONTACTS)
    conn.execute(CREATE_COMPANY_RESEARCH)
    conn.execute(CREATE_BACKGROUND_TASKS)  # ← add this line
    conn.commit()
    conn.close()
    _migrate_db(db_path)
```
Add the four helper functions at the end of `scripts/db.py`:
```python
# ── Background task helpers ───────────────────────────────────────────────────
def insert_task(db_path: Path = DEFAULT_DB, task_type: str = "",
                job_id: Optional[int] = None) -> tuple[int, bool]:
    """Insert a new background task.

    Returns (task_id, True) if inserted, or (existing_id, False) if a
    queued/running task for the same (task_type, job_id) already exists.
    """
    conn = sqlite3.connect(db_path)
    existing = conn.execute(
        "SELECT id FROM background_tasks WHERE task_type=? AND job_id=? AND status IN ('queued','running')",
        (task_type, job_id),
    ).fetchone()
    if existing:
        conn.close()
        return existing[0], False
    cur = conn.execute(
        "INSERT INTO background_tasks (task_type, job_id, status) VALUES (?, ?, 'queued')",
        (task_type, job_id),
    )
    task_id = cur.lastrowid
    conn.commit()
    conn.close()
    return task_id, True

def update_task_status(db_path: Path = DEFAULT_DB, task_id: Optional[int] = None,
                       status: str = "", error: Optional[str] = None) -> None:
    """Update a task's status and set the appropriate timestamp."""
    now = datetime.now().isoformat()[:16]
    conn = sqlite3.connect(db_path)
    if status == "running":
        conn.execute(
            "UPDATE background_tasks SET status=?, started_at=? WHERE id=?",
            (status, now, task_id),
        )
    elif status in ("completed", "failed"):
        conn.execute(
            "UPDATE background_tasks SET status=?, finished_at=?, error=? WHERE id=?",
            (status, now, error, task_id),
        )
    else:
        conn.execute("UPDATE background_tasks SET status=? WHERE id=?", (status, task_id))
    conn.commit()
    conn.close()

def get_active_tasks(db_path: Path = DEFAULT_DB) -> list[dict]:
    """Return all queued/running tasks with job title and company joined in."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    rows = conn.execute("""
        SELECT bt.*, j.title, j.company
        FROM background_tasks bt
        LEFT JOIN jobs j ON j.id = bt.job_id
        WHERE bt.status IN ('queued', 'running')
        ORDER BY bt.created_at ASC
    """).fetchall()
    conn.close()
    return [dict(r) for r in rows]

def get_task_for_job(db_path: Path = DEFAULT_DB, task_type: str = "",
                     job_id: Optional[int] = None) -> Optional[dict]:
    """Return the most recent task row for a (task_type, job_id) pair, or None."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    row = conn.execute(
        """SELECT * FROM background_tasks
           WHERE task_type=? AND job_id=?
           ORDER BY id DESC LIMIT 1""",
        (task_type, job_id),
    ).fetchone()
    conn.close()
    return dict(row) if row else None
```
### Step 4: Run tests to verify they pass
```bash
/devl/miniconda3/envs/job-seeker/bin/pytest tests/test_db.py -v -k "background_tasks or insert_task or update_task_status or get_active_tasks or get_task_for_job"
```
Expected: all new tests PASS, no regressions
### Step 5: Run full test suite
```bash
/devl/miniconda3/envs/job-seeker/bin/pytest tests/ -v
```
Expected: all tests PASS
### Step 6: Commit
```bash
git add scripts/db.py tests/test_db.py
git commit -m "feat: add background_tasks table and DB helpers"
```
---
## Task 2: Create scripts/task_runner.py
**Files:**
- Create: `scripts/task_runner.py`
- Test: `tests/test_task_runner.py`
### Step 1: Write the failing tests
Create `tests/test_task_runner.py`:
```python
import sqlite3
import time
from unittest.mock import patch

def _make_db(tmp_path):
    from scripts.db import init_db, insert_job
    db = tmp_path / "test.db"
    init_db(db)
    job_id = insert_job(db, {
        "title": "CSM", "company": "Acme", "url": "https://ex.com/1",
        "source": "linkedin", "location": "Remote", "is_remote": True,
        "salary": "", "description": "Great role.", "date_found": "2026-02-20",
    })
    return db, job_id

def test_submit_task_returns_id_and_true(tmp_path):
    """submit_task returns (task_id, True) and spawns a thread."""
    db, job_id = _make_db(tmp_path)
    with patch("scripts.task_runner._run_task"):  # don't actually call LLM
        from scripts.task_runner import submit_task
        task_id, is_new = submit_task(db, "cover_letter", job_id)
    assert isinstance(task_id, int) and task_id > 0
    assert is_new is True

def test_submit_task_deduplicates(tmp_path):
    """submit_task returns (existing_id, False) for a duplicate in-flight task."""
    db, job_id = _make_db(tmp_path)
    with patch("scripts.task_runner._run_task"):
        from scripts.task_runner import submit_task
        first_id, _ = submit_task(db, "cover_letter", job_id)
        second_id, is_new = submit_task(db, "cover_letter", job_id)
    assert second_id == first_id
    assert is_new is False

def test_run_task_cover_letter_success(tmp_path):
    """_run_task marks running→completed and saves cover letter to DB."""
    db, job_id = _make_db(tmp_path)
    from scripts.db import insert_task, get_task_for_job
    task_id, _ = insert_task(db, "cover_letter", job_id)
    with patch("scripts.generate_cover_letter.generate", return_value="Dear Hiring Manager,\nGreat fit!"):
        from scripts.task_runner import _run_task
        _run_task(db, task_id, "cover_letter", job_id)
    task = get_task_for_job(db, "cover_letter", job_id)
    assert task["status"] == "completed"
    assert task["error"] is None
    conn = sqlite3.connect(db)
    row = conn.execute("SELECT cover_letter FROM jobs WHERE id=?", (job_id,)).fetchone()
    conn.close()
    assert row[0] == "Dear Hiring Manager,\nGreat fit!"

def test_run_task_company_research_success(tmp_path):
    """_run_task marks running→completed and saves research to DB."""
    db, job_id = _make_db(tmp_path)
    from scripts.db import insert_task, get_task_for_job, get_research
    task_id, _ = insert_task(db, "company_research", job_id)
    fake_result = {
        "raw_output": "raw", "company_brief": "brief",
        "ceo_brief": "ceo", "talking_points": "points",
    }
    with patch("scripts.company_research.research_company", return_value=fake_result):
        from scripts.task_runner import _run_task
        _run_task(db, task_id, "company_research", job_id)
    task = get_task_for_job(db, "company_research", job_id)
    assert task["status"] == "completed"
    research = get_research(db, job_id=job_id)
    assert research["company_brief"] == "brief"

def test_run_task_marks_failed_on_exception(tmp_path):
    """_run_task marks status=failed and stores error when generator raises."""
    db, job_id = _make_db(tmp_path)
    from scripts.db import insert_task, get_task_for_job
    task_id, _ = insert_task(db, "cover_letter", job_id)
    with patch("scripts.generate_cover_letter.generate", side_effect=RuntimeError("LLM timeout")):
        from scripts.task_runner import _run_task
        _run_task(db, task_id, "cover_letter", job_id)
    task = get_task_for_job(db, "cover_letter", job_id)
    assert task["status"] == "failed"
    assert "LLM timeout" in task["error"]

def test_submit_task_actually_completes(tmp_path):
    """Integration: submit_task spawns a thread that completes asynchronously."""
    db, job_id = _make_db(tmp_path)
    from scripts.db import get_task_for_job
    with patch("scripts.generate_cover_letter.generate", return_value="Cover letter text"):
        from scripts.task_runner import submit_task
        task_id, _ = submit_task(db, "cover_letter", job_id)
        # Wait for thread to complete (max 5s)
        for _ in range(50):
            task = get_task_for_job(db, "cover_letter", job_id)
            if task and task["status"] in ("completed", "failed"):
                break
            time.sleep(0.1)
    task = get_task_for_job(db, "cover_letter", job_id)
    assert task["status"] == "completed"
```
### Step 2: Run tests to verify they fail
```bash
/devl/miniconda3/envs/job-seeker/bin/pytest tests/test_task_runner.py -v
```
Expected: FAIL with `ModuleNotFoundError: No module named 'scripts.task_runner'`
### Step 3: Implement scripts/task_runner.py
Create `scripts/task_runner.py`:
```python
# scripts/task_runner.py
"""
Background task runner for LLM generation tasks.

Submitting a task inserts a row in background_tasks and spawns a daemon thread.
The thread calls the appropriate generator, writes results to existing tables,
and marks the task completed or failed.

Deduplication: only one queued/running task per (task_type, job_id) is allowed.
Different task types for the same job run concurrently (e.g. cover letter + research).
"""
import sqlite3
import threading
from pathlib import Path
from typing import Optional

from scripts.db import (
    DEFAULT_DB,
    insert_task,
    update_task_status,
    update_cover_letter,
    save_research,
)

def submit_task(db_path: Path = DEFAULT_DB, task_type: str = "",
                job_id: Optional[int] = None) -> tuple[int, bool]:
    """Submit a background LLM task.

    Returns (task_id, True) if a new task was queued and a thread spawned.
    Returns (existing_id, False) if an identical task is already in-flight.
    """
    task_id, is_new = insert_task(db_path, task_type, job_id)
    if is_new:
        t = threading.Thread(
            target=_run_task,
            args=(db_path, task_id, task_type, job_id),
            daemon=True,
        )
        t.start()
    return task_id, is_new

def _run_task(db_path: Path, task_id: int, task_type: str, job_id: int) -> None:
    """Thread body: run the generator and persist the result."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    row = conn.execute("SELECT * FROM jobs WHERE id=?", (job_id,)).fetchone()
    conn.close()
    if row is None:
        update_task_status(db_path, task_id, "failed", error=f"Job {job_id} not found")
        return
    job = dict(row)
    update_task_status(db_path, task_id, "running")
    try:
        if task_type == "cover_letter":
            from scripts.generate_cover_letter import generate
            result = generate(
                job.get("title", ""),
                job.get("company", ""),
                job.get("description", ""),
            )
            update_cover_letter(db_path, job_id, result)
        elif task_type == "company_research":
            from scripts.company_research import research_company
            result = research_company(job)
            save_research(db_path, job_id=job_id, **result)
        else:
            raise ValueError(f"Unknown task_type: {task_type!r}")
        update_task_status(db_path, task_id, "completed")
    except Exception as exc:
        update_task_status(db_path, task_id, "failed", error=str(exc))
```
### Step 4: Run tests to verify they pass
```bash
/devl/miniconda3/envs/job-seeker/bin/pytest tests/test_task_runner.py -v
```
Expected: all tests PASS
### Step 5: Run full test suite
```bash
/devl/miniconda3/envs/job-seeker/bin/pytest tests/ -v
```
Expected: all tests PASS
### Step 6: Commit
```bash
git add scripts/task_runner.py tests/test_task_runner.py
git commit -m "feat: add task_runner — background thread executor for LLM tasks"
```
---
## Task 3: Add sidebar task indicator to app/app.py
**Files:**
- Modify: `app/app.py`
No new tests needed — this is pure UI wiring.
### Step 1: Replace the contents of app/app.py
Current file is 33 lines. Replace entirely with:
```python
# app/app.py
"""
Streamlit entry point — uses st.navigation() to control the sidebar.

Main workflow pages are listed at the top; Settings is separated into
a "System" section so it doesn't crowd the navigation.

Run: streamlit run app/app.py
     bash scripts/manage-ui.sh start
"""
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent))

import streamlit as st

from scripts.db import DEFAULT_DB, init_db, get_active_tasks

st.set_page_config(
    page_title="Job Seeker",
    page_icon="💼",
    layout="wide",
)

init_db(DEFAULT_DB)

# ── Background task sidebar indicator ─────────────────────────────────────────
@st.fragment(run_every=3)
def _task_sidebar() -> None:
    tasks = get_active_tasks(DEFAULT_DB)
    if not tasks:
        return
    with st.sidebar:
        st.divider()
        st.markdown(f"**⏳ {len(tasks)} task(s) running**")
        for t in tasks:
            icon = "⏳" if t["status"] == "running" else "🕐"
            label = "Cover letter" if t["task_type"] == "cover_letter" else "Research"
            st.caption(f"{icon} {label} — {t.get('company') or 'unknown'}")

_task_sidebar()

# ── Navigation ─────────────────────────────────────────────────────────────────
pages = {
    "": [
        st.Page("Home.py", title="Home", icon="🏠"),
        st.Page("pages/1_Job_Review.py", title="Job Review", icon="📋"),
        st.Page("pages/4_Apply.py", title="Apply Workspace", icon="🚀"),
        st.Page("pages/5_Interviews.py", title="Interviews", icon="🎯"),
        st.Page("pages/6_Interview_Prep.py", title="Interview Prep", icon="📞"),
    ],
    "System": [
        st.Page("pages/2_Settings.py", title="Settings", icon="⚙️"),
    ],
}

pg = st.navigation(pages)
pg.run()
```
### Step 2: Smoke-test by running the UI
```bash
bash /devl/job-seeker/scripts/manage-ui.sh restart
```
Navigate to http://localhost:8501 and confirm the app loads without error. The sidebar task indicator does not appear when no tasks are running (correct).
### Step 3: Commit
```bash
git add app/app.py
git commit -m "feat: sidebar background task indicator with 3s auto-refresh"
```
---
## Task 4: Update 4_Apply.py to use background generation
**Files:**
- Modify: `app/pages/4_Apply.py`
No new unit tests — covered by existing test suite for DB layer. Smoke-test in browser.
### Step 1: Add imports at the top of 4_Apply.py
After the existing imports block (after `from scripts.db import ...`), add:
```python
from scripts.db import get_task_for_job
from scripts.task_runner import submit_task
```
So the full import block becomes:
```python
from scripts.db import (
    DEFAULT_DB, init_db, get_jobs_by_status,
    update_cover_letter, mark_applied,
    get_task_for_job,
)
from scripts.task_runner import submit_task
```
### Step 2: Replace the Generate button section
Find this block (around lines 174-185):
```python
if st.button("✨ Generate / Regenerate", use_container_width=True):
    with st.spinner("Generating via LLM…"):
        try:
            from scripts.generate_cover_letter import generate as _gen
            st.session_state[_cl_key] = _gen(
                job.get("title", ""),
                job.get("company", ""),
                job.get("description", ""),
            )
            st.rerun()
        except Exception as e:
            st.error(f"Generation failed: {e}")
```
Replace with:
```python
_cl_task = get_task_for_job(DEFAULT_DB, "cover_letter", selected_id)
_cl_running = _cl_task and _cl_task["status"] in ("queued", "running")

if st.button("✨ Generate / Regenerate", use_container_width=True, disabled=bool(_cl_running)):
    submit_task(DEFAULT_DB, "cover_letter", selected_id)
    st.rerun()

if _cl_running:
    @st.fragment(run_every=3)
    def _cl_status_fragment():
        t = get_task_for_job(DEFAULT_DB, "cover_letter", selected_id)
        if t and t["status"] in ("queued", "running"):
            lbl = "Queued…" if t["status"] == "queued" else "Generating via LLM…"
            st.info(f"⏳ {lbl}")
        else:
            st.rerun()  # full page rerun — reloads cover letter from DB
    _cl_status_fragment()
elif _cl_task and _cl_task["status"] == "failed":
    st.error(f"Generation failed: {_cl_task.get('error', 'unknown error')}")
```
Also update the session-state initialiser just below (lines 171-172) so it loads from DB after background completion. The existing code already does this correctly:
```python
if _cl_key not in st.session_state:
    st.session_state[_cl_key] = job.get("cover_letter") or ""
```
This is fine — `job` is fetched fresh on each full-page rerun, so when the background thread writes to `jobs.cover_letter`, the next full rerun picks it up.
### Step 3: Smoke-test in browser
1. Navigate to Apply Workspace
2. Select an approved job
3. Click "Generate / Regenerate"
4. Navigate away to Home
5. Navigate back to Apply Workspace for the same job
6. Observe: button is disabled and "⏳ Generating via LLM…" shows while running; cover letter appears when done
### Step 4: Commit
```bash
git add app/pages/4_Apply.py
git commit -m "feat: cover letter generation runs in background, survives navigation"
```
---
## Task 5: Update 6_Interview_Prep.py to use background research
**Files:**
- Modify: `app/pages/6_Interview_Prep.py`
### Step 1: Add imports at the top of 6_Interview_Prep.py
After the existing `from scripts.db import (...)` block, add:
```python
from scripts.db import get_task_for_job
from scripts.task_runner import submit_task
```
So the full import block becomes:
```python
from scripts.db import (
    DEFAULT_DB, init_db,
    get_interview_jobs, get_contacts, get_research,
    save_research, get_task_for_job,
)
from scripts.task_runner import submit_task
```
### Step 2: Replace the "no research yet" generate button block
Find this block (around lines 99–111):
```python
if not research:
    st.warning("No research brief yet for this job.")
    if st.button("🔬 Generate research brief", type="primary", use_container_width=True):
        with st.spinner("Generating… this may take 30–60 seconds"):
            try:
                from scripts.company_research import research_company
                result = research_company(job)
                save_research(DEFAULT_DB, job_id=selected_id, **result)
                st.success("Done!")
                st.rerun()
            except Exception as e:
                st.error(f"Error: {e}")
    st.stop()
else:
```
Replace with:
```python
_res_task = get_task_for_job(DEFAULT_DB, "company_research", selected_id)
_res_running = _res_task and _res_task["status"] in ("queued", "running")
if not research:
    if not _res_running:
        st.warning("No research brief yet for this job.")
        if _res_task and _res_task["status"] == "failed":
            st.error(f"Last attempt failed: {_res_task.get('error', '')}")
        if st.button("🔬 Generate research brief", type="primary", use_container_width=True):
            submit_task(DEFAULT_DB, "company_research", selected_id)
            st.rerun()
    if _res_running:
        @st.fragment(run_every=3)
        def _res_status_initial():
            t = get_task_for_job(DEFAULT_DB, "company_research", selected_id)
            if t and t["status"] in ("queued", "running"):
                lbl = "Queued…" if t["status"] == "queued" else "Generating… this may take 30–60 seconds"
                st.info(f"⏳ {lbl}")
            else:
                st.rerun()
        _res_status_initial()
    st.stop()
else:
```
### Step 3: Replace the "refresh" button block
Find this block (around lines 113–124):
```python
generated_at = research.get("generated_at", "")
col_ts, col_btn = st.columns([3, 1])
col_ts.caption(f"Research generated: {generated_at}")
if col_btn.button("🔄 Refresh", use_container_width=True):
    with st.spinner("Refreshing…"):
        try:
            from scripts.company_research import research_company
            result = research_company(job)
            save_research(DEFAULT_DB, job_id=selected_id, **result)
            st.rerun()
        except Exception as e:
            st.error(f"Error: {e}")
```
Replace with:
```python
generated_at = research.get("generated_at", "")
col_ts, col_btn = st.columns([3, 1])
col_ts.caption(f"Research generated: {generated_at}")
if col_btn.button("🔄 Refresh", use_container_width=True, disabled=bool(_res_running)):
    submit_task(DEFAULT_DB, "company_research", selected_id)
    st.rerun()
if _res_running:
    @st.fragment(run_every=3)
    def _res_status_refresh():
        t = get_task_for_job(DEFAULT_DB, "company_research", selected_id)
        if t and t["status"] in ("queued", "running"):
            lbl = "Queued…" if t["status"] == "queued" else "Refreshing research…"
            st.info(f"⏳ {lbl}")
        else:
            st.rerun()
    _res_status_refresh()
elif _res_task and _res_task["status"] == "failed":
    st.error(f"Refresh failed: {_res_task.get('error', '')}")
```
### Step 4: Smoke-test in browser
1. Move a job to Phone Screen on the Interviews page
2. Navigate to Interview Prep, select that job
3. Click "Generate research brief"
4. Navigate away to Home
5. Navigate back — observe "⏳ Generating…" inline indicator
6. Wait for completion — research sections populate automatically
### Step 5: Run full test suite one final time
```bash
/devl/miniconda3/envs/job-seeker/bin/pytest tests/ -v
```
Expected: all tests PASS
### Step 6: Commit
```bash
git add app/pages/6_Interview_Prep.py
git commit -m "feat: company research generation runs in background, survives navigation"
```
---
## Summary of Changes
| File | Change |
|------|--------|
| `scripts/db.py` | Add `CREATE_BACKGROUND_TASKS`, `init_db` call, 4 new helpers |
| `scripts/task_runner.py` | New file — `submit_task` + `_run_task` thread body |
| `app/app.py` | Add `_task_sidebar` fragment with 3s auto-refresh |
| `app/pages/4_Apply.py` | Generate button → `submit_task`; inline status fragment |
| `app/pages/6_Interview_Prep.py` | Generate/Refresh buttons → `submit_task`; inline status fragments |
| `tests/test_db.py` | 9 new tests for background_tasks helpers |
| `tests/test_task_runner.py` | New file — 6 tests for task_runner |
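To make the table concrete, here is a minimal sketch of the two helpers it references, against an in-memory SQLite database. The column set, the dedup rule, and the helper signatures are all assumptions that illustrate the design, not the actual code in `scripts/db.py` / `scripts/task_runner.py`:

```python
import sqlite3

# Assumed minimal schema; the real CREATE_BACKGROUND_TASKS likely has more columns.
CREATE_BACKGROUND_TASKS = """
CREATE TABLE IF NOT EXISTS background_tasks (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    task_type TEXT NOT NULL,
    job_id INTEGER NOT NULL,
    status TEXT NOT NULL DEFAULT 'queued',  -- queued | running | done | failed
    error TEXT
)
"""

def submit_task(conn, task_type, job_id):
    """Queue a task unless one is already queued/running for this (type, job)."""
    row = conn.execute(
        "SELECT id FROM background_tasks WHERE task_type=? AND job_id=? "
        "AND status IN ('queued','running')",
        (task_type, job_id),
    ).fetchone()
    if row:
        return row[0]  # dedup: reuse the in-flight task
    cur = conn.execute(
        "INSERT INTO background_tasks (task_type, job_id) VALUES (?, ?)",
        (task_type, job_id),
    )
    conn.commit()
    return cur.lastrowid

def get_task_for_job(conn, task_type, job_id):
    """Most recent task row for this (type, job), as a dict, or None."""
    row = conn.execute(
        "SELECT id, task_type, job_id, status, error FROM background_tasks "
        "WHERE task_type=? AND job_id=? ORDER BY id DESC LIMIT 1",
        (task_type, job_id),
    ).fetchone()
    if row is None:
        return None
    return dict(zip(("id", "task_type", "job_id", "status", "error"), row))
```

The page-level fragments above only ever call these two functions, which is what keeps the UI code free of threading concerns.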
---
# Email Handling Design
**Date:** 2026-02-21
**Status:** Approved
## Problem
IMAP sync already pulls emails for active pipeline jobs, but two gaps exist:
1. Inbound emails suggesting a stage change (e.g. "let's schedule a call") produce no signal — the recruiter's message just sits in the email log.
2. Recruiter outreach to email addresses not yet in the pipeline is invisible — those leads never enter Job Review.
## Goals
- Surface stage-change suggestions inline on the Interviews kanban card (suggest-only, never auto-advance).
- Capture recruiter leads from unmatched inbound email and surface them in Job Review.
- Make email sync a background task triggerable from the UI (Home page + Interviews sidebar).
## Data Model
**No new tables.** Two columns added to `job_contacts`:
```sql
ALTER TABLE job_contacts ADD COLUMN stage_signal TEXT;
ALTER TABLE job_contacts ADD COLUMN suggestion_dismissed INTEGER DEFAULT 0;
```
- `stage_signal` — one of: `interview_scheduled`, `offer_received`, `rejected`, `positive_response`, `neutral` (or NULL if not yet classified).
- `suggestion_dismissed` — 1 when the user clicks Dismiss; prevents the banner re-appearing.
Email leads reuse the existing `jobs` table with `source = 'email'` and `status = 'pending'`. No new columns needed.
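A minimal sketch of applying these two migrations idempotently at startup; the `PRAGMA table_info` guard matters because SQLite's `ALTER TABLE ... ADD COLUMN` raises if the column already exists. The helper name is illustrative:

```python
import sqlite3

def ensure_contact_signal_columns(conn):
    """Add stage_signal / suggestion_dismissed to job_contacts if missing."""
    existing = {row[1] for row in conn.execute("PRAGMA table_info(job_contacts)")}
    if "stage_signal" not in existing:
        conn.execute("ALTER TABLE job_contacts ADD COLUMN stage_signal TEXT")
    if "suggestion_dismissed" not in existing:
        conn.execute(
            "ALTER TABLE job_contacts ADD COLUMN suggestion_dismissed INTEGER DEFAULT 0"
        )
    conn.commit()
```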
## Components
### 1. Stage Signal Classification (`scripts/imap_sync.py`)
After saving each **inbound** contact row, call `phi3:mini` via Ollama to classify the email into one of the five labels. Store the result in `stage_signal`. If classification fails, default to `NULL` (no suggestion shown).
**Model:** `phi3:mini` via `LLMRouter.complete(model_override="phi3:mini", fallback_order=["ollama_research"])`.
Benchmarked at 100% accuracy / 3.0 s per email on a 12-case test suite. The runner-up, Qwen2.5-3B, was not benchmarked; phi3:mini is the safe choice.
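The call itself goes through `LLMRouter`, but the guard around its output can be sketched as a pure function. The label set comes from this design; the helper name is illustrative, and the real normalisation in `scripts/imap_sync.py` may differ:

```python
# Valid stage_signal values from the data-model section; anything else maps to
# None, so no suggestion banner is shown for an unparseable completion.
STAGE_LABELS = {
    "interview_scheduled", "offer_received", "rejected",
    "positive_response", "neutral",
}

def normalize_stage_signal(raw):
    """Map an LLM completion to a valid stage_signal value, or None."""
    if not raw:
        return None
    label = raw.strip().lower().replace(" ", "_").strip(".")
    return label if label in STAGE_LABELS else None
```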
### 2. Recruiter Lead Extraction (`scripts/imap_sync.py`)
A second pass after per-job sync: scan INBOX broadly for recruitment-keyword emails that don't match any known pipeline company. For each unmatched email, call **Nemotron 1.5B** (already in use for company research) to extract `{company, title}`. If extraction returns a company name not already in the DB, insert a new job row `source='email', status='pending'`.
**Dedup:** checked by `message_id` against all known contacts (cross-job), plus `url` uniqueness on the jobs table (the email lead URL is set to a synthetic `email://<from_domain>/<message_id>` value).
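The synthetic URL described above, sketched as a pure helper; the `email://` scheme and the `<from_domain>/<message_id>` layout come from this design, while the function name and the angle-bracket stripping are assumptions:

```python
def lead_url(from_addr, message_id):
    """Build the synthetic jobs.url value for an email-sourced lead."""
    domain = from_addr.rsplit("@", 1)[-1].lower()
    # RFC 5322 Message-IDs arrive wrapped in <...>; strip for a cleaner key.
    return f"email://{domain}/{message_id.strip('<>')}"
```

Because `jobs.url` is unique, inserting the same lead twice fails cleanly, which is what makes the cross-sync dedup cheap.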
### 3. Background Task (`scripts/task_runner.py`)
New task type: `email_sync` with `job_id = 0`.
`submit_task(db, "email_sync", 0)` → daemon thread → `sync_all()` → returns summary via task `error` field.
Deduplication: only one `email_sync` can be queued/running at a time (existing insert_task logic handles this).
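The daemon-thread arrow chain above, reduced to a sketch. `run_in_background` and `on_done` are illustrative stand-ins for `_run_task` and the task-row status/error update, not the actual `scripts/task_runner.py` API:

```python
import threading

def run_in_background(fn, on_done):
    """Run fn on a daemon thread; report ('done', result) or ('failed', error)."""
    def body():
        try:
            on_done("done", fn())
        except Exception as e:
            on_done("failed", str(e))
    t = threading.Thread(target=body, daemon=True)
    t.start()
    return t
```

For `email_sync`, `fn` would be `sync_all` and the "result" is the summary string that this design stores in the task's `error` field.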
### 4. UI — Sync Button (Home + Interviews)
**Home.py:** New "Sync Emails" section alongside Find Jobs / Score / Notion sync.
**5_Interviews.py:** The sync button already in the sidebar changes from a synchronous `sync_all()` call to `submit_task()` + fragment polling.
### 5. UI — Email Leads (Job Review)
When `show_status == "pending"`, prepend email leads (`source = 'email'`) at the top of the list with a distinct `📧 Email Lead` badge. Actions are identical to scraped pending jobs (Approve / Reject).
### 6. UI — Stage Suggestion Banner (Interviews Kanban)
Inside `_render_card()`, before the advance/reject buttons, check for unseen stage signals:
```
💡 Email suggests: interview_scheduled
From: sarah@company.com · "Let's book a call"
[→ Move to Phone Screen] [Dismiss]
```
- "Move" calls `advance_to_stage()` + `submit_task("company_research")` then reruns.
- "Dismiss" calls `dismiss_stage_signal(contact_id)` then reruns.
- Only the most recent undismissed signal is shown per card.
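A sketch of that last selection rule, assuming contact rows arrive as dicts shaped like `job_contacts` and that a higher `id` means newer (the real ordering key may be a timestamp); the helper name is illustrative:

```python
def latest_stage_suggestion(contacts):
    """Return the newest contact with an undismissed stage_signal, or None."""
    candidates = [
        c for c in contacts
        if c.get("stage_signal") and not c.get("suggestion_dismissed")
    ]
    return max(candidates, key=lambda c: c["id"], default=None)
```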
## Error Handling
| Failure | Behaviour |
|---------|-----------|
| IMAP connection fails | Error stored in task `error` field; shown as warning in UI after sync |
| Classifier call fails | `stage_signal` left NULL; no suggestion shown; sync continues |
| Lead extractor fails | Email skipped; appended to `result["errors"]`; sync continues |
| Duplicate `email_sync` task | `insert_task` returns existing id; no new thread spawned |
| LLM extraction returns no company | Email silently skipped (not a lead) |
## Out of Scope
- Auto-advancing pipeline stage (suggest only).
- Sending email replies from the app (draft helper already exists).
- OAuth / token-refresh IMAP (config/email.yaml credentials only).
