Commit graph

324 commits

Author SHA1 Message Date
869cb2f197 ci: apt-get update before installing libsqlcipher-dev 2026-03-15 16:37:46 -07:00
27d6fc01fc ci: install libsqlcipher-dev before pip install 2026-03-15 16:36:50 -07:00
e034a07509 ci: re-trigger after actions enabled 2026-03-15 15:54:27 -07:00
2b9a6c8a22 ci: enable forgejo actions 2026-03-15 15:48:35 -07:00
e62548a22e ci: trigger runner 2026-03-15 15:39:45 -07:00
3267a895b0 docs: add Jobgether non-headless Playwright scraper to backlog
Xvfb-based Playwright can bypass Cloudflare Turnstile on jobgether.com.
Live inspection confirmed selectors; deferred pending Xvfb integration.
2026-03-15 11:59:48 -07:00
522534d28e feat: add Jobgether recruiter framing to cover letter generation
When source == "jobgether", build_prompt() injects a recruiter context
note directing the LLM to address the Jobgether recruiter using
"Your client [at {company}] will appreciate..." framing rather than
addressing the employer directly. generate() and task_runner both
thread the is_jobgether flag through automatically.
2026-03-15 09:45:51 -07:00
37119cb332 feat: add Jobgether URL detection and scraper to scrape_url.py 2026-03-15 09:45:50 -07:00
8d9e17d749 feat: filter Jobgether listings via blocklist 2026-03-15 09:45:50 -07:00
4d08e64acf docs: update spec — Jobgether discovery scraper not viable (Cloudflare + robots.txt) 2026-03-15 09:45:50 -07:00
fc6ef88a05 docs: add Jobgether integration implementation plan 2026-03-15 09:45:50 -07:00
952b21377f docs: add cover letter recruiter framing to Jobgether spec 2026-03-15 09:45:50 -07:00
9c87ed1cf2 docs: add Jobgether integration design spec 2026-03-15 09:45:50 -07:00
a1a1141616 Merge pull request 'feat: LLM queue optimizer — resource-aware batch scheduler (closes #2)' (#13) from feature/llm-queue-optimizer into main
Reviewed-on: #13
2026-03-15 05:11:29 -07:00
27d4b0e732 feat: LLM queue optimizer complete — closes #2
Resource-aware batch scheduler for LLM tasks:
- scripts/task_scheduler.py (new): TaskScheduler singleton with VRAM-aware
  batch scheduling, durability, thread-safe singleton, memory safety
- scripts/task_runner.py: submit_task() routes LLM types through scheduler
- scripts/db.py: reset_running_tasks() for durable restart behavior
- app/app.py: _startup() preserves queued tasks on restart
- config/llm.yaml.example: scheduler VRAM budget config documented
- tests/test_task_scheduler.py (new): 24 tests covering all behaviors

Pre-existing failure: test_generate_calls_llm_router (issue #12, unrelated)
2026-03-15 05:01:24 -07:00
95378c106e feat(app): use reset_running_tasks() on startup to preserve queued tasks 2026-03-15 04:57:49 -07:00
07c627cdb0 feat(task_runner): route LLM tasks through scheduler in submit_task()
Replaces the spawn-per-task model for LLM task types with scheduler
routing: cover_letter, company_research, and wizard_generate are now
enqueued via the TaskScheduler singleton for VRAM-aware batching.
Non-LLM tasks (discovery, email_sync, etc.) continue to spawn daemon
threads directly. Adds autouse clean_scheduler fixture to
test_task_runner.py to prevent singleton cross-test contamination.
2026-03-15 04:52:42 -07:00
bcd918fb67 feat(scheduler): add durability — re-queue surviving LLM tasks on startup 2026-03-15 04:24:11 -07:00
207d3816b3 feat(scheduler): implement thread-safe singleton get_scheduler/reset_scheduler 2026-03-15 04:19:23 -07:00
3984a9c743 feat(scheduler): implement scheduler loop and batch worker with VRAM-aware scheduling 2026-03-15 04:14:56 -07:00
4d055f6bcd feat(scheduler): implement enqueue() with depth guard and ghost-row cleanup 2026-03-15 04:05:22 -07:00
28e66001a3 refactor(scheduler): use module-level _get_gpus directly in __init__ 2026-03-15 04:01:01 -07:00
535c0ae9e0 feat(scheduler): implement TaskScheduler.__init__ with budget loading and VRAM detection 2026-03-15 03:32:11 -07:00
3d7f6f7ff1 feat(scheduler): add task_scheduler.py skeleton with constants and TaskSpec 2026-03-15 03:28:43 -07:00
52470759a4 docs(config): add scheduler VRAM budget config to llm.yaml.example 2026-03-15 03:28:26 -07:00
d51066e8c2 refactor(tests): remove unused imports from test_task_scheduler 2026-03-15 03:27:17 -07:00
905db2f147 feat(db): add reset_running_tasks() for durable scheduler restart 2026-03-15 03:22:45 -07:00
eef2478948 docs: add LLM queue optimizer implementation plan
11-task TDD plan across 3 reviewed chunks. Covers:
- reset_running_tasks() db helper
- TaskScheduler skeleton + __init__ + enqueue + loop + workers
- Thread-safe singleton, durability, submit_task routing shim
- app.py startup change + full suite verification
2026-03-14 17:11:49 -07:00
beb1553821 docs: revise queue optimizer spec after review
Addresses 16 review findings across two passes:
- Clarify _active.pop/double-decrement non-issue
- Fix app.py change target (inline SQL, not kill_stuck_tasks)
- Scope durability to LLM types only
- Add _budgets to state table with load logic
- Fix singleton safety explanation (lock, not GIL)
- Ghost row fix: mark dropped tasks failed in DB
- Document static _available_vram as known limitation
- Fix test_llm_tasks_batch_by_type description
- Eliminate circular import via routing split in submit_task()
- Add missing budget warning at construction
2026-03-14 16:46:38 -07:00
61dc2122e4 docs: add LLM queue optimizer design spec
Resource-aware batch scheduler for LLM tasks. Closes #2.
2026-03-14 16:38:47 -07:00
0f80b698ff chore: add .worktrees/ to .gitignore
Prevents worktree directories from being tracked.
2026-03-14 16:30:38 -07:00
097def4bba fix(linkedin): update selectors for 2025 public DOM; surface login-wall limitation in UI
LinkedIn's unauthenticated public profile only exposes name, summary (truncated),
current employer name, and certifications. Past roles, education, and skills are
blurred server-side behind a login wall — not a scraper limitation.

- Update selectors: data-section='summary' (was 'about'), .profile-section-card
  for certs, .visible-list for current experience entry
- Strip login-wall noise injected into summary text after 'see more'
- Skip aria-hidden blurred placeholder experience items
- Add info callout in UI directing users to data export zip for full history
2026-03-13 19:47:21 -07:00
1a50bc1392 chore: update changelog for v0.4.0 release 2026-03-13 11:28:03 -07:00
d1fb4abd56 docs: update backlog with LinkedIn import follow-up items 2026-03-13 11:24:55 -07:00
6c7499752c fix(cloud): use per-user config dir for wizard gate; redirect on invalid session
- app.py: wizard gate now reads get_config_dir()/user.yaml instead of
  hardcoded repo-level config/ — fixes perpetual onboarding loop in
  cloud mode where per-user wizard_complete was never seen
- app.py: page title corrected to "Peregrine"
- cloud_session.py: add get_config_dir() returning per-user config path
  in cloud mode, repo config/ locally
- cloud_session.py: replace st.error() with JS redirect on missing/invalid
  session token so users land on login page instead of error screen
- Home.py, 4_Apply.py, migrate.py: remove remaining AIHawk UI references
2026-03-13 11:24:42 -07:00
42f0e6261c fix(linkedin): conservative settings merge, mkdir guard, split dockerfile playwright layer 2026-03-13 10:58:58 -07:00
1e12da45f1 fix(linkedin): move session state pop before tabs; add rerun after settings merge
- Pop _linkedin_extracted before st.tabs() so tab_builder sees the
  freshly populated _parsed_resume in the same render pass (no extra rerun needed)
- Fix tab label capitalisation: "Build Manually" (capital M) per spec
- Add st.rerun() after LinkedIn merge in Settings so form fields
  refresh immediately to show the newly applied data
2026-03-13 10:55:25 -07:00
b80e4de050 feat(linkedin): install Playwright Chromium in Docker image 2026-03-13 10:44:03 -07:00
7489c1c12a feat(linkedin): add LinkedIn import expander to Settings Resume Profile tab 2026-03-13 10:44:02 -07:00
97ab8b94e5 feat(linkedin): add LinkedIn tab to wizard resume step 2026-03-13 10:43:53 -07:00
bd0e9240eb feat(linkedin): add shared LinkedIn import Streamlit widget 2026-03-13 10:32:23 -07:00
5344dc8e7a feat(linkedin): add staging file parser with re-parse support 2026-03-13 10:18:01 -07:00
fba6796b8a fix(linkedin): improve scraper error handling, current-job date range, add missing tests 2026-03-13 06:02:03 -07:00
f759f5fbc0 feat(linkedin): add scraper (Playwright + export zip) with URL validation 2026-03-13 01:06:39 -07:00
530f4346d1 feat(linkedin): add HTML parser utils with fixture tests 2026-03-13 01:01:05 -07:00
db26b9aaf9 feat(cloud): add Heimdall tier resolution to cloud_session
Calls /admin/cloud/resolve after JWT validation to inject the user's
current subscription tier (free/paid/premium/ultra) into session_state
as cloud_tier. Cached 5 minutes via st.cache_data to avoid Heimdall
spam on every Streamlit rerun. Degrades gracefully to free on timeout
or missing token.

New env vars: HEIMDALL_URL, HEIMDALL_ADMIN_TOKEN (added to .env.example
and compose.cloud.yml). HEIMDALL_URL defaults to http://cf-license:8000
for internal Docker network access.

New helper: get_cloud_tier() — returns tier string in cloud mode, "local"
in local-first mode, so pages can distinguish self-hosted from cloud.
2026-03-10 12:31:14 -07:00
97b695c3e3 fix(cloud): extract cf_session cookie by name from X-CF-Session header 2026-03-10 09:22:08 -07:00
72320315e2 docs: add cloud architecture + cloud-deployment.md
architecture.md: updated Docker Compose table (3 compose files), database
layer (Postgres platform + SQLite-per-user), cloud session middleware,
telemetry system, and cloud design decisions.

cloud-deployment.md (new): full operational runbook — env vars, data root
layout, GDPR deletion, platform DB queries, telemetry, backup/restore,
Caddy routing, demo instance, and onboarding a new app to the cloud.
2026-03-09 23:02:29 -07:00
37dcdec754 feat(cloud): fix backup/restore for cloud mode — SQLCipher encrypt/decrypt
T13: Three fixes:
1. backup.py: _decrypt_db_to_bytes() decrypts SQLCipher DB before archiving
   so the zip is portable to any local Docker install (plain SQLite).
2. backup.py: _encrypt_db_from_bytes() re-encrypts on restore in cloud mode
   so the app can open the restored DB normally.
3. 2_Settings.py: _base_dir uses get_db_path().parent in cloud mode (user's
   per-tenant data dir) instead of the hardcoded app root; db_key wired
   through both create_backup() and restore_backup() calls.

6 new cloud backup tests + 2 unit tests for SQLCipher helpers (pysqlcipher3
mocked — not available in the local conda test env). 419/419 total passing.
2026-03-09 22:41:44 -07:00
ce19e00cfe feat(cloud): Privacy & Telemetry tab in Settings + update_consent()
T11: Add CLOUD_MODE-gated Privacy tab to Settings with full telemetry
consent UI — hard kill switch, anonymous usage toggle, de-identified
content sharing toggle, and time-limited support access grant. All changes
persist to telemetry_consent table via new update_consent() in telemetry.py.

Tab and all DB calls are completely no-op in local mode (CLOUD_MODE=false).
2026-03-09 22:14:22 -07:00