Commit graph

75 commits

Author SHA1 Message Date
b3893e9ad9 feat: add Jobgether URL detection and scraper to scrape_url.py 2026-03-15 09:45:50 -07:00
690a1ccf93 feat(task_runner): route LLM tasks through scheduler in submit_task()
Replaces the spawn-per-task model for LLM task types with scheduler
routing: cover_letter, company_research, and wizard_generate are now
enqueued via the TaskScheduler singleton for VRAM-aware batching.
Non-LLM tasks (discovery, email_sync, etc.) continue to spawn daemon
threads directly. Adds autouse clean_scheduler fixture to
test_task_runner.py to prevent singleton cross-test contamination.
2026-03-15 04:52:42 -07:00
3e3c6f1fc5 feat(scheduler): add durability — re-queue surviving LLM tasks on startup 2026-03-15 04:24:11 -07:00
9b96c45b63 feat(scheduler): implement thread-safe singleton get_scheduler/reset_scheduler 2026-03-15 04:19:23 -07:00
a53a03d593 feat(scheduler): implement scheduler loop and batch worker with VRAM-aware scheduling 2026-03-15 04:14:56 -07:00
68d257d278 feat(scheduler): implement enqueue() with depth guard and ghost-row cleanup 2026-03-15 04:05:22 -07:00
46b229094a refactor(scheduler): use module-level _get_gpus directly in __init__ 2026-03-15 04:01:01 -07:00
415e98d401 feat(scheduler): implement TaskScheduler.__init__ with budget loading and VRAM detection 2026-03-15 03:32:11 -07:00
fe8da36e00 feat(scheduler): add task_scheduler.py skeleton with constants and TaskSpec 2026-03-15 03:28:43 -07:00
376e028af5 feat(db): add reset_running_tasks() for durable scheduler restart 2026-03-15 03:22:45 -07:00
2c61d4038f fix(linkedin): update selectors for 2025 public DOM; surface login-wall limitation in UI
LinkedIn's unauthenticated public profile only exposes name, summary (truncated),
current employer name, and certifications. Past roles, education, and skills are
blurred server-side behind a login wall — not a scraper limitation.

- Update selectors: data-section='summary' (was 'about'), .profile-section-card
  for certs, .visible-list for current experience entry
- Strip login-wall noise injected into summary text after 'see more'
- Skip aria-hidden blurred placeholder experience items
- Add info callout in UI directing users to data export zip for full history
2026-03-13 19:47:21 -07:00
3e8b4cd654 fix(cloud): use per-user config dir for wizard gate; redirect on invalid session
- app.py: wizard gate now reads get_config_dir()/user.yaml instead of
  hardcoded repo-level config/ — fixes perpetual onboarding loop in
  cloud mode where per-user wizard_complete was never seen
- app.py: page title corrected to "Peregrine"
- cloud_session.py: add get_config_dir() returning per-user config path
  in cloud mode, repo config/ locally
- cloud_session.py: replace st.error() with JS redirect on missing/invalid
  session token so users land on login page instead of error screen
- Home.py, 4_Apply.py, migrate.py: remove remaining AIHawk UI references
2026-03-13 11:24:42 -07:00
00f0eb4807 feat(linkedin): add staging file parser with re-parse support 2026-03-13 10:18:01 -07:00
e937094884 fix(linkedin): improve scraper error handling, current-job date range, add missing tests 2026-03-13 06:02:03 -07:00
f64ecf81e0 feat(linkedin): add scraper (Playwright + export zip) with URL validation 2026-03-13 01:06:39 -07:00
a43e29e50d feat(linkedin): add HTML parser utils with fixture tests 2026-03-13 01:01:05 -07:00
7a698496f9 feat(cloud): fix backup/restore for cloud mode — SQLCipher encrypt/decrypt
T13: Three fixes:
1. backup.py: _decrypt_db_to_bytes() decrypts SQLCipher DB before archiving
   so the zip is portable to any local Docker install (plain SQLite).
2. backup.py: _encrypt_db_from_bytes() re-encrypts on restore in cloud mode
   so the app can open the restored DB normally.
3. 2_Settings.py: _base_dir uses get_db_path().parent in cloud mode (user's
   per-tenant data dir) instead of the hardcoded app root; db_key wired
   through both create_backup() and restore_backup() calls.

6 new cloud backup tests + 2 unit tests for SQLCipher helpers (pysqlcipher3
mocked — not available in the local conda test env). 419/419 total passing.
2026-03-09 22:41:44 -07:00
96715bdeb6 feat(peregrine): add cloud_session middleware + SQLCipher get_connection()
cloud_session.py: no-op in local mode; in cloud mode resolves Directus JWT
from X-CF-Session header to per-user db_path in st.session_state.

get_connection() in scripts/db.py: transparent SQLCipher/sqlite3 switch —
uses encrypted driver when CLOUD_MODE=true and key provided, vanilla sqlite3
otherwise. libsqlcipher-dev added to Dockerfile for Docker builds.

6 new cloud_session tests + 1 new get_connection test — 34/34 db tests pass.
2026-03-09 19:43:42 -07:00
f60ac07541 test: add missing base_url edge case + clarify 0.0.0.0 marker intent
Document defensive behavior: openai_compat with no base_url returns True
(cloud) because unknown destination is assumed cloud. Add explanatory
comment to LOCAL_URL_MARKERS for the 0.0.0.0 bind-address case.
2026-03-06 14:43:45 -08:00
47d8317d56 feat: byok_guard — cloud backend detection with full test coverage 2026-03-06 14:40:06 -08:00
ce8d5a4ac0 feat: add suggest_resume_keywords for skills/domains/keywords gap analysis
Replaces NotImplementedError stub with full LLM-backed implementation.
Builds a prompt from the last 3 resume positions plus already-selected
skills/domains/keywords, calls LLMRouter, and returns de-duped suggestions
in all three categories.
2026-03-05 15:00:53 -08:00
4e600c3019 fix: guard mission_preferences values against non-string types in suggest_search_terms 2026-03-05 13:40:53 -08:00
b841ac5418 feat: add suggest_search_terms with three-angle exclude analysis
Replaces NotImplementedError stub with a real LLMRouter-backed implementation
that builds a structured prompt covering blocklist alias expansion, values
misalignment, and role-type filtering, then parses the JSON response into
suggested_titles and suggested_excludes lists.

Moves LLMRouter import to module level so tests can patch it at
scripts.suggest_helpers.LLMRouter.
2026-03-05 13:15:25 -08:00
40d87dc014 fix: DEFAULT_DB respects STAGING_DB env var — was ignoring Docker-set path 2026-03-04 11:47:59 -08:00
d56c44224f feat: backup/restore script with multi-instance and legacy support
- create_backup() / restore_backup() / list_backup_contents() public API
- --base-dir PATH flag: targets any instance root (default: this repo)
  --base-dir /devl/job-seeker backs up the legacy Conda install
- _DB_CANDIDATES fallback: data/staging.db (Peregrine) or staging.db root (legacy)
- Manifest records source label (dir name), source_path, created_at, files, includes_db
- Added config/resume_keywords.yaml and config/server.yaml to backup lists
- 21 tests covering create, list, restore, legacy DB path, overwrite, roundtrip
2026-03-04 10:52:51 -08:00
042bb519de fix: llm_backend reads fallback_order, logs tee'd to data/.streamlit.log in Docker 2026-03-03 15:04:18 -08:00
582f2422ff fix: lazy-import playwright in screenshot_page, fix SQLite connection leak in collect_listings 2026-03-03 12:45:39 -08:00
260be9e821 feat: feedback_api — screenshot_page with Playwright (graceful fallback) 2026-03-03 12:14:33 -08:00
b77bb754af feat: feedback_api — Forgejo label management + issue filing + attachment upload 2026-03-03 12:09:11 -08:00
1940cfb131 feat: feedback_api — build_issue_body 2026-03-03 12:00:01 -08:00
6764ad4288 feat: feedback_api — collect_logs + collect_listings 2026-03-03 11:56:35 -08:00
faf65023b4 chore: remove unused imports from feedback_api (will be re-added in later tasks) 2026-03-03 11:45:14 -08:00
7f46d7fadf feat: feedback_api — mask_pii + collect_context 2026-03-03 11:43:35 -08:00
0a728fddbc feat: DEMO_MODE — isolated public menagerie demo instance
Adds a fully neutered public demo for menagerie.circuitforge.tech/peregrine
that shows the Peregrine UI without exposing any personal data or real LLM inference.

scripts/llm_router.py:
  - Block all inference when DEMO_MODE env var is set (1/true/yes)
  - Raises RuntimeError with a user-friendly "public demo" message

app/app.py:
  - IS_DEMO constant from DEMO_MODE env var
  - Wizard gate bypassed in demo mode (demo/config/user.yaml pre-seeds a fake profile)
  - Demo banner in sidebar: explains read-only status + links to circuitforge.tech

compose.menagerie.yml (new):
  - Separate Docker Compose project (peregrine-demo) on host port 8504
  - Mounts demo/config/ and demo/data/ — isolated from personal instance
  - DEMO_MODE=true, no API keys, no /docs mount
  - Project name: peregrine-demo (run alongside personal instance)

demo/config/user.yaml:
  - Generic "Demo User" profile, wizard_complete=true, no real personal info

demo/config/llm.yaml:
  - All backends disabled (belt-and-suspenders alongside DEMO_MODE block)

demo/data/.gitkeep:
  - staging.db is auto-created on first run, gitignored via demo/data/*.db

.gitignore: add demo/data/*.db

Caddy routes menagerie.circuitforge.tech/peregrine* → 8504 (demo instance).
Personal Peregrine remains on 8502, unchanged.
2026-03-02 11:22:38 -08:00
9fe9c6234d fix: RerankerAdapter falls back to label name when no LABEL_DESCRIPTIONS entry 2026-02-27 14:54:31 -08:00
23828520f0 feat: label_tool — 9 labels, wildcard Other, InvalidCharacterError fix; sync with avocet canonical 2026-02-27 14:34:24 -08:00
a316f110c8 feat: add health mission category, trim-to-sign-off, max_tokens cap for cover letters
- _MISSION_SIGNALS: add health category (pharma, clinical, patient care, etc.)
  listed last so music/animals/education/social_impact take priority
- _MISSION_DEFAULTS: health note steers toward people-first framing, not
  industry enthusiasm — focuses on patients navigating rare/invisible journeys
- _trim_to_letter_end(): cuts output at first sign-off + first name to prevent
  fine-tuned models from looping into repetitive garbage after completing letter
- generate(): pass max_tokens=1200 to router (prevents runaway output)
- user.yaml.example: add health + social_impact to mission_preferences,
  add candidate_voice field for per-user voice/personality context
2026-02-27 12:31:06 -08:00
94734ad584 feat: benchmark_classifier — MODEL_REGISTRY, --list-models, --score, --compare modes 2026-02-27 06:19:32 -08:00
6ca5893b1c feat: add DUAL_GPU_MODE default, VRAM warning, and download size report to preflight
- Add _mixed_mode_vram_warning() to flag low VRAM on GPU 1 in mixed mode
- Wire download size report block into main() before closing border line
- Wire mixed-mode VRAM warning into report if triggered
- Write DUAL_GPU_MODE=ollama default to .env for new 2-GPU setups (no override if already set)
- Promote import os to top-level (was local import inside get_cpu_cores)
2026-02-27 00:17:00 -08:00
5ab3e2dc39 feat: add _download_size_mb() pure function for preflight size warning 2026-02-27 00:15:26 -08:00
e79404d316 feat: add ollama_research to preflight service table and LLM backend map 2026-02-27 00:14:04 -08:00
5497674b34 feat: ZeroShotAdapter, GLiClassAdapter, RerankerAdapter with full mock test coverage 2026-02-27 00:10:43 -08:00
3e47afd953 feat: ClassifierAdapter ABC + compute_metrics() with full test coverage 2026-02-27 00:09:45 -08:00
52e972fd69 feat: add job-seeker-classifiers conda env for HF classifier benchmark 2026-02-26 23:43:41 -08:00
cda980da62 feat: bundled skills suggestion list and content filter utility
- config/skills_suggestions.yaml: 168 curated tags across skills (77),
  domains (40), keywords (51) covering CS/TAM/ops and common tech roles;
  structured for future community aggregate (paid tier backlog)
- scripts/skills_utils.py: filter_tag() rejects blanks, URLs, profanity,
  overlong strings, disallowed chars, and repeated-char runs;
  load_suggestions() reads bundled YAML per category
2026-02-26 13:09:32 -08:00
db127848a1 fix: resume CID glyphs, resume YAML path, PyJWT dep, candidate voice & mission UI
- resume_parser: add _clean_cid() to strip (cid:NNN) glyph refs from ATS PDFs;
  CIDs 127/149/183 become bullets, unknowns are stripped; applied to PDF/DOCX/ODT
- resume YAML: canonicalize plain_text_resume.yaml path to config/ across all
  references (Settings, Apply, Setup, company_research, migrate); was pointing at
  unmounted aihawk/data_folder/ in Docker
- requirements/environment: add PyJWT>=2.8 (was missing; broke Settings page)
- user_profile: add candidate_voice field
- generate_cover_letter: inject candidate_voice into SYSTEM_CONTEXT; add
  social_impact mission signal category (nonprofit, community, equity, etc.)
- Settings: add Voice & Personality textarea to Identity expander; add
  Mission & Values expander with editable fields for all 4 mission categories
- .gitignore: exclude CLAUDE.md, config/plain_text_resume.yaml,
  config/user.yaml.working
- search_profiles: add default profile
2026-02-26 12:32:28 -08:00
07bdac6302 feat: ODT support, two-column PDF column-split extraction, title/company layout detection hardening 2026-02-26 10:33:28 -08:00
5af2b20d82 fix: harden resume section detection — anchor patterns to full line, expand header synonyms, fix name heuristic for hyphenated/middle-initial names, add parse diagnostics UI 2026-02-26 09:28:31 -08:00
b9f5dd1fc3 refactor: replace LLM-based resume parser with section regex parser
Primary parse path is now fully deterministic — no LLM, no token limits,
no JSON generation. Handles two-column experience headers, institution-before-
or-after-degree education layouts, and header bleed prevention via
looks_like_header detection.

LLM path retained as optional career_summary enhancement only (1500 chars,
falls back silently). structure_resume() now returns tuple[dict, str].
Tests updated to match the new API.
2026-02-26 07:34:25 -08:00
9297477ba0 fix: resume parser — max_tokens, json-repair fallback, logging, PYTHONUNBUFFERED 2026-02-26 00:00:23 -08:00