peregrine

Author	SHA1	Message	Date
pyr0ball	376e028af5	feat(db): add reset_running_tasks() for durable scheduler restart	2026-03-15 03:22:45 -07:00
pyr0ball	2c61d4038f	fix(linkedin): update selectors for 2025 public DOM; surface login-wall limitation in UI LinkedIn's unauthenticated public profile only exposes name, summary (truncated), current employer name, and certifications. Past roles, education, and skills are blurred server-side behind a login wall — not a scraper limitation. - Update selectors: data-section='summary' (was 'about'), .profile-section-card for certs, .visible-list for current experience entry - Strip login-wall noise injected into summary text after 'see more' - Skip aria-hidden blurred placeholder experience items - Add info callout in UI directing users to data export zip for full history	2026-03-13 19:47:21 -07:00
pyr0ball	3e8b4cd654	fix(cloud): use per-user config dir for wizard gate; redirect on invalid session - app.py: wizard gate now reads get_config_dir()/user.yaml instead of hardcoded repo-level config/ — fixes perpetual onboarding loop in cloud mode where per-user wizard_complete was never seen - app.py: page title corrected to "Peregrine" - cloud_session.py: add get_config_dir() returning per-user config path in cloud mode, repo config/ locally - cloud_session.py: replace st.error() with JS redirect on missing/invalid session token so users land on login page instead of error screen - Home.py, 4_Apply.py, migrate.py: remove remaining AIHawk UI references	2026-03-13 11:24:42 -07:00
pyr0ball	00f0eb4807	feat(linkedin): add staging file parser with re-parse support	2026-03-13 10:18:01 -07:00
pyr0ball	e937094884	fix(linkedin): improve scraper error handling, current-job date range, add missing tests	2026-03-13 06:02:03 -07:00
pyr0ball	f64ecf81e0	feat(linkedin): add scraper (Playwright + export zip) with URL validation	2026-03-13 01:06:39 -07:00
pyr0ball	a43e29e50d	feat(linkedin): add HTML parser utils with fixture tests	2026-03-13 01:01:05 -07:00
pyr0ball	7a698496f9	feat(cloud): fix backup/restore for cloud mode — SQLCipher encrypt/decrypt T13: Three fixes: 1. backup.py: _decrypt_db_to_bytes() decrypts SQLCipher DB before archiving so the zip is portable to any local Docker install (plain SQLite). 2. backup.py: _encrypt_db_from_bytes() re-encrypts on restore in cloud mode so the app can open the restored DB normally. 3. 2_Settings.py: _base_dir uses get_db_path().parent in cloud mode (user's per-tenant data dir) instead of the hardcoded app root; db_key wired through both create_backup() and restore_backup() calls. 6 new cloud backup tests + 2 unit tests for SQLCipher helpers (pysqlcipher3 mocked — not available in the local conda test env). 419/419 total passing.	2026-03-09 22:41:44 -07:00
pyr0ball	96715bdeb6	feat(peregrine): add cloud_session middleware + SQLCipher get_connection() cloud_session.py: no-op in local mode; in cloud mode resolves Directus JWT from X-CF-Session header to per-user db_path in st.session_state. get_connection() in scripts/db.py: transparent SQLCipher/sqlite3 switch — uses encrypted driver when CLOUD_MODE=true and key provided, vanilla sqlite3 otherwise. libsqlcipher-dev added to Dockerfile for Docker builds. 6 new cloud_session tests + 1 new get_connection test — 34/34 db tests pass.	2026-03-09 19:43:42 -07:00
pyr0ball	f60ac07541	test: add missing base_url edge case + clarify 0.0.0.0 marker intent Document defensive behavior: openai_compat with no base_url returns True (cloud) because unknown destination is assumed cloud. Add explanatory comment to LOCAL_URL_MARKERS for the 0.0.0.0 bind-address case.	2026-03-06 14:43:45 -08:00
pyr0ball	47d8317d56	feat: byok_guard — cloud backend detection with full test coverage	2026-03-06 14:40:06 -08:00
pyr0ball	ce8d5a4ac0	feat: add suggest_resume_keywords for skills/domains/keywords gap analysis Replaces NotImplementedError stub with full LLM-backed implementation. Builds a prompt from the last 3 resume positions plus already-selected skills/domains/keywords, calls LLMRouter, and returns de-duped suggestions in all three categories.	2026-03-05 15:00:53 -08:00
pyr0ball	4e600c3019	fix: guard mission_preferences values against non-string types in suggest_search_terms	2026-03-05 13:40:53 -08:00
pyr0ball	b841ac5418	feat: add suggest_search_terms with three-angle exclude analysis Replaces NotImplementedError stub with a real LLMRouter-backed implementation that builds a structured prompt covering blocklist alias expansion, values misalignment, and role-type filtering, then parses the JSON response into suggested_titles and suggested_excludes lists. Moves LLMRouter import to module level so tests can patch it at scripts.suggest_helpers.LLMRouter.	2026-03-05 13:15:25 -08:00
pyr0ball	40d87dc014	fix: DEFAULT_DB respects STAGING_DB env var — was ignoring Docker-set path	2026-03-04 11:47:59 -08:00
pyr0ball	d56c44224f	feat: backup/restore script with multi-instance and legacy support - create_backup() / restore_backup() / list_backup_contents() public API - --base-dir PATH flag: targets any instance root (default: this repo) --base-dir /devl/job-seeker backs up the legacy Conda install - _DB_CANDIDATES fallback: data/staging.db (Peregrine) or staging.db root (legacy) - Manifest records source label (dir name), source_path, created_at, files, includes_db - Added config/resume_keywords.yaml and config/server.yaml to backup lists - 21 tests covering create, list, restore, legacy DB path, overwrite, roundtrip	2026-03-04 10:52:51 -08:00
pyr0ball	042bb519de	fix: llm_backend reads fallback_order, logs tee'd to data/.streamlit.log in Docker	2026-03-03 15:04:18 -08:00
pyr0ball	582f2422ff	fix: lazy-import playwright in screenshot_page, fix SQLite connection leak in collect_listings	2026-03-03 12:45:39 -08:00
pyr0ball	260be9e821	feat: feedback_api — screenshot_page with Playwright (graceful fallback)	2026-03-03 12:14:33 -08:00
pyr0ball	b77bb754af	feat: feedback_api — Forgejo label management + issue filing + attachment upload	2026-03-03 12:09:11 -08:00
pyr0ball	1940cfb131	feat: feedback_api — build_issue_body	2026-03-03 12:00:01 -08:00
pyr0ball	6764ad4288	feat: feedback_api — collect_logs + collect_listings	2026-03-03 11:56:35 -08:00
pyr0ball	faf65023b4	chore: remove unused imports from feedback_api (will be re-added in later tasks)	2026-03-03 11:45:14 -08:00
pyr0ball	7f46d7fadf	feat: feedback_api — mask_pii + collect_context	2026-03-03 11:43:35 -08:00
pyr0ball	0a728fddbc	feat: DEMO_MODE — isolated public menagerie demo instance Adds a fully neutered public demo for menagerie.circuitforge.tech/peregrine that shows the Peregrine UI without exposing any personal data or real LLM inference. scripts/llm_router.py: - Block all inference when DEMO_MODE env var is set (1/true/yes) - Raises RuntimeError with a user-friendly "public demo" message app/app.py: - IS_DEMO constant from DEMO_MODE env var - Wizard gate bypassed in demo mode (demo/config/user.yaml pre-seeds a fake profile) - Demo banner in sidebar: explains read-only status + links to circuitforge.tech compose.menagerie.yml (new): - Separate Docker Compose project (peregrine-demo) on host port 8504 - Mounts demo/config/ and demo/data/ — isolated from personal instance - DEMO_MODE=true, no API keys, no /docs mount - Project name: peregrine-demo (run alongside personal instance) demo/config/user.yaml: - Generic "Demo User" profile, wizard_complete=true, no real personal info demo/config/llm.yaml: - All backends disabled (belt-and-suspenders alongside DEMO_MODE block) demo/data/.gitkeep: - staging.db is auto-created on first run, gitignored via demo/data/*.db .gitignore: add demo/data/*.db Caddy routes menagerie.circuitforge.tech/peregrine* → 8504 (demo instance). Personal Peregrine remains on 8502, unchanged.	2026-03-02 11:22:38 -08:00
pyr0ball	9fe9c6234d	fix: RerankerAdapter falls back to label name when no LABEL_DESCRIPTIONS entry	2026-02-27 14:54:31 -08:00
pyr0ball	23828520f0	feat: label_tool — 9 labels, wildcard Other, InvalidCharacterError fix; sync with avocet canonical	2026-02-27 14:34:24 -08:00
pyr0ball	a316f110c8	feat: add health mission category, trim-to-sign-off, max_tokens cap for cover letters - _MISSION_SIGNALS: add health category (pharma, clinical, patient care, etc.) listed last so music/animals/education/social_impact take priority - _MISSION_DEFAULTS: health note steers toward people-first framing, not industry enthusiasm — focuses on patients navigating rare/invisible journeys - _trim_to_letter_end(): cuts output at first sign-off + first name to prevent fine-tuned models from looping into repetitive garbage after completing letter - generate(): pass max_tokens=1200 to router (prevents runaway output) - user.yaml.example: add health + social_impact to mission_preferences, add candidate_voice field for per-user voice/personality context	2026-02-27 12:31:06 -08:00
pyr0ball	94734ad584	feat: benchmark_classifier — MODEL_REGISTRY, --list-models, --score, --compare modes	2026-02-27 06:19:32 -08:00
pyr0ball	6ca5893b1c	feat: add DUAL_GPU_MODE default, VRAM warning, and download size report to preflight - Add _mixed_mode_vram_warning() to flag low VRAM on GPU 1 in mixed mode - Wire download size report block into main() before closing border line - Wire mixed-mode VRAM warning into report if triggered - Write DUAL_GPU_MODE=ollama default to .env for new 2-GPU setups (no override if already set) - Promote import os to top-level (was local import inside get_cpu_cores)	2026-02-27 00:17:00 -08:00
pyr0ball	5ab3e2dc39	feat: add _download_size_mb() pure function for preflight size warning	2026-02-27 00:15:26 -08:00
pyr0ball	e79404d316	feat: add ollama_research to preflight service table and LLM backend map	2026-02-27 00:14:04 -08:00
pyr0ball	5497674b34	feat: ZeroShotAdapter, GLiClassAdapter, RerankerAdapter with full mock test coverage	2026-02-27 00:10:43 -08:00
pyr0ball	3e47afd953	feat: ClassifierAdapter ABC + compute_metrics() with full test coverage	2026-02-27 00:09:45 -08:00
pyr0ball	52e972fd69	feat: add job-seeker-classifiers conda env for HF classifier benchmark	2026-02-26 23:43:41 -08:00
pyr0ball	cda980da62	feat: bundled skills suggestion list and content filter utility - config/skills_suggestions.yaml: 168 curated tags across skills (77), domains (40), keywords (51) covering CS/TAM/ops and common tech roles; structured for future community aggregate (paid tier backlog) - scripts/skills_utils.py: filter_tag() rejects blanks, URLs, profanity, overlong strings, disallowed chars, and repeated-char runs; load_suggestions() reads bundled YAML per category	2026-02-26 13:09:32 -08:00
pyr0ball	db127848a1	fix: resume CID glyphs, resume YAML path, PyJWT dep, candidate voice & mission UI - resume_parser: add _clean_cid() to strip (cid:NNN) glyph refs from ATS PDFs; CIDs 127/149/183 become bullets, unknowns are stripped; applied to PDF/DOCX/ODT - resume YAML: canonicalize plain_text_resume.yaml path to config/ across all references (Settings, Apply, Setup, company_research, migrate); was pointing at unmounted aihawk/data_folder/ in Docker - requirements/environment: add PyJWT>=2.8 (was missing; broke Settings page) - user_profile: add candidate_voice field - generate_cover_letter: inject candidate_voice into SYSTEM_CONTEXT; add social_impact mission signal category (nonprofit, community, equity, etc.) - Settings: add Voice & Personality textarea to Identity expander; add Mission & Values expander with editable fields for all 4 mission categories - .gitignore: exclude CLAUDE.md, config/plain_text_resume.yaml, config/user.yaml.working - search_profiles: add default profile	2026-02-26 12:32:28 -08:00
pyr0ball	07bdac6302	feat: ODT support, two-column PDF column-split extraction, title/company layout detection hardening	2026-02-26 10:33:28 -08:00
pyr0ball	5af2b20d82	fix: harden resume section detection — anchor patterns to full line, expand header synonyms, fix name heuristic for hyphenated/middle-initial names, add parse diagnostics UI	2026-02-26 09:28:31 -08:00
pyr0ball	b9f5dd1fc3	refactor: replace LLM-based resume parser with section regex parser Primary parse path is now fully deterministic — no LLM, no token limits, no JSON generation. Handles two-column experience headers, institution-before- or-after-degree education layouts, and header bleed prevention via looks_like_header detection. LLM path retained as optional career_summary enhancement only (1500 chars, falls back silently). structure_resume() now returns tuple[dict, str]. Tests updated to match the new API.	2026-02-26 07:34:25 -08:00
pyr0ball	9297477ba0	fix: resume parser — max_tokens, json-repair fallback, logging, PYTHONUNBUFFERED	2026-02-26 00:00:23 -08:00
pyr0ball	5ac42e4c02	fix: add /v1 prefix to all license server API paths	2026-02-25 23:35:58 -08:00
pyr0ball	bf2d0f81c7	feat: license.py client — verify_local, effective_tier, activate, refresh, report_usage	2026-02-25 22:53:11 -08:00
pyr0ball	30542808c7	fix: GPU detection + pdfplumber + pass GPU env vars into app container - preflight.py now writes PEREGRINE_GPU_COUNT and PEREGRINE_GPU_NAMES to .env so the app container gets GPU info without needing nvidia-smi access - compose.yml passes PEREGRINE_GPU_COUNT, PEREGRINE_GPU_NAMES, and RECOMMENDED_PROFILE as env vars to the app service - 0_Setup.py _detect_gpus() reads PEREGRINE_GPU_NAMES env var first; falls back to nvidia-smi (bare / GPU-passthrough environments) - 0_Setup.py _suggest_profile() reads RECOMMENDED_PROFILE env var first - requirements.txt: add pdfplumber (needed for resume PDF parsing)	2026-02-25 21:58:28 -08:00
pyr0ball	1dcf9d47a4	fix: stub-port adoption — stubs bind free ports, app routes to external via host.docker.internal Three inter-related fixes for the service adoption flow: - preflight: stub_port field — adopted services get a free port for their no-op container (avoids binding conflict with external service on real port) while update_llm_yaml still uses the real external port for host.docker.internal URLs - preflight: write_env now uses stub_port (not resolved) for adopted services so SEARXNG_PORT etc point to the stub's harmless port, not the occupied one - preflight: stub containers use sleep infinity + CMD true healthcheck so depends_on: service_healthy is satisfied without holding any real port - Makefile: finetune profile changed from [cpu,single-gpu,dual-gpu] to [finetune] so the pytorch/cuda base image is not built during make start	2026-02-25 21:38:23 -08:00
pyr0ball	010abe6339	fix: ollama docker_owned=True; finetune gets own profile to avoid build on start - preflight: ollama was incorrectly marked docker_owned=False — Docker does define an ollama service, so external detection now correctly disables it via compose.override.yml when host Ollama is already running - compose.yml: finetune moves from [cpu,single-gpu,dual-gpu] profiles to [finetune] profile so it is never built during 'make start' (pytorch/cuda base is 3.7GB+ and unnecessary for the UI) - compose.yml: remove depends_on ollama from finetune — it reaches Ollama via OLLAMA_URL env var which works whether Ollama is Docker or host - Makefile: finetune target uses --profile finetune + compose.gpu.yml overlay	2026-02-25 21:24:33 -08:00
pyr0ball	3518d63ec2	feat: smart service adoption in preflight — use external services instead of conflicting preflight.py now detects when a managed service (ollama, vllm, vision, searxng) is already running on its configured port and adopts it rather than reassigning or conflicting: - Generates compose.override.yml disabling Docker containers for adopted services (profiles: [_external_] — a profile never passed via --profile) - Rewrites config/llm.yaml base_url entries to host.docker.internal:<port> so the app container can reach host-side services through Docker's host-gateway mapping - compose.yml: adds extra_hosts host.docker.internal:host-gateway to the app service (required on Linux; no-op on macOS Docker Desktop) - .gitignore: excludes compose.override.yml (auto-generated, host-specific) Only streamlit is non-adoptable and continues to reassign on conflict.	2026-02-25 19:23:02 -08:00
pyr0ball	f38f0c2007	feat: wire fine-tune UI end-to-end + harden setup.sh - setup.sh: replace docker-image-based NVIDIA test with nvidia-ctk validate (faster, no 100MB pull, no daemon required); add check_docker_running() to auto-start the Docker service on Linux or warn on macOS - prepare_training_data.py: also scan training_data/uploads/*.{md,txt} so web-uploaded letters are included in training data - task_runner.py: add prepare_training task type (calls build_records + write_jsonl inline; reports pair count in task result) - Settings fine-tune tab: Step 1 accepts .md/.txt uploads; Step 2 Extract button submits prepare_training background task + shows status; Step 3 shows make finetune command + live Ollama model status poller	2026-02-25 16:31:53 -08:00
pyr0ball	54de37e5fa	feat: containerize fine-tune pipeline (Dockerfile.finetune + make finetune) - Dockerfile.finetune: PyTorch 2.3/CUDA 12.1 base + unsloth + training stack - finetune_local.py: auto-register model via Ollama HTTP API after GGUF export; path-translate between finetune container mount and Ollama's view; update config/llm.yaml automatically; DOCS_DIR env override for Docker - prepare_training_data.py: DOCS_DIR env override so make prepare-training works correctly inside the app container - compose.yml: add finetune service (cpu/single-gpu/dual-gpu profiles); DOCS_DIR=/docs injected into app + finetune containers - compose.podman-gpu.yml: CDI device override for finetune service - Makefile: make prepare-training + make finetune targets	2026-02-25 16:22:48 -08:00
pyr0ball	97bb0819b4	feat: cover letter iterative refinement — feedback UI + backend params - generate() accepts previous_result + feedback; appends both to LLM prompt - task_runner cover_letter handler parses params JSON, passes fields through - Apply Workspace: "Refine with Feedback" expander with text area + Regenerate button; only shown when a draft exists; clears feedback after submitting - 8 new tests (TestGenerateRefinement + TestTaskRunnerCoverLetterParams)	2026-02-25 14:44:20 -08:00

66 commits