Commit graph

55 commits

Author SHA1 Message Date
93fb452941 feat: add suggest_resume_keywords for skills/domains/keywords gap analysis
Replaces NotImplementedError stub with full LLM-backed implementation.
Builds a prompt from the last 3 resume positions plus already-selected
skills/domains/keywords, calls LLMRouter, and returns de-duped suggestions
in all three categories.
2026-03-05 15:00:53 -08:00
e0063e237b fix: guard mission_preferences values against non-string types in suggest_search_terms 2026-03-05 13:40:53 -08:00
bf23987c11 feat: add suggest_search_terms with three-angle exclude analysis
Replaces NotImplementedError stub with a real LLMRouter-backed implementation
that builds a structured prompt covering blocklist alias expansion, values
misalignment, and role-type filtering, then parses the JSON response into
suggested_titles and suggested_excludes lists.

Moves LLMRouter import to module level so tests can patch it at
scripts.suggest_helpers.LLMRouter.
2026-03-05 13:15:25 -08:00
ddf07c52ab fix: DEFAULT_DB respects STAGING_DB env var — was ignoring Docker-set path 2026-03-04 11:47:59 -08:00
64db154a87 feat: backup/restore script with multi-instance and legacy support
- create_backup() / restore_backup() / list_backup_contents() public API
- --base-dir PATH flag: targets any instance root (default: this repo)
  --base-dir /devl/job-seeker backs up the legacy Conda install
- _DB_CANDIDATES fallback: data/staging.db (Peregrine) or staging.db root (legacy)
- Manifest records source label (dir name), source_path, created_at, files, includes_db
- Added config/resume_keywords.yaml and config/server.yaml to backup lists
- 21 tests covering create, list, restore, legacy DB path, overwrite, roundtrip
2026-03-04 10:52:51 -08:00
aa03163f18 fix: llm_backend reads fallback_order, logs tee'd to data/.streamlit.log in Docker
Some checks failed
CI / test (pull_request) Has been cancelled
2026-03-03 15:04:18 -08:00
bdbd139b94 fix: lazy-import playwright in screenshot_page, fix SQLite connection leak in collect_listings
Some checks failed
CI / test (pull_request) Has been cancelled
2026-03-03 12:45:39 -08:00
ff40caf350 feat: feedback_api — screenshot_page with Playwright (graceful fallback) 2026-03-03 12:14:33 -08:00
e03a7171b1 feat: feedback_api — Forgejo label management + issue filing + attachment upload 2026-03-03 12:09:11 -08:00
b578b5365c feat: feedback_api — build_issue_body 2026-03-03 12:00:01 -08:00
dc40c66310 feat: feedback_api — collect_logs + collect_listings 2026-03-03 11:56:35 -08:00
19057f237b chore: remove unused imports from feedback_api (will be re-added in later tasks) 2026-03-03 11:45:14 -08:00
a1cfee4499 feat: feedback_api — mask_pii + collect_context 2026-03-03 11:43:35 -08:00
f1c26ed67e feat: DEMO_MODE — isolated public menagerie demo instance
Adds a fully neutered public demo for menagerie.circuitforge.tech/peregrine
that shows the Peregrine UI without exposing any personal data or real LLM inference.

scripts/llm_router.py:
  - Block all inference when DEMO_MODE env var is set (1/true/yes)
  - Raises RuntimeError with a user-friendly "public demo" message

app/app.py:
  - IS_DEMO constant from DEMO_MODE env var
  - Wizard gate bypassed in demo mode (demo/config/user.yaml pre-seeds a fake profile)
  - Demo banner in sidebar: explains read-only status + links to circuitforge.tech

compose.menagerie.yml (new):
  - Separate Docker Compose project (peregrine-demo) on host port 8504
  - Mounts demo/config/ and demo/data/ — isolated from personal instance
  - DEMO_MODE=true, no API keys, no /docs mount
  - Project name: peregrine-demo (run alongside personal instance)

demo/config/user.yaml:
  - Generic "Demo User" profile, wizard_complete=true, no real personal info

demo/config/llm.yaml:
  - All backends disabled (belt-and-suspenders alongside DEMO_MODE block)

demo/data/.gitkeep:
  - staging.db is auto-created on first run, gitignored via demo/data/*.db

.gitignore: add demo/data/*.db

Caddy routes menagerie.circuitforge.tech/peregrine* → 8504 (demo instance).
Personal Peregrine remains on 8502, unchanged.
2026-03-02 11:22:38 -08:00
38cde2e128 fix: RerankerAdapter falls back to label name when no LABEL_DESCRIPTIONS entry 2026-02-27 14:54:31 -08:00
a28c92a187 feat: label_tool — 9 labels, wildcard Other, InvalidCharacterError fix; sync with avocet canonical 2026-02-27 14:34:24 -08:00
7962245a0e feat: add health mission category, trim-to-sign-off, max_tokens cap for cover letters
- _MISSION_SIGNALS: add health category (pharma, clinical, patient care, etc.)
  listed last so music/animals/education/social_impact take priority
- _MISSION_DEFAULTS: health note steers toward people-first framing, not
  industry enthusiasm — focuses on patients navigating rare/invisible journeys
- _trim_to_letter_end(): cuts output at first sign-off + first name to prevent
  fine-tuned models from looping into repetitive garbage after completing letter
- generate(): pass max_tokens=1200 to router (prevents runaway output)
- user.yaml.example: add health + social_impact to mission_preferences,
  add candidate_voice field for per-user voice/personality context
2026-02-27 12:31:06 -08:00
7ef95dd9ba feat: benchmark_classifier — MODEL_REGISTRY, --list-models, --score, --compare modes 2026-02-27 06:19:32 -08:00
44c3d9a5d6 feat: add DUAL_GPU_MODE default, VRAM warning, and download size report to preflight
- Add _mixed_mode_vram_warning() to flag low VRAM on GPU 1 in mixed mode
- Wire download size report block into main() before closing border line
- Wire mixed-mode VRAM warning into report if triggered
- Write DUAL_GPU_MODE=ollama default to .env for new 2-GPU setups (no override if already set)
- Promote import os to top-level (was local import inside get_cpu_cores)
2026-02-27 00:17:00 -08:00
b03e5f6c57 feat: add _download_size_mb() pure function for preflight size warning 2026-02-27 00:15:26 -08:00
c9d7b810f6 feat: add ollama_research to preflight service table and LLM backend map 2026-02-27 00:14:04 -08:00
baa862bc14 feat: ZeroShotAdapter, GLiClassAdapter, RerankerAdapter with full mock test coverage 2026-02-27 00:10:43 -08:00
e99b3703f1 feat: ClassifierAdapter ABC + compute_metrics() with full test coverage 2026-02-27 00:09:45 -08:00
41e0fe7f55 feat: add job-seeker-classifiers conda env for HF classifier benchmark 2026-02-26 23:43:41 -08:00
15c2a1d4ef feat: bundled skills suggestion list and content filter utility
- config/skills_suggestions.yaml: 168 curated tags across skills (77),
  domains (40), keywords (51) covering CS/TAM/ops and common tech roles;
  structured for future community aggregate (paid tier backlog)
- scripts/skills_utils.py: filter_tag() rejects blanks, URLs, profanity,
  overlong strings, disallowed chars, and repeated-char runs;
  load_suggestions() reads bundled YAML per category
2026-02-26 13:09:32 -08:00
d5cf02096b fix: resume CID glyphs, resume YAML path, PyJWT dep, candidate voice & mission UI
- resume_parser: add _clean_cid() to strip (cid:NNN) glyph refs from ATS PDFs;
  CIDs 127/149/183 become bullets, unknowns are stripped; applied to PDF/DOCX/ODT
- resume YAML: canonicalize plain_text_resume.yaml path to config/ across all
  references (Settings, Apply, Setup, company_research, migrate); was pointing at
  unmounted aihawk/data_folder/ in Docker
- requirements/environment: add PyJWT>=2.8 (was missing; broke Settings page)
- user_profile: add candidate_voice field
- generate_cover_letter: inject candidate_voice into SYSTEM_CONTEXT; add
  social_impact mission signal category (nonprofit, community, equity, etc.)
- Settings: add Voice & Personality textarea to Identity expander; add
  Mission & Values expander with editable fields for all 4 mission categories
- .gitignore: exclude CLAUDE.md, config/plain_text_resume.yaml,
  config/user.yaml.working
- search_profiles: add default profile
2026-02-26 12:32:28 -08:00
48e7748b43 feat: ODT support, two-column PDF column-split extraction, title/company layout detection hardening 2026-02-26 10:33:28 -08:00
9c8b206f6b fix: harden resume section detection — anchor patterns to full line, expand header synonyms, fix name heuristic for hyphenated/middle-initial names, add parse diagnostics UI 2026-02-26 09:28:31 -08:00
ab6d7f2c87 refactor: replace LLM-based resume parser with section regex parser
Primary parse path is now fully deterministic — no LLM, no token limits,
no JSON generation. Handles two-column experience headers, institution-before-
or-after-degree education layouts, and header bleed prevention via
looks_like_header detection.

LLM path retained as optional career_summary enhancement only (1500 chars,
falls back silently). structure_resume() now returns tuple[dict, str].
Tests updated to match the new API.
2026-02-26 07:34:25 -08:00
7393ad2a14 fix: resume parser — max_tokens, json-repair fallback, logging, PYTHONUNBUFFERED 2026-02-26 00:00:23 -08:00
f196a367a6 fix: add /v1 prefix to all license server API paths 2026-02-25 23:35:58 -08:00
b7ef804cf7 feat: license.py client — verify_local, effective_tier, activate, refresh, report_usage 2026-02-25 22:53:11 -08:00
d3ab3fa460 fix: GPU detection + pdfplumber + pass GPU env vars into app container
- preflight.py now writes PEREGRINE_GPU_COUNT and PEREGRINE_GPU_NAMES to
  .env so the app container gets GPU info without needing nvidia-smi access
- compose.yml passes PEREGRINE_GPU_COUNT, PEREGRINE_GPU_NAMES, and
  RECOMMENDED_PROFILE as env vars to the app service
- 0_Setup.py _detect_gpus() reads PEREGRINE_GPU_NAMES env var first;
  falls back to nvidia-smi (bare / GPU-passthrough environments)
- 0_Setup.py _suggest_profile() reads RECOMMENDED_PROFILE env var first
- requirements.txt: add pdfplumber (needed for resume PDF parsing)
2026-02-25 21:58:28 -08:00
ecf44ea6c5 fix: stub-port adoption — stubs bind free ports, app routes to external via host.docker.internal
Three inter-related fixes for the service adoption flow:
- preflight: stub_port field — adopted services get a free port for their
  no-op container (avoids binding conflict with external service on real port)
  while update_llm_yaml still uses the real external port for host.docker.internal URLs
- preflight: write_env now uses stub_port (not resolved) for adopted services
  so SEARXNG_PORT etc point to the stub's harmless port, not the occupied one
- preflight: stub containers use sleep infinity + CMD true healthcheck so
  depends_on: service_healthy is satisfied without holding any real port
- Makefile: finetune profile changed from [cpu,single-gpu,dual-gpu] to [finetune]
  so the pytorch/cuda base image is not built during make start
2026-02-25 21:38:23 -08:00
7f8dc18a92 fix: ollama docker_owned=True; finetune gets own profile to avoid build on start
- preflight: ollama was incorrectly marked docker_owned=False — Docker does
  define an ollama service, so external detection now correctly disables it
  via compose.override.yml when host Ollama is already running
- compose.yml: finetune moves from [cpu,single-gpu,dual-gpu] profiles to
  [finetune] profile so it is never built during 'make start' (pytorch/cuda
  base is 3.7GB+ and unnecessary for the UI)
- compose.yml: remove depends_on ollama from finetune — it reaches Ollama
  via OLLAMA_URL env var which works whether Ollama is Docker or host
- Makefile: finetune target uses --profile finetune + compose.gpu.yml overlay
2026-02-25 21:24:33 -08:00
e5cf0aee36 feat: smart service adoption in preflight — use external services instead of conflicting
preflight.py now detects when a managed service (ollama, vllm, vision,
searxng) is already running on its configured port and adopts it rather
than reassigning or conflicting:

- Generates compose.override.yml disabling Docker containers for adopted
  services (profiles: [_external_] — a profile never passed via --profile)
- Rewrites config/llm.yaml base_url entries to host.docker.internal:<port>
  so the app container can reach host-side services through Docker's
  host-gateway mapping
- compose.yml: adds extra_hosts host.docker.internal:host-gateway to the
  app service (required on Linux; no-op on macOS Docker Desktop)
- .gitignore: excludes compose.override.yml (auto-generated, host-specific)

Only streamlit is non-adoptable and continues to reassign on conflict.
2026-02-25 19:23:02 -08:00
dc4a08c063 feat: wire fine-tune UI end-to-end + harden setup.sh
- setup.sh: replace docker-image-based NVIDIA test with nvidia-ctk validate
  (faster, no 100MB pull, no daemon required); add check_docker_running()
  to auto-start the Docker service on Linux or warn on macOS
- prepare_training_data.py: also scan training_data/uploads/*.{md,txt}
  so web-uploaded letters are included in training data
- task_runner.py: add prepare_training task type (calls build_records +
  write_jsonl inline; reports pair count in task result)
- Settings fine-tune tab: Step 1 accepts .md/.txt uploads; Step 2 Extract
  button submits prepare_training background task + shows status; Step 3
  shows make finetune command + live Ollama model status poller
2026-02-25 16:31:53 -08:00
4d66c04d1e feat: containerize fine-tune pipeline (Dockerfile.finetune + make finetune)
- Dockerfile.finetune: PyTorch 2.3/CUDA 12.1 base + unsloth + training stack
- finetune_local.py: auto-register model via Ollama HTTP API after GGUF
  export; path-translate between finetune container mount and Ollama's view;
  update config/llm.yaml automatically; DOCS_DIR env override for Docker
- prepare_training_data.py: DOCS_DIR env override so make prepare-training
  works correctly inside the app container
- compose.yml: add finetune service (cpu/single-gpu/dual-gpu profiles);
  DOCS_DIR=/docs injected into app + finetune containers
- compose.podman-gpu.yml: CDI device override for finetune service
- Makefile: make prepare-training + make finetune targets
2026-02-25 16:22:48 -08:00
94225c95e7 feat: cover letter iterative refinement — feedback UI + backend params
- generate() accepts previous_result + feedback; appends both to LLM prompt
- task_runner cover_letter handler parses params JSON, passes fields through
- Apply Workspace: "Refine with Feedback" expander with text area + Regenerate
  button; only shown when a draft exists; clears feedback after submitting
- 8 new tests (TestGenerateRefinement + TestTaskRunnerCoverLetterParams)
2026-02-25 14:44:20 -08:00
cfbd065bc0 feat: wizard_generate — feedback + previous_result support for iterative refinement 2026-02-25 08:29:56 -08:00
02ba54ea7a feat: wizard_generate task type — 8 LLM generation sections 2026-02-25 08:25:17 -08:00
4114abb761 feat: 13 integration implementations + config examples
Add all 13 integration modules (Notion, Google Drive, Google Sheets,
Airtable, Dropbox, OneDrive, MEGA, Nextcloud, Google Calendar, Apple
Calendar/CalDAV, Slack, Discord, Home Assistant) with fields(), connect(),
and test() implementations. Add config/integrations/*.yaml.example files
and gitignore rules for live config files. Add 5 new registry/schema
tests bringing total to 193 passing.
2026-02-25 08:18:45 -08:00
c87447e017 feat: integration base class + auto-discovery registry 2026-02-25 08:13:14 -08:00
9258a91fd6 feat: resume parser — PDF/DOCX extraction + LLM structuring 2026-02-25 08:04:48 -08:00
3b83610314 feat: wizard fields in UserProfile + params column in background_tasks
- Add tier, dev_tier_override, wizard_complete, wizard_step, dismissed_banners
  fields to UserProfile with defaults and effective_tier property
- Add params TEXT column to background_tasks table (CREATE + migration)
- Update insert_task() to accept params with params-aware dedup logic
- Update submit_task() and _run_task() to thread params through
- Add test_wizard_defaults, test_effective_tier_override,
  test_effective_tier_no_override, and test_insert_task_with_params
2026-02-25 07:27:14 -08:00
56fb386225 feat: startup preflight — port collision avoidance + resource checks
scripts/preflight.py (stdlib-only, no psutil):
- Port probing: owned services auto-reassign to next free port; external
  services (Ollama) show ✓ reachable / ⚠ not responding
- System resources: CPU cores, RAM (total + available), GPU VRAM via
  nvidia-smi; works on Linux + macOS
- Profile recommendation: remote / cpu / single-gpu / dual-gpu
- vLLM KV cache offload: calculates CPU_OFFLOAD_GB when VRAM < 10 GB
  free and RAM headroom > 4 GB (uses up to 25% of available headroom)
- Writes resolved values to .env for docker compose; single-service mode
  (--service streamlit) for scripted port queries
- Exit 0 unless an owned port genuinely can't be resolved

scripts/manage-ui.sh:
- Calls preflight.py --service streamlit before bind; falls back to
  pure-bash port scan if Python/yaml unavailable

compose.yml:
- vllm command: adds --cpu-offload-gb ${CPU_OFFLOAD_GB:-0}

Makefile:
- start / restart depend on preflight target
- PYTHON variable for env portability
- test target uses PYTHON variable

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 20:36:16 -08:00
81e7699545 feat: migration tool + portable startup scripts
scripts/migrate.py:
- dry-run by default; --apply writes files; --copy-db migrates staging.db
- generates config/user.yaml from source repo's resume + cover letter scripts
- copies gitignored configs (notion, email, adzuna, craigslist, search profiles,
  resume keywords, blocklist, aihawk resume)
- merges fine-tuned model name from source llm.yaml into dest llm.yaml

scripts/manage-ui.sh:
- STREAMLIT_BIN no longer hardcoded; auto-resolves via conda env or PATH;
  override with STREAMLIT_BIN env var

scripts/manage-vllm.sh:
- VLLM_BIN and MODEL_DIR now read from env vars with portable defaults

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 20:25:54 -08:00
35735c3074 feat: LGBTQIA+ focus + Phase 2/3 audit fixes
LGBTQIA+ inclusion section in research briefs:
- user_profile.py: add candidate_lgbtq_focus bool accessor
- user.yaml.example: add candidate_lgbtq_focus flag (default false)
- company_research.py: gate new LGBTQIA+ section behind flag; section
  count now dynamic (7 base + 1 per opt-in section, max 9)
- 2_Settings.py: add "Research Brief Preferences" expander with
  checkboxes for both accessibility and LGBTQIA+ focus flags;
  mission_preferences now round-trips through save (no silent drop)

Phase 2 fixes:
- manage-vllm.sh: MODEL_DIR and VLLM_BIN now read from env vars
  (VLLM_MODELS_DIR, VLLM_BIN) with portable defaults
- search_profiles.yaml: replace personal CS/TAM/Bay Area profiles
  with a documented generic starter profile

Phase 3 fix:
- llm.yaml: rename meghan-cover-writer:latest → llama3.2:3b with
  inline comment for users to substitute their fine-tuned model;
  fix model-exclusion comment

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 20:02:03 -08:00
0cdfb41201 fix: remove hardcoded personal values — Phase 1 audit findings
- 3_Resume_Editor.py: replace "Meghan's" in docstring and caption
- user_profile.py: expose mission_preferences and candidate_accessibility_focus
- user.yaml.example: add mission_preferences section + candidate_accessibility_focus flag
- generate_cover_letter.py: build _MISSION_NOTES from user profile instead of
  hardcoded personal passion notes; falls back to generic defaults when not set
- company_research.py: gate "Inclusion & Accessibility" section behind
  candidate_accessibility_focus flag; section count adjusts (7 or 8) accordingly

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 19:57:03 -08:00
1ac559df0a feat: add vision service to compose stack and fine-tune wizard tab to Settings
- Add moondream2 vision service to compose.yml (single-gpu + dual-gpu profiles)
- Create scripts/vision_service/Dockerfile for the vision container
- Add VISION_PORT, VISION_MODEL, VISION_REVISION vars to .env.example
- Add Vision Service entry to SERVICES list in Settings (hidden unless gpu profile active)
- Add Fine-Tune Wizard tab (Task 10) to Settings with 3-step upload→preview→train flow
- Tab is always rendered; shows info message when non-GPU profile is active

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 19:37:55 -08:00