Compare commits
227 commits: da43578806 ... b4116e8bae
Commits in this range (SHA1 only):

b4116e8bae 00a567768b 1ce283bb79 ab564741f4 869cb2f197 27d6fc01fc e034a07509 2b9a6c8a22 e62548a22e 3267a895b0
522534d28e 37119cb332 8d9e17d749 4d08e64acf fc6ef88a05 952b21377f 9c87ed1cf2 a1a1141616 27d4b0e732 95378c106e
07c627cdb0 bcd918fb67 207d3816b3 3984a9c743 4d055f6bcd 28e66001a3 535c0ae9e0 3d7f6f7ff1 52470759a4 d51066e8c2
905db2f147 eef2478948 beb1553821 61dc2122e4 0f80b698ff 097def4bba 1a50bc1392 d1fb4abd56 6c7499752c 42f0e6261c
1e12da45f1 b80e4de050 7489c1c12a 97ab8b94e5 bd0e9240eb 5344dc8e7a fba6796b8a f759f5fbc0 530f4346d1 db26b9aaf9
97b695c3e3 72320315e2 37dcdec754 ce19e00cfe 8f9955fa96 5a1fceda84 634e31968f 2fdf6f725e fbd47368ff 2124b24e3d
88f28c2b41 28cc03ba70 7de630e065 1cf6e370b1 9d2ed1d00d 1b500b9f26 d1c5c89da7 bf8eee8a62 d3f86f2143 8da36f251c
89f11b0cae 84862b8ab8 5827386789 7ca348b97f 329baf013f 67634d459a 5124d18770 92e0ea0ba1 0e30096a88 2bae1a92ed
dbcd2710ae 5f1c372c0a efe71150e3 8166204c05 11997f8a13 e5d606ab4b db3dff268a e9b389feb6 483ca00f1a ecad32cd6f
d05cb91401 3d17122334 2ab396bad0 199daebb87 f7f438df70 e1f65d8fe9 20f9933e99 60dab647f2 cad7b9ba35 5f466fa107
c3dc05fe34 1efb033b6f 2d9b8d10f9 791e11d5d5 86613d0218 5254212cb4 435f2e71fd 0d6aa5975e 476ede4267 a2f4102d78
0306b3716d adc3526470 75499bc250 1e5d354209 bc7e3c8952 044b25e838 43bf30fac5 39e8194679 7dab560938 30a2962797
9b24599832 7e96e57d92 6febea216e 207fbdbb69 ca1e4b062a 88908ceca2 be28aba07f 637e8379b6 128ab11763 efc7a1f0bc
e4b6456bc9 488fa71891 ea708321e4 85f0f648b0 2df61eedd2 a7fe4d9ff4 ae7c985fab 6dd89a0863 c287392c39 b4f7a7317d
2fe0e0e2f2 657f9c4060 3b2870ddf1 bef92d667e de8fb1ddc7 fe09e23f4c 8caf7b6356 8887955e7d d13505e760 64487a6abb
84b9490f46 e54208fc14 01a341e4c5 d6545cf496 9fb207c15c f35fec33e9 35056161d7 8ff134addd 5739d1935b 52f912f938
124b950ca3 c3f3fa97a7 26fc97dfe5 8e3f58cf46 2662bab1e6 0174a5396d d0371e8525 3aac7b167f 5e63cd731c 946924524d
feb7bab43e e94695ef1a 4e1748ca62 67aaf7c0b7 11662dde4a f26f948377 6258b9e34d bd326162f1 f08f1b16d0 bdbbc06702
46d10f5daa d8348e4906 a149b65d5d f9e974a957 f78ac24657 41019269a2 41c7954b9d 85e8034093 09a4b38a99 e1cc0e9210
7efbf95840 350591bc48 ca17994e00 fd215a22f6 1a74793804 4c7f74c669 4748cd3672 51e48f8eee 9b0ca6457a 3f85c00359
beb32e576d d3b941134e 27112c7ed2 0546c0e289 1dbb91dc31 edb169959a eac747d999 5d2428f1b9 dc770d151b e332b8a069
c7fb9a00f1 7abf753469 cf185dfbaf 633a7f2d1c af5237e3c2 f13c49d5f1 1a68b07076 aacde4f623 bb656194e1 e40128e289
46790a64d3 306c90c9da 33d3994fb8 a8fa1eb115 f28d91d4d7 af41d14241 6493cf5c5b
217 changed files with 24581 additions and 13480 deletions

.dockerignore (new file, 20 lines)
@@ -0,0 +1,20 @@
.git
__pycache__
*.pyc
*.pyo
staging.db
config/user.yaml
config/notion.yaml
config/email.yaml
config/tokens.yaml
config/craigslist.yaml
.streamlit.pid
.streamlit.log
aihawk/
docs/
tests/
.env
data/
log/
unsloth_compiled_cache/
resume_matcher/

.env.example (new file, 38 lines)
@@ -0,0 +1,38 @@
# .env.example — copy to .env
# Auto-generated by the setup wizard, or fill in manually.
# NEVER commit .env to git.

STREAMLIT_PORT=8501
OLLAMA_PORT=11434
VLLM_PORT=8000
SEARXNG_PORT=8888
VISION_PORT=8002
VISION_MODEL=vikhyatk/moondream2
VISION_REVISION=2025-01-09

DOCS_DIR=~/Documents/JobSearch
OLLAMA_MODELS_DIR=~/models/ollama
VLLM_MODELS_DIR=~/models/vllm
VLLM_MODEL=Ouro-1.4B
OLLAMA_DEFAULT_MODEL=llama3.2:3b

# API keys (required for remote profile)
ANTHROPIC_API_KEY=
OPENAI_COMPAT_URL=
OPENAI_COMPAT_KEY=

# Feedback button — Forgejo issue filing
FORGEJO_API_TOKEN=
FORGEJO_REPO=pyr0ball/peregrine
FORGEJO_API_URL=https://git.opensourcesolarpunk.com/api/v1
# GITHUB_TOKEN=  # future — enable when public mirror is active
# GITHUB_REPO=   # future

# Cloud multi-tenancy (compose.cloud.yml only — do not set for local installs)
CLOUD_MODE=false
CLOUD_DATA_ROOT=/devl/menagerie-data
DIRECTUS_JWT_SECRET=  # must match website/.env DIRECTUS_SECRET value
CF_SERVER_SECRET=     # random 64-char hex — generate: openssl rand -hex 32
PLATFORM_DB_URL=postgresql://cf_platform:<password>@host.docker.internal:5433/circuitforge_platform
HEIMDALL_URL=http://cf-license:8000  # internal Docker URL; override for external access
HEIMDALL_ADMIN_TOKEN=  # must match ADMIN_TOKEN in circuitforge-license .env
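
A note on usage: nothing below is part of the committed file. It is a minimal sketch of how the template is meant to be consumed, using only the target filename and the `openssl rand -hex 32` command already mentioned in the comments above.

```bash
# Copy the template, then fill in values; .env itself is ignored by git and Docker
cp .env.example .env

# Generate the random 64-char hex value the CF_SERVER_SECRET comment asks for
openssl rand -hex 32
```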

.gitea/ISSUE_TEMPLATE/bug_report.md (new file, 30 lines)
@@ -0,0 +1,30 @@
---
name: Bug report
about: Something isn't working correctly
labels: bug
---

## Describe the bug

<!-- A clear description of what went wrong. -->

## Steps to reproduce

1.
2.
3.

## Expected behaviour

## Actual behaviour

<!-- Paste relevant log output below (redact any API keys or personal info): -->

```

## Environment

- Peregrine version: <!-- output of `./manage.sh status` or git tag -->
- OS:
- Runtime: Docker / conda-direct
- GPU profile: remote / cpu / single-gpu / dual-gpu

.gitea/ISSUE_TEMPLATE/feature_request.md (new file, 26 lines)
@@ -0,0 +1,26 @@
---
name: Feature request
about: Suggest an improvement or new capability
labels: enhancement
---

## Problem statement

<!-- What are you trying to do that's currently hard or impossible? -->

## Proposed solution

## Alternatives considered

## Which tier would this belong to?

- [ ] Free
- [ ] Paid
- [ ] Premium
- [ ] Ultra (human-in-the-loop)
- [ ] Not sure

## Would you be willing to contribute a PR?

- [ ] Yes
- [ ] No

.githooks/commit-msg (new executable file, 32 lines)
@@ -0,0 +1,32 @@
#!/usr/bin/env bash
# .githooks/commit-msg — enforces conventional commit format
# Format: type: description OR type(scope): description
set -euo pipefail

RED='\033[0;31m'; YELLOW='\033[1;33m'; NC='\033[0m'

VALID_TYPES="feat|fix|docs|chore|test|refactor|perf|ci|build"
MSG_FILE="$1"
MSG=$(head -1 "$MSG_FILE")

if [[ -z "${MSG// }" ]]; then
    echo -e "${RED}Commit rejected:${NC} Commit message is empty."
    exit 1
fi

if ! echo "$MSG" | grep -qE "^($VALID_TYPES)(\(.+\))?: .+"; then
    echo -e "${RED}Commit rejected:${NC} Message does not follow conventional commit format."
    echo ""
    echo -e "  Required: ${YELLOW}type: description${NC} or ${YELLOW}type(scope): description${NC}"
    echo -e "  Valid types: ${YELLOW}$VALID_TYPES${NC}"
    echo ""
    echo -e "  Your message: ${YELLOW}$MSG${NC}"
    echo ""
    echo -e "  Examples:"
    echo -e "    ${YELLOW}feat: add cover letter refinement${NC}"
    echo -e "    ${YELLOW}fix(wizard): handle missing user.yaml gracefully${NC}"
    echo -e "    ${YELLOW}docs: update tier system reference${NC}"
    exit 1
fi

exit 0
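
Since the hook only inspects the first line of the message with a single `grep -E`, a candidate message can be dry-run against the same pattern before committing. This is an illustrative check, not part of the repository:

```bash
VALID_TYPES="feat|fix|docs|chore|test|refactor|perf|ci|build"

# Passes: conventional type plus optional scope
echo "fix(wizard): handle missing user.yaml gracefully" \
  | grep -qE "^($VALID_TYPES)(\(.+\))?: .+" && echo accepted || echo rejected

# Fails: no type prefix, so the hook would reject it
echo "update stuff" \
  | grep -qE "^($VALID_TYPES)(\(.+\))?: .+" && echo accepted || echo rejected
```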

.githooks/pre-commit (new executable file, 84 lines)
@@ -0,0 +1,84 @@
#!/usr/bin/env bash
# .githooks/pre-commit — blocks sensitive files and credential patterns from being committed
set -euo pipefail

RED='\033[0;31m'; YELLOW='\033[1;33m'; BOLD='\033[1m'; NC='\033[0m'

BLOCKED=0
STAGED=$(git diff --cached --name-only --diff-filter=ACM 2>/dev/null)

if [[ -z "$STAGED" ]]; then
    exit 0
fi

# ── Blocked filenames ──────────────────────────────────────────────────────────
BLOCKED_FILES=(
    ".env"
    ".env.local"
    ".env.production"
    ".env.staging"
    "*.pem"
    "*.key"
    "*.p12"
    "*.pfx"
    "id_rsa"
    "id_ecdsa"
    "id_ed25519"
    "id_dsa"
    "*.ppk"
    "secrets.yml"
    "secrets.yaml"
    "credentials.json"
    "service-account*.json"
    "*.keystore"
    "htpasswd"
    ".htpasswd"
)

while IFS= read -r file; do
    filename="$(basename "$file")"
    for pattern in "${BLOCKED_FILES[@]}"; do
        # shellcheck disable=SC2254
        case "$filename" in
            $pattern)
                echo -e "${RED}BLOCKED:${NC} ${BOLD}$file${NC} matches blocked filename pattern '${YELLOW}$pattern${NC}'"
                BLOCKED=1
                ;;
        esac
    done
done <<< "$STAGED"

# ── Blocked content patterns ───────────────────────────────────────────────────
declare -A CONTENT_PATTERNS=(
    ["RSA/EC private key header"]="-----BEGIN (RSA|EC|DSA|OPENSSH) PRIVATE KEY"
    ["AWS access key"]="AKIA[0-9A-Z]{16}"
    ["GitHub token"]="ghp_[A-Za-z0-9]{36}"
    ["Generic API key assignment"]="(api_key|API_KEY|secret_key|SECRET_KEY)\s*=\s*['\"][A-Za-z0-9_\-]{16,}"
    ["Stripe secret key"]="sk_(live|test)_[A-Za-z0-9]{24,}"
    ["Forgejo/Gitea token (40 hex chars)"]="[a-f0-9]{40}"
)

while IFS= read -r file; do
    # Skip binary files
    if git diff --cached -- "$file" | grep -qP "^\+.*\x00"; then
        continue
    fi
    for label in "${!CONTENT_PATTERNS[@]}"; do
        pattern="${CONTENT_PATTERNS[$label]}"
        matches=$(git diff --cached -- "$file" | grep "^+" | grep -cP "$pattern" 2>/dev/null || true)
        if [[ "$matches" -gt 0 ]]; then
            echo -e "${RED}BLOCKED:${NC} ${BOLD}$file${NC} contains pattern matching '${YELLOW}$label${NC}'"
            BLOCKED=1
        fi
    done
done <<< "$STAGED"

# ── Result ─────────────────────────────────────────────────────────────────────
if [[ "$BLOCKED" -eq 1 ]]; then
    echo ""
    echo -e "${RED}Commit rejected.${NC} Remove sensitive files/content before committing."
    echo -e "To bypass in an emergency: ${YELLOW}git commit --no-verify${NC} (use with extreme caution)"
    exit 1
fi

exit 0
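
The hook reads the staged diff, so it can be exercised directly without making a commit. The snippet below is a hedged smoke test (the fake key value and throwaway filename are invented for illustration): it stages a file containing an AWS-style key, runs the hook, and then unstages the file.

```bash
# Stage a file with a fake AWS-style access key (AKIA + 16 uppercase letters/digits)
printf 'api_key = "AKIAIOSFODNN7EXAMPLE"\n' > scratch_secret_test.py
git add scratch_secret_test.py

# Run the hook directly; it should print BLOCKED lines and exit non-zero
bash .githooks/pre-commit || echo "hook blocked the staged secret as expected"

# Clean up
git rm --cached -q scratch_secret_test.py && rm scratch_secret_test.py
```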

.github/ISSUE_TEMPLATE/bug_report.md (new file, vendored, 30 lines)
@@ -0,0 +1,30 @@
---
name: Bug report
about: Something isn't working correctly
labels: bug
---

## Describe the bug

<!-- A clear description of what went wrong. -->

## Steps to reproduce

1.
2.
3.

## Expected behaviour

## Actual behaviour

<!-- Paste relevant log output below (redact any API keys or personal info): -->

```

## Environment

- Peregrine version: <!-- output of `./manage.sh status` or git tag -->
- OS:
- Runtime: Docker / conda-direct
- GPU profile: remote / cpu / single-gpu / dual-gpu

.github/ISSUE_TEMPLATE/config.yml (new file, vendored, 5 lines)
@@ -0,0 +1,5 @@
blank_issues_enabled: false
contact_links:
  - name: Security vulnerability
    url: mailto:security@circuitforge.tech
    about: Do not open a public issue for security vulnerabilities. Email us instead.

.github/ISSUE_TEMPLATE/feature_request.md (new file, vendored, 26 lines)
@@ -0,0 +1,26 @@
---
name: Feature request
about: Suggest an improvement or new capability
labels: enhancement
---

## Problem statement

<!-- What are you trying to do that's currently hard or impossible? -->

## Proposed solution

## Alternatives considered

## Which tier would this belong to?

- [ ] Free
- [ ] Paid
- [ ] Premium
- [ ] Ultra (human-in-the-loop)
- [ ] Not sure

## Would you be willing to contribute a PR?

- [ ] Yes
- [ ] No

.github/ISSUE_TEMPLATE/support_request.md (new file, vendored, 26 lines)
@@ -0,0 +1,26 @@
---
name: Support Request
about: Ask a question or get help using Peregrine
title: '[Support] '
labels: question
assignees: ''
---

## What are you trying to do?

<!-- Describe what you're trying to accomplish -->

## What have you tried?

<!-- Steps you've already taken, docs you've read, etc. -->

## Environment

- OS: <!-- e.g. Ubuntu 22.04, macOS 14 -->
- Install method: <!-- Docker / Podman / source -->
- Peregrine version: <!-- run `./manage.sh status` or check the UI footer -->
- LLM backend: <!-- Ollama / vLLM / OpenAI / other -->

## Logs or screenshots

<!-- Paste relevant output from `./manage.sh logs` or attach a screenshot -->

.github/pull_request_template.md (new file, vendored, 27 lines)
@@ -0,0 +1,27 @@
## Summary

<!-- What does this PR do? -->

## Related issue(s)

Closes #

## Type of change

- [ ] feat — new feature
- [ ] fix — bug fix
- [ ] docs — documentation only
- [ ] chore — tooling, deps, refactor
- [ ] test — test coverage

## Testing

<!-- What did you run to verify this works? -->

```bash
pytest tests/ -v
```

## CLA

- [ ] I agree that my contribution is licensed under the project's [BSL 1.1](./LICENSE-BSL) terms.

.github/workflows/ci.yml (new file, vendored, 29 lines)
@@ -0,0 +1,29 @@
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Install system dependencies
        run: sudo apt-get update -q && sudo apt-get install -y libsqlcipher-dev

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: pip

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Run tests
        run: pytest tests/ -v --tb=short
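
The workflow boils down to three shell steps, so the same checks can be reproduced on a local Debian/Ubuntu machine before pushing. A sketch of the equivalent commands, assuming a Python 3.11 environment is already active:

```bash
# Mirror of the CI job's steps on a local Ubuntu host
sudo apt-get update -q && sudo apt-get install -y libsqlcipher-dev
pip install -r requirements.txt
pytest tests/ -v --tb=short
```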

.gitignore (modified, vendored, +30 lines)
@@ -18,3 +18,33 @@ log/
 unsloth_compiled_cache/
 data/survey_screenshots/*
 !data/survey_screenshots/.gitkeep
+config/user.yaml
+config/plain_text_resume.yaml
+config/.backup-*
+config/integrations/*.yaml
+!config/integrations/*.yaml.example
+
+# companyScraper runtime artifacts
+scrapers/.cache/
+scrapers/.debug/
+scrapers/raw_scrapes/
+
+compose.override.yml
+config/license.json
+config/user.yaml.working
+
+# Claude context files — kept out of version control
+CLAUDE.md
+
+data/email_score.jsonl
+data/email_label_queue.jsonl
+data/email_compare_sample.jsonl
+
+config/label_tool.yaml
+config/server.yaml
+
+demo/data/*.db
+demo/seed_demo.py
+
+# Git worktrees
+.worktrees/

.gitleaks.toml (new file, 32 lines)
@@ -0,0 +1,32 @@
# peregrine/.gitleaks.toml — per-repo allowlists extending the shared base config
[extend]
path = "/Library/Development/CircuitForge/circuitforge-hooks/gitleaks.toml"

[allowlist]
description = "Peregrine-specific allowlists"
paths = [
    'docs/plans/.*',       # plan docs contain example tokens and placeholders
    'docs/reference/.*',   # reference docs (globally excluded in base config)
    'tests/.*',            # test fixtures use fake phone numbers as job IDs
    'scripts/integrations/apple_calendar\.py',  # you@icloud.com is a placeholder comment
    # Streamlit app files: key= params are widget identifiers, not secrets
    'app/feedback\.py',
    'app/pages/2_Settings\.py',
    'app/pages/7_Survey\.py',
    # SearXNG default config: change-me-in-production is a well-known public placeholder
    'docker/searxng/settings\.yml',
]
regexes = [
    # Job listing numeric IDs (look like phone numbers to the phone rule)
    '\d{10}\.html',   # Craigslist listing IDs
    '\d{10}\/',       # LinkedIn job IDs in URLs
    # Localhost port patterns (look like phone numbers)
    'localhost:\d{4,5}',
    # Unix epoch timestamps in the 2025–2026 range (10-digit, look like phone numbers)
    '174\d{7}',
    # Example / placeholder license key patterns
    'CFG-[A-Z]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}',
    # Phone number false positives: 555 area code variants not caught by base allowlist
    '555\) \d{3}-\d{4}',
    '555-\d{3}-\d{4}',
]
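
The `[extend]` block pulls in a shared base config from a path on the maintainer's machine, and the allowlists above only suppress known false positives on top of it. To check the rules locally, an ordinary gitleaks v8 invocation against this config should work; the command below is standard gitleaks usage, not something this commit adds:

```bash
# Scan the working tree using the repo-level config (which extends the shared base)
gitleaks detect --source . --config .gitleaks.toml
```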

CHANGELOG.md (new file, 129 lines)
@@ -0,0 +1,129 @@
# Changelog

All notable changes to Peregrine are documented here.
Format follows [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

---

## [Unreleased]

---

## [0.4.0] — 2026-03-13

### Added
- **LinkedIn profile import** — one-click import from a public LinkedIn profile URL
  (Playwright headless Chrome, no login required) or from a LinkedIn data export zip.
  Staged to `linkedin_stage.json` so the profile is parsed once and reused across
  sessions without repeated network requests. Available on all tiers including Free.
- `scripts/linkedin_utils.py` — HTML parser with ordered CSS selector fallbacks;
  extracts name, experience, education, skills, certifications, summary
- `scripts/linkedin_scraper.py` — Playwright URL scraper + export zip CSV parser;
  atomic staging file write; URL validation; robust error handling
- `scripts/linkedin_parser.py` — staging file reader; re-runs HTML parser on stored
  raw HTML so selector improvements apply without re-scraping
- `app/components/linkedin_import.py` — shared Streamlit widget (status bar, preview,
  URL import, advanced zip upload) used by both wizard and Settings
- Wizard step 3: new "🔗 LinkedIn" tab alongside Upload and Build Manually
- Settings → Resume Profile: collapsible "Import from LinkedIn" expander
- Dockerfile: Playwright Chromium install added to Docker image

### Fixed
- **Cloud mode perpetual onboarding loop** — wizard gate in `app.py` now reads
  `get_config_dir()/user.yaml` (per-user in cloud, repo-level locally) instead of a
  hardcoded repo path; completing the wizard now correctly exits it in cloud mode
- **Cloud resume YAML path** — wizard step 3 writes resume to per-user `CONFIG_DIR`
  instead of the shared repo `config/` (would have merged all cloud users' data)
- **Cloud session redirect** — missing/invalid session token now JS-redirects to
  `circuitforge.tech/login` instead of showing a raw error message
- Removed remaining AIHawk UI references (`Home.py`, `4_Apply.py`, `migrate.py`)

---

## [0.3.0] — 2026-03-06

### Added
- **Feedback button** — in-app issue reporting with screenshot paste support; posts
  directly to Forgejo as structured issues; available from sidebar on all pages
  (`app/feedback.py`, `scripts/feedback_api.py`, `app/components/paste_image.py`)
- **BYOK cloud backend detection** — `scripts/byok_guard.py`: pure Python detection
  engine with full unit test coverage (18 tests); classifies backends as cloud or local
  based on type, `base_url` heuristic, and opt-out `local: true` flag
- **BYOK activation warning** — one-time acknowledgment required in Settings when a
  new cloud LLM backend is enabled; shows data inventory (what leaves your machine,
  what stays local), provider policy links; ack state persisted to `config/user.yaml`
  under `byok_acknowledged_backends`
- **Sidebar cloud LLM indicator** — amber badge on every page when any cloud backend
  is active; links to Settings; disappears when reverted to local-only config
- **LLM suggest: search terms** — three-angle analysis from resume (job titles,
  skills keywords, and exclude terms to filter irrelevant listings)
- **LLM suggest: resume keywords** — skills gap analysis against job descriptions
- **LLM Suggest button** in Settings → Search → Skills & Keywords section
- **Backup/restore script** (`scripts/backup.py`) — multi-instance and legacy support
- `PRIVACY.md` — short-form privacy notice linked from Settings

### Changed
- Settings save button for LLM Backends now gates on cloud acknowledgment before
  writing `config/llm.yaml`

### Fixed
- Settings widget crash on certain rerun paths
- Docker service controls in Settings → System tab
- `DEFAULT_DB` now respects `STAGING_DB` environment variable (was silently ignoring it)
- `generate()` in cover letter refinement now correctly passes `max_tokens` kwarg

### Security / Privacy
- Full test suite anonymized — fictional "Alex Rivera" replaces all real personal data
  in test fixtures (`tests/test_cover_letter.py`, `test_imap_sync.py`,
  `test_classifier_adapters.py`, `test_db.py`)
- Complete PII scrub from git history: real name, email address, and phone number
  removed from all 161 commits across both branches via `git filter-repo`

---

## [0.2.0] — 2026-02-26

### Added
- Cover letter iterative refinement: "Refine with Feedback" expander in Apply Workspace;
  `generate()` accepts `previous_result`/`feedback`; task params passed through `submit_task`
- Expanded first-run wizard: 7-step onboarding with GPU detection, tier selection,
  resume upload/parsing, LLM inference test, search profile builder, integration cards
- Tier system: free / paid / premium feature gates (`app/wizard/tiers.py`)
- 13 integration drivers: Notion, Google Sheets, Airtable, Google Drive, Dropbox,
  OneDrive, MEGA, Nextcloud, Google Calendar, Apple Calendar, Slack, Discord,
  Home Assistant — with auto-discovery registry
- Resume parser: PDF (pdfplumber) and DOCX (python-docx) + LLM structuring
- `wizard_generate` background task type with iterative refinement (feedback loop)
- Dismissible setup banners on Home page (13 contextual prompts)
- Developer tab in Settings: tier override selectbox and wizard reset button
- Integrations tab in Settings: connect / test / disconnect all 12 non-Notion drivers
- HuggingFace token moved to Developer tab
- `params` column in `background_tasks` for wizard task payloads
- `wizard_complete`, `wizard_step`, `tier`, `dev_tier_override`, `dismissed_banners`,
  `effective_tier` added to UserProfile
- MkDocs documentation site (Material theme, 20 pages)
- `LICENSE-MIT` and `LICENSE-BSL`, `CONTRIBUTING.md`, `CHANGELOG.md`

### Changed
- `app.py` wizard gate now checks `wizard_complete` flag in addition to file existence
- Settings tabs reorganised: Integrations tab added, Developer tab conditionally shown
- HF token removed from Services tab (now Developer-only)

### Removed
- Dead `app/pages/3_Resume_Editor.py` (functionality lives in Settings → Resume Profile)

---

## [0.1.0] — 2026-02-01

### Added
- Initial release: JobSpy discovery pipeline, SQLite staging, Streamlit UI
- Job Review, Apply Workspace, Interviews kanban, Interview Prep, Survey Assistant
- LLM router with fallback chain (Ollama, vLLM, Claude Code wrapper, Anthropic)
- Notion sync, email sync with IMAP classifier, company research with SearXNG
- Background task runner with daemon threads
- Vision service (moondream2) for survey screenshot analysis
- Adzuna, The Ladders, and Craigslist custom board scrapers
- Docker Compose profiles: remote, cpu, single-gpu, dual-gpu
- `setup.sh` cross-platform dependency installer
- `scripts/preflight.py` and `scripts/migrate.py`

CLAUDE.md (deleted, 212 lines removed)
@@ -1,212 +0,0 @@
# Job Seeker Platform — Claude Context

## Project
Automated job discovery + resume matching + application pipeline for Alex Rivera.

Full pipeline:
```
JobSpy → discover.py → SQLite (staging.db) → match.py → Job Review UI
        → Apply Workspace (cover letter + PDF) → Interviews kanban
        → phone_screen → interviewing → offer → hired
        ↓
      Notion DB (synced via sync.py)
```

## Environment
- Python env: `conda run -n job-seeker <cmd>` — always use this, never bare python
- Run tests: `/devl/miniconda3/envs/job-seeker/bin/pytest tests/ -v`
  (use direct binary — `conda run pytest` can spawn runaway processes)
- Run discovery: `conda run -n job-seeker python scripts/discover.py`
- Recreate env: `conda env create -f environment.yml`
- pytest.ini scopes test collection to `tests/` only — never widen this

## ⚠️ AIHawk env isolation — CRITICAL
- NEVER `pip install -r aihawk/requirements.txt` into the job-seeker env
- AIHawk pulls torch + CUDA (~7GB) which causes OOM during test runs
- AIHawk must run in its own env: `conda create -n aihawk-env python=3.12`
- job-seeker env must stay lightweight (no torch, no sentence-transformers, no CUDA)

## Web UI (Streamlit)
- Run: `bash scripts/manage-ui.sh start` → http://localhost:8501
- Manage: `start | stop | restart | status | logs`
- Direct binary: `/devl/miniconda3/envs/job-seeker/bin/streamlit run app/app.py`
- Entry point: `app/app.py` (uses `st.navigation()` — do NOT run `app/Home.py` directly)
- `staging.db` is gitignored — SQLite staging layer between discovery and Notion

### Pages
| Page | File | Purpose |
|------|------|---------|
| Home | `app/Home.py` | Dashboard, discovery trigger, danger-zone purge |
| Job Review | `app/pages/1_Job_Review.py` | Batch approve/reject with sorting |
| Settings | `app/pages/2_Settings.py` | LLM backends, search profiles, Notion, services |
| Resume Profile | Settings → Resume Profile tab | Edit AIHawk YAML profile (was standalone `3_Resume_Editor.py`) |
| Apply Workspace | `app/pages/4_Apply.py` | Cover letter gen + PDF export + mark applied + reject listing |
| Interviews | `app/pages/5_Interviews.py` | Kanban: phone_screen→interviewing→offer→hired |
| Interview Prep | `app/pages/6_Interview_Prep.py` | Live reference sheet during calls + Practice Q&A |
| Survey Assistant | `app/pages/7_Survey.py` | Culture-fit survey help: text paste + screenshot (moondream2) |

## Job Status Pipeline
```
pending → approved/rejected (Job Review)
approved → applied (Apply Workspace — mark applied)
approved → rejected (Apply Workspace — reject listing button)
applied → survey (Interviews — "📋 Survey" button; pre-kanban section)
applied → phone_screen (Interviews — triggers company research)
survey → phone_screen (Interviews — after survey completed)
phone_screen → interviewing
interviewing → offer
offer → hired
any stage → rejected (rejection_stage captured for analytics)
applied/approved → synced (sync.py → Notion)
```

## SQLite Schema (`staging.db`)
### `jobs` table key columns
- Standard: `id, title, company, url, source, location, is_remote, salary, description`
- Scores: `match_score, keyword_gaps`
- Dates: `date_found, applied_at, survey_at, phone_screen_at, interviewing_at, offer_at, hired_at`
- Interview: `interview_date, rejection_stage`
- Content: `cover_letter, notion_page_id`

### Additional tables
- `job_contacts` — email thread log per job (direction, subject, from/to, body, received_at)
- `company_research` — LLM-generated brief per job (company_brief, ceo_brief, talking_points, raw_output, accessibility_brief)
- `background_tasks` — async LLM task queue (task_type, job_id, status: queued/running/completed/failed)
- `survey_responses` — per-job Q&A pairs (survey_name, received_at, source, raw_input, image_path, mode, llm_output, reported_score)

## Scripts
| Script | Purpose |
|--------|---------|
| `scripts/discover.py` | JobSpy + custom board scrape → SQLite insert |
| `scripts/custom_boards/adzuna.py` | Adzuna Jobs API (app_id + app_key in config/adzuna.yaml) |
| `scripts/custom_boards/theladders.py` | The Ladders scraper via curl_cffi + __NEXT_DATA__ SSR parse |
| `scripts/match.py` | Resume keyword matching → match_score |
| `scripts/sync.py` | Push approved/applied jobs to Notion |
| `scripts/llm_router.py` | LLM fallback chain (reads config/llm.yaml) |
| `scripts/generate_cover_letter.py` | Cover letter via LLM; detects mission-aligned companies (music/animal welfare/education) and injects Para 3 hint |
| `scripts/company_research.py` | Pre-interview brief via LLM + optional SearXNG scrape; includes Inclusion & Accessibility section |
| `scripts/prepare_training_data.py` | Extract cover letter JSONL for fine-tuning |
| `scripts/finetune_local.py` | Unsloth QLoRA fine-tune on local GPU |
| `scripts/db.py` | All SQLite helpers (single source of truth) |
| `scripts/task_runner.py` | Background thread executor — `submit_task(db, type, job_id)` dispatches daemon threads for LLM jobs |
| `scripts/vision_service/main.py` | FastAPI moondream2 inference on port 8002; `manage-vision.sh` lifecycle |

## LLM Router
- Config: `config/llm.yaml`
- Cover letter fallback order: `claude_code → ollama (alex-cover-writer:latest) → vllm → copilot → anthropic`
- Research fallback order: `claude_code → vllm (__auto__, ouroboros) → ollama_research (llama3.1:8b) → ...`
- `alex-cover-writer:latest` is cover-letter only — it doesn't follow structured markdown prompts for research
- `LLMRouter.complete()` accepts `fallback_order=` override for per-task routing
- `LLMRouter.complete()` accepts `images: list[str]` (base64) — vision backends only; non-vision backends skipped when images present
- Vision fallback order config key: `vision_fallback_order: [vision_service, claude_code, anthropic]`
- `vision_service` backend type: POST to `/analyze`; skipped automatically when no images provided
- Claude Code wrapper: `/Library/Documents/Post Fight Processing/server-openai-wrapper-v2.js`
- Copilot wrapper: `/Library/Documents/Post Fight Processing/manage-copilot.sh start`

## Fine-Tuned Model
- Model: `alex-cover-writer:latest` registered in Ollama
- Base: `unsloth/Llama-3.2-3B-Instruct` (QLoRA, rank 16, 10 epochs)
- Training data: 62 cover letters from `/Library/Documents/JobSearch/`
- JSONL: `/Library/Documents/JobSearch/training_data/cover_letters.jsonl`
- Adapter: `/Library/Documents/JobSearch/training_data/finetune_output/adapter/`
- Merged: `/Library/Documents/JobSearch/training_data/gguf/alex-cover-writer/`
- Re-train: `conda run -n ogma python scripts/finetune_local.py`
  (uses `ogma` env with unsloth + trl; pin to GPU 0 with `CUDA_VISIBLE_DEVICES=0`)

## Background Tasks
- Cover letter gen and company research run as daemon threads via `scripts/task_runner.py`
- Tasks survive page navigation; results written to existing tables when done
- On server restart, `app.py` startup clears any stuck `running`/`queued` rows to `failed`
- Dedup: only one queued/running task per `(task_type, job_id)` at a time
- Sidebar indicator (`app/app.py`) polls every 3s via `@st.fragment(run_every=3)`
- ⚠️ Streamlit fragment + sidebar: use `with st.sidebar: _fragment()` — sidebar context must WRAP the call, not be inside the fragment body

## Vision Service
- Script: `scripts/vision_service/main.py` (FastAPI, port 8002)
- Model: `vikhyatk/moondream2` revision `2025-01-09` — lazy-loaded on first `/analyze` (~1.8GB download)
- GPU: 4-bit quantization when CUDA available (~1.5GB VRAM); CPU fallback
- Conda env: `job-seeker-vision` — separate from job-seeker (torch + transformers live here)
- Create env: `conda env create -f scripts/vision_service/environment.yml`
- Manage: `bash scripts/manage-vision.sh start|stop|restart|status|logs`
- Survey page degrades gracefully to text-only when vision service is down
- ⚠️ Never install vision deps (torch, bitsandbytes, transformers) into the job-seeker env

## Company Research
- Script: `scripts/company_research.py`
- Auto-triggered when a job moves to `phone_screen` in the Interviews kanban
- Three-phase: (1) SearXNG company scrape → (1b) SearXNG news snippets → (2) LLM synthesis
- SearXNG scraper: `/Library/Development/scrapers/companyScraper.py`
- SearXNG Docker: run `docker compose up -d` from `/Library/Development/scrapers/SearXNG/` (port 8888)
- `beautifulsoup4` and `fake-useragent` are installed in job-seeker env (required for scraper)
- News search hits `/search?format=json` — JSON format must be enabled in `searxng-config/settings.yml`
- ⚠️ `settings.yml` owned by UID 977 (container user) — use `docker cp` to update, not direct writes
- ⚠️ `settings.yml` requires `use_default_settings: true` at the top or SearXNG fails schema validation
- `companyScraper` calls `sys.exit()` on missing deps — use `except BaseException` not `except Exception`

## Email Classifier Labels
Six labels: `interview_request`, `rejection`, `offer`, `follow_up`, `survey_received`, `other`
- `survey_received` — links or requests to complete a culture-fit survey/assessment

## Services (managed via Settings → Services tab)
| Service | Port | Notes |
|---------|------|-------|
| Streamlit UI | 8501 | `bash scripts/manage-ui.sh start` |
| Ollama | 11434 | `sudo systemctl start ollama` |
| Claude Code Wrapper | 3009 | `manage-services.sh start` in Post Fight Processing |
| GitHub Copilot Wrapper | 3010 | `manage-copilot.sh start` in Post Fight Processing |
| vLLM Server | 8000 | Manual start only |
| SearXNG | 8888 | `docker compose up -d` in scrapers/SearXNG/ |
| Vision Service | 8002 | `bash scripts/manage-vision.sh start` — moondream2 survey screenshot analysis |

## Notion
- DB: "Tracking Job Applications" (ID: `1bd75cff-7708-8007-8c00-f1de36620a0a`)
- `config/notion.yaml` is gitignored (live token); `.example` is committed
- Field names are non-obvious — always read from `field_map` in `config/notion.yaml`
- "Salary" = Notion title property (unusual — it's the page title field)
- "Job Source" = `multi_select` type
- "Role Link" = URL field
- "Status of Application" = status field; new listings use "Application Submitted"
- Sync pushes `approved` + `applied` jobs; marks them `synced` after

## Key Config Files
- `config/notion.yaml` — gitignored, has token + field_map
- `config/notion.yaml.example` — committed template
- `config/search_profiles.yaml` — titles, locations, boards, custom_boards, exclude_keywords, mission_tags (per profile)
- `config/llm.yaml` — LLM backend priority chain + enabled flags
- `config/tokens.yaml` — gitignored, stores HF token (chmod 600)
- `config/adzuna.yaml` — gitignored, Adzuna API app_id + app_key
- `config/adzuna.yaml.example` — committed template

## Custom Job Board Scrapers
- `scripts/custom_boards/adzuna.py` — Adzuna Jobs API; credentials in `config/adzuna.yaml`
- `scripts/custom_boards/theladders.py` — The Ladders SSR scraper; needs `curl_cffi` installed
- Scrapers registered in `CUSTOM_SCRAPERS` dict in `discover.py`
- Activated per-profile via `custom_boards: [adzuna, theladders]` in `search_profiles.yaml`
- `enrich_all_descriptions()` in `enrich_descriptions.py` covers all sources (not just Glassdoor)
- Home page "Fill Missing Descriptions" button dispatches `enrich_descriptions` task

## Mission Alignment & Accessibility
- Preferred industries: music, animal welfare, children's education (hardcoded in `generate_cover_letter.py`)
- `detect_mission_alignment(company, description)` injects a Para 3 hint into cover letters for aligned companies
- Company research includes an "Inclusion & Accessibility" section (8th section of the brief) in every brief
- Accessibility search query in `_SEARCH_QUERIES` hits SearXNG for ADA/ERG/disability signals
- `accessibility_brief` column in `company_research` table; shown in Interview Prep under ♿ section
- This info is for personal decision-making ONLY — never disclosed in applications
- In generalization: these become `profile.mission_industries` + `profile.accessibility_priority` in `user.yaml`

## Document Rule
Resumes and cover letters live in `/Library/Documents/JobSearch/` or Notion — never committed to this repo.

## AIHawk (LinkedIn Easy Apply)
- Cloned to `aihawk/` (gitignored)
- Config: `aihawk/data_folder/plain_text_resume.yaml` — search FILL_IN for gaps
- Self-ID: non-binary, pronouns any, no disability/drug-test disclosure
- Run: `conda run -n job-seeker python aihawk/main.py`
- Playwright: `conda run -n job-seeker python -m playwright install chromium`

## Git Remote
- Forgejo self-hosted at https://git.opensourcesolarpunk.com (username: pyr0ball)
- `git remote add origin https://git.opensourcesolarpunk.com/pyr0ball/job-seeker.git`

## Subagents
Use `general-purpose` subagent type (not `Bash`) when tasks require file writes.

CONTRIBUTING.md (new file, 83 lines)
@@ -0,0 +1,83 @@
# Contributing to Peregrine

Thanks for your interest. Peregrine is developed primarily at
[git.opensourcesolarpunk.com](https://git.opensourcesolarpunk.com/pyr0ball/peregrine).
GitHub and Codeberg are push mirrors — issues and PRs are welcome on either platform.

---

## License

Peregrine is licensed under **[BSL 1.1](./LICENSE-BSL)** — Business Source License.

What this means for you:

| Use case | Allowed? |
|----------|----------|
| Personal self-hosting, non-commercial | ✅ Free |
| Contributing code, fixing bugs, writing docs | ✅ Free |
| Commercial SaaS / hosted service | 🔒 Requires a paid license |
| After 4 years from each release date | ✅ Converts to MIT |

**By submitting a pull request you agree that your contribution is licensed under the
project's BSL 1.1 terms.** The PR template includes this as a checkbox.

---

## Dev Setup

See [`docs/getting-started/installation.md`](docs/getting-started/installation.md) for
full instructions.

**Quick start (Docker — recommended):**

```bash
git clone https://git.opensourcesolarpunk.com/pyr0ball/peregrine.git
cd peregrine
./setup.sh        # installs deps, activates git hooks
./manage.sh start
```

**Conda (no Docker):**

```bash
conda run -n job-seeker pip install -r requirements.txt
streamlit run app/app.py
```

---

## Commit Format

Hooks enforce [Conventional Commits](https://www.conventionalcommits.org/):

```
type: short description
type(scope): short description
```

Valid types: `feat` `fix` `docs` `chore` `test` `refactor` `perf` `ci` `build`

The hook will tell you exactly what went wrong if your message is rejected.

---

## Pull Request Process

1. Fork and branch from `main`
2. Write tests first (we use `pytest`)
3. Run `pytest tests/ -v` — all tests must pass
4. Open a PR on GitHub or Codeberg
5. PRs are reviewed and cherry-picked to Forgejo (the canonical repo) — you don't need a Forgejo account

---

## Reporting Issues

Use the issue templates:

- **Bug** — steps to reproduce, version, OS, Docker or conda, logs
- **Feature** — problem statement, proposed solution, which tier it belongs to

**Security issues:** Do **not** open a public issue. Email `security@circuitforge.tech`.
See [SECURITY.md](./SECURITY.md).

Dockerfile (new file, 30 lines)
@@ -0,0 +1,30 @@
# Dockerfile
FROM python:3.11-slim

WORKDIR /app

# System deps for companyScraper (beautifulsoup4, fake-useragent, lxml) and PDF gen
# libsqlcipher-dev: required to build pysqlcipher3 (SQLCipher AES-256 encryption for cloud mode)
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc libffi-dev curl libsqlcipher-dev \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Install Playwright browser (cached separately from Python deps so requirements
# changes don't bust the ~600–900 MB Chromium layer and vice versa)
RUN playwright install chromium && playwright install-deps chromium

# Bundle companyScraper (company research web scraper)
COPY scrapers/ /app/scrapers/

COPY . .

EXPOSE 8501

CMD ["streamlit", "run", "app/app.py", \
     "--server.port=8501", \
     "--server.headless=true", \
     "--server.fileWatcherType=none"]
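
The compose files normally build and run this image, but it can also be exercised by hand. In the sketch below the image tag is illustrative; only the exposed port 8501 and the `.env` convention come from the files in this diff.

```bash
# Hypothetical manual build/run; `./manage.sh start` via compose is the supported path
docker build -t peregrine-app .
docker run --rm -p 8501:8501 --env-file .env peregrine-app
```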

Dockerfile.finetune (new file, 38 lines)
@@ -0,0 +1,38 @@
# Dockerfile.finetune — Cover letter LoRA fine-tuner (QLoRA via unsloth)
# Large image (~12-15 GB after build). Built once, cached on rebuilds.
# GPU strongly recommended. CPU fallback works but training is very slow.
#
# Tested base: pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime
# If your GPU requires a different CUDA version, change the FROM line and
# reinstall bitsandbytes for the matching CUDA (e.g. bitsandbytes-cuda121).
FROM pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime

WORKDIR /app

# Build tools needed by bitsandbytes CUDA kernels and unsloth
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc g++ git libgomp1 \
    && rm -rf /var/lib/apt/lists/*

# Install training stack.
# unsloth detects CUDA version automatically from the base image.
RUN pip install --no-cache-dir \
    "unsloth @ git+https://github.com/unslothai/unsloth.git" \
    "datasets>=2.18" "trl>=0.8" peft transformers \
    "bitsandbytes>=0.43.0" accelerate sentencepiece \
    requests pyyaml

COPY scripts/ /app/scripts/
COPY config/ /app/config/

ENV PYTHONUNBUFFERED=1
# Pin to GPU 0; overridable at runtime with --env CUDA_VISIBLE_DEVICES=
ENV CUDA_VISIBLE_DEVICES=0

# Runtime env vars injected by compose.yml:
#   OLLAMA_URL — Ollama API base (default: http://ollama:11434)
#   OLLAMA_MODELS_MOUNT — finetune container's mount path for ollama models volume
#   OLLAMA_MODELS_OLLAMA_PATH — Ollama container's mount path for same volume
#   DOCS_DIR — cover letters + training data root (default: /docs)

ENTRYPOINT ["python", "scripts/finetune_local.py"]
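
For reference, the comments above list the variables compose injects at runtime; a hand-run equivalent might look like the sketch below. The image tag, the host-side Ollama URL, and the host documents path are assumptions for illustration only; `make finetune` is the path the Makefile actually wires up.

```bash
# Illustrative direct run (assumed names/paths); prefer `make finetune` via compose
docker build -f Dockerfile.finetune -t peregrine-finetune .
docker run --rm --gpus all \
  -e OLLAMA_URL=http://host.docker.internal:11434 \
  -e DOCS_DIR=/docs \
  -v "$HOME/Documents/JobSearch:/docs" \
  peregrine-finetune
```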

LICENSE-BSL (new file, 26 lines)
@@ -0,0 +1,26 @@
Business Source License 1.1

Licensor: Circuit Forge LLC
Licensed Work: Peregrine — AI-powered job search pipeline
Copyright (c) 2026 Circuit Forge LLC
Additional Use Grant: You may use the Licensed Work for personal,
  non-commercial job searching purposes only.
Change Date: 2030-01-01
Change License: MIT License

For the full Business Source License 1.1 text, see:
https://mariadb.com/bsl11/

---

This license applies to the following components of Peregrine:

- scripts/llm_router.py
- scripts/generate_cover_letter.py
- scripts/company_research.py
- scripts/task_runner.py
- scripts/resume_parser.py
- scripts/imap_sync.py
- scripts/vision_service/
- scripts/integrations/
- app/

LICENSE-MIT (new file, 35 lines)
@@ -0,0 +1,35 @@
MIT License

Copyright (c) 2026 Circuit Forge LLC

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

---

This license applies to the following components of Peregrine:

- scripts/discover.py
- scripts/custom_boards/
- scripts/match.py
- scripts/db.py
- scripts/migrate.py
- scripts/preflight.py
- scripts/user_profile.py
- setup.sh
- Makefile
84 Makefile Normal file
@ -0,0 +1,84 @@
# Makefile — Peregrine convenience targets
# Usage: make <target>

.PHONY: setup preflight start stop restart logs test prepare-training finetune clean help

PROFILE ?= remote
PYTHON ?= python3

# Auto-detect container engine: prefer docker compose, fall back to podman
COMPOSE ?= $(shell \
	command -v docker >/dev/null 2>&1 && docker compose version >/dev/null 2>&1 \
	&& echo "docker compose" \
	|| (command -v podman >/dev/null 2>&1 \
	&& podman compose version >/dev/null 2>&1 \
	&& echo "podman compose" \
	|| echo "podman-compose"))

# GPU profiles require an overlay for NVIDIA device reservations.
# Docker uses deploy.resources (compose.gpu.yml); Podman uses CDI device specs (compose.podman-gpu.yml).
# Generate CDI spec for Podman first: sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
#
# NOTE: When explicit -f flags are used, Docker Compose does NOT auto-detect
# compose.override.yml. We must include it explicitly when present.
OVERRIDE_FILE := $(wildcard compose.override.yml)
COMPOSE_OVERRIDE := $(if $(OVERRIDE_FILE),-f compose.override.yml,)
DUAL_GPU_MODE ?= $(shell grep -m1 '^DUAL_GPU_MODE=' .env 2>/dev/null | cut -d= -f2 || echo ollama)

COMPOSE_FILES := -f compose.yml $(COMPOSE_OVERRIDE)
ifneq (,$(findstring podman,$(COMPOSE)))
ifneq (,$(findstring gpu,$(PROFILE)))
COMPOSE_FILES := -f compose.yml $(COMPOSE_OVERRIDE) -f compose.podman-gpu.yml
endif
else
ifneq (,$(findstring gpu,$(PROFILE)))
COMPOSE_FILES := -f compose.yml $(COMPOSE_OVERRIDE) -f compose.gpu.yml
endif
endif
ifeq ($(PROFILE),dual-gpu)
COMPOSE_FILES += --profile dual-gpu-$(DUAL_GPU_MODE)
endif

# 'remote' means base services only — no services are tagged 'remote' in compose.yml,
# so --profile remote is a no-op with Docker and a fatal error on old podman-compose.
# Only pass --profile for profiles that actually activate optional services.
PROFILE_ARG := $(if $(filter remote,$(PROFILE)),,--profile $(PROFILE))

setup: ## Install dependencies (Docker or Podman + NVIDIA toolkit)
	@bash setup.sh

preflight: ## Check ports + system resources; write .env
	@$(PYTHON) scripts/preflight.py

start: preflight ## Preflight check then start Peregrine (PROFILE=remote|cpu|single-gpu|dual-gpu)
	$(COMPOSE) $(COMPOSE_FILES) $(PROFILE_ARG) up -d

stop: ## Stop all Peregrine services
	$(COMPOSE) down

restart: ## Stop services, re-run preflight (ports now free), then start
	$(COMPOSE) down
	@$(PYTHON) scripts/preflight.py
	$(COMPOSE) $(COMPOSE_FILES) $(PROFILE_ARG) up -d

logs: ## Tail app logs
	$(COMPOSE) logs -f app

test: ## Run the test suite
	@$(PYTHON) -m pytest tests/ -v

prepare-training: ## Scan docs_dir for cover letters and build training JSONL
	$(COMPOSE) $(COMPOSE_FILES) run --rm app python scripts/prepare_training_data.py

finetune: ## Fine-tune your personal cover letter model (run prepare-training first)
	@echo "Starting fine-tune (30-90 min on GPU, much longer on CPU)..."
	$(COMPOSE) $(COMPOSE_FILES) -f compose.gpu.yml --profile finetune run --rm finetune

clean: ## Remove containers, images, and data volumes (DESTRUCTIVE)
	@echo "WARNING: This will delete all Peregrine containers and data."
	@read -p "Type 'yes' to confirm: " confirm && [ "$$confirm" = "yes" ]
	$(COMPOSE) down --rmi local --volumes

help: ## Show this help
	@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | \
	awk 'BEGIN {FS = ":.*?## "}; {printf "  \033[36m%-12s\033[0m %s\n", $$1, $$2}'
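For reference, here is roughly what the variable logic above expands to for a single-GPU start on a Docker host where preflight has written `compose.override.yml` — a sketch of the expansion, not output captured from `make`:

```bash
# Approximate expansion of `make start PROFILE=single-gpu` (Docker engine,
# compose.override.yml present). The dual-gpu profile would additionally
# append --profile dual-gpu-$(DUAL_GPU_MODE).
python3 scripts/preflight.py
docker compose -f compose.yml -f compose.override.yml -f compose.gpu.yml \
  --profile single-gpu up -d
```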
7 PRIVACY.md Normal file
@ -0,0 +1,7 @@
# Privacy Policy

CircuitForge LLC's privacy policy applies to this product and is published at:

**<https://circuitforge.tech/privacy>**

Last reviewed: March 2026.
206 README.md Normal file
@ -0,0 +1,206 @@
# Peregrine

> **Primary development** happens at [git.opensourcesolarpunk.com](https://git.opensourcesolarpunk.com/pyr0ball/peregrine) — GitHub and Codeberg are push mirrors. Issues and PRs are welcome on either platform.

[License: BSL 1.1](./LICENSE-BSL)
[CI](https://github.com/CircuitForge/peregrine/actions/workflows/ci.yml)

**AI-powered job search pipeline — by [Circuit Forge LLC](https://circuitforge.tech)**

> *"Don't be evil, for real and forever."*

Automates the full job search lifecycle: discovery → matching → cover letters → applications → interview prep.
Privacy-first, local-first. Your data never leaves your machine.

---

## Quick Start

**1. Clone and install dependencies** (Docker, NVIDIA toolkit if needed):

```bash
git clone https://git.opensourcesolarpunk.com/pyr0ball/peregrine
cd peregrine
./manage.sh setup
```

**2. Start Peregrine:**

```bash
./manage.sh start                        # remote profile (API-only, no GPU)
./manage.sh start --profile cpu          # local Ollama (CPU, or Metal GPU on Apple Silicon — see below)
./manage.sh start --profile single-gpu   # Ollama + Vision on GPU 0 (NVIDIA only)
./manage.sh start --profile dual-gpu     # Ollama + Vision + vLLM (GPU 0 + 1) (NVIDIA only)
```

Or use `make` directly:

```bash
make start                      # remote profile
make start PROFILE=single-gpu
```

**3.** Open http://localhost:8501 — the setup wizard guides you through the rest.

> **macOS / Apple Silicon:** Docker Desktop must be running. For Metal GPU-accelerated inference, install Ollama natively before starting — `setup.sh` will prompt you to do this. See [Apple Silicon GPU](#apple-silicon-gpu) below.

> **Windows:** Not supported — use WSL2 with Ubuntu.

### Installing to `/opt` or other system directories

If you clone into a root-owned directory (e.g. `sudo git clone ... /opt/peregrine`), two things need fixing:

**1. Git ownership warning** (`fatal: detected dubious ownership`) — `./manage.sh setup` fixes this automatically. If you need git to work *before* running setup:

```bash
git config --global --add safe.directory /opt/peregrine
```

**2. Preflight write access** — preflight writes `.env` and `compose.override.yml` into the repo directory. Fix ownership once:

```bash
sudo chown -R $USER:$USER /opt/peregrine
```

After that, run everything without `sudo`.

### Podman

Podman is rootless by default — **no `sudo` needed.** `./manage.sh setup` will configure `podman-compose` if it isn't already present.

### Docker

After `./manage.sh setup`, log out and back in for docker group membership to take effect; until then, prefix commands with `sudo`. After re-login, `sudo` is no longer needed.
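If you would rather not log out, the standard Docker post-install steps usually work in the current shell — a sketch, assuming `setup.sh` follows Docker's usual group-based setup:

```bash
# Convenience steps until your next login.
sudo usermod -aG docker "$USER"   # harmless if setup.sh already added you
newgrp docker                     # open a shell with the new group active
docker ps                         # should now work without sudo
```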
---

## Inference Profiles

| Profile | Services started | Use case |
|---------|-----------------|----------|
| `remote` | app + searxng | No GPU; LLM calls go to Anthropic / OpenAI |
| `cpu` | app + ollama + searxng | No GPU; local models on CPU. On Apple Silicon, use with native Ollama for Metal acceleration — see below. |
| `single-gpu` | app + ollama + vision + searxng | One **NVIDIA** GPU: cover letters, research, vision |
| `dual-gpu` | app + ollama + vllm + vision + searxng | Two **NVIDIA** GPUs: GPU 0 = Ollama, GPU 1 = vLLM |

### Apple Silicon GPU

Docker Desktop on macOS runs in a Linux VM — it cannot access the Apple GPU. Metal-accelerated inference requires Ollama to run **natively** on the host.

`setup.sh` handles this automatically: it offers to install Ollama via Homebrew, starts it as a background service, and explains what happens next. If Ollama is running on port 11434 when you start Peregrine, preflight detects it, stubs out the Docker Ollama container, and routes inference through the native process — which uses Metal automatically.

To do it manually:

```bash
brew install ollama
brew services start ollama        # starts at login, uses Metal GPU
./manage.sh start --profile cpu   # preflight adopts native Ollama; Docker container is skipped
```

The `cpu` profile label is a slight misnomer in this context — Ollama will be running on the GPU. `single-gpu` and `dual-gpu` profiles are NVIDIA-specific and not applicable on Mac.
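Before starting, you can sanity-check that the native Ollama service is actually listening — a quick check, assuming Ollama's default port 11434 and Homebrew service management:

```bash
brew services list | grep ollama         # should show "started"
curl -s http://localhost:11434/api/tags  # lists local models if Ollama is reachable
```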
---

## First-Run Wizard

On first launch the setup wizard walks through seven steps:

1. **Hardware** — detects NVIDIA GPUs (Linux) or Apple Silicon GPU (macOS) and recommends a profile
2. **Tier** — choose free, paid, or premium (or use `dev_tier_override` for local testing)
3. **Identity** — name, email, phone, LinkedIn, career summary
4. **Resume** — upload a PDF/DOCX for LLM parsing, or use the guided form builder
5. **Inference** — configure LLM backends and API keys
6. **Search** — job titles, locations, boards, keywords, blocklist
7. **Integrations** — optional cloud storage, calendar, and notification services

Wizard state is saved after each step — a crash or browser close resumes where you left off.
Re-enter the wizard any time via **Settings → Developer → Reset wizard**.

---

## Features

| Feature | Tier |
|---------|------|
| Job discovery (JobSpy + custom boards) | Free |
| Resume keyword matching & gap analysis | Free |
| Document storage sync (Google Drive, Dropbox, OneDrive, MEGA, Nextcloud) | Free |
| Webhook notifications (Discord, Home Assistant) | Free |
| **Cover letter generation** | Free with LLM¹ |
| **Company research briefs** | Free with LLM¹ |
| **Interview prep & practice Q&A** | Free with LLM¹ |
| **Survey assistant** (culture-fit Q&A, screenshot analysis) | Free with LLM¹ |
| **AI wizard helpers** (career summary, bullet expansion, skill suggestions) | Free with LLM¹ |
| Managed cloud LLM (no API key needed) | Paid |
| Email sync & auto-classification | Paid |
| Job tracking integrations (Notion, Airtable, Google Sheets) | Paid |
| Calendar sync (Google, Apple) | Paid |
| Slack notifications | Paid |
| CircuitForge shared cover-letter model | Paid |
| Cover letter model fine-tuning (your writing, your model) | Premium |
| Multi-user support | Premium |

¹ **BYOK unlock:** configure any LLM backend — a local [Ollama](https://ollama.com) or vLLM instance,
or your own API key (Anthropic, OpenAI-compatible) — and all AI features marked **Free with LLM**
unlock at no charge. The paid tier earns its price by providing managed cloud inference so you
don't need a key at all, plus integrations and email sync.

---

## Email Sync

Monitors your inbox for job-related emails and automatically updates job stages (interview requests, rejections, survey links, offers).

Configure in **Settings → Email**. Requires IMAP access and, for Gmail, an App Password.
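Under the hood the Settings page writes an email config into `config/email.yaml`; the app checks it for an IMAP host and username. As a rough sketch only — field names beyond those two are assumptions:

```bash
# Minimal sketch of config/email.yaml. Only imap_host and username are
# confirmed fields; the password key name is hypothetical.
cat > config/email.yaml <<'EOF'
imap_host: imap.gmail.com
username: you@example.com
password: "your-app-password"   # for Gmail, a Google App Password
EOF
```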
---

## Integrations

Connect external services in **Settings → Integrations**:

- **Job tracking:** Notion, Airtable, Google Sheets
- **Document storage:** Google Drive, Dropbox, OneDrive, MEGA, Nextcloud
- **Calendar:** Google Calendar, Apple Calendar (CalDAV)
- **Notifications:** Slack, Discord (webhook), Home Assistant

---

## CLI Reference (`manage.sh`)

`manage.sh` is the single entry point for all common operations — no need to remember Make targets or Docker commands.

```
./manage.sh setup               Install Docker/Podman + NVIDIA toolkit
./manage.sh start [--profile P] Preflight check then start services
./manage.sh stop                Stop all services
./manage.sh restart             Restart all services
./manage.sh status              Show running containers
./manage.sh logs [service]      Tail logs (default: app)
./manage.sh update              Pull latest images + rebuild app container
./manage.sh preflight           Check ports + resources; write .env
./manage.sh test                Run test suite
./manage.sh prepare-training    Scan docs for cover letters → training JSONL
./manage.sh finetune            Run LoRA fine-tune (needs --profile single-gpu+)
./manage.sh open                Open the web UI in your browser
./manage.sh clean               Remove containers, images, volumes (asks to confirm)
```

---

## Developer Docs

Full documentation at: https://docs.circuitforge.tech/peregrine

- [Installation guide](https://docs.circuitforge.tech/peregrine/getting-started/installation/)
- [Adding a custom job board scraper](https://docs.circuitforge.tech/peregrine/developer-guide/adding-scrapers/)
- [Adding an integration](https://docs.circuitforge.tech/peregrine/developer-guide/adding-integrations/)
- [Contributing](https://docs.circuitforge.tech/peregrine/developer-guide/contributing/)

---

## License

Core discovery pipeline: [MIT](LICENSE-MIT)
AI features (cover letter generation, company research, interview prep, UI): [BSL 1.1](LICENSE-BSL)

© 2026 Circuit Forge LLC
26 SECURITY.md Normal file
@ -0,0 +1,26 @@
# Security Policy

## Reporting a Vulnerability

**Do not open a GitHub or Codeberg issue for security vulnerabilities.**

Email: `security@circuitforge.tech`

Include:
- A description of the vulnerability
- Steps to reproduce
- Potential impact
- Any suggested fix (optional)

**Response target:** 72 hours for acknowledgement, 14 days for triage.

We follow responsible disclosure — we will coordinate a fix and release before any
public disclosure and will credit you in the release notes unless you prefer to remain
anonymous.

## Supported Versions

| Version | Supported |
|---------|-----------|
| Latest release | ✅ |
| Older releases | ❌ — please upgrade |
156 app/Home.py
@ -8,15 +8,81 @@ import sys
from pathlib import Path

import streamlit as st
+import yaml

sys.path.insert(0, str(Path(__file__).parent.parent))

-from scripts.db import DEFAULT_DB, init_db, get_job_counts, purge_jobs, purge_email_data, \
+from scripts.user_profile import UserProfile
+
+_USER_YAML = Path(__file__).parent.parent / "config" / "user.yaml"
+_profile = UserProfile(_USER_YAML) if UserProfile.exists(_USER_YAML) else None
+_name = _profile.name if _profile else "Job Seeker"
+
+from scripts.db import init_db, get_job_counts, purge_jobs, purge_email_data, \
    purge_non_remote, archive_jobs, kill_stuck_tasks, get_task_for_job, get_active_tasks, \
    insert_job, get_existing_urls
from scripts.task_runner import submit_task
+from app.cloud_session import resolve_session, get_db_path

-init_db(DEFAULT_DB)
+resolve_session("peregrine")
+init_db(get_db_path())
+
+
+def _email_configured() -> bool:
+    _e = Path(__file__).parent.parent / "config" / "email.yaml"
+    if not _e.exists():
+        return False
+    import yaml as _yaml
+    _cfg = _yaml.safe_load(_e.read_text()) or {}
+    return bool(_cfg.get("username") or _cfg.get("user") or _cfg.get("imap_host"))
+
+
+def _notion_configured() -> bool:
+    _n = Path(__file__).parent.parent / "config" / "notion.yaml"
+    if not _n.exists():
+        return False
+    import yaml as _yaml
+    _cfg = _yaml.safe_load(_n.read_text()) or {}
+    return bool(_cfg.get("token"))
+
+
+def _keywords_configured() -> bool:
+    _k = Path(__file__).parent.parent / "config" / "resume_keywords.yaml"
+    if not _k.exists():
+        return False
+    import yaml as _yaml
+    _cfg = _yaml.safe_load(_k.read_text()) or {}
+    return bool(_cfg.get("keywords") or _cfg.get("required") or _cfg.get("preferred"))
+
+
+_SETUP_BANNERS = [
+    {"key": "connect_cloud", "text": "Connect a cloud service for resume/cover letter storage",
+     "link_label": "Settings → Integrations",
+     "done": _notion_configured},
+    {"key": "setup_email", "text": "Set up email sync to catch recruiter outreach",
+     "link_label": "Settings → Email",
+     "done": _email_configured},
+    {"key": "setup_email_labels", "text": "Set up email label filters for auto-classification",
+     "link_label": "Settings → Email (label guide)",
+     "done": _email_configured},
+    {"key": "tune_mission", "text": "Tune your mission preferences for better cover letters",
+     "link_label": "Settings → My Profile"},
+    {"key": "configure_keywords", "text": "Configure keywords and blocklist for smarter search",
+     "link_label": "Settings → Search",
+     "done": _keywords_configured},
+    {"key": "upload_corpus", "text": "Upload your cover letter corpus for voice fine-tuning",
+     "link_label": "Settings → Fine-Tune"},
+    {"key": "configure_linkedin", "text": "Configure LinkedIn Easy Apply automation",
+     "link_label": "Settings → Integrations"},
+    {"key": "setup_searxng", "text": "Set up company research with SearXNG",
+     "link_label": "Settings → Services"},
+    {"key": "target_companies", "text": "Build a target company list for focused outreach",
+     "link_label": "Settings → Search"},
+    {"key": "setup_notifications", "text": "Set up notifications for stage changes",
+     "link_label": "Settings → Integrations"},
+    {"key": "tune_model", "text": "Tune a custom cover letter model on your writing",
+     "link_label": "Settings → Fine-Tune"},
+    {"key": "review_training", "text": "Review and curate training data for model tuning",
+     "link_label": "Settings → Fine-Tune"},
+    {"key": "setup_calendar", "text": "Set up calendar sync to track interview dates",
+     "link_label": "Settings → Integrations"},
+]


def _dismissible(key: str, status: str, msg: str) -> None:

@ -64,7 +130,7 @@ def _queue_url_imports(db_path: Path, urls: list) -> int:
    return queued


-st.title("🔍 Alex's Job Search")
+st.title(f"🔍 {_name}'s Job Search")
st.caption("Discover → Review → Sync to Notion")

st.divider()

@ -72,7 +138,7 @@ st.divider()
@st.fragment(run_every=10)
def _live_counts():
-    counts = get_job_counts(DEFAULT_DB)
+    counts = get_job_counts(get_db_path())
    col1, col2, col3, col4, col5 = st.columns(5)
    col1.metric("Pending Review", counts.get("pending", 0))
    col2.metric("Approved", counts.get("approved", 0))

@ -91,18 +157,18 @@ with left:
    st.subheader("Find New Jobs")
    st.caption("Scrapes all configured boards and adds new listings to your review queue.")

-    _disc_task = get_task_for_job(DEFAULT_DB, "discovery", 0)
+    _disc_task = get_task_for_job(get_db_path(), "discovery", 0)
    _disc_running = _disc_task and _disc_task["status"] in ("queued", "running")

    if st.button("🚀 Run Discovery", use_container_width=True, type="primary",
                 disabled=bool(_disc_running)):
-        submit_task(DEFAULT_DB, "discovery", 0)
+        submit_task(get_db_path(), "discovery", 0)
        st.rerun()

    if _disc_running:
        @st.fragment(run_every=4)
        def _disc_status():
-            t = get_task_for_job(DEFAULT_DB, "discovery", 0)
+            t = get_task_for_job(get_db_path(), "discovery", 0)
            if t and t["status"] in ("queued", "running"):
                lbl = "Queued…" if t["status"] == "queued" else "Scraping job boards… this may take a minute"
                st.info(f"⏳ {lbl}")

@ -120,18 +186,18 @@ with enrich_col:
    st.subheader("Enrich Descriptions")
    st.caption("Re-fetch missing descriptions for any listing (LinkedIn, Indeed, Glassdoor, Adzuna, The Ladders, generic).")

-    _enrich_task = get_task_for_job(DEFAULT_DB, "enrich_descriptions", 0)
+    _enrich_task = get_task_for_job(get_db_path(), "enrich_descriptions", 0)
    _enrich_running = _enrich_task and _enrich_task["status"] in ("queued", "running")

    if st.button("🔍 Fill Missing Descriptions", use_container_width=True, type="primary",
                 disabled=bool(_enrich_running)):
-        submit_task(DEFAULT_DB, "enrich_descriptions", 0)
+        submit_task(get_db_path(), "enrich_descriptions", 0)
        st.rerun()

    if _enrich_running:
        @st.fragment(run_every=4)
        def _enrich_status():
-            t = get_task_for_job(DEFAULT_DB, "enrich_descriptions", 0)
+            t = get_task_for_job(get_db_path(), "enrich_descriptions", 0)
            if t and t["status"] in ("queued", "running"):
                st.info("⏳ Fetching descriptions…")
            else:

@ -146,10 +212,10 @@ with enrich_col:
with mid:
    unscored = sum(1 for j in __import__("scripts.db", fromlist=["get_jobs_by_status"])
-                   .get_jobs_by_status(DEFAULT_DB, "pending")
+                   .get_jobs_by_status(get_db_path(), "pending")
                   if j.get("match_score") is None and j.get("description"))
    st.subheader("Score Listings")
-    st.caption(f"Run TF-IDF match scoring against Alex's resume. {unscored} pending job{'s' if unscored != 1 else ''} unscored.")
+    st.caption(f"Run TF-IDF match scoring against {_name}'s resume. {unscored} pending job{'s' if unscored != 1 else ''} unscored.")
    if st.button("📊 Score All Unscored Jobs", use_container_width=True, type="primary",
                 disabled=unscored == 0):
        with st.spinner("Scoring…"):

@ -167,7 +233,7 @@ with mid:
        st.rerun()

with right:
-    approved_count = get_job_counts(DEFAULT_DB).get("approved", 0)
+    approved_count = get_job_counts(get_db_path()).get("approved", 0)
    st.subheader("Send to Notion")
    st.caption("Push all approved jobs to your Notion tracking database.")
    if approved_count == 0:

@ -179,7 +245,7 @@ with right:
    ):
        with st.spinner("Syncing to Notion…"):
            from scripts.sync import sync_to_notion
-            count = sync_to_notion(DEFAULT_DB)
+            count = sync_to_notion(get_db_path())
            st.success(f"Synced {count} job{'s' if count != 1 else ''} to Notion!")
            st.rerun()

@ -194,18 +260,18 @@ with email_left:
               "New recruiter outreach is added to your Job Review queue.")

with email_right:
-    _email_task = get_task_for_job(DEFAULT_DB, "email_sync", 0)
+    _email_task = get_task_for_job(get_db_path(), "email_sync", 0)
    _email_running = _email_task and _email_task["status"] in ("queued", "running")

    if st.button("📧 Sync Emails", use_container_width=True, type="primary",
                 disabled=bool(_email_running)):
-        submit_task(DEFAULT_DB, "email_sync", 0)
+        submit_task(get_db_path(), "email_sync", 0)
        st.rerun()

    if _email_running:
        @st.fragment(run_every=4)
        def _email_status():
-            t = get_task_for_job(DEFAULT_DB, "email_sync", 0)
+            t = get_task_for_job(get_db_path(), "email_sync", 0)
            if t and t["status"] in ("queued", "running"):
                st.info("⏳ Syncing emails…")
            else:

@ -240,7 +306,7 @@ with url_tab:
                 disabled=not (url_text or "").strip()):
        _urls = [u.strip() for u in url_text.strip().splitlines() if u.strip().startswith("http")]
        if _urls:
-            _n = _queue_url_imports(DEFAULT_DB, _urls)
+            _n = _queue_url_imports(get_db_path(), _urls)
            if _n:
                st.success(f"Queued {_n} job{'s' if _n != 1 else ''} for import. Check Job Review shortly.")
            else:

@ -263,7 +329,7 @@ with csv_tab:
    if _csv_urls:
        st.caption(f"Found {len(_csv_urls)} URL(s) in CSV.")
        if st.button("📥 Import CSV Jobs", key="add_csv_btn", use_container_width=True):
-            _n = _queue_url_imports(DEFAULT_DB, _csv_urls)
+            _n = _queue_url_imports(get_db_path(), _csv_urls)
            st.success(f"Queued {_n} job{'s' if _n != 1 else ''} for import.")
            st.rerun()
    else:

@ -273,7 +339,7 @@ with csv_tab:
    @st.fragment(run_every=3)
    def _scrape_status():
        import sqlite3 as _sq
-        conn = _sq.connect(DEFAULT_DB)
+        conn = _sq.connect(get_db_path())
        conn.row_factory = _sq.Row
        rows = conn.execute(
            """SELECT bt.status, bt.error, j.title, j.company, j.url

@ -320,7 +386,7 @@ with st.expander("⚠️ Danger Zone", expanded=False):
        st.warning("Are you sure? This cannot be undone.")
        c1, c2 = st.columns(2)
        if c1.button("Yes, purge", type="primary", use_container_width=True):
-            deleted = purge_jobs(DEFAULT_DB, statuses=["pending", "rejected"])
+            deleted = purge_jobs(get_db_path(), statuses=["pending", "rejected"])
            st.success(f"Purged {deleted} jobs.")
            st.session_state.pop("confirm_purge", None)
            st.rerun()

@ -338,7 +404,7 @@ with st.expander("⚠️ Danger Zone", expanded=False):
        st.warning("This deletes all email contacts and email-sourced jobs. Cannot be undone.")
        c1, c2 = st.columns(2)
        if c1.button("Yes, purge emails", type="primary", use_container_width=True):
-            contacts, jobs = purge_email_data(DEFAULT_DB)
+            contacts, jobs = purge_email_data(get_db_path())
            st.success(f"Purged {contacts} email contacts, {jobs} email jobs.")
            st.session_state.pop("confirm_purge", None)
            st.rerun()

@ -347,11 +413,11 @@ with st.expander("⚠️ Danger Zone", expanded=False):
            st.rerun()

    with tasks_col:
-        _active = get_active_tasks(DEFAULT_DB)
+        _active = get_active_tasks(get_db_path())
        st.markdown("**Kill stuck tasks**")
        st.caption(f"Force-fail all queued/running background tasks. Currently **{len(_active)}** active.")
        if st.button("⏹ Kill All Tasks", use_container_width=True, disabled=len(_active) == 0):
-            killed = kill_stuck_tasks(DEFAULT_DB)
+            killed = kill_stuck_tasks(get_db_path())
            st.success(f"Killed {killed} task(s).")
            st.rerun()

@ -365,8 +431,8 @@ with st.expander("⚠️ Danger Zone", expanded=False):
        st.warning("This will delete ALL pending, approved, and rejected jobs, then re-scrape. Applied and synced records are kept.")
        c1, c2 = st.columns(2)
        if c1.button("Yes, wipe + scrape", type="primary", use_container_width=True):
-            purge_jobs(DEFAULT_DB, statuses=["pending", "approved", "rejected"])
-            submit_task(DEFAULT_DB, "discovery", 0)
+            purge_jobs(get_db_path(), statuses=["pending", "approved", "rejected"])
+            submit_task(get_db_path(), "discovery", 0)
            st.session_state.pop("confirm_purge", None)
            st.rerun()
        if c2.button("Cancel ", use_container_width=True):

@ -387,7 +453,7 @@ with st.expander("⚠️ Danger Zone", expanded=False):
        st.warning("Deletes all pending jobs. Rejected jobs are kept. Cannot be undone.")
        c1, c2 = st.columns(2)
        if c1.button("Yes, purge pending", type="primary", use_container_width=True):
-            deleted = purge_jobs(DEFAULT_DB, statuses=["pending"])
+            deleted = purge_jobs(get_db_path(), statuses=["pending"])
            st.success(f"Purged {deleted} pending jobs.")
            st.session_state.pop("confirm_purge", None)
            st.rerun()

@ -405,7 +471,7 @@ with st.expander("⚠️ Danger Zone", expanded=False):
        st.warning("Deletes all non-remote jobs not yet applied to. Cannot be undone.")
        c1, c2 = st.columns(2)
        if c1.button("Yes, purge on-site", type="primary", use_container_width=True):
-            deleted = purge_non_remote(DEFAULT_DB)
+            deleted = purge_non_remote(get_db_path())
            st.success(f"Purged {deleted} non-remote jobs.")
            st.session_state.pop("confirm_purge", None)
            st.rerun()

@ -423,7 +489,7 @@ with st.expander("⚠️ Danger Zone", expanded=False):
        st.warning("Deletes all approved-but-not-applied jobs. Cannot be undone.")
        c1, c2 = st.columns(2)
        if c1.button("Yes, purge approved", type="primary", use_container_width=True):
-            deleted = purge_jobs(DEFAULT_DB, statuses=["approved"])
+            deleted = purge_jobs(get_db_path(), statuses=["approved"])
            st.success(f"Purged {deleted} approved jobs.")
            st.session_state.pop("confirm_purge", None)
            st.rerun()

@ -448,7 +514,7 @@ with st.expander("⚠️ Danger Zone", expanded=False):
        st.info("Jobs will be archived (not deleted) — URLs are kept for dedup.")
        c1, c2 = st.columns(2)
        if c1.button("Yes, archive", type="primary", use_container_width=True):
-            archived = archive_jobs(DEFAULT_DB, statuses=["pending", "rejected"])
+            archived = archive_jobs(get_db_path(), statuses=["pending", "rejected"])
            st.success(f"Archived {archived} jobs.")
            st.session_state.pop("confirm_purge", None)
            st.rerun()

@ -466,10 +532,38 @@ with st.expander("⚠️ Danger Zone", expanded=False):
        st.info("Approved jobs will be archived (not deleted).")
        c1, c2 = st.columns(2)
        if c1.button("Yes, archive approved", type="primary", use_container_width=True):
-            archived = archive_jobs(DEFAULT_DB, statuses=["approved"])
+            archived = archive_jobs(get_db_path(), statuses=["approved"])
            st.success(f"Archived {archived} approved jobs.")
            st.session_state.pop("confirm_purge", None)
            st.rerun()
        if c2.button("Cancel ", use_container_width=True):
            st.session_state.pop("confirm_purge", None)
            st.rerun()

+
+# ── Setup banners ─────────────────────────────────────────────────────────────
+if _profile and _profile.wizard_complete:
+    _dismissed = set(_profile.dismissed_banners)
+    _pending_banners = [
+        b for b in _SETUP_BANNERS
+        if b["key"] not in _dismissed and not b.get("done", lambda: False)()
+    ]
+    if _pending_banners:
+        st.divider()
+        st.markdown("#### Finish setting up Peregrine")
+        for banner in _pending_banners:
+            _bcol, _bdismiss = st.columns([10, 1])
+            with _bcol:
+                _ic, _lc = st.columns([3, 1])
+                _ic.info(f"💡 {banner['text']}")
+                with _lc:
+                    st.write("")
+                    st.page_link("pages/2_Settings.py", label=banner['link_label'], icon="⚙️")
+            with _bdismiss:
+                st.write("")
+                if st.button("✕", key=f"dismiss_banner_{banner['key']}", help="Dismiss"):
+                    _data = yaml.safe_load(_USER_YAML.read_text()) if _USER_YAML.exists() else {}
+                    _data.setdefault("dismissed_banners", [])
+                    if banner["key"] not in _data["dismissed_banners"]:
+                        _data["dismissed_banners"].append(banner["key"])
+                    _USER_YAML.write_text(yaml.dump(_data, default_flow_style=False, allow_unicode=True))
+                    st.rerun()
0 app/__init__.py Normal file

92 app/app.py
@ -7,22 +7,32 @@ a "System" section so it doesn't crowd the navigation.
Run: streamlit run app/app.py
     bash scripts/manage-ui.sh start
"""
+import logging
+import os
+import subprocess
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent))

+logging.basicConfig(level=logging.WARNING, format="%(name)s %(levelname)s: %(message)s")
+
+IS_DEMO = os.environ.get("DEMO_MODE", "").lower() in ("1", "true", "yes")
+
import streamlit as st
from scripts.db import DEFAULT_DB, init_db, get_active_tasks
+from app.feedback import inject_feedback_button
+from app.cloud_session import resolve_session, get_db_path, get_config_dir
import sqlite3

st.set_page_config(
-    page_title="Job Seeker",
+    page_title="Peregrine",
    page_icon="💼",
    layout="wide",
)

-init_db(DEFAULT_DB)
+resolve_session("peregrine")
+init_db(get_db_path())

# ── Startup cleanup — runs once per server process via cache_resource ──────────
@st.cache_resource

@ -32,12 +42,12 @@ def _startup() -> None:
    2. Auto-queues re-runs for any research generated without SearXNG data,
       if SearXNG is now reachable.
    """
-    conn = sqlite3.connect(DEFAULT_DB)
-    conn.execute(
-        "UPDATE background_tasks SET status='failed', error='Interrupted by server restart',"
-        " finished_at=datetime('now') WHERE status IN ('queued','running')"
-    )
-    conn.commit()
+    # Reset only in-flight tasks — queued tasks survive for the scheduler to resume.
+    # MUST run before any submit_task() call in this function.
+    from scripts.db import reset_running_tasks
+    reset_running_tasks(get_db_path())
+
+    conn = sqlite3.connect(get_db_path())

    # Auto-recovery: re-run LLM-only research when SearXNG is available
    try:

@ -53,7 +63,7 @@ def _startup() -> None:
            _ACTIVE_STAGES,
        ).fetchall()
        for (job_id,) in rows:
-            submit_task(str(DEFAULT_DB), "company_research", job_id)
+            submit_task(str(get_db_path()), "company_research", job_id)
    except Exception:
        pass  # never block startup

@ -61,6 +71,26 @@ def _startup() -> None:
_startup()

+# Silent license refresh on startup — no-op if unreachable
+try:
+    from scripts.license import refresh_if_needed as _refresh_license
+    _refresh_license()
+except Exception:
+    pass
+
+# ── First-run wizard gate ───────────────────────────────────────────────────────
+from scripts.user_profile import UserProfile as _UserProfile
+_USER_YAML = get_config_dir() / "user.yaml"
+
+_show_wizard = not IS_DEMO and (
+    not _UserProfile.exists(_USER_YAML)
+    or not _UserProfile(_USER_YAML).wizard_complete
+)
+if _show_wizard:
+    _setup_page = st.Page("pages/0_Setup.py", title="Setup", icon="👋")
+    st.navigation({"": [_setup_page]}).run()
+    st.stop()
+
# ── Navigation ─────────────────────────────────────────────────────────────────
# st.navigation() must be called before any sidebar writes so it can establish
# the navigation structure first; sidebar additions come after.

@ -85,7 +115,7 @@ pg = st.navigation(pages)
# The sidebar context WRAPS the fragment call — do not write to st.sidebar inside it.
@st.fragment(run_every=3)
def _task_indicator():
-    tasks = get_active_tasks(DEFAULT_DB)
+    tasks = get_active_tasks(get_db_path())
    if not tasks:
        return
    st.divider()

@ -105,6 +135,8 @@ def _task_indicator():
            label = "Enriching"
        elif task_type == "scrape_url":
            label = "Scraping URL"
+        elif task_type == "wizard_generate":
+            label = "Wizard generation"
        elif task_type == "enrich_craigslist":
            label = "Enriching listing"
        else:

@ -113,7 +145,47 @@ def _task_indicator():
        detail = f" · {stage}" if stage else (f" — {t.get('company')}" if t.get("company") else "")
        st.caption(f"{icon} {label}{detail}")

+
+@st.cache_resource
+def _get_version() -> str:
+    try:
+        return subprocess.check_output(
+            ["git", "describe", "--tags", "--always"],
+            cwd=Path(__file__).parent.parent,
+            text=True,
+        ).strip()
+    except Exception:
+        return "dev"
+
+
with st.sidebar:
+    if IS_DEMO:
+        st.info(
+            "**Public demo** — read-only sample data. "
+            "AI features and data saves are disabled.\n\n"
+            "[Get your own instance →](https://circuitforge.tech/software/peregrine)",
+            icon="🔒",
+        )
    _task_indicator()

+    # Cloud LLM indicator — shown whenever any cloud backend is active
+    _llm_cfg_path = Path(__file__).parent.parent / "config" / "llm.yaml"
+    try:
+        import yaml as _yaml
+        from scripts.byok_guard import cloud_backends as _cloud_backends
+        _active_cloud = _cloud_backends(_yaml.safe_load(_llm_cfg_path.read_text(encoding="utf-8")) or {})
+    except Exception:
+        _active_cloud = []
+    if _active_cloud:
+        _provider_names = ", ".join(b.replace("_", " ").title() for b in _active_cloud)
+        st.warning(
+            f"**Cloud LLM active**\n\n"
+            f"{_provider_names}\n\n"
+            "AI features send content to this provider. "
+            "[Change in Settings](2_Settings)",
+            icon="🔓",
+        )
+
+    st.divider()
+    st.caption(f"Peregrine {_get_version()}")
+    inject_feedback_button(page=pg.title)

pg.run()
187 app/cloud_session.py Normal file
@ -0,0 +1,187 @@
# peregrine/app/cloud_session.py
"""
Cloud session middleware for multi-tenant Peregrine deployment.

In local-first mode (CLOUD_MODE unset or false), all functions are no-ops.
In cloud mode (CLOUD_MODE=true), resolves the Directus session JWT from the
X-CF-Session header, validates it, and injects user_id + db_path into
st.session_state.

All Peregrine pages call get_db_path() instead of DEFAULT_DB directly to
transparently support both local and cloud deployments.
"""
import logging
import os
import re
import hmac
import hashlib
from pathlib import Path

import requests
import streamlit as st

from scripts.db import DEFAULT_DB

log = logging.getLogger(__name__)

CLOUD_MODE: bool = os.environ.get("CLOUD_MODE", "").lower() in ("1", "true", "yes")
CLOUD_DATA_ROOT: Path = Path(os.environ.get("CLOUD_DATA_ROOT", "/devl/menagerie-data"))
DIRECTUS_JWT_SECRET: str = os.environ.get("DIRECTUS_JWT_SECRET", "")
SERVER_SECRET: str = os.environ.get("CF_SERVER_SECRET", "")

# Heimdall license server — internal URL preferred when running on the same host
HEIMDALL_URL: str = os.environ.get("HEIMDALL_URL", "https://license.circuitforge.tech")
HEIMDALL_ADMIN_TOKEN: str = os.environ.get("HEIMDALL_ADMIN_TOKEN", "")


def _extract_session_token(cookie_header: str) -> str:
    """Extract cf_session value from a Cookie header string."""
    m = re.search(r'(?:^|;)\s*cf_session=([^;]+)', cookie_header)
    return m.group(1).strip() if m else ""


@st.cache_data(ttl=300, show_spinner=False)
def _fetch_cloud_tier(user_id: str, product: str) -> str:
    """Call Heimdall to resolve the current cloud tier for this user.

    Cached per (user_id, product) for 5 minutes to avoid hammering Heimdall
    on every Streamlit rerun. Returns "free" on any error so the app degrades
    gracefully rather than blocking the user.
    """
    if not HEIMDALL_ADMIN_TOKEN:
        log.warning("HEIMDALL_ADMIN_TOKEN not set — defaulting tier to free")
        return "free"
    try:
        resp = requests.post(
            f"{HEIMDALL_URL}/admin/cloud/resolve",
            json={"user_id": user_id, "product": product},
            headers={"Authorization": f"Bearer {HEIMDALL_ADMIN_TOKEN}"},
            timeout=5,
        )
        if resp.status_code == 200:
            return resp.json().get("tier", "free")
        if resp.status_code == 404:
            # No cloud key yet — user signed up before provision ran; return free.
            return "free"
        log.warning("Heimdall resolve returned %s — defaulting tier to free", resp.status_code)
    except Exception as exc:
        log.warning("Heimdall tier resolve failed: %s — defaulting to free", exc)
    return "free"


def validate_session_jwt(token: str) -> str:
    """Validate a Directus session JWT and return the user UUID. Raises on failure."""
    import jwt  # PyJWT — lazy import so local mode never needs it
    payload = jwt.decode(token, DIRECTUS_JWT_SECRET, algorithms=["HS256"])
    user_id = payload.get("id") or payload.get("sub")
    if not user_id:
        raise ValueError("JWT missing user id claim")
    return user_id


def _user_data_path(user_id: str, app: str) -> Path:
    return CLOUD_DATA_ROOT / user_id / app


def derive_db_key(user_id: str) -> str:
    """Derive a per-user SQLCipher encryption key from the server secret."""
    return hmac.new(
        SERVER_SECRET.encode(),
        user_id.encode(),
        hashlib.sha256,
    ).hexdigest()


def _render_auth_wall(message: str = "Please sign in to continue.") -> None:
    """Render a branded sign-in prompt and halt the page."""
    st.markdown(
        """
        <style>
        [data-testid="stSidebar"] { display: none; }
        [data-testid="collapsedControl"] { display: none; }
        </style>
        """,
        unsafe_allow_html=True,
    )
    col = st.columns([1, 2, 1])[1]
    with col:
        st.markdown("## 🦅 Peregrine")
        st.info(message, icon="🔒")
        st.link_button(
            "Sign in to CircuitForge",
            url=f"https://circuitforge.tech/login?next=/peregrine",
            use_container_width=True,
        )


def resolve_session(app: str = "peregrine") -> None:
    """
    Call at the top of each Streamlit page.
    In local mode: no-op.
    In cloud mode: reads X-CF-Session header, validates JWT, creates user
    data directory on first visit, and sets st.session_state keys:
      - user_id: str
      - db_path: Path
      - db_key: str (SQLCipher key for this user)
      - cloud_tier: str (free | paid | premium | ultra — resolved from Heimdall)
    Idempotent — skips if user_id already in session_state.
    """
    if not CLOUD_MODE:
        return
    if st.session_state.get("user_id"):
        return

    cookie_header = st.context.headers.get("x-cf-session", "")
    session_jwt = _extract_session_token(cookie_header)
    if not session_jwt:
        _render_auth_wall("Please sign in to access Peregrine.")
        st.stop()

    try:
        user_id = validate_session_jwt(session_jwt)
    except Exception:
        _render_auth_wall("Your session has expired. Please sign in again.")
        st.stop()

    user_path = _user_data_path(user_id, app)
    user_path.mkdir(parents=True, exist_ok=True)
    (user_path / "config").mkdir(exist_ok=True)
    (user_path / "data").mkdir(exist_ok=True)

    st.session_state["user_id"] = user_id
    st.session_state["db_path"] = user_path / "staging.db"
    st.session_state["db_key"] = derive_db_key(user_id)
    st.session_state["cloud_tier"] = _fetch_cloud_tier(user_id, app)


def get_db_path() -> Path:
    """
    Return the active db_path for this session.
    Cloud: user-scoped path from session_state.
    Local: DEFAULT_DB (from STAGING_DB env var or repo default).
    """
    return st.session_state.get("db_path", DEFAULT_DB)


def get_config_dir() -> Path:
    """
    Return the config directory for this session.
    Cloud: per-user path (<data_root>/<user_id>/peregrine/config/) so each
           user's YAML files (user.yaml, plain_text_resume.yaml, etc.) are
           isolated and never shared across tenants.
    Local: repo-level config/ directory.
    """
    if CLOUD_MODE and st.session_state.get("db_path"):
        return Path(st.session_state["db_path"]).parent / "config"
    return Path(__file__).parent.parent / "config"


def get_cloud_tier() -> str:
    """
    Return the current user's cloud tier.
    Cloud mode: resolved from Heimdall at session start (cached 5 min).
    Local mode: always returns "local" so pages can distinguish self-hosted from cloud.
    """
    if not CLOUD_MODE:
        return "local"
    return st.session_state.get("cloud_tier", "free")
1
app/components/__init__.py
Normal file
1
app/components/__init__.py
Normal file
|
|
@ -0,0 +1 @@
|
||||||
|
# app/components/__init__.py
|
||||||
192
app/components/linkedin_import.py
Normal file
192
app/components/linkedin_import.py
Normal file
|
|
@ -0,0 +1,192 @@
# app/components/linkedin_import.py
"""
Shared LinkedIn import widget.

Usage in a page:
    from app.components.linkedin_import import render_linkedin_tab

    # At top of page render — check for pending import:
    _li_data = st.session_state.pop("_linkedin_extracted", None)
    if _li_data:
        st.session_state["_parsed_resume"] = _li_data
        st.rerun()

    # Inside the LinkedIn tab:
    with tab_linkedin:
        render_linkedin_tab(config_dir=CONFIG_DIR, tier=tier)
"""
from __future__ import annotations

import json
import re
from datetime import datetime, timezone
from pathlib import Path

import streamlit as st

_LINKEDIN_PROFILE_RE = re.compile(r"https?://(www\.)?linkedin\.com/in/", re.I)


def _stage_path(config_dir: Path) -> Path:
    return config_dir / "linkedin_stage.json"


def _load_stage(config_dir: Path) -> dict | None:
    path = _stage_path(config_dir)
    if not path.exists():
        return None
    try:
        return json.loads(path.read_text())
    except Exception:
        return None


def _days_ago(iso_ts: str) -> str:
    try:
        dt = datetime.fromisoformat(iso_ts)
        delta = datetime.now(timezone.utc) - dt
        days = delta.days
        if days == 0:
            return "today"
        if days == 1:
            return "yesterday"
        return f"{days} days ago"
    except Exception:
        return "unknown"


def _do_scrape(url: str, config_dir: Path) -> None:
    """Validate URL, run scrape, update state."""
    if not _LINKEDIN_PROFILE_RE.match(url):
        st.error("Please enter a LinkedIn profile URL (linkedin.com/in/…)")
        return

    with st.spinner("Fetching LinkedIn profile… (10–20 seconds)"):
        try:
            from scripts.linkedin_scraper import scrape_profile
            scrape_profile(url, _stage_path(config_dir))
            st.success("Profile imported successfully.")
            st.rerun()
        except ValueError as e:
            st.error(str(e))
        except RuntimeError as e:
            st.warning(str(e))
        except Exception as e:
            st.error(f"Unexpected error: {e}")


def render_linkedin_tab(config_dir: Path, tier: str) -> None:
    """
    Render the LinkedIn import UI.

    When the user clicks "Use this data", writes the extracted dict to
    st.session_state["_linkedin_extracted"] and calls st.rerun().

    Caller reads: data = st.session_state.pop("_linkedin_extracted", None)
    """
    stage = _load_stage(config_dir)

    # ── Staged data status bar ────────────────────────────────────────────────
    if stage:
        scraped_at = stage.get("scraped_at", "")
        source_label = "LinkedIn export" if stage.get("source") == "export_zip" else "LinkedIn profile"
        col_info, col_refresh = st.columns([4, 1])
        col_info.caption(f"Last imported from {source_label}: {_days_ago(scraped_at)}")
        if col_refresh.button("🔄 Refresh", key="li_refresh"):
            url = stage.get("url")
            if url:
                _do_scrape(url, config_dir)
            else:
                st.info("Original URL not available — paste the URL below to re-import.")

    # ── URL import ────────────────────────────────────────────────────────────
    st.markdown("**Import from LinkedIn profile URL**")
    url_input = st.text_input(
        "LinkedIn profile URL",
        placeholder="https://linkedin.com/in/your-name",
        label_visibility="collapsed",
        key="li_url_input",
    )
    if st.button("🔗 Import from LinkedIn", key="li_import_btn", type="primary"):
        if not url_input.strip():
            st.warning("Please enter your LinkedIn profile URL.")
        else:
            _do_scrape(url_input.strip(), config_dir)

    st.caption(
        "Imports from your public LinkedIn profile. No login or credentials required. "
        "Scraping typically takes 10–20 seconds."
    )
    st.info(
        "**LinkedIn limits public profile data.** Without logging in, LinkedIn only "
        "exposes your name, About summary, current employer, and certifications — "
        "past roles, education, and skills are hidden behind their login wall. "
        "For your full career history use the **data export zip** option below.",
        icon="ℹ️",
    )

    # ── Section preview + use button ─────────────────────────────────────────
    if stage:
        from scripts.linkedin_parser import parse_stage
        extracted, err = parse_stage(_stage_path(config_dir))

        if err:
            st.warning(f"Could not read staged data: {err}")
        else:
            st.divider()
            st.markdown("**Preview**")
            col1, col2, col3 = st.columns(3)
            col1.metric("Experience entries", len(extracted.get("experience", [])))
            col2.metric("Skills", len(extracted.get("skills", [])))
            col3.metric("Certifications", len(extracted.get("achievements", [])))

            if extracted.get("career_summary"):
                with st.expander("Summary"):
                    st.write(extracted["career_summary"])

            if extracted.get("experience"):
                with st.expander(f"Experience ({len(extracted['experience'])} entries)"):
                    for exp in extracted["experience"]:
                        st.markdown(f"**{exp.get('title')}** @ {exp.get('company')} · {exp.get('date_range', '')}")

            if extracted.get("education"):
                with st.expander("Education"):
                    for edu in extracted["education"]:
                        st.markdown(f"**{edu.get('school')}** — {edu.get('degree')} {edu.get('field', '')}".strip())

            if extracted.get("skills"):
                with st.expander("Skills"):
                    st.write(", ".join(extracted["skills"]))

            st.divider()
            if st.button("✅ Use this data", key="li_use_btn", type="primary"):
                st.session_state["_linkedin_extracted"] = extracted
                st.rerun()

    # ── Advanced: data export ─────────────────────────────────────────────────
    with st.expander("⬇️ Import from LinkedIn data export (advanced)", expanded=False):
        st.caption(
            "Download your LinkedIn data: **Settings & Privacy → Data Privacy → "
            "Get a copy of your data → Request archive → Fast file**. "
            "The Fast file is available immediately and contains your profile, "
            "experience, education, and skills."
        )
        zip_file = st.file_uploader(
            "Upload LinkedIn export zip", type=["zip"], key="li_zip_upload"
        )
        if zip_file is not None:
            if st.button("📦 Parse export", key="li_parse_zip"):
                with st.spinner("Parsing export archive…"):
                    try:
                        from scripts.linkedin_scraper import parse_export_zip
                        extracted = parse_export_zip(
                            zip_file.read(), _stage_path(config_dir)
                        )
                        st.success(
                            f"Imported {len(extracted.get('experience', []))} experience entries, "
                            f"{len(extracted.get('skills', []))} skills. "
                            "Click 'Use this data' above to apply."
                        )
                        st.rerun()
                    except Exception as e:
                        st.error(f"Failed to parse export: {e}")
31  app/components/paste_image.py  Normal file
@@ -0,0 +1,31 @@
"""
Paste-from-clipboard / drag-and-drop image component.

Uses st.components.v1.declare_component so JS can return image bytes to Python
(st.components.v1.html() is one-way only). No build step required — the
frontend is a single index.html file.
"""
from __future__ import annotations

import base64
from pathlib import Path

import streamlit.components.v1 as components

_FRONTEND = Path(__file__).parent / "paste_image_ui"

_paste_image = components.declare_component("paste_image", path=str(_FRONTEND))


def paste_image_component(key: str | None = None) -> bytes | None:
    """
    Render the paste/drop zone. Returns PNG/JPEG bytes when an image is
    pasted or dropped, or None if nothing has been submitted yet.
    """
    result = _paste_image(key=key)
    if result:
        try:
            return base64.b64decode(result)
        except Exception:
            return None
    return None
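
For orientation, a minimal sketch of how a page can consume this component, mirroring the pattern app/feedback.py uses later in this diff; the widget key and variable names below are illustrative, not part of the commit:

    import streamlit as st
    from app.components.paste_image import paste_image_component

    # hypothetical caller — any Streamlit page in this app
    img_bytes = paste_image_component(key="screenshot_paste")
    if img_bytes:
        # bytes arrive already base64-decoded (PNG/JPEG)
        st.image(img_bytes, caption="Pasted image")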
142  app/components/paste_image_ui/index.html  Normal file
@@ -0,0 +1,142 @@
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <style>
    * { box-sizing: border-box; margin: 0; padding: 0; }
    body {
      font-family: -apple-system, BlinkMacSystemFont, "Source Sans Pro", sans-serif;
      background: transparent;
    }
    .zone {
      width: 100%;
      min-height: 72px;
      border: 2px dashed var(--border, #ccc);
      border-radius: 8px;
      display: flex;
      align-items: center;
      justify-content: center;
      flex-direction: column;
      gap: 6px;
      padding: 12px 16px;
      cursor: pointer;
      outline: none;
      transition: border-color 0.15s, background 0.15s;
      color: var(--text-muted, #888);
      font-size: 13px;
      text-align: center;
      user-select: none;
    }
    .zone:focus { border-color: var(--primary, #ff4b4b); background: var(--primary-faint, rgba(255,75,75,0.06)); }
    .zone.dragover { border-color: var(--primary, #ff4b4b); background: var(--primary-faint, rgba(255,75,75,0.06)); }
    .zone.done { border-style: solid; border-color: #00c853; color: #00c853; }
    .icon { font-size: 22px; line-height: 1; }
    .hint { font-size: 11px; opacity: 0.7; }
    .status { margin-top: 5px; font-size: 11px; text-align: center; color: var(--text-muted, #888); min-height: 16px; }
  </style>
</head>
<body>
  <div class="zone" id="zone" tabindex="0" role="button"
       aria-label="Click to focus, then paste with Ctrl+V, or drag and drop an image">
    <span class="icon">📋</span>
    <span id="mainMsg"><strong>Click here</strong>, then <strong>Ctrl+V</strong> to paste</span>
    <span class="hint" id="hint">or drag & drop an image file</span>
  </div>
  <div class="status" id="status"></div>

  <script>
    const zone = document.getElementById('zone');
    const status = document.getElementById('status');
    const mainMsg = document.getElementById('mainMsg');
    const hint = document.getElementById('hint');

    // ── Streamlit handshake ─────────────────────────────────────────────────
    window.parent.postMessage({ type: "streamlit:componentReady", apiVersion: 1 }, "*");

    function setHeight() {
      const h = document.body.scrollHeight + 4;
      window.parent.postMessage({ type: "streamlit:setFrameHeight", height: h }, "*");
    }
    setHeight();

    // ── Theme ───────────────────────────────────────────────────────────────
    window.addEventListener("message", (e) => {
      if (e.data && e.data.type === "streamlit:render") {
        const t = e.data.args && e.data.args.theme;
        if (!t) return;
        const r = document.documentElement;
        r.style.setProperty("--primary", t.primaryColor || "#ff4b4b");
        r.style.setProperty("--primary-faint", (t.primaryColor || "#ff4b4b") + "10");
        r.style.setProperty("--text-muted", t.textColor ? t.textColor + "99" : "#888");
        r.style.setProperty("--border", t.textColor ? t.textColor + "33" : "#ccc");
        document.body.style.background = t.backgroundColor || "transparent";
      }
    });

    // ── Image handling ──────────────────────────────────────────────────────
    function markDone() {
      zone.classList.add('done');
      // Clear children and rebuild with safe DOM methods
      while (zone.firstChild) zone.removeChild(zone.firstChild);
      const icon = document.createElement('span');
      icon.className = 'icon';
      icon.textContent = '\u2705';
      const msg = document.createElement('span');
      msg.textContent = 'Image ready \u2014 remove or replace below';
      zone.appendChild(icon);
      zone.appendChild(msg);
      setHeight();
    }

    function sendImage(blob) {
      const reader = new FileReader();
      reader.onload = function(ev) {
        const dataUrl = ev.target.result;
        const b64 = dataUrl.slice(dataUrl.indexOf(',') + 1);
        window.parent.postMessage({ type: "streamlit:setComponentValue", value: b64 }, "*");
        markDone();
      };
      reader.readAsDataURL(blob);
    }

    function findImageItem(items) {
      if (!items) return null;
      for (let i = 0; i < items.length; i++) {
        if (items[i].type && items[i].type.indexOf('image/') === 0) return items[i];
      }
      return null;
    }

    // Ctrl+V paste (works over HTTP — uses paste event, not Clipboard API)
    document.addEventListener('paste', function(e) {
      const item = findImageItem(e.clipboardData && e.clipboardData.items);
      if (item) { sendImage(item.getAsFile()); e.preventDefault(); }
    });

    // Drag and drop
    zone.addEventListener('dragover', function(e) {
      e.preventDefault();
      zone.classList.add('dragover');
    });
    zone.addEventListener('dragleave', function() {
      zone.classList.remove('dragover');
    });
    zone.addEventListener('drop', function(e) {
      e.preventDefault();
      zone.classList.remove('dragover');
      const files = e.dataTransfer && e.dataTransfer.files;
      if (files && files.length) {
        for (let i = 0; i < files.length; i++) {
          if (files[i].type.indexOf('image/') === 0) { sendImage(files[i]); return; }
        }
      }
      // Fallback: dataTransfer items (e.g. dragged from browser)
      const item = findImageItem(e.dataTransfer && e.dataTransfer.items);
      if (item) sendImage(item.getAsFile());
    });

    // Click to focus so Ctrl+V lands in this iframe
    zone.addEventListener('click', function() { zone.focus(); });
  </script>
</body>
</html>
247  app/feedback.py  Normal file
@@ -0,0 +1,247 @@
"""
Floating feedback button + dialog — thin Streamlit shell.
All business logic lives in scripts/feedback_api.py.
"""
from __future__ import annotations

import os
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent))

import streamlit as st

# ── CSS: float the button to the bottom-right corner ─────────────────────────
# Targets the button by its aria-label (set via `help=` parameter).
_FLOAT_CSS = """
<style>
button[aria-label="Send feedback or report a bug"] {
    position: fixed !important;
    bottom: 2rem !important;
    right: 2rem !important;
    z-index: 9999 !important;
    border-radius: 25px !important;
    padding: 0.5rem 1.25rem !important;
    box-shadow: 0 4px 16px rgba(0,0,0,0.25) !important;
    font-size: 0.9rem !important;
}
</style>
"""


@st.dialog("Send Feedback", width="large")
def _feedback_dialog(page: str) -> None:
    """Two-step feedback dialog: form → consent/attachments → submit."""
    from scripts.feedback_api import (
        collect_context, collect_logs, collect_listings,
        build_issue_body, create_forgejo_issue, upload_attachment,
    )
    from scripts.db import DEFAULT_DB

    # ── Initialise step counter ───────────────────────────────────────────────
    if "fb_step" not in st.session_state:
        st.session_state.fb_step = 1

    # ═════════════════════════════════════════════════════════════════════════
    # STEP 1 — Form
    # ═════════════════════════════════════════════════════════════════════════
    if st.session_state.fb_step == 1:
        st.subheader("What's on your mind?")

        fb_type = st.selectbox(
            "Type", ["Bug", "Feature Request", "Other"], key="fb_type"
        )
        fb_title = st.text_input(
            "Title", placeholder="Short summary of the issue or idea", key="fb_title"
        )
        fb_desc = st.text_area(
            "Description",
            placeholder="Describe what happened or what you'd like to see...",
            key="fb_desc",
        )
        if fb_type == "Bug":
            st.text_area(
                "Reproduction steps",
                placeholder="1. Go to...\n2. Click...\n3. See error",
                key="fb_repro",
            )

        col_cancel, _, col_next = st.columns([1, 3, 1])
        with col_cancel:
            if st.button("Cancel"):
                _clear_feedback_state()
                st.rerun()  # intentionally closes the dialog
        with col_next:
            if st.button("Next →", type="primary"):
                # Read widget values NOW (same rerun as the click — values are
                # available here even on first click). Copy to non-widget keys
                # so they survive step 2's render (Streamlit removes widget
                # state for widgets that are no longer rendered).
                title = fb_title.strip()
                desc = fb_desc.strip()
                if not title or not desc:
                    st.error("Please fill in both Title and Description.")
                else:
                    st.session_state.fb_data_type = fb_type
                    st.session_state.fb_data_title = title
                    st.session_state.fb_data_desc = desc
                    st.session_state.fb_data_repro = st.session_state.get("fb_repro", "")
                    st.session_state.fb_step = 2

    # ═════════════════════════════════════════════════════════════════════════
    # STEP 2 — Consent + attachments
    # ═════════════════════════════════════════════════════════════════════════
    elif st.session_state.fb_step == 2:
        st.subheader("Optional: attach diagnostic data")

        # ── Diagnostic data toggle + preview ─────────────────────────────────
        include_diag = st.toggle(
            "Include diagnostic data (logs + recent listings)", key="fb_diag"
        )
        if include_diag:
            with st.expander("Preview what will be sent", expanded=True):
                st.caption("**App logs (last 100 lines, PII masked):**")
                st.code(collect_logs(100), language=None)
                st.caption("**Recent listings (title / company / URL only):**")
                for j in collect_listings(DEFAULT_DB, 5):
                    st.write(f"- {j['title']} @ {j['company']} — {j['url']}")

        # ── Screenshot ────────────────────────────────────────────────────────
        st.divider()
        st.caption("**Screenshot** (optional)")

        from app.components.paste_image import paste_image_component

        # Keyed so we can reset the component when the user removes the image
        if "fb_paste_key" not in st.session_state:
            st.session_state.fb_paste_key = 0

        pasted = paste_image_component(key=f"fb_paste_{st.session_state.fb_paste_key}")
        if pasted:
            st.session_state.fb_screenshot = pasted

        st.caption("or upload a file:")
        uploaded = st.file_uploader(
            "Upload screenshot",
            type=["png", "jpg", "jpeg"],
            label_visibility="collapsed",
            key="fb_upload",
        )
        if uploaded:
            st.session_state.fb_screenshot = uploaded.read()

        if st.session_state.get("fb_screenshot"):
            st.image(
                st.session_state["fb_screenshot"],
                caption="Screenshot preview — this will be attached to the issue",
                use_container_width=True,
            )
            if st.button("🗑 Remove screenshot"):
                st.session_state.pop("fb_screenshot", None)
                st.session_state.fb_paste_key = st.session_state.get("fb_paste_key", 0) + 1
                # no st.rerun() — button click already re-renders the dialog

        # ── Attribution consent ───────────────────────────────────────────────
        st.divider()
        submitter: str | None = None
        try:
            import yaml
            _ROOT = Path(__file__).parent.parent
            user = yaml.safe_load((_ROOT / "config" / "user.yaml").read_text()) or {}
            name = (user.get("name") or "").strip()
            email = (user.get("email") or "").strip()
            if name or email:
                label = f"Include my name & email in the report: **{name}** ({email})"
                if st.checkbox(label, key="fb_attr"):
                    submitter = f"{name} <{email}>"
        except Exception:
            pass

        # ── Navigation ────────────────────────────────────────────────────────
        col_back, _, col_submit = st.columns([1, 3, 2])
        with col_back:
            if st.button("← Back"):
                st.session_state.fb_step = 1
                # no st.rerun() — button click already re-renders the dialog

        with col_submit:
            if st.button("Submit Feedback", type="primary"):
                _submit(page, include_diag, submitter, collect_context,
                        collect_logs, collect_listings, build_issue_body,
                        create_forgejo_issue, upload_attachment, DEFAULT_DB)


def _submit(page, include_diag, submitter, collect_context, collect_logs,
            collect_listings, build_issue_body, create_forgejo_issue,
            upload_attachment, db_path) -> None:
    """Handle form submission: build body, file issue, upload screenshot."""
    with st.spinner("Filing issue…"):
        context = collect_context(page)
        attachments: dict = {}
        if include_diag:
            attachments["logs"] = collect_logs(100)
            attachments["listings"] = collect_listings(db_path, 5)
        if submitter:
            attachments["submitter"] = submitter

        fb_type = st.session_state.get("fb_data_type", "Other")
        type_key = {"Bug": "bug", "Feature Request": "feature", "Other": "other"}.get(
            fb_type, "other"
        )
        labels = ["beta-feedback", "needs-triage"]
        labels.append(
            {"bug": "bug", "feature": "feature-request"}.get(type_key, "question")
        )

        form = {
            "type": type_key,
            "description": st.session_state.get("fb_data_desc", ""),
            "repro": st.session_state.get("fb_data_repro", "") if type_key == "bug" else "",
        }

        body = build_issue_body(form, context, attachments)

        try:
            result = create_forgejo_issue(
                st.session_state.get("fb_data_title", "Feedback"), body, labels
            )
            screenshot = st.session_state.get("fb_screenshot")
            if screenshot:
                upload_attachment(result["number"], screenshot)

            _clear_feedback_state()
            st.success(f"Issue filed! [View on Forgejo]({result['url']})")
            st.balloons()

        except Exception as exc:
            st.error(f"Failed to file issue: {exc}")


def _clear_feedback_state() -> None:
    for key in [
        "fb_step",
        "fb_type", "fb_title", "fb_desc", "fb_repro",  # widget keys
        "fb_data_type", "fb_data_title", "fb_data_desc", "fb_data_repro",  # saved data
        "fb_diag", "fb_upload", "fb_attr", "fb_screenshot", "fb_paste_key",
    ]:
        st.session_state.pop(key, None)


def inject_feedback_button(page: str = "Unknown") -> None:
    """
    Inject the floating feedback button. Call once per page render in app.py.
    Hidden automatically in DEMO_MODE.
    """
    if os.environ.get("DEMO_MODE", "").lower() in ("1", "true", "yes"):
        return
    if not os.environ.get("FORGEJO_API_TOKEN"):
        return  # silently skip if not configured

    st.markdown(_FLOAT_CSS, unsafe_allow_html=True)
    if st.button(
        "💬 Feedback",
        key="__feedback_floating_btn__",
        help="Send feedback or report a bug",
    ):
        _feedback_dialog(page)
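
As the docstring notes, the button is meant to be injected once per page render from app.py. A minimal sketch of that call site, with the page name purely illustrative:

    # hypothetical snippet inside app.py, after the page body has rendered
    from app.feedback import inject_feedback_button

    inject_feedback_button(page="Dashboard")  # no-op in DEMO_MODE or without FORGEJO_API_TOKEN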
744  app/pages/0_Setup.py  Normal file
@@ -0,0 +1,744 @@
"""
First-run setup wizard orchestrator.
Shown by app.py when user.yaml is absent OR wizard_complete is False.
Seven steps: hardware → tier → resume → identity → inference → search → integrations (optional).
Steps 1-6 are mandatory; step 7 is optional and can be skipped.
Each step writes to user.yaml on "Next" for crash recovery.
"""
from __future__ import annotations
import json
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent.parent))

import streamlit as st
import yaml

from app.cloud_session import resolve_session, get_db_path, get_config_dir
resolve_session("peregrine")

_ROOT = Path(__file__).parent.parent.parent
CONFIG_DIR = get_config_dir()  # per-user dir in cloud; repo config/ locally
USER_YAML = CONFIG_DIR / "user.yaml"
STEPS = 6  # mandatory steps
STEP_LABELS = ["Hardware", "Tier", "Resume", "Identity", "Inference", "Search"]


# ── Helpers ────────────────────────────────────────────────────────────────────

def _load_yaml() -> dict:
    if USER_YAML.exists():
        return yaml.safe_load(USER_YAML.read_text()) or {}
    return {}


def _save_yaml(updates: dict) -> None:
    existing = _load_yaml()
    existing.update(updates)
    CONFIG_DIR.mkdir(parents=True, exist_ok=True)
    USER_YAML.write_text(
        yaml.dump(existing, default_flow_style=False, allow_unicode=True)
    )


def _detect_gpus() -> list[str]:
    """Detect GPUs. Prefers env vars written by preflight (works inside Docker)."""
    import os
    import subprocess
    # Preflight writes PEREGRINE_GPU_NAMES to .env; compose passes it to the container.
    # This is the reliable path when running inside Docker without nvidia-smi access.
    env_names = os.environ.get("PEREGRINE_GPU_NAMES", "").strip()
    if env_names:
        return [n.strip() for n in env_names.split(",") if n.strip()]
    # Fallback: try nvidia-smi directly (works when running bare or with GPU passthrough)
    try:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            text=True, timeout=5,
        )
        return [l.strip() for l in out.strip().splitlines() if l.strip()]
    except Exception:
        return []


def _suggest_profile(gpus: list[str]) -> str:
    import os
    # If preflight already ran and wrote a profile recommendation, use it.
    recommended = os.environ.get("RECOMMENDED_PROFILE", "").strip()
    if recommended:
        return recommended
    if len(gpus) >= 2:
        return "dual-gpu"
    if len(gpus) == 1:
        return "single-gpu"
    return "remote"


def _submit_wizard_task(section: str, input_data: dict) -> int:
    """Submit a wizard_generate background task. Returns task_id."""
    from scripts.task_runner import submit_task
    params = json.dumps({"section": section, "input": input_data})
    task_id, _ = submit_task(get_db_path(), "wizard_generate", 0, params=params)
    return task_id


def _poll_wizard_task(section: str) -> dict | None:
    """Return the most recent wizard_generate task row for a given section, or None."""
    import sqlite3
    conn = sqlite3.connect(get_db_path())
    conn.row_factory = sqlite3.Row
    row = conn.execute(
        "SELECT * FROM background_tasks "
        "WHERE task_type='wizard_generate' AND params LIKE ? "
        "ORDER BY id DESC LIMIT 1",
        (f'%"section": "{section}"%',),
    ).fetchone()
    conn.close()
    return dict(row) if row else None


def _generation_widget(section: str, label: str, tier: str,
                       feature_key: str, input_data: dict) -> str | None:
    """Render a generation button + polling fragment.

    Returns the generated result string if completed and not yet applied, else None.
    Call this inside a step to add LLM generation support.
    The caller decides whether to auto-populate a field with the result.
    """
    from app.wizard.tiers import can_use, tier_label as tl, has_configured_llm

    _has_byok = has_configured_llm()
    if not can_use(tier, feature_key, has_byok=_has_byok):
        st.caption(f"{tl(feature_key, has_byok=_has_byok)} {label}")
        return None

    col_btn, col_fb = st.columns([2, 5])
    if col_btn.button(f"\u2728 {label}", key=f"gen_{section}"):
        _submit_wizard_task(section, input_data)
        st.rerun()

    with st.expander("\u270f\ufe0f Request changes (optional)", expanded=False):
        prev = st.session_state.get(f"_gen_result_{section}", "")
        feedback = st.text_area(
            "Describe what to change", key=f"_feedback_{section}",
            placeholder="e.g. Make it shorter and emphasise leadership",
            height=60,
        )
        if prev and st.button(f"\u21ba Regenerate with feedback", key=f"regen_{section}"):
            _submit_wizard_task(section, {**input_data,
                                          "previous_result": prev,
                                          "feedback": feedback})
            st.rerun()

    # Polling fragment
    result_key = f"_gen_result_{section}"

    @st.fragment(run_every=3)
    def _poll():
        task = _poll_wizard_task(section)
        if not task:
            return
        status = task.get("status")
        if status in ("queued", "running"):
            stage = task.get("stage") or "Queued"
            st.info(f"\u23f3 {stage}\u2026")
        elif status == "completed":
            payload = json.loads(task.get("error") or "{}")
            result = payload.get("result", "")
            if result and result != st.session_state.get(result_key):
                st.session_state[result_key] = result
                st.rerun()
        elif status == "failed":
            st.warning(f"Generation failed: {task.get('error', 'unknown error')}")

    _poll()

    return st.session_state.get(result_key)


# ── Wizard state init ──────────────────────────────────────────────────────────

if "wizard_step" not in st.session_state:
    saved = _load_yaml()
    last_completed = saved.get("wizard_step", 0)
    st.session_state.wizard_step = min(last_completed + 1, STEPS + 1)  # resume at next step

step = st.session_state.wizard_step
saved_yaml = _load_yaml()
_tier = saved_yaml.get("dev_tier_override") or saved_yaml.get("tier", "free")

st.title("\U0001f44b Welcome to Peregrine")
st.caption("Complete the setup to start your job search. Progress saves automatically.")
st.progress(
    min((step - 1) / STEPS, 1.0),
    text=f"Step {min(step, STEPS)} of {STEPS}" if step <= STEPS else "Almost done!",
)
st.divider()


# ── Step 1: Hardware ───────────────────────────────────────────────────────────
if step == 1:
    from app.cloud_session import CLOUD_MODE as _CLOUD_MODE
    if _CLOUD_MODE:
        # Cloud deployment: always single-gpu (Heimdall), skip hardware selection
        _save_yaml({"inference_profile": "single-gpu", "wizard_step": 1})
        st.session_state.wizard_step = 2
        st.rerun()

    from app.wizard.step_hardware import validate, PROFILES

    st.subheader("Step 1 \u2014 Hardware Detection")
    gpus = _detect_gpus()
    suggested = _suggest_profile(gpus)

    if gpus:
        st.success(f"Detected {len(gpus)} GPU(s): {', '.join(gpus)}")
    else:
        st.info("No NVIDIA GPUs detected. 'Remote' or 'CPU' mode recommended.")

    profile = st.selectbox(
        "Inference mode", PROFILES, index=PROFILES.index(suggested),
        help="Controls which Docker services start. Change later in Settings \u2192 Services.",
    )
    if profile in ("single-gpu", "dual-gpu") and not gpus:
        st.warning(
            "No GPUs detected \u2014 GPU profiles require the NVIDIA Container Toolkit. "
            "See README for install instructions."
        )

    if st.button("Next \u2192", type="primary", key="hw_next"):
        errs = validate({"inference_profile": profile})
        if errs:
            st.error("\n".join(errs))
        else:
            _save_yaml({"inference_profile": profile, "wizard_step": 1})
            st.session_state.wizard_step = 2
            st.rerun()


# ── Step 2: Tier ───────────────────────────────────────────────────────────────
elif step == 2:
    from app.cloud_session import CLOUD_MODE as _CLOUD_MODE
    if _CLOUD_MODE:
        # Cloud mode: tier already resolved from Heimdall at session init
        cloud_tier = st.session_state.get("cloud_tier", "free")
        _save_yaml({"tier": cloud_tier, "wizard_step": 2})
        st.session_state.wizard_step = 3
        st.rerun()

    from app.wizard.step_tier import validate

    st.subheader("Step 2 \u2014 Choose Your Plan")
    st.caption(
        "**Free** is fully functional for self-hosted local use. "
        "**Paid/Premium** unlock LLM-assisted features."
    )

    tier_options = {
        "free": "\U0001f193 **Free** \u2014 Local discovery, apply workspace, interviews kanban",
        "paid": "\U0001f4bc **Paid** \u2014 + AI career summary, company research, email classifier, calendar sync",
        "premium": "\u2b50 **Premium** \u2014 + Voice guidelines, model fine-tuning, multi-user",
    }
    from app.wizard.tiers import TIERS
    current_tier = saved_yaml.get("tier", "free")
    selected_tier = st.radio(
        "Plan",
        list(tier_options.keys()),
        format_func=lambda x: tier_options[x],
        index=TIERS.index(current_tier) if current_tier in TIERS else 0,
    )

    col_back, col_next = st.columns([1, 4])
    if col_back.button("\u2190 Back", key="tier_back"):
        st.session_state.wizard_step = 1
        st.rerun()
    if col_next.button("Next \u2192", type="primary", key="tier_next"):
        errs = validate({"tier": selected_tier})
        if errs:
            st.error("\n".join(errs))
        else:
            _save_yaml({"tier": selected_tier, "wizard_step": 2})
            st.session_state.wizard_step = 3
            st.rerun()


# ── Step 3: Resume ─────────────────────────────────────────────────────────────
elif step == 3:
    from app.wizard.step_resume import validate

    st.subheader("Step 3 \u2014 Resume")
    st.caption("Upload your resume for fast parsing, or build it section by section.")

    # Read LinkedIn import result before tabs render (spec: "at step render time")
    _li_data = st.session_state.pop("_linkedin_extracted", None)
    if _li_data:
        st.session_state["_parsed_resume"] = _li_data

    tab_upload, tab_builder, tab_linkedin = st.tabs([
        "\U0001f4ce Upload", "\U0001f4dd Build Manually", "\U0001f517 LinkedIn"
    ])

    with tab_upload:
        uploaded = st.file_uploader("Upload PDF, DOCX, or ODT", type=["pdf", "docx", "odt"])
        if uploaded and st.button("Parse Resume", type="primary", key="parse_resume"):
            from scripts.resume_parser import (
                extract_text_from_pdf, extract_text_from_docx,
                extract_text_from_odt, structure_resume,
            )
            file_bytes = uploaded.read()
            ext = uploaded.name.rsplit(".", 1)[-1].lower()
            if ext == "pdf":
                raw_text = extract_text_from_pdf(file_bytes)
            elif ext == "odt":
                raw_text = extract_text_from_odt(file_bytes)
            else:
                raw_text = extract_text_from_docx(file_bytes)
            with st.spinner("Parsing\u2026"):
                parsed, parse_err = structure_resume(raw_text)

            # Diagnostic: show raw extraction + detected fields regardless of outcome
            with st.expander("🔍 Parse diagnostics", expanded=not bool(parsed and any(
                parsed.get(k) for k in ("name", "experience", "skills")
            ))):
                st.caption("**Raw extracted text (first 800 chars)**")
                st.code(raw_text[:800] if raw_text else "(empty)", language="text")
                if parsed:
                    st.caption("**Detected fields**")
                    st.json({k: (v[:3] if isinstance(v, list) else v) for k, v in parsed.items()})

            if parsed and any(parsed.get(k) for k in ("name", "experience", "skills")):
                st.session_state["_parsed_resume"] = parsed
                st.session_state["_raw_resume_text"] = raw_text
                _save_yaml({"_raw_resume_text": raw_text[:8000]})
                st.success("Parsed! Review the builder tab to edit entries.")
            elif parsed:
                # Parsed but empty — show what we got and let them proceed or build manually
                st.session_state["_parsed_resume"] = parsed
                st.warning("Resume text was extracted but no fields were recognised. "
                           "Check the diagnostics above — the section headers may use unusual labels. "
                           "You can still fill in the Build tab manually.")
            else:
                st.warning("Auto-parse failed \u2014 switch to the Build tab and add entries manually.")
                if parse_err:
                    st.caption(f"Reason: {parse_err}")

    with tab_builder:
        parsed = st.session_state.get("_parsed_resume", {})
        experience = st.session_state.get(
            "_experience",
            parsed.get("experience") or saved_yaml.get("experience", []),
        )

        for i, entry in enumerate(experience):
            with st.expander(
                f"{entry.get('title', 'Entry')} @ {entry.get('company', '?')}",
                expanded=(i == len(experience) - 1),
            ):
                entry["company"] = st.text_input("Company", entry.get("company", ""), key=f"co_{i}")
                entry["title"] = st.text_input("Title", entry.get("title", ""), key=f"ti_{i}")
                raw_bullets = st.text_area(
                    "Responsibilities (one per line)",
                    "\n".join(entry.get("bullets", [])),
                    key=f"bu_{i}", height=80,
                )
                entry["bullets"] = [b.strip() for b in raw_bullets.splitlines() if b.strip()]
                if st.button("Remove entry", key=f"rm_{i}"):
                    experience.pop(i)
                    st.session_state["_experience"] = experience
                    st.rerun()

        if st.button("\uff0b Add work experience entry", key="add_exp"):
            experience.append({"company": "", "title": "", "bullets": []})
            st.session_state["_experience"] = experience
            st.rerun()

        # Bullet expansion generation
        if experience:
            all_bullets = "\n".join(
                b for e in experience for b in e.get("bullets", [])
            )
            _generation_widget(
                section="expand_bullets",
                label="Expand bullet points",
                tier=_tier,
                feature_key="llm_expand_bullets",
                input_data={"bullet_notes": all_bullets},
            )

    with tab_linkedin:
        from app.components.linkedin_import import render_linkedin_tab
        render_linkedin_tab(config_dir=CONFIG_DIR, tier=_tier)

    col_back, col_next = st.columns([1, 4])
    if col_back.button("\u2190 Back", key="resume_back"):
        st.session_state.wizard_step = 2
        st.rerun()
    if col_next.button("Next \u2192", type="primary", key="resume_next"):
        parsed = st.session_state.get("_parsed_resume", {})
        experience = (
            parsed.get("experience") or
            st.session_state.get("_experience", [])
        )
        errs = validate({"experience": experience})
        if errs:
            st.error("\n".join(errs))
        else:
            resume_yaml_path = CONFIG_DIR / "plain_text_resume.yaml"
            resume_yaml_path.parent.mkdir(parents=True, exist_ok=True)
            resume_data = {**parsed, "experience": experience} if parsed else {"experience": experience}
            resume_yaml_path.write_text(
                yaml.dump(resume_data, default_flow_style=False, allow_unicode=True)
            )
            _save_yaml({"wizard_step": 3})
            st.session_state.wizard_step = 4
            st.rerun()


# ── Step 4: Identity ───────────────────────────────────────────────────────────
elif step == 4:
    from app.wizard.step_identity import validate

    st.subheader("Step 4 \u2014 Your Identity")
    st.caption("Used in cover letter PDFs, LLM prompts, and the app header.")

    c1, c2 = st.columns(2)
    name = c1.text_input("Full Name *", saved_yaml.get("name", ""))
    email = c1.text_input("Email *", saved_yaml.get("email", ""))
    phone = c2.text_input("Phone", saved_yaml.get("phone", ""))
    linkedin = c2.text_input("LinkedIn URL", saved_yaml.get("linkedin", ""))

    # Career summary with optional LLM generation — resume text available now (step 3 ran first)
    summary_default = st.session_state.get("_gen_result_career_summary") or saved_yaml.get("career_summary", "")
    summary = st.text_area(
        "Career Summary *", value=summary_default, height=120,
        placeholder="Experienced professional with X years in [field]. Specialise in [skills].",
        help="Injected into cover letter and research prompts as your professional context.",
    )

    gen_result = _generation_widget(
        section="career_summary",
        label="Generate from resume",
        tier=_tier,
        feature_key="llm_career_summary",
        input_data={"resume_text": saved_yaml.get("_raw_resume_text", "")},
    )
    if gen_result and gen_result != summary:
        st.info(f"\u2728 Suggested summary \u2014 paste it above if it looks good:\n\n{gen_result}")

    col_back, col_next = st.columns([1, 4])
    if col_back.button("\u2190 Back", key="ident_back"):
        st.session_state.wizard_step = 3
        st.rerun()
    if col_next.button("Next \u2192", type="primary", key="ident_next"):
        errs = validate({"name": name, "email": email, "career_summary": summary})
        if errs:
            st.error("\n".join(errs))
        else:
            _save_yaml({
                "name": name, "email": email, "phone": phone,
                "linkedin": linkedin, "career_summary": summary,
                "wizard_complete": False, "wizard_step": 4,
            })
            st.session_state.wizard_step = 5
            st.rerun()


# ── Step 5: Inference ──────────────────────────────────────────────────────────
elif step == 5:
    from app.cloud_session import CLOUD_MODE as _CLOUD_MODE
    if _CLOUD_MODE:
        # Cloud deployment: inference is managed server-side; skip this step
        _save_yaml({"wizard_step": 5})
        st.session_state.wizard_step = 6
        st.rerun()

    from app.wizard.step_inference import validate

    st.subheader("Step 5 \u2014 Inference & API Keys")
    profile = saved_yaml.get("inference_profile", "remote")

    if profile == "remote":
        st.info("Remote mode: at least one external API key is required.")
        anthropic_key = st.text_input("Anthropic API Key", type="password", placeholder="sk-ant-\u2026")
        openai_url = st.text_input("OpenAI-compatible endpoint (optional)",
                                   placeholder="https://api.together.xyz/v1")
        openai_key = st.text_input("Endpoint API Key (optional)", type="password",
                                   key="oai_key") if openai_url else ""
    else:
        st.info(f"Local mode ({profile}): Ollama provides inference.")
        anthropic_key = openai_url = openai_key = ""

    with st.expander("Advanced \u2014 Service Ports & Hosts"):
        st.caption("Change only if services run on non-default ports or remote hosts.")
        svc = dict(saved_yaml.get("services", {}))
        for svc_name, default_host, default_port in [
            ("ollama", "ollama", 11434),    # Docker service name
            ("vllm", "vllm", 8000),         # Docker service name
            ("searxng", "searxng", 8080),   # Docker internal port (host-mapped: 8888)
        ]:
            c1, c2 = st.columns([3, 1])
            svc[f"{svc_name}_host"] = c1.text_input(
                f"{svc_name} host",
                svc.get(f"{svc_name}_host", default_host),
                key=f"h_{svc_name}",
            )
            svc[f"{svc_name}_port"] = int(c2.number_input(
                "port",
                value=int(svc.get(f"{svc_name}_port", default_port)),
                step=1, key=f"p_{svc_name}",
            ))

    confirmed = st.session_state.get("_inf_confirmed", False)
    test_label = "\U0001f50c Test Ollama connection" if profile != "remote" else "\U0001f50c Test LLM connection"
    if st.button(test_label, key="inf_test"):
        if profile == "remote":
            from scripts.llm_router import LLMRouter
            try:
                r = LLMRouter().complete("Reply with only: OK")
                if r and r.strip():
                    st.success("LLM responding.")
                    st.session_state["_inf_confirmed"] = True
                    confirmed = True
            except Exception as e:
                st.error(f"LLM test failed: {e}")
        else:
            import requests
            ollama_url = f"http://{svc.get('ollama_host','localhost')}:{svc.get('ollama_port',11434)}"
            try:
                requests.get(f"{ollama_url}/api/tags", timeout=5)
                st.success("Ollama is running.")
                st.session_state["_inf_confirmed"] = True
                confirmed = True
            except Exception:
                st.warning("Ollama not responding \u2014 you can skip this check and configure later.")
                st.session_state["_inf_confirmed"] = True
                confirmed = True

    col_back, col_next = st.columns([1, 4])
    if col_back.button("\u2190 Back", key="inf_back"):
        st.session_state.wizard_step = 4
        st.rerun()
    if col_next.button("Next \u2192", type="primary", key="inf_next", disabled=not confirmed):
        errs = validate({"endpoint_confirmed": confirmed})
        if errs:
            st.error("\n".join(errs))
        else:
            # Write API keys to .env
            env_path = _ROOT / ".env"
            env_lines = env_path.read_text().splitlines() if env_path.exists() else []

            def _set_env(lines: list[str], key: str, val: str) -> list[str]:
                for i, l in enumerate(lines):
                    if l.startswith(f"{key}="):
                        lines[i] = f"{key}={val}"
                        return lines
                lines.append(f"{key}={val}")
                return lines

            if anthropic_key:
                env_lines = _set_env(env_lines, "ANTHROPIC_API_KEY", anthropic_key)
            if openai_url:
                env_lines = _set_env(env_lines, "OPENAI_COMPAT_URL", openai_url)
            if openai_key:
                env_lines = _set_env(env_lines, "OPENAI_COMPAT_KEY", openai_key)
            if anthropic_key or openai_url:
                env_path.write_text("\n".join(env_lines) + "\n")

            _save_yaml({"services": svc, "wizard_step": 5})
            st.session_state.wizard_step = 6
            st.rerun()


# ── Step 6: Search ─────────────────────────────────────────────────────────────
elif step == 6:
    from app.wizard.step_search import validate

    st.subheader("Step 6 \u2014 Job Search Preferences")
    st.caption("Set up what to search for. You can refine these in Settings \u2192 Search later.")

    titles = st.session_state.get("_titles", saved_yaml.get("_wiz_titles", []))
    locations = st.session_state.get("_locations", saved_yaml.get("_wiz_locations", []))

    c1, c2 = st.columns(2)

    with c1:
        st.markdown("**Job Titles**")
        for i, t in enumerate(titles):
            tc1, tc2 = st.columns([5, 1])
            tc1.text(t)
            if tc2.button("\u00d7", key=f"rmtitle_{i}"):
                titles.pop(i)
                st.session_state["_titles"] = titles
                st.rerun()
        new_title = st.text_input("Add title", key="new_title_wiz",
                                  placeholder="Software Engineer, Product Manager\u2026")
        ac1, ac2 = st.columns([4, 1])
        if ac2.button("\uff0b", key="add_title"):
            if new_title.strip() and new_title.strip() not in titles:
                titles.append(new_title.strip())
                st.session_state["_titles"] = titles
                st.rerun()

        # LLM title suggestions
        _generation_widget(
            section="job_titles",
            label="Suggest job titles",
            tier=_tier,
            feature_key="llm_job_titles",
            input_data={
                "resume_text": saved_yaml.get("_raw_resume_text", ""),
                "current_titles": str(titles),
            },
        )

    with c2:
        st.markdown("**Locations**")
        for i, l in enumerate(locations):
            lc1, lc2 = st.columns([5, 1])
            lc1.text(l)
            if lc2.button("\u00d7", key=f"rmloc_{i}"):
                locations.pop(i)
                st.session_state["_locations"] = locations
                st.rerun()
        new_loc = st.text_input("Add location", key="new_loc_wiz",
                                placeholder="Remote, New York NY, San Francisco CA\u2026")
        ll1, ll2 = st.columns([4, 1])
        if ll2.button("\uff0b", key="add_loc"):
            if new_loc.strip():
                locations.append(new_loc.strip())
                st.session_state["_locations"] = locations
                st.rerun()

    col_back, col_next = st.columns([1, 4])
    if col_back.button("\u2190 Back", key="search_back"):
        st.session_state.wizard_step = 5
        st.rerun()
    if col_next.button("Next \u2192", type="primary", key="search_next"):
        errs = validate({"job_titles": titles, "locations": locations})
        if errs:
            st.error("\n".join(errs))
        else:
            search_profile_path = CONFIG_DIR / "search_profiles.yaml"
            existing_profiles = {}
            if search_profile_path.exists():
                existing_profiles = yaml.safe_load(search_profile_path.read_text()) or {}
            profiles_list = existing_profiles.get("profiles", [])
            # Update or create "default" profile
            default_idx = next(
                (i for i, p in enumerate(profiles_list) if p.get("name") == "default"), None
            )
            default_profile = {
                "name": "default",
                "job_titles": titles,
                "locations": locations,
                "remote_only": False,
                "boards": ["linkedin", "indeed", "glassdoor", "zip_recruiter"],
            }
            if default_idx is not None:
                profiles_list[default_idx] = default_profile
            else:
                profiles_list.insert(0, default_profile)
            search_profile_path.write_text(
                yaml.dump({"profiles": profiles_list},
                          default_flow_style=False, allow_unicode=True)
            )
            _save_yaml({"wizard_step": 6})
            st.session_state.wizard_step = 7
            st.rerun()


# ── Step 7: Integrations (optional) ───────────────────────────────────────────
elif step == 7:
    st.subheader("Step 7 \u2014 Integrations (Optional)")
    st.caption(
        "Connect cloud services, calendars, and notification tools. "
        "You can add or change these any time in Settings \u2192 Integrations."
    )

    from scripts.integrations import REGISTRY
    from app.wizard.step_integrations import get_available, is_connected
    from app.wizard.tiers import tier_label

    available = get_available(_tier)

    for name, cls in sorted(REGISTRY.items(), key=lambda x: (x[0] not in available, x[0])):
        is_conn = is_connected(name, CONFIG_DIR)
        icon = "\u2705" if is_conn else "\u25cb"
        lock = tier_label(f"{name}_sync") or tier_label(f"{name}_notifications")

        with st.expander(f"{icon} {cls.label} {lock}"):
            if name not in available:
                st.caption(f"Upgrade to {cls.tier} to unlock {cls.label}.")
                continue

            inst = cls()
            config: dict = {}
            for field in inst.fields():
                val = st.text_input(
                    field["label"],
                    type="password" if field["type"] == "password" else "default",
                    placeholder=field.get("placeholder", ""),
                    help=field.get("help", ""),
                    key=f"int_{name}_{field['key']}",
                )
                config[field["key"]] = val

            required_filled = all(
                config.get(f["key"])
                for f in inst.fields()
                if f.get("required")
            )
            if st.button(f"Connect {cls.label}", key=f"conn_{name}",
                         disabled=not required_filled):
                inst.connect(config)
                with st.spinner(f"Testing {cls.label} connection\u2026"):
                    if inst.test():
                        inst.save_config(config, CONFIG_DIR)
                        st.success(f"{cls.label} connected!")
                        st.rerun()
                    else:
                        st.error(
                            f"Connection test failed for {cls.label}. "
                            "Double-check your credentials."
                        )

    st.divider()
    col_back, col_skip, col_finish = st.columns([1, 1, 3])

    if col_back.button("\u2190 Back", key="int_back"):
        st.session_state.wizard_step = 6
        st.rerun()

    if col_skip.button("Skip \u2192"):
        st.session_state.wizard_step = 8  # trigger Finish
        st.rerun()

    if col_finish.button("\U0001f389 Finish Setup", type="primary", key="finish_btn"):
        st.session_state.wizard_step = 8
        st.rerun()


# ── Finish ─────────────────────────────────────────────────────────────────────
elif step >= 8:
    with st.spinner("Finalising setup\u2026"):
        from scripts.user_profile import UserProfile
        from scripts.generate_llm_config import apply_service_urls

        try:
            profile_obj = UserProfile(USER_YAML)
            if (CONFIG_DIR / "llm.yaml").exists():
                apply_service_urls(profile_obj, CONFIG_DIR / "llm.yaml")
        except Exception:
            pass  # don't block finish on llm.yaml errors

        data = _load_yaml()
        data["wizard_complete"] = True
        data.pop("wizard_step", None)
        USER_YAML.write_text(
            yaml.dump(data, default_flow_style=False, allow_unicode=True)
        )

    st.success("\u2705 Setup complete! Loading Peregrine\u2026")
    st.session_state.clear()
    st.rerun()
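
For orientation, the keys that the _save_yaml calls above accumulate into user.yaml by the time the finish step runs, shown as a Python dict with illustrative values (the real file is YAML written via yaml.dump, and the exact contents depend on which steps ran and on cloud mode):

    # illustrative shape only — not part of the diff
    user_yaml = {
        "inference_profile": "single-gpu",
        "tier": "paid",
        "name": "Jane Doe",
        "email": "jane@example.com",
        "phone": "555-0100",
        "linkedin": "https://linkedin.com/in/jane-doe",
        "career_summary": "Experienced professional with 10 years in data engineering.",
        "services": {"ollama_host": "ollama", "ollama_port": 11434},
        "wizard_complete": True,  # wizard_step is popped at finish
    }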
File diff suppressed because it is too large
|
|
@@ -1,191 +0,0 @@
# app/pages/3_Resume_Editor.py
"""
Resume Editor — form-based editor for Alex's AIHawk profile YAML.
FILL_IN fields highlighted in amber.
"""
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent))

import streamlit as st
import yaml

st.set_page_config(page_title="Resume Editor", page_icon="📝", layout="wide")
st.title("📝 Resume Editor")
st.caption("Edit Alex's application profile used by AIHawk for LinkedIn Easy Apply.")

RESUME_PATH = Path(__file__).parent.parent.parent / "aihawk" / "data_folder" / "plain_text_resume.yaml"

if not RESUME_PATH.exists():
    st.error(f"Resume file not found at `{RESUME_PATH}`. Is AIHawk cloned?")
    st.stop()

data = yaml.safe_load(RESUME_PATH.read_text()) or {}


def field(label: str, value: str, key: str, help: str = "", password: bool = False) -> str:
    """Render a text input, highlighted amber if value is FILL_IN or empty."""
    needs_attention = str(value).startswith("FILL_IN") or value == ""
    if needs_attention:
        st.markdown(
            '<p style="color:#F59E0B;font-size:0.8em;margin-bottom:2px">⚠️ Needs your attention</p>',
            unsafe_allow_html=True,
        )
    return st.text_input(label, value=value or "", key=key, help=help,
                         type="password" if password else "default")


st.divider()

# ── Personal Info ─────────────────────────────────────────────────────────────
with st.expander("👤 Personal Information", expanded=True):
    info = data.get("personal_information", {})
    col1, col2 = st.columns(2)
    with col1:
        name = field("First Name", info.get("name", ""), "pi_name")
        email = field("Email", info.get("email", ""), "pi_email")
        phone = field("Phone", info.get("phone", ""), "pi_phone")
        city = field("City", info.get("city", ""), "pi_city")
    with col2:
        surname = field("Last Name", info.get("surname", ""), "pi_surname")
        linkedin = field("LinkedIn URL", info.get("linkedin", ""), "pi_linkedin")
        zip_code = field("Zip Code", info.get("zip_code", ""), "pi_zip")
        dob = field("Date of Birth", info.get("date_of_birth", ""), "pi_dob",
                    help="Format: MM/DD/YYYY")

# ── Education ─────────────────────────────────────────────────────────────────
with st.expander("🎓 Education"):
    edu_list = data.get("education_details", [{}])
    updated_edu = []
    degree_options = ["Bachelor's Degree", "Master's Degree", "Some College",
                      "Associate's Degree", "High School", "Other"]
    for i, edu in enumerate(edu_list):
        st.markdown(f"**Entry {i+1}**")
        col1, col2 = st.columns(2)
        with col1:
            inst = field("Institution", edu.get("institution", ""), f"edu_inst_{i}")
            field_study = st.text_input("Field of Study", edu.get("field_of_study", ""), key=f"edu_field_{i}")
            start = st.text_input("Start Year", edu.get("start_date", ""), key=f"edu_start_{i}")
        with col2:
            current_level = edu.get("education_level", "Some College")
            level_idx = degree_options.index(current_level) if current_level in degree_options else 2
            level = st.selectbox("Degree Level", degree_options, index=level_idx, key=f"edu_level_{i}")
            end = st.text_input("Completion Year", edu.get("year_of_completion", ""), key=f"edu_end_{i}")
        updated_edu.append({
            "education_level": level, "institution": inst, "field_of_study": field_study,
            "start_date": start, "year_of_completion": end, "final_evaluation_grade": "", "exam": {},
        })
        st.divider()

# ── Experience ────────────────────────────────────────────────────────────────
with st.expander("💼 Work Experience"):
    exp_list = data.get("experience_details", [{}])
    if "exp_count" not in st.session_state:
        st.session_state.exp_count = len(exp_list)
    if st.button("+ Add Experience Entry"):
        st.session_state.exp_count += 1
        exp_list.append({})

    updated_exp = []
    for i in range(st.session_state.exp_count):
        exp = exp_list[i] if i < len(exp_list) else {}
        st.markdown(f"**Position {i+1}**")
        col1, col2 = st.columns(2)
        with col1:
            pos = field("Job Title", exp.get("position", ""), f"exp_pos_{i}")
            company = field("Company", exp.get("company", ""), f"exp_co_{i}")
            period = field("Employment Period", exp.get("employment_period", ""), f"exp_period_{i}",
                           help="e.g. 01/2022 - Present")
        with col2:
            location = st.text_input("Location", exp.get("location", ""), key=f"exp_loc_{i}")
            industry = st.text_input("Industry", exp.get("industry", ""), key=f"exp_ind_{i}")

        responsibilities = st.text_area(
            "Key Responsibilities (one per line)",
            value="\n".join(
                r.get(f"responsibility_{j+1}", "") if isinstance(r, dict) else str(r)
                for j, r in enumerate(exp.get("key_responsibilities", []))
            ),
            key=f"exp_resp_{i}", height=100,
        )
        skills = st.text_input(
            "Skills (comma-separated)",
            value=", ".join(exp.get("skills_acquired", [])),
            key=f"exp_skills_{i}",
        )
        resp_list = [{"responsibility_1": r.strip()} for r in responsibilities.splitlines() if r.strip()]
        skill_list = [s.strip() for s in skills.split(",") if s.strip()]
        updated_exp.append({
            "position": pos, "company": company, "employment_period": period,
            "location": location, "industry": industry,
            "key_responsibilities": resp_list, "skills_acquired": skill_list,
        })
        st.divider()

# ── Preferences ───────────────────────────────────────────────────────────────
with st.expander("⚙️ Preferences & Availability"):
    wp = data.get("work_preferences", {})
    sal = data.get("salary_expectations", {})
    avail = data.get("availability", {})
    col1, col2 = st.columns(2)
    with col1:
        salary_range = st.text_input("Salary Range (USD)", sal.get("salary_range_usd", ""),
                                     key="pref_salary", help="e.g. 120000 - 180000")
        notice = st.text_input("Notice Period", avail.get("notice_period", "2 weeks"), key="pref_notice")
    with col2:
        remote_work = st.checkbox("Open to Remote", value=wp.get("remote_work", "Yes") == "Yes", key="pref_remote")
        relocation = st.checkbox("Open to Relocation", value=wp.get("open_to_relocation", "No") == "Yes", key="pref_reloc")
        assessments = st.checkbox("Willing to complete assessments",
                                  value=wp.get("willing_to_complete_assessments", "Yes") == "Yes", key="pref_assess")
        bg_checks = st.checkbox("Willing to undergo background checks",
                                value=wp.get("willing_to_undergo_background_checks", "Yes") == "Yes", key="pref_bg")
        drug_tests = st.checkbox("Willing to undergo drug tests",
                                 value=wp.get("willing_to_undergo_drug_tests", "No") == "Yes", key="pref_drug")

# ── Self-ID ───────────────────────────────────────────────────────────────────
with st.expander("🏳️🌈 Self-Identification (optional)"):
    sid = data.get("self_identification", {})
    col1, col2 = st.columns(2)
    with col1:
        gender = st.text_input("Gender identity", sid.get("gender", "Non-binary"), key="sid_gender",
                               help="Select 'Non-binary' or 'Prefer not to say' when options allow")
        pronouns = st.text_input("Pronouns", sid.get("pronouns", "Any"), key="sid_pronouns")
        ethnicity = field("Ethnicity", sid.get("ethnicity", ""), "sid_ethnicity",
                          help="'Prefer not to say' is always an option")
    with col2:
        vet_options = ["No", "Yes", "Prefer not to say"]
        veteran = st.selectbox("Veteran status", vet_options,
                               index=vet_options.index(sid.get("veteran", "No")), key="sid_vet")
        dis_options = ["Prefer not to say", "No", "Yes"]
        disability = st.selectbox("Disability disclosure", dis_options,
                                  index=dis_options.index(sid.get("disability", "Prefer not to say")),
                                  key="sid_dis")

st.divider()

# ── Save ──────────────────────────────────────────────────────────────────────
if st.button("💾 Save Resume Profile", type="primary", use_container_width=True):
    data["personal_information"] = {
        **data.get("personal_information", {}),
        "name": name, "surname": surname, "email": email, "phone": phone,
        "city": city, "zip_code": zip_code, "linkedin": linkedin, "date_of_birth": dob,
    }
    data["education_details"] = updated_edu
    data["experience_details"] = updated_exp
    data["salary_expectations"] = {"salary_range_usd": salary_range}
    data["availability"] = {"notice_period": notice}
    data["work_preferences"] = {
        **data.get("work_preferences", {}),
        "remote_work": "Yes" if remote_work else "No",
        "open_to_relocation": "Yes" if relocation else "No",
        "willing_to_complete_assessments": "Yes" if assessments else "No",
        "willing_to_undergo_background_checks": "Yes" if bg_checks else "No",
        "willing_to_undergo_drug_tests": "Yes" if drug_tests else "No",
    }
    data["self_identification"] = {
        "gender": gender, "pronouns": pronouns, "veteran": veteran,
        "disability": disability, "ethnicity": ethnicity,
    }
    RESUME_PATH.write_text(yaml.dump(data, default_flow_style=False, allow_unicode=True))
    st.success("✅ Profile saved!")
    st.balloons()
@@ -14,19 +14,28 @@ import streamlit as st
 import streamlit.components.v1 as components
 import yaml

+from scripts.user_profile import UserProfile
+
+_USER_YAML = Path(__file__).parent.parent.parent / "config" / "user.yaml"
+_profile = UserProfile(_USER_YAML) if UserProfile.exists(_USER_YAML) else None
+_name = _profile.name if _profile else "Job Seeker"
+
 from scripts.db import (
     DEFAULT_DB, init_db, get_jobs_by_status,
     update_cover_letter, mark_applied, update_job_status,
     get_task_for_job,
 )
 from scripts.task_runner import submit_task
+from app.cloud_session import resolve_session, get_db_path
+from app.telemetry import log_usage_event

-DOCS_DIR = Path("/Library/Documents/JobSearch")
+DOCS_DIR = _profile.docs_dir if _profile else Path.home() / "Documents" / "JobSearch"
-RESUME_YAML = Path(__file__).parent.parent.parent / "aihawk" / "data_folder" / "plain_text_resume.yaml"
+RESUME_YAML = Path(__file__).parent.parent.parent / "config" / "plain_text_resume.yaml"

 st.title("🚀 Apply Workspace")

-init_db(DEFAULT_DB)
+resolve_session("peregrine")
+init_db(get_db_path())

 # ── PDF generation ─────────────────────────────────────────────────────────────
 def _make_cover_letter_pdf(job: dict, cover_letter: str, output_dir: Path) -> Path:

@@ -70,13 +79,16 @@ def _make_cover_letter_pdf(job: dict, cover_letter: str, output_dir: Path) -> Pa
         textColor=dark, leading=16, spaceAfter=12, alignment=TA_LEFT,
     )

+    display_name = _profile.name.upper() if _profile else "YOUR NAME"
+    contact_line = " · ".join(filter(None, [
+        _profile.email if _profile else "",
+        _profile.phone if _profile else "",
+        _profile.linkedin if _profile else "",
+    ]))
+
     story = [
-        Paragraph("ALEX RIVERA", name_style),
-        Paragraph(
-            "alex@example.com · (555) 867-5309 · "
-            "linkedin.com/in/AlexMcCann · hirealexmccann.site",
-            contact_style,
-        ),
+        Paragraph(display_name, name_style),
+        Paragraph(contact_line, contact_style),
         HRFlowable(width="100%", thickness=1, color=teal, spaceBefore=8, spaceAfter=0),
         Paragraph(datetime.now().strftime("%B %d, %Y"), date_style),
     ]

@@ -88,7 +100,7 @@ def _make_cover_letter_pdf(job: dict, cover_letter: str, output_dir: Path) -> Pa
     story += [
         Spacer(1, 6),
-        Paragraph("Warm regards,<br/><br/>Alex Rivera", body_style),
+        Paragraph(f"Warm regards,<br/><br/>{_profile.name if _profile else 'Your Name'}", body_style),
     ]

     doc.build(story)

@@ -96,7 +108,7 @@ def _make_cover_letter_pdf(job: dict, cover_letter: str, output_dir: Path) -> Pa
 # ── Application Q&A helper ─────────────────────────────────────────────────────
 def _answer_question(job: dict, question: str) -> str:
-    """Call the LLM to answer an application question in Alex's voice.
+    """Call the LLM to answer an application question in the user's voice.

     Uses research_fallback_order (claude_code → vllm → ollama_research)
     rather than the default cover-letter order — the fine-tuned cover letter

@@ -106,21 +118,22 @@ def _answer_question(job: dict, question: str) -> str:
     router = LLMRouter()
     fallback = router.config.get("research_fallback_order") or router.config.get("fallback_order")
     description_snippet = (job.get("description") or "")[:1200].strip()
-    prompt = f"""You are answering job application questions for Alex Rivera, a customer success leader.
+    _persona_summary = (
+        _profile.career_summary[:200] if _profile and _profile.career_summary
+        else "a professional with experience in their field"
+    )
+    prompt = f"""You are answering job application questions for {_name}.

 Background:
-- 6+ years in customer success, technical account management, and CS leadership
-- Most recent role: led Americas Customer Success at UpGuard (cybersecurity SaaS), NPS consistently ≥95
-- Also founder of M3 Consulting, a CS advisory practice for SaaS startups
-- Based in SF Bay Area; open to remote/hybrid; pronouns: any
+{_persona_summary}

-Role she's applying to: {job.get("title", "")} at {job.get("company", "")}
+Role they're applying to: {job.get("title", "")} at {job.get("company", "")}
 {f"Job description excerpt:{chr(10)}{description_snippet}" if description_snippet else ""}

 Application Question:
 {question}

-Answer in Alex's voice — specific, warm, and confident. If the question specifies a word or character limit, respect it. Answer only the question with no preamble or sign-off."""
+Answer in {_name}'s voice — specific, warm, and confident. If the question specifies a word or character limit, respect it. Answer only the question with no preamble or sign-off."""
     return router.complete(prompt, fallback_order=fallback).strip()

@@ -146,7 +159,7 @@ def _copy_btn(text: str, label: str = "📋 Copy", done: str = "✅ Copied!", he
 )

 # ── Job selection ──────────────────────────────────────────────────────────────
-approved = get_jobs_by_status(DEFAULT_DB, "approved")
+approved = get_jobs_by_status(get_db_path(), "approved")
 if not approved:
     st.info("No approved jobs — head to Job Review to approve some listings first.")
     st.stop()

@@ -209,17 +222,17 @@ with col_tools:
     if _cl_key not in st.session_state:
         st.session_state[_cl_key] = job.get("cover_letter") or ""

-    _cl_task = get_task_for_job(DEFAULT_DB, "cover_letter", selected_id)
+    _cl_task = get_task_for_job(get_db_path(), "cover_letter", selected_id)
     _cl_running = _cl_task and _cl_task["status"] in ("queued", "running")

     if st.button("✨ Generate / Regenerate", use_container_width=True, disabled=bool(_cl_running)):
-        submit_task(DEFAULT_DB, "cover_letter", selected_id)
+        submit_task(get_db_path(), "cover_letter", selected_id)
         st.rerun()

     if _cl_running:
         @st.fragment(run_every=3)
         def _cl_status_fragment():
-            t = get_task_for_job(DEFAULT_DB, "cover_letter", selected_id)
+            t = get_task_for_job(get_db_path(), "cover_letter", selected_id)
             if t and t["status"] in ("queued", "running"):
                 lbl = "Queued…" if t["status"] == "queued" else "Generating via LLM…"
                 st.info(f"⏳ {lbl}")

@@ -245,6 +258,32 @@ with col_tools:
         label_visibility="collapsed",
     )

+    # ── Iterative refinement ──────────────────────
+    if cl_text and not _cl_running:
+        with st.expander("✏️ Refine with Feedback"):
+            st.caption("Describe what to change. The current draft is passed to the LLM as context.")
+            _fb_key = f"fb_{selected_id}"
+            feedback_text = st.text_area(
+                "Feedback",
+                placeholder="e.g. Shorten the second paragraph and add a line about cross-functional leadership.",
+                height=80,
+                key=_fb_key,
+                label_visibility="collapsed",
+            )
+            if st.button("✨ Regenerate with Feedback", use_container_width=True,
+                         disabled=not (feedback_text or "").strip(),
+                         key=f"cl_refine_{selected_id}"):
+                import json as _json
+                submit_task(
+                    get_db_path(), "cover_letter", selected_id,
+                    params=_json.dumps({
+                        "previous_result": cl_text,
+                        "feedback": feedback_text.strip(),
+                    }),
+                )
+                st.session_state.pop(_fb_key, None)
+                st.rerun()
+
     # Copy + Save row
     c1, c2 = st.columns(2)
     with c1:

@@ -252,7 +291,7 @@ with col_tools:
         _copy_btn(cl_text, label="📋 Copy Letter")
     with c2:
         if st.button("💾 Save draft", use_container_width=True):
-            update_cover_letter(DEFAULT_DB, selected_id, cl_text)
+            update_cover_letter(get_db_path(), selected_id, cl_text)
             st.success("Saved!")

     # PDF generation

@@ -261,8 +300,10 @@ with col_tools:
         with st.spinner("Generating PDF…"):
             try:
                 pdf_path = _make_cover_letter_pdf(job, cl_text, DOCS_DIR)
-                update_cover_letter(DEFAULT_DB, selected_id, cl_text)
+                update_cover_letter(get_db_path(), selected_id, cl_text)
                 st.success(f"Saved: `{pdf_path.name}`")
+                if user_id := st.session_state.get("user_id"):
+                    log_usage_event(user_id, "peregrine", "cover_letter_generated")
             except Exception as e:
                 st.error(f"PDF error: {e}")

@@ -276,13 +317,15 @@ with col_tools:
     with c4:
         if st.button("✅ Mark as Applied", use_container_width=True, type="primary"):
             if cl_text:
-                update_cover_letter(DEFAULT_DB, selected_id, cl_text)
+                update_cover_letter(get_db_path(), selected_id, cl_text)
-            mark_applied(DEFAULT_DB, [selected_id])
+            mark_applied(get_db_path(), [selected_id])
             st.success("Marked as applied!")
+            if user_id := st.session_state.get("user_id"):
+                log_usage_event(user_id, "peregrine", "job_applied")
             st.rerun()

     if st.button("🚫 Reject listing", use_container_width=True):
-        update_job_status(DEFAULT_DB, [selected_id], "rejected")
+        update_job_status(get_db_path(), [selected_id], "rejected")
         # Advance selectbox to next job so list doesn't snap to first item
         current_idx = ids.index(selected_id) if selected_id in ids else 0
         if current_idx + 1 < len(ids):

@@ -346,7 +389,7 @@ with col_tools:
         st.markdown("---")
     else:
-        st.warning("Resume YAML not found — check that AIHawk is cloned.")
+        st.warning("Resume profile not found — complete setup or upload a resume in Settings → Resume Profile.")

     # ── Application Q&A ───────────────────────────────────────────────────────
     with st.expander("💬 Answer Application Questions"):
@@ -22,6 +22,12 @@ sys.path.insert(0, str(Path(__file__).parent.parent.parent))
 import streamlit as st

+from scripts.user_profile import UserProfile
+
+_USER_YAML = Path(__file__).parent.parent.parent / "config" / "user.yaml"
+_profile = UserProfile(_USER_YAML) if UserProfile.exists(_USER_YAML) else None
+_name = _profile.name if _profile else "Job Seeker"
+
 from scripts.db import (
     DEFAULT_DB, init_db,
     get_interview_jobs, advance_to_stage, reject_at_stage,

@@ -186,19 +192,21 @@ def _email_modal(job: dict) -> None:
     with st.spinner("Drafting…"):
         try:
             from scripts.llm_router import complete
+            _persona = (
+                f"{_name} is a {_profile.career_summary[:120] if _profile and _profile.career_summary else 'professional'}"
+            )
             draft = complete(
                 prompt=(
                     f"Draft a professional, warm reply to this email.\n\n"
                     f"From: {last.get('from_addr', '')}\n"
                     f"Subject: {last.get('subject', '')}\n\n"
                     f"{last.get('body', '')}\n\n"
-                    f"Context: Alex Rivera is a Customer Success / "
-                    f"Technical Account Manager applying for "
+                    f"Context: {_persona} applying for "
                     f"{job.get('title')} at {job.get('company')}."
                 ),
                 system=(
-                    "You are Alex Rivera's professional email assistant. "
-                    "Write concise, warm, and professional replies in her voice. "
+                    f"You are {_name}'s professional email assistant. "
+                    "Write concise, warm, and professional replies in their voice. "
                     "Keep it to 3–5 sentences unless more is needed."
                 ),
             )
@@ -13,6 +13,12 @@ sys.path.insert(0, str(Path(__file__).parent.parent.parent))
 import streamlit as st

+from scripts.user_profile import UserProfile
+
+_USER_YAML = Path(__file__).parent.parent.parent / "config" / "user.yaml"
+_profile = UserProfile(_USER_YAML) if UserProfile.exists(_USER_YAML) else None
+_name = _profile.name if _profile else "Job Seeker"
+
 from scripts.db import (
     DEFAULT_DB, init_db,
     get_interview_jobs, get_contacts, get_research,

@@ -231,7 +237,7 @@ with col_prep:
         system=(
             f"You are a recruiter at {job.get('company')} conducting "
             f"a phone screen for the {job.get('title')} role. "
-            f"Ask one question at a time. After Alex answers, give "
+            f"Ask one question at a time. After {_name} answers, give "
             f"brief feedback (1–2 sentences), then ask your next question. "
             f"Be professional but warm."
         ),

@@ -253,7 +259,7 @@ with col_prep:
         "content": (
             f"You are a recruiter at {job.get('company')} conducting "
             f"a phone screen for the {job.get('title')} role. "
-            f"Ask one question at a time. After Alex answers, give "
+            f"Ask one question at a time. After {_name} answers, give "
             f"brief feedback (1–2 sentences), then ask your next question."
         ),
     }

@@ -265,7 +271,7 @@ with col_prep:
     router = LLMRouter()
     # Build prompt from history for single-turn backends
     convo = "\n\n".join(
-        f"{'Interviewer' if m['role'] == 'assistant' else 'Alex'}: {m['content']}"
+        f"{'Interviewer' if m['role'] == 'assistant' else _name}: {m['content']}"
         for m in history
     )
     response = router.complete(

@@ -331,12 +337,12 @@ with col_context:
                     f"From: {last.get('from_addr', '')}\n"
                     f"Subject: {last.get('subject', '')}\n\n"
                     f"{last.get('body', '')}\n\n"
-                    f"Context: Alex is a CS/TAM professional applying "
+                    f"Context: {_name} is a professional applying "
                     f"for {job.get('title')} at {job.get('company')}."
                 ),
                 system=(
-                    "You are Alex Rivera's professional email assistant. "
-                    "Write concise, warm, and professional replies in her voice."
+                    f"You are {_name}'s professional email assistant. "
+                    "Write concise, warm, and professional replies in their voice."
                 ),
             )
             st.session_state[f"draft_{selected_id}"] = draft
app/telemetry.py (new file, 127 lines)
@@ -0,0 +1,127 @@
# peregrine/app/telemetry.py
"""
Usage event telemetry for cloud-hosted Peregrine.

In local-first mode (CLOUD_MODE unset/false), all functions are no-ops —
no network calls, no DB writes, no imports of psycopg2.

In cloud mode, events are written to the platform Postgres DB ONLY after
confirming the user's telemetry consent.

THE HARD RULE: if telemetry_consent.all_disabled is True for a user,
nothing is written, no exceptions. This function is the ONLY path to
usage_events — no feature may write there directly.
"""
import os
import json
from typing import Any

CLOUD_MODE: bool = os.environ.get("CLOUD_MODE", "").lower() in ("1", "true", "yes")
PLATFORM_DB_URL: str = os.environ.get("PLATFORM_DB_URL", "")

_platform_conn = None


def get_platform_conn():
    """Lazy psycopg2 connection to the platform Postgres DB. Reconnects if closed."""
    global _platform_conn
    if _platform_conn is None or _platform_conn.closed:
        import psycopg2
        _platform_conn = psycopg2.connect(PLATFORM_DB_URL)
    return _platform_conn


def get_consent(user_id: str) -> dict:
    """
    Fetch telemetry consent for the user.
    Returns safe defaults if record doesn't exist yet:
    - usage_events_enabled: True (new cloud users start opted-in, per onboarding disclosure)
    - all_disabled: False
    """
    conn = get_platform_conn()
    with conn.cursor() as cur:
        cur.execute(
            "SELECT all_disabled, usage_events_enabled "
            "FROM telemetry_consent WHERE user_id = %s",
            (user_id,)
        )
        row = cur.fetchone()
    if row is None:
        return {"all_disabled": False, "usage_events_enabled": True}
    return {"all_disabled": row[0], "usage_events_enabled": row[1]}


def log_usage_event(
    user_id: str,
    app: str,
    event_type: str,
    metadata: dict[str, Any] | None = None,
) -> None:
    """
    Write a usage event to the platform DB if consent allows.

    Silent no-op in local mode. Silent no-op if telemetry is disabled.
    Swallows all exceptions — telemetry must never crash the app.

    Args:
        user_id: Directus user UUID (from st.session_state["user_id"])
        app: App slug ('peregrine', 'falcon', etc.)
        event_type: Snake_case event label ('cover_letter_generated', 'job_applied', etc.)
        metadata: Optional JSON-serialisable dict — NO PII
    """
    if not CLOUD_MODE:
        return

    try:
        consent = get_consent(user_id)
        if consent.get("all_disabled") or not consent.get("usage_events_enabled", True):
            return

        conn = get_platform_conn()
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO usage_events (user_id, app, event_type, metadata) "
                "VALUES (%s, %s, %s, %s)",
                (user_id, app, event_type, json.dumps(metadata) if metadata else None),
            )
        conn.commit()
    except Exception:
        # Telemetry must never crash the app
        pass


def update_consent(user_id: str, **fields) -> None:
    """
    UPSERT telemetry consent for a user.

    Accepted keyword args (all optional, any subset may be provided):
        all_disabled: bool
        usage_events_enabled: bool
        content_sharing_enabled: bool
        support_access_enabled: bool

    Safe to call in cloud mode only — no-op in local mode.
    Swallows all exceptions so the Settings UI is never broken by a DB hiccup.
    """
    if not CLOUD_MODE:
        return
    allowed = {"all_disabled", "usage_events_enabled", "content_sharing_enabled", "support_access_enabled"}
    cols = {k: v for k, v in fields.items() if k in allowed}
    if not cols:
        return
    try:
        conn = get_platform_conn()
        col_names = ", ".join(cols)
        placeholders = ", ".join(["%s"] * len(cols))
        set_clause = ", ".join(f"{k} = EXCLUDED.{k}" for k in cols)
        col_vals = list(cols.values())
        with conn.cursor() as cur:
            cur.execute(
                f"INSERT INTO telemetry_consent (user_id, {col_names}) "
                f"VALUES (%s, {placeholders}) "
                f"ON CONFLICT (user_id) DO UPDATE SET {set_clause}, updated_at = NOW()",
                [user_id] + col_vals,
            )
        conn.commit()
    except Exception:
        pass
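A minimal call-site sketch for the module above, assuming the Directus user id is already in st.session_state as the docstring describes; the event label and metadata keys here are illustrative, not a fixed schema:

# Hypothetical call site for app/telemetry.py; label and metadata are assumptions.
import streamlit as st
from app.telemetry import log_usage_event, update_consent

user_id = st.session_state.get("user_id")
if user_id:
    # No-op when CLOUD_MODE is unset, and whenever consent disallows the write.
    log_usage_event(user_id, "peregrine", "cover_letter_generated",
                    metadata={"source": "apply_workspace"})
    # Settings toggle: opt the user out of everything in one call.
    update_consent(user_id, all_disabled=True)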
app/wizard/__init__.py (new file, 0 lines)

app/wizard/step_hardware.py (new file, 14 lines)
@@ -0,0 +1,14 @@
"""Step 1 — Hardware detection and inference profile selection."""

PROFILES = ["remote", "cpu", "single-gpu", "dual-gpu"]


def validate(data: dict) -> list[str]:
    """Return list of validation errors. Empty list = step passes."""
    errors = []
    profile = data.get("inference_profile", "")
    if not profile:
        errors.append("Inference profile is required.")
    elif profile not in PROFILES:
        errors.append(f"Invalid inference profile '{profile}'. Choose: {', '.join(PROFILES)}.")
    return errors
app/wizard/step_identity.py (new file, 13 lines)
@@ -0,0 +1,13 @@
"""Step 3 — Identity (name, email, phone, linkedin, career_summary)."""


def validate(data: dict) -> list[str]:
    """Return list of validation errors. Empty list = step passes."""
    errors = []
    if not (data.get("name") or "").strip():
        errors.append("Full name is required.")
    if not (data.get("email") or "").strip():
        errors.append("Email address is required.")
    if not (data.get("career_summary") or "").strip():
        errors.append("Career summary is required.")
    return errors
app/wizard/step_inference.py (new file, 9 lines)
@@ -0,0 +1,9 @@
"""Step 5 — LLM inference backend configuration and key entry."""


def validate(data: dict) -> list[str]:
    """Return list of validation errors. Empty list = step passes."""
    errors = []
    if not data.get("endpoint_confirmed"):
        errors.append("At least one working LLM endpoint must be confirmed.")
    return errors
app/wizard/step_integrations.py (new file, 36 lines)
@@ -0,0 +1,36 @@
"""Step 7 — Optional integrations (cloud storage, calendars, notifications).

This step is never mandatory — validate() always returns [].
Helper functions support the wizard UI for tier-filtered integration cards.
"""
from __future__ import annotations
from pathlib import Path


def validate(data: dict) -> list[str]:
    """Integrations step is optional — never blocks Finish."""
    return []


def get_available(tier: str) -> list[str]:
    """Return list of integration names available for the given tier.

    An integration is available if the user's tier meets or exceeds the
    integration's minimum required tier (as declared by cls.tier).
    """
    from scripts.integrations import REGISTRY
    from app.wizard.tiers import TIERS

    available = []
    for name, cls in REGISTRY.items():
        try:
            if TIERS.index(tier) >= TIERS.index(cls.tier):
                available.append(name)
        except ValueError:
            pass  # unknown tier string — skip
    return available


def is_connected(name: str, config_dir: Path) -> bool:
    """Return True if a live config file exists for this integration."""
    return (config_dir / "integrations" / f"{name}.yaml").exists()
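A hedged sketch of how a wizard page might consume these helpers, assuming REGISTRY entries expose a label attribute as the integrations step of the wizard does; the CONFIG_DIR value and the rendering are assumptions for the example:

# Illustrative wizard call site; CONFIG_DIR and the layout are assumptions.
from pathlib import Path
import streamlit as st
from scripts.integrations import REGISTRY
from app.wizard import step_integrations

CONFIG_DIR = Path("config")  # assumed location for this sketch
tier = st.session_state.get("tier", "free")

for name in step_integrations.get_available(tier):
    cls = REGISTRY[name]
    connected = step_integrations.is_connected(name, CONFIG_DIR)
    st.write(f"{cls.label}: {'connected' if connected else 'not connected'}")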
app/wizard/step_resume.py (new file, 10 lines)
@@ -0,0 +1,10 @@
"""Step 4 — Resume (upload or guided form builder)."""


def validate(data: dict) -> list[str]:
    """Return list of validation errors. Empty list = step passes."""
    errors = []
    experience = data.get("experience") or []
    if not experience:
        errors.append("At least one work experience entry is required.")
    return errors
app/wizard/step_search.py (new file, 13 lines)
@@ -0,0 +1,13 @@
"""Step 6 — Job search preferences (titles, locations, boards, keywords)."""


def validate(data: dict) -> list[str]:
    """Return list of validation errors. Empty list = step passes."""
    errors = []
    titles = data.get("job_titles") or []
    locations = data.get("locations") or []
    if not titles:
        errors.append("At least one job title is required.")
    if not locations:
        errors.append("At least one location is required.")
    return errors
app/wizard/step_tier.py (new file, 13 lines)
@@ -0,0 +1,13 @@
"""Step 2 — Tier selection (free / paid / premium)."""
from app.wizard.tiers import TIERS


def validate(data: dict) -> list[str]:
    """Return list of validation errors. Empty list = step passes."""
    errors = []
    tier = data.get("tier", "")
    if not tier:
        errors.append("Tier selection is required.")
    elif tier not in TIERS:
        errors.append(f"Invalid tier '{tier}'. Choose: {', '.join(TIERS)}.")
    return errors
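The step modules above all share one contract: validate(data) returns a list of error strings, and an empty list means the step passes. A sketch of a driver loop built on that contract; the step ordering, session-state keys, and button wiring are assumptions, not the wizard's actual implementation:

# Illustrative driver loop; ordering and session keys are assumptions.
import streamlit as st
from app.wizard import (
    step_hardware, step_tier, step_identity, step_resume,
    step_inference, step_search, step_integrations,
)

STEPS = [step_hardware, step_tier, step_identity, step_resume,
         step_inference, step_search, step_integrations]

idx = st.session_state.get("wizard_step", 1) - 1
errors = STEPS[idx].validate(st.session_state.get("wizard_data", {}))
for err in errors:
    st.error(err)
if st.button("Next", disabled=bool(errors)):
    st.session_state.wizard_step = idx + 2
    st.rerun()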
app/wizard/tiers.py (new file, 160 lines)
@@ -0,0 +1,160 @@
"""
Tier definitions and feature gates for Peregrine.

Tiers: free < paid < premium
FEATURES maps feature key → minimum tier required.
Features not in FEATURES are available to all tiers (free).

BYOK policy
-----------
Features in BYOK_UNLOCKABLE are gated only because CircuitForge would otherwise
be providing the LLM compute. When a user has any configured LLM backend (local
ollama/vllm or their own API key), those features unlock regardless of tier.
Pass has_byok=has_configured_llm() to can_use() at call sites.

Features that stay gated even with BYOK:
- Integrations (Notion sync, calendars, etc.) — infrastructure we run
- llm_keywords_blocklist — orchestration pipeline over background keyword data
- email_classifier — training pipeline, not a single LLM call
- shared_cover_writer_model — our fine-tuned model weights
- model_fine_tuning — GPU infrastructure
- multi_user — account infrastructure
"""
from __future__ import annotations

from pathlib import Path

TIERS = ["free", "paid", "premium"]

# Maps feature key → minimum tier string required.
# Features absent from this dict are free (available to all).
FEATURES: dict[str, str] = {
    # Wizard LLM generation — BYOK-unlockable (pure LLM calls)
    "llm_career_summary": "paid",
    "llm_expand_bullets": "paid",
    "llm_suggest_skills": "paid",
    "llm_voice_guidelines": "premium",
    "llm_job_titles": "paid",
    "llm_mission_notes": "paid",

    # Orchestration — stays gated (background data pipeline, not just an LLM call)
    "llm_keywords_blocklist": "paid",

    # App features — BYOK-unlockable (pure LLM calls over job/profile data)
    "company_research": "paid",
    "interview_prep": "paid",
    "survey_assistant": "paid",

    # Orchestration / infrastructure — stays gated
    "email_classifier": "paid",
    "model_fine_tuning": "premium",
    "shared_cover_writer_model": "paid",
    "multi_user": "premium",

    # Integrations — stays gated (infrastructure CircuitForge operates)
    "notion_sync": "paid",
    "google_sheets_sync": "paid",
    "airtable_sync": "paid",
    "google_calendar_sync": "paid",
    "apple_calendar_sync": "paid",
    "slack_notifications": "paid",
}

# Features that unlock when the user supplies any LLM backend (local or BYOK).
# These are pure LLM-call features — the only reason they're behind a tier is
# because CircuitForge would otherwise be providing the compute.
BYOK_UNLOCKABLE: frozenset[str] = frozenset({
    "llm_career_summary",
    "llm_expand_bullets",
    "llm_suggest_skills",
    "llm_voice_guidelines",
    "llm_job_titles",
    "llm_mission_notes",
    "company_research",
    "interview_prep",
    "survey_assistant",
})

# Free integrations (not in FEATURES):
# google_drive_sync, dropbox_sync, onedrive_sync, mega_sync,
# nextcloud_sync, discord_notifications, home_assistant

_LLM_CFG = Path(__file__).parent.parent.parent / "config" / "llm.yaml"


def has_configured_llm(config_path: Path | None = None) -> bool:
    """Return True if at least one non-vision LLM backend is enabled in llm.yaml.

    Local backends (ollama, vllm) count — the policy is "you're providing the
    compute", whether that's your own hardware or your own API key.
    """
    import yaml
    path = config_path or _LLM_CFG
    try:
        with open(path) as f:
            cfg = yaml.safe_load(f) or {}
        return any(
            b.get("enabled", True) and b.get("type") != "vision_service"
            for b in cfg.get("backends", {}).values()
        )
    except Exception:
        return False


def can_use(tier: str, feature: str, has_byok: bool = False) -> bool:
    """Return True if the given tier has access to the feature.

    has_byok: pass has_configured_llm() to unlock BYOK_UNLOCKABLE features
        for users who supply their own LLM backend regardless of tier.

    Returns True for unknown features (not gated).
    Returns False for unknown/invalid tier strings.
    """
    required = FEATURES.get(feature)
    if required is None:
        return True  # not gated — available to all
    if has_byok and feature in BYOK_UNLOCKABLE:
        return True
    try:
        return TIERS.index(tier) >= TIERS.index(required)
    except ValueError:
        return False  # invalid tier string


def tier_label(feature: str, has_byok: bool = False) -> str:
    """Return a display label for a locked feature, or '' if free/unlocked."""
    if has_byok and feature in BYOK_UNLOCKABLE:
        return ""
    required = FEATURES.get(feature)
    if required is None:
        return ""
    return "🔒 Paid" if required == "paid" else "⭐ Premium"


def effective_tier(
    profile=None,
    license_path=None,
    public_key_path=None,
) -> str:
    """Return the effective tier for this installation.

    Priority:
    1. profile.dev_tier_override (developer mode override)
    2. License JWT verification (offline RS256 check)
    3. "free" (fallback)

    license_path and public_key_path default to production paths when None.
    Pass explicit paths in tests to avoid touching real files.
    """
    if profile and getattr(profile, "dev_tier_override", None):
        return profile.dev_tier_override

    from scripts.license import effective_tier as _license_tier
    from pathlib import Path as _Path

    kwargs = {}
    if license_path is not None:
        kwargs["license_path"] = _Path(license_path)
    if public_key_path is not None:
        kwargs["public_key_path"] = _Path(public_key_path)
    return _license_tier(**kwargs)
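A short sketch of the gate's intended call pattern, following the module's own docstrings; run_company_research() is a placeholder for whatever a page does once the feature is unlocked:

# Illustrative feature-gate check; run_company_research() is hypothetical.
from app.wizard.tiers import can_use, effective_tier, has_configured_llm, tier_label

tier = effective_tier()          # dev override → license JWT → "free"
byok = has_configured_llm()      # local ollama/vllm or the user's own API key

if can_use(tier, "company_research", has_byok=byok):
    run_company_research()
else:
    label = tier_label("company_research", has_byok=byok)
    print(f"Company research is locked ({label}).")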
compose.cloud.yml (new file, 57 lines)
@@ -0,0 +1,57 @@
# compose.cloud.yml — Multi-tenant cloud stack for menagerie.circuitforge.tech/peregrine
#
# Each authenticated user gets their own encrypted SQLite data tree at
#   /devl/menagerie-data/<user-id>/peregrine/
#
# Caddy injects the Directus session cookie as X-CF-Session header before forwarding.
# cloud_session.py resolves user_id → per-user db_path at session init.
#
# Usage:
#   docker compose -f compose.cloud.yml --project-name peregrine-cloud up -d
#   docker compose -f compose.cloud.yml --project-name peregrine-cloud down
#   docker compose -f compose.cloud.yml --project-name peregrine-cloud logs app -f

services:
  app:
    build: .
    container_name: peregrine-cloud
    ports:
      - "8505:8501"
    volumes:
      - /devl/menagerie-data:/devl/menagerie-data  # per-user data trees
    environment:
      - CLOUD_MODE=true
      - CLOUD_DATA_ROOT=/devl/menagerie-data
      - DIRECTUS_JWT_SECRET=${DIRECTUS_JWT_SECRET}
      - CF_SERVER_SECRET=${CF_SERVER_SECRET}
      - PLATFORM_DB_URL=${PLATFORM_DB_URL}
      - HEIMDALL_URL=${HEIMDALL_URL:-http://cf-license:8000}
      - HEIMDALL_ADMIN_TOKEN=${HEIMDALL_ADMIN_TOKEN}
      - STAGING_DB=/devl/menagerie-data/cloud-default.db  # fallback only — never used
      - DOCS_DIR=/tmp/cloud-docs
      - STREAMLIT_SERVER_BASE_URL_PATH=peregrine
      - PYTHONUNBUFFERED=1
      - DEMO_MODE=false
    depends_on:
      searxng:
        condition: service_healthy
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: unless-stopped

  searxng:
    image: searxng/searxng:latest
    volumes:
      - ./docker/searxng:/etc/searxng:ro
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:8080/"]
      interval: 10s
      timeout: 5s
      retries: 3
    restart: unless-stopped
    # No host port — internal only

networks:
  default:
    external: true
    name: caddy-proxy_caddy-internal
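The per-user data tree described in the header comments can be illustrated with a small path helper. This is a sketch of the layout only, under the stated assumptions (the CLOUD_DATA_ROOT mount and one peregrine subfolder per user); it is not the actual cloud_session.py resolver, and the staging.db filename is assumed:

# Sketch of the per-user layout under CLOUD_DATA_ROOT; not the real resolver.
import os
from pathlib import Path

CLOUD_DATA_ROOT = Path(os.environ.get("CLOUD_DATA_ROOT", "/devl/menagerie-data"))

def user_db_path(user_id: str) -> Path:
    """Per-user SQLite path, e.g. /devl/menagerie-data/<user-id>/peregrine/staging.db."""
    tree = CLOUD_DATA_ROOT / user_id / "peregrine"
    tree.mkdir(parents=True, exist_ok=True)
    return tree / "staging.db"  # filename assumed for this sketch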
compose.demo.yml (new file, 52 lines)
@@ -0,0 +1,52 @@
# compose.demo.yml — Public demo stack for demo.circuitforge.tech/peregrine
#
# Runs a fully isolated, neutered Peregrine instance:
#   - DEMO_MODE=true: blocks all LLM inference in llm_router.py
#   - demo/config/: pre-seeded demo user profile, all backends disabled
#   - demo/data/: isolated SQLite DB (no personal job data)
#   - No personal documents mounted
#   - Port 8504 (separate from the personal instance on 8502)
#
# Usage:
#   docker compose -f compose.demo.yml --project-name peregrine-demo up -d
#   docker compose -f compose.demo.yml --project-name peregrine-demo down
#
# Caddy demo.circuitforge.tech/peregrine* → host port 8504

services:

  app:
    build: .
    ports:
      - "8504:8501"
    volumes:
      - ./demo/config:/app/config
      - ./demo/data:/app/data
      # No /docs mount — demo has no personal documents
    environment:
      - DEMO_MODE=true
      - STAGING_DB=/app/data/staging.db
      - DOCS_DIR=/tmp/demo-docs
      - STREAMLIT_SERVER_BASE_URL_PATH=peregrine
      - PYTHONUNBUFFERED=1
      - PYTHONLOGGING=WARNING
      # No API keys — inference is blocked by DEMO_MODE before any key is needed
    depends_on:
      searxng:
        condition: service_healthy
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: unless-stopped

  searxng:
    image: searxng/searxng:latest
    volumes:
      - ./docker/searxng:/etc/searxng:ro
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:8080/"]
      interval: 10s
      timeout: 5s
      retries: 3
    restart: unless-stopped
    # No host port published — internal only; demo app uses it for job description enrichment
    # (non-AI scraping is allowed; only LLM inference is blocked)
compose.gpu.yml (new file, 55 lines)
@@ -0,0 +1,55 @@
# compose.gpu.yml — Docker NVIDIA GPU overlay
#
# Adds NVIDIA GPU reservations to Peregrine services.
# Applied automatically by `make start PROFILE=single-gpu|dual-gpu` when Docker is detected.
# Manual: docker compose -f compose.yml -f compose.gpu.yml --profile single-gpu up -d
#
# Prerequisites:
#   sudo nvidia-ctk runtime configure --runtime=docker
#   sudo systemctl restart docker
#
services:
  ollama:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]
              capabilities: [gpu]

  ollama_research:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["1"]
              capabilities: [gpu]

  vision:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]
              capabilities: [gpu]

  vllm:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["1"]
              capabilities: [gpu]

  finetune:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]
              capabilities: [gpu]
compose.podman-gpu.yml (new file, 51 lines)
@@ -0,0 +1,51 @@
# compose.podman-gpu.yml — Podman GPU override
#
# Replaces Docker-specific `driver: nvidia` reservations with CDI device specs
# for rootless Podman. Applied automatically via `make start PROFILE=single-gpu|dual-gpu`
# when podman/podman-compose is detected, or manually:
#   podman-compose -f compose.yml -f compose.podman-gpu.yml --profile single-gpu up -d
#
# Prerequisites:
#   sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
#   (requires nvidia-container-toolkit >= 1.14)
#
services:
  ollama:
    devices:
      - nvidia.com/gpu=0
    deploy:
      resources:
        reservations:
          devices: []

  ollama_research:
    devices:
      - nvidia.com/gpu=1
    deploy:
      resources:
        reservations:
          devices: []

  vision:
    devices:
      - nvidia.com/gpu=0
    deploy:
      resources:
        reservations:
          devices: []

  vllm:
    devices:
      - nvidia.com/gpu=1
    deploy:
      resources:
        reservations:
          devices: []

  finetune:
    devices:
      - nvidia.com/gpu=0
    deploy:
      resources:
        reservations:
          devices: []
compose.yml (new file, 127 lines)
@@ -0,0 +1,127 @@
# compose.yml — Peregrine by Circuit Forge LLC
# Profiles: remote | cpu | single-gpu | dual-gpu-ollama | dual-gpu-vllm | dual-gpu-mixed
services:

  app:
    build: .
    command: >
      bash -c "streamlit run app/app.py
      --server.port=8501
      --server.headless=true
      --server.fileWatcherType=none
      2>&1 | tee /app/data/.streamlit.log"
    ports:
      - "${STREAMLIT_PORT:-8501}:8501"
    volumes:
      - ./config:/app/config
      - ./data:/app/data
      - ${DOCS_DIR:-~/Documents/JobSearch}:/docs
      - /var/run/docker.sock:/var/run/docker.sock
      - /usr/bin/docker:/usr/bin/docker:ro
    environment:
      - STAGING_DB=/app/data/staging.db
      - DOCS_DIR=/docs
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY:-}
      - OPENAI_COMPAT_URL=${OPENAI_COMPAT_URL:-}
      - OPENAI_COMPAT_KEY=${OPENAI_COMPAT_KEY:-}
      - PEREGRINE_GPU_COUNT=${PEREGRINE_GPU_COUNT:-0}
      - PEREGRINE_GPU_NAMES=${PEREGRINE_GPU_NAMES:-}
      - RECOMMENDED_PROFILE=${RECOMMENDED_PROFILE:-remote}
      - STREAMLIT_SERVER_BASE_URL_PATH=${STREAMLIT_BASE_URL_PATH:-}
      - FORGEJO_API_TOKEN=${FORGEJO_API_TOKEN:-}
      - FORGEJO_REPO=${FORGEJO_REPO:-}
      - FORGEJO_API_URL=${FORGEJO_API_URL:-}
      - PYTHONUNBUFFERED=1
      - PYTHONLOGGING=WARNING
    depends_on:
      searxng:
        condition: service_healthy
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: unless-stopped

  searxng:
    image: searxng/searxng:latest
    ports:
      - "${SEARXNG_PORT:-8888}:8080"
    volumes:
      - ./docker/searxng:/etc/searxng:ro
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:8080/"]
      interval: 10s
      timeout: 5s
      retries: 3
    restart: unless-stopped

  ollama:
    image: ollama/ollama:latest
    ports:
      - "${OLLAMA_PORT:-11434}:11434"
    volumes:
      - ${OLLAMA_MODELS_DIR:-~/models/ollama}:/root/.ollama
      - ./docker/ollama/entrypoint.sh:/entrypoint.sh
    environment:
      - OLLAMA_MODELS=/root/.ollama
      - DEFAULT_OLLAMA_MODEL=${OLLAMA_DEFAULT_MODEL:-llama3.2:3b}
    entrypoint: ["/bin/bash", "/entrypoint.sh"]
    profiles: [cpu, single-gpu, dual-gpu-ollama, dual-gpu-vllm, dual-gpu-mixed]
    restart: unless-stopped

  ollama_research:
    image: ollama/ollama:latest
    ports:
      - "${OLLAMA_RESEARCH_PORT:-11435}:11434"
    volumes:
      - ${OLLAMA_MODELS_DIR:-~/models/ollama}:/root/.ollama
      - ./docker/ollama/entrypoint.sh:/entrypoint.sh
    environment:
      - OLLAMA_MODELS=/root/.ollama
      - DEFAULT_OLLAMA_MODEL=${OLLAMA_RESEARCH_MODEL:-llama3.2:3b}
    entrypoint: ["/bin/bash", "/entrypoint.sh"]
    profiles: [dual-gpu-ollama, dual-gpu-mixed]
    restart: unless-stopped

  vision:
    build:
      context: .
      dockerfile: scripts/vision_service/Dockerfile
    ports:
      - "${VISION_PORT:-8002}:8002"
    environment:
      - VISION_MODEL=${VISION_MODEL:-vikhyatk/moondream2}
      - VISION_REVISION=${VISION_REVISION:-2025-01-09}
    profiles: [single-gpu, dual-gpu-ollama, dual-gpu-vllm, dual-gpu-mixed]
    restart: unless-stopped

  vllm:
    image: vllm/vllm-openai:latest
    ports:
      - "${VLLM_PORT:-8000}:8000"
    volumes:
      - ${VLLM_MODELS_DIR:-~/models/vllm}:/models
    command: >
      --model /models/${VLLM_MODEL:-Ouro-1.4B}
      --trust-remote-code
      --max-model-len 4096
      --gpu-memory-utilization 0.75
      --enforce-eager
      --max-num-seqs 8
      --cpu-offload-gb ${CPU_OFFLOAD_GB:-0}
    profiles: [dual-gpu-vllm, dual-gpu-mixed]
    restart: unless-stopped

  finetune:
    build:
      context: .
      dockerfile: Dockerfile.finetune
    volumes:
      - ${DOCS_DIR:-~/Documents/JobSearch}:/docs
      - ${OLLAMA_MODELS_DIR:-~/models/ollama}:/ollama-models
      - ./config:/app/config
    environment:
      - DOCS_DIR=/docs
      - OLLAMA_URL=http://ollama:11434
      - OLLAMA_MODELS_MOUNT=/ollama-models
      - OLLAMA_MODELS_OLLAMA_PATH=/root/.ollama
    profiles: [finetune]
    restart: "no"
@@ -3,7 +3,8 @@
 # Company name blocklist — partial case-insensitive match on the company field.
 # e.g. "Amazon" blocks any listing where company contains "amazon".
-companies: []
+companies:
+  - jobgether

 # Industry/content blocklist — blocked if company name OR job description contains any keyword.
 # Use this for industries you will never work in regardless of company.
3 config/integrations/airtable.yaml.example Normal file
@@ -0,0 +1,3 @@
api_key: "patXXX..."
base_id: "appXXX..."
table_name: "Jobs"

4 config/integrations/apple_calendar.yaml.example Normal file
@@ -0,0 +1,4 @@
caldav_url: "https://caldav.icloud.com/"
username: "you@icloud.com"
app_password: "xxxx-xxxx-xxxx-xxxx"
calendar_name: "Interviews"

1 config/integrations/discord.yaml.example Normal file
@@ -0,0 +1 @@
webhook_url: "https://discord.com/api/webhooks/..."

2 config/integrations/dropbox.yaml.example Normal file
@@ -0,0 +1,2 @@
access_token: "sl...."
folder_path: "/Peregrine"

2 config/integrations/google_calendar.yaml.example Normal file
@@ -0,0 +1,2 @@
calendar_id: "primary"
credentials_json: "~/credentials/google-calendar-sa.json"

2 config/integrations/google_drive.yaml.example Normal file
@@ -0,0 +1,2 @@
folder_id: "your-google-drive-folder-id"
credentials_json: "~/credentials/google-drive-sa.json"

3 config/integrations/google_sheets.yaml.example Normal file
@@ -0,0 +1,3 @@
spreadsheet_id: "your-spreadsheet-id"
sheet_name: "Jobs"
credentials_json: "~/credentials/google-sheets-sa.json"

3 config/integrations/home_assistant.yaml.example Normal file
@@ -0,0 +1,3 @@
base_url: "http://homeassistant.local:8123"
token: "eyJ0eXAiOiJKV1Qi..."
notification_service: "notify.mobile_app_my_phone"

3 config/integrations/mega.yaml.example Normal file
@@ -0,0 +1,3 @@
email: "you@example.com"
password: "your-mega-password"
folder_path: "/Peregrine"

4 config/integrations/nextcloud.yaml.example Normal file
@@ -0,0 +1,4 @@
host: "https://nextcloud.example.com"
username: "your-username"
password: "your-app-password"
folder_path: "/Peregrine"

2 config/integrations/notion.yaml.example Normal file
@@ -0,0 +1,2 @@
token: "secret_..."
database_id: "32-character-notion-db-id"

3 config/integrations/onedrive.yaml.example Normal file
@@ -0,0 +1,3 @@
client_id: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
client_secret: "your-client-secret"
folder_path: "/Peregrine"

2 config/integrations/slack.yaml.example Normal file
@@ -0,0 +1,2 @@
webhook_url: "https://hooks.slack.com/services/..."
channel: "#job-alerts"
@@ -3,48 +3,55 @@ backends:
     api_key_env: ANTHROPIC_API_KEY
     enabled: false
     model: claude-sonnet-4-6
-    type: anthropic
     supports_images: true
+    type: anthropic
   claude_code:
     api_key: any
     base_url: http://localhost:3009/v1
     enabled: false
     model: claude-code-terminal
-    type: openai_compat
     supports_images: true
+    type: openai_compat
   github_copilot:
     api_key: any
     base_url: http://localhost:3010/v1
     enabled: false
     model: gpt-4o
-    type: openai_compat
     supports_images: false
+    type: openai_compat
   ollama:
     api_key: ollama
-    base_url: http://localhost:11434/v1
+    base_url: http://host.docker.internal:11434/v1
     enabled: true
-    model: alex-cover-writer:latest
-    type: openai_compat
+    model: llama3.2:3b
     supports_images: false
+    type: openai_compat
   ollama_research:
     api_key: ollama
-    base_url: http://localhost:11434/v1
+    base_url: http://host.docker.internal:11434/v1
    enabled: true
-    model: llama3.1:8b
-    type: openai_compat
+    model: llama3.2:3b
     supports_images: false
+    type: openai_compat
+  vision_service:
+    base_url: http://host.docker.internal:8002
+    enabled: true
+    supports_images: true
+    type: vision_service
   vllm:
     api_key: ''
-    base_url: http://localhost:8000/v1
+    base_url: http://host.docker.internal:8000/v1
     enabled: true
     model: __auto__
-    type: openai_compat
     supports_images: false
-  vision_service:
-    base_url: http://localhost:8002
-    enabled: false
-    type: vision_service
-    supports_images: true
+    type: openai_compat
+  vllm_research:
+    api_key: ''
+    base_url: http://host.docker.internal:8000/v1
+    enabled: true
+    model: __auto__
+    supports_images: false
+    type: openai_compat
 fallback_order:
   - ollama
   - claude_code
@@ -53,7 +60,7 @@ fallback_order:
   - anthropic
 research_fallback_order:
   - claude_code
-  - vllm
+  - vllm_research
   - ollama_research
   - github_copilot
   - anthropic
@@ -61,6 +68,3 @@ vision_fallback_order:
   - vision_service
   - claude_code
   - anthropic
-# Note: 'ollama' (alex-cover-writer) intentionally excluded — research
-# must never use the fine-tuned writer model, and this also avoids evicting
-# the writer from GPU memory while a cover letter task is in flight.
@@ -21,21 +21,21 @@ backends:
     supports_images: false
   ollama:
     api_key: ollama
-    base_url: http://localhost:11434/v1
+    base_url: http://ollama:11434/v1  # Docker service name; use localhost:11434 outside Docker
     enabled: true
-    model: alex-cover-writer:latest
+    model: llama3.2:3b
     type: openai_compat
     supports_images: false
   ollama_research:
     api_key: ollama
-    base_url: http://localhost:11434/v1
+    base_url: http://ollama:11434/v1  # Docker service name; use localhost:11434 outside Docker
     enabled: true
-    model: llama3.1:8b
+    model: llama3.2:3b
     type: openai_compat
     supports_images: false
   vllm:
     api_key: ''
-    base_url: http://localhost:8000/v1
+    base_url: http://vllm:8000/v1  # Docker service name; use localhost:8000 outside Docker
     enabled: true
     model: __auto__
     type: openai_compat
@@ -64,3 +64,14 @@ vision_fallback_order:
 # Note: 'ollama' (alex-cover-writer) intentionally excluded — research
 # must never use the fine-tuned writer model, and this also avoids evicting
 # the writer from GPU memory while a cover letter task is in flight.
+
+# ── Scheduler — LLM batch queue optimizer ─────────────────────────────────────
+# The scheduler batches LLM tasks by model type to avoid GPU model switching.
+# VRAM budgets are conservative peak estimates (GB) for each task type.
+# Increase if your models are larger; decrease if tasks share GPU memory well.
+scheduler:
+  vram_budgets:
+    cover_letter: 2.5      # alex-cover-writer:latest (~2GB GGUF + headroom)
+    company_research: 5.0  # llama3.1:8b or vllm model
+    wizard_generate: 2.5   # same model family as cover_letter
+  max_queue_depth: 500     # max pending tasks per type before drops (with logged warning)
@@ -1,4 +1,15 @@
 profiles:
+  - boards:
+      - linkedin
+      - indeed
+      - glassdoor
+      - zip_recruiter
+    job_titles:
+      - Customer Service Specialist
+    locations:
+      - San Francisco CA
+    name: default
+    remote_only: false
   - boards:
       - linkedin
       - indeed
14 config/server.yaml.example Normal file
@@ -0,0 +1,14 @@
# config/server.yaml — Peregrine deployment / server settings
# Copy to config/server.yaml and edit. Gitignored — do not commit.
# Changes require restarting Peregrine to take effect (./manage.sh restart).

# base_url_path: URL prefix when serving Peregrine behind a reverse proxy.
# Leave empty ("") for direct access (http://localhost:8502).
# Set to "peregrine" when proxied at https://example.com/peregrine.
# Maps to STREAMLIT_BASE_URL_PATH in .env → STREAMLIT_SERVER_BASE_URL_PATH
# in the container. See: https://docs.streamlit.io/develop/api-reference/configuration/config.toml#server
base_url_path: ""

# server_port: Port Streamlit listens on inside the container (usually 8501).
# The external/host port is set via STREAMLIT_PORT in .env.
server_port: 8501
193 config/skills_suggestions.yaml Normal file
@@ -0,0 +1,193 @@
# skills_suggestions.yaml — Bundled tag suggestions for the Skills & Keywords UI.
# Shown as searchable options in the multiselect. Users can add custom tags beyond these.
# Future: community aggregate (paid tier) will supplement this list from anonymised installs.

skills:
  # ── Customer Success & Account Management ──
  - Customer Success
  - Technical Account Management
  - Account Management
  - Customer Onboarding
  - Renewal Management
  - Churn Prevention
  - Expansion Revenue
  - Executive Relationship Management
  - Escalation Management
  - QBR Facilitation
  - Customer Advocacy
  - Voice of the Customer
  - Customer Health Scoring
  - Success Planning
  - Customer Education
  - Implementation Management
  # ── Revenue & Operations ──
  - Revenue Operations
  - Sales Operations
  - Pipeline Management
  - Forecasting
  - Contract Negotiation
  - Upsell & Cross-sell
  - ARR / MRR Management
  - NRR Optimization
  - Quota Attainment
  # ── Leadership & Management ──
  - Team Leadership
  - People Management
  - Cross-functional Collaboration
  - Change Management
  - Stakeholder Management
  - Executive Presentation
  - Strategic Planning
  - OKR Setting
  - Hiring & Recruiting
  - Coaching & Mentoring
  - Performance Management
  # ── Project & Program Management ──
  - Project Management
  - Program Management
  - Agile / Scrum
  - Kanban
  - Risk Management
  - Resource Planning
  - Process Improvement
  - SOP Development
  # ── Technical Skills ──
  - SQL
  - Python
  - Data Analysis
  - Tableau
  - Looker
  - Power BI
  - Excel / Google Sheets
  - REST APIs
  - Salesforce
  - HubSpot
  - Gainsight
  - Totango
  - ChurnZero
  - Zendesk
  - Intercom
  - Jira
  - Confluence
  - Notion
  - Slack
  - Zoom
  # ── Communications & Writing ──
  - Executive Communication
  - Technical Writing
  - Proposal Writing
  - Presentation Skills
  - Public Speaking
  - Stakeholder Communication
  # ── Compliance & Security ──
  - Compliance
  - Risk Assessment
  - SOC 2
  - ISO 27001
  - GDPR
  - Security Awareness
  - Vendor Management

domains:
  # ── Software & Tech ──
  - B2B SaaS
  - Enterprise Software
  - Cloud Infrastructure
  - Developer Tools
  - Cybersecurity
  - Data & Analytics
  - AI / ML Platform
  - FinTech
  - InsurTech
  - LegalTech
  - HR Tech
  - MarTech
  - AdTech
  - DevOps / Platform Engineering
  - Open Source
  # ── Industry Verticals ──
  - Healthcare / HealthTech
  - Education / EdTech
  - Non-profit / Social Impact
  - Government / GovTech
  - E-commerce / Retail
  - Manufacturing
  - Financial Services
  - Media & Entertainment
  - Music Industry
  - Logistics & Supply Chain
  - Real Estate / PropTech
  - Energy / CleanTech
  - Hospitality & Travel
  # ── Market Segments ──
  - Enterprise
  - Mid-Market
  - SMB / SME
  - Startup
  - Fortune 500
  - Public Sector
  - International / Global
  # ── Business Models ──
  - Subscription / SaaS
  - Marketplace
  - Usage-based Pricing
  - Professional Services
  - Self-serve / PLG

keywords:
  # ── CS Metrics & Outcomes ──
  - NPS
  - CSAT
  - CES
  - Churn Rate
  - Net Revenue Retention
  - Gross Revenue Retention
  - Logo Retention
  - Time-to-Value
  - Product Adoption
  - Feature Utilisation
  - Health Score
  - Customer Lifetime Value
  # ── Sales & Growth ──
  - ARR
  - MRR
  - GRR
  - NRR
  - Expansion ARR
  - Pipeline Coverage
  - Win Rate
  - Average Contract Value
  - Land & Expand
  - Multi-threading
  # ── Process & Delivery ──
  - Onboarding
  - Implementation
  - Knowledge Transfer
  - Escalation
  - SLA
  - Root Cause Analysis
  - Post-mortem
  - Runbook
  - Playbook Development
  - Feedback Loop
  - Product Roadmap Input
  # ── Team & Culture ──
  - Cross-functional
  - Distributed Team
  - Remote-first
  - High-growth
  - Fast-paced
  - Autonomous
  - Data-driven
  - Customer-centric
  - Empathetic Leadership
  - Inclusive Culture
  # ── Job-seeker Keywords ──
  - Strategic
  - Proactive
  - Hands-on
  - Scalable Processes
  - Operational Excellence
  - Business Impact
  - Executive Visibility
  - Player-Coach
66 config/user.yaml.example Normal file
@@ -0,0 +1,66 @@
# config/user.yaml.example
# Copy to config/user.yaml and fill in your details.
# The first-run wizard will create this file automatically.

name: "Your Name"
email: "you@example.com"
phone: "555-000-0000"
linkedin: "linkedin.com/in/yourprofile"
career_summary: >
  Experienced professional with X years in [your field].
  Specialise in [key skills]. Known for [strength].

nda_companies: []  # e.g. ["FormerEmployer"] — masked in research briefs

# Optional: industries you genuinely care about.
# When a company/JD matches an industry, the cover letter prompt injects
# your personal note so Para 3 can reflect authentic alignment.
# Leave a value empty ("") to use a sensible generic default.
mission_preferences:
  music: ""           # e.g. "I've played in bands for 15 years and care deeply about how artists get paid"
  animal_welfare: ""  # e.g. "I volunteer at my local shelter every weekend"
  education: ""       # e.g. "I tutored underserved kids for 3 years and care deeply about literacy"
  social_impact: ""   # e.g. "I want my work to reach people who need help most"
  health: ""          # e.g. "I care about people navigating rare or poorly-understood health conditions"
# Note: if left empty, Para 3 defaults to focusing on the people the company
# serves — not the industry. Fill in for a more personal connection.

# Optional: how you write and communicate. Used to shape cover letter voice.
# e.g. "Warm and direct. Cares about people first. Finds rare and complex situations fascinating."
candidate_voice: ""

# Set to true to include optional identity-related sections in research briefs.
# Both are for your personal decision-making only — never included in applications.

# Adds a disability inclusion & accessibility section (ADA, ERGs, WCAG signals).
candidate_accessibility_focus: false

# Adds an LGBTQIA+ inclusion section (ERGs, non-discrimination policies, culture signals).
candidate_lgbtq_focus: false

tier: free  # free | paid | premium
dev_tier_override: null  # overrides tier locally (for testing only)
wizard_complete: false
wizard_step: 0
dismissed_banners: []

docs_dir: "~/Documents/JobSearch"
ollama_models_dir: "~/models/ollama"
vllm_models_dir: "~/models/vllm"

inference_profile: "remote"  # remote | cpu | single-gpu | dual-gpu

services:
  streamlit_port: 8501
  ollama_host: ollama    # Docker service name; use "localhost" if running outside Docker
  ollama_port: 11434
  ollama_ssl: false
  ollama_ssl_verify: true
  vllm_host: vllm        # Docker service name; use "localhost" if running outside Docker
  vllm_port: 8000
  vllm_ssl: false
  vllm_ssl_verify: true
  searxng_host: searxng  # Docker service name; use "localhost" if running outside Docker
  searxng_port: 8080     # internal Docker port; use 8888 for host-mapped access
  searxng_ssl: false
  searxng_ssl_verify: true
8 data/email_score.jsonl.example Normal file
@@ -0,0 +1,8 @@
{"subject": "Interview Invitation — Senior Engineer", "body": "Hi Alex, we'd love to schedule a 30-min phone screen. Are you available Thursday at 2pm? Please reply to confirm.", "label": "interview_scheduled"}
{"subject": "Your application to Acme Corp", "body": "Thank you for your interest in the Senior Engineer role. After careful consideration, we have decided to move forward with other candidates whose experience more closely matches our current needs.", "label": "rejected"}
{"subject": "Offer Letter — Product Manager at Initech", "body": "Dear Alex, we are thrilled to extend an offer of employment for the Product Manager position. Please find the attached offer letter outlining compensation and start date.", "label": "offer_received"}
{"subject": "Quick question about your background", "body": "Hi Alex, I came across your profile and would love to connect. We have a few roles that seem like a great match. Would you be open to a brief chat this week?", "label": "positive_response"}
{"subject": "Company Culture Survey — Acme Corp", "body": "Alex, as part of our evaluation process, we invite all candidates to complete our culture fit assessment. The survey takes approximately 15 minutes. Please click the link below.", "label": "survey_received"}
{"subject": "Application Received — DataCo", "body": "Thank you for submitting your application for the Data Engineer role at DataCo. We have received your materials and will be in touch if your qualifications match our needs.", "label": "neutral"}
{"subject": "Following up on your application", "body": "Hi Alex, I wanted to follow up on your recent application. Your background looks interesting and we'd like to learn more. Can we set up a quick call?", "label": "positive_response"}
{"subject": "We're moving forward with other candidates", "body": "Dear Alex, thank you for taking the time to interview with us. After thoughtful consideration, we have decided not to move forward with your candidacy at this time.", "label": "rejected"}
68 demo/config/llm.yaml Normal file
@@ -0,0 +1,68 @@
# Demo LLM config — all backends disabled.
# DEMO_MODE=true in the environment blocks the router before any backend is tried,
# so these values are never actually used. Kept for schema completeness.
backends:
  anthropic:
    api_key_env: ANTHROPIC_API_KEY
    enabled: false
    model: claude-sonnet-4-6
    supports_images: true
    type: anthropic
  claude_code:
    api_key: any
    base_url: http://localhost:3009/v1
    enabled: false
    model: claude-code-terminal
    supports_images: true
    type: openai_compat
  github_copilot:
    api_key: any
    base_url: http://localhost:3010/v1
    enabled: false
    model: gpt-4o
    supports_images: false
    type: openai_compat
  ollama:
    api_key: ollama
    base_url: http://localhost:11434/v1
    enabled: false
    model: llama3.2:3b
    supports_images: false
    type: openai_compat
  ollama_research:
    api_key: ollama
    base_url: http://localhost:11434/v1
    enabled: false
    model: llama3.2:3b
    supports_images: false
    type: openai_compat
  vision_service:
    base_url: http://localhost:8002
    enabled: false
    supports_images: true
    type: vision_service
  vllm:
    api_key: ''
    base_url: http://localhost:8000/v1
    enabled: false
    model: __auto__
    supports_images: false
    type: openai_compat
  vllm_research:
    api_key: ''
    base_url: http://localhost:8000/v1
    enabled: false
    model: __auto__
    supports_images: false
    type: openai_compat
fallback_order:
  - ollama
  - vllm
  - anthropic
research_fallback_order:
  - vllm_research
  - ollama_research
  - anthropic
vision_fallback_order:
  - vision_service
  - anthropic
44 demo/config/user.yaml Normal file
@@ -0,0 +1,44 @@
candidate_accessibility_focus: false
candidate_lgbtq_focus: false
candidate_voice: Clear, direct, and human. Focuses on impact over jargon.
career_summary: 'Experienced software engineer with a background in full-stack development,
  cloud infrastructure, and data pipelines. Passionate about building tools that help
  people navigate complex systems.

  '
dev_tier_override: null
dismissed_banners:
  - connect_cloud
  - setup_email
docs_dir: /docs
email: demo@circuitforge.tech
inference_profile: remote
linkedin: ''
mission_preferences:
  animal_welfare: ''
  education: ''
  health: ''
  music: ''
  social_impact: Want my work to reach people who need it most.
name: Demo User
nda_companies: []
ollama_models_dir: ~/models/ollama
phone: ''
services:
  ollama_host: localhost
  ollama_port: 11434
  ollama_ssl: false
  ollama_ssl_verify: true
  searxng_host: searxng
  searxng_port: 8080
  searxng_ssl: false
  searxng_ssl_verify: true
  streamlit_port: 8501
  vllm_host: localhost
  vllm_port: 8000
  vllm_ssl: false
  vllm_ssl_verify: true
tier: free
vllm_models_dir: ~/models/vllm
wizard_complete: true
wizard_step: 0
0 demo/data/.gitkeep Normal file
10 docker/ollama/entrypoint.sh Executable file
@@ -0,0 +1,10 @@
#!/usr/bin/env bash
# Start Ollama server and pull a default model if none are present
ollama serve &
sleep 5
if [ -z "$(ollama list 2>/dev/null | tail -n +2)" ]; then
  MODEL="${DEFAULT_OLLAMA_MODEL:-llama3.2:3b}"
  echo "No models found — pulling $MODEL..."
  ollama pull "$MODEL"
fi
wait
8 docker/searxng/settings.yml Normal file
@@ -0,0 +1,8 @@
use_default_settings: true
search:
  formats:
    - html
    - json
server:
  secret_key: "change-me-in-production"
  bind_address: "0.0.0.0:8080"
0 docs/.gitkeep Normal file
197 docs/backlog.md Normal file
@@ -0,0 +1,197 @@
# Peregrine — Feature Backlog

Unscheduled ideas and deferred features. Roughly grouped by area.

See also: `circuitforge-plans/shared/2026-03-07-launch-checklist.md` for pre-launch blockers
(legal docs, Stripe live keys, website deployment, demo DB ownership fix).

---

## Launch Blockers (tracked in shared launch checklist)

- **ToS + Refund Policy** — required before live Stripe charges. Files go in `website/content/legal/`.
- **Stripe live key rotation** — swap test keys to live in `website/.env` (zero code changes).
- **Website deployment to bastion** — Caddy route for Nuxt frontend at `circuitforge.tech`.
- **Demo DB ownership** — `demo/data/staging.db` is root-owned (Docker artifact); fix with `sudo chown alan:alan` then re-run `demo/seed_demo.py`.

---

## Post-Launch / Infrastructure

- **Accessibility Statement** — WCAG 2.1 conformance doc at `website/content/legal/accessibility.md`. High credibility value for ND audience.
- **Data deletion request process** — published procedure at `website/content/legal/data-deletion.md` (GDPR/CCPA; references `privacy@circuitforge.tech`).
- **Uptime Kuma monitors** — 6 monitors need to be added manually (website, Heimdall, demo, Directus, Forgejo, Peregrine container health).
- **Directus admin password rotation** — change from `changeme-set-via-ui-on-first-run` before website goes public.

---

## Discovery — Community Scraper Plugin System

Design doc: `circuitforge-plans/peregrine/2026-03-07-community-scraper-plugin-design.md`

**Summary:** Add a `scripts/plugins/` directory with auto-discovery and a documented MIT-licensed
plugin API. Separates CF-built custom scrapers (paid, BSL 1.1, in `scripts/custom_boards/`) from
community-contributed and CF-freebie scrapers (free, MIT, in `scripts/plugins/`).

**Implementation tasks:**
- [ ] Add `scripts/plugins/` with `__init__.py`, `README.md`, and `example_plugin.py`
- [ ] Add `config/plugins/` directory with `.gitkeep`; gitignore `config/plugins/*.yaml` (not `.example`)
- [ ] Update `discover.py`: `load_plugins()` auto-discovery + tier gate (`custom_boards` = paid, `plugins` = free); see the sketch after this list
- [ ] Update `search_profiles.yaml` schema: add `plugins:` list + `plugin_config:` block
- [ ] Migrate `scripts/custom_boards/craigslist.py` → `scripts/plugins/craigslist.py` (CF freebie)
- [ ] Settings UI: render `CONFIG_SCHEMA` fields for installed plugins (Settings → Search)
- [ ] Rewrite `docs/developer-guide/adding-scrapers.md` to document the plugin API
- [ ] Add `scripts/plugins/LICENSE` (MIT) to make the dual-license split explicit

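A minimal sketch of what the `load_plugins()` auto-discovery could look like, assuming plugins live in `scripts/plugins/` and each exposes a module-level `scrape(profile, db_path)` callable. The module path, skip rules, and return shape here are illustrative, not the final API:

```python
# Hypothetical auto-discovery helper for discover.py; names and layout are assumptions.
import importlib
import pkgutil
from pathlib import Path

PLUGINS_DIR = Path(__file__).parent / "plugins"  # assumed location: scripts/plugins/


def load_plugins() -> dict:
    """Map plugin name -> module for every plugin exposing scrape(profile, db_path)."""
    plugins = {}
    for info in pkgutil.iter_modules([str(PLUGINS_DIR)]):
        if info.name.startswith("_") or info.name == "example_plugin":
            continue  # skip private modules and the shipped example
        module = importlib.import_module(f"scripts.plugins.{info.name}")
        if callable(getattr(module, "scrape", None)):
            plugins[info.name] = module
    return plugins
```

With something like this in place, the tier gate can treat everything returned by `load_plugins()` as free-tier, while `scripts/custom_boards/` stays paid.
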
**CF freebie candidates** (future, after plugin system ships):
- Dice.com (tech-focused, no API key)
- We Work Remotely (remote-only, clean HTML)
- Wellfound / AngelList (startup roles)

---

## Discovery — Jobgether Non-Headless Scraper

Design doc: `peregrine/docs/superpowers/specs/2026-03-15-jobgether-integration-design.md`

**Background:** Headless Playwright is blocked by Cloudflare Turnstile on all `jobgether.com` pages.
A non-headless Playwright instance backed by `Xvfb` (virtual framebuffer) renders as a real browser and
bypasses Turnstile. Heimdall already has Xvfb available.

**Live-inspection findings (2026-03-15):**
- Search URL: `https://jobgether.com/search-offers?keyword=<query>`
- Job cards: `div.new-opportunity` — one per listing
- Card URL: `div.new-opportunity > a[href*="/offer/"]` (`href` attr)
- Title: `#offer-body h3`
- Company: `#offer-body p.font-medium`
- Dedup: existing URL-based dedup in `discover.py` covers Jobgether↔other-board overlap

**Implementation tasks (blocked until Xvfb-Playwright integration is in place):**
- [ ] Add `Xvfb` launch helper to `scripts/custom_boards/` (shared util, or inline in scraper)
- [ ] Implement `scripts/custom_boards/jobgether.py` using `p.chromium.launch(headless=False)` with `DISPLAY=:99`
- [ ] Pre-launch `Xvfb :99 -screen 0 1280x720x24` (or assert `DISPLAY` is already set)
- [ ] Register `jobgether` in `discover.py` `CUSTOM_SCRAPERS` (currently omitted — no viable scraper)
- [ ] Add `jobgether` to `custom_boards` in remote-eligible profiles in `config/search_profiles.yaml`
- [ ] Remove or update the "Jobgether discovery scraper — decided against" note in the design spec

**Pre-condition:** Validate Xvfb approach manually (headless=False + `DISPLAY=:99`) before implementing.
The `filter-api.jobgether.com` endpoint still requires auth and `robots.txt` still blocks bots —
confirm Turnstile acceptance is the only remaining blocker before beginning.
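
A rough manual-validation sketch of the approach described above. The selectors come from the live-inspection findings; the Xvfb invocation, display number, and query string are assumptions to adjust during validation:

```python
# Manual validation only: confirm Turnstile acceptance with a non-headless browser under Xvfb.
import os
import subprocess

from playwright.sync_api import sync_playwright

xvfb = subprocess.Popen(["Xvfb", ":99", "-screen", "0", "1280x720x24"])  # virtual framebuffer
os.environ["DISPLAY"] = ":99"

try:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)  # renders as a real window on :99
        page = browser.new_page()
        page.goto("https://jobgether.com/search-offers?keyword=customer+success")  # example query
        page.wait_for_selector("div.new-opportunity", timeout=30_000)
        cards = page.query_selector_all("div.new-opportunity > a[href*='/offer/']")
        print(f"Found {len(cards)} job cards")
        browser.close()
finally:
    xvfb.terminate()
```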

---

## Settings / Data Management

- **Backup / Restore / Teleport** — Settings panel option to export a full config snapshot (user.yaml + all gitignored configs) as a zip, restore from a snapshot, and "teleport" (export + import to a new machine or Docker volume). Useful for migrations, multi-machine setups, and safe wizard testing.
- **Complete Google Drive integration test()** — `scripts/integrations/google_drive.py` `test()` currently only checks that the credentials file exists (TODO comment). Implement actual Google Drive API call using `google-api-python-client` to verify the token works.

---

## First-Run Wizard

- **Wire real LLM test in Step 5 (Inference)** — `app/wizard/step_inference.py` validates an `endpoint_confirmed` boolean flag only. Replace with an actual LLM call: submit a minimal prompt to the configured endpoint, show pass/fail, and only set `endpoint_confirmed: true` on success. Should test whichever backend the user selected (Ollama, vLLM, Anthropic, etc.). A sketch of such a check is below.
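
A minimal sketch of what that check could look like for the OpenAI-compatible backends (Ollama, vLLM, the Claude Code proxy); the function name and return shape are illustrative, and the Anthropic backend would need its own client call:

```python
# Hypothetical smoke test for Step 5; base_url/model come from the user's configured backend.
import requests


def test_llm_endpoint(base_url: str, api_key: str, model: str) -> bool:
    """Send a minimal prompt to an OpenAI-compatible endpoint and report pass/fail."""
    try:
        r = requests.post(
            f"{base_url.rstrip('/')}/chat/completions",
            headers={"Authorization": f"Bearer {api_key}"},
            json={
                "model": model,
                "messages": [{"role": "user", "content": "Reply with the single word: ok"}],
                "max_tokens": 5,
            },
            timeout=30,
        )
        r.raise_for_status()
        return bool(r.json()["choices"][0]["message"]["content"].strip())
    except Exception:
        return False
```

Step 5 would then set `endpoint_confirmed: true` only when this returns `True`.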

---

## LinkedIn Import

Shipped in v0.4.0. Ongoing maintenance and known decisions:

- **Selector maintenance** — LinkedIn changes their DOM periodically. When import stops working, update
CSS selectors in `scripts/linkedin_utils.py` only (all other files import from there). Real `data-section`
attribute values (as of 2025 DOM): `summary`, `currentPositionsDetails`, `educationsDetails`,
`certifications`, `posts`, `volunteering`, `publications`, `projects`.

- **Data export zip is the recommended path for full history** — LinkedIn's unauthenticated public profile
page is server-side degraded: experience titles, past roles, education, and skills are blurred/omitted.
Only available without login: name, About summary (truncated), current employer name, certifications.
The "Import from LinkedIn data export zip" expander (Settings → Resume Profile and Wizard step 3) is the
correct path for full career history. UI already shows an `ℹ️` callout explaining this.

- **LinkedIn OAuth — decided: not viable** — LinkedIn's OAuth API is restricted to approved partner
programs. Even if approved, it only grants name + email (not career history, experience, or skills).
This is a deliberate LinkedIn platform restriction, not a technical gap. Do not pursue this path.

- **Selector test harness** (future) — A lightweight test that fetches a known-public LinkedIn profile
and asserts at least N fields non-empty would catch DOM breakage before users report it. Low priority
until selector breakage becomes a recurring support issue.

---

## Cover Letter / Resume Generation

- ~~**Iterative refinement feedback loop**~~ — ✅ Done (`94225c9`): `generate()` accepts `previous_result`/`feedback`; task_runner parses params JSON; Apply Workspace has "Refine with Feedback" expander. Same pattern available for wizard `expand_bullets` via `_run_wizard_generate`.

---

## Apply / Browser Integration

- **Browser autofill extension** — Chrome/Firefox extension that reads job application forms and auto-fills from the user's profile + generated cover letter; syncs submitted applications back into the pipeline automatically. (Phase 2 paid+ feature per business plan.)

---

## Ultra Tier — Managed Applications (White-Glove Service)

- **Concept** — A human-in-the-loop concierge tier where a trained operator submits applications on the user's behalf, powered by AI-generated artifacts (cover letter, company research, survey responses). AI handles ~80% of the work; operator handles form submission, CAPTCHAs, and complex custom questions.
- **Pricing model** — Per-application or bundle pricing rather than flat "X apps/month" — application complexity varies too much for flat pricing to be sustainable.
- **Operator interface** — Thin admin UI (separate from user-facing app) that reads from the same `staging.db`: shows candidate profile, job listing, generated cover letter, company brief, and a "Mark submitted" button. New job status `queued_for_operator` to represent the handoff.
- **Key unlock** — Browser autofill extension (above) becomes the operator's primary tool; pre-fills forms from profile + cover letter, operator reviews and submits.
- **Tier addition** — Add `"ultra"` to `TIERS` in `app/wizard/tiers.py`; gate `"managed_applications"` feature. The existing tier system is designed to accommodate this cleanly. (See the sketch after this list.)
- **Quality / trust** — Each submission requires explicit per-job user approval before operator acts. Full audit trail (who submitted, when, what was sent). Clear ToS around representation.
- **Bootstrap strategy** — Waitlist + small trusted operator team initially to validate workflow before scaling or automating further. Don't build operator tooling until the manual flow is proven.
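
A hedged sketch of that tier addition. The exact shape of `TIERS` in `app/wizard/tiers.py` is assumed here; only the `FEATURES` mapping style follows the pattern documented in the developer guide:

```python
# Hypothetical additions to app/wizard/tiers.py; the structure of TIERS is an assumption.
TIERS: list[str] = ["free", "paid", "premium", "ultra"]  # "ultra" appended as the top tier

FEATURES: dict[str, str] = {
    # ...existing entries...
    "managed_applications": "ultra",  # operator-submitted applications (Ultra only)
}
```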

---

## Container Runtime

- ~~**Podman support**~~ — ✅ Done: `Makefile` auto-detects `docker compose` / `podman compose` / `podman-compose`; `compose.podman-gpu.yml` CDI override for GPU profiles; `setup.sh` detects existing Podman and skips Docker install.
- **FastAPI migration path** — When concurrent-user scale demands it: port Streamlit pages to FastAPI + React/HTMX, keep `scripts/` layer unchanged, replace daemon threads with Celery + Redis. The `scripts/` separation already makes this clean.

---

## Email Sync

See also: `docs/plans/email-sync-testing-checklist.md` for outstanding test coverage items.

---

## Circuit Forge LLC — Product Expansion ("Heinous Tasks" Platform)

The core insight: the Peregrine pipeline architecture (monitor → AI assist → human approval → execute) is domain-agnostic. Job searching is the proof-of-concept. The same pattern applies to any task that is high-stakes, repetitive, opaque, or just deeply unpleasant.

Each product ships as a **separate app** sharing the same underlying scaffold (pipeline engine, LLM router, background tasks, wizard, tier system, operator interface for Ultra tier). The business is Circuit Forge LLC; the brand positioning is: *"AI for the tasks you hate most."*

### Candidate products (rough priority order)

- **Falcon** — Government form assistance. Benefits applications, disability claims, FAFSA, immigration forms, small business permits. AI pre-fills from user profile, flags ambiguous questions, generates supporting statements. High value: mistakes here are costly and correction is slow.

- **Osprey** — Customer service queue management. Monitors hold queues, auto-navigates IVR trees via speech synthesis, escalates to human agent at the right moment, drafts complaint letters and dispute emails with the right tone and regulatory citations (CFPB, FCC, etc.). Tracks ticket status across cases.

- **Kestrel** — DMV / government appointment booking. Monitors appointment availability for DMV, passport offices, Social Security offices, USCIS biometrics, etc. Auto-books the moment a slot opens. Sends reminders with checklist of required documents.

- **Harrier** — Insurance navigation. Prior authorization tracking, claim dispute drafting, EOB reconciliation, appeal letters. High willingness-to-pay: a denied $50k claim is worth paying to fight.

- **Merlin** — Rental / housing applications. Monitors listings, auto-applies to matching properties, generates cover letters for competitive rental markets, tracks responses, flags lease red flags.

- **Ibis** — Healthcare coordination. The sacred ibis was the symbol of Thoth, Egyptian god of medicine — the name carries genuine medical heritage. Referral tracking, specialist waitlist monitoring, prescription renewal reminders, medical record request management, prior auth paper trails.

- **Tern** — Travel planning. The Arctic tern makes the longest migration of any animal (44,000 miles/year, pole to pole) — the ultimate traveler. Flight/hotel monitoring, itinerary generation, visa requirement research, travel insurance comparison, rebooking assistance on disruption.

- **Wren** — Contractor engagement. Wrens are legendary nest-builders — meticulous, structural, persistent. Contractor discovery, quote comparison, scope-of-work generation, milestone tracking, dispute documentation, lien waiver management.

- **Martin** — Car / home maintenance. The house martin nests on the exterior of buildings and returns to the same site every year to maintain it — almost too on-the-nose. Service scheduling, maintenance history tracking, recall monitoring, warranty tracking, finding trusted local providers.

### Shared architecture decisions

- **Separate repos, shared `circuitforge-core` package** — pipeline engine, LLM router, background task runner, wizard framework, tier system, operator interface all extracted into a private PyPI package that each product imports.
- **Same Docker Compose scaffold** — each product is a `compose.yml` away from deployment.
- **Same Ultra tier model** — operator interface reads from product's DB, human-in-the-loop for tasks that can't be automated (CAPTCHAs, phone calls, wet signatures).
- **Prove Peregrine first** — don't extract `circuitforge-core` until the second product is actively being built. Premature extraction is over-engineering.

### What makes this viable
- Each domain has the same pain profile: high-stakes, time-sensitive, opaque processes with inconsistent UX.
- Users are highly motivated to pay — the alternative is hours of their own time on hold or filling out forms.
- The human-in-the-loop (Ultra) model handles the hardest cases without requiring full automation.
- Regulatory moat: knowing which citations matter (CFPB for billing disputes, ADA for accommodation requests) is defensible knowledge that gets baked into prompts over time.

---
249 docs/developer-guide/adding-integrations.md Normal file
@@ -0,0 +1,249 @@
# Adding an Integration

Peregrine's integration system is auto-discovered — add a class and a config example, and it appears in the wizard and Settings automatically. No registration step is needed.

---

## Step 1 — Create the integration module

Create `scripts/integrations/myservice.py`:

```python
# scripts/integrations/myservice.py

from scripts.integrations.base import IntegrationBase


class MyServiceIntegration(IntegrationBase):
    name = "myservice"    # must be unique; matches config filename
    label = "My Service"  # display name shown in the UI
    tier = "free"         # "free" | "paid" | "premium"

    def fields(self) -> list[dict]:
        """Return form field definitions for the connection card in the wizard/Settings UI."""
        return [
            {
                "key": "api_key",
                "label": "API Key",
                "type": "password",  # "text" | "password" | "url" | "checkbox"
                "placeholder": "sk-...",
                "required": True,
                "help": "Get your key at myservice.com/settings/api",
            },
            {
                "key": "workspace_id",
                "label": "Workspace ID",
                "type": "text",
                "placeholder": "ws_abc123",
                "required": True,
                "help": "Found in your workspace URL",
            },
        ]

    def connect(self, config: dict) -> bool:
        """
        Store credentials in memory. Return True if all required fields are present.
        Does NOT verify credentials — call test() for that.
        """
        self._api_key = config.get("api_key", "").strip()
        self._workspace_id = config.get("workspace_id", "").strip()
        return bool(self._api_key and self._workspace_id)

    def test(self) -> bool:
        """
        Verify the stored credentials actually work.
        Returns True on success, False on any failure.
        """
        try:
            import requests
            r = requests.get(
                "https://api.myservice.com/v1/ping",
                headers={"Authorization": f"Bearer {self._api_key}"},
                params={"workspace": self._workspace_id},
                timeout=5,
            )
            return r.ok
        except Exception:
            return False

    def sync(self, jobs: list[dict]) -> int:
        """
        Optional: push jobs to the external service.
        Return the count of successfully synced jobs.
        The default implementation in IntegrationBase returns 0 (no-op).
        Only override this if your integration supports job syncing
        (e.g. Notion, Airtable, Google Sheets).
        """
        synced = 0
        for job in jobs:
            try:
                self._push_job(job)
                synced += 1
            except Exception as e:
                print(f"[myservice] sync error for job {job.get('id')}: {e}")
        return synced

    def _push_job(self, job: dict) -> None:
        import requests
        requests.post(
            "https://api.myservice.com/v1/records",
            headers={"Authorization": f"Bearer {self._api_key}"},
            json={
                "workspace": self._workspace_id,
                "title": job.get("title", ""),
                "company": job.get("company", ""),
                "status": job.get("status", "pending"),
                "url": job.get("url", ""),
            },
            timeout=10,
        ).raise_for_status()
```

---

## Step 2 — Create the config example file

Create `config/integrations/myservice.yaml.example`:

```yaml
# config/integrations/myservice.yaml.example
# Copy to config/integrations/myservice.yaml and fill in your credentials.
# This file is gitignored — never commit the live credentials.
api_key: ""
workspace_id: ""
```

The live credentials file (`config/integrations/myservice.yaml`) is gitignored automatically via the `config/integrations/` entry in `.gitignore`.

---

## Step 3 — Auto-discovery

No registration step is needed. The integration registry (`scripts/integrations/__init__.py`) imports all `.py` files in the `integrations/` directory and discovers subclasses of `IntegrationBase` automatically.
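
For orientation, the discovery pattern is roughly the following. This is a sketch only; the real `scripts/integrations/__init__.py` may differ in its details:

```python
# Sketch of subclass discovery for an integrations package; not a copy of the real registry.
import importlib
import pkgutil

from scripts.integrations.base import IntegrationBase


def discover_integrations() -> dict[str, type[IntegrationBase]]:
    """Import every module in the package and collect IntegrationBase subclasses by name."""
    package = importlib.import_module("scripts.integrations")
    for info in pkgutil.iter_modules(package.__path__):
        if info.name != "base":
            importlib.import_module(f"scripts.integrations.{info.name}")
    return {cls.name: cls for cls in IntegrationBase.__subclasses__()}
```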

On next startup, `myservice` will appear in:
- The first-run wizard Step 7 (Integrations)
- **Settings → Integrations** with a connection card rendered from `fields()`

---

## Step 4 — Tier-gate new features (optional)

If you want to gate a specific action (not just the integration itself) behind a tier, add an entry to `app/wizard/tiers.py`:

```python
FEATURES: dict[str, str] = {
    # ...existing entries...
    "myservice_sync": "paid",  # or "free" | "premium"
}
```

Then guard the action in the relevant UI page:

```python
import streamlit as st

from app.wizard.tiers import can_use
from scripts.user_profile import UserProfile

user = UserProfile()
if can_use(user.tier, "myservice_sync"):
    ...  # show the sync button
else:
    st.info("MyService sync requires a Paid plan.")
```

---

## Step 5 — Write a test

Create or add to `tests/test_integrations.py`:

```python
# tests/test_integrations.py (add to existing file)

from unittest.mock import patch, MagicMock

from scripts.integrations.myservice import MyServiceIntegration


def test_fields_returns_required_keys():
    integration = MyServiceIntegration()
    fields = integration.fields()
    assert len(fields) >= 1
    for field in fields:
        assert "key" in field
        assert "label" in field
        assert "type" in field
        assert "required" in field


def test_connect_returns_true_with_valid_config():
    integration = MyServiceIntegration()
    result = integration.connect({"api_key": "sk-abc", "workspace_id": "ws-123"})
    assert result is True


def test_connect_returns_false_with_missing_required_field():
    integration = MyServiceIntegration()
    result = integration.connect({"api_key": "", "workspace_id": "ws-123"})
    assert result is False


def test_test_returns_true_on_200():
    integration = MyServiceIntegration()
    integration.connect({"api_key": "sk-abc", "workspace_id": "ws-123"})

    mock_resp = MagicMock()
    mock_resp.ok = True

    # Patch requests.get globally — test() imports requests inside the method.
    with patch("requests.get", return_value=mock_resp):
        assert integration.test() is True


def test_test_returns_false_on_error():
    integration = MyServiceIntegration()
    integration.connect({"api_key": "sk-abc", "workspace_id": "ws-123"})

    with patch("requests.get", side_effect=Exception("timeout")):
        assert integration.test() is False


def test_is_configured_reflects_file_presence(tmp_path):
    config_dir = tmp_path / "config"
    config_dir.mkdir()
    (config_dir / "integrations").mkdir()

    assert MyServiceIntegration.is_configured(config_dir) is False

    (config_dir / "integrations" / "myservice.yaml").write_text("api_key: sk-abc\n")
    assert MyServiceIntegration.is_configured(config_dir) is True
```

---

## IntegrationBase Reference

All integrations inherit from `scripts/integrations/base.py`. Here is the full interface:

| Method / attribute | Required | Description |
|-------------------|----------|-------------|
| `name: str` | Yes | Machine key — must be unique. Matches the YAML config filename. |
| `label: str` | Yes | Human-readable display name for the UI. |
| `tier: str` | Yes | Minimum tier: `"free"`, `"paid"`, or `"premium"`. |
| `fields() -> list[dict]` | Yes | Returns form field definitions. Each dict: `key`, `label`, `type`, `placeholder`, `required`, `help`. |
| `connect(config: dict) -> bool` | Yes | Stores credentials in memory. Returns `True` if required fields are present. Does NOT verify credentials. |
| `test() -> bool` | Yes | Makes a real network call to verify stored credentials. Returns `True` on success. |
| `sync(jobs: list[dict]) -> int` | No | Pushes jobs to the external service. Returns count synced. Default is a no-op returning 0. |
| `config_path(config_dir: Path) -> Path` | Inherited | Returns `config_dir / "integrations" / f"{name}.yaml"`. |
| `is_configured(config_dir: Path) -> bool` | Inherited | Returns `True` if the config YAML file exists. |
| `save_config(config: dict, config_dir: Path)` | Inherited | Writes config dict to the YAML file. Call after `test()` returns `True`. |
| `load_config(config_dir: Path) -> dict` | Inherited | Loads and returns the YAML config, or `{}` if not configured. |
|
|
||||||
|
### Field type values
|
||||||
|
|
||||||
|
| `type` value | UI widget rendered |
|
||||||
|
|-------------|-------------------|
|
||||||
|
| `"text"` | Plain text input |
|
||||||
|
| `"password"` | Password input (masked) |
|
||||||
|
| `"url"` | URL input |
|
||||||
|
| `"checkbox"` | Boolean checkbox |
|
||||||
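
As a concrete illustration, here is a hypothetical `fields()` return value that exercises each widget type above. The dict keys follow the `fields()` contract from the interface table; the field names, placeholders, and help text are invented for the example:

```python
def fields(self) -> list[dict]:
    # Illustrative only: one entry per supported widget type.
    return [
        {"key": "api_key", "label": "API key", "type": "password",
         "placeholder": "sk-abc123", "required": True, "help": "Generated in your account settings."},
        {"key": "base_url", "label": "Base URL", "type": "url",
         "placeholder": "https://api.example.com", "required": True, "help": ""},
        {"key": "workspace_id", "label": "Workspace", "type": "text",
         "placeholder": "ws-123", "required": False, "help": ""},
        {"key": "sync_closed", "label": "Sync closed jobs", "type": "checkbox",
         "placeholder": "", "required": False, "help": ""},
    ]
```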
docs/developer-guide/adding-scrapers.md (new file, 244 lines)

# Adding a Custom Job Board Scraper

Peregrine supports pluggable custom job board scrapers. Standard boards use the JobSpy library. Custom scrapers handle boards with non-standard APIs, paywalls, or SSR-rendered pages.

This guide walks through adding a new scraper from scratch.

---

## Step 1 — Create the scraper module

Create `scripts/custom_boards/myboard.py`. Every custom scraper must implement one function:

```python
# scripts/custom_boards/myboard.py
from datetime import datetime

import requests


def scrape(profile: dict, db_path: str) -> list[dict]:
    """
    Scrape job listings from MyBoard for the given search profile.

    Args:
        profile: The active search profile dict from search_profiles.yaml.
            Keys include: titles (list), locations (list),
            hours_old (int), results_per_board (int).
        db_path: Absolute path to staging.db. Use this if you need to
            check for existing URLs before returning.

    Returns:
        List of job dicts. Each dict must contain at minimum:
            title (str) — job title
            company (str) — company name
            url (str) — canonical job URL (used as unique key)
            source (str) — board identifier, e.g. "myboard"
            location (str) — "Remote" or "City, State"
            is_remote (bool) — True if remote
            salary (str) — salary string or "" if unknown
            description (str) — full job description text or "" if unavailable
            date_found (str) — ISO 8601 datetime string, e.g. "2026-02-25T12:00:00"
    """
    jobs = []

    for title in profile.get("titles", []):
        for location in profile.get("locations", []):
            results = _fetch_from_myboard(title, location, profile)
            jobs.extend(results)

    return jobs


def _fetch_from_myboard(title: str, location: str, profile: dict) -> list[dict]:
    """Internal helper — call the board's API and transform results."""
    params = {
        "q": title,
        "l": location,
        "limit": profile.get("results_per_board", 50),
    }

    try:
        resp = requests.get(
            "https://api.myboard.com/jobs",
            params=params,
            timeout=15,
        )
        resp.raise_for_status()
        data = resp.json()
    except Exception as e:
        print(f"[myboard] fetch error: {e}")
        return []

    jobs = []
    for item in data.get("results", []):
        jobs.append({
            "title": item.get("title", ""),
            "company": item.get("company", ""),
            "url": item.get("url", ""),
            "source": "myboard",
            "location": item.get("location", ""),
            "is_remote": "remote" in item.get("location", "").lower(),
            "salary": item.get("salary", ""),
            "description": item.get("description", ""),
            "date_found": datetime.utcnow().isoformat(),
        })

    return jobs
```

### Required fields

| Field | Type | Notes |
|-------|------|-------|
| `title` | str | Job title |
| `company` | str | Company name |
| `url` | str | **Unique key** — must be stable and canonical |
| `source` | str | Short board identifier, e.g. `"myboard"` |
| `location` | str | `"Remote"` or `"City, ST"` |
| `is_remote` | bool | `True` if remote |
| `salary` | str | Salary string or `""` |
| `description` | str | Full description text or `""` |
| `date_found` | str | ISO 8601 UTC datetime |

### Deduplication

`discover.py` deduplicates by `url` before inserting into the database. If a job with the same URL already exists, it is silently skipped. You do not need to handle deduplication inside your scraper.

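If the board's API is expensive or rate-limited, you can still pre-filter against `db_path` and skip URLs that are already staged. A sketch, assuming the `jobs` table keys rows by a `url` column as described above:

```python
import sqlite3


def _already_seen(db_path: str, url: str) -> bool:
    """Optional pre-filter: True if this URL is already in staging.db."""
    try:
        with sqlite3.connect(db_path) as conn:
            row = conn.execute(
                "SELECT 1 FROM jobs WHERE url = ? LIMIT 1", (url,)
            ).fetchone()
        return row is not None
    except sqlite3.Error:
        # Table missing or DB not initialised yet: let discover.py dedup as usual.
        return False
```
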
### Rate limiting

Be a good citizen:

- Add a `time.sleep(0.5)` between paginated requests (see the sketch after this list)
- Respect `Retry-After` headers
- Do not scrape faster than a human browsing the site
- If the site provides an official API, prefer that over scraping HTML

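A minimal pagination sketch illustrating the first two points. The `page` query parameter and the 429/`Retry-After` behaviour are assumptions about MyBoard's API, not part of the scraper contract:

```python
import time

import requests


def _fetch_all_pages(params: dict, max_pages: int = 5) -> list[dict]:
    """Politely fetch up to max_pages of results from the (hypothetical) API."""
    results = []
    for page in range(1, max_pages + 1):
        for _attempt in range(3):
            resp = requests.get(
                "https://api.myboard.com/jobs",
                params={**params, "page": page},
                timeout=15,
            )
            if resp.status_code != 429:
                break
            # Honour the server's requested back-off before retrying this page.
            time.sleep(int(resp.headers.get("Retry-After", "5")))
        resp.raise_for_status()
        batch = resp.json().get("results", [])
        if not batch:
            break
        results.extend(batch)
        time.sleep(0.5)  # pause between paginated requests
    return results
```
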
### Credentials

If your scraper requires API keys or credentials:

- Create `config/myboard.yaml.example` as a template
- Create `config/myboard.yaml` (gitignored) for live credentials
- Read it in your scraper with `yaml.safe_load(open("config/myboard.yaml"))` (a more defensive version is sketched below)
- Document the credential setup in comments at the top of your module

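A sketch of that read with basic error handling, so a missing credentials file skips the board instead of crashing discovery (the file name and keys mirror the bullets above):

```python
from pathlib import Path

import yaml


def _load_credentials() -> dict:
    """Return the contents of config/myboard.yaml, or {} if it is not set up."""
    path = Path("config/myboard.yaml")
    if not path.exists():
        print("[myboard] config/myboard.yaml not found, skipping board")
        return {}
    with path.open() as f:
        return yaml.safe_load(f) or {}
```
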
---

## Step 2 — Register the scraper

Open `scripts/discover.py` and add your scraper to the `CUSTOM_SCRAPERS` dict:

```python
from scripts.custom_boards import adzuna, theladders, craigslist, myboard

CUSTOM_SCRAPERS = {
    "adzuna": adzuna.scrape,
    "theladders": theladders.scrape,
    "craigslist": craigslist.scrape,
    "myboard": myboard.scrape,  # add this line
}
```

---

## Step 3 — Activate in a search profile

Open `config/search_profiles.yaml` and add `myboard` to `custom_boards` in any profile:

```yaml
profiles:
  - name: cs_leadership
    boards:
      - linkedin
      - indeed
    custom_boards:
      - adzuna
      - myboard  # add this line
    titles:
      - Customer Success Manager
    locations:
      - Remote
```

---

## Step 4 — Write a test

Create `tests/test_myboard.py`. Mock the HTTP call to avoid hitting the live API during tests:

```python
# tests/test_myboard.py

from unittest.mock import patch

from scripts.custom_boards.myboard import scrape

MOCK_RESPONSE = {
    "results": [
        {
            "title": "Customer Success Manager",
            "company": "Acme Corp",
            "url": "https://myboard.com/jobs/12345",
            "location": "Remote",
            "salary": "$80,000 - $100,000",
            "description": "We are looking for a CSM...",
        }
    ]
}


def test_scrape_returns_correct_shape():
    profile = {
        "titles": ["Customer Success Manager"],
        "locations": ["Remote"],
        "results_per_board": 10,
        "hours_old": 240,
    }

    with patch("scripts.custom_boards.myboard.requests.get") as mock_get:
        mock_get.return_value.ok = True
        mock_get.return_value.raise_for_status = lambda: None
        mock_get.return_value.json.return_value = MOCK_RESPONSE

        jobs = scrape(profile, db_path="nonexistent.db")

    assert len(jobs) == 1
    job = jobs[0]

    # Required fields
    assert "title" in job
    assert "company" in job
    assert "url" in job
    assert "source" in job
    assert "location" in job
    assert "is_remote" in job
    assert "salary" in job
    assert "description" in job
    assert "date_found" in job

    assert job["source"] == "myboard"
    assert job["title"] == "Customer Success Manager"
    assert job["url"] == "https://myboard.com/jobs/12345"


def test_scrape_handles_http_error_gracefully():
    profile = {
        "titles": ["Customer Success Manager"],
        "locations": ["Remote"],
        "results_per_board": 10,
        "hours_old": 240,
    }

    with patch("scripts.custom_boards.myboard.requests.get") as mock_get:
        mock_get.side_effect = Exception("Connection refused")

        jobs = scrape(profile, db_path="nonexistent.db")

    assert jobs == []
```

---

## Existing Scrapers as Reference

| Scraper | Notes |
|---------|-------|
| `scripts/custom_boards/adzuna.py` | REST API with `app_id` + `app_key` authentication |
| `scripts/custom_boards/theladders.py` | SSR scraper using `curl_cffi` to parse `__NEXT_DATA__` JSON embedded in the page |
| `scripts/custom_boards/craigslist.py` | RSS feed scraper |
docs/developer-guide/architecture.md (new file, 286 lines)

# Architecture

This page describes Peregrine's system structure, layer boundaries, and key design decisions.

---

## System Overview

### Pipeline

```mermaid
flowchart LR
    sources["JobSpy\nCustom Boards"]
    discover["discover.py"]
    db[("staging.db\nSQLite")]
    match["match.py\nScoring"]
    review["Job Review\nApprove / Reject"]
    apply["Apply Workspace\nCover letter + PDF"]
    kanban["Interviews\nphone_screen → hired"]
    sync["sync.py"]
    notion["Notion DB"]

    sources --> discover --> db --> match --> review --> apply --> kanban
    db --> sync --> notion
```

### Docker Compose Services

Three compose files serve different deployment contexts:

| File | Project name | Port | Purpose |
|------|-------------|------|---------|
| `compose.yml` | `peregrine` | 8502 | Local self-hosted install (default) |
| `compose.demo.yml` | `peregrine-demo` | 8504 | Public demo at `demo.circuitforge.tech/peregrine` — `DEMO_MODE=true`, no LLM |
| `compose.cloud.yml` | `peregrine-cloud` | 8505 | Cloud managed instance at `menagerie.circuitforge.tech/peregrine` — `CLOUD_MODE=true`, per-user data |

```mermaid
flowchart TB
    subgraph local["compose.yml (local)"]
        app_l["**app** :8502\nStreamlit UI"]
        ollama_l["**ollama**\nLocal LLM"]
        vllm_l["**vllm**\nvLLM"]
        vision_l["**vision**\nMoondream2"]
        searxng_l["**searxng**\nWeb Search"]
        db_l[("staging.db\nSQLite")]
    end

    subgraph cloud["compose.cloud.yml (cloud)"]
        app_c["**app** :8505\nStreamlit UI\nCLOUD_MODE=true"]
        searxng_c["**searxng**\nWeb Search"]
        db_c[("menagerie-data/\n<user-id>/staging.db\nSQLCipher")]
        pg[("Postgres\nplatform DB\n:5433")]
    end
```

Solid lines = always connected. Dashed lines = optional/profile-dependent backends.

### Streamlit App Layer

```mermaid
flowchart TD
    entry["app/app.py\nEntry point · navigation · sidebar task badge"]

    setup["0_Setup.py\nFirst-run wizard\n⚠️ Gates everything"]
    review["1_Job_Review.py\nApprove / reject queue"]
    settings["2_Settings.py\nAll user configuration"]
    apply["4_Apply.py\nCover letter gen + PDF export"]
    interviews["5_Interviews.py\nKanban: phone_screen → hired"]
    prep["6_Interview_Prep.py\nResearch brief + practice Q&A"]
    survey["7_Survey.py\nCulture-fit survey assistant"]
    wizard["app/wizard/\nstep_hardware.py … step_integrations.py\ntiers.py — feature gate definitions"]

    entry --> setup
    entry --> review
    entry --> settings
    entry --> apply
    entry --> interviews
    entry --> prep
    entry --> survey
    setup <-.->|wizard steps| wizard
```

### Scripts Layer

Framework-independent — no Streamlit imports. Can be called from CLI, FastAPI, or background threads.

| Script | Purpose |
|--------|---------|
| `discover.py` | JobSpy + custom board orchestration |
| `match.py` | Resume keyword scoring |
| `db.py` | All SQLite helpers (single source of truth) |
| `llm_router.py` | LLM fallback chain |
| `generate_cover_letter.py` | Cover letter generation |
| `company_research.py` | Pre-interview research brief |
| `task_runner.py` | Background daemon thread executor |
| `imap_sync.py` | IMAP email fetch + classify |
| `sync.py` | Push to external integrations |
| `user_profile.py` | `UserProfile` wrapper for `user.yaml` |
| `preflight.py` | Port + resource check |
| `custom_boards/` | Per-board scrapers |
| `integrations/` | Per-service integration drivers |
| `vision_service/` | FastAPI Moondream2 inference server |

### Config Layer

Plain YAML files. Gitignored files contain secrets; `.example` files are committed as templates.

| File | Purpose |
|------|---------|
| `config/user.yaml` | Personal data + wizard state |
| `config/llm.yaml` | LLM backends + fallback chains |
| `config/search_profiles.yaml` | Job search configuration |
| `config/resume_keywords.yaml` | Scoring keywords |
| `config/blocklist.yaml` | Excluded companies/domains |
| `config/email.yaml` | IMAP credentials |
| `config/integrations/` | Per-integration credentials |

### Database Layer

**Local mode** — `staging.db`: SQLite, single file, gitignored.

**Cloud mode** — Hybrid:

- **Postgres (platform layer):** account data, subscriptions, telemetry consent. Shared across all users.
- **SQLite-per-user (content layer):** each user's job data in an isolated, SQLCipher-encrypted file at `/devl/menagerie-data/<user-id>/peregrine/staging.db`. Schema is identical to local — the app sees no difference.

#### Local SQLite tables

| Table | Purpose |
|-------|---------|
| `jobs` | Core pipeline — all job data |
| `job_contacts` | Email thread log per job |
| `company_research` | LLM-generated research briefs |
| `background_tasks` | Async task queue state |
| `survey_responses` | Culture-fit survey Q&A pairs |

#### Postgres platform tables (cloud only)

| Table | Purpose |
|-------|---------|
| `subscriptions` | User tier, license JWT, product |
| `usage_events` | Anonymous usage telemetry (consent-gated) |
| `telemetry_consent` | Per-user telemetry preferences + hard kill switch |
| `support_access_grants` | Time-limited support session grants |

---

### Cloud Session Middleware

`app/cloud_session.py` handles multi-tenant routing transparently:

```
Request → Caddy injects X-CF-Session header (from Directus session cookie)
        → resolve_session() validates JWT, derives db_path + db_key
        → all DB calls use get_db_path() instead of DEFAULT_DB
```

Key functions:

| Function | Purpose |
|----------|---------|
| `resolve_session(app)` | Called at top of every page — no-op in local mode |
| `get_db_path()` | Returns per-user `db_path` (cloud) or `DEFAULT_DB` (local) |
| `derive_db_key(user_id)` | `HMAC(SERVER_SECRET, user_id)` — deterministic per-user SQLCipher key |

The app code never branches on `CLOUD_MODE` except at the entry points (`resolve_session` and `get_db_path`). Everything downstream is transparent.

### Telemetry (cloud only)

`app/telemetry.py` is the **only** path to the `usage_events` table. No feature may write there directly.

```python
from app.telemetry import log_usage_event

log_usage_event(user_id, "peregrine", "cover_letter_generated", {"words": 350})
```

- Complete no-op when `CLOUD_MODE=false`
- Checks `telemetry_consent.all_disabled` first — if set, nothing is written, no exceptions
- Swallows all exceptions so telemetry never crashes the app

---

## Layer Boundaries

### App layer (app/)

The Streamlit UI layer. Its only responsibilities are:

- Reading from `scripts/db.py` helpers
- Calling `scripts/` functions directly or via `task_runner.submit_task()`
- Rendering results to the browser

The app layer does not contain business logic. Database queries, LLM calls, and integrations all live in `scripts/`.

### Scripts layer (scripts/)

This is the stable public API of Peregrine. Scripts are designed to be framework-independent — they do not import Streamlit and can be called from a CLI, FastAPI endpoint, or background thread without modification.

All personal data access goes through `scripts/user_profile.py` (`UserProfile` class). Scripts never read `config/user.yaml` directly.

All database access goes through `scripts/db.py`. No script does raw SQLite outside of `db.py`.

### Config layer (config/)

Plain YAML files. Gitignored files contain secrets; `.example` files are committed as templates.

---

## Background Tasks

`scripts/task_runner.py` provides a simple background thread executor for long-running LLM tasks.

```python
from scripts.task_runner import submit_task

# Queue a cover letter generation task
submit_task(db_path, task_type="cover_letter", job_id=42)

# Queue a company research task
submit_task(db_path, task_type="company_research", job_id=42)
```

Tasks are recorded in the `background_tasks` table with the following state machine:

```mermaid
stateDiagram-v2
    [*] --> queued : submit_task()
    queued --> running : daemon picks up
    running --> completed
    running --> failed
    queued --> failed : server restart clears stuck tasks
    completed --> [*]
    failed --> [*]
```

**Dedup rule:** Only one `queued` or `running` task per `(task_type, job_id)` pair is allowed at a time. Submitting a duplicate is a silent no-op.

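A sketch of the guard that enforces this inside `submit_task()`; the column names are assumed from the table description above:

```python
def _has_active_task(conn, task_type: str, job_id: int) -> bool:
    """True if a queued or running task already exists for this (task_type, job_id)."""
    row = conn.execute(
        "SELECT 1 FROM background_tasks"
        " WHERE task_type = ? AND job_id = ?"
        " AND status IN ('queued', 'running') LIMIT 1",
        (task_type, job_id),
    ).fetchone()
    return row is not None
```
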
**On startup:** `app/app.py` resets any `running` or `queued` rows to `failed` to clear tasks that were interrupted by a server restart.

**Sidebar indicator:** `app/app.py` polls the `background_tasks` table every 3 seconds via a Streamlit fragment and displays a badge in the sidebar.

---

## LLM Router

`scripts/llm_router.py` provides a single `complete()` call that tries backends in priority order and falls back transparently. See [LLM Router](../reference/llm-router.md) for full documentation.

---

## Key Design Decisions

### scripts/ is framework-independent

The scripts layer was deliberately kept free of Streamlit imports. This means the full pipeline can be migrated to a FastAPI or Celery backend without rewriting business logic.

### All personal data via UserProfile

`scripts/user_profile.py` is the single source of truth for all user data. This makes it easy to swap the storage backend (e.g. from YAML to a database) without touching every script.

### SQLite as staging layer

`staging.db` acts as the staging layer between discovery and external integrations. This lets discovery, matching, and the UI all run independently without network dependencies. External integrations (Notion, Airtable, etc.) are push-only and optional.

### Tier system in app/wizard/tiers.py

`FEATURES` is a single dict that maps feature key → minimum tier. `can_use(tier, feature)` is the single gating function. New features are added to `FEATURES` in one place.

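A minimal sketch of that shape; the feature keys here are illustrative, not the real gate table:

```python
# app/wizard/tiers.py (illustrative shape only)
_TIER_ORDER = {"free": 0, "paid": 1, "premium": 2}

FEATURES = {
    "job_discovery": "free",
    "cover_letters": "paid",
    "fine_tuning": "premium",
}


def can_use(tier: str, feature: str) -> bool:
    """Single gating function: can this tier use this feature?"""
    required = FEATURES.get(feature, "premium")
    return _TIER_ORDER.get(tier, 0) >= _TIER_ORDER[required]
```
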
### Vision service is a separate process

Moondream2 requires `torch` and `transformers`, which are incompatible with the lightweight main conda environment. The vision service runs as a separate FastAPI process in a separate conda environment (`job-seeker-vision`), keeping the main env free of GPU dependencies.

### Cloud mode is a transparent layer, not a fork

`CLOUD_MODE=true` activates two entry points (`resolve_session`, `get_db_path`) and the telemetry middleware. Every other line of app code is unchanged. There is no cloud branch, no conditional imports, no schema divergence. The local-first architecture is preserved end-to-end; the cloud layer sits on top of it.

### SQLite-per-user instead of shared Postgres

Each cloud user gets their own encrypted SQLite file. This means:

- No SQL migrations when the schema changes — new users get the latest schema, existing users keep their file as-is
- Zero risk of cross-user data leakage at the DB layer
- GDPR deletion is `rm -rf /devl/menagerie-data/<user-id>/` — auditable and complete
- The app can be tested locally with `CLOUD_MODE=false` without any Postgres dependency

The Postgres platform DB holds only account metadata (subscriptions, consent, telemetry) — never job search content.
docs/developer-guide/cloud-deployment.md (new file, 198 lines)

# Cloud Deployment

This page covers operating the Peregrine cloud managed instance at `menagerie.circuitforge.tech/peregrine`.

---

## Architecture Overview

```
Browser → Caddy (bastion) → host:8505 → peregrine-cloud container
                                │
        ┌───────────────────────┼──────────────────────────┐
        │                       │                          │
cloud_session.py      /devl/menagerie-data/          Postgres :5433
(session routing)     <user-id>/peregrine/           (platform DB)
                      staging.db (SQLCipher)
```

Caddy injects the Directus session cookie as `X-CF-Session`. `cloud_session.py` validates the JWT, derives the per-user db path and SQLCipher key, and injects both into `st.session_state`. All downstream DB calls are transparent — the app never knows it's multi-tenant.

---

## Compose File

```bash
# Start
docker compose -f compose.cloud.yml --project-name peregrine-cloud --env-file .env up -d

# Stop
docker compose -f compose.cloud.yml --project-name peregrine-cloud down

# Logs
docker compose -f compose.cloud.yml --project-name peregrine-cloud logs app -f

# Rebuild after code changes
docker compose -f compose.cloud.yml --project-name peregrine-cloud build app
docker compose -f compose.cloud.yml --project-name peregrine-cloud up -d
```

---

## Required Environment Variables

These must be present in `.env` (gitignored) before starting the cloud stack:

| Variable | Description | Where to find |
|----------|-------------|---------------|
| `CLOUD_MODE` | Must be `true` | Hardcoded in compose.cloud.yml |
| `CLOUD_DATA_ROOT` | Host path for per-user data trees | `/devl/menagerie-data` |
| `DIRECTUS_JWT_SECRET` | Directus signing secret — validates session JWTs | `website/.env` → `DIRECTUS_SECRET` |
| `CF_SERVER_SECRET` | Server secret for SQLCipher key derivation | Generate: `openssl rand -base64 32 \| tr -d '/=+' \| cut -c1-32` |
| `PLATFORM_DB_URL` | Postgres connection string for platform DB | `postgresql://cf_platform:<pass>@host.docker.internal:5433/circuitforge_platform` |

!!! warning "SECRET ROTATION"
    `CF_SERVER_SECRET` is used to derive all per-user SQLCipher keys via `HMAC(secret, user_id)`. Rotating this secret renders all existing user databases unreadable. Do not rotate it without a migration plan.

---

## Data Root

User data lives at `/devl/menagerie-data/` on the host, bind-mounted into the container:

```
/devl/menagerie-data/
  <directus-user-uuid>/
    peregrine/
      staging.db   ← SQLCipher-encrypted (AES-256)
      config/      ← llm.yaml, server.yaml, user.yaml, etc.
      data/        ← documents, exports, attachments
```

The directory is created automatically on first login. The SQLCipher key for each user is derived deterministically: `HMAC-SHA256(CF_SERVER_SECRET, user_id)`.

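In Python terms the derivation is roughly the following; the exact output encoding is an implementation detail of `cloud_session.py`:

```python
import hashlib
import hmac
import os


def derive_db_key(user_id: str) -> str:
    """HMAC-SHA256(CF_SERVER_SECRET, user_id), hex-encoded."""
    secret = os.environ["CF_SERVER_SECRET"].encode()
    return hmac.new(secret, user_id.encode(), hashlib.sha256).hexdigest()
```
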
### GDPR / Data deletion

To fully delete a user's data:

```bash
# Remove all content data
rm -rf /devl/menagerie-data/<user-id>/

# Remove platform DB rows (cascades)
docker exec cf-platform-db psql -U cf_platform -d circuitforge_platform \
  -c "DELETE FROM subscriptions WHERE user_id = '<user-id>';"
```

---

## Platform Database

The Postgres platform DB runs as `cf-platform-db` in the website compose stack (port 5433 on host).

```bash
# Connect
docker exec cf-platform-db psql -U cf_platform -d circuitforge_platform

# Check tables
\dt

# View telemetry consent for a user
SELECT * FROM telemetry_consent WHERE user_id = '<uuid>';

# View recent usage events
SELECT user_id, event_type, occurred_at FROM usage_events
ORDER BY occurred_at DESC LIMIT 20;
```

The schema is initialised on container start from `platform-db/init.sql` in the website repo.

---

## Telemetry

`app/telemetry.py` is the **only** entry point to `usage_events`. Never write to that table directly.

```python
from app.telemetry import log_usage_event

# Fires in cloud mode only; no-op locally
log_usage_event(user_id, "peregrine", "cover_letter_generated", {"words": 350})
```

Events are blocked if:

1. `telemetry_consent.all_disabled = true` (hard kill switch, overrides all)
2. `telemetry_consent.usage_events_enabled = false`

The user controls both from Settings → 🔒 Privacy.

---

## Backup / Restore (Cloud Mode)

The Settings → 💾 Data tab handles backup/restore transparently. In cloud mode:

- **Export:** the SQLCipher-encrypted DB is decrypted before zipping — the downloaded `.zip` is a portable plain SQLite archive, compatible with any local Docker install.
- **Import:** a plain SQLite backup is re-encrypted with the user's key on restore.

The user's `base_dir` in cloud mode is `get_db_path().parent` (`/devl/menagerie-data/<user-id>/peregrine/`), not the app root.

---

## Routing (Caddy)

`menagerie.circuitforge.tech` in `/devl/caddy-proxy/Caddyfile`:

```caddy
menagerie.circuitforge.tech {
    encode gzip zstd
    handle /peregrine* {
        reverse_proxy http://host.docker.internal:8505 {
            header_up X-CF-Session {header.Cookie}
        }
    }
    handle {
        respond "This app is not yet available in the managed cloud — check back soon." 503
    }
    log {
        output file /data/logs/menagerie.circuitforge.tech.log
        format json
    }
}
```

`header_up X-CF-Session {header.Cookie}` passes the full cookie header so `cloud_session.py` can extract the Directus session token.

!!! note "Caddy inode gotcha"
    After editing the Caddyfile, run `docker restart caddy-proxy` — not `caddy reload`. The Edit tool creates a new inode; Docker bind mounts pin to the original inode and `caddy reload` re-reads the stale one.

---

## Demo Instance

The public demo at `demo.circuitforge.tech/peregrine` runs separately:

```bash
# Start demo
docker compose -f compose.demo.yml --project-name peregrine-demo up -d

# Rebuild after code changes
docker compose -f compose.demo.yml --project-name peregrine-demo build app
docker compose -f compose.demo.yml --project-name peregrine-demo up -d
```

`DEMO_MODE=true` blocks all LLM inference calls at `llm_router.py`. Discovery, job enrichment, and the UI work normally. Demo data lives in `demo/config/` and `demo/data/` — isolated from personal data.

---

## Adding a New App to the Cloud

To onboard a new menagerie app (e.g. `falcon`) to the cloud:

1. Add `resolve_session("falcon")` at the top of each page (calls `cloud_session.py` with the app slug)
2. Replace `DEFAULT_DB` references with `get_db_path()` (see the sketch after this list)
3. Add `app/telemetry.py` import and `log_usage_event()` calls at key action points
4. Create `compose.cloud.yml` following the Peregrine pattern (port, `CLOUD_MODE=true`, data mount)
5. Add a Caddy `handle /falcon*` block in `menagerie.circuitforge.tech`, routing to the new port
6. `cloud_session.py` automatically creates `<data_root>/<user-id>/falcon/` on first login
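
Steps 1 and 2 amount to a few lines at the top of each page. A sketch for a hypothetical `falcon` page, using the functions described in the architecture docs:

```python
# Hypothetical top of a falcon Streamlit page (sketch only)
from app.cloud_session import resolve_session, get_db_path

resolve_session("falcon")   # validates the cloud session; no-op when CLOUD_MODE=false
db_path = get_db_path()     # per-user encrypted DB in cloud mode, DEFAULT_DB locally

# ...page logic reads and writes through scripts/db.py helpers using db_path...
```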
docs/developer-guide/contributing.md (new file, 120 lines)

# Contributing

Thank you for your interest in contributing to Peregrine. This guide covers the development environment, code standards, test requirements, and pull request process.

!!! note "License"
    Peregrine uses a dual licence. The discovery pipeline (`scripts/discover.py`, `scripts/match.py`, `scripts/db.py`, `scripts/custom_boards/`) is MIT. All AI features, the UI, and everything else are BSL 1.1.

    Do not add `Co-Authored-By:` trailers or AI-attribution notices to commits — this is a commercial repository.

---

## Fork and Clone

```bash
git clone https://git.circuitforge.io/circuitforge/peregrine
cd peregrine
```

Create a feature branch from `main`:

```bash
git checkout -b feat/my-feature
```

---

## Dev Environment Setup

Peregrine's Python dependencies are managed with conda. The same `job-seeker` environment is used for both the legacy personal app and Peregrine.

```bash
# Create the environment from the lockfile
conda env create -f environment.yml

# Activate
conda activate job-seeker
```

Alternatively, install from `requirements.txt` into an existing Python 3.12 environment:

```bash
pip install -r requirements.txt
```

!!! warning "Keep the env lightweight"
    Do not add `torch`, `sentence-transformers`, `bitsandbytes`, `transformers`, or any other CUDA/GPU package to the main environment. These live in separate conda environments (`job-seeker-vision` for the vision service, `ogma` for fine-tuning). Adding them to the main env causes out-of-memory failures during test runs.

---

## Running Tests

```bash
conda run -n job-seeker python -m pytest tests/ -v
```

Or with the direct binary (avoids runaway process spawning):

```bash
/path/to/miniconda3/envs/job-seeker/bin/pytest tests/ -v
```

The `pytest.ini` file scopes collection to the `tests/` directory only — do not widen this.

All tests must pass before submitting a PR. See [Testing](testing.md) for patterns and conventions.

---

## Code Style

- **PEP 8** for all Python code — use `flake8` or `ruff` to check
- **Type hints preferred** on function signatures — not required but strongly encouraged
- **Docstrings** on all public functions and classes
- **No print statements** in library code (`scripts/`); use Python's `logging` module or return a status value (see the sketch after this list). `print` is acceptable in one-off scripts and `discover.py`-style entry points.

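A minimal sketch of the logging pattern for a `scripts/` module:

```python
import logging

logger = logging.getLogger(__name__)


def classify_messages(messages: list[dict]) -> int:
    """Example scripts/ helper that logs instead of printing."""
    logger.info("classifying %d messages", len(messages))
    kept = [m for m in messages if m.get("subject")]
    logger.debug("kept %d messages with a subject line", len(kept))
    return len(kept)
```
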
---

## Branch Naming

| Prefix | Use for |
|--------|---------|
| `feat/` | New features |
| `fix/` | Bug fixes |
| `docs/` | Documentation only |
| `refactor/` | Code reorganisation without behaviour change |
| `test/` | Test additions or corrections |
| `chore/` | Dependency updates, CI, tooling |

Example: `feat/add-greenhouse-scraper`, `fix/email-imap-timeout`, `docs/add-integration-guide`

---

## PR Checklist

Before opening a pull request:

- [ ] All tests pass: `conda run -n job-seeker python -m pytest tests/ -v`
- [ ] New behaviour is covered by at least one test
- [ ] No new dependencies added to `environment.yml` or `requirements.txt` without a clear justification in the PR description
- [ ] Documentation updated if the PR changes user-visible behaviour (update the relevant page in `docs/`)
- [ ] Config file changes are reflected in the `.example` file
- [ ] No secrets, tokens, or personal data in any committed file
- [ ] Gitignored files (`config/*.yaml`, `staging.db`, `aihawk/`, `.env`) are not committed

---

## What NOT to Do

- Do not commit `config/user.yaml`, `config/notion.yaml`, `config/email.yaml`, `config/adzuna.yaml`, or any `config/integrations/*.yaml` — all are gitignored
- Do not commit `staging.db`
- Do not add `torch`, `bitsandbytes`, `transformers`, or `sentence-transformers` to the main environment
- Do not add `Co-Authored-By:` or AI-attribution lines to commit messages
- Do not force-push to `main`

---

## Getting Help

Open an issue on the repository with the `question` label. Include:

- Your OS and Docker version
- The `inference_profile` from your `config/user.yaml`
- Relevant log output from `make logs`
docs/developer-guide/testing.md (new file, 181 lines)

# Testing

Peregrine has a test suite covering the core scripts layer, LLM router, integrations, wizard steps, and database helpers.

---

## Running the Test Suite

```bash
conda run -n job-seeker python -m pytest tests/ -v
```

Or using the direct binary (recommended to avoid runaway process spawning):

```bash
/path/to/miniconda3/envs/job-seeker/bin/pytest tests/ -v
```

`pytest.ini` scopes test collection to `tests/` only:

```ini
[pytest]
testpaths = tests
```

Do not widen this — the `aihawk/` subtree has its own test files that pull in GPU dependencies.

---

## What Is Covered

The suite currently has approximately 219 tests covering:

| Module | What is tested |
|--------|---------------|
| `scripts/db.py` | CRUD helpers, status transitions, dedup logic |
| `scripts/llm_router.py` | Fallback chain, backend selection, vision routing, error handling |
| `scripts/match.py` | Keyword scoring, gap calculation |
| `scripts/imap_sync.py` | Email parsing, classification label mapping |
| `scripts/company_research.py` | Prompt construction, output parsing |
| `scripts/generate_cover_letter.py` | Mission alignment detection, prompt injection |
| `scripts/task_runner.py` | Task submission, dedup, status transitions |
| `scripts/user_profile.py` | Accessor methods, defaults, YAML round-trip |
| `scripts/integrations/` | Base class contract, per-driver `fields()` and `connect()` |
| `app/wizard/tiers.py` | `can_use()`, `tier_label()`, edge cases |
| `scripts/custom_boards/` | Scraper return shape, HTTP error handling |

---

## Test Structure

Tests live in `tests/`. File naming mirrors the module being tested:

```
tests/
  test_db.py
  test_llm_router.py
  test_match.py
  test_imap_sync.py
  test_company_research.py
  test_cover_letter.py
  test_task_runner.py
  test_user_profile.py
  test_integrations.py
  test_tiers.py
  test_adzuna.py
  test_theladders.py
```

---

## Key Patterns

### tmp_path for YAML files

Use pytest's built-in `tmp_path` fixture for any test that reads or writes YAML config files:

```python
def test_user_profile_reads_name(tmp_path):
    config = tmp_path / "user.yaml"
    config.write_text("name: Alice\nemail: alice@example.com\n")

    from scripts.user_profile import UserProfile
    profile = UserProfile(config_path=config)
    assert profile.name == "Alice"
```

### Mocking LLM calls

Never make real LLM calls in tests. Patch `LLMRouter.complete`:

```python
from unittest.mock import patch

def test_cover_letter_calls_llm(tmp_path):
    with patch("scripts.generate_cover_letter.LLMRouter") as MockRouter:
        MockRouter.return_value.complete.return_value = "Dear Hiring Manager,\n..."
        from scripts.generate_cover_letter import generate
        result = generate(job={...}, user_profile={...})

    assert "Dear Hiring Manager" in result
    MockRouter.return_value.complete.assert_called_once()
```

### Mocking HTTP in scraper tests

```python
from unittest.mock import patch

def test_adzuna_returns_jobs():
    with patch("scripts.custom_boards.adzuna.requests.get") as mock_get:
        mock_get.return_value.ok = True
        mock_get.return_value.raise_for_status = lambda: None
        mock_get.return_value.json.return_value = {"results": [...]}

        from scripts.custom_boards.adzuna import scrape
        jobs = scrape(profile={...}, db_path="nonexistent.db")

    assert len(jobs) > 0
```

### Temporary SQLite files for DB tests

```python
import os
import tempfile

def test_insert_job():
    with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as f:
        db_path = f.name
    try:
        from scripts.db import init_db, insert_job
        init_db(db_path)
        insert_job(db_path, title="CSM", company="Acme", url="https://example.com/1", ...)
        # assert...
    finally:
        os.unlink(db_path)
```

---

## What NOT to Test

- **Streamlit widget rendering** — Streamlit has no headless test support. Do not try to test `st.button()` or `st.text_input()` calls. Test the underlying script functions instead.
- **Real network calls** — always mock HTTP and LLM clients
- **Real GPU inference** — mock the vision service and LLM router

---

## Adding Tests for New Code

### New scraper

Create `tests/test_myboard.py`. Required test cases:

1. Happy path: mock HTTP returns valid data → correct job dict shape
2. HTTP error: mock raises `Exception` → function returns `[]` (does not raise)
3. Empty results: API returns `{"results": []}` → function returns `[]` (see the sketch after this list)

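For example, the empty-results case (3) could reuse the HTTP-mocking pattern above:

```python
from unittest.mock import patch


def test_scrape_returns_empty_list_when_api_has_no_results():
    profile = {
        "titles": ["Customer Success Manager"],
        "locations": ["Remote"],
        "results_per_board": 10,
        "hours_old": 240,
    }

    with patch("scripts.custom_boards.myboard.requests.get") as mock_get:
        mock_get.return_value.raise_for_status = lambda: None
        mock_get.return_value.json.return_value = {"results": []}

        from scripts.custom_boards.myboard import scrape
        jobs = scrape(profile, db_path="nonexistent.db")

    assert jobs == []
```
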
### New integration

Add to `tests/test_integrations.py`. Required test cases:

1. `fields()` returns list of dicts with required keys
2. `connect()` returns `True` with valid config, `False` with missing required field
3. `test()` returns `True` with mocked successful HTTP, `False` with exception
4. `is_configured()` reflects file presence in `tmp_path`

### New wizard step

Add to `tests/test_wizard_steps.py`. Test the step's pure-logic functions (validation, data extraction). Do not test the Streamlit rendering.

### New tier feature gate

Add to `tests/test_tiers.py`:

```python
from app.wizard.tiers import can_use

def test_my_new_feature_requires_paid():
    assert can_use("free", "my_new_feature") is False
    assert can_use("paid", "my_new_feature") is True
    assert can_use("premium", "my_new_feature") is True
```
docs/getting-started/docker-profiles.md (new file, 118 lines)

# Docker Profiles

Peregrine uses Docker Compose profiles to start only the services your hardware can support. Choose a profile with `make start PROFILE=<name>`.

---

## Profile Reference

| Profile | Services started | Use case |
|---------|----------------|----------|
| `remote` | `app`, `searxng` | No GPU. LLM calls go to an external API (Anthropic, OpenAI-compatible). |
| `cpu` | `app`, `ollama`, `searxng` | No GPU. Runs local models on CPU — functional but slow. |
| `single-gpu` | `app`, `ollama`, `vision`, `searxng` | One NVIDIA GPU. Covers cover letters, research, and vision (survey screenshots). |
| `dual-gpu` | `app`, `ollama`, `vllm`, `vision`, `searxng` | Two NVIDIA GPUs. GPU 0 = Ollama (cover letters), GPU 1 = vLLM (research). |

---

## Service Descriptions

| Service | Image / Source | Port | Purpose |
|---------|---------------|------|---------|
| `app` | `Dockerfile` (Streamlit) | 8501 | The main Peregrine UI |
| `ollama` | `ollama/ollama` | 11434 | Local model inference — cover letters and general tasks |
| `vllm` | `vllm/vllm-openai` | 8000 | High-throughput local inference — research tasks |
| `vision` | `scripts/vision_service/` | 8002 | Moondream2 — survey screenshot analysis |
| `searxng` | `searxng/searxng` | 8888 | Private meta-search engine — company research web scraping |

---

## Choosing a Profile

### remote

Use `remote` if:

- You have no NVIDIA GPU
- You plan to use Anthropic Claude or another API-hosted model exclusively
- You want the fastest startup (only two containers)

You must configure at least one external LLM backend in **Settings → LLM Backends**.

### cpu

Use `cpu` if you have no GPU but want to run models locally (e.g. for privacy). It is acceptable for light use; cover letter generation may take several minutes per request.

Pull a model after the container starts:

```bash
docker exec -it peregrine-ollama-1 ollama pull llama3.1:8b
```

### single-gpu

Use `single-gpu` if you have one NVIDIA GPU with at least 8 GB VRAM. This is the recommended profile for most single-user installs. The vision service (Moondream2) starts on the same GPU using 4-bit quantisation (~1.5 GB VRAM).

### dual-gpu

Use `dual-gpu` if you have two or more NVIDIA GPUs. GPU 0 handles Ollama (cover letters, quick tasks), GPU 1 handles vLLM (research, long-context tasks), and the vision service shares GPU 0 with Ollama.

---

## GPU Memory Guidance

| GPU VRAM | Recommended profile | Notes |
|----------|-------------------|-------|
| < 4 GB | `cpu` | GPU too small for practical model loading |
| 4–8 GB | `single-gpu` | Run smaller models (3B–8B parameters) |
| 8–16 GB | `single-gpu` | Run 8B–13B models comfortably |
| 16–24 GB | `single-gpu` | Run 13B–34B models |
| 24 GB+ | `single-gpu` or `dual-gpu` | 70B models with quantisation |

---

## How preflight.py Works

`make start` calls `scripts/preflight.py` before launching Docker. Preflight does the following:

1. **Port conflict detection** — checks whether `STREAMLIT_PORT`, `OLLAMA_PORT`, `VLLM_PORT`, `SEARXNG_PORT`, and `VISION_PORT` are already in use. Reports any conflicts and suggests alternatives.

2. **GPU enumeration** — queries `nvidia-smi` for GPU count and VRAM per card.

3. **RAM check** — reads `/proc/meminfo` (Linux) or `vm_stat` (macOS) to determine available system RAM.

4. **KV cache offload** — if GPU VRAM is less than 10 GB, preflight calculates `CPU_OFFLOAD_GB` (the amount of KV cache to spill to system RAM) and writes it to `.env`. The vLLM container picks this up via `--cpu-offload-gb`.

5. **Profile recommendation** — writes `RECOMMENDED_PROFILE` to `.env`. This is informational; `make start` uses the `PROFILE` variable you specify (defaulting to `remote`).

You can run preflight independently:

```bash
make preflight
# or
python scripts/preflight.py
```

---

## Customising Ports

Edit `.env` before running `make start`:

```bash
STREAMLIT_PORT=8501
OLLAMA_PORT=11434
VLLM_PORT=8000
SEARXNG_PORT=8888
VISION_PORT=8002
```

All containers read from `.env` via the `env_file` directive in `compose.yml`.
docs/getting-started/first-run-wizard.md (new file, 165 lines)

# First-Run Wizard

When you open Peregrine for the first time, the setup wizard launches automatically. It walks through seven steps and saves your progress after each one — if your browser closes or the server restarts, it resumes where you left off.

---

## Step 1 — Hardware

Peregrine detects NVIDIA GPUs using `nvidia-smi` and reports:

- Number of GPUs found
- VRAM per GPU
- Available system RAM

Based on this, it recommends a Docker Compose profile:

| Recommendation | Condition |
|---------------|-----------|
| `remote` | No GPU detected |
| `cpu` | GPU detected but VRAM < 4 GB |
| `single-gpu` | One GPU with VRAM >= 4 GB |
| `dual-gpu` | Two or more GPUs |

You can override the recommendation and select any profile manually. The selection is written to `config/user.yaml` as `inference_profile`.

---

## Step 2 — Tier

Select your Peregrine tier:

| Tier | Description |
|------|-------------|
| **Free** | Job discovery, matching, and basic pipeline — no LLM features |
| **Paid** | Adds cover letters, company research, email sync, integrations, and all AI features |
| **Premium** | Adds fine-tuning and multi-user support |

Your tier is written to `config/user.yaml` as `tier`.

**Dev tier override** — for local testing without a paid licence, set `dev_tier_override: premium` in `config/user.yaml`. This is for development use only and has no effect on production deployments.

See [Tier System](../reference/tier-system.md) for the full feature gate table.

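Taken together, Steps 1 and 2 leave entries like these in `config/user.yaml` (values are examples):

```yaml
# config/user.yaml (excerpt)
inference_profile: single-gpu
tier: paid
# Development only: unlock gated features without a licence
dev_tier_override: premium
```
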
---

## Step 3 — Identity

Enter your personal details. These are stored locally in `config/user.yaml` and used to personalise cover letters and research briefs.

| Field | Description |
|-------|-------------|
| Name | Your full name |
| Email | Primary contact email |
| Phone | Contact phone number |
| LinkedIn | LinkedIn profile URL |
| Career summary | 2–4 sentence professional summary — used in cover letters and interview prep |

**LLM-assisted writing (Paid):** If you have a paid tier, the wizard offers to generate your career summary from a few bullet points using your configured LLM backend.

---

## Step 4 — Resume

Two paths are available:

### Upload PDF or DOCX

Upload your existing resume. The LLM parses it and extracts:

- Work experience (employer, title, dates, bullets)
- Education
- Skills
- Certifications

The extracted data is stored in `config/user.yaml` and used when generating cover letters.

### Guided form builder

Fill in each section manually using structured form fields. Useful if you do not have a digital resume file ready, or if the parser misses something important.

Both paths produce the same data structure. You can mix them — upload first, then edit the result in the form.

---

## Step 5 — Inference

Configure which LLM backends Peregrine uses. Backends are tried in priority order; if the first fails, Peregrine falls back to the next.

Available backend types:

| Type | Examples | Notes |
|------|---------|-------|
| `openai_compat` | Ollama, vLLM, Claude Code wrapper, Copilot wrapper | Any OpenAI-compatible API |
| `anthropic` | Claude via Anthropic API | Requires `ANTHROPIC_API_KEY` env var |
| `vision_service` | Moondream2 local service | Used for survey screenshot analysis only |

For each backend you want to enable:

1. Enter the base URL (e.g. `http://localhost:11434/v1` for Ollama)
2. Enter an API key if required (Anthropic, OpenAI)
3. Click **Test** — Peregrine pings the `/health` endpoint and attempts a short completion

The full backend configuration is written to `config/llm.yaml`. You can edit it directly later via **Settings → LLM Backends**.

!!! tip "Recommended minimum"
    Enable at least Ollama with a general-purpose model (e.g. `llama3.1:8b`) for research tasks, and either Ollama or Anthropic for cover letter generation. The wizard will not block you if no backend is configured, but most features will not work.

---

## Step 6 — Search

Define what jobs to look for. Search configuration is written to `config/search_profiles.yaml`.

| Field | Description |
|-------|-------------|
| Profile name | A label for this search profile (e.g. `cs_leadership`) |
| Job titles | List of titles to search for (e.g. `Customer Success Manager`, `TAM`) |
| Locations | City/region strings or `Remote` |
| Boards | Standard boards: `linkedin`, `indeed`, `glassdoor`, `zip_recruiter`, `google` |
| Custom boards | Additional scrapers: `adzuna`, `theladders`, `craigslist` |
| Exclude keywords | Jobs containing these words in the title are dropped |
| Results per board | Max jobs to fetch per board per run |
| Hours old | Only fetch jobs posted within this many hours |

You can create multiple profiles (e.g. one for remote roles, one for a target industry). Run them all from the Home page or run a specific one.

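A resulting profile might look something like this; the list-style keys match the example in the scraper guide, while the `exclude_keywords` key name is an assumption, so check the committed `.example` file for the exact spelling:

```yaml
# config/search_profiles.yaml (illustrative excerpt)
profiles:
  - name: cs_leadership
    titles:
      - Customer Success Manager
      - Technical Account Manager
    locations:
      - Remote
    boards:
      - linkedin
      - indeed
    custom_boards:
      - adzuna
    exclude_keywords:      # assumed key name for "Exclude keywords"
      - intern
    results_per_board: 50
    hours_old: 72
```
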
|
||||||
|
---
|
||||||
|
|
||||||
|
## Step 7 — Integrations
|
||||||
|
|
||||||
|
Connect optional external services. All integrations are optional — skip this step if you want to use Peregrine without external accounts.
|
||||||
|
|
||||||
|
Available integrations:
|
||||||
|
|
||||||
|
**Job tracking (Paid):** Notion, Airtable, Google Sheets
|
||||||
|
|
||||||
|
**Document storage (Free):** Google Drive, Dropbox, OneDrive, MEGA, Nextcloud
|
||||||
|
|
||||||
|
**Calendar (Paid):** Google Calendar, Apple Calendar (CalDAV)
|
||||||
|
|
||||||
|
**Notifications (Paid for Slack; Free for Discord and Home Assistant):** Slack, Discord, Home Assistant
|
||||||
|
|
||||||
|
Each integration has a connection card with the required credentials. Click **Test** to verify the connection before saving. Credentials are written to `config/integrations/<name>.yaml` (gitignored).
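
For example, a Notion connection card might end up as a small credentials file like the sketch below. The field names are assumptions, not the exact format Peregrine writes.

```yaml
# config/integrations/notion.yaml (hypothetical layout, gitignored)
token: secret_xxxxxxxxxxxx
database_id: 1234567890abcdef
```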
|
||||||
|
|
||||||
|
See [Integrations](../user-guide/integrations.md) for per-service details.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Crash Recovery
|
||||||
|
|
||||||
|
The wizard saves your progress to `config/user.yaml` after each step is completed (`wizard_step` field). If anything goes wrong:
|
||||||
|
|
||||||
|
- Restart Peregrine and navigate to http://localhost:8501
|
||||||
|
- The wizard resumes at the last completed step
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Re-entering the Wizard
|
||||||
|
|
||||||
|
To go through the wizard again (e.g. to change your search profile or swap LLM backends):
|
||||||
|
|
||||||
|
1. Open **Settings**
|
||||||
|
2. Go to the **Developer** tab
|
||||||
|
3. Click **Reset wizard**
|
||||||
|
|
||||||
|
This sets `wizard_complete: false` and `wizard_step: 0` in `config/user.yaml`. Your previously entered data is preserved as defaults.
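
After the reset, the relevant keys in `config/user.yaml` read:

```yaml
wizard_complete: false
wizard_step: 0
```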
|
||||||
134
docs/getting-started/installation.md
Normal file
|
|
@ -0,0 +1,134 @@
|
||||||
|
# Installation
|
||||||
|
|
||||||
|
This page walks through a full Peregrine installation from scratch.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Prerequisites
|
||||||
|
|
||||||
|
- **Git** — to clone the repository
|
||||||
|
- **Internet connection** — `setup.sh` downloads Docker and other dependencies
|
||||||
|
- **Operating system**: Ubuntu/Debian, Fedora/RHEL, Arch Linux, or macOS (with Docker Desktop)
|
||||||
|
|
||||||
|
!!! warning "Windows"
|
||||||
|
Windows is not supported. Use [WSL2 with Ubuntu](https://docs.microsoft.com/windows/wsl/install) instead.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Step 1 — Clone the repository
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git clone https://git.circuitforge.io/circuitforge/peregrine
|
||||||
|
cd peregrine
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Step 2 — Run setup.sh
|
||||||
|
|
||||||
|
```bash
|
||||||
|
bash setup.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
`setup.sh` performs the following automatically:
|
||||||
|
|
||||||
|
1. **Detects your platform** (Ubuntu/Debian, Fedora/RHEL, Arch, macOS)
|
||||||
|
2. **Installs Git** if not already present
|
||||||
|
3. **Installs Docker Engine** and the Docker Compose v2 plugin via the official Docker repositories
|
||||||
|
4. **Adds your user to the `docker` group** so you do not need `sudo` for docker commands (Linux only — log out and back in after this)
|
||||||
|
5. **Detects NVIDIA GPUs** — if `nvidia-smi` is present and working, installs the NVIDIA Container Toolkit and configures Docker to use it
|
||||||
|
6. **Creates `.env` from `.env.example`** — edit `.env` to customise ports and model storage paths before starting
|
||||||
|
|
||||||
|
!!! note "macOS"
|
||||||
|
`setup.sh` installs Docker Desktop via Homebrew (`brew install --cask docker`) and then exits. Launch Docker Desktop, wait for it to finish starting, then re-run the script.
|
||||||
|
|
||||||
|
!!! note "GPU requirement"
|
||||||
|
For GPU support, `nvidia-smi` must return output before you run `setup.sh`. Install your NVIDIA driver first. The Container Toolkit installation will fail silently if the driver is not present.
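
A quick way to confirm the driver and, after setup, the toolkit are working. These are standard NVIDIA/Docker checks, not Peregrine-specific commands, and the CUDA image tag is only an example.

```bash
# Driver check: must print a table of GPUs before running setup.sh
nvidia-smi

# After setup.sh: confirm Docker can see the GPU via the NVIDIA Container Toolkit
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```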
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Step 3 — (Optional) Edit .env
|
||||||
|
|
||||||
|
The `.env` file controls ports and volume mount paths. The defaults work for most single-user installs:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Default ports
|
||||||
|
STREAMLIT_PORT=8501
|
||||||
|
OLLAMA_PORT=11434
|
||||||
|
VLLM_PORT=8000
|
||||||
|
SEARXNG_PORT=8888
|
||||||
|
VISION_PORT=8002
|
||||||
|
```
|
||||||
|
|
||||||
|
Change `STREAMLIT_PORT` if 8501 is taken on your machine.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Step 4 — Start Peregrine
|
||||||
|
|
||||||
|
Choose a profile based on your hardware:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
make start # remote — no GPU, use API-only LLMs
|
||||||
|
make start PROFILE=cpu # cpu — local models on CPU (slow)
|
||||||
|
make start PROFILE=single-gpu # single-gpu — one NVIDIA GPU
|
||||||
|
make start PROFILE=dual-gpu # dual-gpu — GPU 0 = Ollama, GPU 1 = vLLM
|
||||||
|
```
|
||||||
|
|
||||||
|
`make start` runs `preflight.py` first, which checks for port conflicts and writes GPU/RAM recommendations back to `.env`. Then it calls `docker compose --profile <PROFILE> up -d`.
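
Under the hood this is roughly equivalent to running the two steps by hand. The exact preflight invocation is an assumption; only the compose command is quoted from the Makefile behaviour described above.

```bash
python3 preflight.py                       # port checks, GPU/RAM recommendations written to .env (invocation assumed)
docker compose --profile single-gpu up -d  # substitute your chosen profile
```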
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Step 5 — Open the UI
|
||||||
|
|
||||||
|
Navigate to **http://localhost:8501** (or whatever `STREAMLIT_PORT` you set).
|
||||||
|
|
||||||
|
The first-run wizard launches automatically. See [First-Run Wizard](first-run-wizard.md) for a step-by-step guide through all seven steps.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Supported Platforms
|
||||||
|
|
||||||
|
| Platform | Tested | Notes |
|
||||||
|
|----------|--------|-------|
|
||||||
|
| Ubuntu 22.04 / 24.04 | Yes | Primary target |
|
||||||
|
| Debian 12 | Yes | |
|
||||||
|
| Fedora 39/40 | Yes | |
|
||||||
|
| RHEL / Rocky / AlmaLinux | Yes | |
|
||||||
|
| Arch Linux / Manjaro | Yes | |
|
||||||
|
| macOS (Apple Silicon) | Yes | Docker Desktop required; no GPU support |
|
||||||
|
| macOS (Intel) | Yes | Docker Desktop required; no GPU support |
|
||||||
|
| Windows | No | Use WSL2 with Ubuntu |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## GPU Support
|
||||||
|
|
||||||
|
Only NVIDIA GPUs are supported. AMD ROCm is not currently supported.
|
||||||
|
|
||||||
|
Requirements:
|
||||||
|
- NVIDIA driver installed and `nvidia-smi` working before running `setup.sh`
|
||||||
|
- CUDA 12.x recommended (CUDA 11.x may work but is untested)
|
||||||
|
- Minimum 8 GB VRAM for `single-gpu` profile with default models
|
||||||
|
- For `dual-gpu`: GPU 0 is assigned to Ollama, GPU 1 to vLLM
|
||||||
|
|
||||||
|
If your GPU has less than 10 GB VRAM, `preflight.py` will calculate a `CPU_OFFLOAD_GB` value and write it to `.env`. The vLLM container picks this up via `--cpu-offload-gb` to overflow KV cache to system RAM.
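
For example, on an 8 GB card `preflight.py` might append something like the following to `.env`. The value shown is illustrative, not a recommendation.

```bash
# Written by preflight.py: amount of KV cache vLLM may spill to system RAM
CPU_OFFLOAD_GB=4
```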
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Stopping Peregrine
|
||||||
|
|
||||||
|
```bash
|
||||||
|
make stop # stop all containers
|
||||||
|
make restart # stop then start again (runs preflight first)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Reinstalling / Clean State
|
||||||
|
|
||||||
|
```bash
|
||||||
|
make clean # removes containers, images, and data volumes (destructive)
|
||||||
|
```
|
||||||
|
|
||||||
|
You will be prompted to type `yes` to confirm.
|
||||||
65
docs/index.md
Normal file
|
|
@ -0,0 +1,65 @@
|
||||||
|
# Peregrine
|
||||||
|
|
||||||
|
**AI-powered job search pipeline — by [Circuit Forge LLC](https://circuitforge.io)**
|
||||||
|
|
||||||
|
Peregrine automates the full job search lifecycle: discovery, matching, cover letter generation, application tracking, and interview preparation. It is privacy-first and local-first — your data never leaves your machine unless you configure an external integration.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# 1. Clone and install dependencies
|
||||||
|
git clone https://git.circuitforge.io/circuitforge/peregrine
|
||||||
|
cd peregrine
|
||||||
|
bash setup.sh
|
||||||
|
|
||||||
|
# 2. Start Peregrine
|
||||||
|
make start # no GPU, API-only
|
||||||
|
make start PROFILE=single-gpu # one NVIDIA GPU
|
||||||
|
make start PROFILE=dual-gpu # dual GPU (Ollama + vLLM)
|
||||||
|
|
||||||
|
# 3. Open the UI
|
||||||
|
# http://localhost:8501
|
||||||
|
```
|
||||||
|
|
||||||
|
The first-run wizard guides you through hardware detection, tier selection, identity, resume, LLM configuration, search profiles, and integrations. See [Installation](getting-started/installation.md) for the full walkthrough.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Feature Overview
|
||||||
|
|
||||||
|
| Feature | Free | Paid | Premium |
|
||||||
|
|---------|------|------|---------|
|
||||||
|
| Job discovery (JobSpy + custom boards) | Yes | Yes | Yes |
|
||||||
|
| Resume keyword matching | Yes | Yes | Yes |
|
||||||
|
| Cover letter generation | - | Yes | Yes |
|
||||||
|
| Company research briefs | - | Yes | Yes |
|
||||||
|
| Interview prep & practice Q&A | - | Yes | Yes |
|
||||||
|
| Email sync & auto-classification | - | Yes | Yes |
|
||||||
|
| Survey assistant (culture-fit Q&A) | - | Yes | Yes |
|
||||||
|
| Integration connectors (Notion, Airtable, etc.) | Partial | Yes | Yes |
|
||||||
|
| Calendar sync (Google, Apple) | - | Yes | Yes |
|
||||||
|
| Cover letter model fine-tuning | - | - | Yes |
|
||||||
|
| Multi-user support | - | - | Yes |
|
||||||
|
|
||||||
|
See [Tier System](reference/tier-system.md) for the full feature gate table.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Documentation Sections
|
||||||
|
|
||||||
|
- **[Getting Started](getting-started/installation.md)** — Install, configure, and launch Peregrine
|
||||||
|
- **[User Guide](user-guide/job-discovery.md)** — How to use every feature in the UI
|
||||||
|
- **[Developer Guide](developer-guide/contributing.md)** — Add scrapers, integrations, and contribute code
|
||||||
|
- **[Reference](reference/tier-system.md)** — Tier system, LLM router, and config file schemas
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## License
|
||||||
|
|
||||||
|
Core discovery pipeline: [MIT](https://git.circuitforge.io/circuitforge/peregrine/src/branch/main/LICENSE-MIT)
|
||||||
|
|
||||||
|
AI features (cover letter generation, company research, interview prep, UI): [BSL 1.1](https://git.circuitforge.io/circuitforge/peregrine/src/branch/main/LICENSE-BSL)
|
||||||
|
|
||||||
|
© 2026 Circuit Forge LLC
|
||||||
|
|
@ -1,201 +0,0 @@
|
||||||
# Job Seeker Platform — Design Document
|
|
||||||
**Date:** 2026-02-20
|
|
||||||
**Status:** Approved
|
|
||||||
**Candidate:** Alex Rivera
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Overview
|
|
||||||
|
|
||||||
A monorepo project at `/devl/job-seeker/` that integrates three FOSS tools into a
|
|
||||||
cohesive job search pipeline: automated discovery (JobSpy), resume-to-listing keyword
|
|
||||||
matching (Resume Matcher), and automated application submission (AIHawk). Job listings
|
|
||||||
and interactive documents are tracked in Notion; source documents live in
|
|
||||||
`/Library/Documents/JobSearch/`.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Project Structure
|
|
||||||
|
|
||||||
```
|
|
||||||
/devl/job-seeker/
|
|
||||||
├── config/
|
|
||||||
│ ├── search_profiles.yaml # JobSpy queries (titles, locations, boards)
|
|
||||||
│ ├── llm.yaml # LLM router: backends + fallback order
|
|
||||||
│ └── notion.yaml # Notion DB IDs and field mappings
|
|
||||||
├── aihawk/ # git clone — Auto_Jobs_Applier_AIHawk
|
|
||||||
├── resume_matcher/ # git clone — Resume-Matcher
|
|
||||||
├── scripts/
|
|
||||||
│ ├── discover.py # JobSpy → deduplicate → push to Notion
|
|
||||||
│ ├── match.py # Notion job URL → Resume Matcher → write score back
|
|
||||||
│ └── llm_router.py # LLM abstraction layer with priority fallback chain
|
|
||||||
├── docs/plans/ # Design and implementation docs (no resume files)
|
|
||||||
├── environment.yml # conda env spec (env name: job-seeker)
|
|
||||||
└── .gitignore
|
|
||||||
```
|
|
||||||
|
|
||||||
**Document storage rule:** Resumes, cover letters, and any interactable documents live
|
|
||||||
in `/Library/Documents/JobSearch/` or Notion — never committed to this repo.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Architecture
|
|
||||||
|
|
||||||
### Data Flow
|
|
||||||
|
|
||||||
```
|
|
||||||
JobSpy (LinkedIn / Indeed / Glassdoor / ZipRecruiter)
|
|
||||||
└─▶ discover.py
|
|
||||||
├─ deduplicate by URL against existing Notion records
|
|
||||||
└─▶ Notion DB (Status: "New")
|
|
||||||
|
|
||||||
Notion DB (daily review — decide what to pursue)
|
|
||||||
└─▶ match.py <notion-page-url>
|
|
||||||
├─ fetch job description from listing URL
|
|
||||||
├─ run Resume Matcher vs. /Library/Documents/JobSearch/Alex_Rivera_Resume_02-19-2025.pdf
|
|
||||||
└─▶ write Match Score + Keyword Gaps back to Notion page
|
|
||||||
|
|
||||||
AIHawk (when ready to apply)
|
|
||||||
├─ reads config pointing to same resume + personal_info.yaml
|
|
||||||
├─ llm_router.py → best available LLM backend
|
|
||||||
├─ submits LinkedIn Easy Apply
|
|
||||||
└─▶ Notion status → "Applied"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Notion Database Schema
|
|
||||||
|
|
||||||
| Field | Type | Notes |
|
|
||||||
|---------------|----------|------------------------------------------------------------|
|
|
||||||
| Job Title | Title | Primary identifier |
|
|
||||||
| Company | Text | |
|
|
||||||
| Location | Text | |
|
|
||||||
| Remote | Checkbox | |
|
|
||||||
| URL | URL | Deduplication key |
|
|
||||||
| Source | Select | LinkedIn / Indeed / Glassdoor / ZipRecruiter |
|
|
||||||
| Status | Select | New → Reviewing → Applied → Interview → Offer → Rejected |
|
|
||||||
| Match Score | Number | 0–100, written by match.py |
|
|
||||||
| Keyword Gaps | Text | Comma-separated missing keywords from Resume Matcher |
|
|
||||||
| Salary | Text | If listed |
|
|
||||||
| Date Found | Date | Set at discovery time |
|
|
||||||
| Notes | Text | Manual field |
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## LLM Router (`scripts/llm_router.py`)
|
|
||||||
|
|
||||||
Single `complete(prompt, system=None)` interface. On each call: health-check each
|
|
||||||
backend in configured order, use the first that responds. Falls back silently on
|
|
||||||
connection error, timeout, or 5xx. Logs which backend was used.
|
|
||||||
|
|
||||||
All backends except Anthropic use the `openai` Python package (OpenAI-compatible
|
|
||||||
endpoints). Anthropic uses the `anthropic` package.
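
A minimal sketch of that fallback loop, assuming the `config/llm.yaml` layout shown below and the `openai` client. This is not the real `llm_router.py`; the Anthropic branch is omitted for brevity.

```python
# Sketch of the priority-fallback idea behind complete(); not the actual implementation
import logging

import yaml
from openai import OpenAI


def complete(prompt: str, system: str | None = None) -> str:
    cfg = yaml.safe_load(open("config/llm.yaml"))
    for name in cfg["fallback_order"]:
        backend = cfg["backends"][name]
        if backend["type"] != "openai_compat":
            continue  # anthropic branch (anthropic package) omitted for brevity
        try:
            client = OpenAI(base_url=backend["base_url"],
                            api_key=backend.get("api_key") or "none")
            messages = [{"role": "system", "content": system}] if system else []
            messages.append({"role": "user", "content": prompt})
            resp = client.chat.completions.create(model=backend["model"],
                                                  messages=messages)
            logging.info("LLM backend used: %s", name)
            return resp.choices[0].message.content
        except Exception:
            continue  # connection error, timeout, or 5xx: fall back to the next backend
    raise RuntimeError("all LLM backends failed")
```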
|
|
||||||
|
|
||||||
### `config/llm.yaml`
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
fallback_order:
|
|
||||||
- claude_code # port 3009 — Claude via local pipeline (highest quality)
|
|
||||||
- ollama # port 11434 — local, always-on
|
|
||||||
- vllm # port 8000 — start when needed
|
|
||||||
- github_copilot # port 3010 — Copilot via gh token
|
|
||||||
- anthropic # cloud fallback, burns API credits
|
|
||||||
|
|
||||||
backends:
|
|
||||||
claude_code:
|
|
||||||
type: openai_compat
|
|
||||||
base_url: http://localhost:3009/v1
|
|
||||||
model: claude-code-terminal
|
|
||||||
api_key: "any"
|
|
||||||
|
|
||||||
ollama:
|
|
||||||
type: openai_compat
|
|
||||||
base_url: http://localhost:11434/v1
|
|
||||||
model: llama3.2
|
|
||||||
api_key: "ollama"
|
|
||||||
|
|
||||||
vllm:
|
|
||||||
type: openai_compat
|
|
||||||
base_url: http://localhost:8000/v1
|
|
||||||
model: __auto__
|
|
||||||
api_key: ""
|
|
||||||
|
|
||||||
github_copilot:
|
|
||||||
type: openai_compat
|
|
||||||
base_url: http://localhost:3010/v1
|
|
||||||
model: gpt-4o
|
|
||||||
api_key: "any"
|
|
||||||
|
|
||||||
anthropic:
|
|
||||||
type: anthropic
|
|
||||||
model: claude-sonnet-4-6
|
|
||||||
api_key_env: ANTHROPIC_API_KEY
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Job Search Profile
|
|
||||||
|
|
||||||
### `config/search_profiles.yaml` (initial)
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
profiles:
|
|
||||||
- name: cs_leadership
|
|
||||||
titles:
|
|
||||||
- "Customer Success Manager"
|
|
||||||
- "Director of Customer Success"
|
|
||||||
- "VP Customer Success"
|
|
||||||
- "Head of Customer Success"
|
|
||||||
- "Technical Account Manager"
|
|
||||||
- "Revenue Operations Manager"
|
|
||||||
- "Customer Experience Lead"
|
|
||||||
locations:
|
|
||||||
- "Remote"
|
|
||||||
- "San Francisco Bay Area, CA"
|
|
||||||
boards:
|
|
||||||
- linkedin
|
|
||||||
- indeed
|
|
||||||
- glassdoor
|
|
||||||
- zip_recruiter
|
|
||||||
results_per_board: 25
|
|
||||||
remote_only: false # remote preferred but Bay Area in-person ok
|
|
||||||
hours_old: 72 # listings posted in last 3 days
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Conda Environment
|
|
||||||
|
|
||||||
New dedicated env `job-seeker` (not base). Core packages:
|
|
||||||
|
|
||||||
- `python-jobspy` — job scraping
|
|
||||||
- `notion-client` — Notion API
|
|
||||||
- `openai` — OpenAI-compatible calls (Ollama, vLLM, Copilot, Claude pipeline)
|
|
||||||
- `anthropic` — Anthropic API fallback
|
|
||||||
- `pyyaml` — config parsing
|
|
||||||
- `pandas` — CSV handling and dedup
|
|
||||||
- Resume Matcher dependencies (sentence-transformers, streamlit — installed from clone)
|
|
||||||
|
|
||||||
Resume Matcher Streamlit UI runs on port **8501** (confirmed clear).
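
A minimal `environment.yml` matching the package list above might look like this. Versions and the Python pin are assumptions; treat it as a sketch, not the committed file.

```yaml
name: job-seeker
channels:
  - conda-forge
dependencies:
  - python=3.11
  - pandas
  - pyyaml
  - pip
  - pip:
      - python-jobspy
      - notion-client
      - openai
      - anthropic
```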
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Port Map
|
|
||||||
|
|
||||||
| Port | Service | Status |
|
|
||||||
|-------|--------------------------------|----------------|
|
|
||||||
| 3009 | Claude Code OpenAI wrapper | Start via manage.sh in Post Flight Processing |
|
|
||||||
| 3010 | GitHub Copilot wrapper | Start via manage-copilot.sh |
|
|
||||||
| 11434 | Ollama | Running |
|
|
||||||
| 8000 | vLLM | Start when needed |
|
|
||||||
| 8501 | Resume Matcher (Streamlit) | Start when needed |
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Out of Scope (this phase)
|
|
||||||
|
|
||||||
- Scheduled/cron automation (run discover.py manually for now)
|
|
||||||
- Email/SMS alerts for new listings
|
|
||||||
- ATS resume rebuild (separate task)
|
|
||||||
- Applications to non-LinkedIn platforms via AIHawk
|
|
||||||
File diff suppressed because it is too large
|
|
@ -1,148 +0,0 @@
|
||||||
# Job Seeker Platform — Web UI Design
|
|
||||||
|
|
||||||
**Date:** 2026-02-20
|
|
||||||
**Status:** Approved
|
|
||||||
|
|
||||||
## Overview
|
|
||||||
|
|
||||||
A Streamlit multi-page web UI that gives Alex (and her partner) a friendly interface to review scraped job listings, curate them before they hit Notion, edit search/LLM/Notion settings, and fill out her AIHawk application profile. Designed to be usable by anyone — no technical knowledge required.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Architecture & Data Flow
|
|
||||||
|
|
||||||
```
|
|
||||||
discover.py → SQLite staging.db (status: pending)
|
|
||||||
↓
|
|
||||||
Streamlit UI
|
|
||||||
review / approve / reject
|
|
||||||
↓
|
|
||||||
"Sync N approved jobs" button
|
|
||||||
↓
|
|
||||||
Notion DB (status: synced)
|
|
||||||
```
|
|
||||||
|
|
||||||
`discover.py` is modified to write to SQLite instead of directly to Notion.
|
|
||||||
A new `sync.py` handles the approved → Notion push.
|
|
||||||
`db.py` provides shared SQLite helpers used by both scripts and UI pages.
|
|
||||||
|
|
||||||
### SQLite Schema (`staging.db`, gitignored)
|
|
||||||
|
|
||||||
```sql
|
|
||||||
CREATE TABLE jobs (
|
|
||||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
|
||||||
title TEXT,
|
|
||||||
company TEXT,
|
|
||||||
url TEXT UNIQUE,
|
|
||||||
source TEXT,
|
|
||||||
location TEXT,
|
|
||||||
is_remote INTEGER,
|
|
||||||
salary TEXT,
|
|
||||||
description TEXT,
|
|
||||||
match_score REAL,
|
|
||||||
keyword_gaps TEXT,
|
|
||||||
date_found TEXT,
|
|
||||||
status TEXT DEFAULT 'pending', -- pending / approved / rejected / synced
|
|
||||||
notion_page_id TEXT
|
|
||||||
);
|
|
||||||
```
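
As a rough illustration of how the shared helpers in `db.py` might be used by the UI and `sync.py` against this schema. The function names here are assumptions, not the actual helper API.

```python
# Hypothetical usage of db.py-style helpers against staging.db
import sqlite3

DB = "staging.db"


def get_jobs(status: str = "pending") -> list[dict]:
    conn = sqlite3.connect(DB)
    conn.row_factory = sqlite3.Row
    rows = conn.execute("SELECT * FROM jobs WHERE status=?", (status,)).fetchall()
    conn.close()
    return [dict(r) for r in rows]


def set_status(job_id: int, status: str) -> None:
    conn = sqlite3.connect(DB)
    conn.execute("UPDATE jobs SET status=? WHERE id=?", (status, job_id))
    conn.commit()
    conn.close()

# UI approve button:          set_status(job_id, "approved")
# sync.py after Notion push:  set_status(job_id, "synced")
```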
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Pages
|
|
||||||
|
|
||||||
### Home (Dashboard)
|
|
||||||
- Stat cards: Pending / Approved / Rejected / Synced counts
|
|
||||||
- "Run Discovery" button — runs `discover.py` as subprocess, streams output
|
|
||||||
- "Sync N approved jobs → Notion" button — visible only when approved count > 0
|
|
||||||
- Recent activity list (last 10 jobs found)
|
|
||||||
|
|
||||||
### Job Review
|
|
||||||
- Filterable table/card view of pending jobs
|
|
||||||
- Filters: source (LinkedIn/Indeed/etc), remote only toggle, minimum match score slider
|
|
||||||
- Checkboxes for batch selection
|
|
||||||
- "Approve Selected" / "Reject Selected" buttons
|
|
||||||
- Rejected jobs hidden by default, togglable
|
|
||||||
- Match score shown as colored badge (green ≥70, amber 40–69, red <40)
|
|
||||||
|
|
||||||
### Settings
|
|
||||||
Three tabs:
|
|
||||||
|
|
||||||
**Search** — edit `config/search_profiles.yaml`:
|
|
||||||
- Job titles (add/remove tags)
|
|
||||||
- Locations (add/remove)
|
|
||||||
- Boards checkboxes
|
|
||||||
- Hours old slider
|
|
||||||
- Results per board slider
|
|
||||||
|
|
||||||
**LLM Backends** — edit `config/llm.yaml`:
|
|
||||||
- Fallback order (drag or up/down arrows)
|
|
||||||
- Per-backend: URL, model name, enabled toggle
|
|
||||||
- "Test connection" button per backend
|
|
||||||
|
|
||||||
**Notion** — edit `config/notion.yaml`:
|
|
||||||
- Token field (masked, show/hide toggle)
|
|
||||||
- Database ID
|
|
||||||
- "Test connection" button
|
|
||||||
|
|
||||||
### Resume Editor
|
|
||||||
Sectioned form over `aihawk/data_folder/plain_text_resume.yaml`:
|
|
||||||
- **Personal Info** — name, email, phone, LinkedIn, city, zip
|
|
||||||
- **Education** — list of entries, add/remove buttons
|
|
||||||
- **Experience** — list of entries, add/remove buttons
|
|
||||||
- **Skills & Interests** — tag-style inputs
|
|
||||||
- **Preferences** — salary range, notice period, remote/relocation toggles
|
|
||||||
- **Self-Identification** — gender, pronouns, veteran, disability, ethnicity (with "prefer not to say" options)
|
|
||||||
- **Legal** — work authorization checkboxes
|
|
||||||
|
|
||||||
`FILL_IN` fields highlighted in amber with "Needs your attention" note.
|
|
||||||
Save button writes back to YAML. No raw YAML shown by default.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Theme & Styling
|
|
||||||
|
|
||||||
Central theme at `app/.streamlit/config.toml`:
|
|
||||||
- Dark base, accent color teal/green (job search = growth)
|
|
||||||
- Consistent font (Inter or system sans-serif)
|
|
||||||
- Responsive column layouts — usable on tablet/mobile
|
|
||||||
- No jargon — "Run Discovery" not "Execute scrape", "Sync to Notion" not "Push records"
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## File Layout
|
|
||||||
|
|
||||||
```
|
|
||||||
app/
|
|
||||||
├── .streamlit/
|
|
||||||
│ └── config.toml # central theme
|
|
||||||
├── Home.py # dashboard
|
|
||||||
└── pages/
|
|
||||||
├── 1_Job_Review.py
|
|
||||||
├── 2_Settings.py
|
|
||||||
└── 3_Resume_Editor.py
|
|
||||||
scripts/
|
|
||||||
├── db.py # new: SQLite helpers
|
|
||||||
├── sync.py # new: approved → Notion push
|
|
||||||
├── discover.py # modified: write to SQLite not Notion
|
|
||||||
├── match.py # unchanged
|
|
||||||
└── llm_router.py # unchanged
|
|
||||||
```
|
|
||||||
|
|
||||||
Run: `conda run -n job-seeker streamlit run app/Home.py`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## New Dependencies
|
|
||||||
|
|
||||||
None — `streamlit` already installed via resume_matcher deps.
|
|
||||||
`sqlite3` is Python stdlib.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Out of Scope
|
|
||||||
|
|
||||||
- Real-time collaboration
|
|
||||||
- Mobile native app
|
|
||||||
- Cover letter editor (handled separately via LoRA fine-tune task)
|
|
||||||
- AIHawk trigger from UI (run manually for now)
|
|
||||||
File diff suppressed because it is too large
|
|
@ -1,100 +0,0 @@
|
||||||
# Background Task Processing — Design
|
|
||||||
|
|
||||||
**Date:** 2026-02-21
|
|
||||||
**Status:** Approved
|
|
||||||
|
|
||||||
## Problem
|
|
||||||
|
|
||||||
Cover letter generation (`4_Apply.py`) and company research (`6_Interview_Prep.py`) call LLM scripts synchronously inside `st.spinner()`. If the user navigates away during generation, Streamlit abandons the in-progress call and the result is lost. Both results are already persisted to SQLite on completion, so if the task kept running in the background the result would be available on return.
|
|
||||||
|
|
||||||
## Solution Overview
|
|
||||||
|
|
||||||
Python threading + SQLite task table. When a user clicks Generate, a daemon thread is spawned immediately and the task is recorded in a new `background_tasks` table. The thread writes results to the existing tables (`jobs.cover_letter`, `company_research`) and marks itself complete/failed. All pages share a sidebar indicator that auto-refreshes while tasks are active. Individual pages show task-level status inline.
|
|
||||||
|
|
||||||
## SQLite Schema
|
|
||||||
|
|
||||||
New table `background_tasks` added in `scripts/db.py`:
|
|
||||||
|
|
||||||
```sql
|
|
||||||
CREATE TABLE IF NOT EXISTS background_tasks (
|
|
||||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
|
||||||
task_type TEXT NOT NULL, -- "cover_letter" | "company_research"
|
|
||||||
job_id INTEGER NOT NULL,
|
|
||||||
status TEXT NOT NULL DEFAULT 'queued', -- queued | running | completed | failed
|
|
||||||
error TEXT,
|
|
||||||
created_at DATETIME DEFAULT (datetime('now')),
|
|
||||||
started_at DATETIME,
|
|
||||||
finished_at DATETIME
|
|
||||||
)
|
|
||||||
```
|
|
||||||
|
|
||||||
## Deduplication Rule
|
|
||||||
|
|
||||||
Before inserting a new task, check for an existing `queued` or `running` row with the same `(task_type, job_id)`. If one exists, reject the submission (return the existing task's id). Different task types for the same job (e.g. cover letter + research) are allowed to run concurrently. Different jobs of the same type are allowed concurrently.
|
|
||||||
|
|
||||||
## Components
|
|
||||||
|
|
||||||
### `scripts/task_runner.py` (new)
|
|
||||||
|
|
||||||
- `submit_task(db, task_type, job_id) -> int` — dedup check, insert row, spawn daemon thread, return task id
|
|
||||||
- `_run_task(db, task_id, task_type, job_id)` — thread body: mark running, call generator, save result, mark completed/failed
|
|
||||||
- `get_active_tasks(db) -> list[dict]` — all queued/running rows with job title+company joined
|
|
||||||
- `get_task_for_job(db, task_type, job_id) -> dict | None` — latest task row for a specific job+type
|
|
||||||
|
|
||||||
### `scripts/db.py` (modified)
|
|
||||||
|
|
||||||
- Add `init_background_tasks(conn)` called inside `init_db()`
|
|
||||||
- Add `insert_task`, `update_task_status`, `get_active_tasks`, `get_task_for_job` helpers
|
|
||||||
|
|
||||||
### `app/app.py` (modified)
|
|
||||||
|
|
||||||
- After `st.navigation()`, call `get_active_tasks()` and render sidebar indicator
|
|
||||||
- Use `st.fragment` with `time.sleep(3)` + `st.rerun(scope="fragment")` to poll while tasks are active
|
|
||||||
- Sidebar shows: `⏳ N task(s) running` count + per-task line (type + company name)
|
|
||||||
- Fragment polling stops when active task count reaches zero
|
|
||||||
|
|
||||||
### `app/pages/4_Apply.py` (modified)
|
|
||||||
|
|
||||||
- Generate button calls `submit_task(db, "cover_letter", job_id)` instead of running inline
|
|
||||||
- If a task is `queued`/`running` for the selected job, disable button and show inline status fragment (polls every 3s)
|
|
||||||
- On `completed`, load cover letter from `jobs` row (already saved by thread)
|
|
||||||
- On `failed`, show error message and re-enable button
|
|
||||||
|
|
||||||
### `app/pages/6_Interview_Prep.py` (modified)
|
|
||||||
|
|
||||||
- Generate/Refresh buttons call `submit_task(db, "company_research", job_id)` instead of running inline
|
|
||||||
- Same inline status fragment pattern as Apply page
|
|
||||||
|
|
||||||
## Data Flow
|
|
||||||
|
|
||||||
```
|
|
||||||
User clicks Generate
|
|
||||||
→ submit_task(db, type, job_id)
|
|
||||||
→ dedup check (reject if already queued/running for same type+job)
|
|
||||||
→ INSERT background_tasks row (status=queued)
|
|
||||||
→ spawn daemon thread
|
|
||||||
→ return task_id
|
|
||||||
→ page shows inline "⏳ Queued…" fragment
|
|
||||||
|
|
||||||
Thread runs
|
|
||||||
→ UPDATE status=running, started_at=now
|
|
||||||
→ call generate_cover_letter.generate() OR research_company()
|
|
||||||
→ write result to jobs.cover_letter OR company_research table
|
|
||||||
→ UPDATE status=completed, finished_at=now
|
|
||||||
(on exception: UPDATE status=failed, error=str(e))
|
|
||||||
|
|
||||||
Sidebar fragment (every 3s while active tasks > 0)
|
|
||||||
→ get_active_tasks() → render count + list
|
|
||||||
→ st.rerun(scope="fragment")
|
|
||||||
|
|
||||||
Page fragment (every 3s while task for this job is running)
|
|
||||||
→ get_task_for_job() → render status
|
|
||||||
→ on completed: st.rerun() (full rerun to reload cover letter / research)
|
|
||||||
```
|
|
||||||
|
|
||||||
## What Is Not Changed
|
|
||||||
|
|
||||||
- `generate_cover_letter.generate()` and `research_company()` are called unchanged from the thread
|
|
||||||
- `update_cover_letter()` and `save_research()` DB helpers are reused unchanged
|
|
||||||
- No new Python packages required
|
|
||||||
- No separate worker process — daemon threads die with the Streamlit server, but results already written to SQLite survive
|
|
||||||
|
|
@ -1,933 +0,0 @@
|
||||||
# Background Task Processing Implementation Plan
|
|
||||||
|
|
||||||
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
|
|
||||||
|
|
||||||
**Goal:** Replace synchronous LLM calls in Apply and Interview Prep pages with background threads so cover letter and research generation survive page navigation.
|
|
||||||
|
|
||||||
**Architecture:** A new `background_tasks` SQLite table tracks task state. `scripts/task_runner.py` spawns daemon threads that call existing generator functions and write results via existing DB helpers. The Streamlit sidebar polls active tasks every 3s via `@st.fragment(run_every=3)`; individual pages show per-job status with the same pattern.
|
|
||||||
|
|
||||||
**Tech Stack:** Python `threading` (stdlib), SQLite, Streamlit `st.fragment` (≥1.33 — already installed)
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 1: Add background_tasks table and DB helpers
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `scripts/db.py`
|
|
||||||
- Test: `tests/test_db.py`
|
|
||||||
|
|
||||||
### Step 1: Write the failing tests
|
|
||||||
|
|
||||||
Add to `tests/test_db.py`:
|
|
||||||
|
|
||||||
```python
|
|
||||||
# ── background_tasks tests ────────────────────────────────────────────────────
|
|
||||||
|
|
||||||
def test_init_db_creates_background_tasks_table(tmp_path):
|
|
||||||
"""init_db creates a background_tasks table."""
|
|
||||||
from scripts.db import init_db
|
|
||||||
db_path = tmp_path / "test.db"
|
|
||||||
init_db(db_path)
|
|
||||||
import sqlite3
|
|
||||||
conn = sqlite3.connect(db_path)
|
|
||||||
cur = conn.execute(
|
|
||||||
"SELECT name FROM sqlite_master WHERE type='table' AND name='background_tasks'"
|
|
||||||
)
|
|
||||||
assert cur.fetchone() is not None
|
|
||||||
conn.close()
|
|
||||||
|
|
||||||
|
|
||||||
def test_insert_task_returns_id_and_true(tmp_path):
|
|
||||||
"""insert_task returns (task_id, True) for a new task."""
|
|
||||||
from scripts.db import init_db, insert_job, insert_task
|
|
||||||
db_path = tmp_path / "test.db"
|
|
||||||
init_db(db_path)
|
|
||||||
job_id = insert_job(db_path, {
|
|
||||||
"title": "CSM", "company": "Acme", "url": "https://ex.com/1",
|
|
||||||
"source": "linkedin", "location": "Remote", "is_remote": True,
|
|
||||||
"salary": "", "description": "", "date_found": "2026-02-20",
|
|
||||||
})
|
|
||||||
task_id, is_new = insert_task(db_path, "cover_letter", job_id)
|
|
||||||
assert isinstance(task_id, int) and task_id > 0
|
|
||||||
assert is_new is True
|
|
||||||
|
|
||||||
|
|
||||||
def test_insert_task_deduplicates_active_task(tmp_path):
|
|
||||||
"""insert_task returns (existing_id, False) if a queued/running task already exists."""
|
|
||||||
from scripts.db import init_db, insert_job, insert_task
|
|
||||||
db_path = tmp_path / "test.db"
|
|
||||||
init_db(db_path)
|
|
||||||
job_id = insert_job(db_path, {
|
|
||||||
"title": "CSM", "company": "Acme", "url": "https://ex.com/1",
|
|
||||||
"source": "linkedin", "location": "Remote", "is_remote": True,
|
|
||||||
"salary": "", "description": "", "date_found": "2026-02-20",
|
|
||||||
})
|
|
||||||
first_id, _ = insert_task(db_path, "cover_letter", job_id)
|
|
||||||
second_id, is_new = insert_task(db_path, "cover_letter", job_id)
|
|
||||||
assert second_id == first_id
|
|
||||||
assert is_new is False
|
|
||||||
|
|
||||||
|
|
||||||
def test_insert_task_allows_different_types_same_job(tmp_path):
|
|
||||||
"""insert_task allows cover_letter and company_research for the same job concurrently."""
|
|
||||||
from scripts.db import init_db, insert_job, insert_task
|
|
||||||
db_path = tmp_path / "test.db"
|
|
||||||
init_db(db_path)
|
|
||||||
job_id = insert_job(db_path, {
|
|
||||||
"title": "CSM", "company": "Acme", "url": "https://ex.com/1",
|
|
||||||
"source": "linkedin", "location": "Remote", "is_remote": True,
|
|
||||||
"salary": "", "description": "", "date_found": "2026-02-20",
|
|
||||||
})
|
|
||||||
_, cl_new = insert_task(db_path, "cover_letter", job_id)
|
|
||||||
_, res_new = insert_task(db_path, "company_research", job_id)
|
|
||||||
assert cl_new is True
|
|
||||||
assert res_new is True
|
|
||||||
|
|
||||||
|
|
||||||
def test_update_task_status_running(tmp_path):
|
|
||||||
"""update_task_status('running') sets started_at."""
|
|
||||||
from scripts.db import init_db, insert_job, insert_task, update_task_status
|
|
||||||
import sqlite3
|
|
||||||
db_path = tmp_path / "test.db"
|
|
||||||
init_db(db_path)
|
|
||||||
job_id = insert_job(db_path, {
|
|
||||||
"title": "CSM", "company": "Acme", "url": "https://ex.com/1",
|
|
||||||
"source": "linkedin", "location": "Remote", "is_remote": True,
|
|
||||||
"salary": "", "description": "", "date_found": "2026-02-20",
|
|
||||||
})
|
|
||||||
task_id, _ = insert_task(db_path, "cover_letter", job_id)
|
|
||||||
update_task_status(db_path, task_id, "running")
|
|
||||||
conn = sqlite3.connect(db_path)
|
|
||||||
row = conn.execute("SELECT status, started_at FROM background_tasks WHERE id=?", (task_id,)).fetchone()
|
|
||||||
conn.close()
|
|
||||||
assert row[0] == "running"
|
|
||||||
assert row[1] is not None
|
|
||||||
|
|
||||||
|
|
||||||
def test_update_task_status_completed(tmp_path):
|
|
||||||
"""update_task_status('completed') sets finished_at."""
|
|
||||||
from scripts.db import init_db, insert_job, insert_task, update_task_status
|
|
||||||
import sqlite3
|
|
||||||
db_path = tmp_path / "test.db"
|
|
||||||
init_db(db_path)
|
|
||||||
job_id = insert_job(db_path, {
|
|
||||||
"title": "CSM", "company": "Acme", "url": "https://ex.com/1",
|
|
||||||
"source": "linkedin", "location": "Remote", "is_remote": True,
|
|
||||||
"salary": "", "description": "", "date_found": "2026-02-20",
|
|
||||||
})
|
|
||||||
task_id, _ = insert_task(db_path, "cover_letter", job_id)
|
|
||||||
update_task_status(db_path, task_id, "completed")
|
|
||||||
conn = sqlite3.connect(db_path)
|
|
||||||
row = conn.execute("SELECT status, finished_at FROM background_tasks WHERE id=?", (task_id,)).fetchone()
|
|
||||||
conn.close()
|
|
||||||
assert row[0] == "completed"
|
|
||||||
assert row[1] is not None
|
|
||||||
|
|
||||||
|
|
||||||
def test_update_task_status_failed_stores_error(tmp_path):
|
|
||||||
"""update_task_status('failed') stores error message and sets finished_at."""
|
|
||||||
from scripts.db import init_db, insert_job, insert_task, update_task_status
|
|
||||||
import sqlite3
|
|
||||||
db_path = tmp_path / "test.db"
|
|
||||||
init_db(db_path)
|
|
||||||
job_id = insert_job(db_path, {
|
|
||||||
"title": "CSM", "company": "Acme", "url": "https://ex.com/1",
|
|
||||||
"source": "linkedin", "location": "Remote", "is_remote": True,
|
|
||||||
"salary": "", "description": "", "date_found": "2026-02-20",
|
|
||||||
})
|
|
||||||
task_id, _ = insert_task(db_path, "cover_letter", job_id)
|
|
||||||
update_task_status(db_path, task_id, "failed", error="LLM timeout")
|
|
||||||
conn = sqlite3.connect(db_path)
|
|
||||||
row = conn.execute("SELECT status, error, finished_at FROM background_tasks WHERE id=?", (task_id,)).fetchone()
|
|
||||||
conn.close()
|
|
||||||
assert row[0] == "failed"
|
|
||||||
assert row[1] == "LLM timeout"
|
|
||||||
assert row[2] is not None
|
|
||||||
|
|
||||||
|
|
||||||
def test_get_active_tasks_returns_only_active(tmp_path):
|
|
||||||
"""get_active_tasks returns only queued/running tasks with job info joined."""
|
|
||||||
from scripts.db import init_db, insert_job, insert_task, update_task_status, get_active_tasks
|
|
||||||
db_path = tmp_path / "test.db"
|
|
||||||
init_db(db_path)
|
|
||||||
job_id = insert_job(db_path, {
|
|
||||||
"title": "CSM", "company": "Acme", "url": "https://ex.com/1",
|
|
||||||
"source": "linkedin", "location": "Remote", "is_remote": True,
|
|
||||||
"salary": "", "description": "", "date_found": "2026-02-20",
|
|
||||||
})
|
|
||||||
active_id, _ = insert_task(db_path, "cover_letter", job_id)
|
|
||||||
done_id, _ = insert_task(db_path, "company_research", job_id)
|
|
||||||
update_task_status(db_path, done_id, "completed")
|
|
||||||
|
|
||||||
tasks = get_active_tasks(db_path)
|
|
||||||
assert len(tasks) == 1
|
|
||||||
assert tasks[0]["id"] == active_id
|
|
||||||
assert tasks[0]["company"] == "Acme"
|
|
||||||
assert tasks[0]["title"] == "CSM"
|
|
||||||
|
|
||||||
|
|
||||||
def test_get_task_for_job_returns_latest(tmp_path):
|
|
||||||
"""get_task_for_job returns the most recent task for the given type+job."""
|
|
||||||
from scripts.db import init_db, insert_job, insert_task, update_task_status, get_task_for_job
|
|
||||||
db_path = tmp_path / "test.db"
|
|
||||||
init_db(db_path)
|
|
||||||
job_id = insert_job(db_path, {
|
|
||||||
"title": "CSM", "company": "Acme", "url": "https://ex.com/1",
|
|
||||||
"source": "linkedin", "location": "Remote", "is_remote": True,
|
|
||||||
"salary": "", "description": "", "date_found": "2026-02-20",
|
|
||||||
})
|
|
||||||
first_id, _ = insert_task(db_path, "cover_letter", job_id)
|
|
||||||
update_task_status(db_path, first_id, "completed")
|
|
||||||
second_id, _ = insert_task(db_path, "cover_letter", job_id) # allowed since first is done
|
|
||||||
|
|
||||||
task = get_task_for_job(db_path, "cover_letter", job_id)
|
|
||||||
assert task is not None
|
|
||||||
assert task["id"] == second_id
|
|
||||||
|
|
||||||
|
|
||||||
def test_get_task_for_job_returns_none_when_absent(tmp_path):
|
|
||||||
"""get_task_for_job returns None when no task exists for that job+type."""
|
|
||||||
from scripts.db import init_db, insert_job, get_task_for_job
|
|
||||||
db_path = tmp_path / "test.db"
|
|
||||||
init_db(db_path)
|
|
||||||
job_id = insert_job(db_path, {
|
|
||||||
"title": "CSM", "company": "Acme", "url": "https://ex.com/1",
|
|
||||||
"source": "linkedin", "location": "Remote", "is_remote": True,
|
|
||||||
"salary": "", "description": "", "date_found": "2026-02-20",
|
|
||||||
})
|
|
||||||
assert get_task_for_job(db_path, "cover_letter", job_id) is None
|
|
||||||
```
|
|
||||||
|
|
||||||
### Step 2: Run tests to verify they fail
|
|
||||||
|
|
||||||
```bash
|
|
||||||
/devl/miniconda3/envs/job-seeker/bin/pytest tests/test_db.py -v -k "background_tasks or insert_task or update_task_status or get_active_tasks or get_task_for_job"
|
|
||||||
```
|
|
||||||
|
|
||||||
Expected: FAIL with `ImportError: cannot import name 'insert_task'`
|
|
||||||
|
|
||||||
### Step 3: Implement in scripts/db.py
|
|
||||||
|
|
||||||
Add the DDL constant after `CREATE_COMPANY_RESEARCH`:
|
|
||||||
|
|
||||||
```python
|
|
||||||
CREATE_BACKGROUND_TASKS = """
|
|
||||||
CREATE TABLE IF NOT EXISTS background_tasks (
|
|
||||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
|
||||||
task_type TEXT NOT NULL,
|
|
||||||
job_id INTEGER NOT NULL,
|
|
||||||
status TEXT NOT NULL DEFAULT 'queued',
|
|
||||||
error TEXT,
|
|
||||||
created_at DATETIME DEFAULT (datetime('now')),
|
|
||||||
started_at DATETIME,
|
|
||||||
finished_at DATETIME
|
|
||||||
)
|
|
||||||
"""
|
|
||||||
```
|
|
||||||
|
|
||||||
Add `conn.execute(CREATE_BACKGROUND_TASKS)` inside `init_db()`, after the existing three `conn.execute()` calls:
|
|
||||||
|
|
||||||
```python
|
|
||||||
def init_db(db_path: Path = DEFAULT_DB) -> None:
|
|
||||||
"""Create tables if they don't exist, then run migrations."""
|
|
||||||
conn = sqlite3.connect(db_path)
|
|
||||||
conn.execute(CREATE_JOBS)
|
|
||||||
conn.execute(CREATE_JOB_CONTACTS)
|
|
||||||
conn.execute(CREATE_COMPANY_RESEARCH)
|
|
||||||
conn.execute(CREATE_BACKGROUND_TASKS) # ← add this line
|
|
||||||
conn.commit()
|
|
||||||
conn.close()
|
|
||||||
_migrate_db(db_path)
|
|
||||||
```
|
|
||||||
|
|
||||||
Add the four helper functions at the end of `scripts/db.py`:
|
|
||||||
|
|
||||||
```python
|
|
||||||
# ── Background task helpers ───────────────────────────────────────────────────
|
|
||||||
|
|
||||||
def insert_task(db_path: Path = DEFAULT_DB, task_type: str = "",
|
|
||||||
job_id: int = None) -> tuple[int, bool]:
|
|
||||||
"""Insert a new background task.
|
|
||||||
|
|
||||||
Returns (task_id, True) if inserted, or (existing_id, False) if a
|
|
||||||
queued/running task for the same (task_type, job_id) already exists.
|
|
||||||
"""
|
|
||||||
conn = sqlite3.connect(db_path)
|
|
||||||
existing = conn.execute(
|
|
||||||
"SELECT id FROM background_tasks WHERE task_type=? AND job_id=? AND status IN ('queued','running')",
|
|
||||||
(task_type, job_id),
|
|
||||||
).fetchone()
|
|
||||||
if existing:
|
|
||||||
conn.close()
|
|
||||||
return existing[0], False
|
|
||||||
cur = conn.execute(
|
|
||||||
"INSERT INTO background_tasks (task_type, job_id, status) VALUES (?, ?, 'queued')",
|
|
||||||
(task_type, job_id),
|
|
||||||
)
|
|
||||||
task_id = cur.lastrowid
|
|
||||||
conn.commit()
|
|
||||||
conn.close()
|
|
||||||
return task_id, True
|
|
||||||
|
|
||||||
|
|
||||||
def update_task_status(db_path: Path = DEFAULT_DB, task_id: int = None,
|
|
||||||
status: str = "", error: Optional[str] = None) -> None:
|
|
||||||
"""Update a task's status and set the appropriate timestamp."""
|
|
||||||
now = datetime.now().isoformat()[:16]
|
|
||||||
conn = sqlite3.connect(db_path)
|
|
||||||
if status == "running":
|
|
||||||
conn.execute(
|
|
||||||
"UPDATE background_tasks SET status=?, started_at=? WHERE id=?",
|
|
||||||
(status, now, task_id),
|
|
||||||
)
|
|
||||||
elif status in ("completed", "failed"):
|
|
||||||
conn.execute(
|
|
||||||
"UPDATE background_tasks SET status=?, finished_at=?, error=? WHERE id=?",
|
|
||||||
(status, now, error, task_id),
|
|
||||||
)
|
|
||||||
else:
|
|
||||||
conn.execute("UPDATE background_tasks SET status=? WHERE id=?", (status, task_id))
|
|
||||||
conn.commit()
|
|
||||||
conn.close()
|
|
||||||
|
|
||||||
|
|
||||||
def get_active_tasks(db_path: Path = DEFAULT_DB) -> list[dict]:
|
|
||||||
"""Return all queued/running tasks with job title and company joined in."""
|
|
||||||
conn = sqlite3.connect(db_path)
|
|
||||||
conn.row_factory = sqlite3.Row
|
|
||||||
rows = conn.execute("""
|
|
||||||
SELECT bt.*, j.title, j.company
|
|
||||||
FROM background_tasks bt
|
|
||||||
LEFT JOIN jobs j ON j.id = bt.job_id
|
|
||||||
WHERE bt.status IN ('queued', 'running')
|
|
||||||
ORDER BY bt.created_at ASC
|
|
||||||
""").fetchall()
|
|
||||||
conn.close()
|
|
||||||
return [dict(r) for r in rows]
|
|
||||||
|
|
||||||
|
|
||||||
def get_task_for_job(db_path: Path = DEFAULT_DB, task_type: str = "",
|
|
||||||
job_id: int = None) -> Optional[dict]:
|
|
||||||
"""Return the most recent task row for a (task_type, job_id) pair, or None."""
|
|
||||||
conn = sqlite3.connect(db_path)
|
|
||||||
conn.row_factory = sqlite3.Row
|
|
||||||
row = conn.execute(
|
|
||||||
"""SELECT * FROM background_tasks
|
|
||||||
WHERE task_type=? AND job_id=?
|
|
||||||
ORDER BY id DESC LIMIT 1""",
|
|
||||||
(task_type, job_id),
|
|
||||||
).fetchone()
|
|
||||||
conn.close()
|
|
||||||
return dict(row) if row else None
|
|
||||||
```
|
|
||||||
|
|
||||||
### Step 4: Run tests to verify they pass
|
|
||||||
|
|
||||||
```bash
|
|
||||||
/devl/miniconda3/envs/job-seeker/bin/pytest tests/test_db.py -v -k "background_tasks or insert_task or update_task_status or get_active_tasks or get_task_for_job"
|
|
||||||
```
|
|
||||||
|
|
||||||
Expected: all new tests PASS, no regressions
|
|
||||||
|
|
||||||
### Step 5: Run full test suite
|
|
||||||
|
|
||||||
```bash
|
|
||||||
/devl/miniconda3/envs/job-seeker/bin/pytest tests/ -v
|
|
||||||
```
|
|
||||||
|
|
||||||
Expected: all tests PASS
|
|
||||||
|
|
||||||
### Step 6: Commit
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add scripts/db.py tests/test_db.py
|
|
||||||
git commit -m "feat: add background_tasks table and DB helpers"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 2: Create scripts/task_runner.py
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Create: `scripts/task_runner.py`
|
|
||||||
- Test: `tests/test_task_runner.py`
|
|
||||||
|
|
||||||
### Step 1: Write the failing tests
|
|
||||||
|
|
||||||
Create `tests/test_task_runner.py`:
|
|
||||||
|
|
||||||
```python
|
|
||||||
import threading
|
|
||||||
import time
|
|
||||||
import pytest
|
|
||||||
from pathlib import Path
|
|
||||||
from unittest.mock import patch, MagicMock
|
|
||||||
import sqlite3
|
|
||||||
|
|
||||||
|
|
||||||
def _make_db(tmp_path):
|
|
||||||
from scripts.db import init_db, insert_job
|
|
||||||
db = tmp_path / "test.db"
|
|
||||||
init_db(db)
|
|
||||||
job_id = insert_job(db, {
|
|
||||||
"title": "CSM", "company": "Acme", "url": "https://ex.com/1",
|
|
||||||
"source": "linkedin", "location": "Remote", "is_remote": True,
|
|
||||||
"salary": "", "description": "Great role.", "date_found": "2026-02-20",
|
|
||||||
})
|
|
||||||
return db, job_id
|
|
||||||
|
|
||||||
|
|
||||||
def test_submit_task_returns_id_and_true(tmp_path):
|
|
||||||
"""submit_task returns (task_id, True) and spawns a thread."""
|
|
||||||
db, job_id = _make_db(tmp_path)
|
|
||||||
with patch("scripts.task_runner._run_task"): # don't actually call LLM
|
|
||||||
from scripts.task_runner import submit_task
|
|
||||||
task_id, is_new = submit_task(db, "cover_letter", job_id)
|
|
||||||
assert isinstance(task_id, int) and task_id > 0
|
|
||||||
assert is_new is True
|
|
||||||
|
|
||||||
|
|
||||||
def test_submit_task_deduplicates(tmp_path):
|
|
||||||
"""submit_task returns (existing_id, False) for a duplicate in-flight task."""
|
|
||||||
db, job_id = _make_db(tmp_path)
|
|
||||||
with patch("scripts.task_runner._run_task"):
|
|
||||||
from scripts.task_runner import submit_task
|
|
||||||
first_id, _ = submit_task(db, "cover_letter", job_id)
|
|
||||||
second_id, is_new = submit_task(db, "cover_letter", job_id)
|
|
||||||
assert second_id == first_id
|
|
||||||
assert is_new is False
|
|
||||||
|
|
||||||
|
|
||||||
def test_run_task_cover_letter_success(tmp_path):
|
|
||||||
"""_run_task marks running→completed and saves cover letter to DB."""
|
|
||||||
db, job_id = _make_db(tmp_path)
|
|
||||||
from scripts.db import insert_task, get_task_for_job, get_jobs_by_status
|
|
||||||
task_id, _ = insert_task(db, "cover_letter", job_id)
|
|
||||||
|
|
||||||
with patch("scripts.generate_cover_letter.generate", return_value="Dear Hiring Manager,\nGreat fit!"):
|
|
||||||
from scripts.task_runner import _run_task
|
|
||||||
_run_task(db, task_id, "cover_letter", job_id)
|
|
||||||
|
|
||||||
task = get_task_for_job(db, "cover_letter", job_id)
|
|
||||||
assert task["status"] == "completed"
|
|
||||||
assert task["error"] is None
|
|
||||||
|
|
||||||
conn = sqlite3.connect(db)
|
|
||||||
row = conn.execute("SELECT cover_letter FROM jobs WHERE id=?", (job_id,)).fetchone()
|
|
||||||
conn.close()
|
|
||||||
assert row[0] == "Dear Hiring Manager,\nGreat fit!"
|
|
||||||
|
|
||||||
|
|
||||||
def test_run_task_company_research_success(tmp_path):
|
|
||||||
"""_run_task marks running→completed and saves research to DB."""
|
|
||||||
db, job_id = _make_db(tmp_path)
|
|
||||||
from scripts.db import insert_task, get_task_for_job, get_research
|
|
||||||
|
|
||||||
task_id, _ = insert_task(db, "company_research", job_id)
|
|
||||||
fake_result = {
|
|
||||||
"raw_output": "raw", "company_brief": "brief",
|
|
||||||
"ceo_brief": "ceo", "talking_points": "points",
|
|
||||||
}
|
|
||||||
with patch("scripts.company_research.research_company", return_value=fake_result):
|
|
||||||
from scripts.task_runner import _run_task
|
|
||||||
_run_task(db, task_id, "company_research", job_id)
|
|
||||||
|
|
||||||
task = get_task_for_job(db, "company_research", job_id)
|
|
||||||
assert task["status"] == "completed"
|
|
||||||
|
|
||||||
research = get_research(db, job_id=job_id)
|
|
||||||
assert research["company_brief"] == "brief"
|
|
||||||
|
|
||||||
|
|
||||||
def test_run_task_marks_failed_on_exception(tmp_path):
|
|
||||||
"""_run_task marks status=failed and stores error when generator raises."""
|
|
||||||
db, job_id = _make_db(tmp_path)
|
|
||||||
from scripts.db import insert_task, get_task_for_job
|
|
||||||
task_id, _ = insert_task(db, "cover_letter", job_id)
|
|
||||||
|
|
||||||
with patch("scripts.generate_cover_letter.generate", side_effect=RuntimeError("LLM timeout")):
|
|
||||||
from scripts.task_runner import _run_task
|
|
||||||
_run_task(db, task_id, "cover_letter", job_id)
|
|
||||||
|
|
||||||
task = get_task_for_job(db, "cover_letter", job_id)
|
|
||||||
assert task["status"] == "failed"
|
|
||||||
assert "LLM timeout" in task["error"]
|
|
||||||
|
|
||||||
|
|
||||||
def test_submit_task_actually_completes(tmp_path):
|
|
||||||
"""Integration: submit_task spawns a thread that completes asynchronously."""
|
|
||||||
db, job_id = _make_db(tmp_path)
|
|
||||||
from scripts.db import get_task_for_job
|
|
||||||
|
|
||||||
with patch("scripts.generate_cover_letter.generate", return_value="Cover letter text"):
|
|
||||||
from scripts.task_runner import submit_task
|
|
||||||
task_id, _ = submit_task(db, "cover_letter", job_id)
|
|
||||||
# Wait for thread to complete (max 5s)
|
|
||||||
for _ in range(50):
|
|
||||||
task = get_task_for_job(db, "cover_letter", job_id)
|
|
||||||
if task and task["status"] in ("completed", "failed"):
|
|
||||||
break
|
|
||||||
time.sleep(0.1)
|
|
||||||
|
|
||||||
task = get_task_for_job(db, "cover_letter", job_id)
|
|
||||||
assert task["status"] == "completed"
|
|
||||||
```
|
|
||||||
|
|
||||||
### Step 2: Run tests to verify they fail
|
|
||||||
|
|
||||||
```bash
|
|
||||||
/devl/miniconda3/envs/job-seeker/bin/pytest tests/test_task_runner.py -v
|
|
||||||
```
|
|
||||||
|
|
||||||
Expected: FAIL with `ModuleNotFoundError: No module named 'scripts.task_runner'`
|
|
||||||
|
|
||||||
### Step 3: Implement scripts/task_runner.py
|
|
||||||
|
|
||||||
Create `scripts/task_runner.py`:
|
|
||||||
|
|
||||||
```python
|
|
||||||
# scripts/task_runner.py
|
|
||||||
"""
|
|
||||||
Background task runner for LLM generation tasks.
|
|
||||||
|
|
||||||
Submitting a task inserts a row in background_tasks and spawns a daemon thread.
|
|
||||||
The thread calls the appropriate generator, writes results to existing tables,
|
|
||||||
and marks the task completed or failed.
|
|
||||||
|
|
||||||
Deduplication: only one queued/running task per (task_type, job_id) is allowed.
|
|
||||||
Different task types for the same job run concurrently (e.g. cover letter + research).
|
|
||||||
"""
|
|
||||||
import sqlite3
|
|
||||||
import threading
|
|
||||||
from pathlib import Path
|
|
||||||
|
|
||||||
from scripts.db import (
|
|
||||||
DEFAULT_DB,
|
|
||||||
insert_task,
|
|
||||||
update_task_status,
|
|
||||||
update_cover_letter,
|
|
||||||
save_research,
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
def submit_task(db_path: Path = DEFAULT_DB, task_type: str = "",
|
|
||||||
job_id: int = None) -> tuple[int, bool]:
|
|
||||||
"""Submit a background LLM task.
|
|
||||||
|
|
||||||
Returns (task_id, True) if a new task was queued and a thread spawned.
|
|
||||||
Returns (existing_id, False) if an identical task is already in-flight.
|
|
||||||
"""
|
|
||||||
task_id, is_new = insert_task(db_path, task_type, job_id)
|
|
||||||
if is_new:
|
|
||||||
t = threading.Thread(
|
|
||||||
target=_run_task,
|
|
||||||
args=(db_path, task_id, task_type, job_id),
|
|
||||||
daemon=True,
|
|
||||||
)
|
|
||||||
t.start()
|
|
||||||
return task_id, is_new
|
|
||||||
|
|
||||||
|
|
||||||
def _run_task(db_path: Path, task_id: int, task_type: str, job_id: int) -> None:
|
|
||||||
"""Thread body: run the generator and persist the result."""
|
|
||||||
conn = sqlite3.connect(db_path)
|
|
||||||
conn.row_factory = sqlite3.Row
|
|
||||||
row = conn.execute("SELECT * FROM jobs WHERE id=?", (job_id,)).fetchone()
|
|
||||||
conn.close()
|
|
||||||
if row is None:
|
|
||||||
update_task_status(db_path, task_id, "failed", error=f"Job {job_id} not found")
|
|
||||||
return
|
|
||||||
|
|
||||||
job = dict(row)
|
|
||||||
update_task_status(db_path, task_id, "running")
|
|
||||||
|
|
||||||
try:
|
|
||||||
if task_type == "cover_letter":
|
|
||||||
from scripts.generate_cover_letter import generate
|
|
||||||
result = generate(
|
|
||||||
job.get("title", ""),
|
|
||||||
job.get("company", ""),
|
|
||||||
job.get("description", ""),
|
|
||||||
)
|
|
||||||
update_cover_letter(db_path, job_id, result)
|
|
||||||
|
|
||||||
elif task_type == "company_research":
|
|
||||||
from scripts.company_research import research_company
|
|
||||||
result = research_company(job)
|
|
||||||
save_research(db_path, job_id=job_id, **result)
|
|
||||||
|
|
||||||
else:
|
|
||||||
raise ValueError(f"Unknown task_type: {task_type!r}")
|
|
||||||
|
|
||||||
update_task_status(db_path, task_id, "completed")
|
|
||||||
|
|
||||||
except Exception as exc:
|
|
||||||
update_task_status(db_path, task_id, "failed", error=str(exc))
|
|
||||||
```
|
|
||||||
|
|
||||||
### Step 4: Run tests to verify they pass
|
|
||||||
|
|
||||||
```bash
|
|
||||||
/devl/miniconda3/envs/job-seeker/bin/pytest tests/test_task_runner.py -v
|
|
||||||
```
|
|
||||||
|
|
||||||
Expected: all tests PASS
|
|
||||||
|
|
||||||
### Step 5: Run full test suite
|
|
||||||
|
|
||||||
```bash
|
|
||||||
/devl/miniconda3/envs/job-seeker/bin/pytest tests/ -v
|
|
||||||
```
|
|
||||||
|
|
||||||
Expected: all tests PASS
|
|
||||||
|
|
||||||
### Step 6: Commit
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git add scripts/task_runner.py tests/test_task_runner.py
|
|
||||||
git commit -m "feat: add task_runner — background thread executor for LLM tasks"
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Task 3: Add sidebar task indicator to app/app.py
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `app/app.py`
|
|
||||||
|
|
||||||
No new tests needed — this is pure UI wiring.
|
|
||||||
|
|
||||||
### Step 1: Replace the contents of app/app.py
|
|
||||||
|
|
||||||
Current file is 33 lines. Replace entirely with:
|
|
||||||
|
|
||||||
```python
# app/app.py
"""
Streamlit entry point — uses st.navigation() to control the sidebar.
Main workflow pages are listed at the top; Settings is separated into
a "System" section so it doesn't crowd the navigation.

Run: streamlit run app/app.py
     bash scripts/manage-ui.sh start
"""
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent))

import streamlit as st
from scripts.db import DEFAULT_DB, init_db, get_active_tasks

st.set_page_config(
    page_title="Job Seeker",
    page_icon="💼",
    layout="wide",
)

init_db(DEFAULT_DB)

# ── Background task sidebar indicator ─────────────────────────────────────────
@st.fragment(run_every=3)
def _task_sidebar() -> None:
    tasks = get_active_tasks(DEFAULT_DB)
    if not tasks:
        return
    with st.sidebar:
        st.divider()
        st.markdown(f"**⏳ {len(tasks)} task(s) running**")
        for t in tasks:
            icon = "⏳" if t["status"] == "running" else "🕐"
            label = "Cover letter" if t["task_type"] == "cover_letter" else "Research"
            st.caption(f"{icon} {label} — {t.get('company') or 'unknown'}")

_task_sidebar()

# ── Navigation ─────────────────────────────────────────────────────────────────
pages = {
    "": [
        st.Page("Home.py", title="Home", icon="🏠"),
        st.Page("pages/1_Job_Review.py", title="Job Review", icon="📋"),
        st.Page("pages/4_Apply.py", title="Apply Workspace", icon="🚀"),
        st.Page("pages/5_Interviews.py", title="Interviews", icon="🎯"),
        st.Page("pages/6_Interview_Prep.py", title="Interview Prep", icon="📞"),
    ],
    "System": [
        st.Page("pages/2_Settings.py", title="Settings", icon="⚙️"),
    ],
}

pg = st.navigation(pages)
pg.run()
```

### Step 2: Smoke-test by running the UI

```bash
bash /devl/job-seeker/scripts/manage-ui.sh restart
```

Navigate to http://localhost:8501 and confirm the app loads without error. The sidebar task indicator does not appear when no tasks are running (correct).

### Step 3: Commit

```bash
git add app/app.py
git commit -m "feat: sidebar background task indicator with 3s auto-refresh"
```

---

## Task 4: Update 4_Apply.py to use background generation

**Files:**
- Modify: `app/pages/4_Apply.py`

No new unit tests — covered by existing test suite for DB layer. Smoke-test in browser.

### Step 1: Add imports at the top of 4_Apply.py

After the existing imports block (after `from scripts.db import ...`), add:

```python
from scripts.db import get_task_for_job
from scripts.task_runner import submit_task
```

So the full import block becomes:

```python
from scripts.db import (
    DEFAULT_DB, init_db, get_jobs_by_status,
    update_cover_letter, mark_applied,
    get_task_for_job,
)
from scripts.task_runner import submit_task
```

### Step 2: Replace the Generate button section

Find this block (around line 174–185):

```python
if st.button("✨ Generate / Regenerate", use_container_width=True):
    with st.spinner("Generating via LLM…"):
        try:
            from scripts.generate_cover_letter import generate as _gen
            st.session_state[_cl_key] = _gen(
                job.get("title", ""),
                job.get("company", ""),
                job.get("description", ""),
            )
            st.rerun()
        except Exception as e:
            st.error(f"Generation failed: {e}")
```

Replace with:

```python
_cl_task = get_task_for_job(DEFAULT_DB, "cover_letter", selected_id)
_cl_running = _cl_task and _cl_task["status"] in ("queued", "running")

if st.button("✨ Generate / Regenerate", use_container_width=True, disabled=bool(_cl_running)):
    submit_task(DEFAULT_DB, "cover_letter", selected_id)
    st.rerun()

if _cl_running:
    @st.fragment(run_every=3)
    def _cl_status_fragment():
        t = get_task_for_job(DEFAULT_DB, "cover_letter", selected_id)
        if t and t["status"] in ("queued", "running"):
            lbl = "Queued…" if t["status"] == "queued" else "Generating via LLM…"
            st.info(f"⏳ {lbl}")
        else:
            st.rerun()  # full page rerun — reloads cover letter from DB
    _cl_status_fragment()
elif _cl_task and _cl_task["status"] == "failed":
    st.error(f"Generation failed: {_cl_task.get('error', 'unknown error')}")
```

Also check the session-state initialiser just below (lines 171–172); it must load the value from the DB after a background task completes. The existing code already does this correctly:

```python
if _cl_key not in st.session_state:
    st.session_state[_cl_key] = job.get("cover_letter") or ""
```

This is fine — `job` is fetched fresh on each full-page rerun, so when the background thread writes to `jobs.cover_letter`, the next full rerun picks it up.

### Step 3: Smoke-test in browser

1. Navigate to Apply Workspace
2. Select an approved job
3. Click "Generate / Regenerate"
4. Navigate away to Home
5. Navigate back to Apply Workspace for the same job
6. Observe: button is disabled and "⏳ Generating via LLM…" shows while running; cover letter appears when done

### Step 4: Commit

```bash
git add app/pages/4_Apply.py
git commit -m "feat: cover letter generation runs in background, survives navigation"
```

---

## Task 5: Update 6_Interview_Prep.py to use background research

**Files:**
- Modify: `app/pages/6_Interview_Prep.py`

### Step 1: Add imports at the top of 6_Interview_Prep.py

After the existing `from scripts.db import (...)` block, add:

```python
from scripts.db import get_task_for_job
from scripts.task_runner import submit_task
```

So the full import block becomes:

```python
from scripts.db import (
    DEFAULT_DB, init_db,
    get_interview_jobs, get_contacts, get_research,
    save_research, get_task_for_job,
)
from scripts.task_runner import submit_task
```

### Step 2: Replace the "no research yet" generate button block

Find this block (around line 99–111):

```python
if not research:
    st.warning("No research brief yet for this job.")
    if st.button("🔬 Generate research brief", type="primary", use_container_width=True):
        with st.spinner("Generating… this may take 30–60 seconds"):
            try:
                from scripts.company_research import research_company
                result = research_company(job)
                save_research(DEFAULT_DB, job_id=selected_id, **result)
                st.success("Done!")
                st.rerun()
            except Exception as e:
                st.error(f"Error: {e}")
    st.stop()
else:
```

Replace with:

```python
_res_task = get_task_for_job(DEFAULT_DB, "company_research", selected_id)
_res_running = _res_task and _res_task["status"] in ("queued", "running")

if not research:
    if not _res_running:
        st.warning("No research brief yet for this job.")
        if _res_task and _res_task["status"] == "failed":
            st.error(f"Last attempt failed: {_res_task.get('error', '')}")
        if st.button("🔬 Generate research brief", type="primary", use_container_width=True):
            submit_task(DEFAULT_DB, "company_research", selected_id)
            st.rerun()

    if _res_running:
        @st.fragment(run_every=3)
        def _res_status_initial():
            t = get_task_for_job(DEFAULT_DB, "company_research", selected_id)
            if t and t["status"] in ("queued", "running"):
                lbl = "Queued…" if t["status"] == "queued" else "Generating… this may take 30–60 seconds"
                st.info(f"⏳ {lbl}")
            else:
                st.rerun()
        _res_status_initial()

    st.stop()
else:
```

### Step 3: Replace the "refresh" button block

Find this block (around line 113–124):

```python
generated_at = research.get("generated_at", "")
col_ts, col_btn = st.columns([3, 1])
col_ts.caption(f"Research generated: {generated_at}")
if col_btn.button("🔄 Refresh", use_container_width=True):
    with st.spinner("Refreshing…"):
        try:
            from scripts.company_research import research_company
            result = research_company(job)
            save_research(DEFAULT_DB, job_id=selected_id, **result)
            st.rerun()
        except Exception as e:
            st.error(f"Error: {e}")
```

Replace with:

```python
generated_at = research.get("generated_at", "")
col_ts, col_btn = st.columns([3, 1])
col_ts.caption(f"Research generated: {generated_at}")
if col_btn.button("🔄 Refresh", use_container_width=True, disabled=bool(_res_running)):
    submit_task(DEFAULT_DB, "company_research", selected_id)
    st.rerun()

if _res_running:
    @st.fragment(run_every=3)
    def _res_status_refresh():
        t = get_task_for_job(DEFAULT_DB, "company_research", selected_id)
        if t and t["status"] in ("queued", "running"):
            lbl = "Queued…" if t["status"] == "queued" else "Refreshing research…"
            st.info(f"⏳ {lbl}")
        else:
            st.rerun()
    _res_status_refresh()
elif _res_task and _res_task["status"] == "failed":
    st.error(f"Refresh failed: {_res_task.get('error', '')}")
```

### Step 4: Smoke-test in browser

1. Move a job to Phone Screen on the Interviews page
2. Navigate to Interview Prep, select that job
3. Click "Generate research brief"
4. Navigate away to Home
5. Navigate back — observe "⏳ Generating…" inline indicator
6. Wait for completion — research sections populate automatically

### Step 5: Run full test suite one final time

```bash
/devl/miniconda3/envs/job-seeker/bin/pytest tests/ -v
```

Expected: all tests PASS

### Step 6: Commit

```bash
git add app/pages/6_Interview_Prep.py
git commit -m "feat: company research generation runs in background, survives navigation"
```

---

## Summary of Changes

| File | Change |
|------|--------|
| `scripts/db.py` | Add `CREATE_BACKGROUND_TASKS`, `init_db` call, 4 new helpers |
| `scripts/task_runner.py` | New file — `submit_task` + `_run_task` thread body |
| `app/app.py` | Add `_task_sidebar` fragment with 3s auto-refresh |
| `app/pages/4_Apply.py` | Generate button → `submit_task`; inline status fragment |
| `app/pages/6_Interview_Prep.py` | Generate/Refresh buttons → `submit_task`; inline status fragments |
| `tests/test_db.py` | 9 new tests for background_tasks helpers |
| `tests/test_task_runner.py` | New file — 6 tests for task_runner |
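
For orientation, here is a minimal sketch of the `background_tasks` schema those `scripts/db.py` helpers operate on. The exact column names and defaults are assumptions inferred from how Tasks 2–5 use the table, not the final DDL:

```python
# scripts/db.py (sketch): assumed schema behind CREATE_BACKGROUND_TASKS.
# Only id, job_id, task_type, status and error are implied by the code above;
# the timestamp columns are illustrative.
CREATE_BACKGROUND_TASKS = """
CREATE TABLE IF NOT EXISTS background_tasks (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    job_id     INTEGER NOT NULL,
    task_type  TEXT NOT NULL,                     -- 'cover_letter' | 'company_research'
    status     TEXT NOT NULL DEFAULT 'queued',    -- queued | running | completed | failed
    error      TEXT,
    created_at TEXT DEFAULT (datetime('now')),
    updated_at TEXT DEFAULT (datetime('now'))
);
"""
```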

@@ -1,91 +0,0 @@
# Email Handling Design

**Date:** 2026-02-21
**Status:** Approved

## Problem

IMAP sync already pulls emails for active pipeline jobs, but two gaps exist:
1. Inbound emails suggesting a stage change (e.g. "let's schedule a call") produce no signal — the recruiter's message just sits in the email log.
2. Recruiter outreach to email addresses not yet in the pipeline is invisible — those leads never enter Job Review.

## Goals

- Surface stage-change suggestions inline on the Interviews kanban card (suggest-only, never auto-advance).
- Capture recruiter leads from unmatched inbound email and surface them in Job Review.
- Make email sync a background task triggerable from the UI (Home page + Interviews sidebar).

## Data Model

**No new tables.** Two columns added to `job_contacts`:

```sql
ALTER TABLE job_contacts ADD COLUMN stage_signal TEXT;
ALTER TABLE job_contacts ADD COLUMN suggestion_dismissed INTEGER DEFAULT 0;
```

- `stage_signal` — one of: `interview_scheduled`, `offer_received`, `rejected`, `positive_response`, `neutral` (or NULL if not yet classified).
- `suggestion_dismissed` — 1 when the user clicks Dismiss; prevents the banner re-appearing.

Email leads reuse the existing `jobs` table with `source = 'email'` and `status = 'pending'`. No new columns needed.

## Components

### 1. Stage Signal Classification (`scripts/imap_sync.py`)

After saving each **inbound** contact row, call `phi3:mini` via Ollama to classify the email into one of the five labels. Store the result in `stage_signal`. If classification fails, default to `NULL` (no suggestion shown).

**Model:** `phi3:mini` via `LLMRouter.complete(model_override="phi3:mini", fallback_order=["ollama_research"])`.
Benchmarked at 100% accuracy / 3.0 s per email on a 12-case test suite. The runner-up, Qwen2.5-3B, was not tested, so phi3:mini is the safer choice.
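
A minimal sketch of what the classification call could look like. Only the model override and fallback order come from this design; the prompt wording, passing the prompt positionally to `LLMRouter.complete()`, and the helper name are assumptions:

```python
# scripts/imap_sync.py (sketch): classify an inbound email into one of the five labels.
# The prompt and the complete(prompt, ...) call shape are assumptions, not the final code.
STAGE_LABELS = {"interview_scheduled", "offer_received", "rejected", "positive_response", "neutral"}

def classify_stage_signal(router, email_body: str) -> str | None:
    prompt = (
        "Classify this recruiter email with exactly one label: "
        "interview_scheduled, offer_received, rejected, positive_response, neutral.\n\n"
        f"{email_body[:2000]}\n\nLabel:"
    )
    try:
        reply = router.complete(prompt, model_override="phi3:mini", fallback_order=["ollama_research"])
        label = reply.strip().split()[0].strip(".,").lower()
        return label if label in STAGE_LABELS else None
    except Exception:
        return None  # classification failure: stage_signal stays NULL, no suggestion shown
```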

### 2. Recruiter Lead Extraction (`scripts/imap_sync.py`)

A second pass after per-job sync: scan INBOX broadly for recruitment-keyword emails that don't match any known pipeline company. For each unmatched email, call **Nemotron 1.5B** (already in use for company research) to extract `{company, title}`. If extraction returns a company name not already in the DB, insert a new job row `source='email', status='pending'`.

**Dedup:** checked by `message_id` against all known contacts (cross-job), plus `url` uniqueness on the jobs table (the email lead URL is set to a synthetic `email://<from_domain>/<message_id>` value).
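
A small sketch of the dedup check and the synthetic lead URL described above; the helper names and the `message_id` column lookup on `job_contacts` are illustrative assumptions:

```python
# scripts/imap_sync.py (sketch): synthetic URL + duplicate check for email leads.
def _lead_url(from_addr: str, message_id: str) -> str:
    domain = from_addr.rsplit("@", 1)[-1].lower()
    return f"email://{domain}/{message_id.strip('<>')}"

def _is_duplicate(conn, message_id: str, url: str) -> bool:
    # seen before as a contact on any job, or already inserted as an email lead
    seen_contact = conn.execute(
        "SELECT 1 FROM job_contacts WHERE message_id = ?", (message_id,)
    ).fetchone()
    seen_job = conn.execute("SELECT 1 FROM jobs WHERE url = ?", (url,)).fetchone()
    return bool(seen_contact or seen_job)
```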

### 3. Background Task (`scripts/task_runner.py`)

New task type: `email_sync` with `job_id = 0`.
`submit_task(db, "email_sync", 0)` → daemon thread → `sync_all()` → returns summary via task `error` field.

Deduplication: only one `email_sync` can be queued/running at a time (existing insert_task logic handles this).
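
Roughly, the extra branch in `_run_task` could look like the helper below (pulled out as a standalone function so the sketch is self-contained). Treating `sync_all()`'s return value as a printable summary is an assumption from this design note:

```python
# scripts/task_runner.py (sketch): the email_sync branch, extracted as a helper.
# In the real file this would be another elif alongside cover_letter / company_research.
from scripts.db import update_task_status

def _run_email_sync(db_path: str, task_id: int) -> None:
    from scripts.imap_sync import sync_all
    try:
        summary = sync_all()
        # per this design, the task's error column doubles as the sync summary
        update_task_status(db_path, task_id, "completed", error=str(summary))
    except Exception as exc:
        update_task_status(db_path, task_id, "failed", error=str(exc))
```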

### 4. UI — Sync Button (Home + Interviews)

**Home.py:** New "Sync Emails" section alongside Find Jobs / Score / Notion sync.
**5_Interviews.py:** The sync button already present in the sidebar is converted from a synchronous `sync_all()` call to `submit_task()` + fragment polling (see the sketch below).
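
A minimal sketch of that conversion, reusing the pattern from Tasks 4–5; the button label and using `get_task_for_job(db, "email_sync", 0)` as the lookup are assumptions:

```python
# app/pages/5_Interviews.py (sketch): sidebar sync button converted to a background task.
sync_task = get_task_for_job(DEFAULT_DB, "email_sync", 0)
sync_running = sync_task and sync_task["status"] in ("queued", "running")

with st.sidebar:
    if st.button("📧 Sync emails", use_container_width=True, disabled=bool(sync_running)):
        submit_task(DEFAULT_DB, "email_sync", 0)
        st.rerun()

    if sync_running:
        @st.fragment(run_every=3)
        def _sync_status():
            t = get_task_for_job(DEFAULT_DB, "email_sync", 0)
            if t and t["status"] in ("queued", "running"):
                st.info("⏳ Syncing emails…")
            else:
                st.rerun()
        _sync_status()
```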

### 5. UI — Email Leads (Job Review)

When `show_status == "pending"`, prepend email leads (`source = 'email'`) at the top of the list with a distinct `📧 Email Lead` badge. Actions are identical to scraped pending jobs (Approve / Reject).

### 6. UI — Stage Suggestion Banner (Interviews Kanban)

Inside `_render_card()`, before the advance/reject buttons, check for unseen stage signals:

```
💡 Email suggests: interview_scheduled
From: sarah@company.com · "Let's book a call"
[→ Move to Phone Screen] [Dismiss]
```

- "Move" calls `advance_to_stage()` + `submit_task("company_research")`, then reruns (see the sketch after this list).
- "Dismiss" calls `dismiss_stage_signal(contact_id)`, then reruns.
- Only the most recent undismissed signal is shown per card.
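
A minimal sketch of the banner logic inside `_render_card()`. The `get_latest_stage_signal` helper, the contact field names, the target stage value, and the exact argument lists for `advance_to_stage()` / `dismiss_stage_signal()` are assumptions; only the overall flow comes from this design:

```python
# app/pages/5_Interviews.py (sketch): stage-suggestion banner inside _render_card(job).
signal = get_latest_stage_signal(DEFAULT_DB, job["id"])  # most recent undismissed signal, or None
if signal:
    st.info(
        f"💡 Email suggests: {signal['stage_signal']}\n\n"
        f"From: {signal['email_from']} · \"{signal['subject']}\""
    )
    move_col, dismiss_col = st.columns(2)
    if move_col.button("→ Move to Phone Screen", key=f"move_{job['id']}"):
        advance_to_stage(DEFAULT_DB, job["id"], "phone_screen")  # assumed signature and stage key
        submit_task(DEFAULT_DB, "company_research", job["id"])
        st.rerun()
    if dismiss_col.button("Dismiss", key=f"dismiss_{job['id']}"):
        dismiss_stage_signal(DEFAULT_DB, signal["id"])  # assumed db-path argument
        st.rerun()
```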

## Error Handling

| Failure | Behaviour |
|---------|-----------|
| IMAP connection fails | Error stored in task `error` field; shown as warning in UI after sync |
| Classifier call fails | `stage_signal` left NULL; no suggestion shown; sync continues |
| Lead extractor fails | Email skipped; appended to `result["errors"]`; sync continues |
| Duplicate `email_sync` task | `insert_task` returns existing id; no new thread spawned |
| LLM extraction returns no company | Email silently skipped (not a lead) |

## Out of Scope

- Auto-advancing pipeline stage (suggest only).
- Sending email replies from the app (draft helper already exists).
- OAuth / token-refresh IMAP (config/email.yaml credentials only).