peregrine/docs/reference/config-files.md
pyr0ball 41c7954b9d docs: mkdocs wiki — installation, user guide, developer guide, reference
Adds a full MkDocs documentation site under docs/ with Material theme.

Getting Started: installation walkthrough, 7-step first-run wizard guide,
Docker Compose profile reference with GPU memory guidance and preflight.py
description.

User Guide: job discovery (search profiles, custom boards, enrichment),
job review (sorting, match scores, batch actions), apply workspace (cover
letter gen, PDF export, mark applied), interviews (kanban stages, company
research auto-trigger, survey assistant), email sync (IMAP, Gmail App
Password, classification labels, stage auto-updates), integrations (all 13
drivers with tier requirements), settings (every tab documented).

Developer Guide: contributing (dev env setup, code style, branch naming, PR
checklist), architecture (ASCII layer diagram, design decisions), adding
scrapers (full scrape() interface, registration, search profile config,
test patterns), adding integrations (IntegrationBase full interface, auto-
discovery, tier gating, test patterns), testing (patterns, fixtures, what
not to test).

Reference: tier system (full FEATURES table, can_use/tier_label API, dev
override, adding gates), LLM router (backend types, complete() signature,
fallback chains, vision routing, __auto__ resolution, adding backends),
config files (every file with field-level docs and gitignore status).

Also adds CONTRIBUTING.md at repo root pointing to the docs site.
2026-02-25 12:05:49 -08:00

8.1 KiB

Config Files

All Peregrine configuration lives in the config/ directory. Gitignored files contain secrets or personal data; .example files are committed as templates.


Gitignore Status

File Gitignored Notes
config/user.yaml Yes Personal data + wizard state
config/llm.yaml No LLM backends (no secrets by default)
config/search_profiles.yaml No Search configuration (no secrets)
config/resume_keywords.yaml No Scoring keywords (no secrets)
config/blocklist.yaml No Excluded companies (no secrets)
config/email.yaml Yes IMAP credentials
config/notion.yaml Yes Notion token
config/adzuna.yaml Yes Adzuna API credentials
config/craigslist.yaml Yes Craigslist target cities
config/integrations/*.yaml Yes All integration credentials
.env Yes Docker port and path overrides

config/user.yaml

The primary personal data file. Created by the first-run wizard.

# Identity
name: "Your Name"
email: "you@example.com"
phone: "555-000-0000"
linkedin: "linkedin.com/in/yourprofile"
career_summary: >
  Experienced professional with X years in [field].  

# Privacy
nda_companies: []             # company names to redact from research briefs

# Mission alignment
mission_preferences:
  music: ""                   # personal note injected into cover letter para 3
  animal_welfare: ""
  education: ""

# Research brief options (personal decision-making only)
candidate_accessibility_focus: false    # adds ADA/WCAG/ERG section
candidate_lgbtq_focus: false            # adds LGBTQIA+ inclusion section

# Tier
tier: free                              # free | paid | premium
dev_tier_override: null                 # overrides tier locally for testing

# Wizard state
wizard_complete: false
wizard_step: 0
dismissed_banners: []

# Storage paths
docs_dir: "~/Documents/JobSearch"
ollama_models_dir: "~/models/ollama"
vllm_models_dir: "~/models/vllm"

# Inference
inference_profile: "remote"             # remote | cpu | single-gpu | dual-gpu

# Service connection settings
services:
  streamlit_port: 8501
  ollama_host: localhost
  ollama_port: 11434
  ollama_ssl: false
  ollama_ssl_verify: true
  vllm_host: localhost
  vllm_port: 8000
  vllm_ssl: false
  vllm_ssl_verify: true
  searxng_host: localhost
  searxng_port: 8888
  searxng_ssl: false
  searxng_ssl_verify: true

All personal data access in scripts/ goes through scripts/user_profile.py (UserProfile class) — never read this file directly in scripts.


config/llm.yaml

LLM backend definitions and fallback chains. Not gitignored (contains no secrets by default — API keys come from environment variables).

backends:
  ollama:
    type: openai_compat
    base_url: http://localhost:11434/v1
    api_key: ollama                    # placeholder; Ollama ignores the key
    model: llama3.1:8b
    enabled: true
    supports_images: false

  ollama_research:
    type: openai_compat
    base_url: http://localhost:11434/v1
    api_key: ollama
    model: llama3.1:8b                 # can be a different model for research
    enabled: true
    supports_images: false

  vllm:
    type: openai_compat
    base_url: http://localhost:8000/v1
    api_key: ""
    model: __auto__                    # auto-detect first loaded model
    enabled: true
    supports_images: false

  claude_code:
    type: openai_compat
    base_url: http://localhost:3009/v1
    api_key: any
    model: claude-code-terminal
    enabled: false
    supports_images: true

  github_copilot:
    type: openai_compat
    base_url: http://localhost:3010/v1
    api_key: any
    model: gpt-4o
    enabled: false
    supports_images: false

  anthropic:
    type: anthropic
    api_key_env: ANTHROPIC_API_KEY     # name of environment variable
    model: claude-sonnet-4-6
    enabled: false
    supports_images: true

  vision_service:
    type: vision_service
    base_url: http://localhost:8002
    enabled: true
    supports_images: true

fallback_order:
  - ollama
  - claude_code
  - vllm
  - github_copilot
  - anthropic

research_fallback_order:
  - claude_code
  - vllm
  - ollama_research
  - github_copilot
  - anthropic

vision_fallback_order:
  - vision_service
  - claude_code
  - anthropic

See LLM Router for full documentation.


config/search_profiles.yaml

Defines what jobs to search for. Multiple profiles can coexist.

profiles:
  - name: cs_leadership                # unique profile identifier
    titles:
      - Customer Success Manager
      - Director of Customer Success
    locations:
      - Remote
      - San Francisco Bay Area, CA
    boards:
      - linkedin
      - indeed
      - glassdoor
      - zip_recruiter
      - google
    custom_boards:
      - adzuna
      - theladders
      - craigslist
    exclude_keywords:                   # job titles containing these are dropped
      - sales
      - account executive
      - SDR
    results_per_board: 75
    hours_old: 240                      # only fetch jobs posted in last N hours
    mission_tags:                       # optional: links to mission_preferences
      - music

config/resume_keywords.yaml

Keywords extracted from your resume, used for match scoring. Managed via Settings → Skills.

keywords:
  - Customer Success
  - Churn reduction
  - Salesforce
  - SQL
  - Stakeholder management
  - QBR
  - onboarding

config/blocklist.yaml

Companies or domains to exclude from discovery results entirely.

blocked_companies:
  - "Pyramid Scheme Inc"
  - "Sketchy Startup"

blocked_domains:
  - "mlm-company.com"

config/email.yaml

IMAP email sync credentials. Gitignored. See Email Sync for setup.

host: imap.gmail.com
port: 993
use_ssl: true
username: your.email@gmail.com
password: xxxx-xxxx-xxxx-xxxx     # Gmail App Password (16 chars, no spaces)
sent_folder: ""                    # leave blank to auto-detect
lookback_days: 90
todo_label: ""                     # optional: Gmail label to monitor

config/notion.yaml

Notion integration credentials. Gitignored.

token: "secret_..."               # Notion integration token
database_id: "1bd75cff-..."       # database ID from the URL

# Notion property names → Peregrine field names
field_map:
  title: "Salary"                 # Notion title property (unusual — it's the page title)
  status: "Status of Application"
  company: "Company"
  url: "Role Link"
  source: "Job Source"            # multi_select type
  location: "Location"
  applied_at: "Date Applied"

Field names in Notion are non-obvious. Always read them from field_map rather than guessing.


config/adzuna.yaml

Adzuna Jobs API credentials. Gitignored.

app_id: "12345678"
app_key: "abcdefgh1234567890abcdefgh123456"
country: "us"                      # two-letter country code

Get credentials at developer.adzuna.com.


config/craigslist.yaml

Target city slugs for the Craigslist scraper. Gitignored.

cities:
  - sfbay
  - nyc
  - seattle
  - chicago

Find slugs at https://www.craigslist.org/about/sites.


config/integrations/

One YAML file per integration, created when you test and save credentials in the wizard or Settings. All files in this directory are gitignored.

config/integrations/
  notion.yaml
  airtable.yaml
  google_sheets.yaml
  google_drive.yaml
  dropbox.yaml
  onedrive.yaml
  mega.yaml
  nextcloud.yaml
  google_calendar.yaml
  apple_calendar.yaml
  slack.yaml
  discord.yaml
  home_assistant.yaml

Each file contains only the fields defined by that integration's fields() method. Example for Discord:

webhook_url: "https://discord.com/api/webhooks/..."

.env

Docker port and path overrides. Created from .env.example by setup.sh. Gitignored.

# Ports (change if defaults conflict with existing services)
STREAMLIT_PORT=8501
OLLAMA_PORT=11434
VLLM_PORT=8000
SEARXNG_PORT=8888
VISION_PORT=8002

# GPU settings (written by preflight.py)
RECOMMENDED_PROFILE=single-gpu
CPU_OFFLOAD_GB=0                   # KV cache RAM offload for low-VRAM GPUs