# Job Discovery

Peregrine discovers new job listings by running search profiles against multiple job boards simultaneously. Results are deduplicated by URL and stored in the local SQLite database (`staging.db`).

---

## How Discovery Works

1. **Search profiles** in `config/search_profiles.yaml` define what to search for
2. The Home page **Run Discovery** button triggers `scripts/discover.py`
3. `discover.py` calls each configured board (standard + custom) for each active profile
4. Results are inserted into the `jobs` table with status `pending`
5. Jobs with URLs already in the database are silently skipped (URL is the unique key)
6. After insertion, `scripts/match.py` runs keyword scoring on all new jobs

---

## Search Profiles

Profiles are defined in `config/search_profiles.yaml`. You can have multiple profiles running simultaneously.

### Profile fields

```yaml
profiles:
  - name: cs_leadership        # unique identifier
    titles:
      - Customer Success Manager
      - Director of Customer Success
    locations:
      - Remote
      - San Francisco Bay Area, CA
    boards:
      - linkedin
      - indeed
      - glassdoor
      - zip_recruiter
      - google
    custom_boards:
      - adzuna
      - theladders
      - craigslist
    exclude_keywords:          # titles containing these words are dropped
      - sales
      - account executive
      - SDR
    results_per_board: 75      # max jobs per board per run
    hours_old: 240             # only fetch jobs posted in last N hours
    mission_tags:              # optional; triggers mission-alignment cover letter hints
      - music
```

### Adding a new profile

Open `config/search_profiles.yaml` and add an entry under `profiles:`. The next discovery run picks it up automatically; no restart is required.

### Mission tags

`mission_tags` links a profile to industries you care about. When cover letters are generated for jobs from a mission-tagged profile, the LLM prompt includes a personal alignment note (configured in `config/user.yaml` under `mission_preferences`).

Supported tags: `music`, `animal_welfare`, `education`.

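
Looking back at the profile as a whole, the `exclude_keywords` filter and the URL-based deduplication from the discovery steps can be illustrated with a minimal sketch. This is not the code in `scripts/discover.py`. The function names, the three-column `jobs` schema, and the case-insensitive substring match are all assumptions made for illustration; only the `pending` status, the unique URL, and the silent skip of known URLs come from the description above.

```python
import sqlite3


def is_excluded(title, exclude_keywords):
    """Return True if any exclude keyword occurs in the title.

    Assumption: matching is a case-insensitive substring test.
    """
    lowered = title.lower()
    return any(keyword.lower() in lowered for keyword in exclude_keywords)


def insert_new_jobs(conn, jobs, exclude_keywords):
    """Insert jobs with status 'pending'.

    Returns a tuple (inserted, already_known, dropped_by_keyword).
    """
    inserted = already_known = dropped = 0
    for job in jobs:
        if is_excluded(job["title"], exclude_keywords):
            dropped += 1
            continue
        # INSERT OR IGNORE silently skips rows that violate the URL uniqueness
        # constraint, so a URL already in the table is left untouched.
        cursor = conn.execute(
            "INSERT OR IGNORE INTO jobs (url, title, status) "
            "VALUES (?, ?, 'pending')",
            (job["url"], job["title"]),
        )
        if cursor.rowcount == 1:
            inserted += 1
        else:
            already_known += 1
    conn.commit()
    return inserted, already_known, dropped


def make_demo_db():
    """Create an in-memory database with a hypothetical, simplified jobs table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs ("
        "url TEXT PRIMARY KEY, title TEXT NOT NULL, status TEXT NOT NULL)"
    )
    return conn
```

With a substring match, a keyword such as `sales` would also drop a title like "Salesforce Administrator", so a real filter may prefer whole-word matching.
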

---

## Standard Job Boards

These boards are powered by the [JobSpy](https://github.com/Bunsly/JobSpy) library:

| Board key | Source |
|-----------|--------|
| `linkedin` | LinkedIn Jobs |
| `indeed` | Indeed |
| `glassdoor` | Glassdoor |
| `zip_recruiter` | ZipRecruiter |
| `google` | Google Jobs |

---

## Custom Job Board Scrapers

Custom scrapers are in `scripts/custom_boards/`. They are registered in `discover.py` and activated per-profile via the `custom_boards` list.

| Key | Source | Notes |
|-----|--------|-------|
| `adzuna` | [Adzuna Jobs API](https://developer.adzuna.com/) | Requires `config/adzuna.yaml` with `app_id` and `app_key` |
| `theladders` | The Ladders | SSR scraper via `curl_cffi`; no credentials needed |
| `craigslist` | Craigslist | Requires `config/craigslist.yaml` with target city slugs |

To add your own scraper, see [Adding a Scraper](../developer-guide/adding-scrapers.md).

---

## Running Discovery

### From the UI

1. Open the **Home** page
2. Click **Run Discovery**
3. Peregrine runs all active search profiles in sequence
4. A progress bar shows board-by-board status
5. A summary shows how many new jobs were inserted vs. already known

### From the command line

```bash
conda run -n job-seeker python scripts/discover.py
```

---

## Filling Missing Descriptions

Some boards (particularly Glassdoor) return only a short description snippet. Click **Fill Missing Descriptions** on the Home page to trigger the `enrich_descriptions` background task.

The enricher visits each job URL and attempts to extract the full description from the page HTML. This runs as a background task so you can continue using the UI.

You can also enrich a specific job from the Job Review page by clicking the refresh icon next to its description.

---

## Keyword Matching

After discovery, `scripts/match.py` scores each new job by comparing the job description against your resume keywords (from `config/resume_keywords.yaml`). The score is stored as `match_score` (0–100). Gaps are stored as `keyword_gaps` (comma-separated missing keywords). Both fields appear in the Job Review queue and can be used to sort and prioritise jobs.
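
To show how a 0–100 score and a comma-separated gap list could be derived from a description, here is a minimal sketch. It is not the logic in `scripts/match.py`. The function name `score_job`, the scoring formula (percentage of resume keywords found in the description), whole-word case-insensitive matching, and the `", "` separator are all assumptions chosen for illustration.

```python
import re


def score_job(description, resume_keywords):
    """Return (match_score, keyword_gaps) for one job description.

    Assumption: the score is the share of resume keywords that appear in the
    description, scaled to 0-100 and rounded to the nearest integer.
    """
    if not resume_keywords:
        return 0, ""

    text = description.lower()
    missing = []
    for keyword in resume_keywords:
        # Lookarounds instead of \b so keywords ending in symbols
        # (for example "c++") still match as whole tokens.
        pattern = r"(?<!\w)" + re.escape(keyword.lower()) + r"(?!\w)"
        if not re.search(pattern, text):
            missing.append(keyword)

    found_count = len(resume_keywords) - len(missing)
    match_score = round(100 * found_count / len(resume_keywords))
    return match_score, ", ".join(missing)


example = score_job(
    "We need onboarding and churn reduction experience with Salesforce.",
    ["onboarding", "churn reduction", "Salesforce", "SQL"],
)
print(example)  # (75, 'SQL')
```

Three of the four keywords appear in the example description, giving a score of 75 with `SQL` reported as the only gap.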