[Discovery] Custom board scrapers — Monster, CareerBuilder, Dice, RemoteOK, WeWorkRemotely #44

Open
opened 2026-03-23 20:08:18 -07:00 by pyr0ball · 0 comments
Owner

Overview

JobSpy supports only these boards: linkedin, indeed, zip_recruiter, glassdoor, google, bayt, naukri, bdjobs. Five additional boards need custom scrapers, following the same pattern as the existing adzuna and theladders modules in scripts/custom_boards/.

Boards to implement

| Board | Notes |
|-------|-------|
| Monster | Large legacy board; public search, no auth required |
| CareerBuilder | Large legacy board; public search |
| Dice | Tech-focused; JSON API available |
| RemoteOK | Remote-only; public JSON API at remoteok.com/remote-jobs.json |
| WeWorkRemotely | Remote-only; RSS feed available |
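
For the RSS-based board, parsing could look roughly like the sketch below. It uses only the standard library and runs against an inline sample feed. The sample feed, the helper name `parse_rss_items`, and the output key names are all assumptions for illustration; the real feed URL and its exact fields are not given in this issue.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; the real WeWorkRemotely feed may differ in shape.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example remote jobs</title>
    <item>
      <title>Acme Corp: Senior Python Developer</title>
      <link>https://example.com/jobs/1</link>
      <pubDate>Mon, 23 Mar 2026 12:00:00 +0000</pubDate>
      <description>Build scrapers.</description>
    </item>
    <item>
      <title>Globex: Data Engineer</title>
      <link>https://example.com/jobs/2</link>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text, results_wanted):
    """Turn RSS <item> elements into job dicts (key names are assumed)."""
    root = ET.fromstring(xml_text)
    jobs = []
    for item in root.iter("item"):
        # Stop once the caller's requested number of results is reached.
        if len(jobs) >= results_wanted:
            break
        jobs.append({
            "title": (item.findtext("title") or "").strip(),
            "url": (item.findtext("link") or "").strip(),
            # Missing optional elements become empty strings.
            "date_posted": (item.findtext("pubDate") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
            "is_remote": True,  # the board is remote-only
        })
    return jobs


if __name__ == "__main__":
    for job in parse_rss_items(SAMPLE_RSS, results_wanted=10):
        print(job["title"], job["url"])
```

Keeping the parser separate from the network fetch makes it testable against a saved copy of the feed.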

Implementation notes

  • Each scraper lives in scripts/custom_boards/<name>.py and exports a scrape(profile, location, results_wanted) function returning a list of job dicts
  • Register in CUSTOM_SCRAPERS dict in scripts/discover.py
  • Add to _all_custom list and _custom_board_labels dict in app/pages/2_Settings.py
  • RemoteOK and WeWorkRemotely ignore location param (remote-only)
  • Follow existing adzuna/theladders scraper interface exactly
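
A minimal sketch of what one such module might look like, using RemoteOK as the example. Only the `scrape(profile, location, results_wanted)` signature, the `CUSTOM_SCRAPERS` name, and the endpoint come from this issue. Everything else is an assumption: the `profile` shape (a dict with a "titles" list), the JSON field names ("position", "company", "url"), the output keys, the `parse_jobs` helper, and the https scheme. The existing adzuna/theladders modules remain the authoritative reference for the interface.

```python
import json
import urllib.request

# Endpoint as given in this issue; the https scheme is an assumption.
REMOTEOK_URL = "https://remoteok.com/remote-jobs.json"


def parse_jobs(payload, profile, results_wanted):
    """Filter decoded JSON entries by title keyword and map them to job dicts.

    Assumed, not confirmed by the issue: `profile` is a dict with a "titles"
    list, and entries carry "position", "company" and "url" fields.
    """
    keywords = [t.lower() for t in profile.get("titles", [])]
    jobs = []
    for entry in payload:
        if len(jobs) >= results_wanted:
            break
        # Skip non-job entries such as a leading metadata object.
        if not isinstance(entry, dict) or "position" not in entry:
            continue
        title = entry["position"]
        if keywords and not any(k in title.lower() for k in keywords):
            continue
        jobs.append({
            "title": title,
            "company": entry.get("company", ""),
            "url": entry.get("url", ""),
            "location": "Remote",
            "source": "remoteok",
        })
    return jobs


def scrape(profile, location, results_wanted):
    """Entry point with the signature named in this issue.

    `location` is accepted for interface compatibility but ignored,
    because the board is remote-only.
    """
    request = urllib.request.Request(
        REMOTEOK_URL, headers={"User-Agent": "Mozilla/5.0"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return parse_jobs(payload, profile, results_wanted)


# Registration sketch: in the project this dict lives in scripts/discover.py;
# the key name and the dict's exact shape are assumptions.
CUSTOM_SCRAPERS = {"remoteok": scrape}
```

Splitting the pure `parse_jobs` step from the network call in `scrape` lets the mapping logic be exercised offline with a canned payload.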

Priority

RemoteOK and WeWorkRemotely are the highest value: both expose free public feeds (a JSON API and an RSS feed, respectively) with no auth required. Dice comes next for its tech-focused roles.

pyr0ball added this to the Public Launch milestone 2026-04-04 16:33:18 -07:00
Reference: Circuit-Forge/peregrine#44