From a005397d5ddf1181c7493fb033f23e5c8039909d Mon Sep 17 00:00:00 2001
From: pyr0ball
Date: Sun, 15 Mar 2026 09:39:48 -0700
Subject: [PATCH] =?UTF-8?q?docs:=20update=20spec=20=E2=80=94=20Jobgether?=
 =?UTF-8?q?=20discovery=20scraper=20not=20viable=20(Cloudflare=20+=20robot?=
 =?UTF-8?q?s.txt)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../specs/2026-03-15-jobgether-integration-design.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/superpowers/specs/2026-03-15-jobgether-integration-design.md b/docs/superpowers/specs/2026-03-15-jobgether-integration-design.md
index 3a73ad4..dd0ac41 100644
--- a/docs/superpowers/specs/2026-03-15-jobgether-integration-design.md
+++ b/docs/superpowers/specs/2026-03-15-jobgether-integration-design.md
@@ -154,9 +154,10 @@ Implementation: add an `is_jobgether` flag to the cover letter prompt context (s
 ## Out of Scope
 
 - Retroactively fixing existing `company = "Jobgether"` rows in the DB (left for manual review/rejection)
+- Jobgether discovery scraper — **decided against during implementation (2026-03-15)**: Cloudflare Turnstile blocks all headless browsers on all Jobgether pages; `filter-api.jobgether.com` requires auth; `robots.txt` blocks all bots. The email digest → manual URL paste → slug company extraction flow covers the actual use case.
 - Jobgether authentication / logged-in scraping
-- Pagination beyond `results_wanted` cap
-- Dedup between Jobgether scraper and other boards (existing URL dedup in `discover.py` handles this)
+- Pagination
+- Dedup between Jobgether and other boards (existing URL dedup handles this)
 
 ---
 