docs: update spec — Jobgether discovery scraper not viable (Cloudflare + robots.txt)

2026-03-15 09:39:48 -07:00 · 4d08e64acf
commit 4d08e64acf
parent fc6ef88a05
1 changed files with 3 additions and 2 deletions
--- a/docs/superpowers/specs/2026-03-15-jobgether-integration-design.md
+++ b/docs/superpowers/specs/2026-03-15-jobgether-integration-design.md
@@ -154,9 +154,10 @@ Implementation: add an `is_jobgether` flag to the cover letter prompt context (s
 ## Out of Scope

 - Retroactively fixing existing `company = "Jobgether"` rows in the DB (left for manual review/rejection)
+- Jobgether discovery scraper — **decided against during implementation (2026-03-15)**: Cloudflare Turnstile blocks all headless browsers on all Jobgether pages; `filter-api.jobgether.com` requires auth; `robots.txt` blocks all bots. The email digest → manual URL paste → slug company extraction flow covers the actual use case.
 - Jobgether authentication / logged-in scraping
-- Pagination beyond `results_wanted` cap
-- Dedup between Jobgether scraper and other boards (existing URL dedup in `discover.py` handles this)
+- Pagination
+- Dedup between Jobgether and other boards (existing URL dedup handles this)

 ---