feat: shadow listing detector — trust badge in Job Review UI #95

Closed
opened 2026-04-12 17:32:29 -07:00 by pyr0ball · 0 comments
Owner

Overview

Integrate the circuitforge_core.job_quality scorer (cf-core#48) into the Job Review pipeline to flag ghost jobs, evergreen repost traps, and other timewasters before the user spends time applying.

User-visible feature

  • Trust badge on each job card in Job Review: colored dot + label (Trustworthy / Caution / Likely Ghost / Scam Risk)
  • Tooltip showing top 2 reasons from JobQualityScore.top_reasons
  • Filter chip in Job Review: "Hide likely ghosts" (hides listings with risk_level of high or above)
  • Auto-deprioritizes listings with risk_level == critical by moving them to the bottom of the review queue (does not auto-reject)
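
The filter and queue behavior above could look roughly like this. It is a minimal sketch, not the Peregrine implementation: only high and critical are named in this issue, so the four-level ordering, the level-to-label mapping, and the helper names are all assumptions.

```python
# Hypothetical sketch: level names other than "high" and "critical",
# and the level-to-label mapping, are assumptions for illustration.
RISK_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}
BADGE_LABELS = {
    "low": "Trustworthy",
    "medium": "Caution",
    "high": "Likely Ghost",
    "critical": "Scam Risk",
}


def hide_likely_ghosts(jobs):
    """Drop listings whose risk_level is high or above."""
    threshold = RISK_ORDER["high"]
    return [j for j in jobs if RISK_ORDER[j["risk_level"]] < threshold]


def order_review_queue(jobs):
    """Move critical listings to the bottom without rejecting them.

    sorted() is stable, so the existing order is kept within each group.
    """
    return sorted(jobs, key=lambda j: j["risk_level"] == "critical")


queue = [
    {"id": 1, "risk_level": "critical"},
    {"id": 2, "risk_level": "low"},
    {"id": 3, "risk_level": "high"},
    {"id": 4, "risk_level": "medium"},
]
visible = hide_likely_ghosts(queue)
ordered = order_review_queue(queue)
```

An explicit ordering table avoids comparing the TEXT risk_level values as strings, which would sort them alphabetically instead of by severity.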

Data gathering (Peregrine side)

Peregrine gathers enrichment data that cf-core cannot produce itself:

| Field | Source |
|-------|--------|
| repost_count | staging.db: count of rows matching company + normalized_title |
| is_staffing_agency | company name pattern match ("Staffing", "Recruiting", "Talent", "Search Group") |
| layoff_news_snippet | SearXNG: existing company_research.py adds a layoff news query |
| user_company_response_rate | staging.db: (responded count / applied count) per company |
| applicant_count | Parsed from JobSpy metadata where available |
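
A rough sketch of the three locally computed fields, assuming a SQLite jobs table with company, normalized_title, applied, and responded columns. The schema, the normalize_title helper, the word-boundary matching, and the explicit conn argument are hypothetical; the real helpers in scripts/db.py may differ.

```python
import re
import sqlite3

# Patterns come from the issue; word-boundary matching is an added assumption.
_AGENCY_RE = re.compile(
    r"\b(staffing|recruiting|talent|search group)\b", re.IGNORECASE
)


def is_agency(company):
    """True when the company name contains one of the agency keywords."""
    return bool(_AGENCY_RE.search(company or ""))


def normalize_title(title):
    """Hypothetical normalizer: lowercase and collapse whitespace."""
    return " ".join(title.lower().split())


def get_repost_count(conn, company, title):
    """Count rows sharing this company and normalized title."""
    row = conn.execute(
        "SELECT COUNT(*) FROM jobs WHERE company = ? AND normalized_title = ?",
        (company, normalize_title(title)),
    ).fetchone()
    return row[0]


def get_company_response_rate(conn, company):
    """responded / applied for one company, or None with no applications."""
    applied, responded = conn.execute(
        "SELECT COALESCE(SUM(applied), 0), COALESCE(SUM(responded), 0) "
        "FROM jobs WHERE company = ?",
        (company,),
    ).fetchone()
    return responded / applied if applied else None


# In-memory demo data (assumed schema, not the real staging.db layout).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (company TEXT, normalized_title TEXT, "
    "applied INTEGER, responded INTEGER)"
)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?, ?)",
    [
        ("Acme", "data engineer", 1, 1),
        ("Acme", "data engineer", 1, 0),
        ("Acme", "analyst", 0, 0),
    ],
)
```

Returning None rather than 0.0 when the user has never applied to a company keeps "no data" distinct from "never responded".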

Avocet routing

When confidence < 0.5, submit an unlabeled job_quality signal to Avocet for human review:

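# in scripts/job_quality_bridge.py
import hashlib

def _sha256_hex(text):
    # hashlib.sha256 takes bytes and returns a hash object, so encode
    # the input and send the hex digest string
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Only low-confidence scores go to the human-review queue
if score.confidence < 0.5:
    avocet_client.submit({
        "signal_type": "job_quality",
        "label": "unknown",
        "features": {
            # hashes of the normalized title and company, not the raw text
            "title_hash": _sha256_hex(normalize(job.title)),
            "company_hash": _sha256_hex(normalize(job.company)),
            "trust_score": score.trust_score,
            "signals_fired": [s.name for s in score.signals if s.fired],
        }
    })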

High-confidence scores (confidence >= 0.8) with a clear ghost/scam label also route to Avocet as pre-labeled training data (no human review needed).

Avocet connection is gated on the same opt-in toggle as issue #93 (community signal contribution). One consent screen covers both.
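
The routing rules in this section can be summarized as one decision function. This is a sketch under stated assumptions: the function name, the return values, and the reading of "clear ghost/scam label" as risk_level high or critical are hypothetical, not part of the existing bridge.

```python
from typing import Optional

LOW_CONFIDENCE = 0.5   # below this: submit as unlabeled for human review
HIGH_CONFIDENCE = 0.8  # at or above this: submit as pre-labeled data


def avocet_route(confidence: float, risk_level: str, opted_in: bool) -> Optional[str]:
    """Return "unlabeled", "prelabeled", or None (nothing is sent)."""
    if not opted_in:
        # The shared opt-in toggle gates every submission.
        return None
    if confidence < LOW_CONFIDENCE:
        return "unlabeled"
    # Assumption: a "clear ghost/scam label" means high or critical risk.
    if confidence >= HIGH_CONFIDENCE and risk_level in ("high", "critical"):
        return "prelabeled"
    return None
```

Scores with confidence between 0.5 and 0.8 are not submitted under this reading, since the issue describes routing only for the two outer bands.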

Scoring call

from circuitforge_core.job_quality import score_job, JobListing, JobEnrichment

listing = JobListing(
    title=job["title"],
    company=job["company"],
    description=job["description"],
    salary_min=job.get("salary_min"),
    salary_max=job.get("salary_max"),
    location=job.get("location"),
    apply_url=job.get("url"),
    date_posted=job.get("date_found"),
    applicant_count=job.get("applicant_count"),
)
enrichment = JobEnrichment(
    repost_count=db.get_repost_count(job["company"], job["title"]),
    is_staffing_agency=_is_agency(job["company"]),
    layoff_news_snippet=research.get("layoff_snippet"),
    user_company_response_rate=db.get_company_response_rate(job["company"]),
)
score = score_job(listing, enrichment)

DB change

Add trust_score REAL and risk_level TEXT columns to the jobs table (migration 005). Scores are cached and recomputed only when listing data changes or on manual refresh.
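
A minimal sketch of what migration 005 and the cache check might look like with SQLite. The idempotent column check, the fingerprint approach to detecting changed listing data, and both function names are assumptions; the issue does not specify how a change is detected.

```python
import hashlib
import json
import sqlite3


def migrate_005(conn: sqlite3.Connection) -> None:
    """Add trust_score and risk_level to jobs if they are missing."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute("PRAGMA table_info(jobs)")}
    if "trust_score" not in existing:
        conn.execute("ALTER TABLE jobs ADD COLUMN trust_score REAL")
    if "risk_level" not in existing:
        conn.execute("ALTER TABLE jobs ADD COLUMN risk_level TEXT")
    conn.commit()


def listing_fingerprint(job: dict) -> str:
    """Hash the fields that feed the scorer; a changed hash means rescore."""
    fields = ("title", "company", "description",
              "salary_min", "salary_max", "location", "url")
    payload = json.dumps({k: job.get(k) for k in fields}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Demo against an in-memory table (assumed minimal schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, title TEXT, company TEXT)")
migrate_005(conn)
migrate_005(conn)  # second run finds both columns and does nothing
```

Storing the fingerprint next to the cached score would let the pipeline skip rescoring unchanged listings; where it is stored is left open here.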

Tier

Badge display: Free (deterministic, no LLM)
Avocet routing and community contribution: Free opt-in
Future learned classifier (LLM-enhanced vagueness): Paid (behind BYOK gate)

Acceptance criteria

  • Trust badge visible on every job card in Job Review
  • Tooltip shows top 2 reasons
  • "Hide likely ghosts" filter chip works
  • Critical-risk listings sort to bottom of queue
  • repost_count query in scripts/db.py
  • get_company_response_rate() query in scripts/db.py
  • Avocet bridge: low-confidence unknowns submitted when opt-in enabled
  • High-confidence ghost/scam scores submitted to Avocet as pre-labeled data
  • Migration 005 adds trust_score + risk_level columns
  • Unit tests for enrichment gathering helpers

Related

  • Circuit-Forge/circuitforge-core#48: job_quality scorer (dependency)
  • Circuit-Forge/peregrine#93: community signal contribution (shared opt-in toggle)
  • Circuit-Forge/avocet: job_quality labeling queue
Reference: Circuit-Forge/peregrine#95