feat: job_quality module — deterministic trust scorer for job listings #48

Open
opened 2026-04-12 17:32:00 -07:00 by pyr0ball · 0 comments

Overview

New MIT module circuitforge_core/job_quality/ that scores a job listing for ghost/shadow/scam risk. Returns a structured JobQualityScore with a 0.0–1.0 trust score and per-signal breakdown.

Intended callers: Peregrine (job review badge), Falcon (benefits listing quality), and any future product that ingests external listings.

Module layout

circuitforge_core/job_quality/
    __init__.py          — exports score_job, JobListing, JobQualityScore
    models.py            — Pydantic input/output models
    signals.py           — individual signal functions (one per indicator)
    scorer.py            — score_job() aggregates signals into final score

Input models

from datetime import date

from pydantic import BaseModel

class JobListing(BaseModel):
    title: str
    company: str
    description: str
    salary_min: int | None = None
    salary_max: int | None = None
    location: str | None = None          # used for transparency-law state check
    apply_url: str | None = None
    date_posted: date | None = None
    source: str | None = None            # board name
    applicant_count: int | None = None   # if provided by board

class JobEnrichment(BaseModel):
    repost_count: int = 0                # times caller has seen this company+title
    is_staffing_agency: bool = False
    layoff_news_snippet: str | None = None   # SearXNG result, caller-provided
    user_company_response_rate: float | None = None  # 0–1, from caller history

Output models

class SignalResult(BaseModel):
    name: str
    fired: bool
    weight: float
    reason: str

class JobQualityScore(BaseModel):
    trust_score: float        # 0.0 (ghost/scam) → 1.0 (trustworthy)
    risk_level: str           # "low" | "medium" | "high" | "critical"
    signals: list[SignalResult]
    top_reasons: list[str]    # 2–3 human-readable reasons for the score
    confidence: float         # 0.0–1.0; low when few signals fired

Signal set

High-weight (deterministic)

| Signal | Fires when | Weight |
|--------|-----------|--------|
| `listing_age` | date_posted > 60 days ago | 0.25 |
| `repost_detected` | repost_count > 1 | 0.25 |
| `no_salary_transparency` | no salary + location in CO/CA/NY/WA/IL/MA | 0.20 |
| `always_open_pattern` | repost_count > 3 or listing age > 90 days | 0.20 |
| `staffing_agency` | is_staffing_agency=True | 0.15 |
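
A minimal sketch of how two of the high-weight signals could look as pure functions, assuming a plain dataclass stand-in for `SignalResult`, a "City, ST" location format, and a `today` argument supplied by the caller so results stay deterministic. The function signatures and the constant name `TRANSPARENCY_LAW_STATES` are illustrative, not part of the spec.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Kept as a module-level constant so the state list is easy to extend.
TRANSPARENCY_LAW_STATES = frozenset({"CO", "CA", "NY", "WA", "IL", "MA"})


@dataclass(frozen=True)
class SignalResult:
    """Stand-in for the Pydantic SignalResult model (assumption)."""
    name: str
    fired: bool
    weight: float
    reason: str


def listing_age(date_posted: Optional[date], today: date) -> SignalResult:
    """Fires when the listing is more than 60 days old."""
    if date_posted is None:
        return SignalResult("listing_age", False, 0.25, "no posting date available")
    age_days = (today - date_posted).days
    return SignalResult("listing_age", age_days > 60, 0.25, f"posted {age_days} days ago")


def no_salary_transparency(
    salary_min: Optional[int], salary_max: Optional[int], location: Optional[str]
) -> SignalResult:
    """Fires when no salary is given and the location is in a transparency-law state."""
    state = None
    if location:
        # Assumption: the state code is the last comma-separated token.
        tail = location.rsplit(",", 1)[-1].strip().upper()
        if tail in TRANSPARENCY_LAW_STATES:
            state = tail
    fired = salary_min is None and salary_max is None and state is not None
    reason = f"no salary listed in {state}" if fired else "salary rule not triggered"
    return SignalResult("no_salary_transparency", fired, 0.20, reason)
```

Passing `today` in rather than calling `date.today()` inside the function keeps each signal independently unit-testable.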

Medium-weight

| Signal | Fires when | Weight |
|--------|-----------|--------|
| `requirement_overload` | regex detects "X+ years" where X > tech age estimate | 0.12 |
| `jd_vagueness` | description < 300 chars or buzzword density > threshold | 0.10 |
| `ats_blackhole` | apply_url domain is known black-hole ATS with no recruiter name | 0.10 |
| `high_applicant_count` | applicant_count > 500 | 0.08 |
| `layoff_news` | layoff_news_snippet is non-empty | 0.12 |
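
The `requirement_overload` check could be sketched roughly as follows. The regex, the helper name `overloaded_requirements`, and the `TECH_FIRST_RELEASE_YEAR` table are all assumptions for illustration; the issue does not say where the "tech age estimate" comes from, and the years below are placeholder values rather than an authoritative list.

```python
import re
from datetime import date

# Hypothetical lookup table; values are illustrative only.
TECH_FIRST_RELEASE_YEAR = {"kubernetes": 2014, "rust": 2015, "react": 2013}

# Matches phrases such as "12+ years of Kubernetes" or "10+ years experience with Rust".
YEARS_PATTERN = re.compile(
    r"(\d+)\+\s*years?\s+(?:of\s+)?(?:experience\s+)?(?:with\s+|in\s+)?([A-Za-z][\w+#]*)",
    re.IGNORECASE,
)


def overloaded_requirements(description: str, today: date) -> list:
    """Return one reason per requirement that asks for more years than the tech has existed."""
    reasons = []
    for match in YEARS_PATTERN.finditer(description):
        years = int(match.group(1))
        tech = match.group(2).lower()
        released = TECH_FIRST_RELEASE_YEAR.get(tech)
        if released is None:
            continue  # unknown technology: no estimate, so no verdict
        tech_age = today.year - released
        if years > tech_age:
            reasons.append(f"asks for {years}+ years of {tech}, which is about {tech_age} years old")
    return reasons
```

The signal would fire when the returned list is non-empty, and the first entry could serve as the `reason` string.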

Low-weight / contextual

| Signal | Fires when | Weight |
|--------|-----------|--------|
| `weekend_posted` | date_posted falls on Sat/Sun | 0.04 |
| `poor_response_history` | user_company_response_rate < 0.05 | 0.08 |
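
The weights across all three tiers sum to more than 1.0, so the aggregation step needs to clamp. One possible aggregation is sketched below, under several assumptions the issue does not fix: trust is `1.0` minus the sum of fired weights, risk-level cutoffs sit at 0.75 / 0.5 / 0.25, confidence is the fraction of signals that had data, and `SignalResult` carries a hypothetical `has_data` flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalResult:
    """Stand-in for the Pydantic model; has_data is a hypothetical extra field."""
    name: str
    fired: bool
    weight: float
    reason: str
    has_data: bool = True


def risk_level_for(trust_score: float) -> str:
    # Assumed cutoffs; the issue names the levels but not the thresholds.
    if trust_score >= 0.75:
        return "low"
    if trust_score >= 0.5:
        return "medium"
    if trust_score >= 0.25:
        return "high"
    return "critical"


def aggregate(signals: list) -> dict:
    """Combine signal results into the fields of JobQualityScore."""
    fired = [s for s in signals if s.fired]
    penalty = sum(s.weight for s in fired)
    # Clamp so trust_score always stays within 0.0 to 1.0.
    trust_score = max(0.0, min(1.0, 1.0 - penalty))
    with_data = sum(1 for s in signals if s.has_data)
    confidence = with_data / len(signals) if signals else 0.0
    top = sorted(fired, key=lambda s: s.weight, reverse=True)[:3]
    return {
        "trust_score": trust_score,
        "risk_level": risk_level_for(trust_score),
        "signals": signals,
        "top_reasons": [s.reason for s in top],
        "confidence": confidence,
    }
```

A subtractive score keeps each signal's contribution easy to explain in `top_reasons`; other combination rules (for example, multiplying survival probabilities) would satisfy the same range constraint.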

Avocet routing

When confidence < 0.5 (few signals, uncertain score), the caller is expected to submit an unlabeled signal to Avocet for human review. JobQualityScore includes confidence explicitly for this purpose.

Avocet label schema for this signal type:

signal_type: job_quality
label: ghost | legitimate | scam | unknown
features: title_hash, company_hash, trust_score, signals_fired[]

This feeds a labeled corpus for a future learned classifier that supplements the deterministic scorer.
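
On the caller side, the routing rule might look like the sketch below. The function name `build_avocet_signal`, the SHA-256 hashing of normalized title and company, and the use of `None` for a not-yet-assigned label are assumptions; the issue specifies only the field names and the `confidence < 0.5` trigger, and Avocet's actual submission API is not shown here.

```python
import hashlib
from typing import Optional


def _stable_hash(text: str) -> str:
    # Assumption: normalize case and whitespace before hashing so reposts collide.
    return hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()


def build_avocet_signal(
    title: str,
    company: str,
    trust_score: float,
    confidence: float,
    signals_fired: list,
    threshold: float = 0.5,
) -> Optional[dict]:
    """Return an unlabeled payload for human review, or None when confidence is high enough."""
    if confidence >= threshold:
        return None
    return {
        "signal_type": "job_quality",
        "label": None,  # a reviewer later assigns ghost | legitimate | scam | unknown
        "features": {
            "title_hash": _stable_hash(title),
            "company_hash": _stable_hash(company),
            "trust_score": trust_score,
            "signals_fired": list(signals_fired),
        },
    }
```

Building the payload as a plain dict keeps the scoring module free of network calls; the caller decides how and when to send it.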

Licensing

Full MIT — no LLM inference in the core module. Callers may optionally pass LLM-derived fields (vagueness score, requirement analysis) as enrichment, but the module itself makes no inference calls.

Acceptance criteria

  • score_job(listing, enrichment=None) -> JobQualityScore implemented and tested
  • All signals independently unit-testable
  • trust_score range validated: always 0.0–1.0
  • confidence low when fewer than 3 signals have data
  • Transparency-law state list is a constant (easy to extend)
  • No network calls, no LLM calls, no file I/O in this module
  • 90%+ test coverage

Related

  • Circuit-Forge/peregrine — shadow listing detection UI (consumer of this module)
  • Circuit-Forge/avocet — job_quality signal labeling queue
  • Circuit-Forge/peregrine#93 — community signal upload (post-outcome contribution)
Reference: Circuit-Forge/circuitforge-core#48