feat: job_quality module — deterministic trust scorer for job listings #48

Closed

opened 2026-04-12 17:32:00 -07:00 by pyr0ball · 0 comments

Overview

New MIT module circuitforge_core/job_quality/ that scores a job listing for ghost/shadow/scam risk. Returns a structured JobQualityScore with a 0.0–1.0 trust score and per-signal breakdown.

Intended callers: Peregrine (job review badge), Falcon (benefits listing quality), and any future product that ingests external listings.

Module layout

circuitforge_core/job_quality/
    __init__.py          — exports score_job, JobListing, JobQualityScore
    models.py            — Pydantic input/output models
    signals.py           — individual signal functions (one per indicator)
    scorer.py            — score_job() aggregates signals into final score

Input models

from datetime import date

from pydantic import BaseModel


class JobListing(BaseModel):
    title: str
    company: str
    description: str
    salary_min: int | None = None
    salary_max: int | None = None
    location: str | None = None          # used for transparency-law state check
    apply_url: str | None = None
    date_posted: date | None = None
    source: str | None = None            # board name
    applicant_count: int | None = None   # if provided by board

class JobEnrichment(BaseModel):
    repost_count: int = 0                # times caller has seen this company+title
    is_staffing_agency: bool = False
    layoff_news_snippet: str | None = None   # SearXNG result, caller-provided
    user_company_response_rate: float | None = None  # 0–1, from caller history

Output models

class SignalResult(BaseModel):
    name: str
    fired: bool
    weight: float
    reason: str

class JobQualityScore(BaseModel):
    trust_score: float        # 0.0 (ghost/scam) → 1.0 (trustworthy)
    risk_level: str           # "low" | "medium" | "high" | "critical"
    signals: list[SignalResult]
    top_reasons: list[str]    # 2–3 human-readable reasons for the score
    confidence: float         # 0.0–1.0; low when few signals have data to evaluate
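
The issue does not say how trust_score maps onto risk_level. A minimal sketch of one possible mapping follows; the function name risk_level_for and the cut-offs 0.75 / 0.5 / 0.25 are assumptions chosen for illustration, not part of the spec.

```python
def risk_level_for(trust_score: float) -> str:
    """Map a 0.0-1.0 trust score to one of the four risk buckets.

    The cut-offs are illustrative assumptions, not values from the issue.
    """
    if not 0.0 <= trust_score <= 1.0:
        raise ValueError("trust_score must be within 0.0-1.0")
    if trust_score >= 0.75:
        return "low"
    if trust_score >= 0.5:
        return "medium"
    if trust_score >= 0.25:
        return "high"
    return "critical"
```

Rejecting out-of-range input here would also give the "always 0.0–1.0" acceptance criterion a second line of defence.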

Signal set

High-weight (deterministic)

Signal	Fires when	Weight
`listing_age`	date_posted is more than 60 days ago	0.25
`repost_detected`	repost_count > 1	0.25
`no_salary_transparency`	no salary + location in CO/CA/NY/WA/IL/MA	0.20
`always_open_pattern`	repost_count > 3 or listing age > 90 days	0.20
`staffing_agency`	is_staffing_agency=True	0.15
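
Two of the high-weight signals could be sketched roughly as below. This is an illustration under stated assumptions: the dataclass stands in for the Pydantic SignalResult so the snippet runs on the standard library alone, `today` is passed in explicitly to keep the function deterministic, and the "City, ST" location format is a guess about what job boards provide.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SignalResult:
    """Stdlib stand-in for the Pydantic SignalResult model."""
    name: str
    fired: bool
    weight: float
    reason: str


# Module-level constant so adding a state is a one-line change.
TRANSPARENCY_LAW_STATES = frozenset({"CO", "CA", "NY", "WA", "IL", "MA"})

# Assumption: locations end in ", ST" (e.g. "Denver, CO"); real data may need more parsing.
_STATE_SUFFIX = re.compile(r",\s*([A-Z]{2})\s*$")


def listing_age(date_posted: Optional[date], today: date) -> SignalResult:
    """Fire when the listing is more than 60 days old relative to `today`."""
    if date_posted is None:
        return SignalResult("listing_age", False, 0.25, "no posting date available")
    age_days = (today - date_posted).days
    return SignalResult("listing_age", age_days > 60, 0.25, f"posted {age_days} days ago")


def no_salary_transparency(
    salary_min: Optional[int], salary_max: Optional[int], location: Optional[str]
) -> SignalResult:
    """Fire when no salary is given and the location is in a transparency-law state."""
    match = _STATE_SUFFIX.search(location or "")
    state = match.group(1) if match else None
    fired = salary_min is None and salary_max is None and state in TRANSPARENCY_LAW_STATES
    reason = f"no salary listed in {state}" if fired else "salary present or state not covered"
    return SignalResult("no_salary_transparency", fired, 0.20, reason)
```

Returning a SignalResult even when the signal does not fire keeps the per-signal breakdown complete and lets each function be unit-tested in isolation.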

Medium-weight

Signal	Fires when	Weight
`requirement_overload`	regex detects "X+ years" of a technology where X exceeds the technology's estimated age	0.12
`jd_vagueness`	description < 300 chars or buzzword density > threshold	0.10
`ats_blackhole`	apply_url domain is known black-hole ATS with no recruiter name	0.10
`high_applicant_count`	applicant_count > 500	0.08
`layoff_news`	layoff_news_snippet is non-empty	0.12
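
The requirement_overload check could look roughly like the sketch below. The regex, the helper name overloaded_requirements, and the small TECH_RELEASE_YEAR table are all assumptions for illustration; the issue does not specify how the "tech age estimate" is obtained.

```python
import re

# Hypothetical lookup of first public release years; a real table would be larger.
TECH_RELEASE_YEAR = {"kubernetes": 2014, "rust": 2015, "go": 2009}

# Matches phrases such as "15+ years of experience with Kubernetes" or "5+ years in Go".
YEARS_PATTERN = re.compile(
    r"(\d+)\+\s*years?\s+(?:of\s+)?(?:experience\s+)?(?:with|in)\s+([A-Za-z]+)",
    re.IGNORECASE,
)


def overloaded_requirements(description: str, current_year: int) -> list[str]:
    """Return one finding per requirement that asks for more years than the tech has existed."""
    findings = []
    for years, tech in YEARS_PATTERN.findall(description):
        release_year = TECH_RELEASE_YEAR.get(tech.lower())
        if release_year is None:
            continue  # unknown technology: no age estimate, so no verdict
        tech_age = current_year - release_year
        if int(years) > tech_age:
            findings.append(
                f"{years}+ years of {tech} requested, but it is about {tech_age} years old"
            )
    return findings
```

Passing current_year as an argument, rather than reading the clock inside the function, keeps the signal deterministic and easy to test.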

Low-weight / contextual

Signal	Fires when	Weight
`weekend_posted`	date_posted falls on Sat/Sun	0.04
`poor_response_history`	user_company_response_rate < 0.05	0.08
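
The issue does not state the aggregation formula. Note that the twelve weights above sum to 1.69, so a plain "1 minus the sum of fired weights" can go negative and needs clamping. A minimal sketch under that assumption follows; the Signal tuple stands in for SignalResult, and the subtract-and-clamp rule is a guess, not the confirmed design.

```python
from typing import NamedTuple


class Signal(NamedTuple):
    """Stdlib stand-in for the SignalResult model."""
    name: str
    fired: bool
    weight: float
    reason: str


def aggregate(signals: list[Signal]) -> tuple[float, list[str]]:
    """Return (trust_score, top_reasons) from a list of evaluated signals.

    Assumed rule: subtract the weights of fired signals from 1.0 and clamp to 0.0-1.0.
    """
    fired = [s for s in signals if s.fired]
    penalty = sum(s.weight for s in fired)
    trust_score = max(0.0, min(1.0, 1.0 - penalty))
    # Heaviest fired signals first; keep at most three reasons.
    top = sorted(fired, key=lambda s: s.weight, reverse=True)[:3]
    return round(trust_score, 4), [s.reason for s in top]
```

With this rule, a listing that fires only listing_age (0.25) and staffing_agency (0.15) would land at a trust score of about 0.6.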

Avocet routing

When confidence < 0.5 (few signals, uncertain score), the caller is expected to submit an unlabeled signal to Avocet for human review. JobQualityScore includes confidence explicitly for this purpose.

Avocet label schema for this signal type:

signal_type: job_quality
label: ghost | legitimate | scam | unknown
features: title_hash, company_hash, trust_score, signals_fired[]

This feeds a labeled corpus for a future learned classifier that supplements the deterministic scorer.
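
On the caller side, the routing rule and payload might be sketched as follows. The function names, the use of SHA-256 over a normalised string, and the choice to leave label empty until review are assumptions; only the field names and the 0.5 threshold come from this issue.

```python
import hashlib
from typing import Optional

REVIEW_THRESHOLD = 0.5  # from the routing rule: confidence < 0.5 goes to human review


def needs_review(confidence: float) -> bool:
    """True when the score is uncertain enough to route to Avocet."""
    return confidence < REVIEW_THRESHOLD


def _hash(text: str) -> str:
    # Normalise case and whitespace so trivially different spellings hash the same.
    return hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()


def build_avocet_signal(
    title: str, company: str, trust_score: float, signals_fired: list[str]
) -> dict:
    """Build an unlabeled job_quality signal matching the schema above."""
    label: Optional[str] = None  # filled in by a reviewer: ghost | legitimate | scam | unknown
    return {
        "signal_type": "job_quality",
        "label": label,
        "features": {
            "title_hash": _hash(title),
            "company_hash": _hash(company),
            "trust_score": trust_score,
            "signals_fired": list(signals_fired),
        },
    }
```

Hashing the title and company keeps raw listing text out of the labeling queue while still letting repeat listings be grouped.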

Licensing

Full MIT — no LLM inference in the core module. Callers may optionally pass LLM-derived fields (vagueness score, requirement analysis) as enrichment, but the module itself makes no inference calls.

Acceptance criteria

score_job(listing, enrichment=None) -> JobQualityScore implemented and tested
All signals independently unit-testable
trust_score range validated: always 0.0–1.0
confidence is low (below 0.5) when fewer than 3 signals have data
Transparency-law state list is a constant (easy to extend)
No network calls, no LLM calls, no file I/O in this module
90%+ test coverage
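
One way to satisfy the confidence criterion is a saturating coverage ratio, sketched below. The name confidence_from_coverage and the saturation point of 6 signals are assumptions; they were picked so that fewer than 3 signals with data always falls below the 0.5 Avocet routing threshold.

```python
FULL_CONFIDENCE_AT = 6  # hypothetical: signals with data needed to reach confidence 1.0


def confidence_from_coverage(signals_with_data: int) -> float:
    """Return a 0.0-1.0 confidence that grows with the number of signals that had data."""
    if signals_with_data < 0:
        raise ValueError("signals_with_data cannot be negative")
    return min(1.0, signals_with_data / FULL_CONFIDENCE_AT)
```

A test for the criterion can then assert that every count from 0 to 2 yields a value below 0.5.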

Related

Circuit-Forge/peregrine: shadow listing detection UI (consumer of this module)
Circuit-Forge/avocet: job_quality signal labeling queue
Circuit-Forge/peregrine#93: community signal upload (post-outcome contribution)