feat: shadow-listing flagger — contribute listing quality signals to community DB for fine-tuning #93

Closed
opened 2026-04-12 16:17:23 -07:00 by pyr0ball · 0 comments
Owner

Overview

When a job listing is flagged (ghost job, scam, or confirmed legitimate with a known outcome), Peregrine contributes an anonymized listing_quality signal to the cf-core community DB, provided the user has opted in. These signals build a fine-tuning corpus for a job-listing quality classifier.

Background

Peregrine already detects patterns associated with ghost postings and scam listings. Pooling these signals across users (opt-in) produces a labeled training corpus that improves classifier quality for everyone. The cf-core community module (Kiwi shared meal plan design, 2026-04-12) provides the CommunitySignal base model and PostgreSQL store.

Signal schema (listing_quality)

{
  "signal_type": "listing_quality",
  "label": "ghost" | "scam" | "legitimate" | "unknown",
  "features": {
    "title_hash": str,          # SHA-256 of normalized title (no PII)
    "company_name_hash": str,   # SHA-256 of normalized company name
    "jd_length_chars": int,
    "salary_disclosed": bool,
    "apply_url_domain": str,    # domain only, not full URL
    "response_received": bool | None,   # outcome if known
    "days_to_response": int | None,
    "ats_platform": str | None,
    "repost_count": int,        # times this listing was seen
  },
  "outcome": "ghosted" | "responded" | "rejected" | "offer" | None
}

The signal carries no PII: no applicant name, email, or resume content. Title and company name are hashed before transmission.
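
As a rough illustration of the hashing step, the sketch below derives title_hash, company_name_hash, and apply_url_domain. The normalization rule (trim, lowercase, collapse whitespace) and the helper names are assumptions made for this example; the ticket does not define them.

```python
import hashlib
import re
from urllib.parse import urlparse


def normalize(text: str) -> str:
    """Trim, lowercase, and collapse whitespace (assumed normalization rule)."""
    return re.sub(r"\s+", " ", text.strip().lower())


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hex digest of the normalized text."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def url_domain(url: str) -> str:
    """Keep only the host of the apply URL; path and query string are dropped."""
    return (urlparse(url).hostname or "").lower()


# Illustrative values only; none of these come from real listings.
features = {
    "title_hash": sha256_hex("  Senior   Data Engineer "),
    "company_name_hash": sha256_hex("Example Corp"),
    "apply_url_domain": url_domain("https://jobs.example.com/apply/123?ref=abc"),
}
```

Normalizing before hashing makes trivially different spellings of the same title map to the same hash. A plain SHA-256 of a short string can be matched against a dictionary of known titles or company names; the ticket does not say whether a salted or keyed hash is intended.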

Tier and consent

  • Contribution is opt-in for all tiers, surfaced in Settings under "Help improve Peregrine"
  • Users can withdraw consent at any time; their signals are deleted from the community DB
  • Plain-language consent copy required before first contribution
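
The consent rules above could be modeled roughly as follows. This is a minimal sketch: ContributionSettings, may_contribute, and withdraw are hypothetical names, and the real deletion would happen in the community DB rather than in a local list.

```python
from dataclasses import dataclass


@dataclass
class ContributionSettings:
    """Hypothetical per-user settings; field names are illustrative."""
    opted_in: bool = False       # the Settings toggle is off by default
    consent_shown: bool = False  # plain-language consent copy acknowledged


def may_contribute(settings: ContributionSettings) -> bool:
    """A signal may be sent only after both opt-in and consent."""
    return settings.opted_in and settings.consent_shown


def withdraw(settings: ContributionSettings, stored_signal_ids: list[str]) -> list[str]:
    """Turn contribution off and return the signal IDs that must be deleted."""
    settings.opted_in = False
    to_delete = list(stored_signal_ids)
    stored_signal_ids.clear()  # stands in for the community DB delete
    return to_delete
```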

Integration points

  • Reads from cf-core.community.CommunitySignal
  • Submits to cf-orch POST /ingest/signals (see Circuit-Forge/circuitforge-orch — cf-ingest ticket)
  • Avocet labeling UI consumes unlabeled signals (label = unknown) for human review
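
A submission to the ingest endpoint might be assembled as in the sketch below. Only the POST /ingest/signals path comes from this ticket; the base URL, the JSON request body, and the build_ingest_request helper are assumptions, and the request is built but not sent.

```python
import json
import urllib.request

# Hypothetical base URL; the real cf-orch host is not given in this ticket.
CF_ORCH_BASE_URL = "http://cf-orch.example.invalid"


def build_ingest_request(signal: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST for the /ingest/signals endpoint."""
    body = json.dumps(signal).encode("utf-8")
    return urllib.request.Request(
        url=CF_ORCH_BASE_URL + "/ingest/signals",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder signal following the listing_quality schema; values are made up.
signal = {
    "signal_type": "listing_quality",
    "label": "ghost",
    "features": {
        "title_hash": "0" * 64,
        "company_name_hash": "0" * 64,
        "jd_length_chars": 1840,
        "salary_disclosed": False,
        "apply_url_domain": "jobs.example.com",
        "response_received": False,
        "days_to_response": None,
        "ats_platform": None,
        "repost_count": 3,
    },
    "outcome": "ghosted",
}

request = build_ingest_request(signal)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```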

Acceptance criteria

  • listing_quality signal constructed and submitted on listing flag event (ghost/scam/outcome)
  • Opt-in toggle in Settings, off by default
  • Consent shown before first submission; withdrawal deletes signals from community DB
  • No PII in transmitted signal (verified by test asserting absence of name/email/resume fields)
  • Integration test: flag a listing, verify signal appears in community DB with correct schema

Related

  • cf-core community module (2026-04-12)
  • Circuit-Forge/circuitforge-orch — cf-ingest service ticket
  • Circuit-Forge/avocet — cross-product labeling ticket
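
For the no-PII acceptance criterion, a test along the following lines could assert that no name, email, or resume field appears anywhere in the transmitted signal. The forbidden key names and the find_forbidden_keys helper are assumptions for illustration; the real list would follow Peregrine's own data model.

```python
# Assumed set of key names that must never appear in a signal.
FORBIDDEN_KEYS = {"name", "applicant_name", "email", "resume", "resume_text"}


def find_forbidden_keys(payload, forbidden=FORBIDDEN_KEYS):
    """Recursively collect forbidden key names from nested dicts and lists."""
    found = set()
    if isinstance(payload, dict):
        for key, value in payload.items():
            if str(key).lower() in forbidden:
                found.add(key)
            found |= find_forbidden_keys(value, forbidden)
    elif isinstance(payload, list):
        for item in payload:
            found |= find_forbidden_keys(item, forbidden)
    return found


def test_signal_has_no_pii_fields():
    # Placeholder signal; a real test would use the signal built on a flag event.
    signal = {
        "signal_type": "listing_quality",
        "label": "scam",
        "features": {"company_name_hash": "0" * 64, "repost_count": 1},
        "outcome": None,
    }
    assert find_forbidden_keys(signal) == set()
```

The check matches whole key names, so company_name_hash passes even though it contains the substring "name".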
Reference: Circuit-Forge/peregrine#93