feat: cross-product community signal labeling — ingest and label signals from Kiwi, Peregrine, Snipe #32

Open
opened 2026-04-12 16:17:54 -07:00 by pyr0ball · 0 comments
Owner

Overview

Extend Avocet's card-stack labeling UI and benchmark harness to ingest unlabeled CommunitySignal records from the cf-core community DB (fed by Kiwi, Peregrine, Snipe) and route labeled outputs back to cf-orch for fine-tuning queue dispatch.

Background

Avocet is already the menagerie's labeling and classifier training tool (email classifier, benchmark harness, fine-tune harness). The cf-core community module (Kiwi shared meal plan design, 2026-04-12) and cf-orch cf-ingest service (separate ticket) create an unlabeled signal backlog that needs human review before entering fine-tuning. Avocet is the natural home for this labeling work.

New signal type adapters

| signal_type | Source | Label options | Notes |
|---|---|---|---|
| recipe_outcome | Kiwi | good / needs_work / blooper | Photo + recipe name shown |
| recipe_blooper | Kiwi | funny / unsafe / not_a_blooper | Moderation gate before community post |
| listing_quality | Peregrine | ghost / scam / legitimate / skip | Text features only (no PII) |
| seller_trust | Snipe | scam / legitimate / uncertain / skip | Feature vector display |

Each adapter renders the signal's features in the card-stack UI using a type-specific card template. The existing ASMR drag-to-bucket interaction applies to all types.
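
As a rough illustration of the table above, the adapters could be kept in a small registry keyed by signal_type. This is only a sketch: the `SignalAdapter` class, the card template names, and the `enters_finetune_queue` helper are hypothetical, and the held-back labels follow the skip/uncertain acceptance criterion in this issue.

```python
from dataclasses import dataclass

# Labels held for re-review instead of entering the fine-tuning queue
# (taken from the acceptance criteria; the constant name is an assumption).
HELD_LABELS = frozenset({"skip", "uncertain"})


@dataclass(frozen=True)
class SignalAdapter:
    """Hypothetical description of one signal type's labeling card."""
    signal_type: str
    source: str
    labels: tuple[str, ...]
    card_template: str  # assumed component name, not defined in this issue


ADAPTERS = {
    adapter.signal_type: adapter
    for adapter in (
        SignalAdapter("recipe_outcome", "Kiwi",
                      ("good", "needs_work", "blooper"), "RecipeOutcomeCard"),
        SignalAdapter("recipe_blooper", "Kiwi",
                      ("funny", "unsafe", "not_a_blooper"), "RecipeBlooperCard"),
        SignalAdapter("listing_quality", "Peregrine",
                      ("ghost", "scam", "legitimate", "skip"), "ListingQualityCard"),
        SignalAdapter("seller_trust", "Snipe",
                      ("scam", "legitimate", "uncertain", "skip"), "SellerTrustCard"),
    )
}


def validate_label(signal_type: str, label: str) -> SignalAdapter:
    """Return the adapter if the label is allowed for this signal type."""
    adapter = ADAPTERS.get(signal_type)
    if adapter is None:
        raise KeyError(f"unknown signal_type: {signal_type!r}")
    if label not in adapter.labels:
        raise ValueError(f"{label!r} is not a valid label for {signal_type}")
    return adapter


def enters_finetune_queue(signal_type: str, label: str) -> bool:
    """True when a validated label should be dispatched for fine-tuning."""
    validate_label(signal_type, label)
    return label not in HELD_LABELS
```

Keeping the label options in one place would let the card-stack UI build its buckets and the POST handler validate labels from the same source.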

Architecture

  • New app/sources/community_signals.py: polls cf-core.community (PostgreSQL) for records where label IS NULL, by signal_type
  • New card templates per signal_type under frontend/src/components/cards/
  • Labeled outputs POST back to cf-orch /ingest/signals with the assigned label
  • Benchmark harness extended: track inter-annotator agreement across signal types
  • Avocet's existing fine-tune harness accepts the labeled corpus as a new dataset source
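
The polling and POST-back steps above might look roughly like the following. This is a sketch under stated assumptions: the `community_signals` table, its columns, and the payload field names are hypothetical, and an in-memory SQLite database stands in for the PostgreSQL community DB so the example runs on its own. Note that in SQL an unlabeled row has to be matched with `IS NULL`, because `label = NULL` never evaluates to true.

```python
import sqlite3


def fetch_unlabeled(conn, signal_type, limit=50):
    """Return unlabeled signals of one type, oldest first.

    Uses sqlite3 '?' placeholders; a PostgreSQL driver such as psycopg
    uses '%s' instead.
    """
    rows = conn.execute(
        "SELECT id, signal_type, features FROM community_signals "
        "WHERE label IS NULL AND signal_type = ? "
        "ORDER BY id LIMIT ?",
        (signal_type, limit),
    ).fetchall()
    return [{"id": r[0], "signal_type": r[1], "features": r[2]} for r in rows]


def build_label_payload(signal_id, signal_type, label, labeler_id):
    """Assumed JSON body for the POST to cf-orch /ingest/signals."""
    return {
        "signal_id": signal_id,
        "signal_type": signal_type,
        "label": label,
        "labeler_id": labeler_id,
    }


def demo_connection():
    """Build a throwaway database with a hypothetical schema and a few rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE community_signals ("
        "id INTEGER PRIMARY KEY, signal_type TEXT NOT NULL, "
        "features TEXT, label TEXT)"
    )
    conn.executemany(
        "INSERT INTO community_signals (signal_type, features, label) "
        "VALUES (?, ?, ?)",
        [
            ("recipe_outcome", '{"recipe": "pancakes"}', None),
            ("recipe_outcome", '{"recipe": "ramen"}', "good"),
            ("recipe_outcome", '{"recipe": "flan"}', None),
            ("seller_trust", '{"score": 0.2}', None),
        ],
    )
    return conn
```

Calling `fetch_unlabeled(demo_connection(), "recipe_outcome")` returns only the two rows whose label is still NULL; the already-labeled row and the seller_trust row are left out.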

Tier and access

  • Community labeling is an internal tool: no tier gate, since Avocet is in internal beta
  • Labeler identity is tracked for agreement metrics (Directus user_id, not anonymized)
  • Abuse prevention: flag outlier labelers (one user labels all as one class) for review
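
One possible reading of the outlier rule above is sketched below: flag any labeler whose labels fall almost entirely into a single class once they have labeled enough signals. The function name and both thresholds (`min_labels`, `max_share`) are assumptions chosen for illustration, not values from this issue.

```python
from collections import Counter, defaultdict


def flag_single_class_labelers(records, min_labels=20, max_share=0.95):
    """Return sorted user_ids whose labels are dominated by one class.

    records: iterable of (user_id, label) pairs.
    min_labels: ignore users with fewer labels than this (too little data).
    max_share: flag when the most common label reaches this share.
    """
    per_user = defaultdict(Counter)
    for user_id, label in records:
        per_user[user_id][label] += 1

    flagged = []
    for user_id, counts in per_user.items():
        total = sum(counts.values())
        if total < min_labels:
            continue
        _, top_count = counts.most_common(1)[0]
        if top_count / total >= max_share:
            flagged.append(user_id)
    return sorted(flagged)
```

A skewed queue can legitimately produce one-sided labels, so a flag here would only route the labeler to review rather than reject their work.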

Acceptance criteria

  • community_signals.py source adapter polls community DB, returns unlabeled signals by type
  • Card templates rendered for all 4 signal types
  • Drag-to-bucket labels POST to cf-orch with correct signal_type and label
  • Benchmark harness reports labeling throughput and inter-annotator agreement per signal_type
  • Integration test: seed 10 unlabeled recipe_outcome signals, label 5 via API, verify the community DB reports 5 labeled signals
  • Skip/uncertain labels do not enter fine-tuning queue (held for re-review)
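
The issue does not name an agreement metric. As one common option, the per-signal_type agreement report could use Cohen's kappa for pairs of annotators, sketched below. The function names and the input shape are hypothetical.

```python
from collections import Counter, defaultdict


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items in order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both annotators match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical class throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def kappa_by_signal_type(rows):
    """rows: iterable of (signal_type, label_a, label_b) for shared items."""
    grouped = defaultdict(lambda: ([], []))
    for signal_type, label_a, label_b in rows:
        grouped[signal_type][0].append(label_a)
        grouped[signal_type][1].append(label_b)
    return {st: cohen_kappa(a, b) for st, (a, b) in grouped.items()}
```

Kappa only covers two annotators at a time; with more labelers per signal, a multi-rater statistic such as Fleiss' kappa may fit better.
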

Related

  • cf-core community module (2026-04-12)
  • Circuit-Forge/circuitforge-orch — cf-ingest service ticket
  • Circuit-Forge/kiwi#75 — meal planner / community post source
  • Circuit-Forge/peregrine — listing_quality signal ticket
  • Circuit-Forge/snipe — seller_trust signal ticket
  • Existing ASMR bucket-expansion pattern (see avocet.md)
Reference: Circuit-Forge/avocet#32