feat: SpeakerTracker — ephemeral friendly labels + overlap detection for diarization #1

Closed
opened 2026-04-12 10:52:02 -07:00 by pyr0ball · 0 comments
Owner

Context

Linnet#8 covers end-to-end diarization for Linnet. Two gaps in diarize.py block that issue from being closed: raw pyannote labels are exposed as-is, and speaker_at() ignores overlapping speech.

Changes needed

1. SpeakerTracker — ephemeral friendly label mapping

Diarizer returns raw pyannote segment labels like SPEAKER_00, SPEAKER_01. These should be mapped to stable per-session friendly labels (Speaker A, Speaker B, ...) so:

  • The frontend never sees internal pyannote IDs
  • Labels remain stable across windows within one session
  • Embeddings are never stored; only the raw-ID → friendly-label string map is kept, and it is discarded at session end

Add SpeakerTracker class to diarize.py:

class SpeakerTracker:
    """Maps pyannote speaker IDs to ephemeral per-session friendly labels."""
    def label(self, raw_id: str) -> str: ...
    def reset(self) -> None: ...  # call at session end
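
One way the stub above could be filled in is sketched below. This is a minimal sketch, not the intended implementation: the `_letters` helper and the spreadsheet-style overflow (`Speaker AA` after `Speaker Z`) are assumptions chosen to satisfy the ">26 speakers" test case, and the issue does not specify either.

```python
from __future__ import annotations


def _letters(index: int) -> str:
    """Hypothetical helper: 0 -> 'A', 25 -> 'Z', 26 -> 'AA' (spreadsheet-style)."""
    name = ""
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, 26)
        name = chr(ord("A") + rem) + name
    return name


class SpeakerTracker:
    """Maps pyannote speaker IDs to ephemeral per-session friendly labels."""

    def __init__(self) -> None:
        # Only the raw-ID -> friendly-label map is held in memory; no embeddings.
        self._labels: dict[str, str] = {}

    def label(self, raw_id: str) -> str:
        # First sighting of a raw ID claims the next free letter;
        # later calls return the same label, keeping it stable across windows.
        if raw_id not in self._labels:
            self._labels[raw_id] = f"Speaker {_letters(len(self._labels))}"
        return self._labels[raw_id]

    def reset(self) -> None:
        # Call at session end so nothing carries over between sessions.
        self._labels.clear()
```

Labels here are assigned in order of first appearance, so `SPEAKER_01` becomes `Speaker A` if it happens to be seen first. Whether that ordering is acceptable is a design choice the issue leaves open.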

2. Overlap detection in speaker_at()

When two SpeakerSegments cover the same timestamp (simultaneous speech), the current speaker_at() silently returns the first match. The correct behavior:

  • 1 segment covers timestamp → return its label
  • 2+ segments cover timestamp → return "Multiple"
  • 0 segments → return "speaker_a" (silence/VAD miss)

Rename speaker_at() to speaker_at_window(), or add speaker_at_window() alongside it, with this logic.
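
The three cases above could be sketched roughly as follows. The `SpeakerSegment` field names (`speaker`, `start_s`, `end_s`), the free-function signature, and the half-open interval `[start_s, end_s)` are all assumptions for illustration; the real class in diarize.py may differ.

```python
from __future__ import annotations

from dataclasses import dataclass

MULTIPLE = "Multiple"
SILENCE = "speaker_a"  # fallback named in the issue for silence / VAD miss


@dataclass(frozen=True)
class SpeakerSegment:
    # Hypothetical field names; stand-in for the real SpeakerSegment.
    speaker: str
    start_s: float
    end_s: float


def speaker_at(segments: list[SpeakerSegment], timestamp_s: float) -> str:
    """Return the label at timestamp_s, "Multiple" on overlap, fallback on silence."""
    # Assumption: a segment covers its start but not its end (half-open).
    covering = [s for s in segments if s.start_s <= timestamp_s < s.end_s]
    if not covering:
        return SILENCE
    if len(covering) > 1:
        return MULTIPLE
    return covering[0].speaker
```

This sketch counts segments, as the issue words it, so two overlapping segments from the same speaker would also yield `"Multiple"`. Counting distinct labels instead would be a small change if that turns out to be the desired behavior.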

3. Tests

Add tests/test_diarize.py:

  • SpeakerTracker mapping stability across calls
  • SpeakerTracker handles >26 speakers gracefully
  • speaker_at() single speaker
  • speaker_at() overlap → "Multiple"
  • speaker_at() silence → "speaker_a"
  • Diarizer.from_env() raises EnvironmentError when HF_TOKEN absent
  • Mock path: _noop_diarize() returns []

Out of scope

  • Biometric storage (embeddings are never persisted)
  • Identifying who a speaker is across sessions
  • Model download / HF token acceptance (documented in cf-voice README separately)

Env vars

  • CF_VOICE_DIARIZE=1 — opt-in, default off
  • HF_TOKEN — required when diarize is enabled
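
As a rough illustration of how these two variables might gate diarization, the sketch below uses two hypothetical helpers, `diarize_enabled` and `require_hf_token`. Neither name appears in the issue, and treating only the literal value `1` as "enabled" is an assumption. A check along these lines is what `Diarizer.from_env()` would need in order to raise `EnvironmentError` when `HF_TOKEN` is absent.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def diarize_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Opt-in, default off. Assumption: only the literal "1" enables diarization."""
    return env.get("CF_VOICE_DIARIZE", "0") == "1"


def require_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Return HF_TOKEN, or raise EnvironmentError if it is missing or blank."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise EnvironmentError("HF_TOKEN is required when CF_VOICE_DIARIZE=1")
    return token
```

Passing the environment as a mapping keeps the helpers easy to exercise in tests/test_diarize.py without mutating the real process environment.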

Tracking

Linnet#8

pyr0ball added the enhancement, diarization, and testing labels 2026-04-12 10:52:02 -07:00
Reference: Circuit-Forge/cf-voice#1