feat: speaker diarization via pyannote.audio (cf_voice.context) #8

Closed
opened 2026-04-06 13:09:40 -07:00 by pyr0ball · 2 comments
Owner

Enable multi-speaker attribution so annotations are labeled per speaker, not per utterance.

Requirements:

  • pyannote.audio 3.x via cf_voice.context diarization pipeline
  • Speaker IDs are local, ephemeral labels (Speaker A, Speaker B) — no biometric storage
  • Embeddings used only for same-session grouping, discarded at session end
  • Accuracy degrades gracefully for 3+ simultaneous speakers (label as 'Multiple')

Gotcha: pyannote/speaker-diarization-3.1 requires HuggingFace license acceptance before download. Must be done manually on Heimdall. Document in CLAUDE.md and deploy runbook.

Tier gate: local diarization = Free (slower). Cloud-assisted diarization = Paid (higher accuracy, lower latency).

pyr0ball added this to the Navigation — v0.2.x milestone 2026-04-06 13:09:40 -07:00
pyr0ball added the enhancement, backlog, cf-core-dep labels 2026-04-06 13:09:40 -07:00
Author
Owner

Code exists: cf_voice.diarize.Diarizer wraps pyannote/speaker-diarization-3.1 with an async thread pool. Gated by CF_VOICE_DIARIZE=1 + HF_TOKEN. Lazy-loaded in _classify_real_async alongside tone + STT.

Not yet confirmed working end-to-end (model acceptance on HuggingFace required, 2s windows may be too short for reliable diarization). Keeping open until tested in a real session.
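
For anyone reproducing the gate locally, here is a minimal sketch of how the env-var gating and lazy loading described above could look. It is not the actual cf_voice.diarize.Diarizer: the names `diarization_enabled` and `LazyDiarizer` are hypothetical, and using `asyncio.to_thread` for the thread pool is an assumption. Only `Pipeline.from_pretrained(..., use_auth_token=...)` is taken from the pyannote.audio 3.x model card.

```python
import asyncio
import os
from typing import Any, Mapping, Optional

MODEL_ID = "pyannote/speaker-diarization-3.1"


def diarization_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """True only when CF_VOICE_DIARIZE=1 and a non-empty HF_TOKEN are both set."""
    return env.get("CF_VOICE_DIARIZE") == "1" and bool(env.get("HF_TOKEN"))


class LazyDiarizer:
    """Hypothetical stand-in for cf_voice.diarize.Diarizer (not the real class)."""

    def __init__(self, env: Mapping[str, str] = os.environ) -> None:
        self._env = env
        self._pipeline: Optional[Any] = None

    def _load(self) -> Any:
        # Import and download are deferred until the first real diarization call.
        if self._pipeline is None:
            from pyannote.audio import Pipeline

            pipeline = Pipeline.from_pretrained(
                MODEL_ID, use_auth_token=self._env["HF_TOKEN"]
            )
            if pipeline is None:
                # Typically means the gated-model license was not accepted.
                raise RuntimeError(f"could not load {MODEL_ID}; check HF access")
            self._pipeline = pipeline
        return self._pipeline

    async def diarize(self, wav_path: str) -> Optional[Any]:
        """Run diarization off the event loop; return None when gated off."""
        if not diarization_enabled(self._env):
            return None
        return await asyncio.to_thread(lambda: self._load()(wav_path))
```

With the gate off, `diarize()` returns `None` without importing pyannote at all, so the sketch can be exercised on a machine that has neither the package nor the model.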

Author
Owner

Implementation complete (cf-voice side)

cf-voice#1 implemented and closed:

  • SpeakerTracker — maps pyannote IDs (SPEAKER_00) to stable per-session friendly labels (Speaker A, Speaker B, ...). Resets on session stop. No biometric data stored.
  • speaker_at() updated: single speaker → friendly label, 2+ covering same timestamp → "Multiple", silence → "speaker_a"
  • ContextClassifier now holds a per-session SpeakerTracker, passes it into speaker_at() on every window, resets on stop()
  • 14 tests in cf_voice/tests/test_diarize.py, all passing

Remaining before closing this issue

Manual step required on Heimdall: HuggingFace gated model acceptance. Must be done by hand at:

  • https://huggingface.co/pyannote/speaker-diarization-3.1
  • https://huggingface.co/pyannote/segmentation-3.0

Then set CF_VOICE_DIARIZE=1 + HF_TOKEN=<token> in the linnet .env and run a live session to confirm end-to-end attribution.

Once the live test passes, close this issue.
Reference: Circuit-Forge/linnet#8