feat: speaker diarization via pyannote.audio (cf_voice.context) #8

Closed

opened 2026-04-06 13:09:40 -07:00 by pyr0ball · 2 comments

pyr0ball commented

2026-04-06 13:09:40 -07:00

Owner

Enable multi-speaker attribution so annotations are labeled per speaker, not per utterance.

Requirements:

- pyannote.audio 3.x via cf_voice.context diarization pipeline
- Speaker IDs are local, ephemeral labels (Speaker A, Speaker B); no biometric storage
- Embeddings used only for same-session grouping, discarded at session end
- Accuracy degrades gracefully for 3+ simultaneous speakers (label as 'Multiple')

Gotcha: pyannote/speaker-diarization-3.1 requires HuggingFace license acceptance before download. Must be done manually on Heimdall. Document in CLAUDE.md and deploy runbook.

Tier gate: local diarization = Free (slower). Cloud-assisted diarization = Paid (higher accuracy, lower latency).

pyr0ball added this to the Navigation — v0.2.x milestone 2026-04-06 13:09:40 -07:00

pyr0ball added the enhancement, backlog, cf-core-dep labels 2026-04-06 13:09:40 -07:00

pyr0ball commented

2026-04-11 09:50:48 -07:00

Author

Owner

Code exists: cf_voice.diarize.Diarizer wraps pyannote/speaker-diarization-3.1 with an async thread pool. Gated by CF_VOICE_DIARIZE=1 + HF_TOKEN. Lazy-loaded in _classify_real_async alongside tone + STT.

Not yet confirmed working end-to-end: model acceptance on HuggingFace is still required, and 2 s windows may be too short for reliable diarization. Keeping open until tested in a real session.
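
For anyone reading along, here is a rough sketch of what the gate plus lazy load could look like. It is not the actual cf_voice.diarize.Diarizer code: `LazyDiarizer`, `diarization_enabled`, and the injectable `loader` are hypothetical names made up for illustration. The only real library call is `Pipeline.from_pretrained(..., use_auth_token=...)` as it exists in pyannote.audio 3.1, and it is imported inside the loader so nothing heavy loads while the gate is closed.

```python
import asyncio
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Mapping


def diarization_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """True only when CF_VOICE_DIARIZE=1 and a non-empty HF_TOKEN are both set."""
    return env.get("CF_VOICE_DIARIZE") == "1" and bool(env.get("HF_TOKEN"))


def _load_pyannote(token: str) -> Any:
    # Imported here so pyannote is only required once the gate is open.
    from pyannote.audio import Pipeline

    return Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1", use_auth_token=token
    )


class LazyDiarizer:
    """Hypothetical stand-in for the real Diarizer; names are assumptions."""

    def __init__(
        self,
        loader: Callable[[str], Any] = _load_pyannote,
        env: Mapping[str, str] = os.environ,
    ) -> None:
        self._loader = loader
        self._env = env
        self._pipeline: Any = None
        # One worker keeps model load and inference off the event loop.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def _run(self, audio: Any) -> Any:
        # The model is loaded on first use, inside the worker thread.
        if self._pipeline is None:
            self._pipeline = self._loader(self._env["HF_TOKEN"])
        return self._pipeline(audio)

    async def diarize(self, audio: Any) -> Any:
        # Gate closed: skip diarization entirely and return None.
        if not diarization_enabled(self._env):
            return None
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(self._pool, self._run, audio)
```

Passing the loader in makes it possible to exercise the gate and the lazy path in tests without the gated model or a token.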

pyr0ball referenced this issue

2026-04-11 09:58:12 -07:00

feat: acoustic environment fingerprinting + privacy risk scoring #20

pyr0ball referenced this issue

2026-04-12 10:19:26 -07:00

feat: negotiation discourse analysis — affect-text discordance, linguistic evasion, and manipulation pattern detection [INTERNAL] #21

pyr0ball commented

2026-04-12 10:54:05 -07:00

Author

Owner

Implementation complete (cf-voice side)

cf-voice#1 implemented and closed:

- SpeakerTracker: maps pyannote IDs (SPEAKER_00) to stable per-session friendly labels (Speaker A, Speaker B, ...). Resets on session stop. No biometric data stored.
- speaker_at() updated: single speaker → friendly label, 2+ covering same timestamp → "Multiple", silence → "speaker_a"
- ContextClassifier now holds a per-session SpeakerTracker, passes it into speaker_at() on every window, resets on stop()
- 14 tests in cf_voice/tests/test_diarize.py, all passing
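
A minimal sketch of the label mapping described in the list above, for illustration only. The real signatures in cf-voice are not shown in this thread, so the `(start, end, pyannote_id)` segment tuple, the `label_for` and `reset` method names, and the first-seen ordering of labels are all assumptions.

```python
import string
from typing import Dict, Iterable, Tuple

# Assumed segment shape: (start_seconds, end_seconds, pyannote_id)
Segment = Tuple[float, float, str]


class SpeakerTracker:
    """Maps raw pyannote IDs to per-session labels; holds no audio or embeddings."""

    def __init__(self) -> None:
        self._labels: Dict[str, str] = {}

    def label_for(self, raw_id: str) -> str:
        # Assumption: labels are handed out in first-seen order (A, B, C, ...).
        if raw_id not in self._labels:
            n = len(self._labels)
            letters = string.ascii_uppercase
            suffix = letters[n] if n < len(letters) else str(n + 1)
            self._labels[raw_id] = f"Speaker {suffix}"
        return self._labels[raw_id]

    def reset(self) -> None:
        # Called on session stop so labels never carry over between sessions.
        self._labels.clear()


def speaker_at(segments: Iterable[Segment], t: float, tracker: SpeakerTracker) -> str:
    # Collect every raw speaker ID whose segment covers timestamp t.
    active = {sid for start, end, sid in segments if start <= t < end}
    if not active:
        return "speaker_a"  # silence fallback, as listed above
    if len(active) > 1:
        return "Multiple"  # 2+ speakers overlap at this timestamp
    return tracker.label_for(next(iter(active)))


tracker = SpeakerTracker()
segments = [(0.0, 2.0, "SPEAKER_01"), (1.5, 3.0, "SPEAKER_00")]
print(speaker_at(segments, 0.5, tracker))  # Speaker A
print(speaker_at(segments, 1.7, tracker))  # Multiple
print(speaker_at(segments, 2.5, tracker))  # Speaker B
```

Because the mapping lives only in the tracker's dictionary, `reset()` is enough to discard it, and the same pyannote ID may get a different letter in the next session.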

Remaining before closing this issue

Manual step required on Heimdall: HuggingFace gated model acceptance. Must be done by hand at:

- https://huggingface.co/pyannote/speaker-diarization-3.1
- https://huggingface.co/pyannote/segmentation-3.0

Then set CF_VOICE_DIARIZE=1 + HF_TOKEN=<token> in the linnet .env and run a live session to confirm end-to-end attribution.

Once the live test passes, close this issue.

pyr0ball closed this issue

2026-04-17 11:55:52 -07:00