eval: cohere-transcribe-diarize as pyannote replacement backend #5
Reference: Circuit-Forge/cf-voice#5
Source: https://huggingface.co/syvai/cohere-transcribe-diarize
What it is
syvai/cohere-transcribe-diarize is a conformer encoder-decoder that performs transcription and diarization in a single forward pass. It extends the vocabulary with 8 speaker tokens and 300 timestamp tokens (100 ms resolution) and emits them interleaved with the transcript text in one output stream.
License: Apache 2.0 (no gating, no HF_TOKEN required)
Speed: 44x real-time on RTX 3090; 249x throughput via vLLM batching
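The numbers above imply a fixed decoding window: 300 timestamp tokens at 100 ms each span 30 s. A minimal sketch of that arithmetic follows. The function names are hypothetical, and zero-based timestamp indices (index 0 meaning 0.0 s) are an assumption, not something the model card excerpt confirms.

```python
TIMESTAMP_TOKENS = 300   # from the issue text
RESOLUTION_MS = 100      # from the issue text
REALTIME_FACTOR = 44     # RTX 3090 figure quoted in the issue


def window_seconds():
    """Audio span addressable by the timestamp vocabulary."""
    return TIMESTAMP_TOKENS * RESOLUTION_MS / 1000


def timestamp_index_to_seconds(index):
    """Map a timestamp token index to an offset inside the window.

    Assumption: indices are zero-based, so index 0 is 0.0 s.
    """
    if not 0 <= index < TIMESTAMP_TOKENS:
        raise ValueError(f"timestamp index out of range: {index}")
    return index * RESOLUTION_MS / 1000


def estimated_processing_seconds(audio_seconds, factor=REALTIME_FACTOR):
    """Rough wall-clock estimate at a given real-time factor."""
    return audio_seconds / factor


if __name__ == "__main__":
    print(window_seconds())
    print(timestamp_index_to_seconds(125))
    print(round(estimated_processing_seconds(3600), 1))
```

At the quoted 44x figure, one hour of audio would take on the order of 80 seconds on comparable hardware, before any batching.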
Why this matters for cf-voice
The current cf_voice/diarize.py uses pyannote/speaker-diarization-3.1, which has three pain points, among them its dependence on HF_TOKEN.
cohere-transcribe-diarize eliminates all three.
Integration approach
Add as an optional backend in cf_voice/diarize.py alongside the existing pyannote backend.
The 30 s window limit means longer audio requires the sliding-window + speaker embedding clustering path already provided in the model's helper scripts.
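To make the sliding-window requirement concrete, here is a rough sketch of the two pieces involved: planning overlapping 30 s windows, and reconciling per-window speaker labels into global speaker IDs. This is not the helper scripts' implementation. The function names, the 5 s overlap, the 0.7 similarity threshold, and the greedy cosine matching (a simple stand-in for proper embedding clustering) are all assumptions chosen for illustration.

```python
import math

WINDOW_S = 30.0  # window limit stated in the issue


def plan_windows(duration_s, window_s=WINDOW_S, overlap_s=5.0):
    """Return (start, end) pairs covering the audio with overlapping windows."""
    if not 0 <= overlap_s < window_s:
        raise ValueError("overlap must be non-negative and shorter than the window")
    hop = window_s - overlap_s
    windows = []
    start = 0.0
    while True:
        end = min(start + window_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break
        start += hop
    return windows


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def link_speakers(window_embeddings, threshold=0.7):
    """Greedily map per-window speaker labels to global speaker IDs.

    window_embeddings: one dict per window, {local_label: embedding}.
    Returns one dict per window, {local_label: global_id}.
    """
    known = []    # first-seen embedding for each global speaker
    mapping = []
    for embeddings in window_embeddings:
        local_to_global = {}
        for label, emb in embeddings.items():
            scores = [cosine(emb, ref) for ref in known]
            best = max(range(len(scores)), key=scores.__getitem__, default=None)
            if best is not None and scores[best] >= threshold:
                local_to_global[label] = best
            else:
                known.append(list(emb))
                local_to_global[label] = len(known) - 1
        mapping.append(local_to_global)
    return mapping


if __name__ == "__main__":
    print(plan_windows(70.0))
    toy = [{0: [1.0, 0.0], 1: [0.0, 1.0]}, {0: [0.0, 1.0], 1: [1.0, 0.0]}]
    print(link_speakers(toy))
```

In the toy example the second window reports the same two speakers with their local labels swapped, and the linking step maps them back to consistent global IDs. A real backend would take the embeddings from the model's helper scripts and would likely use a more robust clustering method than first-match greedy assignment.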
Before adopting
Products that benefit
vLLM deployment note
vLLM 0.19.0 with continuous batching is the recommended deployment. This maps cleanly onto cf-orch's existing GPU worker pattern.