CircuitForge voice annotation pipeline — VoiceFrame API, tone classifiers, speaker diarization

- `cf_voice/stt.py`: WhisperSTT async wrapper (faster-whisper, thread-pool executor, rolling 50-word session prompt for cross-chunk context continuity)
- `cf_voice/classify.py`: ToneClassifier — wav2vec2 SER + librosa prosody flags (energy, ZCR speech rate, YIN pitch contour) mapped to AFFECT_LABELS
- `cf_voice/diarize.py`: Diarizer async wrapper around pyannote/speaker-diarization-3.1; speaker_at() helper for Navigation v0.2.x wiring
- `cf_voice/capture.py`: MicVoiceIO — sounddevice 16 kHz mono capture, 2 s window accumulation, parallel STT+classify tasks, shift_magnitude from confidence delta
- `cf_voice/io.py`: make_io() now returns MicVoiceIO when CF_VOICE_MOCK is unset
- `cf_voice/context.py`: classify_chunk() split into mock/real paths; the real path decodes base64 PCM and runs ToneClassifier synchronously (cf-orch endpoint)
- `pyproject.toml`: inference extras expanded (faster-whisper, sounddevice, librosa, python-dotenv)
- `.env.example`: HF_TOKEN, CF_VOICE_WHISPER_MODEL, CF_VOICE_DEVICE, CF_VOICE_MOCK, CF_VOICE_CONFIDENCE_THRESHOLD

Prior art ported from: Plex-Scripts/transcription/diarization.py (pyannote setup), devl/ogma/backend/speech/transcription_engine.py (faster-whisper preprocessing and session prompt pattern).
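The rolling 50-word session prompt can be sketched roughly as below. The `SessionPrompt` class and its method names are illustrative, not the actual cf_voice/stt.py API; the idea is that the accumulated tail of the transcript is handed to the next chunk's transcription call (for example via faster-whisper's `initial_prompt` parameter) so Whisper keeps cross-chunk context:

```python
# Illustrative sketch of a rolling session prompt (names are hypothetical,
# not the cf_voice/stt.py API): keep only the most recent N transcribed
# words and expose them as a single prompt string for the next chunk.
from collections import deque


class SessionPrompt:
    def __init__(self, max_words: int = 50):
        # deque with maxlen drops the oldest words automatically
        self._words = deque(maxlen=max_words)

    def update(self, chunk_text: str) -> None:
        self._words.extend(chunk_text.split())

    @property
    def text(self) -> str:
        return " ".join(self._words)


prompt = SessionPrompt(max_words=5)
prompt.update("one two three four")
prompt.update("five six")
print(prompt.text)  # → "two three four five six"
```

A bounded deque keeps the update O(1) per word and avoids re-truncating the transcript on every chunk.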
# cf-voice
CircuitForge voice annotation pipeline. Produces VoiceFrame objects from a live audio stream — tone label, confidence, speaker identity, and shift magnitude.
Status: Notation v0.1.x stub — mock mode only. Real classifiers (YAMNet, wav2vec2, pyannote.audio) land incrementally.
## Install

```bash
pip install -e ../cf-voice  # editable install alongside sibling repos
```
## Quick start

```python
import asyncio

from cf_voice.context import ContextClassifier

async def main():
    classifier = ContextClassifier.mock()  # or ContextClassifier.from_env() with CF_VOICE_MOCK=1
    async for frame in classifier.stream():
        print(frame.label, frame.confidence)

asyncio.run(main())
```

Or run the demo CLI:

```bash
CF_VOICE_MOCK=1 cf-voice-demo
```
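Downstream consumers will typically want to ignore low-confidence frames. A minimal sketch of that pattern, reading the `CF_VOICE_CONFIDENCE_THRESHOLD` variable listed in `.env.example` (the helper name and the 0.5 fallback are assumptions for this sketch, not part of the package):

```python
# Illustrative consumer-side filter: honor CF_VOICE_CONFIDENCE_THRESHOLD
# from the environment. The 0.5 fallback is assumed for this sketch and is
# not necessarily the package default.
import os

def passes_threshold(confidence: float) -> bool:
    floor = float(os.environ.get("CF_VOICE_CONFIDENCE_THRESHOLD", "0.5"))
    return confidence >= floor
```

Frames that fail the check can simply be skipped inside the `async for` loop above.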
## VoiceFrame

```python
from dataclasses import dataclass

@dataclass
class VoiceFrame:
    label: str              # e.g. "Warmly impatient"
    confidence: float       # 0.0–1.0
    speaker_id: str         # ephemeral local label, e.g. "speaker_a"
    shift_magnitude: float  # delta from previous frame, 0.0–1.0
    timestamp: float        # session-relative seconds
```
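The capture layer derives `shift_magnitude` from the confidence delta between consecutive frames. A minimal sketch of that calculation (the helper name is hypothetical, not part of the cf_voice API):

```python
# Minimal sketch: shift_magnitude as the clamped confidence delta between
# consecutive frames. `shift_magnitude_from` is a hypothetical helper, not
# part of the cf_voice API.
from typing import Optional

def shift_magnitude_from(prev_confidence: Optional[float], confidence: float) -> float:
    if prev_confidence is None:
        return 0.0  # first frame of a session: nothing to shift from
    # clamp to the documented 0.0–1.0 range
    return min(1.0, abs(confidence - prev_confidence))

print(round(shift_magnitude_from(0.9, 0.55), 2))  # → 0.35
```

The clamp keeps the field inside the 0.0–1.0 range documented in the dataclass comments.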
## Mock mode
Set CF_VOICE_MOCK=1 or pass mock=True to make_io(). No GPU or microphone required. Useful for CI and frontend development.
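For flavor, a rough sketch of what a CI-friendly mock source might look like (all names, labels, and cadence here are illustrative assumptions; the real generator lives in `cf_voice.io`):

```python
# Illustrative mock frame source: cycles through canned labels on a fixed
# cadence so CI and frontend work need no microphone or GPU. Names and the
# label set are assumptions, not the cf_voice.io implementation.
import asyncio
import itertools
import random

AFFECT_LABELS = ["Calm", "Warmly impatient", "Clipped"]  # placeholder labels

async def mock_frames(n: int):
    t = 0.0
    for label in itertools.islice(itertools.cycle(AFFECT_LABELS), n):
        yield {
            "label": label,
            "confidence": random.uniform(0.5, 1.0),
            "speaker_id": "speaker_a",
            "timestamp": t,
        }
        t += 2.0           # mirrors the 2 s capture window
        await asyncio.sleep(0)  # stand-in for real-time pacing

async def main():
    async for frame in mock_frames(3):
        print(frame["label"], frame["timestamp"])

asyncio.run(main())
```

Because the generator is async, it slots into the same `async for` consumer loop as the live microphone path.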
## Module structure

| Module | License | Purpose |
|---|---|---|
| `cf_voice.models` | MIT | VoiceFrame dataclass |
| `cf_voice.io` | MIT | Audio capture, mock generator |
| `cf_voice.context` | BSL 1.1\* | Tone classification, diarization |

\*BSL applies once real inference models are integrated; the current stub is MIT-licensed.
## Consumed by

- `Circuit-Forge/linnet` — real-time tone annotation widget
- `Circuit-Forge/osprey` — telephony bridge voice context