tracking: cf-core integration points for cf-voice (SSE wire format, preferences hooks) #34
Reference: Circuit-Forge/circuitforge-core#34
## Summary

Graduate the voice stub to a full `cf_voice` module. Two active consumers: Osprey (telephony + IVR) and Peregrine (voice I/O for nonverbal users). Design doc: `circuitforge-plans/circuitforge-core/2026-04-06-cf-voice-design.md`.

## Three sub-modules
### cf_voice.io — Speech I/O

STT (Whisper local, cloud fallback) and TTS (Piper local, cloud fallback).

Key output: `TranscriptResult` with `confidence` and `flagged_low_confidence` (cf-orch#17 a11y blocker).

Protocol interface:
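The Protocol definition itself wasn't captured in this issue; a minimal sketch of what it might look like, using the `TranscriptResult` fields named above. The class names `SpeechBackend`, `transcribe`, and `synthesize` and their signatures are assumptions, not the shipped interface:

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass
class TranscriptResult:
    text: str
    confidence: float              # backend-reported, 0.0-1.0
    flagged_low_confidence: bool   # set when below the user threshold (cf-orch#17)


@runtime_checkable
class SpeechBackend(Protocol):
    """Hypothetical STT/TTS backend interface for cf_voice.io."""

    def transcribe(self, audio: bytes, sample_rate: int = 16000) -> TranscriptResult:
        """STT: raw PCM audio in, transcript with confidence out."""
        ...

    def synthesize(self, text: str) -> bytes:
        """TTS: text in, PCM audio out."""
        ...
```

A `@runtime_checkable` Protocol lets mock and real backends be swapped behind `isinstance()` checks without a shared base class.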
### cf_voice.context — Parallel audio classifier

Non-conversational. Runs alongside STT, never tries to understand words — classifies acoustic features only.

Event classes:

- queue state: `hold_music | silence | ringback | busy | dead_air`
- speaker: `ivr_synth | human_single | human_multi | transfer`
- environ: `call_center | music | background_shift`
- tone: `affect` + `shift` + `prosody` + `subtext` (Elcor label)

Model strategy:
#### Elcor mode (accessibility feature)

Tone shifts and affect are converted to a human-readable annotation prepended to the transcript.
This is an explicit accessibility feature for autistic and ND users who may not reliably perceive implicit tonal/emotional cues in voice interactions. It is opt-in, user-configurable, and rendered locally — no audio or labels leave the device.
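As an illustration only (the exact annotation format is not specified in this issue), a prepended Elcor-style label might look like:

```python
def annotate_transcript(transcript: str, affect: str, shift: str) -> str:
    """Prepend a human-readable tone annotation (hypothetical format).

    Rendered locally; no audio or labels leave the device. The label makes
    the current affect and any recent shift explicit text for the user.
    """
    label = f"[tone: {affect}; shift: {shift}]"
    return f"{label} {transcript}"
```

For example, `annotate_transcript("Please hold.", "flat-professional", "warm to curt")` would yield a transcript line that carries the tonal cue as plain text.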
The classifier also doubles as local AMD (answering machine detection): a `background_shift` from hold music to call-center ambient is a reliable pre-speech human-answered signal, resolving the FreeSWITCH AMD open question from the telephony spec.

### cf_voice.telephony — Outbound telephony abstraction
`announce()` implements the adaptive service identification requirement (cf-orch#18, osprey#21).

## Combined output type

## Tier mapping

## Build sequence

1. `cf_voice.io` — unblocks peregrine#74 (voice I/O for nonverbal users)
2. `cf_voice.telephony` — unblocks osprey#1
3. `cf_voice.context` — queue state + speaker first (AMD), tone + Elcor second

## Open questions
## References

- `circuitforge-plans/osprey/superpowers/specs/2026-04-04-telephony-backend-design.md`

---

cf_voice is now a standalone repo (Circuit-Forge/cf-voice, MIT/BSL split) rather than a cf-core module — see #35 (closed) and cf-core#39 (closed).
This issue should track any cf-core integration points that depend on cf-voice (e.g. shared `VoiceFrame` SSE wire format #40, `preferences.prefers_reduced_motion` #38). Retitling.
Retitled from "New module: cf_voice — STT/TTS, parallel audio classifier, telephony abstraction" to "tracking: cf-core integration points for cf-voice (SSE wire format, preferences hooks)".

## Progress update
`cf_voice.telephony` shipped (Notation v0.1.x):

- `TelephonyBackend` Protocol (MIT)
- `MockTelephonyBackend` — dev/CI, no real calls, AMD simulation (MIT)
- `SignalWireBackend` — paid tier (BSL)
- `FreeSWITCHBackend` — free tier, self-hosted (BSL)
- `make_telephony()` factory — env-driven backend selection
- extras: `cf-voice[signalwire]`, `cf-voice[freeswitch]`

Unblocks osprey#1.
`cf_voice.io` build fix: the real backend raises `NotImplementedError` instead of `ImportError` when the inference extras are missing.

Also closed: #40 (SSE wire format was already documented in README as of previous session).
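That build fix amounts to a guarded lazy import; a sketch of the pattern, with a hypothetical module path standing in for the real inference stack:

```python
def load_inference_backend(model_name: str = "small"):
    """Guarded lazy import: when the optional inference extras are missing,
    surface NotImplementedError (an actionable message) rather than letting
    a raw ImportError traceback escape. Module path is hypothetical.
    """
    try:
        import cf_voice._inference as inference  # hypothetical optional module
    except ImportError as exc:
        raise NotImplementedError(
            "inference extras not installed; install cf-voice[inference]"
        ) from exc
    return inference.load(model_name)
```

Chaining with `from exc` preserves the original ImportError for debugging while callers only need to handle one well-defined exception type.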
Remaining:
## Closing — all deliverables complete

### What shipped in this pass
`cf_voice.prefs` (new, MIT):

- `PREF_ELCOR_MODE`, `PREF_CONFIDENCE_THRESHOLD`, `PREF_WHISPER_MODEL`, `PREF_ELCOR_PRIOR_FRAMES`
- `get_voice_pref()` / `set_voice_pref()` — optional cf-core integration, env var fallback, built-in defaults
- `is_elcor_enabled()`, `get_confidence_threshold()`, `get_whisper_model()`, `get_elcor_prior_frames()` convenience helpers

`cf_voice.acoustic` (new, MIT Protocol + BSL stub):
- `AcousticBackend` Protocol (`@runtime_checkable`)
- `MockAcousticBackend` — simulates full call lifecycle: ringback → IVR → hold music → AMD signal (`background_shift`) → human answered
- `YAMNetAcousticBackend` — Navigation v0.2.x stub, clear `NotImplementedError`
- `make_acoustic()` factory

`cf_voice.context` (extended):
- `cf_voice.prefs` — Elcor mode and `prior_frames` read from user preference store
- `classify_chunk()` now returns all four event types: tone + queue + speaker + environ
- `MockAcousticBackend` in mock mode; `YAMNetAcousticBackend` stub in real mode (graceful passthrough on `NotImplementedError`)
- `session_id` propagates into `ToneEvent`
- `user_id` + `store` plumbed through from construction

Tests: 64 passing (was 31)
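A sketch of the preference-resolution order described above (cf-core store, then env var, then built-in default). The store is modeled as a plain mapping and the `CF_VOICE_*` env var prefix is an assumption; only the key names and the prior-frames default of 4 come from this issue:

```python
import os

# Built-in defaults for the preference keys named in this issue.
# The threshold and model values here are illustrative, not the shipped defaults.
_DEFAULTS = {
    "PREF_ELCOR_MODE": "off",
    "PREF_CONFIDENCE_THRESHOLD": "0.6",
    "PREF_WHISPER_MODEL": "small",
    "PREF_ELCOR_PRIOR_FRAMES": "4",   # default 4, per this issue
}


def get_voice_pref(key: str, store=None) -> str:
    """Resolve a voice preference: cf-core store, then env var, then default.

    `store` stands in for the optional cf-core preference store; a mapping
    is used here for illustration.
    """
    if store is not None and key in store:
        return store[key]
    return os.environ.get(f"CF_VOICE_{key}", _DEFAULTS[key])


def get_elcor_prior_frames(store=None) -> int:
    return int(get_voice_pref("PREF_ELCOR_PRIOR_FRAMES", store))
```

The env var fallback is what lets cf-voice run standalone (as a separate repo) while still honoring cf-core preferences when the integration is present.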
## Open questions resolved

**Piper vs. Coqui TTS:** Piper. Coqui-TTS is effectively abandoned (last release 2023). Piper is maintained by Nabu Casa (Home Assistant), ships as a single binary with pre-built voices, and runs on CPU without Python binding issues. `.env.example` will note `CF_VOICE_TTS_BACKEND=piper`.

**wav2vec2 SER model VRAM:** `ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition` (~1.5 GB VRAM) is the right pick. On the RTX 4000 SFF Ada (8 GB) it fits alongside Whisper small (500 MB) with 6 GB headroom for the LLM router. If VRAM is tight, the model can run on CPU at ~2× realtime — acceptable for 2 s windows.

**Elcor label prompt design:** 4 prior frames (~8–10 seconds of context at 2.5 s intervals). The prompt includes the last N `affect` labels in sequence so the LLM can label the shift, not just the instantaneous affect. Configured via `PREF_ELCOR_PRIOR_FRAMES` (default 4, user-adjustable).

**Event-driven vs. continuous stream:** Event-driven. The streaming path suppresses frames where `shift_magnitude < 0.15` (the `is_shift()` threshold). Queue/environ events only emit on label change. This prevents SSE flooding on long hold-music segments.

**pyannote.audio license:** CC BY 4.0 — commercial use is permitted with attribution. The two gated models (`pyannote/speaker-diarization-3.1` and `pyannote/segmentation-3.0`) require HuggingFace account acceptance, already documented in `.env.example` and the diarize.py module header.

Navigation v0.2.x (real YAMNet + pyannote wiring into `ContextClassifier.stream()`) will be a separate issue.
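The event-driven suppression resolved above can be sketched roughly like this. The 0.15 threshold and the `is_shift()` name come from this issue; the frame shapes and the generator structure are assumptions for illustration:

```python
SHIFT_THRESHOLD = 0.15  # is_shift() threshold from this issue


def is_shift(shift_magnitude: float) -> bool:
    return shift_magnitude >= SHIFT_THRESHOLD


def emit_events(frames):
    """Yield only frames worth sending over SSE: tone frames with a real
    shift, and queue/environ frames whose label actually changed.

    Frames are plain dicts here, not the real cf_voice event types.
    """
    last_label = {}  # per-kind last emitted label
    for frame in frames:
        kind = frame["kind"]
        if kind == "tone":
            if is_shift(frame["shift_magnitude"]):
                yield frame
        else:  # queue / environ: emit on label change only
            if last_label.get(kind) != frame["label"]:
                last_label[kind] = frame["label"]
                yield frame
```

On a long hold-music segment every repeated `hold_music` frame after the first is dropped, which is exactly the SSE-flooding case the event-driven design avoids.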