pyr0ball
24f04b67db
feat: full voice pipeline — AST acoustic, accent, privacy, prosody, dimensional, trajectory, telephony, FastAPI app
New modules shipped (from Linnet integration):
- acoustic.py: AST (MIT/ast-finetuned-audioset-10-10-0.4593) replaces YAMNet stub;
527 AudioSet classes mapped to queue/speaker/environ/scene labels; _LABEL_MAP
includes hold_music, ringback, DTMF, background_shift, AMD signal chain
- accent.py: facebook/mms-lid-126 language ID → regional accent labels
(en_gb, en_us, en_au, fr, es, de, zh, …); lazy-loaded, gated by CF_VOICE_ACCENT
- privacy.py: compound privacy risk scorer — public_env, background_voices,
nature scene, accent signals; returns 0–3 score without storing any audio
- prosody.py: openSMILE-backed prosody extractor (sarcasm_risk, flat_f0_score,
speech_rate, pitch_range); mock mode returns neutral values
- dimensional.py: audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim
valence/arousal/dominance scorer; gated by CF_VOICE_DIMENSIONAL
- trajectory.py: rolling buffer for arousal/valence deltas, trend detection
(escalating/suppressed/stable), coherence scoring, suppression/reframe flags
- telephony.py: TelephonyBackend Protocol + MockTelephonyBackend + SignalWireBackend
+ FreeSWITCHBackend; CallSession dataclass; make_telephony() factory
- app.py: FastAPI service (port 8007) — /health + /classify; accepts base64 PCM
chunks, returns full AudioEventOut including dimensional/prosody/accent fields
- prefs.py: voice preference helpers (elcor_mode, confidence_threshold,
whisper_model, elcor_prior_frames); cf-core and env-var fallback
Tests: fix stale tests (YAMNetAcousticBackend → ASTAcousticBackend, scene field
added to AcousticResult, speaker_at gap now resolves dominant speaker not UNKNOWN,
make_io real path returns MicVoiceIO when sounddevice installed). 78 tests passing.
Closes #2, #3.
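Note on trajectory.py: a minimal sketch of the rolling-buffer trend idea described above, not the shipped module. The class name `TrajectoryBuffer`, the `maxlen` and `threshold` parameters, and the rule that "suppressed" means falling arousal are all assumptions made for illustration; the real module also tracks valence, coherence, and reframe flags, which are omitted here.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TrajectoryBuffer:
    """Hypothetical rolling buffer of arousal scores (illustration only)."""

    maxlen: int = 5            # assumed window size, not from the source
    threshold: float = 0.15    # assumed minimum delta that counts as a trend
    _arousal: deque = field(default_factory=deque)

    def push(self, arousal: float) -> None:
        """Append a score and drop the oldest one once the window is full."""
        self._arousal.append(arousal)
        if len(self._arousal) > self.maxlen:
            self._arousal.popleft()

    def trend(self) -> str:
        """Compare the newest score with the oldest one still in the window."""
        if len(self._arousal) < 2:
            return "stable"
        delta = self._arousal[-1] - self._arousal[0]
        if delta > self.threshold:
            return "escalating"
        if delta < -self.threshold:
            # Assumption: falling arousal is labeled "suppressed" here.
            return "suppressed"
        return "stable"
```

Comparing only the window endpoints keeps the sketch short; a slope fit over the whole window would be less sensitive to a single noisy frame.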
2026-04-18 22:36:58 -07:00
pyr0ball
335d51f02f
feat: lock ToneEvent SSE wire format (cf-core#40)
- AudioEvent: add speaker_id field (was on VoiceFrame only; needed on all events)
- ToneEvent: add session_id field for session correlation across embedded consumers
- README: full wire format documentation — JSON shape, field reference table,
SSE envelope, Elcor mode subtext table, module license map
- ToneEvent docstring references cf-core#40 as the wire format spec
Closes cf-core#40
2026-04-06 17:51:09 -07:00
pyr0ball
fed6388b99
feat: real inference pipeline — STT, tone classifier, diarization, mic capture
- cf_voice/stt.py: WhisperSTT async wrapper (faster-whisper, thread-pool executor,
rolling 50-word session prompt for cross-chunk context continuity)
- cf_voice/classify.py: ToneClassifier — wav2vec2 SER + librosa prosody flags
(energy, ZCR speech rate, YIN pitch contour) mapped to AFFECT_LABELS
- cf_voice/diarize.py: Diarizer async wrapper around pyannote/speaker-diarization-3.1;
speaker_at() helper for Navigation v0.2.x wiring
- cf_voice/capture.py: MicVoiceIO — sounddevice 16kHz mono capture, 2s window
accumulation, parallel STT+classify tasks, shift_magnitude from confidence delta
- cf_voice/io.py: make_io() now returns MicVoiceIO when CF_VOICE_MOCK is unset
- cf_voice/context.py: classify_chunk() split into mock/real paths; real path
decodes base64 PCM and runs ToneClassifier synchronously (cf-orch endpoint)
- pyproject.toml: inference extras expanded (faster-whisper, sounddevice,
librosa, python-dotenv)
- .env.example: HF_TOKEN, CF_VOICE_WHISPER_MODEL, CF_VOICE_DEVICE, CF_VOICE_MOCK,
CF_VOICE_CONFIDENCE_THRESHOLD
Prior art ported from: Plex-Scripts/transcription/diarization.py (pyannote
setup), devl/ogma/backend/speech/transcription_engine.py (faster-whisper
preprocessing and session prompt pattern).
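Note on the session prompt: the rolling 50-word prompt mentioned for stt.py can be sketched roughly as below. `SessionPrompt`, `update()`, and `text` are hypothetical names, and splitting on whitespace is an assumed definition of "word"; the actual WhisperSTT wrapper may differ.

```python
class SessionPrompt:
    """Hypothetical rolling window over the most recent transcript words."""

    def __init__(self, max_words: int = 50) -> None:
        self.max_words = max_words
        self._words: list[str] = []

    def update(self, transcript: str) -> None:
        """Add a chunk's transcript and keep only the last max_words words."""
        self._words.extend(transcript.split())
        self._words = self._words[-self.max_words:]

    @property
    def text(self) -> str:
        """Prompt text to hand to the next chunk's transcription call."""
        return " ".join(self._words)
```

In a faster-whisper setup the resulting text would typically be passed as the `initial_prompt` argument of `WhisperModel.transcribe()`, so each chunk is decoded with the tail of the previous transcript as context. Whether stt.py wires it exactly this way is an assumption.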
2026-04-06 17:33:51 -07:00
pyr0ball
6e17da9e93
feat: AudioEvent models, classify_chunk() for per-chunk request-response path
- events.py: AudioEvent dataclass + ToneEvent with affect, shift_magnitude,
shift_direction, prosody_flags; make_subtext() for generic/Elcor formats
- context.py: classify_chunk(audio_b64, timestamp, prior_frames, elcor)
returns list[AudioEvent]; mock mode uses MockVoiceIO RNG, real mode raises NotImplementedError
- ToneEvent.__post_init__ pins event_type='tone' (avoids MRO default-field ordering bug)
- Elcor mode: same classifier output, Elcor speech-prefix wording; all tiers
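Note on the __post_init__ pin: a sketch of the pattern, under assumptions. The base-class fields shown here (`timestamp`, `event_type`) and all default values are guesses for illustration; only the ToneEvent field names come from the commit message. The relevant dataclass rule is that a field without a default may not follow a field with one, and inherited fields come first in the generated `__init__`.

```python
from dataclasses import dataclass, field


@dataclass
class AudioEvent:
    # Assumed base fields; the real AudioEvent likely has more.
    timestamp: float
    event_type: str = "audio"


@dataclass
class ToneEvent(AudioEvent):
    # The base class already ends with a defaulted field, so every field
    # added here needs a default too, or dataclass raises TypeError.
    affect: str = "neutral"
    shift_magnitude: float = 0.0
    shift_direction: str = "stable"
    prosody_flags: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Pin the discriminator after init instead of relying on field
        # defaults, so callers cannot construct a mislabeled tone event.
        self.event_type = "tone"
```

Pinning in `__post_init__` also means the value is forced even when a caller passes `event_type` explicitly.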
2026-04-06 16:53:10 -07:00
pyr0ball
35fc0a088c
feat: initial cf-voice stub — VoiceFrame API, mock IO, context classifier
- VoiceFrame dataclass: label, confidence, speaker_id, shift_magnitude, timestamp
- MockVoiceIO: async generator of synthetic frames on a timer (CF_VOICE_MOCK=1)
- ContextClassifier: passthrough stub wrapping VoiceIO; _enrich() hook for real classifiers
- make_io() factory: mock mode auto-detected from env, raises NotImplementedError for real audio
- cf-voice-demo CLI entry point for quick smoke-testing
- 12 tests passing; editable install via pip install -e ../cf-voice
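Note on the stub shape: the pieces listed above can be sketched as follows. The VoiceFrame field names and the CF_VOICE_MOCK=1 switch come from the commit message; the field types, the `stream()` method, the `collect()` helper, the label values, and the timer interval are hypothetical and only illustrate an async generator of synthetic frames.

```python
import asyncio
import os
import random
import time
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class VoiceFrame:
    label: str
    confidence: float
    speaker_id: str
    shift_magnitude: float
    timestamp: float


class MockVoiceIO:
    """Emits synthetic frames on a timer (hypothetical interface)."""

    def __init__(self, interval_s: float = 0.01, seed: int = 0) -> None:
        self._interval_s = interval_s
        self._rng = random.Random(seed)

    async def stream(self, count: int) -> AsyncIterator[VoiceFrame]:
        for _ in range(count):
            await asyncio.sleep(self._interval_s)  # simulated frame cadence
            yield VoiceFrame(
                label=self._rng.choice(["calm", "tense"]),  # made-up labels
                confidence=self._rng.random(),
                speaker_id="speaker_a",
                shift_magnitude=self._rng.random(),
                timestamp=time.time(),
            )


def make_io() -> MockVoiceIO:
    """Return the mock backend when CF_VOICE_MOCK=1, else refuse."""
    if os.environ.get("CF_VOICE_MOCK") == "1":
        return MockVoiceIO()
    raise NotImplementedError("real audio capture is not part of this sketch")


async def collect(io: MockVoiceIO, count: int) -> list[VoiceFrame]:
    """Drain a fixed number of frames from the generator."""
    return [frame async for frame in io.stream(count)]
```

With CF_VOICE_MOCK=1 set, `asyncio.run(collect(make_io(), 3))` yields three synthetic frames; without it, `make_io()` raises NotImplementedError, matching the behaviour described for this first commit.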
2026-04-06 16:03:07 -07:00