cf-voice

6 commits, 1 branch, 0 tags, 138 KiB

Author	SHA1	Message	Date
pyr0ball	77aa8513c0	docs: add Prerequisites section and MIT LICENSE	2026-06-05 11:59:32 -07:00
    Surfaces mock mode as the default starting point before any GPU path.
    Adds a HuggingFace gated-model callout (pyannote, CC BY 4.0) stating that each user must individually accept the model terms, placed before the install steps.
    Adds the missing LICENSE file: pyproject.toml declared MIT, but no LICENSE text was present.
pyr0ball	e6a9240e2d	feat: full voice pipeline (AST acoustic, accent, privacy, prosody, dimensional, trajectory, telephony, FastAPI app)	2026-04-18 22:36:58 -07:00
    New modules shipped (from Linnet integration):
    - acoustic.py: AST (MIT/ast-finetuned-audioset-10-10-0.4593) replaces the YAMNet stub; 527 AudioSet classes mapped to queue/speaker/environ/scene labels; _LABEL_MAP includes hold_music, ringback, DTMF, background_shift, and the AMD signal chain
    - accent.py: facebook/mms-lid-126 language ID → regional accent labels (en_gb, en_us, en_au, fr, es, de, zh, …); lazy-loaded, gated by CF_VOICE_ACCENT
    - privacy.py: compound privacy-risk scorer combining public_env, background_voices, nature scene, and accent signals; returns a 0–3 score without storing any audio
    - prosody.py: openSMILE-backed prosody extractor (sarcasm_risk, flat_f0_score, speech_rate, pitch_range); mock mode returns neutral values
    - dimensional.py: audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim valence/arousal/dominance scorer; gated by CF_VOICE_DIMENSIONAL
    - trajectory.py: rolling buffer for arousal/valence deltas, trend detection (escalating/suppressed/stable), coherence scoring, suppression/reframe flags
    - telephony.py: TelephonyBackend Protocol + MockTelephonyBackend + SignalWireBackend + FreeSWITCHBackend; CallSession dataclass; make_telephony() factory
    - app.py: FastAPI service (port 8007) with /health and /classify endpoints; accepts base64 PCM chunks and returns the full AudioEventOut, including dimensional/prosody/accent fields
    - prefs.py: voice preference helpers (elcor_mode, confidence_threshold, whisper_model, elcor_prior_frames) with cf-core and env-var fallback
    Tests: fix stale tests (YAMNetAcousticBackend → ASTAcousticBackend; scene field added to AcousticResult; a speaker_at gap now resolves to the dominant speaker instead of UNKNOWN; the make_io real path returns MicVoiceIO when sounddevice is installed). 78 tests passing.
    Closes #2, #3.
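The trajectory.py entry describes a rolling buffer of arousal deltas with an escalating/suppressed/stable trend label. A minimal sketch of that idea follows, assuming a fixed-size window and a simple newest-minus-oldest delta compared against a threshold. The class name `TrajectoryBuffer`, the window size, the 0.15 threshold, and reading falling arousal as "suppressed" are all hypothetical choices, not the module's actual logic.

```python
from collections import deque


class TrajectoryBuffer:
    """Rolling arousal buffer with a coarse trend label.

    Hypothetical sketch: window size, threshold, and the newest-minus-oldest
    delta rule are assumptions, not cf-voice's actual trajectory.py logic.
    """

    def __init__(self, window: int = 8, threshold: float = 0.15) -> None:
        # deque(maxlen=...) drops the oldest reading once the window is full
        self._arousal: deque = deque(maxlen=window)
        self.threshold = threshold

    def push(self, arousal: float) -> str:
        """Append one arousal reading and return the current trend."""
        self._arousal.append(arousal)
        return self.trend()

    def delta(self) -> float:
        """Change in arousal across the buffered window (newest minus oldest)."""
        if len(self._arousal) < 2:
            return 0.0
        return self._arousal[-1] - self._arousal[0]

    def trend(self) -> str:
        d = self.delta()
        if d > self.threshold:
            return "escalating"
        if d < -self.threshold:
            return "suppressed"  # assumption: a sustained arousal drop reads as suppression
        return "stable"
```

A bounded deque keeps memory constant per session, which matters for a long-running call; a real implementation would likely also track valence and smooth noisy readings before computing the delta.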
pyr0ball	2a23ba520f	feat: lock ToneEvent SSE wire format (cf-core#40)	2026-04-06 17:51:09 -07:00
    - AudioEvent: add speaker_id field (previously on VoiceFrame only; needed on all events)
    - ToneEvent: add session_id field for session correlation across embedded consumers
    - README: full wire format documentation (JSON shape, field reference table, SSE envelope, Elcor mode subtext table, module license map)
    - ToneEvent docstring references cf-core#40 as the wire format spec
    Closes cf-core#40
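The wire-format commit locks a JSON payload carried inside an SSE envelope. In the SSE format, each message is a block of `field: value` lines terminated by a blank line. The sketch below frames one event that way. Only `speaker_id` and `session_id` are named in the commit; the `ToneEventWire` class, its other fields, and the `to_sse()` helper are hypothetical, and the README's field reference table is the authority on the real shape.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ToneEventWire:
    """Illustrative subset of a ToneEvent payload (hypothetical field set)."""
    timestamp: float
    label: str
    confidence: float
    affect: str
    speaker_id: Optional[str] = None
    session_id: Optional[str] = None
    prosody_flags: List[str] = field(default_factory=list)
    event_type: str = "tone"


def to_sse(event: ToneEventWire) -> str:
    """Frame one event as a Server-Sent Events message.

    json.dumps without indent emits no newlines, so the whole payload
    fits on a single data: line; the trailing blank line ends the message.
    """
    payload = json.dumps(asdict(event), separators=(",", ":"))
    return f"event: {event.event_type}\ndata: {payload}\n\n"
```

Using the event type as the SSE `event:` name lets a browser `EventSource` consumer subscribe to tone events specifically, while `session_id` in the payload correlates events across embedded consumers.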
pyr0ball	185b849668	feat: real inference pipeline (STT, tone classifier, diarization, mic capture)	2026-04-06 17:33:51 -07:00
    - cf_voice/stt.py: WhisperSTT async wrapper (faster-whisper, thread-pool executor, rolling 50-word session prompt for cross-chunk context continuity)
    - cf_voice/classify.py: ToneClassifier, combining wav2vec2 SER with librosa prosody flags (energy, ZCR speech rate, YIN pitch contour) mapped to AFFECT_LABELS
    - cf_voice/diarize.py: Diarizer async wrapper around pyannote/speaker-diarization-3.1; speaker_at() helper for Navigation v0.2.x wiring
    - cf_voice/capture.py: MicVoiceIO, with sounddevice 16 kHz mono capture, 2 s window accumulation, parallel STT and classify tasks, and shift_magnitude derived from the confidence delta
    - cf_voice/io.py: make_io() now returns MicVoiceIO when CF_VOICE_MOCK is unset
    - cf_voice/context.py: classify_chunk() split into mock and real paths; the real path decodes base64 PCM and runs ToneClassifier synchronously (cf-orch endpoint)
    - pyproject.toml: inference extras expanded (faster-whisper, sounddevice, librosa, python-dotenv)
    - .env.example: HF_TOKEN, CF_VOICE_WHISPER_MODEL, CF_VOICE_DEVICE, CF_VOICE_MOCK, CF_VOICE_CONFIDENCE_THRESHOLD
    Prior art ported from: Plex-Scripts/transcription/diarization.py (pyannote setup) and devl/ogma/backend/speech/transcription_engine.py (faster-whisper preprocessing and session prompt pattern).
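The stt.py entry combines two ideas: running a blocking Whisper call on a thread pool so the event loop stays responsive, and feeding the last 50 transcribed words back as a prompt so each chunk is decoded with context from the previous ones. A minimal sketch follows. `RollingPromptSTT` and the injected `transcribe_fn(audio, prompt)` callable are hypothetical; with faster-whisper, such an adapter would presumably pass the prompt as the `initial_prompt` argument of `WhisperModel.transcribe()` and join the returned segment texts.

```python
import asyncio
from collections import deque
from typing import Callable, Optional


class RollingPromptSTT:
    """Async STT wrapper with a rolling word-window session prompt (hypothetical sketch)."""

    def __init__(self, transcribe_fn: Callable[[bytes, Optional[str]], str], max_words: int = 50) -> None:
        self._transcribe = transcribe_fn          # blocking call, e.g. an adapter around faster-whisper
        self._words: deque = deque(maxlen=max_words)  # keeps only the most recent words

    def prompt(self) -> Optional[str]:
        """Current session prompt, or None before any text has been transcribed."""
        return " ".join(self._words) or None

    async def transcribe(self, audio: bytes) -> str:
        loop = asyncio.get_running_loop()
        # Run the blocking transcription in the default thread-pool executor.
        text = await loop.run_in_executor(None, self._transcribe, audio, self.prompt())
        self._words.extend(text.split())
        return text
```

Capping the prompt by word count keeps it well under Whisper's prompt-token budget while still carrying names and topic words across 2 s chunk boundaries.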
pyr0ball	fa72fa4c8f	feat: AudioEvent models and classify_chunk() for the per-chunk request-response path	2026-04-06 16:53:10 -07:00
    - events.py: AudioEvent dataclass + ToneEvent with affect, shift_magnitude, shift_direction, prosody_flags; make_subtext() for generic/Elcor formats
    - context.py: classify_chunk(audio_b64, timestamp, prior_frames, elcor) returns list[AudioEvent]; mock mode uses the MockVoiceIO RNG, while the real path raises NotImplementedError
    - ToneEvent.__post_init__ pins event_type='tone' (avoids the MRO default-field ordering bug)
    - Elcor mode: same classifier output with Elcor speech-prefix wording; all tiers
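The `__post_init__` note refers to a dataclass inheritance pitfall. One plausible reading, sketched below under that assumption: if the base class declares `event_type` as a required field, redeclaring it with a default in a subclass keeps its original position, so the required fields after it now follow a defaulted field and `@dataclass` raises `TypeError`. Pinning the value in `__post_init__` sidesteps this. The base-class field list here is hypothetical, and `naive_override_fails()` exists only to demonstrate the error.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AudioEvent:
    timestamp: float
    event_type: str
    label: str
    confidence: float


@dataclass
class ToneEvent(AudioEvent):
    affect: str = "neutral"
    shift_magnitude: float = 0.0
    shift_direction: str = "stable"
    prosody_flags: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Pin the discriminator after __init__ instead of giving it a default here.
        self.event_type = "tone"


def naive_override_fails() -> bool:
    """Redeclaring event_type with a default keeps its base-class slot,
    so the required 'label' field would follow a defaulted one."""
    try:
        @dataclass
        class BrokenTone(AudioEvent):
            event_type: str = "tone"
    except TypeError:
        return True
    return False
```

The trade-off is that callers still pass some value for `event_type`, which `__post_init__` then overwrites; `field(init=False)` on a redeclared field is an alternative that removes it from the constructor entirely.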
pyr0ball	792c998612	feat: initial cf-voice stub (VoiceFrame API, mock IO, context classifier)	2026-04-06 16:03:07 -07:00
    - VoiceFrame dataclass: label, confidence, speaker_id, shift_magnitude, timestamp
    - MockVoiceIO: async generator of synthetic frames on a timer (CF_VOICE_MOCK=1)
    - ContextClassifier: passthrough stub wrapping VoiceIO; _enrich() hook for real classifiers
    - make_io() factory: mock mode auto-detected from env; raises NotImplementedError for real audio
    - cf-voice-demo CLI entry point for quick smoke-testing
    - 12 tests passing; editable install via pip install -e ../cf-voice
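The initial stub pairs a `VoiceFrame` dataclass with a mock async generator and an env-driven `make_io()` factory. A minimal sketch of that shape follows, assuming mock mode means `CF_VOICE_MOCK` is set to exactly `"1"`. The label set, the `interval`/`seed`/`count` parameters, and computing `shift_magnitude` as the absolute confidence delta (borrowed from the later MicVoiceIO description) are assumptions.

```python
import asyncio
import os
import random
import time
from dataclasses import dataclass
from typing import AsyncIterator, Optional


@dataclass
class VoiceFrame:
    label: str
    confidence: float
    speaker_id: str
    shift_magnitude: float
    timestamp: float


class MockVoiceIO:
    """Yields synthetic frames on a fixed timer (hypothetical sketch)."""

    LABELS = ("neutral", "warm", "frustrated")  # placeholder labels

    def __init__(self, interval: float = 0.5, seed: Optional[int] = None) -> None:
        self.interval = interval
        self._rng = random.Random(seed)  # seedable for reproducible tests

    async def frames(self, count: int) -> AsyncIterator[VoiceFrame]:
        prev = 0.5
        for _ in range(count):
            await asyncio.sleep(self.interval)
            conf = self._rng.uniform(0.5, 1.0)
            yield VoiceFrame(
                label=self._rng.choice(self.LABELS),
                confidence=conf,
                speaker_id="speaker_0",
                shift_magnitude=abs(conf - prev),  # assumption: delta vs previous frame
                timestamp=time.monotonic(),
            )
            prev = conf


def make_io() -> MockVoiceIO:
    """Return mock IO when CF_VOICE_MOCK=1; real capture is not implemented in the stub."""
    if os.environ.get("CF_VOICE_MOCK") == "1":
        return MockVoiceIO()
    raise NotImplementedError("real audio capture is not implemented yet")
```

Driving the mock from an async generator gives consumers the same `async for` loop they would later use with real microphone capture, so swapping backends does not change calling code.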