New modules shipped (from Linnet integration):

- acoustic.py: AST (MIT/ast-finetuned-audioset-10-10-0.4593) replaces the YAMNet stub; 527 AudioSet classes mapped to queue/speaker/environ/scene labels; _LABEL_MAP includes hold_music, ringback, DTMF, background_shift, and the AMD signal chain
- accent.py: facebook/mms-lid-126 language ID → regional accent labels (en_gb, en_us, en_au, fr, es, de, zh, …); lazy-loaded, gated by CF_VOICE_ACCENT
- privacy.py: compound privacy-risk scorer — public_env, background_voices, nature scene, accent signals; returns a 0–3 score without storing any audio
- prosody.py: openSMILE-backed prosody extractor (sarcasm_risk, flat_f0_score, speech_rate, pitch_range); mock mode returns neutral values
- dimensional.py: audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim valence/arousal/dominance scorer; gated by CF_VOICE_DIMENSIONAL
- trajectory.py: rolling buffer for arousal/valence deltas, trend detection (escalating/suppressed/stable), coherence scoring, suppression/reframe flags
- telephony.py: TelephonyBackend Protocol + MockTelephonyBackend + SignalWireBackend + FreeSWITCHBackend; CallSession dataclass; make_telephony() factory
- app.py: FastAPI service (port 8007) — /health + /classify; accepts base64 PCM chunks, returns the full AudioEventOut including dimensional/prosody/accent fields
- prefs.py: voice preference helpers (elcor_mode, confidence_threshold, whisper_model, elcor_prior_frames); cf-core and env-var fallback

Tests: fix stale tests (YAMNetAcousticBackend → ASTAcousticBackend, scene field added to AcousticResult, a speaker_at gap now resolves to the dominant speaker rather than UNKNOWN, make_io real path returns MicVoiceIO when sounddevice is installed). 78 tests passing.

Closes #2, #3.
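Since app.py accepts base64 PCM chunks on /classify, a client needs to encode raw audio before POSTing. Below is a minimal sketch of building such a request body; the field names (`audio_b64`, `sample_rate`) and the helper itself are illustrative assumptions — the actual request schema lives in app.py.

```python
import base64


def build_classify_payload(pcm_bytes: bytes, sample_rate: int = 16000) -> dict:
    """Package a raw PCM chunk for POSTing to /classify.

    NOTE: the field names ("audio_b64", "sample_rate") are illustrative
    guesses; check app.py for the real request schema.
    """
    return {
        "audio_b64": base64.b64encode(pcm_bytes).decode("ascii"),
        "sample_rate": sample_rate,
    }


# 10 ms of 16-bit mono silence at 16 kHz: 160 samples, two zero bytes each.
payload = build_classify_payload(b"\x00\x00" * 160)
```

The payload can then be sent with any HTTP client, e.g. `requests.post("http://localhost:8007/classify", json=payload)`, with the service returning an AudioEventOut.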
# cf_voice/models.py — VoiceFrame API contract
#
# This module is MIT licensed. All consumers (Linnet, Osprey, etc.)
# import VoiceFrame from here so the shape is consistent across the stack.

from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VoiceFrame:
    """
    A single annotated moment in a voice stream.

    Produced by cf_voice.io (audio capture) and enriched by cf_voice.context
    (tone classification, speaker diarization, dimensional emotion).

    Fields
    ------
    label            Tone annotation, e.g. "Warmly impatient" or "Deflecting".
                     Generic by default; the Elcor-style prefix format is an
                     easter egg surfaced by the product UI, not set here.
    confidence       0.0-1.0. Below ~0.5 the annotation is speculative.
    speaker_id       Ephemeral local label ("speaker_a", "speaker_b").
                     Not tied to identity — resets each session.
    shift_magnitude  Delta from the previous frame's tone, 0.0-1.0.
                     High values indicate a meaningful register shift.
    timestamp        Session-relative seconds since capture started.

    Dimensional emotion (audeering model — Navigation v0.2.x, optional):
    valence          0.0-1.0. Negative affect (0) to positive affect (1).
    arousal          0.0-1.0. Low energy / calm (0) to high energy / excited (1).
    dominance        0.0-1.0. Submissive / uncertain (0) to assertive / dominant (1).

    Prosodic features (openSMILE eGeMAPS — Navigation v0.2.x, optional):
    sarcasm_risk     0.0-1.0 heuristic score: flat F0 + calm-positive VAD +
                     text divergence (linnet#22). All three signals are required
                     for high confidence — audio-only signals are weak priors.
    flat_f0_score    Normalised F0 flatness: 1.0 = maximally flat pitch.
    """

    label: str
    confidence: float
    speaker_id: str
    shift_magnitude: float
    timestamp: float

    # Dimensional emotion scores — None when the dimensional classifier is disabled
    valence: float | None = None
    arousal: float | None = None
    dominance: float | None = None

    # Prosodic signals — None when the prosodic extractor is disabled
    sarcasm_risk: float | None = None
    flat_f0_score: float | None = None

    def is_reliable(self, threshold: float = 0.6) -> bool:
        """Return True when confidence meets the given threshold."""
        return self.confidence >= threshold

    def is_shift(self, threshold: float = 0.3) -> bool:
        """Return True when shift_magnitude indicates a meaningful register change."""
        return self.shift_magnitude >= threshold