cf-voice/cf_voice/models.py
pyr0ball 24f04b67db feat: full voice pipeline — AST acoustic, accent, privacy, prosody, dimensional, trajectory, telephony, FastAPI app
New modules shipped (from Linnet integration):
- acoustic.py: AST (MIT/ast-finetuned-audioset-10-10-0.4593) replaces YAMNet stub;
  527 AudioSet classes mapped to queue/speaker/environ/scene labels; _LABEL_MAP
  includes hold_music, ringback, DTMF, background_shift, AMD signal chain
- accent.py: facebook/mms-lid-126 language ID → regional accent labels
  (en_gb, en_us, en_au, fr, es, de, zh, …); lazy-loaded, gated by CF_VOICE_ACCENT
- privacy.py: compound privacy risk scorer — public_env, background_voices,
  nature scene, accent signals; returns 0–3 score without storing any audio
- prosody.py: openSMILE-backed prosody extractor (sarcasm_risk, flat_f0_score,
  speech_rate, pitch_range); mock mode returns neutral values
- dimensional.py: audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim
  valence/arousal/dominance scorer; gated by CF_VOICE_DIMENSIONAL
- trajectory.py: rolling buffer for arousal/valence deltas, trend detection
  (escalating/suppressed/stable), coherence scoring, suppression/reframe flags
- telephony.py: TelephonyBackend Protocol + MockTelephonyBackend + SignalWireBackend
  + FreeSWITCHBackend; CallSession dataclass; make_telephony() factory
- app.py: FastAPI service (port 8007) — /health + /classify; accepts base64 PCM
  chunks, returns full AudioEventOut including dimensional/prosody/accent fields
- prefs.py: voice preference helpers (elcor_mode, confidence_threshold,
  whisper_model, elcor_prior_frames); cf-core and env-var fallback
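
The trajectory.py entry above describes a rolling buffer with trend detection. A minimal sketch of that idea follows, assuming a fixed-size window and a simple first-to-last arousal delta. The class name RollingTrend, the window size, the threshold, and the reading of "suppressed" as a sustained drop in arousal are all assumptions made for illustration; the shipped module may work differently.

```python
from __future__ import annotations

from collections import deque


class RollingTrend:
    """Illustrative stand-in, not the actual cf_voice.trajectory API."""

    def __init__(self, window: int = 5, threshold: float = 0.15) -> None:
        # window and threshold are assumed values picked for this example.
        self._arousal: deque[float] = deque(maxlen=window)
        self._threshold = threshold

    def push(self, arousal: float) -> None:
        """Record one arousal score (0.0-1.0); the oldest entry drops out when full."""
        self._arousal.append(arousal)

    def trend(self) -> str:
        """Classify the window by the change from its oldest to its newest score."""
        if len(self._arousal) < 2:
            return "stable"
        delta = self._arousal[-1] - self._arousal[0]
        if delta >= self._threshold:
            return "escalating"
        if delta <= -self._threshold:
            # Assumption: "suppressed" is modelled here as falling arousal.
            return "suppressed"
        return "stable"


tracker = RollingTrend(window=3)
for score in (0.2, 0.3, 0.5):
    tracker.push(score)
print(tracker.trend())
```

A deque with maxlen keeps the buffer bounded without manual trimming, which suits per-frame updates in a streaming pipeline.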

Tests: fix stale tests (YAMNetAcousticBackend → ASTAcousticBackend, scene field
added to AcousticResult, speaker_at gap now resolves dominant speaker not UNKNOWN,
make_io real path returns MicVoiceIO when sounddevice installed). 78 tests passing.

Closes #2, #3.
2026-04-18 22:36:58 -07:00

# cf_voice/models.py — VoiceFrame API contract
#
# This module is MIT licensed. All consumers (Linnet, Osprey, etc.)
# import VoiceFrame from here so the shape is consistent across the stack.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VoiceFrame:
    """
    A single annotated moment in a voice stream.

    Produced by cf_voice.io (audio capture) and enriched by cf_voice.context
    (tone classification, speaker diarization, dimensional emotion).

    Fields
    ------
    label            Tone annotation, e.g. "Warmly impatient" or "Deflecting".
                     Generic by default; Elcor-style prefix format is an
                     easter egg surfaced by the product UI, not set here.
    confidence       0.0-1.0. Below ~0.5 the annotation is speculative.
    speaker_id       Ephemeral local label ("speaker_a", "speaker_b").
                     Not tied to identity; resets each session.
    shift_magnitude  Delta from the previous frame's tone, 0.0-1.0.
                     High values indicate a meaningful register shift.
    timestamp        Session-relative seconds since capture started.

    Dimensional emotion (audeering model, Navigation v0.2.x, optional):
    valence          0.0-1.0. Negative affect (0) to positive affect (1).
    arousal          0.0-1.0. Low energy / calm (0) to high energy / excited (1).
    dominance        0.0-1.0. Submissive / uncertain (0) to assertive / dominant (1).

    Prosodic features (openSMILE eGeMAPS, Navigation v0.2.x, optional):
    sarcasm_risk     0.0-1.0 heuristic score: flat F0 + calm-positive VAD +
                     text divergence (linnet#22). All three signals required for
                     high confidence; audio-only signals are weak priors.
    flat_f0_score    Normalised F0 flatness: 1.0 = maximally flat pitch.
    """

    label: str
    confidence: float
    speaker_id: str
    shift_magnitude: float
    timestamp: float

    # Dimensional emotion scores: None when dimensional classifier is disabled
    valence: float | None = None
    arousal: float | None = None
    dominance: float | None = None

    # Prosodic signals: None when prosodic extractor is disabled
    sarcasm_risk: float | None = None
    flat_f0_score: float | None = None

    def is_reliable(self, threshold: float = 0.6) -> bool:
        """Return True when confidence meets the given threshold."""
        return self.confidence >= threshold

    def is_shift(self, threshold: float = 0.3) -> bool:
        """Return True when shift_magnitude indicates a meaningful register change."""
        return self.shift_magnitude >= threshold
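
As a usage sketch, a consumer might combine is_reliable() and is_shift() to decide which frames to surface. The snippet redeclares a trimmed copy of VoiceFrame so it runs on its own; real code would import it from cf_voice.models instead. The helper notable_frames and the sample values are hypothetical, not part of the module.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VoiceFrame:
    # Trimmed copy of the contract so this snippet runs standalone;
    # real consumers would use `from cf_voice.models import VoiceFrame`.
    label: str
    confidence: float
    speaker_id: str
    shift_magnitude: float
    timestamp: float
    valence: float | None = None
    arousal: float | None = None

    def is_reliable(self, threshold: float = 0.6) -> bool:
        return self.confidence >= threshold

    def is_shift(self, threshold: float = 0.3) -> bool:
        return self.shift_magnitude >= threshold


def notable_frames(frames: list[VoiceFrame]) -> list[VoiceFrame]:
    """Hypothetical helper: keep frames that are both reliable and a register shift."""
    return [f for f in frames if f.is_reliable() and f.is_shift()]


frames = [
    VoiceFrame("Warmly impatient", 0.82, "speaker_a", 0.45, 1.0),
    # Large shift, but confidence is below the default 0.6 threshold.
    VoiceFrame("Deflecting", 0.40, "speaker_b", 0.60, 2.5),
    # Confident, but the shift is below the default 0.3 threshold.
    VoiceFrame("Warmly impatient", 0.90, "speaker_a", 0.05, 4.0, valence=0.7, arousal=0.3),
]
picked = notable_frames(frames)
print([f.timestamp for f in picked])  # prints [1.0]
```

Because the optional dimensional and prosodic fields default to None, callers that only need the five core fields can construct frames positionally, as the first two entries do.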