cf-voice/scripts/test_classify_e2e.py
pyr0ball 24f04b67db feat: full voice pipeline — AST acoustic, accent, privacy, prosody, dimensional, trajectory, telephony, FastAPI app
New modules shipped (from Linnet integration):
- acoustic.py: AST (MIT/ast-finetuned-audioset-10-10-0.4593) replaces YAMNet stub;
  527 AudioSet classes mapped to queue/speaker/environ/scene labels; _LABEL_MAP
  includes hold_music, ringback, DTMF, background_shift, AMD signal chain
- accent.py: facebook/mms-lid-126 language ID → regional accent labels
  (en_gb, en_us, en_au, fr, es, de, zh, …); lazy-loaded, gated by CF_VOICE_ACCENT
- privacy.py: compound privacy risk scorer — public_env, background_voices,
  nature scene, accent signals; returns 0–3 score without storing any audio
- prosody.py: openSMILE-backed prosody extractor (sarcasm_risk, flat_f0_score,
  speech_rate, pitch_range); mock mode returns neutral values
- dimensional.py: audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim
  valence/arousal/dominance scorer; gated by CF_VOICE_DIMENSIONAL
- trajectory.py: rolling buffer for arousal/valence deltas, trend detection
  (escalating/suppressed/stable), coherence scoring, suppression/reframe flags
- telephony.py: TelephonyBackend Protocol + MockTelephonyBackend + SignalWireBackend
  + FreeSWITCHBackend; CallSession dataclass; make_telephony() factory
- app.py: FastAPI service (port 8007) — /health + /classify; accepts base64 PCM
  chunks, returns full AudioEventOut including dimensional/prosody/accent fields
- prefs.py: voice preference helpers (elcor_mode, confidence_threshold,
  whisper_model, elcor_prior_frames); cf-core and env-var fallback

Tests: fix stale tests (YAMNetAcousticBackend → ASTAcousticBackend, scene field
added to AcousticResult, speaker_at gap now resolves dominant speaker not UNKNOWN,
make_io real path returns MicVoiceIO when sounddevice installed). 78 tests passing.

Closes #2, #3.
2026-04-18 22:36:58 -07:00

69 lines
1.7 KiB
Python

"""
End-to-end integration test for the cf-voice /classify endpoint.

Extracts a 2-second window from a local media file, base64-encodes the
raw PCM, and POSTs it to the running cf-voice service at localhost:8009.
Prints each returned AudioEvent for quick inspection.

Requires:
  - cf-voice running at localhost:8009 (CF_VOICE_DIARIZE=1 for speaker labels)
  - ffmpeg on PATH
  - A local audio/video file (edit MEDIA_FILE below)

Run:
  python scripts/test_classify_e2e.py
"""
from __future__ import annotations

import base64
import json
import subprocess
import urllib.request

import numpy as np

MEDIA_FILE = "/Library/Series/Hogan's Heroes/Season 3/Hogan's Heroes - S03E19 - Hogan, Go Home.mkv"
START_S = 120          # window start, in seconds from the beginning of the file
DURATION_S = 2         # window length, in seconds
SAMPLE_RATE = 16_000   # resample target, in Hz
CF_VOICE_URL = "http://localhost:8009"

proc = subprocess.run(
    [
        "ffmpeg", "-i", MEDIA_FILE,
        "-ss", str(START_S),
        "-t", str(DURATION_S),
        "-ar", str(SAMPLE_RATE),
        "-ac", "1",
        "-f", "s16le",
        "-",
    ],
    capture_output=True,
    check=True,
)
pcm = proc.stdout
audio = np.frombuffer(pcm, dtype=np.int16)
print(f"audio samples: {len(audio)}, duration: {len(audio) / SAMPLE_RATE:.2f}s")
payload = json.dumps({
    "audio_chunk": base64.b64encode(pcm).decode(),
    "timestamp": float(START_S),
    "session_id": "test",
}).encode()

req = urllib.request.Request(
    f"{CF_VOICE_URL}/classify",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)
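with urllib.request.urlopen(req, timeout=30) as resp:
    result = json.loads(resp.read())

for ev in result["events"]:
    # speaker_id may be missing or null (e.g. without diarization); coerce to
    # str so the width specifier does not raise TypeError on None.
    print(
        f"  {ev['event_type']:10}"
        f" speaker_id={str(ev.get('speaker_id') or 'N/A'):14}"
        f" label={ev.get('label', '')}"
    )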
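
The script above needs ffmpeg, a media file, and a running service. As a complement, the following is a minimal offline sketch that checks only the request body format (base64-encoded s16le PCM inside JSON) using a synthetic tone. The helper names `make_payload` and `decode_payload` are hypothetical and not part of cf-voice. The decode step is only an assumption about how a server might unpack the chunk. The sketch uses the standard library in place of numpy so it runs without extra dependencies.

```python
import base64
import json
import math
import struct

SAMPLE_RATE = 16_000  # same rate the script passes to ffmpeg via -ar


def make_payload(samples, timestamp, session_id):
    """Build the JSON request body from a sequence of int16 sample values."""
    # "<h" is little-endian signed 16-bit, i.e. the layout of ffmpeg's s16le.
    pcm = struct.pack(f"<{len(samples)}h", *samples)
    return json.dumps({
        "audio_chunk": base64.b64encode(pcm).decode("ascii"),
        "timestamp": float(timestamp),
        "session_id": session_id,
    }).encode()


def decode_payload(body):
    """Recover the int16 samples from a request body (assumed server-side view)."""
    doc = json.loads(body)
    pcm = base64.b64decode(doc["audio_chunk"])
    return list(struct.unpack(f"<{len(pcm) // 2}h", pcm))


# Two seconds of a 440 Hz tone at half of full scale.
tone = [
    int(0.5 * 32767 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
    for n in range(2 * SAMPLE_RATE)
]

body = make_payload(tone, timestamp=0.0, session_id="selfcheck")
decoded = decode_payload(body)

print(f"samples: {len(decoded)}, duration: {len(decoded) / SAMPLE_RATE:.2f}s")
print(f"round trip ok: {decoded == tone}")
```

If the round trip holds here but the live request fails, the problem is more likely in the ffmpeg extraction or the service than in the payload encoding.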