New modules shipped (from Linnet integration):

- acoustic.py: AST (MIT/ast-finetuned-audioset-10-10-0.4593) replaces the YAMNet stub; 527 AudioSet classes mapped to queue/speaker/environ/scene labels; _LABEL_MAP includes hold_music, ringback, DTMF, background_shift, AMD signal chain
- accent.py: facebook/mms-lid-126 language ID → regional accent labels (en_gb, en_us, en_au, fr, es, de, zh, …); lazy-loaded, gated by CF_VOICE_ACCENT
- privacy.py: compound privacy risk scorer — public_env, background_voices, nature scene, accent signals; returns a 0–3 score without storing any audio
- prosody.py: openSMILE-backed prosody extractor (sarcasm_risk, flat_f0_score, speech_rate, pitch_range); mock mode returns neutral values
- dimensional.py: audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim valence/arousal/dominance scorer; gated by CF_VOICE_DIMENSIONAL
- trajectory.py: rolling buffer for arousal/valence deltas, trend detection (escalating/suppressed/stable), coherence scoring, suppression/reframe flags
- telephony.py: TelephonyBackend Protocol + MockTelephonyBackend + SignalWireBackend + FreeSWITCHBackend; CallSession dataclass; make_telephony() factory
- app.py: FastAPI service (port 8007) — /health + /classify; accepts base64 PCM chunks, returns a full AudioEventOut including dimensional/prosody/accent fields
- prefs.py: voice preference helpers (elcor_mode, confidence_threshold, whisper_model, elcor_prior_frames); cf-core and env-var fallback

Tests: fixed stale tests (YAMNetAcousticBackend → ASTAcousticBackend; scene field added to AcousticResult; speaker_at gap now resolves the dominant speaker, not UNKNOWN; make_io real path returns MicVoiceIO when sounddevice is installed). 78 tests passing.

Closes #2, #3.
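For a quick smoke test of the new /classify endpoint, a payload can be built with the stdlib alone. This is a sketch: the JSON field names (`audio_b64`, `sample_rate`) are assumptions for illustration, not the confirmed request schema — check app.py for the actual model.

```python
import base64
import json
import math
import struct

# Build a 2.5 s, 16 kHz mono PCM16 sine chunk and base64-encode it, matching
# the "base64 PCM chunks" the service accepts. Field names below are
# illustrative assumptions, not the verified app.py schema.
SAMPLE_RATE = 16_000
samples = [
    int(0.3 * 32767 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE))
    for t in range(int(2.5 * SAMPLE_RATE))
]
pcm_bytes = struct.pack(f"<{len(samples)}h", *samples)

payload = json.dumps({
    "audio_b64": base64.b64encode(pcm_bytes).decode("ascii"),
    "sample_rate": SAMPLE_RATE,
})
# POST this body to http://localhost:8007/classify with any HTTP client;
# a 200 response carries the AudioEventOut JSON.
```
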
# cf-voice environment — copy to .env and fill in values
# cf-voice itself does not auto-load .env; consumers (Linnet, Osprey, etc.)
# load it via python-dotenv in their own startup. For standalone cf-voice
# dev/testing, source this file manually or install python-dotenv.

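For standalone dev without python-dotenv, the KEY=VALUE lines above can be parsed with a few lines of stdlib Python. This is a minimal sketch only — python-dotenv additionally handles quoting, `export` prefixes, and variable interpolation, which this does not:

```python
import os

def parse_env_lines(lines):
    """Parse KEY=VALUE pairs from .env-style lines, skipping blanks and
    '#' comments. Minimal sketch for standalone dev/testing only."""
    pairs = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        pairs[key.strip()] = value.strip()
    return pairs

# Usage: apply without clobbering values already set in the shell.
# with open(".env") as fh:
#     for k, v in parse_env_lines(fh).items():
#         os.environ.setdefault(k, v)
```
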
# ── HuggingFace — free tier / local use ──────────────────────────────────────
# Used by the local diarization path (free tier, user's own machine).
# Each user must:
#   1. Create a free account at huggingface.co
#   2. Accept the gated model terms at:
#        https://huggingface.co/pyannote/speaker-diarization-3.1
#        https://huggingface.co/pyannote/segmentation-3.0
#   3. Generate a read token at huggingface.co/settings/tokens
HF_TOKEN=

# ── HuggingFace — paid tier / cf-orch backend ─────────────────────────────────
# Used by cf-orch when running diarization as a managed service on Heimdall.
# This is a CircuitForge org token — NOT the user's personal token.
#
# Prerequisites (one-time, manual — tracked in circuitforge-orch#27):
#   1. Create CircuitForge org on huggingface.co
#   2. Accept pyannote/speaker-diarization-3.1 terms under the org account
#   3. Accept pyannote/segmentation-3.0 terms under the org account
#   4. Generate a read-only org token and set it here
#
# Leave blank on local installs — HF_TOKEN above is used instead.
CF_HF_TOKEN=

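The fallback rule ("leave CF_HF_TOKEN blank on local installs — HF_TOKEN is used instead") comes down to a small precedence check. The helper name below is hypothetical; cf-voice's actual lookup may differ:

```python
import os

def resolve_hf_token(env=None):
    """Pick the diarization token: the CircuitForge org token (CF_HF_TOKEN,
    cf-orch/Heimdall) wins when non-blank; otherwise fall back to the
    user's personal HF_TOKEN. Hypothetical helper illustrating the
    documented precedence."""
    env = os.environ if env is None else env
    return env.get("CF_HF_TOKEN") or env.get("HF_TOKEN") or None
```

Note that `or` treats the empty string as unset, so a blank `CF_HF_TOKEN=` line falls through to `HF_TOKEN` as the comments describe.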
# ── Whisper STT ───────────────────────────────────────────────────────────────
# Model size: tiny | base | small | medium | large-v2 | large-v3
# Smaller = faster / less VRAM; larger = more accurate.
# Recommended: small (500MB VRAM) for real-time use.
CF_VOICE_WHISPER_MODEL=small

# ── Compute ───────────────────────────────────────────────────────────────────
# auto (detect GPU), cuda, cpu
CF_VOICE_DEVICE=auto

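The `auto` setting implies a detection step at startup. A sketch of that resolution, with GPU availability passed in as a flag (in the real path it would come from something like `torch.cuda.is_available()`); the function is illustrative, not cf-voice's actual code:

```python
def resolve_device(setting, cuda_available):
    """Map CF_VOICE_DEVICE to a concrete device string.
    'auto' picks cuda when a GPU is detected, else cpu;
    explicit 'cuda'/'cpu' pass through unchanged.
    Hypothetical helper mirroring the documented options."""
    if setting == "auto":
        return "cuda" if cuda_available else "cpu"
    if setting in ("cuda", "cpu"):
        return setting
    raise ValueError(f"unknown CF_VOICE_DEVICE: {setting!r}")
```
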
# ── Mock mode ─────────────────────────────────────────────────────────────────
# Set to 1 to use synthetic VoiceFrames — no GPU, mic, or HF token required.
# Unset or 0 for real audio capture.
CF_VOICE_MOCK=

# ── Tone classifier ───────────────────────────────────────────────────────────
# Minimum confidence to emit a VoiceFrame (below this = frame skipped).
CF_VOICE_CONFIDENCE_THRESHOLD=0.55

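The threshold acts as an emit-or-skip gate on each classified frame. A sketch of that gate, assuming "minimum confidence to emit" means at-or-above the threshold (the helper name is hypothetical):

```python
def should_emit_frame(confidence, threshold=0.55):
    """Gate a classified VoiceFrame on CF_VOICE_CONFIDENCE_THRESHOLD:
    frames below the minimum are skipped rather than emitted.
    Illustrative sketch, not cf-voice's actual emit path."""
    return confidence >= threshold
```
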
# ── Elcor annotation mode ─────────────────────────────────────────────────────
# Accessibility feature for autistic and ND users. Switches tone subtext from
# generic format ("Tone: Frustrated") to Elcor-style prefix format
# ("With barely concealed frustration:"). Opt-in, local-only.
# Overridden by cf-core preferences store when circuitforge_core is installed.
# 1 = enabled, 0 or unset = disabled (default).
CF_VOICE_ELCOR=0

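The two subtext styles above can be sketched as a single formatter. In cf-voice the Elcor label itself is LLM-generated from rolling context, so this function and its arguments are purely illustrative:

```python
def format_subtext(tone, elcor=False, elcor_label=None):
    """Render tone subtext in one of the two documented styles:
    generic 'Tone: X', or the Elcor-style prefix 'With <label>:'.
    Hypothetical formatter — real labels come from the LLM pipeline."""
    if elcor and elcor_label:
        return f"With {elcor_label}:"
    return f"Tone: {tone}"
```
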
# Number of prior VoiceFrames to include as context for Elcor label generation.
# Larger windows = more contextually aware annotations, higher LLM prompt cost.
# Default: 4 frames (~10 seconds of rolling context at 2.5s intervals).
CF_VOICE_ELCOR_PRIOR_FRAMES=4
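The prior-frames window maps naturally onto a bounded deque: at one frame every 2.5 s, `maxlen=4` keeps roughly the last 10 seconds of VoiceFrames as context. A sketch (whether cf-voice's trajectory buffer uses a deque internally is an assumption):

```python
from collections import deque

# Rolling context window sized by CF_VOICE_ELCOR_PRIOR_FRAMES.
# Old frames fall off the left edge automatically once maxlen is reached.
prior_frames = deque(maxlen=4)
for i in range(10):
    prior_frames.append(f"frame-{i}")
# Only the 4 most recent frames remain for the Elcor prompt context.
```
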