feat: YAMNet acoustic event classifier — queue/environ/speaker type #2

Closed

opened 2026-04-12 10:56:21 -07:00 by pyr0ball · 0 comments

pyr0ball commented

2026-04-12 10:56:21 -07:00

Owner

Context

See Linnet#5. The YAMNetAcousticBackend in acoustic.py is currently a stub that raises NotImplementedError. This issue covers the real implementation.

Requirements

- Load google/yamnet via TensorFlow Hub (or a PyTorch port)
- Map YAMNet class outputs to cf-voice event buckets:
  - queue: hold music, elevator music, phone beep, keypad tone
  - environ: indoor room, outdoor, vehicle interior, crowd
  - speaker: speech (single), speech (crowd), silence
- classify_window(audio_bytes, timestamp) returns AcousticResult(queue, environ, speaker)
- Graceful degradation: if YAMNet is unavailable (no TF), fall back to MockAcousticBackend
- CF_VOICE_ACOUSTIC=1 env var opt-in (default off until the model is confirmed on Heimdall)
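
A minimal sketch of how the bucket mapping and the opt-in/fallback rules could fit together, using only the standard library. Several things here are assumptions rather than part of this issue: the YAMNet display names used as keys (check them against yamnet_class_map.csv), the snake_case label strings, the Optional[str] field types on AcousticResult, the 0.3 score threshold, the helper names bucket_scores and choose_backend_name, and the choice to use the mock backend whenever the flag is off.

```python
import importlib.util
import os
from dataclasses import dataclass
from typing import Dict, Mapping, Optional, Tuple


@dataclass(frozen=True)
class AcousticResult:
    # Field names come from the issue; Optional[str] labels are an assumption.
    queue: Optional[str] = None
    environ: Optional[str] = None
    speaker: Optional[str] = None


# Assumed mapping: YAMNet display name -> (bucket, cf-voice label).
# Keys must be verified against yamnet_class_map.csv before use.
_YAMNET_MAP: Dict[str, Tuple[str, str]] = {
    "Background music": ("queue", "hold_music"),
    "Busy signal": ("queue", "phone_beep"),
    "Telephone dialing, DTMF": ("queue", "keypad_tone"),
    "Inside, small room": ("environ", "indoor_room"),
    "Outside, urban or manmade": ("environ", "outdoor"),
    "Vehicle": ("environ", "vehicle_interior"),
    "Crowd": ("environ", "crowd"),
    "Speech": ("speaker", "speech_single"),
    "Hubbub, speech noise, speech babble": ("speaker", "speech_crowd"),
    "Silence": ("speaker", "silence"),
}


def bucket_scores(scores: Mapping[str, float], threshold: float = 0.3) -> AcousticResult:
    """Keep the highest-scoring mapped label per bucket, skipping weak scores."""
    best: Dict[str, Tuple[float, str]] = {}  # bucket -> (score, label)
    for name, score in scores.items():
        mapped = _YAMNET_MAP.get(name)
        if mapped is None or score < threshold:
            continue  # unmapped class, or below the assumed threshold
        bucket, label = mapped
        if bucket not in best or score > best[bucket][0]:
            best[bucket] = (score, label)
    return AcousticResult(**{bucket: label for bucket, (_, label) in best.items()})


def choose_backend_name(env: Mapping[str, str] = os.environ) -> str:
    """Return which backend to construct, honoring the opt-in flag and TF availability."""
    opted_in = env.get("CF_VOICE_ACOUSTIC", "0") == "1"
    # find_spec returns None when the top-level package is not installed.
    tf_hub_present = importlib.util.find_spec("tensorflow_hub") is not None
    if opted_in and tf_hub_present:
        return "YAMNetAcousticBackend"
    return "MockAcousticBackend"
```

In this sketch, classify_window would average the per-frame YAMNet scores for the window, build a name-to-score mapping, and pass it to bucket_scores. Buckets with no class above the threshold stay None.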

Label expansion for Linnet#20

Once base YAMNet works, extend _YAMNET_MAP with:

- Birdsong, traffic, street crossing, rain, background voices
These feed the acoustic fingerprinting / privacy risk scorer.

Tracking

Linnet#5 (base), Linnet#20 (privacy extension)
