Parallel tone classifiers: audio + text with timestamp sync for semantic divergence detection #22

Open
opened 2026-04-12 17:42:11 -07:00 by pyr0ball · 0 comments
Owner

## Concept

Run two tone classifiers in parallel, synced by timestamp, and detect divergence between audio affect and text/linguistic affect. The divergence itself is the semantic signal.

## Architecture

**AudioToneClassifier** (existing): wav2vec2/emotion — classifies prosody, energy, affect from raw audio

**TextToneClassifier** (new): runs on Whisper transcript text — lightweight sentiment/emotion model (or fast local LLM) on the text content

**ToneSyncAnalyzer** (new): combines both streams with timestamp alignment, maps divergence patterns to semantic modifiers:

| Audio tone | Text tone | Divergence signal |
|---|---|---|
| calm / flat | distressed / urgent language | emotional suppression / masking |
| rising / warm | neutral content | emphasis / enthusiasm |
| flat / monotone | hyperbolic or positive phrasing | sarcasm / irony |
| urgent / tense | hedged / softened language | passive aggression / people-pleasing |
| aligned | aligned | literal / confident communication |
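The ToneSyncAnalyzer's core combine step could be sketched as a rule table keyed on the two tone labels, with a minimum temporal overlap between the audio and text windows before a divergence is declared. This is a minimal sketch, not the implementation: the `ToneFrame` type, the label vocabulary, and the `0.5 s` overlap threshold are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed label vocabulary; real classifier outputs will differ.
# Rules mirror the divergence table above.
DIVERGENCE_RULES = {
    ("calm", "distressed"): "suppression/masking",
    ("warm", "neutral"): "emphasis/enthusiasm",
    ("flat", "positive"): "sarcasm/irony",
    ("tense", "hedged"): "passive-aggression/people-pleasing",
}

@dataclass
class ToneFrame:
    start: float       # window start, seconds
    end: float         # window end, seconds
    label: str         # classifier output for this window
    confidence: float

def overlap(a: ToneFrame, b: ToneFrame) -> float:
    """Seconds of temporal overlap between two tone windows."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))

def classify_divergence(audio: ToneFrame, text: ToneFrame,
                        min_overlap: float = 0.5) -> Optional[str]:
    """Map an (audio, text) tone pair to a divergence label,
    or None when the windows don't line up in time."""
    if overlap(audio, text) < min_overlap:
        return None
    if audio.label == text.label:
        return "aligned"
    return DIVERGENCE_RULES.get((audio.label, text.label), "unclassified")
```

The timestamp gate matters because Whisper segment boundaries rarely coincide exactly with the audio classifier's 1-2 s windows; requiring real overlap avoids pairing a tone frame with the wrong words.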

## Why this matters for Elcor

Currently, Elcor annotation is "what does the audio feel like." With divergence detection it becomes "what does this *mean* given the gap between what is said and how it is said" — much stronger for ND use cases where tonal subtext is the hard part.

## Implementation notes

- Text classifier runs on the same 1-2s windows as audio, using the Whisper transcript from that window
- Requires Whisper output to be reliable enough to classify (accuracy gating: skip if STT confidence is low)
- Combiner emits a new `DivergenceEvent` with `audio_tone`, `text_tone`, `divergence_type`, `confidence`
- Divergence type becomes an input to the Elcor prefix generator
- Could start with a simple sentiment lexicon / rule-based text classifier before graduating to a model
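The notes above could be sketched together: the `DivergenceEvent` shape from the issue, STT-confidence gating, and a lexicon-based starter text classifier. Everything beyond the four named event fields is an assumption — the lexicon entries, the `0.6` confidence floor, and the helper names are placeholders, not the project's API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DivergenceEvent:
    # Fields named in the issue; ordering and types are assumptions.
    audio_tone: str
    text_tone: str
    divergence_type: str
    confidence: float

# Toy sentiment lexicon: a rule-based stand-in before graduating to a model.
LEXICON = {
    "help": "distressed", "urgent": "distressed", "now": "distressed",
    "great": "positive", "amazing": "positive",
    "maybe": "hedged", "sorry": "hedged",
}

STT_CONFIDENCE_FLOOR = 0.6  # assumed gating threshold

def text_tone(transcript: str) -> str:
    """Majority vote over lexicon hits; 'neutral' when nothing matches."""
    tokens = [t.strip(".,!?") for t in transcript.lower().split()]
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    return max(set(hits), key=hits.count) if hits else "neutral"

def maybe_emit(audio_tone: str, transcript: str,
               stt_confidence: float) -> Optional[DivergenceEvent]:
    """Accuracy gating: skip the window when STT confidence is too low."""
    if stt_confidence < STT_CONFIDENCE_FLOOR:
        return None
    tone = text_tone(transcript)
    dtype = "aligned" if tone == audio_tone else f"{audio_tone}-vs-{tone}"
    return DivergenceEvent(audio_tone, tone, dtype, stt_confidence)
```

Gating on STT confidence before classifying keeps garbage transcripts from generating spurious divergence events, which is cheaper than trying to filter them downstream.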

## Dependencies

- Whisper accuracy improvements (in progress)
- cf-voice `context.py` classify pipeline
- Elcor annotation layer
pyr0ball added this to the Navigation — v0.2.x milestone 2026-04-17 11:56:15 -07:00
Reference: Circuit-Forge/linnet#22