Parallel tone classifiers: audio + text with timestamp sync for semantic divergence detection #22
Reference: Circuit-Forge/linnet#22
Concept
Run two tone classifiers in parallel, synced by timestamp, and detect divergence between audio affect and text/linguistic affect. The divergence itself is the semantic signal.
Architecture
- **AudioToneClassifier** (existing): wav2vec2-based emotion model; classifies prosody, energy, and affect from raw audio.
- **TextToneClassifier** (new): lightweight sentiment/emotion model (or a fast local LLM) run on the Whisper transcript text.
- **ToneSyncAnalyzer** (new): combines both streams with timestamp alignment and maps divergence patterns to semantic modifiers.
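A minimal sketch of the ToneSyncAnalyzer's alignment step. The `ToneSegment` shape, the overlap threshold, and the function names are illustrative assumptions, not the existing codebase API: each classifier is assumed to emit timestamped, labeled segments, and a divergence is flagged when overlapping segments disagree on label.

```python
from dataclasses import dataclass

@dataclass
class ToneSegment:
    """Hypothetical output unit of either classifier."""
    start: float   # seconds
    end: float     # seconds
    label: str     # e.g. "positive", "negative", "flat"
    score: float   # classifier confidence in [0, 1]

def overlap(a: ToneSegment, b: ToneSegment) -> float:
    """Seconds of temporal overlap between two segments."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))

def find_divergences(audio: list[ToneSegment],
                     text: list[ToneSegment],
                     min_overlap: float = 0.5) -> list[tuple[ToneSegment, ToneSegment]]:
    """Pair up audio and text segments that overlap in time but disagree in label."""
    events = []
    for a in audio:
        for t in text:
            if overlap(a, t) >= min_overlap and a.label != t.label:
                events.append((a, t))
    return events

# Positive words delivered in a negative/flat voice -> one divergence event.
audio = [ToneSegment(0.0, 2.5, "negative", 0.81)]
text = [ToneSegment(0.2, 2.4, "positive", 0.77)]
print(len(find_divergences(audio, text)))  # → 1
```

The naive O(n·m) pairing is fine at utterance granularity; an interval-sweep would be the obvious optimization if segments get dense.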
Why this matters for Elcor
Currently, Elcor annotation answers "what does the audio feel like?" With divergence detection it becomes "what does this mean, given the gap between what is said and how it is said?", which is much stronger for ND use cases where tonal subtext is the hard part.
Implementation notes
- `DivergenceEvent` with `audio_tone`, `text_tone`, `divergence_type`, `confidence`
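A possible shape for the event record, using the four fields named above. The timestamp fields and the example `divergence_type` value are assumptions added for illustration; the issue does not specify a taxonomy.

```python
from dataclasses import dataclass

@dataclass
class DivergenceEvent:
    """One detected gap between vocal delivery and textual content."""
    audio_tone: str       # label from AudioToneClassifier, e.g. "flat"
    text_tone: str        # label from TextToneClassifier, e.g. "positive"
    divergence_type: str  # hypothetical taxonomy, e.g. "positive_text_flat_audio"
    confidence: float     # combined confidence in [0, 1]
    start: float          # assumed timestamp fields (seconds), not named in the issue
    end: float

event = DivergenceEvent("flat", "positive", "positive_text_flat_audio", 0.74, 0.2, 2.4)
print(event.divergence_type)  # → positive_text_flat_audio
```

Keeping `divergence_type` a plain string at first avoids committing to an enum before the pattern taxonomy settles.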