feat: SDH subtitle generation pipeline (Marlin + cf-stt + tone annotation → SRT/VTT) #31
Reference: Circuit-Forge/linnet#31
Summary
Add SDH (Subtitles for the Deaf and Hard of Hearing) subtitle generation to Linnet, combining three inference services into a merged, broadcast-quality subtitle output.
SDH is distinct from standard closed captions: it includes non-speech audio events ([DOOR SLAMS], [tense music], [crowd cheering]), speaker identification, and tone/manner descriptors ([whispering], [angrily]). This is exactly the bracketed annotation grammar Linnet already produces for real-time tone annotation.
Pipeline
cf-video (Marlin-2B)
cf-stt (Whisper)
Example Output (SRT)
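As a purely illustrative sketch of what a merged SDH file in SRT form might look like, the snippet below builds cues by hand and serializes them. It is not the pipeline's actual implementation: the `Cue` type, the helper names, the sample speaker and dialogue, and the placement of the tone descriptor after the speaker prefix are all assumptions, and the inputs stand in for whatever cf-video, cf-stt, and tone annotation would really return.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Cue:
    """One subtitle cue; times are in seconds (hypothetical structure)."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def speech_cue(start: float, end: float, speaker: Optional[str],
               text: str, tone: Optional[str] = None) -> Cue:
    """Build a dialogue cue with a capitalized speaker prefix and optional [tone]."""
    prefix = f"{speaker.upper()}: " if speaker else ""
    manner = f"[{tone}] " if tone else ""  # assumed placement of the descriptor
    return Cue(start, end, f"{prefix}{manner}{text}")


def event_cue(start: float, end: float, label: str) -> Cue:
    """Build a non-speech audio event cue such as [DOOR SLAMS]."""
    return Cue(start, end, f"[{label}]")


def to_srt(cues: Iterable[Cue]) -> str:
    """Sort cues by time and serialize them as numbered SRT blocks."""
    ordered = sorted(cues, key=lambda c: (c.start, c.end))
    blocks = [
        f"{index}\n{srt_timestamp(c.start)} --> {srt_timestamp(c.end)}\n{c.text}\n"
        for index, c in enumerate(ordered, start=1)
    ]
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hand-written sample data, not real model output.
    sample = [
        speech_cue(1.5, 3.0, "Maria", "Who's there?", tone="whispering"),
        event_cue(0.0, 1.2, "DOOR SLAMS"),
    ]
    print(to_srt(sample))
```

Sorting on start time before numbering means speech and non-speech cues can be collected from separate sources in any order and still come out as one sequential SRT file.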
Scope
- POST /api/subtitles/generate: accepts video file path or URL, returns job ID
- GET /api/subtitles/{job_id}/progress
- GET /api/subtitles/{job_id}/output?format=srt|vtt|ass
- SPEAKER NAME: prefix in caps
Dependencies
- cf-video service type in cf-orch (see cf-orch#71): not yet built
- cf-stt: already available
Accessibility rationale
SDH production is expensive, often outsourced, and routinely skipped on independent, community, and self-hosted content. A local pipeline producing broadcast-quality SDH from a video file is a direct accessibility win for deaf/HoH users — a primary CF audience. This is a strong product differentiator for Linnet beyond real-time chat annotation.
Model
NemoStation/Marlin-2B (candidate; see cf-orch#71)
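The three endpoints listed under Scope suggest a submit, poll, download flow. A minimal client sketch of that flow follows, using only the Python standard library. The endpoint paths come from this issue; everything else is an assumption made for illustration: the base URL, the JSON request body with a `source` field, and the `job_id` and `state` response fields are hypothetical, since the issue does not specify request or response shapes.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: wherever Linnet is served
FORMATS = ("srt", "vtt", "ass")     # formats named in the issue


def http_json(method, url, payload=None):
    """Send a JSON request and decode a JSON response."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def start_job(source, fetch=http_json):
    """POST the video path or URL; 'source' and 'job_id' are assumed field names."""
    reply = fetch("POST", f"{BASE_URL}/api/subtitles/generate", {"source": source})
    return reply["job_id"]


def progress_url(job_id):
    return f"{BASE_URL}/api/subtitles/{urllib.parse.quote(job_id, safe='')}/progress"


def output_url(job_id, fmt="srt"):
    """Build the download URL, rejecting formats the issue does not list."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{BASE_URL}/api/subtitles/{urllib.parse.quote(job_id, safe='')}/output?format={fmt}"


def wait_for_job(job_id, fetch=http_json, poll_seconds=2.0, max_polls=300):
    """Poll the progress endpoint until an assumed terminal 'state' is reported."""
    for _ in range(max_polls):
        status = fetch("GET", progress_url(job_id))
        if status.get("state") in ("done", "failed"):  # hypothetical state values
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Passing `fetch` as a parameter keeps the polling logic testable without a running server; a real client would also need to handle HTTP errors and whatever authentication Linnet requires.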