eval: ARK-ASR-0.6B as lightweight CPU-capable ASR backend (Linnet students use case) #6
Reference: Circuit-Forge/cf-voice#6
Source: https://huggingface.co/AutoArk-AI/ARK-ASR-0.6B
What it is
ARK-ASR-0.6B is a 0.6B-parameter ASR model that combines a Whisper-style encoder, RoPE, an MLP adapter, and a Qwen2 decoder. It is trained with teacher-data adaptation and online policy distillation (OPD).
Why this matters for cf-voice / Linnet
Linnet delegates its full ASR pipeline to cf-voice (requirements.txt pulls cf-voice from Forgejo). The primary driver here is students using Linnet as a tone/context aid in lectures and group discussions.
Student context:
At 0.6B params (less than half the size of Whisper Large v3 at roughly 1.5B), the model fits comfortably in RAM and runs on CPU at usable speeds.
Integration approach
Add as a named backend in cf-voice alongside the existing pyannote pipeline:
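A minimal sketch of what a named-backend registry could look like, under stated assumptions. Every name here (BackendSpec, register_backend, get_backend, the "ark-asr" key) is hypothetical and not taken from cf-voice, whose real backend interface is not shown in this issue. The sketch also assumes that any backend needing trust_remote_code should be pinned to a full commit SHA; the revision value used below is a placeholder, not a real commit.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# A full 40-character hex commit SHA, as opposed to a branch name such as "main".
_COMMIT_SHA = re.compile(r"^[0-9a-f]{40}$")


@dataclass(frozen=True)
class BackendSpec:
    """Hypothetical description of one backend (not the real cf-voice type)."""
    name: str
    model_id: Optional[str] = None
    trust_remote_code: bool = False
    revision: Optional[str] = None  # commit SHA on the model repo, if pinned


_BACKENDS: Dict[str, BackendSpec] = {}


def register_backend(spec: BackendSpec) -> None:
    """Register a backend under its name, rejecting unpinned remote code."""
    if spec.name in _BACKENDS:
        raise ValueError(f"backend {spec.name!r} is already registered")
    pinned = spec.revision is not None and _COMMIT_SHA.match(spec.revision)
    if spec.trust_remote_code and not pinned:
        raise ValueError(
            f"backend {spec.name!r} runs remote code and must pin a commit SHA"
        )
    _BACKENDS[spec.name] = spec


def get_backend(name: str) -> BackendSpec:
    """Look up a backend by name with a readable error for unknown names."""
    try:
        return _BACKENDS[name]
    except KeyError:
        raise ValueError(
            f"unknown backend {name!r}; available: {sorted(_BACKENDS)}"
        ) from None


# The existing pipeline and the candidate backend side by side.
register_backend(BackendSpec(name="pyannote"))
register_backend(
    BackendSpec(
        name="ark-asr",
        model_id="AutoArk-AI/ARK-ASR-0.6B",
        trust_remote_code=True,
        revision="0" * 40,  # placeholder only; replace with an audited commit
    )
)
```

Keeping the revision on the spec means a backend that executes repository code cannot be registered against a moving branch by accident.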
Security flag — action required before shipping
The model requires trust_remote_code=True, meaning Qwen2 decoder injection code from the HuggingFace repo runs at load time. Before this backend ships to end users:
AutoArk-AI/ARK-ASR-0.6B custom modeling code @main
Comparison to alternatives
ARK-ASR-0.6B is the best fit for the free-tier / no-GPU / student path. cohere-transcribe-diarize is better for the paid/GPU path (adds diarization too — see cf-voice#5).
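The split described above could be expressed as a small selection function. This is only a sketch: the function name, the tier strings, and the "ark-asr" backend key are hypothetical, and routing a paid user without a GPU to the CPU-capable backend is an assumption the issue does not state.

```python
def choose_backend(tier: str, has_gpu: bool) -> str:
    """Pick a backend name from the user's tier and available hardware.

    Hypothetical helper; tier values and backend keys are illustrative.
    """
    if tier not in ("free", "paid"):
        raise ValueError(f"unknown tier {tier!r}")
    # Paid users with a GPU get the heavier model, which also adds diarization.
    if tier == "paid" and has_gpu:
        return "cohere-transcribe-diarize"
    # Assumption: everyone else, including paid users without a GPU,
    # falls back to the lightweight CPU-capable model.
    return "ark-asr"
```

For example, `choose_backend("free", has_gpu=False)` selects the student path, while `choose_backend("paid", has_gpu=True)` selects the paid/GPU path.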