feat: voice/TTS finetuning backend (Phase 2) #53

Open
opened 2026-05-01 12:13:46 -07:00 by pyr0ball · 0 comments
Owner

Context

Avocet benchmarks voice models (Whisper, Chatterbox). This issue adds a finetuning backend so those models can be adapted to specific voices, accents, and domains.

Work

  • scripts/finetune_voice.py — covers:
    • Whisper finetuning via HuggingFace (audio → transcript pairs)
    • Chatterbox LoRA (voice cloning / style adaptation)
  • Register as voice job type in train queue
  • Audio preprocessing pipeline (resample, normalize, segment)
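The preprocessing steps named above (resample, normalize, segment) could be sketched roughly as follows. This is a minimal, dependency-free illustration and not the planned implementation: the function names, the 16 kHz target rate (the rate Whisper's feature extractor expects), the 0.95 peak level, and the 30-second segment length are all assumptions, and a real pipeline would more likely lean on a dedicated audio library with a proper anti-aliasing resampler.

```python
import math
from typing import List


def resample_linear(samples: List[float], src_rate: int, dst_rate: int) -> List[float]:
    """Resample by linear interpolation (no anti-aliasing filter; illustration only)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # fractional position in the source signal
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def peak_normalize(samples: List[float], peak: float = 0.95) -> List[float]:
    """Scale the signal so its largest absolute sample equals `peak`."""
    max_abs = max((abs(s) for s in samples), default=0.0)
    if max_abs == 0.0:
        return list(samples)               # silence: nothing to scale
    scale = peak / max_abs
    return [s * scale for s in samples]


def segment(samples: List[float], rate: int, seconds: float) -> List[List[float]]:
    """Split into fixed-length chunks; the final chunk may be shorter."""
    size = int(rate * seconds)
    if size <= 0:
        raise ValueError("segment length must cover at least one sample")
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def preprocess(
    samples: List[float],
    src_rate: int,
    dst_rate: int = 16000,                 # assumed target rate
    segment_seconds: float = 30.0,         # assumed segment length
) -> List[List[float]]:
    """Hypothetical pipeline: resample, then normalize, then segment."""
    resampled = resample_linear(samples, src_rate, dst_rate)
    normalized = peak_normalize(resampled)
    return segment(normalized, dst_rate, segment_seconds)


if __name__ == "__main__":
    # One second of a 440 Hz tone at 8 kHz, cut into 0.25 s chunks at 16 kHz.
    tone = [0.5 * math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
    chunks = preprocess(tone, src_rate=8000, segment_seconds=0.25)
    print(len(chunks), len(chunks[0]))
```

Normalizing after resampling keeps the peak level exact on the final sample grid, and segmenting last means every chunk shares the same gain.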

Depends on

  • #43 (train job queue)
pyr0ball added this to the v2 — Pipeline Architecture milestone 2026-05-01 12:13:46 -07:00
pyr0ball added the ml, phase-2, backend labels 2026-05-01 12:13:46 -07:00
Reference: Circuit-Forge/avocet#53