feat: voice/TTS finetuning backend (Phase 2) #53

Open

opened 2026-05-01 12:13:46 -07:00 by pyr0ball · 0 comments

Owner

Context

Avocet benchmarks voice models (Whisper, Chatterbox). This issue adds a finetuning backend so those models can be adapted to specific voices, accents, and domains.

Work

- `scripts/finetune_voice.py`, covering:
  - Whisper finetuning via HuggingFace (audio → transcript pairs)
  - Chatterbox LoRA (voice cloning / style adaptation)
- Register as `voice` job type in train queue
- Audio preprocessing pipeline (resample, normalize, segment)
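
A rough sketch of what the preprocessing step (resample, normalize, segment) could look like, to make the scope concrete. Everything here is an assumption for discussion, not a decided design: the function names, the 0.95 peak target, and the 30-second segment length are hypothetical, and the linear-interpolation resampler is a stand-in for whatever audio library ends up being used. The one fixed point is the 16 kHz target rate, which Whisper expects.

```python
from typing import List

TARGET_SR = 16_000  # Whisper models expect 16 kHz mono input


def resample_linear(samples: List[float], src_sr: int,
                    dst_sr: int = TARGET_SR) -> List[float]:
    """Resample by linear interpolation.

    Illustrative only: a real pipeline would use a band-limited
    resampler from an audio library to avoid aliasing.
    """
    if src_sr == dst_sr or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_sr / src_sr)
    out = []
    for i in range(n_out):
        pos = i * src_sr / dst_sr          # fractional index into the source
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)  # clamp at the last sample
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def peak_normalize(samples: List[float], peak: float = 0.95) -> List[float]:
    """Scale so the loudest sample has magnitude `peak` (hypothetical default)."""
    max_abs = max((abs(s) for s in samples), default=0.0)
    if max_abs == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = peak / max_abs
    return [s * gain for s in samples]


def segment(samples: List[float], sr: int = TARGET_SR,
            max_seconds: float = 30.0) -> List[List[float]]:
    """Split into consecutive chunks of at most `max_seconds` each."""
    step = int(sr * max_seconds)
    return [samples[i:i + step] for i in range(0, len(samples), step)]


def preprocess(samples: List[float], src_sr: int) -> List[List[float]]:
    """Resample to 16 kHz, peak-normalize, then cut into segments."""
    return segment(peak_normalize(resample_linear(samples, src_sr)))
```

Splitting on fixed-length boundaries is the simplest option; cutting on silence or on transcript alignment would likely give cleaner audio → transcript pairs, at the cost of a heavier dependency.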

Depends on

#43 (train job queue)
