feat: fine-tune pipeline for writing voice model #37

Open

opened 2026-04-22 07:06:45 -07:00 by pyr0ball · 0 comments

Summary

Fine-tune the top-ranked model from the voice benchmark (#prev) on Alan's writing corpus to produce a voice-matched local model for Magpie draft generation.

Depends on

Voice benchmark ticket (must run first to identify the base model)

Approach

Dataset prep

Format voice corpus as instruction-tuning pairs:
- Input: thread title + thread body + signal reason
- Output: reply in Alan's voice
Augment with rephrased variants to reduce overfitting
Target: 200-500 training pairs
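
A minimal sketch of the pair format described above, assuming the corpus can be loaded into simple records. The `VoiceSample` class, its field names, the prompt layout, and the `input`/`output` JSON keys are all hypothetical; the actual schema should follow whatever `scripts/finetune.py` expects.

```python
import json
from dataclasses import dataclass


@dataclass
class VoiceSample:
    """One corpus entry. All field names are hypothetical."""
    thread_title: str
    thread_body: str
    signal_reason: str
    reply: str


def to_training_pair(sample: VoiceSample) -> dict:
    """Build one instruction-tuning pair: thread context in, reply out."""
    prompt = (
        f"Thread title: {sample.thread_title}\n"
        f"Thread body: {sample.thread_body}\n"
        f"Signal reason: {sample.signal_reason}"
    )
    return {"input": prompt, "output": sample.reply}


def to_jsonl(samples: list[VoiceSample]) -> str:
    """Serialize the pairs as JSON Lines, one pair per line."""
    return "\n".join(
        json.dumps(to_training_pair(s), ensure_ascii=False) for s in samples
    )
```

Rephrased variants would be additional `VoiceSample` records that share the same `reply`, so each variant becomes its own line in the JSONL output.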

Fine-tuning

Reuse Avocet's existing fine-tune harness (scripts/finetune.py)
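Method: QLoRA (LoRA adapters trained on top of a 4-bit quantized base model); fits on a single RTX 4000 (8 GB)
Base: winner from benchmark (likely Mistral-7B or similar)
Epochs: 3; evaluate every 50 steps
Output: LoRA adapters merged into the base model, exported to GGUF, and quantized to Q4_K_M for cf-orch serving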
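
As a sanity check on the schedule, the short calculation below estimates how many optimizer steps and periodic evaluations a run would produce at the target dataset sizes. The effective batch size of 8 is an assumption (the ticket does not specify one), and the step count assumes the final partial batch of each epoch is kept.

```python
import math


def training_schedule(
    num_pairs: int, epochs: int, effective_batch: int, eval_every: int
) -> tuple[int, int]:
    """Return (total optimizer steps, number of periodic evaluations)."""
    # Assumes the last partial batch in each epoch still counts as a step.
    steps_per_epoch = math.ceil(num_pairs / effective_batch)
    total_steps = steps_per_epoch * epochs
    return total_steps, total_steps // eval_every


# effective_batch=8 is an assumed value, not taken from the ticket.
for pairs in (200, 500):
    total, evals = training_schedule(
        pairs, epochs=3, effective_batch=8, eval_every=50
    )
    print(f"{pairs} pairs -> {total} steps, {evals} evals")
# 200 pairs -> 75 steps, 1 evals
# 500 pairs -> 189 steps, 3 evals
```

Under these assumptions, a 200-pair run is evaluated only once before it finishes, so a shorter eval interval may be worth considering for small datasets.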

Eval

Blind comparison: fine-tuned vs. base model outputs on held-out thread samples, with the source model hidden from the rater
Pass criterion: human raters prefer the fine-tuned output in at least 70% of comparisons
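
One way the blind comparison could be scored is sketched below. The function names and the A/B labelling scheme are hypothetical; the only value taken from this ticket is the 70% threshold.

```python
import random


def blind_pairs(samples, seed=0):
    """Randomize which side (A or B) shows the fine-tuned output.

    `samples` is a list of (base_output, finetuned_output) tuples.
    Returns (presented, key): presented[i] is the (A, B) pair shown to the
    rater, and key[i] is "A" or "B", the side holding the fine-tuned output.
    """
    rng = random.Random(seed)
    presented, key = [], []
    for base_out, tuned_out in samples:
        if rng.random() < 0.5:
            presented.append((tuned_out, base_out))
            key.append("A")
        else:
            presented.append((base_out, tuned_out))
            key.append("B")
    return presented, key


def preference_rate(choices, key):
    """Fraction of comparisons where the rater picked the fine-tuned side."""
    if not key or len(choices) != len(key):
        raise ValueError("choices and key must be non-empty and equal length")
    wins = sum(choice == side for choice, side in zip(choices, key))
    return wins / len(key)


def passes(choices, key, threshold=0.70):
    """Apply the pass criterion: fine-tuned preferred at least 70% of the time."""
    return preference_rate(choices, key) >= threshold
```

The rater sees only `presented`; `key` stays hidden until all choices are recorded. With a small held-out set, a rate near 70% carries wide uncertainty, so the number of comparisons is worth recording alongside the result.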

Output

models/voice-v1.gguf (gitignored, stored in /devl/models/)
Model card: docs/voice-model-v1.md (training params, eval results, known quirks)
cf-orch registration: add the model to the cf-orch models inventory so Magpie can route requests to it
