feat: LLM SFT finetuning backend (TRL + PEFT/LoRA) #46


Closed

opened 2026-05-01 12:12:32 -07:00 by pyr0ball · 1 comment

pyr0ball commented

2026-05-01 12:12:32 -07:00

Owner

Context

The SFT corrections pipeline (candidates → review → export JSONL) already exists. This issue closes the loop by adding a training backend, so that exported corrections are actually used to train models.

Work

- Create `scripts/finetune_sft.py` using TRL `SFTTrainer` + PEFT LoRA
- Inputs: corrections JSONL export (`{prompt, completion}` pairs) and a base model (HF id or local path)
- Outputs: LoRA adapter saved to the `models/` directory, plus `training_info.json` (same schema as the classifier)
- Register as the `llm-sft` job type in the train job queue (#43)
- Config: `r`, `lora_alpha`, `target_modules`, `epochs`, `batch_size`, passed via the job's `config_json`
- GPU: uses the `_best_cuda_device()` pattern (highest free VRAM, as reported by nvidia-smi)
- Add to `environment.yml`: `trl`, `peft`
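
The input, config, and GPU items above could be sketched roughly as follows, using only the standard library. This is a hedged illustration, not the planned implementation: `SFTJobConfig`, `parse_job_config`, `parse_corrections`, `pick_gpu`, and `best_cuda_device_sketch` are hypothetical names, the default values are placeholders, and the existing `_best_cuda_device()` helper is not reproduced here. The `nvidia-smi` query flags are the standard ones for CSV output.

```python
import json
import subprocess
from dataclasses import dataclass, field, fields
from typing import Iterable


@dataclass
class SFTJobConfig:
    # Field names come from the issue; the defaults are illustrative assumptions.
    r: int = 16
    lora_alpha: int = 32
    target_modules: list = field(default_factory=lambda: ["q_proj", "v_proj"])
    epochs: int = 1
    batch_size: int = 1


def parse_job_config(config_json: str) -> SFTJobConfig:
    """Parse a job's config_json, rejecting keys outside the listed set."""
    raw = json.loads(config_json) if config_json else {}
    allowed = {f.name for f in fields(SFTJobConfig)}
    unknown = set(raw) - allowed
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return SFTJobConfig(**raw)


def parse_corrections(lines: Iterable[str]) -> list:
    """Read {prompt, completion} pairs from JSONL lines, one object per line."""
    pairs = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines in the export
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        for key in ("prompt", "completion"):
            if not isinstance(record.get(key), str):
                raise ValueError(f"line {lineno}: missing or non-string {key!r}")
        pairs.append({"prompt": record["prompt"], "completion": record["completion"]})
    return pairs


def pick_gpu(nvidia_smi_csv: str) -> int:
    """Return the index of the GPU with the most free memory.

    Expects rows of "index, free_mib" as printed by the query in
    best_cuda_device_sketch() below.
    """
    free_by_index = {}
    for line in nvidia_smi_csv.strip().splitlines():
        if not line.strip():
            continue
        idx, free = line.split(",")
        free_by_index[int(idx)] = int(free)
    if not free_by_index:
        raise RuntimeError("nvidia-smi reported no GPUs")
    return max(free_by_index, key=free_by_index.get)


def best_cuda_device_sketch() -> int:
    """Query nvidia-smi and pick the GPU with the highest free VRAM."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,memory.free",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return pick_gpu(out)
```

In this sketch a corrections file would be read with `with open(path, encoding="utf-8") as fh: pairs = parse_corrections(fh)`. Rejecting unknown config keys is a design choice made here so that a typo such as `rank` fails fast instead of silently training with defaults.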

Acceptance

- Can finetune a 7B model in 4-bit, using a training-side equivalent of `CF_TEXT_4BIT=1`
- LoRA adapter loads via `FineTunedAdapter` in `classifier_adapters.py`, or via a new `LoRAAdapter`
- Job runs end-to-end: queue → training → results registered
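
The queue → training → results path could look something like the sketch below. Everything here is a hypothetical stand-in: the real train job queue from #43 is not shown in this issue, so `JOB_HANDLERS`, `register_job_type`, `run_job`, and the shape of the result dict are assumptions made purely to illustrate dispatching on the `llm-sft` job type.

```python
import json
from typing import Callable, Dict

# Hypothetical registry mapping a job type name to its handler.
JOB_HANDLERS: Dict[str, Callable[[dict], dict]] = {}


def register_job_type(name: str):
    """Decorator that registers a handler under a job type name."""
    def decorator(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        if name in JOB_HANDLERS:
            raise ValueError(f"job type already registered: {name}")
        JOB_HANDLERS[name] = fn
        return fn
    return decorator


@register_job_type("llm-sft")
def run_llm_sft(config: dict) -> dict:
    # Placeholder: a real handler would run scripts/finetune_sft.py here
    # and report where the adapter and training_info.json were written.
    return {"job_type": "llm-sft", "status": "done", "config": config}


def run_job(job_type: str, config_json: str) -> dict:
    """Look up the handler for job_type and run it with the parsed config."""
    handler = JOB_HANDLERS.get(job_type)
    if handler is None:
        raise KeyError(f"no handler registered for job type: {job_type}")
    return handler(json.loads(config_json or "{}"))
```

A queue worker in this sketch would call `run_job("llm-sft", job_config_json)` and register the returned dict as the job's result.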
