feat: LLM SFT finetuning backend (TRL + PEFT/LoRA) #46

Closed
opened 2026-05-01 12:12:32 -07:00 by pyr0ball · 1 comment
Owner

Context

The SFT corrections pipeline (candidates → review → export JSONL) already exists. This issue closes the loop by adding the training backend, so exported corrections actually train models.

Work

  • Create scripts/finetune_sft.py using TRL SFTTrainer + PEFT LoRA
  • Inputs: corrections JSONL export ({prompt, completion} pairs), base model HF id or local path
  • Outputs: LoRA adapter saved to models/ directory, training_info.json (same schema as classifier)
  • Register as llm-sft job type in the train job queue (#43)
  • Config: r, lora_alpha, target_modules, epochs, batch_size via job config_json
  • GPU: uses _best_cuda_device() pattern (highest free VRAM via nvidia-smi)
  • Add to environment.yml: trl, peft
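
The GPU bullet above can be sketched roughly as follows. This is a minimal illustration, not the project's `_best_cuda_device()`: the names `parse_best_device` and `best_cuda_device` are hypothetical, and the sketch assumes `nvidia-smi` is on `PATH` and prints one `<index>, <free MiB>` line per GPU for the query shown.

```python
import subprocess


def parse_best_device(smi_output: str) -> int:
    """Return the index of the GPU with the most free memory.

    Expects one line per GPU in the form "<index>, <free MiB>".
    """
    best_index, best_free = -1, -1
    for line in smi_output.strip().splitlines():
        index_str, free_str = (part.strip() for part in line.split(","))
        free_mib = int(free_str)
        if free_mib > best_free:
            best_index, best_free = int(index_str), free_mib
    if best_index < 0:
        raise RuntimeError("nvidia-smi reported no GPUs")
    return best_index


def best_cuda_device() -> str:
    """Query nvidia-smi and return a device string such as "cuda:1".

    Hypothetical stand-in for the project's _best_cuda_device() helper.
    """
    output = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=index,memory.free", "--format=csv,noheader,nounits"],
        text=True,
    )
    return f"cuda:{parse_best_device(output)}"
```

Keeping the parsing separate from the subprocess call lets the selection logic be tested without a GPU. One caveat: `nvidia-smi` indices follow PCI bus order, so they may only line up with CUDA device indices when `CUDA_DEVICE_ORDER=PCI_BUS_ID` is set.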

Acceptance

  • Can fine-tune a 7B model in 4-bit, using a training-side equivalent of CF_TEXT_4BIT=1
  • LoRA adapter loads via classifier_adapters.py FineTunedAdapter or new LoRAAdapter
  • Job runs end-to-end from queue → training → results registered
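
For the queue → training step, the job first has to read its inputs. A minimal sketch of how the `config_json` keys and the `{prompt, completion}` JSONL export from the Work list might be parsed is shown below. The key names come from this issue; the class name `SFTJobConfig`, the helper names, and every default value are assumptions for illustration only.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SFTJobConfig:
    # Key names are from the issue; all default values are illustrative assumptions.
    r: int = 16
    lora_alpha: int = 32
    target_modules: list[str] = field(default_factory=lambda: ["q_proj", "v_proj"])
    epochs: int = 3
    batch_size: int = 4


def parse_job_config(config_json: str) -> SFTJobConfig:
    """Build a config from the job's config_json, rejecting unknown keys."""
    raw = json.loads(config_json) if config_json else {}
    unknown = set(raw) - set(SFTJobConfig.__dataclass_fields__)
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return SFTJobConfig(**raw)


def load_pairs(lines):
    """Parse JSONL lines into {prompt, completion} dicts, skipping blank lines."""
    pairs = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        record = json.loads(line)
        missing = {"prompt", "completion"} - record.keys()
        if missing:
            raise ValueError(f"line {lineno}: missing {sorted(missing)}")
        pairs.append({"prompt": record["prompt"], "completion": record["completion"]})
    return pairs
```

Failing fast on unknown config keys or malformed records surfaces a bad export before any GPU time is spent; `load_pairs` accepts any iterable of lines, so an open file handle works as well as a list.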
pyr0ball added this to the v2 — Pipeline Architecture milestone 2026-05-01 12:12:32 -07:00
pyr0ball added the ml and backend labels 2026-05-01 12:12:32 -07:00
Author
Owner

Shipped in the Apr 19–May 4 sprint. LLM SFT backend using TRL + PEFT/LoRA is in app/train/train.py.

Reference: Circuit-Forge/avocet#46