Commit graph

5 commits

SHA1 Message Date
753f8f5def fix(avocet): use_reentrant=False for gradient checkpointing
Reentrant gradient checkpointing (the default) conflicts with Accelerate's
gradient accumulation context manager, causing a 'backward through the graph a
second time' error on the first training step. use_reentrant=False switches to
the non-reentrant autograd-hook path, which is compatible with Accelerate >= 0.27.
2026-03-15 17:23:40 -07:00
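
A minimal sketch of what this change amounts to, assuming the model is trained through the transformers Trainer with Accelerate underneath; the avocet call site itself is not shown. gradient_checkpointing_kwargs is an existing TrainingArguments field in transformers >= 4.35.

    from transformers import TrainingArguments

    # Non-reentrant checkpointing avoids the "backward through the graph a
    # second time" error when combined with Accelerate's gradient
    # accumulation context manager. Equivalent direct call:
    #   model.gradient_checkpointing_enable(
    #       gradient_checkpointing_kwargs={"use_reentrant": False})
    args = TrainingArguments(
        output_dir="out",                      # placeholder path
        per_device_train_batch_size=8,
        gradient_accumulation_steps=2,
        gradient_checkpointing=True,
        gradient_checkpointing_kwargs={"use_reentrant": False},
    )
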
5dee23f53c fix(avocet): reduce deberta-small VRAM + auto-select freest GPU for training
- deberta-small: batch_size 16→8 + grad_accum 1→2 (same effective batch),
  gradient_checkpointing=True (fp16 stays off: DeBERTa v3 disentangled
  attention overflows fp16 at the gather step)
- api: _best_cuda_device() picks highest free-VRAM GPU via nvidia-smi;
  sets CUDA_VISIBLE_DEVICES in subprocess env to prevent DataParallel
  replication across both GPUs; adds PYTORCH_ALLOC_CONF=expandable_segments
- SSE log now reports which GPU was selected
2026-03-15 17:09:06 -07:00
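
A sketch of the GPU selection described above; the real _best_cuda_device() may differ, and the PYTORCH_ALLOC_CONF value is an assumption (the commit names only the variable). Only documented nvidia-smi query flags are used.

    import os
    import subprocess

    def _best_cuda_device() -> int:
        # Query free VRAM per GPU and return the index with the most headroom.
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=memory.free",
             "--format=csv,noheader,nounits"],
            text=True,
        )
        free_mib = [int(x) for x in out.strip().splitlines()]
        return max(range(len(free_mib)), key=free_mib.__getitem__)

    # Pin the training subprocess to that GPU so DataParallel cannot
    # replicate the model across both cards.
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = str(_best_cuda_device())
    env["PYTORCH_ALLOC_CONF"] = "expandable_segments:True"  # value assumed
    # subprocess.Popen([...training command...], env=env)
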
64fd19a7b6 fix(avocet): move TorchDataset import to top; split sample_count into total+train
2026-03-15 16:02:43 -07:00
8ba34bb2d1 feat(avocet): run_finetune, CLI, multi-score-file merge with last-write-wins dedup
- load_and_prepare_data() now accepts Path | list[Path]; single-Path callers unchanged
- Dedup by MD5(subject + body[:100]); last file/row wins (lets later runs correct labels)
- Prints summary line when duplicates are dropped
- Added _EmailDataset (TorchDataset wrapper), run_finetune(), and argparse CLI
- run_finetune() saves model + tokenizer + training_info.json with score_files provenance
- Stratified split guard: val set size clamped to at least n_classes (handles tiny example data)
- 3 new unit tests (merge, last-write-wins dedup, single-Path compat) + 1 integration test
- All 16 tests pass (15 unit + 1 integration)
2026-03-15 15:52:41 -07:00
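
A minimal sketch of the last-write-wins merge described in 8ba34bb2d1, assuming JSONL score files with "subject" and "body" fields; the field names and the summary-line wording are assumptions, not the avocet code.

    import hashlib
    import json
    from pathlib import Path

    def _merge_score_files(paths: list[Path]) -> list[dict]:
        # Dedup key: MD5 of subject plus the first 100 chars of the body.
        # Later files/rows overwrite earlier ones, so reruns can correct labels.
        merged: dict[str, dict] = {}
        total = 0
        for path in paths:
            for line in path.read_text().splitlines():
                if not line.strip():
                    continue
                row = json.loads(line)
                total += 1
                key = hashlib.md5(
                    (row["subject"] + row["body"][:100]).encode("utf-8")
                ).hexdigest()
                merged[key] = row
        dropped = total - len(merged)
        if dropped:
            print(f"Dropped {dropped} duplicate row(s); kept {len(merged)}")
        return list(merged.values())
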
5eb593569d feat(avocet): add finetune data pipeline, class weights, WeightedTrainer
Implements load_and_prepare_data (JSONL ingestion with class filtering),
compute_class_weights (inverse-frequency, div-by-zero safe), compute_metrics_for_trainer
(macro F1 + accuracy), and WeightedTrainer.compute_loss (**kwargs-safe for
Transformers 4.38+ num_items_in_batch). All 12 tests pass.
2026-03-15 15:38:45 -07:00
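
A sketch of the two weighted-loss pieces named in 5eb593569d: inverse-frequency class weights with a divide-by-zero guard, and a Trainer subclass whose compute_loss tolerates the extra keyword (num_items_in_batch) passed by Transformers 4.38+. Signatures and names are illustrative, not copied from avocet.

    import torch
    import torch.nn.functional as F
    from transformers import Trainer

    def compute_class_weights(labels: list[int], n_classes: int) -> torch.Tensor:
        # Inverse-frequency ("balanced") weights; classes with zero examples
        # get weight 0 instead of triggering a division by zero.
        counts = torch.bincount(torch.tensor(labels), minlength=n_classes).float()
        return torch.where(
            counts > 0, counts.sum() / (n_classes * counts), torch.zeros_like(counts)
        )

    class WeightedTrainer(Trainer):
        def __init__(self, *args, class_weights: torch.Tensor, **kwargs):
            super().__init__(*args, **kwargs)
            self.class_weights = class_weights

        # **kwargs absorbs num_items_in_batch and any future extras.
        def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
            labels = inputs.pop("labels")
            outputs = model(**inputs)
            loss = F.cross_entropy(
                outputs.logits, labels,
                weight=self.class_weights.to(outputs.logits.device),
            )
            return (loss, outputs) if return_outputs else loss
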