- app/models.py: add set_cf_text_models_dir() testability seam
- tests/test_models.py: redirect _CF_TEXT_MODELS_DIR in reset_models_globals
fixture so list_installed() count tests are not polluted by real NFS models
- app/cforch.py: fix get_results() return type annotation list → dict
- tests/test_cforch.py: give _BENCH_RUNNING=True test a mock proc with
poll()=None so the stale-flag check correctly returns 409; patch
_select.select in streaming tests (select requires fileno(), iter() doesn't)
- tests/test_finetune.py: mark GPU integration test @pytest.mark.gpu
- pytest.ini: register gpu and slow markers
- load_and_prepare_data() now accepts Path | list[Path]; single-Path callers unchanged
- Dedup by MD5(subject + body[:100]); last file/row wins (lets later runs correct labels)
- Prints summary line when duplicates are dropped
- Added _EmailDataset (TorchDataset wrapper), run_finetune(), and argparse CLI
- run_finetune() saves model + tokenizer + training_info.json with score_files provenance
- Stratified split guard: val set size clamped to at least n_classes (handles tiny example data)
- 3 new unit tests (merge, last-write-wins dedup, single-Path compat) + 1 integration test
- All 16 tests pass (15 unit + 1 integration)
Implements load_and_prepare_data (JSONL ingestion with class filtering),
compute_class_weights (inverse-frequency, div-by-zero safe), compute_metrics_for_trainer
(macro F1 + accuracy), and WeightedTrainer.compute_loss (**kwargs-safe for
Transformers 4.38+ num_items_in_batch). All 12 tests pass.