fix: 5 pre-existing test failures on main (models isolation, cforch return type, finetune GPU) #56
Pre-existing failures identified during avocet#55 review
All 5 failures exist on main and predate the embedding k-NN work. Zero regressions were introduced by #55.

1-3. test_models.py: test isolation gap (3 failures)

Tests: test_installed_empty, test_installed_detects_downloaded_model, test_installed_detects_finetuned_model

Root cause: list_installed() in app/models.py:896-898 scans both _MODELS_DIR and _CF_TEXT_MODELS_DIR. The reset_models_globals fixture redirects _MODELS_DIR to tmp but has no setter for _CF_TEXT_MODELS_DIR, which points at /Library/Assets/LLM/cf-text/models (15 real models on the dev machine). All three tests that assert exact counts (== [] or == 1) fail because of the leaked models.

Fix: Add set_cf_text_models_dir(path: Path) -> None to app/models.py (mirrors the existing set_models_dir()). Update the reset_models_globals fixture to redirect both dirs: point _CF_TEXT_MODELS_DIR at a nonexistent tmp subpath so the scan skips it cleanly.

Effort: ~5 lines (1 setter + 2-line fixture update).
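The isolation fix for items 1-3 could look roughly like the sketch below. It is a simplified stand-in, not the code in app/models.py: the body of list_installed(), the default value of _MODELS_DIR, and the helper name redirect_model_dirs are assumptions made for illustration. Only the setter signature and the two global names come from the description above.

```python
from pathlib import Path

# Stand-ins for the module-level globals in app/models.py.
# The _MODELS_DIR default here is a placeholder, not the real value.
_MODELS_DIR = Path("models")
_CF_TEXT_MODELS_DIR = Path("/Library/Assets/LLM/cf-text/models")


def set_models_dir(path: Path) -> None:
    """Existing setter (simplified): redirect the primary models directory."""
    global _MODELS_DIR
    _MODELS_DIR = path


def set_cf_text_models_dir(path: Path) -> None:
    """Proposed setter: redirect the cf-text models directory."""
    global _CF_TEXT_MODELS_DIR
    _CF_TEXT_MODELS_DIR = path


def list_installed() -> list[str]:
    """Simplified stand-in: list model subdirectories under both roots.

    A root that does not exist is skipped, which is what lets the
    fixture point the cf-text root at a nonexistent path.
    """
    found: list[str] = []
    for root in (_MODELS_DIR, _CF_TEXT_MODELS_DIR):
        if root.is_dir():
            found.extend(sorted(p.name for p in root.iterdir() if p.is_dir()))
    return found


def redirect_model_dirs(tmp_path: Path) -> None:
    """Hypothetical helper showing what the reset_models_globals fixture
    body might do once both setters exist."""
    models = tmp_path / "models"
    models.mkdir()
    set_models_dir(models)
    # Deliberately never created, so the scan skips it.
    set_cf_text_models_dir(tmp_path / "cf-text-missing")
```

With both roots redirected, tests that assert exact counts no longer depend on what is installed on the machine running them.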
4. test_cforch.py::test_results_returns_latest_summary

Root cause: get_results() in app/cforch.py:498 has return type -> list, but summary.json is a dict. FastAPI response validation raises ResponseValidationError: Input should be a valid list. The endpoint name, docstring, and test all indicate intent to return a single summary dict. The -> list annotation is wrong.

Fix: Change def get_results() -> list: to def get_results() -> dict:.

Effort: Trivial (1 word).
5. test_finetune.py::test_integration_finetune_on_example_data

Root cause: torch.OutOfMemoryError: CUDA out of memory. The GPU has only 10.5 MiB free when another process holds 5.72 GiB of VRAM. This test passes when the GPU is idle and fails when cf-orch has a model loaded.

There is also a secondary correctness issue visible in the output: a checkpoint trained on 3 labels is being reloaded into a 2-label model. The OOM masks this during the current run, but it would surface as a classification error on a less-loaded machine.

Fix (environmental): Mark with @pytest.mark.slow or @pytest.mark.gpu and exclude from the default pytest run. Only run explicitly when GPU is idle.

Fix (correctness): Investigate the label count mismatch in the fine-tune checkpoint reload path. The checkpoint and model must agree on num_labels before loading weights.

Effort: Skip marker is trivial; label mismatch needs investigation.
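For the correctness half of item 5, one option is a guard that fails fast before any weights are loaded. The sketch below is an illustration only: the function name, the idea that the checkpoint exposes its label count as a plain metadata dict with a num_labels key, and the error wording are all assumptions. The actual reload path still needs the investigation noted above.

```python
def check_num_labels(checkpoint_meta: dict, model_num_labels: int) -> None:
    """Raise if the checkpoint and the target model disagree on label count.

    checkpoint_meta is a hypothetical metadata dict saved alongside the
    checkpoint; the "num_labels" key is an assumption for this sketch.
    """
    ckpt_labels = checkpoint_meta.get("num_labels")
    if ckpt_labels is None:
        raise ValueError("checkpoint metadata has no num_labels entry")
    if ckpt_labels != model_num_labels:
        raise ValueError(
            f"checkpoint was trained on {ckpt_labels} labels but the model "
            f"is configured for {model_num_labels}; refusing to load weights"
        )
```

Calling check_num_labels({"num_labels": 3}, 2) raises ValueError, so the 3-label versus 2-label case would surface as an explicit error and not as a later misclassification.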
Labels
bug/test/good first issue (items 1-4 are mechanical fixes)
blocked:gpu (item 5 environmental OOM)