chore(llm): swap model_candidates order — Qwen2.5-3B first, Phi-4-mini fallback

Phi-4-mini's cached modeling_phi3.py imports SlidingWindowCache which
was removed in transformers 5.x. Qwen2.5-3B uses built-in qwen2 arch
and works cleanly. Reorder so Qwen is tried first.
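The reorder only helps because the orchestrator walks `model_candidates` in order and falls back when a candidate fails to load. A minimal sketch of that behavior (the `pick_model` helper and `load` callable are hypothetical, not the project's actual loader):

```python
# Hypothetical sketch: try each model_candidates entry in order and
# fall back to the next one when loading raises (e.g. an ImportError
# from cached remote code importing a symbol removed in transformers 5.x).
def pick_model(candidates, load):
    """Return the first candidate name that loads successfully."""
    errors = {}
    for name in candidates:
        try:
            load(name)
            return name
        except Exception as exc:
            errors[name] = exc  # remember why this candidate failed
    raise RuntimeError(f"no usable model among {candidates}: {errors}")
```

With the reordered config, `Qwen2.5-3B-Instruct` is attempted first, so the broken Phi-4-mini cached code is never imported unless Qwen itself fails.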
Author: pyr0ball 2026-04-02 16:36:38 -07:00
parent 11fb3a07b4
commit bc80922d61


@@ -48,8 +48,8 @@ backends:
   cf_orch:
     service: vllm
     model_candidates:
-      - Phi-4-mini-instruct
       - Qwen2.5-3B-Instruct
+      - Phi-4-mini-instruct
     ttl_s: 300
   vllm_research:
     api_key: ''