chore(llm): swap model_candidates order — Qwen2.5-3B first, Phi-4-mini fallback
Phi-4-mini's cached modeling_phi3.py imports SlidingWindowCache, which was removed in transformers 5.x. Qwen2.5-3B uses the built-in qwen2 architecture and loads cleanly. Reorder so Qwen is tried first.
parent 11fb3a07b4
commit bc80922d61
1 changed file with 1 addition and 1 deletion
@@ -48,8 +48,8 @@ backends:
   cf_orch:
     service: vllm
     model_candidates:
-      - Phi-4-mini-instruct
       - Qwen2.5-3B-Instruct
+      - Phi-4-mini-instruct
     ttl_s: 300
   vllm_research:
     api_key: ''
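The reorder only matters if the orchestrator walks model_candidates in order and falls back to the next entry when a model fails to load. A minimal sketch of that fallback loop, assuming that behavior — the names (load_model, ModelLoadError, pick_model) are illustrative, not from this repo, and the simulated failure mirrors the SlidingWindowCache import error described above:

```python
class ModelLoadError(Exception):
    pass

def load_model(name: str) -> str:
    # Stand-in loader: simulate Phi-4-mini-instruct failing the way the
    # commit message describes (its cached modeling_phi3.py imports
    # SlidingWindowCache, removed in transformers 5.x).
    if name == "Phi-4-mini-instruct":
        raise ModelLoadError("cannot import name 'SlidingWindowCache'")
    return f"loaded:{name}"

def pick_model(candidates: list[str]) -> str:
    # Try each candidate in list order; fall back on load failure.
    errors: dict[str, str] = {}
    for name in candidates:
        try:
            return load_model(name)
        except ModelLoadError as exc:
            errors[name] = str(exc)
    raise RuntimeError(f"no candidate loaded: {errors}")

# New order from this commit: Qwen is tried first, so under the simulated
# failure no fallback round-trip is needed at all.
print(pick_model(["Qwen2.5-3B-Instruct", "Phi-4-mini-instruct"]))
```

Either ordering ultimately lands on Qwen here; putting it first just avoids paying for a doomed Phi-4-mini load attempt on every startup.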