fix(recipe): fail fast on cf-orch 429 instead of slow LLMRouter fallback
When the coordinator returns 429 (all nodes at max_concurrent limit), the previous code fell back to LLMRouter which is also overloaded at high concurrency. This caused the request to hang for ~60s before nginx returned a 504. Now: detect 429/max_concurrent in the RuntimeError message and return "" immediately so the caller gets an empty RecipeResult (graceful degradation) rather than a timeout.
This commit is contained in:
parent
79f345aae6
commit
eba536070c
1 changed files with 13 additions and 0 deletions
|
|
@ -181,6 +181,19 @@ class LLMRecipeGenerator:
|
|||
try:
|
||||
alloc = ctx.__enter__()
|
||||
except Exception as exc:
|
||||
msg = str(exc)
|
||||
# 429 = coordinator at capacity (all nodes at max_concurrent limit).
|
||||
# Don't fall back to LLMRouter — it's also overloaded and the slow
|
||||
# fallback causes nginx 504s. Return "" fast so the caller degrades
|
||||
# gracefully (empty recipe result) rather than timing out.
|
||||
if "429" in msg or "max_concurrent" in msg.lower():
|
||||
logger.info("cf-orch at capacity — returning empty result (graceful degradation)")
|
||||
if ctx is not None:
|
||||
try:
|
||||
ctx.__exit__(None, None, None)
|
||||
except Exception:
|
||||
pass
|
||||
return ""
|
||||
logger.debug("cf-orch allocation failed, falling back to LLMRouter: %s", exc)
|
||||
ctx = None # __enter__ raised — do not call __exit__
|
||||
|
||||
|
|
|
|||
Loading…
Reference in a new issue