fix(recipe): fail fast on cf-orch 429 instead of slow LLMRouter fallback

When the coordinator returns 429 (all nodes at the max_concurrent limit), the previous code fell back to LLMRouter, which is also overloaded at high concurrency. The request then hung for ~60s until nginx returned a 504. Now the except branch checks the error message for "429" or "max_concurrent" and returns "" immediately, so the caller gets an empty RecipeResult (graceful degradation) rather than a timeout.
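
The fail-fast rule described above can be sketched in isolation as follows. This is a minimal illustration, not the code in llm_recipe.py: the names is_capacity_error, generate, allocate, run_on_node and fallback are hypothetical stand-ins for the cf-orch allocation, the node call and the LLMRouter path.

```python
import logging

logger = logging.getLogger(__name__)


def is_capacity_error(exc: Exception) -> bool:
    """Return True when the error message signals a saturated coordinator."""
    msg = str(exc)
    return "429" in msg or "max_concurrent" in msg.lower()


def generate(allocate, run_on_node, fallback) -> str:
    """Hypothetical call path: allocate a node, else fail fast or fall back.

    allocate, run_on_node and fallback are caller-supplied callables that
    stand in for cf-orch allocation, the node request and LLMRouter.
    """
    try:
        alloc = allocate()
    except Exception as exc:
        if is_capacity_error(exc):
            # At capacity: skip the slow fallback and degrade gracefully.
            logger.info("coordinator at capacity; returning empty result")
            return ""
        # Any other allocation failure still takes the fallback path.
        logger.debug("allocation failed, falling back: %s", exc)
        return fallback()
    return run_on_node(alloc)
```

Matching on substrings is deliberately cheap but coarse: any message that happens to contain "429" (for example a latency of 4290 ms) would also be treated as a capacity error, so a structured status code would be more robust if the coordinator client exposes one.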
2026-04-19 20:24:21 -07:00 · eba536070c
commit eba536070c
parent 79f345aae6
1 changed file with 13 additions and 0 deletions
--- a/app/services/recipe/llm_recipe.py
+++ b/app/services/recipe/llm_recipe.py
@@ -181,6 +181,19 @@ class LLMRecipeGenerator:
        try:
            alloc = ctx.__enter__()
        except Exception as exc:
+            msg = str(exc)
+            # 429 = coordinator at capacity (all nodes at max_concurrent limit).
+            # Don't fall back to LLMRouter — it's also overloaded and the slow
+            # fallback causes nginx 504s. Return "" fast so the caller degrades
+            # gracefully (empty recipe result) rather than timing out.
+            if "429" in msg or "max_concurrent" in msg.lower():
+                logger.info("cf-orch at capacity — returning empty result (graceful degradation)")
+                if ctx is not None:
+                    try:
+                        ctx.__exit__(None, None, None)
+                    except Exception:
+                        pass
+                return ""
            logger.debug("cf-orch allocation failed, falling back to LLMRouter: %s", exc)
            ctx = None  # __enter__ raised — do not call __exit__