fix: sanitize invalid JSON escape sequences from LLM output in resume optimizer

LLMs occasionally emit backslash sequences that are valid regex but not valid
JSON (e.g. \s, \d, \p). This caused extract_jd_signals() to fall through to
the exception handler, leaving llm_signals empty. With no LLM signals, the
optimizer fell back to TF-IDF alone, which is more conservative and can
legitimately return zero gaps, so the UI appeared to report that the resume
needed no changes.
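
The failure mode can be reproduced in isolation. The following is a minimal sketch that assumes a made-up model reply and a hypothetical `parse_signals` helper standing in for the catch-all handler. `json.loads` rejects `\s` with a `json.JSONDecodeError`, and the handler turns that error into an empty signal list without surfacing anything to the user.

```python
import json

# Hypothetical model reply: a regex-style "\s" ended up inside a JSON string.
raw = r'["Python", "REST\s+APIs", "Kubernetes"]'


def parse_signals(text: str) -> list[str]:
    """Hypothetical helper mirroring a catch-all handler around json.loads."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # The invalid escape lands here; callers just see "no signals".
        return []


print(parse_signals(raw))           # []
print(parse_signals('["Python"]'))  # ['Python']
```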

Fix: before parsing, drop any backslash that is not followed by a recognised
JSON escape character (", \, /, b, f, n, r, t, u), keeping the character after
it. Valid escapes such as \n and \" are preserved.
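
Here is a minimal sketch of the escape repair as a standalone function. The name `sanitize_json_escapes` is hypothetical and is not part of the codebase. The sketch differs from the commit's single-character pattern in one respect: it matches valid two-character escapes first, so the second backslash of an escaped `\\` is never mistaken for a stray one. With the single-character pattern, valid JSON such as `"C:\\server"` becomes `"C:\server"` and still fails to parse.

```python
import json
import re

# A valid two-character JSON escape is captured in group 1 and kept verbatim;
# any other backslash is treated as stray and dropped (the next char is kept).
_ESCAPE = re.compile(r'\\(["\\/bfnrtu])|\\')


def sanitize_json_escapes(text: str) -> str:
    """Hypothetical helper: remove backslashes that do not start a valid JSON escape."""
    return _ESCAPE.sub(lambda m: m.group(0) if m.group(1) else "", text)


# Regex-style escapes from the model are repaired:
fixed = json.loads(sanitize_json_escapes(r'["Node\.js", "\d+ years"]'))
print(fixed)  # ['Node.js', 'd+ years']

# Valid escapes survive, including an escaped backslash followed by a letter:
kept = json.loads(sanitize_json_escapes(r'["C:\\server", "say \"hi\""]'))
print(kept)  # ['C:\\server', 'say "hi"']

# For comparison, a single-backslash pattern also matches the second half of
# "\\s", turning valid JSON into an invalid "\s":
naive = re.sub(r'\\([^"\\/bfnrtu])', r'\1', r'["C:\\server"]')
print(naive)  # ["C:\server"]
```

Like the commit's pattern, this version keeps `\u` even when four hex digits do not follow it, so `json.loads` will still reject that input.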

Reproduction: running cover letter generation concurrently with gap analysis
raises the probability of a slightly malformed LLM response due to model load.
pyr0ball 2026-04-16 11:11:50 -07:00
parent a4a2216c2f
commit 4e11cf3cfa


@@ -70,7 +70,12 @@ def extract_jd_signals(description: str, resume_text: str = "") -> list[str]:
         # Extract JSON array from response (LLM may wrap it in markdown)
         match = re.search(r"\[.*\]", raw, re.DOTALL)
         if match:
-            llm_signals = json.loads(match.group(0))
+            json_str = match.group(0)
+            # LLMs occasionally emit invalid JSON escape sequences (e.g. \s, \d, \p)
+            # that are valid regex but not valid JSON. Replace bare backslashes that
+            # aren't followed by a recognised JSON escape character.
+            json_str = re.sub(r'\\([^"\\/bfnrtu])', r'\1', json_str)
+            llm_signals = json.loads(json_str)
             llm_signals = [s.strip() for s in llm_signals if isinstance(s, str) and s.strip()]
     except Exception:
         log.warning("[resume_optimizer] LLM signal extraction failed", exc_info=True)
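
The parsing step in this hunk can also be exercised outside `extract_jd_signals()`. The sketch below assumes a hypothetical `parse_llm_signal_list` wrapper that applies the same regex, substitution and filtering as the diff. It uses a made-up model reply with prose around the array.

```python
import json
import logging
import re

log = logging.getLogger("resume_optimizer")


def parse_llm_signal_list(raw: str) -> list[str]:
    """Hypothetical standalone version of the parsing step shown in the hunk."""
    llm_signals: list[str] = []
    try:
        # Greedy match from the first "[" to the last "]" skips surrounding prose.
        match = re.search(r"\[.*\]", raw, re.DOTALL)
        if match:
            json_str = match.group(0)
            # Same substitution as the commit: drop a backslash before a non-escape char.
            json_str = re.sub(r'\\([^"\\/bfnrtu])', r'\1', json_str)
            llm_signals = json.loads(json_str)
            # Keep only non-empty strings, trimmed.
            llm_signals = [s.strip() for s in llm_signals if isinstance(s, str) and s.strip()]
    except Exception:
        log.warning("[resume_optimizer] LLM signal extraction failed", exc_info=True)
        llm_signals = []  # no LLM signals: the caller falls back to TF-IDF only
    return llm_signals


reply = 'Sure! Signals:\n["  Go ", "Node\\.js", 42, ""]\nHope this helps.'
print(parse_llm_signal_list(reply))  # ['Go', 'Node.js']
```

Non-string items and blank strings are filtered out. Any remaining parse failure is logged and yields an empty list, which leads to the TF-IDF-only path described in the commit message.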