turnstone/app/services/diagnose
pyr0ball 8849f3aa22 fix(hypothesizer): extract first JSON array to handle reasoning model double-output
Reasoning models (e.g. foundation-sec-8b) emit valid JSON, then repeat it
inside a Markdown fenced block. json.loads() fails on the combined text
because of the trailing data after the first array.

extract_first_json_array() scans for the first '[' and walks to its
matching ']' with proper string/escape/nesting handling, then returns
just that slice. Combined with strip_json_fences(), this handles all
observed output patterns:
  - bare JSON array (standard models)
  - fenced JSON array (fence-wrapping models)
  - bare array followed by fenced repeat (reasoning models)
2026-05-25 21:01:14 -07:00
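The bracket walk described in the commit message can be sketched roughly as follows. This is an illustrative reconstruction, not the code in _llm_client.py or hypothesizer.py: the names strip_json_fences() and extract_first_json_array() come from the commit message, but their signatures, return types, and the parse_first_array() helper are assumptions made for this example.

```python
import json
from typing import Any, Optional

FENCE = "`" * 3  # Markdown code-fence marker, built this way to avoid a literal fence


def strip_json_fences(text: str) -> str:
    """Assumed behavior: drop one leading fence line and one trailing fence line."""
    stripped = text.strip()
    if not stripped.startswith(FENCE):
        return stripped
    lines = stripped.splitlines()[1:]  # drop the opening fence line (may carry "json")
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]  # drop the closing fence line
    return "\n".join(lines).strip()


def extract_first_json_array(text: str) -> Optional[str]:
    """Return the slice from the first '[' to its matching ']', or None."""
    start = text.find("[")
    if start == -1:
        return None
    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            # Inside a JSON string, brackets are data; only track escapes and the end quote.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return None  # unbalanced: the first array never closes


def parse_first_array(raw: str) -> Any:
    """Hypothetical helper: strip fences, slice out the first array, then parse it."""
    candidate = extract_first_json_array(strip_json_fences(raw))
    if candidate is None:
        raise ValueError("no complete JSON array found in model output")
    return json.loads(candidate)
```

Under these assumptions, the bare, fenced, and bare-then-fenced outputs all reduce to the same slice before json.loads() runs, so the repeated copy after the first array is never parsed.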
__init__.py refactor: pipeline cleanup — 6 follow-up fixes (#33-#38) 2026-05-25 19:05:56 -07:00
_llm_client.py fix(hypothesizer): extract first JSON array to handle reasoning model double-output 2026-05-25 21:01:14 -07:00
classifier.py refactor: pipeline cleanup — 6 follow-up fixes (#33-#38) 2026-05-25 19:05:56 -07:00
hypothesizer.py fix(hypothesizer): extract first JSON array to handle reasoning model double-output 2026-05-25 21:01:14 -07:00
legacy.py fix: frozen dataclasses, clean __all__, improve exception logging in diagnose package 2026-05-25 12:31:07 -07:00
models.py refactor: pipeline cleanup — 6 follow-up fixes (#33-#38) 2026-05-25 19:05:56 -07:00
pipeline.py feat(pipeline): add TURNSTONE_CLASSIFIER_MODEL env var for Stage 2 ML config 2026-05-25 19:11:32 -07:00
suppressor.py fix: invert suppress_threshold semantics to similarity_threshold in FalsePositiveSuppressor 2026-05-25 18:58:52 -07:00
synthesizer.py refactor: pipeline cleanup — 6 follow-up fixes (#33-#38) 2026-05-25 19:05:56 -07:00
timeline.py refactor: split TimelineReconstructor.reconstruct into helpers, fix magic number + error handling 2026-05-25 13:22:18 -07:00