turnstone

Author	SHA1	Message	Date
pyr0ball	e851099e5c	fix(hypothesizer): extract first JSON array to handle reasoning model double-output. Reasoning models (e.g. foundation-sec-8b) emit valid JSON then repeat it inside a markdown fence block. json.loads() fails on the combined text. extract_first_json_array() scans for the first '[' and walks to its matching ']' with proper string/escape/nesting handling, then returns just that slice. Combined with strip_json_fences(), this handles all observed output patterns: - bare JSON array (standard models) - fenced JSON array (fence-wrapping models) - bare array followed by fenced repeat (reasoning models)	2026-05-25 21:01:14 -07:00
pyr0ball	85e7a70536	refactor: pipeline cleanup — 6 follow-up fixes (#33-#38) - #33: Wrap ClassifiedTimeline.cluster_severities in MappingProxyType for true immutability (frozen=True only blocks field reassignment, not dict mutation). - #34: Remove dead suppression branch in synthesizer._build_hypothesis_block. active[] is already filtered to not rh.suppress, so the 'Yes — suppressed' branch was unreachable. Now shows novelty score only. - #35: Extract shared _llm_client.py with call_llm() + extract_content() + strip_json_fences(). Both RootCauseHypothesizer and SummarySynthesizer now import from one source. Also strips JSON fences from LLM output before parsing in hypothesizer._parse_response. - #36: Add per-stage try/except in pipeline.run_pipeline(). Unhandled stage exceptions now emit {type: 'error'} + {type: 'done'} SSE events instead of silently closing the stream. - #37: Move format_context_block() call inside the legacy LLM branch in diagnose/__init__.py — it was being computed unconditionally but only used in the non-pipeline path. - #38: Coerce supporting_cluster_ids items to str() in hypothesizer _parse_response to guard against LLMs returning integers instead of string cluster IDs.	2026-05-25 19:05:56 -07:00
pyr0ball	a2916f958a	fix: defensive coercion for LLM confidence and cluster fields in hypothesizer - Add _coerce_float() module-level helper: catches TypeError/ValueError from non-numeric LLM output (e.g. 'high', 'N/A') and returns a caller-supplied default instead of raising. - Replace float(item.get('confidence', 0.5)) with _coerce_float(item.get('confidence'), 0.5) in _parse_response. - Guard supporting_cluster_ids: tuple(item.get(...) or []) so a JSON null from the LLM does not cause TypeError('NoneType is not iterable'). - runbook_refs is hardcoded as () and not sourced from LLM output; no change needed there. - Add test_non_numeric_confidence_uses_default (Test 10) to cover the 'high' string case: asserts no exception and confidence == 0.5. - 341 tests passing (+1). Closes: #29	2026-05-25 14:00:30 -07:00
pyr0ball	34fb8f501d	feat: Stage 3 — RootCauseHypothesizer for multi-agent diagnose pipeline (issue #29) - Add app/services/diagnose/hypothesizer.py with RootCauseHypothesizer class - Stage 3 of the multi-agent diagnose pipeline: accepts ClassifiedTimeline + RetrievedContext, builds a structured JSON prompt, calls the LLM via the same cf-orch task → OpenAI-compat fallback pattern used by llm.py - Parses JSON array response into list[Hypothesis] dataclasses with UUID ids, severity validation (WARNING→WARN, unknown→ERROR), confidence coercion - Gracefully returns [] when llm_url/llm_model absent or clusters empty - Add tests/test_diagnose_hypothesizer.py: 12 tests, all mocked, no LLM I/O covering: valid response, UUID generation, malformed JSON, non-list JSON, empty clusters, missing URL/model, max_hypotheses cap, severity mapping, confidence string coercion - 340 tests passing (328 prior + 12 new) Closes: #29	2026-05-25 13:49:18 -07:00
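
Commit e851099e5c describes extract_first_json_array() as scanning for the first '[' and walking to its matching ']' while tracking strings, escapes, and nesting. The repository's actual code is not shown in this log, so the following is only a minimal sketch of that scan under those stated assumptions. The function body, its Optional[str] return type, and the sample input are illustrative guesses and not the project's implementation.

```python
import json
from typing import Optional


def extract_first_json_array(text: str) -> Optional[str]:
    """Return the first balanced '[...]' slice of text, or None if none is found."""
    start = text.find("[")
    if start == -1:
        return None
    depth = 0          # current '[' nesting depth
    in_string = False  # True while inside a JSON string literal
    escaped = False    # True when the previous character was a backslash
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            if escaped:
                escaped = False       # this character is escaped; skip it
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False     # closing quote
            continue                  # brackets inside strings do not count
        if ch == '"':
            in_string = True
        elif ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return None  # the array was never closed


# Hypothetical "double output": a bare array followed by a fenced repeat.
fence = "`" * 3
payload = '[{"title": "disk full ]", "ids": [1, 2]}]'
raw = payload + "\n" + fence + "json\n" + payload + "\n" + fence
hypotheses = json.loads(extract_first_json_array(raw))
```

Only the first array is returned, so the fenced repeat after it never reaches json.loads(). A ']' inside a string value (as in the sample title) does not end the scan early because the string state is tracked.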

4 commits