refactor: pipeline cleanup — 6 follow-up fixes (#33–#38) #40

Merged

pyr0ball merged 3 commits from feat/pipeline-cleanup into main

2026-05-25 20:00:11 -07:00

pyr0ball commented

2026-05-25 19:07:35 -07:00

Owner

Summary

Six follow-up fixes from the post-implementation code review of the multi-agent diagnose pipeline. Depends on PR #39 (base branch: feat/29-multi-agent-diagnose).

372 tests passing — no regressions.

#33 — MappingProxyType for ClassifiedTimeline.cluster_severities

frozen=True only blocks field reassignment; the dict held by the field can still be mutated in place. The dict is now wrapped in MappingProxyType in classifier.py at construction time, so consumers only ever see a read-only view of the mapping.
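
A minimal sketch of the pattern, assuming a reduced ClassifiedTimeline with only the one field; the build_timeline helper and the sample severities are hypothetical and not taken from classifier.py:

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class ClassifiedTimeline:
    # Hypothetical, reduced model: the real class has more fields.
    cluster_severities: Mapping[str, str]


def build_timeline(severities: dict[str, str]) -> ClassifiedTimeline:
    # Copy first so later changes to the caller's dict cannot show through the proxy.
    return ClassifiedTimeline(cluster_severities=MappingProxyType(dict(severities)))


timeline = build_timeline({"c1": "critical", "c2": "warning"})

try:
    timeline.cluster_severities["c1"] = "info"
except TypeError as exc:
    # mappingproxy rejects item assignment with a TypeError.
    print(f"mutation rejected: {exc}")
```

The defensive dict() copy matters: a MappingProxyType is a live view, so wrapping the caller's dict directly would still let the caller change the contents afterwards.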

#34 — Remove dead suppression branch in synthesizer

_build_hypothesis_block() filters to active = [rh for rh in ranked if not rh.suppress][:3]. The if rh.suppress and rh.suppression_reason branch was unreachable — always False. Replaced with novelty score display.
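
Why the branch was dead can be seen in a small sketch. The RankedHypothesis shape here is an assumption (only suppress and suppression_reason appear in the PR; title, novelty, and the output format are hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RankedHypothesis:
    # Hypothetical, reduced model for illustration only.
    title: str
    suppress: bool
    suppression_reason: str = ""
    novelty: float = 0.0


def build_hypothesis_block(ranked: list[RankedHypothesis]) -> list[str]:
    # Same filter as in the PR: suppressed hypotheses never reach the loop below.
    active = [rh for rh in ranked if not rh.suppress][:3]
    # Every rh in active has suppress == False, so a check on rh.suppress
    # here could never be true; only the novelty score is shown.
    return [f"{rh.title} (novelty: {rh.novelty:.2f})" for rh in active]
```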

#35 — Shared `_llm_client.py`

Extracted call_llm(), extract_content(), and strip_json_fences() into app/services/diagnose/_llm_client.py. Both RootCauseHypothesizer and SummarySynthesizer now import from one source. Also added JSON fence stripping in hypothesizer._parse_response() — LLMs often return triple-backtick fences despite system prompt instructions.
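
For illustration, fence stripping of this kind could look like the sketch below. The function name comes from the PR, but the body is an assumption about its behavior, not the code in _llm_client.py:

```python
import json


def strip_json_fences(text: str) -> str:
    """Return the payload inside a Markdown code fence, or the stripped input."""
    cleaned = text.strip()
    if not cleaned.startswith("```"):
        return cleaned
    # Drop the opening fence line, which may carry a language tag such as "json".
    first_newline = cleaned.find("\n")
    if first_newline == -1:
        return cleaned
    body = cleaned[first_newline + 1:].rstrip()
    # Drop the closing fence if the model emitted one.
    if body.endswith("```"):
        body = body[:-3]
    return body.strip()


raw = '```json\n{"hypotheses": []}\n```'
parsed = json.loads(strip_json_fences(raw))
```

Unfenced input passes through unchanged apart from whitespace trimming, so the helper is safe to call on every response before json.loads().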

#36 — Per-stage error isolation in pipeline.py

Unhandled stage exceptions previously caused the SSE stream to close silently. Each stage is now wrapped in try/except that emits {type: 'error'} + {type: 'done'} so the client always receives a terminal event.
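
The isolation pattern can be sketched as follows. This is a simplified synchronous generator over plain dict events; the stage list, the "stage" and "message" keys, and the progress event are assumptions, and the real run_pipeline() presumably serializes events for SSE:

```python
from typing import Any, Callable, Iterator

Stage = Callable[[Any], Any]


def run_pipeline(stages: list[tuple[str, Stage]], payload: Any) -> Iterator[dict]:
    """Yield event dicts; the last event is always {'type': 'done'}."""
    for name, stage in stages:
        try:
            payload = stage(payload)
        except Exception as exc:
            # Report the failure, then close the stream with a terminal event.
            yield {"type": "error", "stage": name, "message": str(exc)}
            yield {"type": "done"}
            return
        yield {"type": "stage", "stage": name}
    yield {"type": "done"}
```

Returning right after the error keeps later stages from running on a payload the failed stage never produced, while the client still sees a terminal event in both the success and the failure path.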

#37 — format_context_block() in legacy branch only

format_context_block(ctx) was computed unconditionally but only used in the legacy LLM path. Moved inside the if llm_url and llm_model and combined: block.

#38 — str() coercion on supporting_cluster_ids

LLMs sometimes return integers instead of strings for cluster IDs. Added str(x) coercion in hypothesizer._parse_response() to match the tuple[str, ...] type annotation.
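
A sketch of the coercion, written as a standalone helper for clarity; coerce_cluster_ids is a hypothetical name, and the fallback for non-list input is an assumption, since in the PR the str(x) call sits inline in _parse_response():

```python
def coerce_cluster_ids(raw: object) -> tuple[str, ...]:
    """Normalise cluster IDs from parsed LLM JSON to a tuple of strings."""
    if not isinstance(raw, (list, tuple)):
        # Assumed fallback: treat a missing or malformed field as "no clusters".
        return ()
    return tuple(str(x) for x in raw)


# Mixed integers and strings, as an LLM might return them.
ids = coerce_cluster_ids([3, "7", 12])
```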

Files changed

app/services/diagnose/_llm_client.py (new)
app/services/diagnose/models.py
app/services/diagnose/classifier.py
app/services/diagnose/hypothesizer.py
app/services/diagnose/synthesizer.py
app/services/diagnose/pipeline.py
app/services/diagnose/__init__.py

Closes #33 #34 #35 #36 #37 #38

pyr0ball added 1 commit 2026-05-25 19:07:36 -07:00

refactor: pipeline cleanup — 6 follow-up fixes (#33-#38) 94d796e103

- #33: Wrap ClassifiedTimeline.cluster_severities in MappingProxyType for
  true immutability (frozen=True only blocks field reassignment, not dict
  mutation).

- #34: Remove dead suppression branch in synthesizer._build_hypothesis_block.
  active[] is already filtered to not rh.suppress, so the 'Yes — suppressed'
  branch was unreachable. Now shows novelty score only.

- #35: Extract shared _llm_client.py with call_llm() + extract_content() +
  strip_json_fences(). Both RootCauseHypothesizer and SummarySynthesizer
  now import from one source. Also strips JSON fences from LLM output before
  parsing in hypothesizer._parse_response.

- #36: Add per-stage try/except in pipeline.run_pipeline(). Unhandled
  stage exceptions now emit {type: 'error'} + {type: 'done'} SSE events
  instead of silently closing the stream.

- #37: Move format_context_block() call inside the legacy LLM branch in
  diagnose/__init__.py — it was being computed unconditionally but only
  used in the non-pipeline path.

- #38: Coerce supporting_cluster_ids items to str() in
  hypothesizer._parse_response to guard against LLMs returning integers
  instead of string cluster IDs.

pyr0ball added 1 commit 2026-05-25 19:11:40 -07:00

feat(pipeline): add TURNSTONE_CLASSIFIER_MODEL env var for Stage 2 ML config 4a2fd0fb0d

Makes the HuggingFace classifier model for Stage 2 configurable via
TURNSTONE_CLASSIFIER_MODEL. When unset (default), Stage 2 falls back
to pattern_tags then regex — no download required on first run.

Also documents TURNSTONE_MULTI_AGENT_DIAGNOSE, TURNSTONE_CLASSIFIER_MODEL,
TURNSTONE_EMBED_BACKEND/MODEL/DEVICE in .env.example.

pyr0ball added 1 commit 2026-05-25 19:15:35 -07:00

feat(manage): source .env before starting uvicorn 8d281a9d64

Enables TURNSTONE_MULTI_AGENT_DIAGNOSE and other env vars set in
.env to reach the running process without manual export. Variables
already set in the caller's environment take precedence.