refactor: pipeline cleanup — 6 follow-up fixes (#33–#38) #40

Merged
pyr0ball merged 3 commits from feat/pipeline-cleanup into main 2026-05-25 20:00:11 -07:00

Summary

Six follow-up fixes from the post-implementation code review of the multi-agent diagnose pipeline. Depends on PR #39 (base branch: feat/29-multi-agent-diagnose).

372 tests passing — no regressions.


#33 — MappingProxyType for ClassifiedTimeline.cluster_severities

frozen=True only blocks field reassignment, not mutation of the dict value. The dict is now wrapped with MappingProxyType in classifier.py at construction time, so the mapping is read-only through the dataclass field.
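
A minimal sketch of the difference, assuming a cut-down ClassifiedTimeline with only the field discussed here. The `build_timeline` helper and the `str` severity values are hypothetical stand-ins, not the code in classifier.py.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class ClassifiedTimeline:
    # Cut-down stand-in: the real class has more fields.
    cluster_severities: Mapping[str, str]


def build_timeline(severities: dict[str, str]) -> ClassifiedTimeline:
    # Hypothetical helper. Copy first so later changes to the caller's dict
    # do not show through the proxy, then expose a read-only view.
    return ClassifiedTimeline(MappingProxyType(dict(severities)))


timeline = build_timeline({"c1": "critical"})

try:
    timeline.cluster_severities["c1"] = "info"
except TypeError:
    # mappingproxy objects do not support item assignment.
    print("in-place mutation is rejected")
```

Note that MappingProxyType is a view, not a copy. The defensive `dict(...)` copy is what keeps a caller that still holds the original dict from changing the timeline.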

#34 — Remove dead suppression branch in synthesizer

_build_hypothesis_block() filters to active = [rh for rh in ranked if not rh.suppress][:3]. The if rh.suppress and rh.suppression_reason branch was unreachable — always False. Replaced with novelty score display.
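
To illustrate why the branch could never run, here is a sketch under assumptions: `RankedHypothesis`, its `title` and `novelty` fields, and the output format are hypothetical. Only `suppress`, `suppression_reason`, and the `active` filter come from the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RankedHypothesis:
    # Hypothetical stand-in for the real ranked-hypothesis type.
    title: str
    novelty: float
    suppress: bool = False
    suppression_reason: str = ""


def build_hypothesis_block(ranked: list[RankedHypothesis]) -> str:
    # Same filter as in the PR: suppressed entries never reach the loop.
    active = [rh for rh in ranked if not rh.suppress][:3]
    lines = []
    for i, rh in enumerate(active, start=1):
        # A check on rh.suppress here would always be False,
        # so only the novelty score is displayed.
        lines.append(f"{i}. {rh.title} (novelty: {rh.novelty:.2f})")
    return "\n".join(lines)


ranked = [
    RankedHypothesis("disk full", 0.9),
    RankedHypothesis("dns flap", 0.4, suppress=True, suppression_reason="seen before"),
    RankedHypothesis("oom", 0.25),
]
block = build_hypothesis_block(ranked)
```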

#35 — Shared _llm_client.py

Extracted call_llm(), extract_content(), and strip_json_fences() into app/services/diagnose/_llm_client.py. Both RootCauseHypothesizer and SummarySynthesizer now import from one source. Also added JSON fence stripping in hypothesizer._parse_response() — LLMs often return triple-backtick fences despite system prompt instructions.
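
As a rough sketch of what fence stripping involves: the function name matches the PR, but the body below is an assumption and may differ from the implementation in _llm_client.py.

```python
import json


def strip_json_fences(text: str) -> str:
    """Return the payload inside a triple-backtick fence, or the text unchanged."""
    stripped = text.strip()
    fenced = stripped.startswith("```") and stripped.endswith("```")
    if len(stripped) < 6 or not fenced:
        return stripped
    inner = stripped[3:-3]
    first_line, newline, rest = inner.partition("\n")
    # Treat an alphabetic first line (e.g. "json") as a language tag and drop it.
    if newline and first_line.strip().isalpha():
        inner = rest
    return inner.strip()


raw = '```json\n{"hypotheses": []}\n```'
parsed = json.loads(strip_json_fences(raw))
```

Unfenced input passes through untouched apart from whitespace trimming, so the helper is safe to call on every response before `json.loads`.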

#36 — Per-stage error isolation in pipeline.py

Unhandled stage exceptions previously caused the SSE stream to close silently. Each stage is now wrapped in a try/except that emits a {type: 'error'} event followed by {type: 'done'}, so the client always receives a terminal event.
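
The pattern can be sketched as a generator. This is an illustration under assumptions, not the code in pipeline.py: the stage signature, the per-stage progress event, and the `stage` and `message` fields are hypothetical. Only the `error` and `done` event types come from the PR.

```python
from typing import Any, Callable, Iterator

Event = dict[str, Any]
Stage = Callable[[Any], Any]


def run_pipeline(stages: list[tuple[str, Stage]], payload: Any) -> Iterator[Event]:
    """Run stages in order, always ending the stream with a 'done' event."""
    for name, stage in stages:
        try:
            payload = stage(payload)
        except Exception as exc:
            # Isolate the failure: report it, send the terminal event, stop.
            yield {"type": "error", "stage": name, "message": str(exc)}
            yield {"type": "done"}
            return
        # Hypothetical progress event for a stage that succeeded.
        yield {"type": "stage", "stage": name}
    yield {"type": "done"}


def ok(x: int) -> int:
    return x + 1


def boom(x: int) -> int:
    raise ValueError("bad input")


events = list(run_pipeline([("classify", ok), ("hypothesize", boom), ("synthesize", ok)], 0))
```

Whichever stage fails, the last event the client sees is `done`, so it can stop waiting rather than inferring an error from a dropped connection.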

#37 — format_context_block() in legacy branch only

format_context_block(ctx) was computed unconditionally but only used in the legacy LLM path. The call now lives inside the if llm_url and llm_model and combined: block in diagnose/__init__.py, so the pipeline path no longer computes it.

#38 — str() coercion on supporting_cluster_ids

LLMs sometimes return integers instead of strings for cluster IDs. Added str(x) coercion in hypothesizer._parse_response() to match the tuple[str, ...] type annotation.
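
A minimal sketch of the coercion. The helper name `coerce_cluster_ids` is hypothetical, and returning an empty tuple for non-list input is an assumption rather than the documented behavior of _parse_response().

```python
from typing import Any


def coerce_cluster_ids(raw: Any) -> tuple[str, ...]:
    """Normalize LLM-provided cluster IDs to a tuple of strings."""
    if not isinstance(raw, (list, tuple)):
        # Assumption: treat a missing or malformed value as "no clusters".
        return ()
    return tuple(str(x) for x in raw)


ids = coerce_cluster_ids([1, "2", 3])
```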


Files changed

  • app/services/diagnose/_llm_client.py (new)
  • app/services/diagnose/models.py
  • app/services/diagnose/classifier.py
  • app/services/diagnose/hypothesizer.py
  • app/services/diagnose/synthesizer.py
  • app/services/diagnose/pipeline.py
  • app/services/diagnose/__init__.py

Closes #33 #34 #35 #36 #37 #38

pyr0ball added 1 commit 2026-05-25 19:07:36 -07:00
- #33: Wrap ClassifiedTimeline.cluster_severities in MappingProxyType for
  true immutability (frozen=True only blocks field reassignment, not dict
  mutation).

- #34: Remove dead suppression branch in synthesizer._build_hypothesis_block.
  active[] is already filtered to not rh.suppress, so the 'Yes — suppressed'
  branch was unreachable. Now shows novelty score only.

- #35: Extract shared _llm_client.py with call_llm() + extract_content() +
  strip_json_fences(). Both RootCauseHypothesizer and SummarySynthesizer
  now import from one source. Also strips JSON fences from LLM output before
  parsing in hypothesizer._parse_response.

- #36: Add per-stage try/except in pipeline.run_pipeline(). Unhandled
  stage exceptions now emit {type: 'error'} + {type: 'done'} SSE events
  instead of silently closing the stream.

- #37: Move format_context_block() call inside the legacy LLM branch in
  diagnose/__init__.py — it was being computed unconditionally but only
  used in the non-pipeline path.

- #38: Coerce supporting_cluster_ids items to str() in hypothesizer
  _parse_response to guard against LLMs returning integers instead of
  string cluster IDs.
pyr0ball added 1 commit 2026-05-25 19:11:40 -07:00
Makes the HuggingFace classifier model for Stage 2 configurable via
TURNSTONE_CLASSIFIER_MODEL. When unset (default), Stage 2 falls back
to pattern_tags then regex — no download required on first run.

Also documents TURNSTONE_MULTI_AGENT_DIAGNOSE, TURNSTONE_CLASSIFIER_MODEL,
TURNSTONE_EMBED_BACKEND/MODEL/DEVICE in .env.example.
pyr0ball added 1 commit 2026-05-25 19:15:35 -07:00
Enables TURNSTONE_MULTI_AGENT_DIAGNOSE and other env vars set in
.env to reach the running process without manual export. Variables
already set in the caller's environment take precedence.
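
The commit message does not say where the .env loading happens. As an illustration of the precedence rule only, here is a Python sketch in which `load_env_file` is a hypothetical helper and the example values are made up.

```python
import os
from typing import MutableMapping


def load_env_file(text: str, environ: MutableMapping[str, str] = os.environ) -> None:
    """Apply KEY=VALUE lines to environ without overriding existing keys."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # setdefault keeps any value the caller already exported.
        environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


# Use a plain dict here so the example does not touch the real environment.
env = {"TURNSTONE_MULTI_AGENT_DIAGNOSE": "0"}
load_env_file("# local settings\nTURNSTONE_MULTI_AGENT_DIAGNOSE=1\nTURNSTONE_EMBED_DEVICE=cpu\n", env)
```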
pyr0ball changed target branch from feat/29-multi-agent-diagnose to main 2026-05-25 19:59:35 -07:00
pyr0ball merged commit 1f9a6bb284 into main 2026-05-25 20:00:11 -07:00
pyr0ball deleted branch feat/pipeline-cleanup 2026-05-25 20:00:12 -07:00
Reference: Circuit-Forge/turnstone#40