feat(diagnose): 5-stage multi-agent diagnose pipeline (#29) #39

Merged

pyr0ball merged 17 commits from feat/29-multi-agent-diagnose into main

2026-05-25 19:59:35 -07:00

Author	SHA1	Message	Date
pyr0ball	86361f6c79	fix: invert suppress_threshold semantics to similarity_threshold in FalsePositiveSuppressor Was suppressing when novelty_score < 0.85 (i.e. similarity > 0.15), which would suppress nearly every hypothesis once embeddings are active. Now suppresses when max_sim >= similarity_threshold (0.85), meaning only hypotheses that are 85%+ similar to a resolved incident are suppressed. Also renames suppress_threshold → similarity_threshold for clarity and adds a borderline boundary test (0.85 suppressed, 0.84 not suppressed). Closes: #29	2026-05-25 18:58:52 -07:00
pyr0ball	255c9111d4	fix: tighten suppression_reason display guard, document unused since/until params	2026-05-25 15:02:48 -07:00
pyr0ball	8cbd981ec7	feat: Stage 5 synthesizer + pipeline orchestrator + feature flag wiring (issue #29) - Add app/services/diagnose/synthesizer.py: SummarySynthesizer (Stage 5) - Builds structured LLM prompt from ranked hypotheses, timeline, RAG context - Excludes suppressed hypotheses from the narrative prompt - Deterministic fallback when no LLM configured or LLM call fails - Same cf-orch task endpoint + direct OpenAI-compat fallback pattern as other stages - Replace pipeline.py stub with full run_pipeline() async generator - Orchestrates all 5 stages via asyncio.to_thread for each synchronous stage - Yields typed SSE event dicts: status, pipeline_stage (1-4), hypotheses, reasoning, done - Suppressor counts (active vs suppressed) reported in stage 4 event message - Wire MULTI_AGENT_ENABLED feature flag into diagnose_stream() - TURNSTONE_MULTI_AGENT_DIAGNOSE=true routes through run_pipeline() - pipeline emits its own done event; legacy path unchanged when flag is false - Import of run_pipeline added to __init__.py - Add 21 new tests (350 -> 371 passing): - tests/test_diagnose_synthesizer.py: 8 tests (with/without LLM, suppressed, empty ranked, LLM failure fallback) - tests/test_diagnose_pipeline.py: 13 tests (flag off, flag on event sequence, empty entries, no LLM, stage 1 cluster count message) Closes: #29	2026-05-25 14:56:25 -07:00
pyr0ball	9bfae16b54	refactor: extract _score_hypothesis helper, fix exception types, pass device in suppressor	2026-05-25 14:41:33 -07:00
pyr0ball	174cb126e6	feat: Stage 4 — FalsePositiveSuppressor for multi-agent diagnose pipeline (issue #29) - Implements FalsePositiveSuppressor using embedding cosine similarity - Lazy corpus embedding via get_embedder() with module-level cache keyed by db_path - Cache invalidated automatically when the resolved incident corpus changes - Suppresses hypotheses with novelty_score below configurable threshold (default 0.85) - Full fallback path (novelty=1.0, no suppression) when model_id empty, embedding service unavailable, or no resolved incidents found in DB - Graceful handling of missing incidents table and DB query failures - Numpy bool_ leakage prevented by explicit float()/bool() coercion at assignment - Pure-Python cosine fallback for environments without numpy - 9 new tests (all mocked, no real model downloads): passthrough, suppress, no-suppress, empty list, ranking, empty corpus, DB failure, service unavailable, cache invalidation - 350 total tests passing (341 pre-existing + 9 new) Closes: #29	2026-05-25 14:28:31 -07:00
pyr0ball	e8c66972fa	fix: defensive coercion for LLM confidence and cluster fields in hypothesizer - Add _coerce_float() module-level helper: catches TypeError/ValueError from non-numeric LLM output (e.g. 'high', 'N/A') and returns a caller-supplied default instead of raising. - Replace float(item.get('confidence', 0.5)) with _coerce_float(item.get('confidence'), 0.5) in _parse_response. - Guard supporting_cluster_ids: tuple(item.get(...) or []) so a JSON null from the LLM does not cause TypeError('NoneType is not iterable'). - runbook_refs is hardcoded as () and not sourced from LLM output; no change needed there. - Add test_non_numeric_confidence_uses_default (Test 10) to cover the 'high' string case: asserts no exception and confidence == 0.5. - 341 tests passing (+1). Closes: #29	2026-05-25 14:00:30 -07:00
pyr0ball	eefd65f903	feat: Stage 3 — RootCauseHypothesizer for multi-agent diagnose pipeline (issue #29) - Add app/services/diagnose/hypothesizer.py with RootCauseHypothesizer class - Stage 3 of the multi-agent diagnose pipeline: accepts ClassifiedTimeline + RetrievedContext, builds a structured JSON prompt, calls the LLM via the same cf-orch task → OpenAI-compat fallback pattern used by llm.py - Parses JSON array response into list[Hypothesis] dataclasses with UUID ids, severity validation (WARNING→WARN, unknown→ERROR), confidence coercion - Gracefully returns [] when llm_url/llm_model absent or clusters empty - Add tests/test_diagnose_hypothesizer.py: 12 tests, all mocked, no LLM I/O covering: valid response, UUID generation, malformed JSON, non-list JSON, empty clusters, missing URL/model, max_hypotheses cap, severity mapping, confidence string coercion - 340 tests passing (328 prior + 12 new) Closes: #29	2026-05-25 13:49:18 -07:00
pyr0ball	912ba7ac16	feat: Stage 2 — SeverityClassifier for multi-agent diagnose pipeline (issue #29) Three-path classification: ML (transformers pipeline, lazy singleton) → pattern_tags (YAML pattern severity dict) → regex (detect_severity). - Path A: HF text-classification pipeline loaded lazily on first classify() call via module-level singleton; shim promotes ERROR+keyword hits to CRITICAL and demotes low-confidence INFO to DEBUG. - Path B: maps cluster.pattern_tags through the loaded pattern severity dict; picks the highest severity across matching tags. - Path C: falls back to detect_severity() regex scan on representative_text; defaults to INFO when no keyword matches. - Pattern file resolved from constructor arg or TURNSTONE_PATTERNS env var (mirrors app/rest.py convention). - No crash when transformers is not installed; ImportError on per-cluster ML inference triggers clean per-cluster fallback to pattern_tags/regex. - ClassifiedTimeline.classifier_used reflects the primary session path. Tests (10 new, 328 total, all passing): - ML ERROR, CRITICAL promotion, DEBUG demotion, WARNING→WARN - pattern_tags resolution from YAML fixture - regex ERROR detection and INFO default - ImportError clean fallback - empty timeline no-crash - ClassifiedTimeline FrozenInstanceError on mutation Closes: #29	2026-05-25 13:27:17 -07:00
pyr0ball	3b04c81a2b	refactor: split TimelineReconstructor.reconstruct into helpers, fix magic number + error handling - Add gap_significance_seconds constructor param (default 30) to replace hardcoded magic number in gap_count computation - _parse_iso now returns datetime | None with try/except on ValueError; all callers handle None return by treating malformed timestamps as absent - Extract reconstruct into four private helpers: _sort_entries, _group_into_raw_clusters, _build_cluster, _dominant_sources_tuple - Promote _sort_key to module-level function (was nested inside reconstruct) - Rename old module-level _build_cluster to _make_event_cluster to avoid name collision with new instance method - Add explanatory comment to type: ignore[arg-type] at _highest_severity call site - Black-formatted	2026-05-25 13:22:18 -07:00
pyr0ball	7cff98b1c3	feat: Stage 1 — TimelineReconstructor for multi-agent diagnose pipeline (issue #29) - Add app/services/diagnose/timeline.py: pure-Python TimelineReconstructor - Sorts entries by timestamp_iso (None entries appended at end) - Sliding-window clustering anchored to first entry in each cluster - Computes cluster_id (sha1[:12]), severity (highest wins), burst flag, gap_before_seconds, representative_text (highest rank, longest text tiebreak) - Builds TimelineResult with dominant_sources sorted by entry count descending - Update pipeline.py stub to import TimelineReconstructor (Task 6 wiring prep) - Add tests/test_diagnose_timeline.py: 15 tests covering all 13 required cases plus null-timestamp edge case variant; all 318 tests passing Closes: #29	2026-05-25 12:54:15 -07:00
pyr0ball	959a6cbf1c	fix: frozen dataclasses, clean __all__, improve exception logging in diagnose package	2026-05-25 12:31:07 -07:00
pyr0ball	664ab50433	refactor: convert diagnose module to package for multi-agent pipeline (issue #29) - Move app/services/diagnose.py verbatim to app/services/diagnose/legacy.py - Create app/services/diagnose/__init__.py with full implementation so that patch('app.services.diagnose._HAS_DATEPARSER') targets the correct namespace and all 303 existing tests continue to pass without modification - Add app/services/diagnose/models.py with 5 pipeline dataclasses: EventCluster, TimelineResult, ClassifiedTimeline, Hypothesis, RankedHypothesis - Add app/services/diagnose/pipeline.py with run_pipeline() stub (Task 6) - Add MULTI_AGENT_ENABLED feature flag (off by default via env var) - Zero behavior change; ruff clean Closes: #29	2026-05-25 11:12:39 -07:00
pyr0ball	5f32a6678d	refactor: extract embeddings service layer — decouple context embedder from Ollama - New app/services/embeddings.py: TURNSTONE_EMBED_* env vars, multi-backend support - embedder.py delegates to service layer; re-exports EMBEDDING_AVAILABLE for compat - retriever.py updated to use service layer - Test coverage updated in tests/context/test_embedder.py	2026-05-25 11:01:25 -07:00
pyr0ball	2fde3a1814	feat: fingerprint-based incremental glean — skip unchanged files (#30) - Add glean_fingerprints table to schema (sha256 + mtime + size) - _fingerprint(), _fp_unchanged(), _save_fingerprint() helpers in pipeline.py - _glean_files() now checks fingerprint; skips file if hash unchanged - force=True param threads through glean_dir → glean_file → glean_sources - POST /api/tasks/glean and POST /api/sources/{id}/glean accept force=true - 14 unit tests in tests/test_glean_fingerprint.py, all passing Closes: #30	2026-05-25 11:01:18 -07:00
pyr0ball	e746d55730	feat: SSH remote glean — transport layer, pipeline integration, REST + UI (#22) Closes turnstone#22. ## Transport layer (app/glean/ssh.py) - SSHTransport context manager: key-only auth, paramiko backend - SSHConnectionError / SSHCommandError exception hierarchy - exec_stream() generator: yields stdout lines, raises SSHCommandError on non-zero exit (isinstance(int) guard for test-mock safety) - Command builders: _build_journald_command, _build_syslog_command, _build_plaintext_command, _build_docker_command - 18 unit tests in tests/test_glean_ssh.py ## Pipeline integration (app/glean/pipeline.py) - _stream_and_write(): per-item error isolation — SSHCommandError skips one glean item without aborting the rest of the host connection - _glean_ssh_source(): one SSHTransport per host, dispatches all glean items (journald/syslog/plaintext/docker); SSHConnectionError aborts host - glean_sources(): splits local vs SSH sources; local → _glean_files(); SSH → _glean_ssh_source(); shared compiled patterns and DB connection - glean_ssh_source(): public wrapper for REST use — manages DB connection, pattern compilation, FTS rebuild lifecycle - 15 integration tests in tests/test_glean_pipeline_ssh.py - All 285 tests passing ## REST layer (app/rest.py) - GET /api/sources/configured: reads sources.yaml and enriches with DB stats; SSH sources appear before first glean (entry_count=0); sub-source IDs (rack01/journald, rack01/docker/myapp) aggregated per host entry - POST /api/sources/{id}/glean: detects transport:ssh and dispatches to glean_ssh_source() wrapper; local sources unchanged - Import: glean_ssh_source as _glean_ssh_source ## Frontend (web/src/views/SourcesView.vue) - Fetches /api/sources/configured (primary) + /api/sources (DB-only) in parallel; merges into unified SourceRow list - SSH sources show: ssh badge (with user@host tooltip), glean-type pills (journald/syslog/docker/etc.), host subtitle - SSH sub-source IDs (rack01/journald) suppressed from the DB-only list since they are covered by the parent SSH row - DB-only sources (uploads) appear below configured sources with 'uploaded' badge; reglean button disabled (not in sources.yaml) - Delete zeroes out configured-source stats in-place rather than removing the row (so the source remains visible for re-gleaning)	2026-05-21 12:37:30 -07:00
pyr0ball	81a9b0f49d	feat: SSH remote host glean — transport layer and pipeline integration (closes #22, backend) Adds SSH-based log collection from remote hosts via Paramiko. One SSH connection per host, multiple log types per connection. New files: - app/glean/ssh.py: SSHTransport context manager + command builders for journald, syslog, plaintext, and docker log types - tests/test_glean_ssh.py: 18 tests for transport layer (all mocked) - tests/test_glean_pipeline_ssh.py: 15 tests for pipeline integration Pipeline changes (app/glean/pipeline.py): - glean_sources() now splits sources into local-file and SSH categories - SSH sources use transport: ssh + glean: list schema in sources.yaml - _glean_ssh_source(): one SSHTransport per host, N commands per connection - _stream_and_write(): SSHCommandError caught per-item so one bad command does not abort the rest of the host's glean items - SSHConnectionError skips the entire host with a warning log SSH source schema (sources.yaml): - id: rack01 transport: ssh host: 192.168.1.10 user: admin key_path: ~/.ssh/id_ed25519 glean: - type: journald args: [--since, 2 hours ago] - type: syslog path: /var/log/syslog - type: plaintext path: /var/log/app/error.log - type: docker containers: [myapp, nginx] Key design decisions: - Key-based auth only (no password prompts in daemon context) - exit-status check fires after all stdout lines yielded; callers drain the iterator to trigger it - Local file sources path unchanged; SSH sources co-exist in same yaml - Docker multi-container: one exec_stream call per container, source_id scoped as host_id/type/container_name Remaining for #22: REST endpoint, SourcesView UI, sources.yaml docs. 285 → 285 tests passing (33 new SSH tests).	2026-05-20 23:03:13 -07:00
pyr0ball	12cd0a23d5	refactor: rename ingest → glean throughout codebase Renames the app/ingest/ package to app/glean/ and updates all references across Python modules, shell scripts, Vue components, tests, and documentation. Intentionally preserved: - SQLite column name ingest_time (avoids schema migration) - RetrievedEntry.ingest_time field (maps to the column above) - Any public-facing JSON keys that reference ingest_time Changes by category: - app/ingest/ → app/glean/ (full package move, all parsers) - app/tasks/ingest_scheduler.py → app/tasks/glean_scheduler.py - scripts/ingest_corpus.py → scripts/glean_corpus.py - tests/test_ingest_*.py → tests/test_glean_*.py - Docstrings, log messages, comments: ingest → glean - Env var: TURNSTONE_INGEST_INTERVAL → TURNSTONE_GLEAN_INTERVAL - Shell scripts: glean.log, glean_corpus.py references - README.md: multi-source ingest → multi-source glean - .env.example: updated env var name - patterns/: new diagnostic patterns from 2026-05-20 SSH incident (service_crash_loop, pkg_daemon_restart, ssh_forward_conflict) - SourcesView.vue: pipeline label updated - All test import paths updated to app.glean.* 285 tests passing.	2026-05-20 23:02:55 -07:00
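
Stage 1's clustering rule (7cff98b1c3, refined in 3b04c81a2b) can be sketched in plain Python. This is a minimal sketch, not the TimelineReconstructor code. The entry dict shape (`id`, `timestamp_iso`), the 60-second window default, hashing the member ids to get `cluster_id`, and gathering untimed entries into one trailing cluster are all assumptions. It shows the behaviours the commits name explicitly: the window is anchored to the first entry of each cluster, malformed timestamps parse to `None` and sort last, and `gap_count` only counts gaps longer than `gap_significance_seconds` (default 30).

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


def _parse_iso(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO-8601 timestamp; malformed or missing values become None."""
    if not value:
        return None
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        return None


def _sort_key(entry: dict) -> tuple:
    ts = _parse_iso(entry.get("timestamp_iso"))
    # (0, ts) sorts before (1, ...), so entries without a timestamp go last
    return (0, ts) if ts is not None else (1, datetime.min)


@dataclass(frozen=True)
class Cluster:  # hypothetical, much smaller than the real EventCluster
    cluster_id: str
    entry_ids: tuple
    gap_before_seconds: Optional[float]


def build_clusters(entries: list, window_seconds: float = 60.0) -> list:
    """Group time-sorted entries; each window is anchored to its cluster's first entry."""
    groups: list = []
    untimed: list = []
    anchor: Optional[datetime] = None
    for entry in sorted(entries, key=_sort_key):
        ts = _parse_iso(entry.get("timestamp_iso"))
        if ts is None:
            untimed.append(entry)  # assumption: untimed entries share one trailing cluster
            continue
        if anchor is None or (ts - anchor).total_seconds() > window_seconds:
            groups.append([entry])
            anchor = ts  # a new cluster resets the anchor
        else:
            groups[-1].append(entry)
    if untimed:
        groups.append(untimed)

    clusters = []
    prev_end: Optional[datetime] = None
    for group in groups:
        ids = tuple(str(e["id"]) for e in group)
        # assumption: hash the member ids; the commit only specifies sha1[:12]
        cluster_id = hashlib.sha1("|".join(ids).encode()).hexdigest()[:12]
        start = _parse_iso(group[0].get("timestamp_iso"))
        gap = None
        if start is not None and prev_end is not None:
            gap = (start - prev_end).total_seconds()
        clusters.append(Cluster(cluster_id, ids, gap))
        end = _parse_iso(group[-1].get("timestamp_iso"))
        if end is not None:
            prev_end = end
    return clusters


def gap_count(clusters: list, gap_significance_seconds: float = 30.0) -> int:
    """Count clusters preceded by a quiet gap longer than gap_significance_seconds."""
    return sum(
        1
        for c in clusters
        if c.gap_before_seconds is not None
        and c.gap_before_seconds > gap_significance_seconds
    )
```

Anchoring matters for slow drips: entries at 0 s, 50 s and 100 s form two clusters with a 60-second window here, whereas chaining each entry to the previous one would merge all three.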
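
The non-ML half of Stage 2 (912ba7ac16) is a fallback chain: pattern tags first, then a regex scan, then INFO. A minimal sketch follows. The severity ranking, the keyword regexes and the function names are hypothetical stand-ins (the real regex path goes through `detect_severity()`, whose patterns are not shown in this PR), and Path A, the transformers pipeline, is left out.

```python
import re
from typing import Iterable, Mapping, Optional, Tuple

# Assumed ranking, lowest to highest
_ORDER = ("DEBUG", "INFO", "WARN", "ERROR", "CRITICAL")
_RANK = {name: i for i, name in enumerate(_ORDER)}

# Hypothetical keyword regexes, checked from most to least severe
_SEVERITY_REGEXES = (
    ("CRITICAL", re.compile(r"\b(panic|fatal|segfault)\b", re.IGNORECASE)),
    ("ERROR", re.compile(r"\b(error|failed|exception)\b", re.IGNORECASE)),
    ("WARN", re.compile(r"\bwarn(ing)?\b", re.IGNORECASE)),
)


def severity_from_tags(
    tags: Iterable[str], pattern_severity: Mapping[str, str]
) -> Optional[str]:
    """Path B: highest severity among the cluster's tags known to the pattern file."""
    hits = [pattern_severity[t] for t in tags if t in pattern_severity]
    if not hits:
        return None
    return max(hits, key=lambda s: _RANK.get(s, -1))


def severity_from_regex(text: str) -> str:
    """Path C: first keyword match wins; INFO when nothing matches."""
    for severity, pattern in _SEVERITY_REGEXES:
        if pattern.search(text):
            return severity
    return "INFO"


def classify_cluster(tags, text, pattern_severity) -> Tuple[str, str]:
    """Return (severity, path_used), falling back from pattern_tags to regex."""
    severity = severity_from_tags(tags, pattern_severity)
    if severity is not None:
        return severity, "pattern_tags"
    return severity_from_regex(text), "regex"
```

Returning the path alongside the severity mirrors the idea behind `ClassifiedTimeline.classifier_used`, though the real field records the primary session path rather than a per-cluster one.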
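
Stage 3's response parsing (eefd65f903, hardened in e8c66972fa) comes down to four rules: reject anything that is not a JSON array, map WARNING to WARN and unknown severities to ERROR, coerce confidence, and tolerate a JSON `null` for cluster ids. The sketch below assumes the `Hypothesis` field layout, the `summary` key and a default cap of 5 hypotheses; none of those come from the commits. `_coerce_float` follows the behaviour the commit describes: non-numeric values such as `'high'` or `None` fall back to the caller's default.

```python
import json
import uuid
from dataclasses import dataclass
from typing import Any, List, Tuple

# Assumed severity vocabulary, matching the names used elsewhere in this PR
_VALID_SEVERITIES = {"DEBUG", "INFO", "WARN", "ERROR", "CRITICAL"}


@dataclass(frozen=True)
class Hypothesis:
    # Hypothetical field layout for illustration
    id: str
    summary: str
    severity: str
    confidence: float
    supporting_cluster_ids: Tuple[str, ...]
    runbook_refs: Tuple[str, ...] = ()  # hardcoded, never sourced from LLM output


def _coerce_float(value: Any, default: float) -> float:
    """float(value), or `default` when the LLM sent something non-numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):  # None -> TypeError, "high" -> ValueError
        return default


def _normalize_severity(raw: Any) -> str:
    severity = str(raw or "").strip().upper()
    if severity == "WARNING":
        return "WARN"
    return severity if severity in _VALID_SEVERITIES else "ERROR"


def parse_response(text: str, max_hypotheses: int = 5) -> List[Hypothesis]:
    """Parse the LLM's JSON array; malformed or non-list output yields []."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    hypotheses: List[Hypothesis] = []
    for item in data:
        if not isinstance(item, dict):
            continue
        hypotheses.append(
            Hypothesis(
                id=str(uuid.uuid4()),
                summary=str(item.get("summary", "")),
                severity=_normalize_severity(item.get("severity")),
                confidence=_coerce_float(item.get("confidence"), 0.5),
                # `or []` turns a JSON null into an empty tuple instead of a TypeError
                supporting_cluster_ids=tuple(item.get("supporting_cluster_ids") or []),
            )
        )
        if len(hypotheses) >= max_hypotheses:
            break
    return hypotheses
```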
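
The corrected Stage 4 rule from 86361f6c79 suppresses a hypothesis only when its best cosine match against resolved incidents reaches `similarity_threshold`, with novelty being `1 - max_sim`. Here is a minimal sketch using a pure-Python cosine, assuming embeddings arrive as plain float lists; `score_hypothesis` and `is_suppressed` are illustrative names, not the FalsePositiveSuppressor API.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Pure-Python cosine similarity; 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_suppressed(max_sim: float, similarity_threshold: float = 0.85) -> bool:
    """Suppress only hypotheses at least `similarity_threshold` similar to a resolved incident."""
    return max_sim >= similarity_threshold


def score_hypothesis(
    vec: Sequence[float],
    corpus: List[Sequence[float]],
    similarity_threshold: float = 0.85,
) -> Tuple[float, bool]:
    """Return (novelty_score, suppressed) for one hypothesis embedding."""
    if not corpus:
        # Fallback path: nothing to compare against, so everything is novel
        return 1.0, False
    max_sim = max(cosine(vec, c) for c in corpus)
    # float()/bool() keep numpy scalar types out of the result if numpy inputs are used
    return float(1.0 - max_sim), bool(is_suppressed(max_sim, similarity_threshold))
```

Under the old rule (suppress when novelty < 0.85, that is, similarity > 0.15), a hypothesis only 50% similar to a past incident would have been hidden. With the rule above it stays active, and only matches at 0.85 or above are suppressed, which is exactly the 0.85/0.84 boundary the commit tests.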
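
The orchestration pattern in 8cbd981ec7 (synchronous stages pushed to worker threads with `asyncio.to_thread`, typed event dicts yielded between them) can be sketched as an async generator. The `stages` dict of plain callables, the event keys other than `type`, and the message wording are assumptions standing in for the real stage classes. The flag check also assumes that only the literal value `true` (any case) enables the pipeline.

```python
import asyncio
import os
from typing import Any, AsyncIterator, Callable, Dict, List

# Assumed parsing of the feature flag
MULTI_AGENT_ENABLED = (
    os.environ.get("TURNSTONE_MULTI_AGENT_DIAGNOSE", "").strip().lower() == "true"
)

Stage = Callable[[Any], Any]


async def run_pipeline(
    entries: List[dict], stages: Dict[str, Stage]
) -> AsyncIterator[Dict[str, Any]]:
    """Run each synchronous stage off the event loop and yield SSE-style event dicts.

    `stages` is a hypothetical mapping of callables standing in for the
    TimelineReconstructor, SeverityClassifier, RootCauseHypothesizer,
    FalsePositiveSuppressor and SummarySynthesizer objects.
    """
    yield {"type": "status", "message": f"diagnosing {len(entries)} entries"}

    timeline = await asyncio.to_thread(stages["timeline"], entries)
    yield {"type": "pipeline_stage", "stage": 1, "message": f"{len(timeline)} clusters"}

    classified = await asyncio.to_thread(stages["classify"], timeline)
    yield {"type": "pipeline_stage", "stage": 2, "message": "severity classified"}

    hypotheses = await asyncio.to_thread(stages["hypothesize"], classified)
    yield {"type": "pipeline_stage", "stage": 3, "message": f"{len(hypotheses)} hypotheses"}

    ranked = await asyncio.to_thread(stages["suppress"], hypotheses)
    suppressed = sum(1 for r in ranked if r.get("suppressed"))
    yield {
        "type": "pipeline_stage",
        "stage": 4,
        "message": f"{len(ranked) - suppressed} active, {suppressed} suppressed",
    }
    yield {"type": "hypotheses", "items": ranked}

    summary = await asyncio.to_thread(stages["synthesize"], ranked)
    yield {"type": "reasoning", "text": summary}
    # The pipeline emits its own terminal event
    yield {"type": "done"}
```

Keeping every stage synchronous and wrapping it at the call site means the stage classes stay easy to unit-test without an event loop, while the SSE stream still interleaves progress events between the slow steps.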
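
The incremental-glean check from 2fde3a1814 can be sketched with the standard library alone. The helper names are borrowed from the commit, but their signatures, the table's column layout and the `glean_files` driver are assumptions. As in the commit message, the sha256 decides whether a file is skipped; mtime and size are only stored here.

```python
import hashlib
import os
import sqlite3
from typing import Iterable, List, Tuple

# Assumed column layout; the commit names sha256, mtime and size
SCHEMA = """
CREATE TABLE IF NOT EXISTS glean_fingerprints (
    path   TEXT PRIMARY KEY,
    sha256 TEXT NOT NULL,
    mtime  REAL NOT NULL,
    size   INTEGER NOT NULL
)
"""

Fingerprint = Tuple[str, float, int]


def _fingerprint(path: str) -> Fingerprint:
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Hash in 64 KiB chunks so large logs are not read into memory at once
        for chunk in iter(lambda: fh.read(64 * 1024), b""):
            digest.update(chunk)
    st = os.stat(path)
    return digest.hexdigest(), st.st_mtime, st.st_size


def _fp_unchanged(conn: sqlite3.Connection, path: str, fp: Fingerprint) -> bool:
    row = conn.execute(
        "SELECT sha256 FROM glean_fingerprints WHERE path = ?", (path,)
    ).fetchone()
    return row is not None and row[0] == fp[0]


def _save_fingerprint(conn: sqlite3.Connection, path: str, fp: Fingerprint) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO glean_fingerprints (path, sha256, mtime, size) "
        "VALUES (?, ?, ?, ?)",
        (path, *fp),
    )


def glean_files(
    conn: sqlite3.Connection, paths: Iterable[str], force: bool = False
) -> List[str]:
    """Return the paths actually (re)processed; unchanged files are skipped unless force=True."""
    processed = []
    for path in paths:
        fp = _fingerprint(path)
        if not force and _fp_unchanged(conn, path, fp):
            continue
        # ... parse the file and write its entries here ...
        _save_fingerprint(conn, path, fp)
        processed.append(path)
    conn.commit()
    return processed
```

The fingerprint is saved only after a file has been processed, so a glean that fails partway leaves the old hash in place and the file is retried next run.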
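
The transport contract described in e746d55730 and 81a9b0f49d can be sketched without a live server: stream stdout first, check the exit status only once the iterator is drained, and isolate failures per glean item. `exec_stream` here assumes a client shaped like paramiko's `SSHClient`, whose `exec_command()` returns `(stdin, stdout, stderr)` and whose `stdout.channel.recv_exit_status()` returns the exit code. The journalctl flags, `build_journald_command` and `glean_host` are hypothetical, and connection errors (which abort the whole host) are not modelled.

```python
import shlex
from typing import Iterator, List, Optional


class SSHCommandError(RuntimeError):
    """Raised when a remote command exits non-zero."""


def build_journald_command(args: Optional[List[str]] = None) -> str:
    # Hypothetical builder: shlex.join quotes "2 hours ago" as one remote argument
    return shlex.join(["journalctl", "--no-pager", "-o", "short-iso", *(args or [])])


def exec_stream(client, command: str) -> Iterator[str]:
    """Yield stdout lines, then check the exit status once the stream is drained."""
    _stdin, stdout, _stderr = client.exec_command(command)
    for line in stdout:
        yield line.rstrip("\n")
    status = stdout.channel.recv_exit_status()
    # isinstance guard: a mocked status that is not an int never raises
    if isinstance(status, int) and status != 0:
        raise SSHCommandError(f"{command!r} exited with status {status}")


def glean_host(client, commands: List[str]) -> dict:
    """Per-item isolation: one failing command does not abort the others on this host."""
    results = {}
    for command in commands:
        try:
            results[command] = list(exec_stream(client, command))
        except SSHCommandError:
            results[command] = None  # a real pipeline would log and move on
    return results
```

Because `exec_stream` is a generator, a caller that stops iterating early never reaches the status check, which is why the commit notes that callers must drain the iterator to trigger it.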