Commit graph

110 commits

Author SHA1 Message Date
db359d35b2 fix(search): qualify ambiguous column names with table alias in FTS JOIN
Both log_fts and log_entries have timestamp_iso, severity, source_id, and
matched_patterns columns. After the JOIN, unqualified references to any of
these caused SQLite to raise 'ambiguous column name', silently falling back
to the non-FTS scan path on every time-filtered or severity-filtered query.

Prefix all filter conditions that touch FTS-mirror columns with f. to
resolve the ambiguity. The e. prefix on tenant_id was already correct since
tenant_id is not present in the FTS virtual table.
2026-06-17 11:27:38 -07:00
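
The ambiguity this commit describes can be reproduced in a minimal sketch. It uses two plain in-memory tables as stand-ins for log_entries and the log_fts virtual table; the reduced schema, the sample row, and the rowid_ref join column are illustrative assumptions, not the project's actual query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-ins for log_entries and its FTS mirror; both carry a severity column.
conn.executescript("""
    CREATE TABLE log_entries (id INTEGER PRIMARY KEY, tenant_id TEXT, severity TEXT);
    CREATE TABLE log_fts (rowid_ref INTEGER, severity TEXT, text TEXT);
    INSERT INTO log_entries VALUES (1, '', 'ERROR');
    INSERT INTO log_fts VALUES (1, 'ERROR', 'disk full');
""")

JOIN = "SELECT f.text FROM log_fts f JOIN log_entries e ON e.id = f.rowid_ref "

def run(where: str):
    return conn.execute(JOIN + where).fetchall()

# Unqualified column: SQLite cannot tell which table's severity is meant.
try:
    run("WHERE severity = 'ERROR'")
    ambiguous_error = None
except sqlite3.OperationalError as exc:
    ambiguous_error = str(exc)

# Qualified columns: f. for mirrored columns, e. for tenant_id.
rows = run("WHERE f.severity = 'ERROR' AND e.tenant_id = ''")
```

If the caller catches the OperationalError and falls back to another search path, the failure stays invisible, which matches the silent fallback described above.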
04013757e7 chore: bump version to v0.7.0
Beta milestone complete: all open beta tickets closed.
2026-06-17 09:41:10 -07:00
5da8db2bcd fix(diagnose): pass full timeline clusters and hypothesis descriptions to synthesizer LLM
Stage 5 (SummarySynthesizer) was only sending aggregate timeline stats to the
LLM (cluster count, burst count, gap count) — the actual sequenced cluster data
that Stage 1 reconstructed was never included. The LLM had no per-cluster
timestamps, severity, burst flags, silence gaps, or representative text to
write the TIMELINE section from.

Added _build_timeline_block() to emit a numbered per-cluster summary matching
the format Stage 3 uses for the hypothesizer, and included it in the user
message alongside the hypothesis block.

Also fixed _build_hypothesis_block() to include the 2-4 sentence description
each hypothesis carries — previously only the title and novelty score reached
the LLM.

11 new tests cover _build_timeline_block() directly (burst label, gap threshold,
pattern tags, text truncation at 200 chars, null start_iso, multi-cluster
numbering). 529 tests passing.
2026-06-16 21:46:01 -07:00
4c1940d12e fix: strip reasoning-model thinking tags; surface untracked node names
- app/services/diagnose/_llm_client.py: strip <think>…</think> blocks
  (case-insensitive, multiline) from LLM response content before it
  reaches the UI or any JSON parser — affects DeepSeek-R1, Qwen QwQ,
  and any other model that emits chain-of-thought in content
- app/rest.py: suggest_sources now also returns untracked_names — query
  tokens that look like hostnames/service names but don't appear in any
  monitored source, so the UI can prompt the user to add them
- web/src/components/ChatDiagnose.vue: show amber "Not monitoring: X"
  banner with "Add as a log source →" link when untracked_names present
- tests/test_llm_client.py: 13 tests covering think-strip edge cases
  (single/multi-line, multiple blocks, case-insensitive, only-thinking)
  plus existing extract_content and JSON-fence helpers
2026-06-16 09:42:44 -07:00
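
A minimal sketch of the think-tag stripping described above. The function name strip_think is hypothetical, and the flags are an assumption: re.DOTALL is what lets the pattern span multiple lines.

```python
import re

# Non-greedy match so several separate blocks are removed individually.
_THINK_RE = re.compile(r"<think>.*?</think>", re.IGNORECASE | re.DOTALL)

def strip_think(content: str) -> str:
    """Remove <think>...</think> blocks and trim surrounding whitespace."""
    return _THINK_RE.sub("", content).strip()

cleaned = strip_think('<THINK>step 1\nstep 2</think>\n{"ok": true}')
```

Stripping before any JSON parsing matters because the reasoning text can itself contain brackets or quotes that confuse a parser.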
6039ab2464 feat: incident ticket export — Notion and Jira integration (#12)
- app/services/ticket_export.py: plugin-dispatch architecture; Notion
  exporter (Notion API v1, blocks-based, 50 entry cap, 2000-char
  truncation per block); Jira exporter (REST API v3, Basic Auth, ADF
  description, configurable issue type defaulting to Bug)
- app/rest.py: POST /api/incidents/{id}/export endpoint; Notion/Jira
  credential fields added to SettingsBody and PATCH /api/settings handler
- web/src/views/IncidentsView.vue: "Export ticket ▾" dropdown in
  incident detail drawer — click-outside close, inline URL link on success
- web/src/views/SettingsView.vue: Ticket Trackers section with Notion
  token + database ID, Jira URL/email/token/project/issue-type; show/hide
  for secret fields
- tests/test_ticket_export.py: 17 tests covering dispatch, Notion
  success/error/config/payload/truncation paths, Jira success/error/
  auth/project/summary/default-issue-type
2026-06-14 15:46:11 -07:00
b8f766fb74 feat: SSH target manager — GUI editor for remote host configuration (#24)
- app/services/ssh_targets.py: full CRUD service with lazy paramiko
  import, key-path validation, permission warning, and test_connection
- app/db/schema.py: ssh_targets table (id, label, host, port, user,
  key_path, last_tested, last_ok, last_error, timestamps)
- app/rest.py: GET/POST /api/ssh-targets, PATCH/DELETE /{id},
  POST /{id}/test — key contents never returned in any response
- web/src/views/SettingsView.vue: Remote Hosts section with add/edit
  form, inline connection status badges, test-connection flow, delete
  with confirmation; new Set() pattern for reactive sshTesting state
- tests/test_ssh_targets.py: 22 tests — schema, CRUD, validation,
  key-warning, serialization, paramiko-absent path
2026-06-14 15:27:12 -07:00
7a2ab0bb46 feat(orchard): auto-enrollment API for branch node provisioning (#27)
Implements the Orchard branch grafting system for harvest.circuitforge.tech:

- POST /api/orchard/graft: provisions data dir, starts a new
  turnstone-submissions-<slug> Docker container on the next free port
  (ORCHARD_PORT_BASE=8538+), injects a handle_path block into the
  Caddyfile dynamic-branches marker section, restarts caddy-proxy,
  returns {submit_endpoint, api_key}
- GET /api/orchard/branches: list active/inactive branches (admin-only)
- DELETE /api/orchard/branches/<slug>: deactivate branch + stop container
- POST /api/orchard/branches/<slug>/anonymize: HMAC-based IP/username
  pseudonymization worker over a branch DB
- POST /api/glean/batch: optional TURNSTONE_BRANCH_KEY auth guard
- anonymized column added to log_entries schema (migration-safe)
- Updated Caddyfile with /huginn/* route (port 8536), /node2/* (8537),
  and dynamic-branch marker section
- All endpoints admin-gated via TURNSTONE_ORCHARD_ADMIN_KEY

Closes: #27
2026-06-14 14:30:18 -07:00
600e5a9eac feat(sources): context-aware filesystem log scanner (#23)
Add scan_log_directories() to discover.py that recursively walks
/var/log and /opt, filters to readable log files, and scores each
candidate by recency (mtime, 0.7 weight), file size (0.3), and
keyword match against an optional problem-context query (shifts
weights to 0.4/0.2/0.4 when a query is provided).

- GET /api/setup/scan?query=...&max_results=N — new API endpoint
- SourcesView: "Scan" button opens a panel with ranked candidates,
  checkboxes, and "Add selected" to write to sources.yaml
- 13 new unit tests, 466 passing total

Closes: #23
2026-06-14 14:01:45 -07:00
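
The weighting described above can be sketched as follows. The function name and the assumption that each component is already normalised to [0, 1] are illustrative; the commit does not say how recency, size, or keyword match are scaled.

```python
def score_candidate(recency: float, size: float,
                    keyword: float = 0.0, has_query: bool = False) -> float:
    """Combine component scores (each assumed to lie in [0, 1])."""
    if has_query:
        # With a problem-context query the weights shift to 0.4 / 0.2 / 0.4.
        return 0.4 * recency + 0.2 * size + 0.4 * keyword
    # Without a query only recency and size contribute.
    return 0.7 * recency + 0.3 * size

no_query = score_candidate(recency=1.0, size=0.5)
with_query = score_candidate(recency=1.0, size=0.5, keyword=1.0, has_query=True)
```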
7c76217149 chore: sanitize internal hostnames and IP references
- Rename patterns/sources-example-node.yaml → patterns/sources-example.yaml
  and update header/comments to be host-agnostic
- Replace internal node names in gen_corpus.py _HOSTS with generic names
- Replace example-node hostname in syslog test fixtures with testhost
- Replace example-node example in mcp_server.py doc with myserver
- Replace private LAN IP (<YOUR_HOST_IP>) in docker-standalone.sh with
  <HEIMDALL_LAN_IP> placeholder
- Replace private IPs in sources-cluster.yaml comments with <YOUR_HOST_IP>
- Remove instance-specific hostname from llm.py fallback comment
- Replace Caddy example domain in podman-standalone.sh with placeholder
2026-06-13 10:02:46 -07:00
502ff54fd0 feat(ui): security alert dedup, clickable criticals, loading shimmer
Security Alerts:
- Client-side duplicate collapsing via anomaly_label + text fingerprint
- ×N count badge chip on collapsed rows; toggle to expand
- Skeleton shimmer rows replace "Loading..." text

Dashboard:
- Clickable Recent Criticals — inline LLM explanation via SSE stream
- ±5 min time window scoped to source_id for useful context
- Explanation cache keyed by entry_id (no re-fetch on re-expand)
- Default diagnose query injected on Diagnose button navigation to
  prevent local models hallucinating from bare log data
- Stat card and source-health skeleton shimmer loading states

Backend:
- anomaly.py: 4-attempt retry on "database is locked" with 10s backoff
- search.py: migrate build_fts_index to get_conn() (WAL race fix);
  add timeline_events to stats_summary for clickable criticals feature
- theme.css: @keyframes shimmer + .loading-shimmer utility;
  prefers-reduced-motion degrades gracefully to static muted block
2026-06-13 09:32:26 -07:00
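
A rough sketch of the "database is locked" retry mentioned for anomaly.py. The helper name, the fixed (non-growing) backoff, and the injectable sleep parameter are assumptions made for illustration.

```python
import sqlite3
import time

def with_lock_retry(fn, attempts: int = 4, backoff_s: float = 10.0, sleep=time.sleep):
    """Call fn(); retry only when SQLite reports a locked database."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except sqlite3.OperationalError as exc:
            # Re-raise unrelated errors and the final failed attempt.
            if "database is locked" not in str(exc) or attempt == attempts:
                raise
            sleep(backoff_s)

# Demo: fail twice with a lock error, then succeed. Sleep is stubbed out.
calls = {"n": 0}
waits = []

def flaky_write():
    calls["n"] += 1
    if calls["n"] < 3:
        raise sqlite3.OperationalError("database is locked")
    return "written"

result = with_lock_retry(flaky_write, sleep=waits.append)
```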
f3d807d991 feat(diagnose): conversational chat mode + NL source discovery
- New ChatDiagnose.vue: multi-turn chat UI in the Diagnose tab
  - Textarea input (auto-grows) for long free-form problem descriptions
  - Source suggestion pre-flight: debounced POST /api/sources/suggest
    identifies relevant log sources from the query text and shows them
    as interactive chips (deselect to exclude before searching)
  - Conversation history preserved across turns with LLM reasoning,
    collapsible log entries, and "Save as incident" per turn
  - Reuses existing /api/diagnose/stream — no new pipeline
- DiagnoseView.vue: Chat is now default tab; viewport-height layout
- POST /api/sources/suggest: token-overlap source ranking, no LLM
- Fix: add the missing 'import re' whose absence caused a 500 on the suggest route
2026-06-11 22:04:53 -07:00
b6b69e2150 feat(incidents): auto-incident detection + example-node Podman setup
Auto-incident detector:
- New app/tasks/incident_detector.py: post-glean error cluster detector
  - Sliding window algorithm: per source, N errors within window_s seconds
  - Deduplication via issue_type='auto:{source_id}' + interval overlap check
  - Respects TURNSTONE_AUTO_INCIDENT_THRESHOLD (default 5) and
    TURNSTONE_AUTO_INCIDENT_WINDOW (default 600s) env vars
  - 20 tests all passing
- Wired into glean_scheduler.run_once() and scheduler_loop()
- TURNSTONE_AUTO_INCIDENT env var to disable (default enabled)

Podman standalone improvements:
- REPO_DIR auto-detected from script location (no longer hardcoded to /opt/turnstone)
- DATA_DIR/PATTERNS_DIR/HF_CACHE_DIR configurable via env vars
- Bootstrap step copies host-specific sources-<hostname>.yaml on first run
- Auto-incident env vars passed through

example-node sources:
- patterns/sources-example-node.yaml: Sonarr, Radarr, Bazarr, Prowlarr,
  Tautulli, autoscan, organizr, nextcloud, journal export
2026-06-11 18:37:53 -07:00
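
A minimal sketch of the sliding-window detection described above, for one source's sorted error timestamps. The function name is hypothetical, and merging overlapping spans in memory is a simplification of the stored-incident interval overlap check the commit mentions. The defaults mirror the documented threshold (5) and window (600s).

```python
from collections import deque

def find_error_bursts(timestamps, threshold: int = 5, window_s: int = 600):
    """Return (start, end) spans with >= threshold errors inside window_s seconds.

    timestamps: sorted epoch seconds of error entries for a single source.
    """
    window = deque()
    bursts = []
    for ts in timestamps:
        window.append(ts)
        # Drop entries that have fallen out of the window.
        while ts - window[0] > window_s:
            window.popleft()
        if len(window) >= threshold:
            start, end = window[0], ts
            if bursts and start <= bursts[-1][1]:
                # Overlaps the previous span: extend it instead of duplicating.
                bursts[-1] = (bursts[-1][0], end)
            else:
                bursts.append((start, end))
    return bursts

bursts = find_error_bursts([0, 1, 2, 3, 4, 5000, 5001, 5002, 5003, 5004])
```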
74c9de9ccf fix(corpus): glean_dir now recurses subdirs; fix docker SOURCE prefix
- Changed glob → rglob in glean_dir so corpus directories with format
  subfolders (journald/, docker/, etc.) are fully ingested
- Fixed gen_corpus.py docker SOURCE to emit "docker:<service>" prefix
  so the pipeline correctly detects format as 'docker' not 'plaintext'
- 17/17 gen_corpus tests passing

Closes: #46
2026-06-11 16:30:28 -07:00
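
The glob to rglob change can be illustrated with pathlib directly. The directory layout and the *.log pattern below are made up for the demonstration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "journald").mkdir()
    (root / "top.log").write_text("a\n")
    (root / "journald" / "nested.log").write_text("b\n")

    # glob() only matches in the directory itself.
    flat = sorted(p.name for p in root.glob("*.log"))
    # rglob() also descends into format subfolders.
    deep = sorted(p.name for p in root.rglob("*.log"))
```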
5816ed69ae feat(corpus): synthetic log corpus generator for demos and testing
Adds scripts/gen_corpus.py that produces realistic-but-artificial log
files across all four supported formats (journald JSON, docker envelope,
qBittorrent hotio, EXT_DEVICE plaintext). Output feeds directly into
glean_corpus.py for demo environments and parser regression tests with
no production data required.

- Seed-based RNG with independent per-source sub-streams (same seed =
  same sequence for each file regardless of source count changes)
- Controllable time range, event density, and error injection rate
- Severity distribution mirrors real infrastructure (70% INFO, ~6% ERROR,
  ~2% CRITICAL) with adjustable boost via --error-rate
- 17 tests covering output structure, reproducibility, format correctness,
  parser round-trip, and CLI acceptance criteria

Also fixes a latent bug in app/glean/plaintext.py: ISO 8601 timestamps
were silently failing to parse because the T separator was normalised to
space in the input string but the strptime format string still contained T.
Fix: apply the same normalisation to the format before calling strptime.

Closes: #46
2026-06-11 10:57:20 -07:00
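
The strptime bug described above can be sketched like this. The format string and function names are assumptions; only the mismatch between the normalised input and the un-normalised format comes from the commit message.

```python
from datetime import datetime

ISO_FMT = "%Y-%m-%dT%H:%M:%S"  # assumed format string containing the T separator

def parse_iso_buggy(raw: str) -> datetime:
    # Input has T replaced by a space, but the format still expects T.
    return datetime.strptime(raw.replace("T", " "), ISO_FMT)

def parse_iso_fixed(raw: str) -> datetime:
    # Apply the same normalisation to the format before parsing.
    return datetime.strptime(raw.replace("T", " "), ISO_FMT.replace("T", " "))

parsed = parse_iso_fixed("2026-06-11T10:57:20")
```

If the caller swallows the ValueError from the buggy variant, the timestamp is simply lost, which is consistent with the silent failure the commit reports.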
4dcc1a441a feat(incidents): incident timeline visualizer + fix entry lookup using wrong DB path
Adds IncidentTimeline.vue — a pure SVG time-axis component rendered inside the
incident detail drawer when entries are present:
- Horizontal strip scaled to incident window (preserveAspectRatio=none)
- Event ticks colored by severity, height proportional to severity level
- 50-bin density shading shows burst periods as blue bands
- Gap markers (dashed lines) for silence > 10% of window or > 60s
- Hover tooltip showing nearest entry's severity, time, and truncated text
- Click-to-scroll: clicking a tick highlights and scrolls to its entry in the list below
- Legend showing only severity levels present in the incident

Also fixes a pre-existing bug: get_incident_endpoint and both build_bundle callers
were passing INCIDENTS_DB_PATH to get_incident_entries/build_bundle, causing all
incident entry lookups to silently search the empty incidents DB instead of the
main log DB. This made all incident detail views show "No log entries found".

Closes: #57
2026-06-10 16:02:24 -07:00
313b25e0d0 feat(alerts): security alerts tab — full scorer integration
- Fix loadScorerStatus: was spreading data.state + data.config (both
  undefined); API returns flat object; now uses data directly
- Fix v-for to use filteredDetections (was using raw detections array,
  breaking the Unacknowledged tab filter)
- Fix double-prefix URL bug: BASE already contains /turnstone, so
  fetches to ${BASE}/turnstone/api/... doubled the prefix → returned
  SPA HTML → silent JSON parse failure. Fixed all fetch URLs to use
  ${BASE}/api/... in SecurityAlertsView and DashboardView
- Add CybersecStatus interface to replace Record<string, unknown>
- Add scorer field to Detection interface; show 'cybersec' badge in
  label cell when scorer !== 'anomaly'
- Add cybersecStatus.running to cybersec badge (pulse animation)
- Add ANOMALY / CYBERSEC stats rows side-by-side
- Add 'Run cybersec' button with cybersecTriggerLoading state and
  runCybersec() function posting to /api/cybersec/run
- Rename 'Run scorer' → 'Run anomaly' for clarity

Closes: #11
2026-06-10 14:32:43 -07:00
61816c26bd fix(cybersec): clean up debug traceback logging
Replaced manual traceback import with exc_info=True, which is the
idiomatic logging pattern and produces the same output.
2026-06-10 13:20:56 -07:00
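
A short sketch of the exc_info=True pattern this commit refers to. The logger name, message, and failing function are hypothetical; the handler writes to an in-memory stream only so the output can be inspected.

```python
import io
import logging

stream = io.StringIO()
logger = logging.getLogger("cybersec_demo")  # hypothetical logger name
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False

def risky():
    raise RuntimeError("model load failed")

try:
    risky()
except RuntimeError:
    # exc_info=True attaches the active exception's traceback to the record,
    # so no manual traceback import or format_exc() call is needed.
    logger.error("cybersec scoring pass failed", exc_info=True)

output = stream.getvalue()
```

Inside an except block, logger.exception(msg) is an equivalent shorthand that logs at ERROR level with the traceback attached.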
971a859c0d fix(watcher): remove per-flush FTS sync to eliminate SQLite write lock contention
Each WatchSource was calling build_fts_index() every 3 flushes (~30s).
With 70+ active sources, this produced a near-continuous stream of FTS
INSERT operations, each holding the SQLite write lock for several seconds
while scanning the 5.4GB log_entries table. Every other writer (other
watcher flushes, cybersec scorer) timed out with 'database is locked'.

FTS index is now only updated by the glean scheduler (every 900s) and
the manual `build-fts` command — both already call build_fts_index()
through glean_dir(). Real-time freshness of watcher-ingested entries
in FTS was ~30s before; it's now up to ~15min, which is acceptable.

This is the root cause of the persistent 'database is locked' errors
blocking the cybersec scorer (issue #9).

Closes: #9
2026-06-10 12:42:24 -07:00
cffe6bcd31 feat: cybersec zero-shot scoring pipeline (#9)
Second-pass cybersec classifier using DeBERTa-v3-base-mnli (already
cached — no download required). Runs after each anomaly scoring pass on
entries flagged by the anomaly scorer or with pattern matches.

Architecture:
- app/services/cybersec.py: zero-shot-classification pipeline with 5
  cybersec candidate labels (auth failure, privilege escalation, network
  intrusion, malware, data exfiltration). Writes ml_score/ml_label/
  ml_scored_at to log_entries; inserts high-confidence hits into
  detections with scorer='cybersec'.
- app/tasks/cybersec_scorer.py: async background task (same shape as
  anomaly_scorer.py).
- REST: GET/POST /turnstone/api/cybersec/status|run|detections.
  GET /turnstone/api/anomaly/detections now accepts scorer= filter.

Schema: ml_score, ml_label, ml_scored_at added to log_entries; scorer
column added to detections (idempotent migrations + DDL for both SQLite
and Postgres).

UI: Security Alerts view gains Source dropdown (All / Anomaly / Cybersec)
and cybersec scorer status badge. Label dropdown split into optgroups.

Deployment: TURNSTONE_CYBERSEC_MODEL/DEVICE/THRESHOLD vars added to
.env.example, docker-compose.yml, docker-standalone.sh.

Tests: 10 new tests — no model, no eligible entries, scoring, detection
creation, normal label suppression, threshold filtering, pattern-tag
filtering, idempotency, list filtering, scorer column filter.
416/416 passing.

Closes: #9
2026-06-10 01:03:25 -07:00
0693e1fd54 feat: anomaly scoring pipeline (#10)
- Add app/services/anomaly.py: batch scorer using HF text-classification
  pipeline; rewrites anomaly_score/anomaly_label/anomaly_scored_at on
  log_entries; inserts high-confidence hits into detections table
- Add app/tasks/anomaly_scorer.py: background task (same shape as
  glean_scheduler); triggered after each glean cycle when
  TURNSTONE_ANOMALY_MODEL is set
- DB schema: add anomaly_score/anomaly_label/anomaly_scored_at columns to
  log_entries (idempotent ALTER TABLE migration); add detections table
- Wire scorer into scheduler_loop and glean_scheduler.run_once; no-op when
  model env var is empty (safe to leave unconfigured)
- REST endpoints: GET/POST /api/anomaly/status, /api/anomaly/run,
  GET /api/anomaly/detections, POST /api/anomaly/detections/{id}/acknowledge
- Reuses Hybrid-BERT label map from diagnose/classifier.py; works with any
  HF text-classification model
- 12 new tests; 406/406 passing

Closes: #10
2026-06-09 11:15:13 -07:00
0311d72e53 feat: dual-backend SQLite/Postgres + multi-tenant source namespacing
- Add app/db/ abstraction layer: Backend enum, DbConn wrapper,
  dialect helper (q() for ? vs %s paramstyle), get_conn(), tenant_id()
- Auto-detect backend from DATABASE_URL; SQLite remains default when
  unset — no config change for local deployments
- Add tenant_id column to all three logical DBs (main, context, incidents);
  idempotent ALTER TABLE migration runs before schema scripts on existing DBs
- All INSERTs inject tenant_id; SELECTs use (tenant_id = ? OR tenant_id = '')
  for backward compat with pre-namespacing rows
- Add docker-compose.yml with named volume turnstone_pgdata (survives rebuilds)
  and optional external Postgres support via DATABASE_URL override
- Add scripts/migrate_sqlite_to_postgres.py — one-shot idempotent migration
  for existing SQLite data; ON CONFLICT DO NOTHING for safe re-runs
- Fix SSH glean path in pipeline.py to use ensure_schema + get_conn
  (was still using raw sqlite3.connect + old _SCHEMA without tenant_id)
- Fix FTS5 JOIN ambiguity: qualify repeat_count as f.repeat_count in search
- Update all tests to use ensure_*_schema fixtures; add row_factory where needed
- 394/394 tests passing

Closes: #42
Closes: #50
2026-06-08 08:37:54 -07:00
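
A rough sketch of the backend detection and q() paramstyle helper described above. The signatures and URL prefixes are assumptions; the naive string replace would also rewrite a ? inside a SQL string literal, which a real helper would need to avoid.

```python
from enum import Enum
from typing import Optional

class Backend(Enum):
    SQLITE = "sqlite"
    POSTGRES = "postgres"

def detect_backend(database_url: Optional[str]) -> Backend:
    """SQLite stays the default when DATABASE_URL is unset."""
    if database_url and database_url.startswith(("postgres://", "postgresql://")):
        return Backend.POSTGRES
    return Backend.SQLITE

def q(sql: str, backend: Backend) -> str:
    """Translate qmark placeholders (?) to the %s style used by Postgres drivers."""
    return sql.replace("?", "%s") if backend is Backend.POSTGRES else sql

SELECT = "SELECT id FROM log_entries WHERE (tenant_id = ? OR tenant_id = '')"
pg_sql = q(SELECT, detect_backend("postgresql://user@db/turnstone"))
lite_sql = q(SELECT, detect_backend(None))
```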
1de156ebde fix: reset browser UA button chrome for dark mode
HTML buttons get a ~#efefef background and 2px outset border from the
browser UA stylesheet. In light mode these blend in; in dark mode they
render as stark white boxes. Adding a global button reset in theme.css
clears the UA defaults — explicit bg-* utility classes still win.

Affects: theme toggle, hamburger nav button, dashboard diagnose buttons,
and all other icon/text buttons that had no explicit bg class.

Bumps version to 0.6.2.
2026-06-05 09:55:08 -07:00
876cfb9a63 fix: group journal sources by prefix:host stem in source health
source_ids with 3+ colon segments (e.g. muninn-journal:Muninn:ssh.service)
are now aggregated by their prefix:host key at the SQL level in both
list_sources() and stats_summary(). This collapses ~19K transient systemd
unit rows (crash-loop scope entries from Muninn) into ~24 grouped rows.

- list_sources: SQL CASE/INSTR group-by stem + unit_count field
- stats_summary: same stem grouping for dashboard source health table
- delete endpoint: LIKE-based cascade delete covers grouped stems
- SourcesView: unit_count badge (e.g. "2686 units") on grouped rows;
  delete confirmation names the unit count when deleting a group
- Bump version to v0.6.1
2026-06-02 04:35:26 -07:00
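
The commit does the stem grouping in SQL (CASE/INSTR). The same rule can be sketched in Python for clarity; the function name and sample IDs other than the muninn-journal example are illustrative.

```python
from collections import Counter

def source_stem(source_id: str) -> str:
    """Collapse IDs with 3+ colon segments to their prefix:host stem."""
    parts = source_id.split(":")
    return ":".join(parts[:2]) if len(parts) >= 3 else source_id

source_ids = [
    "muninn-journal:Muninn:ssh.service",
    "muninn-journal:Muninn:cron.service",
    "docker:sonarr",
]
# unit_count per grouped row, as surfaced by the badge in SourcesView.
unit_counts = Counter(source_stem(s) for s in source_ids)
```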
9cd7450591 chore: bump version to 0.6.0
Release summary:
- #60 split incidents tables to turnstone-incidents.db (eliminates FTS5 write lock starvation)
- #41 Hybrid-BERT label mapping shim (7-class vocabulary support in classifier)
- #15 hybrid BM25 + vector re-ranking for diagnose search (semantic=True, alpha=0.6/beta=0.4)
- #32 domain-view mapping: 42 patterns annotated across 10 domains, by_domain in diagnose summary
2026-06-01 20:52:35 -07:00
ce2a2b55a6 Merge feat/32-domain-view: domain-view mapping for patterns and diagnose output (#32)
2026-06-01 20:01:19 -07:00
eac9a4ba28 Merge feat/15-hybrid-rag: hybrid BM25 + vector re-ranking for diagnose search (#15)
2026-06-01 20:00:02 -07:00
cfddff6a2a Merge feat/41-hybrid-bert-shim: Hybrid-BERT label mapping shim (#41)
2026-06-01 19:59:34 -07:00
b1f3d68724 feat: domain-view mapping for patterns and diagnose output (#32)
Adds a domain: field to the pattern taxonomy and surfaces per-domain
hit counts in diagnose summaries for faster triage.

Changes:
- LogPattern gains domain: str = "" (backward-compatible default)
- load_patterns() reads domain from YAML via p.get("domain", "")
- All 42 patterns in default.yaml annotated across 10 domains:
    service_health | networking | auth | storage | memory |
    kernel | power | web_proxy | media | gpu
- _pattern_domain dict built at startup from compiled patterns
- _domain_counts() helper: maps matched_patterns tags to domains,
  counts hits per domain across a result set
- diagnose POST: summary includes by_domain: {domain: count}
- diagnose stream: summary SSE event includes by_domain when
  pattern_domain is provided (passed from rest.py at startup)
- /api/search gains ?domain= filter: post-filters results to entries
  whose matched_patterns include at least one tag in the given domain

Test fixtures: patch _pattern_domain={} and CONTEXT_DB_PATH in
test_blocklist_endpoints.py and test_glean_tautulli.py (worktree
has no data/ dir; same fix as feat/60-incidents-db).

372 tests passing.

Closes: #32
2026-06-01 19:57:16 -07:00
1abdcfb1f3 feat: hybrid BM25 + vector re-ranking for diagnose search (#15)
Adds late-fusion hybrid search to Turnstone's log retrieval layer:

  hybrid_score = 0.6 * bm25_normalized + 0.4 * cosine_similarity

Implementation:
- _bm25_search() extracts the existing FTS5 BM25 path as a named helper
- _hybrid_search() fetches an oversized BM25 candidate pool (5x limit,
  min 100), embeds the query and each candidate text in-process via the
  existing embeddings service, normalizes BM25 rank to [0,1], combines
  with cosine similarity, and re-ranks
- search() gets semantic=False param that dispatches to _hybrid_search()
  when True; pure BM25 remains the default for all existing call sites
- diagnose_stream() enables semantic=True so symptom-based queries
  ("database connection failed") surface semantically equivalent entries
  ("ECONNREFUSED", "backend gone away", "max retries exceeded")
- /api/search REST endpoint exposes ?semantic=true query param

Graceful degradation: falls back silently to pure BM25 when the embedding
backend is unavailable (EMBEDDING_AVAILABLE=False) or when embed_batch
raises an exception. No new infra — in-process numpy cosine, no vector DB.

11 new tests: BM25 helper, hybrid re-ranking, fallback paths, dispatcher.
372 + 11 = 383 tests passing.

Closes: #15
2026-06-01 18:13:09 -07:00
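
The late-fusion formula above can be sketched in pure Python. This is not the project's _hybrid_search(): the function names, the min-max normalisation, and the convention that a higher BM25 score is better are all assumptions (SQLite's FTS5 bm25() returns lower values for better matches, so real scores would need flipping first).

```python
import math

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_rerank(candidates, query_vec, alpha: float = 0.6, beta: float = 0.4):
    """candidates: list of (entry_id, bm25_score, vector). Returns [(id, score)]."""
    if not candidates:
        return []
    scores = [c[1] for c in candidates]
    lo, span = min(scores), max(scores) - min(scores)
    ranked = []
    for entry_id, bm25, vec in candidates:
        # Min-max normalise BM25 to [0, 1]; a flat pool gets 1.0 everywhere.
        bm25_norm = (bm25 - lo) / span if span else 1.0
        ranked.append((alpha * bm25_norm + beta * cosine(query_vec, vec), entry_id))
    ranked.sort(reverse=True)
    return [(entry_id, score) for score, entry_id in ranked]

pool = [("a", 10.0, [1.0, 0.0]), ("b", 0.0, [0.0, 1.0]), ("c", 5.0, [0.0, 1.0])]
ranked = hybrid_rerank(pool, query_vec=[0.0, 1.0])
```

In this toy pool, entry c has only a middling BM25 score but matches the query vector exactly, so it overtakes the top lexical hit a.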
503a36d76c feat(classifier): add Hybrid-BERT label mapping shim (#41)
Adds _HYBRID_BERT_LABEL_MAP to translate the 7-class output vocabulary of
krishnas4415/log-anomaly-detection-models (Hybrid-BERT, MIT) to Turnstone
SeverityLabel. _map_label now checks the Hybrid-BERT map before the standard
map so either model family works via TURNSTONE_CLASSIFIER_MODEL without any
additional code path.

Mapping (confirmed from model config.json):
  normal            → INFO
  security_anomaly  → ERROR
  system_failure    → CRITICAL
  performance_issue → WARN
  network_anomaly   → WARN
  config_error      → ERROR
  hardware_issue    → CRITICAL

Keyword-based CRITICAL promotion and low-confidence DEBUG demotion apply on
top of the base mapping (same rules as the standard vocabulary).

11 new tests covering all 7 Hybrid-BERT labels, case-insensitivity, and
regression on standard-vocabulary labels. 372 tests passing total.

Note: custom loading code for the non-standard .pt checkpoint format is
explicitly out of scope — evaluate better-packaged HF alternatives first
(see #41 for candidate list).

Closes: #41
2026-06-01 16:20:31 -07:00
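
The lookup order described above (Hybrid-BERT map first, then the standard map) can be sketched as follows. The Hybrid-BERT entries are copied from the mapping in the commit message; the standard-map contents, the INFO fallback, and the function name are hypothetical, and the keyword promotion and low-confidence demotion steps are omitted.

```python
HYBRID_BERT_LABEL_MAP = {
    "normal": "INFO",
    "security_anomaly": "ERROR",
    "system_failure": "CRITICAL",
    "performance_issue": "WARN",
    "network_anomaly": "WARN",
    "config_error": "ERROR",
    "hardware_issue": "CRITICAL",
}

# Hypothetical stand-in for the standard vocabulary map.
STANDARD_LABEL_MAP = {"info": "INFO", "warning": "WARN", "error": "ERROR"}

def map_label(raw: str) -> str:
    key = raw.strip().lower()  # case-insensitive lookup
    if key in HYBRID_BERT_LABEL_MAP:
        return HYBRID_BERT_LABEL_MAP[key]
    return STANDARD_LABEL_MAP.get(key, "INFO")  # fallback value is an assumption

mapped = [map_label(x) for x in ("Security_Anomaly", "hardware_issue", "warning")]
```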
bd3923e163 fix: split incidents tables to dedicated turnstone-incidents.db (#60)
FTS5 bulk-insert write locks starved the incident API and bundle endpoints
during log bursts (sonarr/radarr, high-volume docker sources). Fix mirrors
the context_facts split (context -> turnstone-context.db):

- Add INCIDENTS_DB_PATH / TURNSTONE_INCIDENTS_DB env var in rest.py
- Add _INCIDENTS_SCHEMA, ensure_incidents_schema(), and
  migrate_incidents_to_dedicated_db() in glean/pipeline.py
- Stub out incidents/received_bundles/sent_bundles in _SCHEMA (no-op
  CREATE IF NOT EXISTS) so legacy single-file deployments still open
- Thread incidents_db_path through diagnose_stream -> run_pipeline ->
  FalsePositiveSuppressor.suppress -> _fetch_resolved_incidents
- One-shot migration on startup: copy existing rows from main DB to
  incidents DB via INSERT OR IGNORE (idempotent, safe to re-run)
- Fix test_blocklist_endpoints fixtures to patch CONTEXT_DB_PATH and
  INCIDENTS_DB_PATH alongside DB_PATH (worktree has no data/ dir)

372 tests passing.

Closes: #60
2026-06-01 15:54:23 -07:00
1131816666 feat: bundle PII sanitization, onboarding wizard, NL source addition (#51, #52, #53)
Bundle export (#51):
- _redact_text() with 5 compiled regex patterns (IPv4, email, user=, host=, password=)
- build_bundle(sanitize=False) — per-entry redaction at export time
- sent_bundles table tracks every outgoing export (GET and POST /send)
- GET /api/sent-bundles exposes history; SentBundle model added
- BundlesView: Received/Sent tabs, sanitized badge, 5-entry preview, re-download
- IncidentsView: Sanitize PII checkbox next to Send Bundle

Onboarding wizard (#52):
- app/services/discover.py: journald/Docker/file detection (best-effort, safe in containers)
- GET /api/setup/status, /discover, POST /api/setup/write (additive, appends to existing)
- SetupWizard.vue: 3-step Detect → Select → Confirm
  - Step 1 shows grouped summary (journald/file/docker counts)
  - Step 2: collapsible groups with All/None section toggles
    - journald + file: pre-selected; docker: collapsed, none pre-selected
  - Step 3: YAML preview before write
- SourcesView: shows wizard on first run; Add Source button reuses it

NL source addition (#53):
- app/services/nl_source.py: keyword shortcut (13 well-known apps) + LLM fallback
- POST /api/setup/interpret: keyword → LLM → null (graceful fallback)
- NL field in wizard step 2; manual form shown when interpretation fails
- Added sources appear in grouped list immediately
2026-05-29 14:14:28 -07:00
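
A sketch of regex-based redaction in the spirit of _redact_text(). The five pattern categories come from the commit message; the exact regexes, the replacement tokens, and the name redact_text are assumptions and will not match the project's implementation.

```python
import re

# (pattern, replacement) pairs, applied in order.
_PII_PATTERNS = [
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "[IP]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),
    (re.compile(r"\buser=\S+"), "user=[REDACTED]"),
    (re.compile(r"\bhost=\S+"), "host=[REDACTED]"),
    (re.compile(r"\bpassword=\S+"), "password=[REDACTED]"),
]

def redact_text(text: str) -> str:
    for pattern, replacement in _PII_PATTERNS:
        text = pattern.sub(replacement, text)
    return text

redacted = redact_text(
    "login user=alice host=10.0.0.5 from 192.168.1.20 mail bob@example.com password=hunter2"
)
```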
054ebfa0e3 feat(diagnose): tech-level post-processor, offline mode, API auth, context harvest
- synthesizer: 3 system prompts (sysadmin/homelab/executive) selected by tech_level pref
- settings: tech_level selector (UI + backend) persisted in preferences.json
- QuickCapture: shows active level label in diagnosis card header
- TURNSTONE_OFFLINE_MODE=1: sets HF_HUB_OFFLINE + TRANSFORMERS_OFFLINE before lib load
- TURNSTONE_API_KEY: bearer token auth on all /api/ routes (hmac.compare_digest)
- /health always open; unset key = no auth (backward compatible)
- docs/air-gapped-deployment.md: full offline deployment guide
- scripts/harvest_docs.py: generalized context doc bulk-uploader with manifest support
- scripts/manifests/: heimdall-devops.yaml (10 docs ingested) + example.yaml template
- fix: _ingest_upload -> _glean_upload in context doc upload endpoint (was 500)

Closes: #56
Closes: #45
Closes: #47
Closes: #49
Closes: #21
2026-05-28 08:51:05 -07:00
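
The bearer-token rules listed above (auth on /api/ routes, /health always open, unset key means no auth) can be sketched as a framework-independent check. The function name and its arguments are hypothetical; hmac.compare_digest is used because it compares in constant time.

```python
import hmac
from typing import Optional

def is_authorized(path: str, auth_header: Optional[str], api_key: Optional[str]) -> bool:
    # Unset key disables auth; /health and non-API paths are always open.
    if not api_key or path == "/health" or not path.startswith("/api/"):
        return True
    scheme, _, token = (auth_header or "").partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(token.encode(), api_key.encode())

ok = is_authorized("/api/search", "Bearer s3cret", "s3cret")
denied = is_authorized("/api/search", "Bearer wrong", "s3cret")
```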
73a14bd782 fix(diagnose): add max_tokens to all LLM calls; fix reasoning card contrast
Truncation fix: call_llm() in _llm_client.py now accepts max_tokens (default
2048) and passes it in both the cf-orch task payload and the OpenAI-compat
fallback body. Hypothesizer uses max_tokens=1024 (JSON array output);
synthesizer and legacy summarize use 2048 (structured 5-section narrative).
Without this, backends use their own default (often 512 tokens), causing
mid-sentence truncation of the diagnosis output.

UI fix: reasoning card changed from bg-accent/5 border-accent/30 (opacity
modifiers on CSS variables don't compose reliably across themes) to the
callout pattern: bg-surface-raised with a solid border-l-4 border-accent.
Header label changed from text-text-dim to text-accent for visual anchoring.
Text remains text-text-primary for guaranteed contrast on both light and dark
themes.

Tracks: #56 (technical-level post-processor, filed as follow-on feature)
2026-05-27 22:23:36 -07:00
7f49961ec4 fix(db): add timeout=30s to all sqlite3.connect() calls across app
Watcher, REST endpoints, services (search, incidents, blocklist),
MCP server, context retriever, embedder, glean_scheduler, and
doc_upload all used the default 5-second SQLite busy timeout.
During collect glean write phases, watcher flush threads were hitting
'database is locked' errors when the glean held the write lock longer
than 5 seconds.

All connections now use timeout=30.0, matching the pipeline fix
from commit 5a9281a. No logic changes.
2026-05-26 23:12:48 -07:00
5a9281a686 fix(glean): add timeout=30s to all pipeline DB connections; add --force flag; new patterns
pipeline.py:
- Add timeout=30.0 to all sqlite3.connect() calls (5 total).
  Previously only ensure_context_schema() had it. The main glean
  writers would fail immediately under lock contention from the live
  watcher or concurrent manual glean runs.

glean_corpus.py:
- Add --force flag (passed through to glean_sources/glean_file/glean_dir).
  Without it, unchanged-fingerprint files were silently skipped even
  after pattern updates. Use after editing patterns/default.yaml.

patterns/default.yaml:
- Add 9 new patterns for Muninn / cluster-wide coverage:
    vpn_tunnel_fail     WireGuard/tunnel service failures
    vpn_handshake       WireGuard peer handshake events
    dns_degraded        systemd-resolved DNS fallback/degradation
    nvidia_api_mismatch NVIDIA kernel module vs userspace mismatch
    nvidia_xid          NVIDIA Xid GPU hardware faults
    nvidia_gpu_reset    NVIDIA GPU reset / NVLink faults
    acpi_error          ACPI firmware _DSM evaluation failures
    thermal_throttle    CPU/GPU thermal throttling / RAPL unavailable
    undervoltage        PSU undervoltage / brownout events
- Sync from /devl/turnstone-cluster/patterns/default.yaml (authoritative
  live copy updated first; repo copy was stale)
2026-05-26 22:36:45 -07:00
3cfd587d16 fix: separate context KB into own SQLite file to eliminate write-lock contention
context_facts, context_documents, and context_chunks now live in
turnstone-context.db (sibling of turnstone.db).  The glean scheduler
held write locks on the main DB long enough to cause 5-second timeout
failures on context fact inserts; separate files have independent WAL
write locks so they never contend.

Changes:
- pipeline.py: extract _CONTEXT_SCHEMA + ensure_context_schema()
- rest.py: CONTEXT_DB_PATH (TURNSTONE_CONTEXT_DB env var, defaults to
  sibling file); init via ensure_context_schema(); all context routes
  pass CONTEXT_DB_PATH; diagnose_stream receives context_db_path kwarg
- diagnose/__init__.py: diagnose_stream() accepts context_db_path
  (falls back to db_path for backward compat); retrieve_context uses it
- store.py: sqlite3.connect() timeout=30.0 — Python driver retry loop
  is independent of PRAGMA busy_timeout; needed for any remaining
  contention during test or single-file deployments

Closes: #42
2026-05-25 21:19:32 -07:00
e851099e5c fix(hypothesizer): extract first JSON array to handle reasoning model double-output
Reasoning models (e.g. foundation-sec-8b) emit valid JSON then repeat it
inside a markdown fence block. json.loads() fails on the combined text.

extract_first_json_array() scans for the first '[' and walks to its
matching ']' with proper string/escape/nesting handling, then returns
just that slice. Combined with strip_json_fences(), this handles all
observed output patterns:
  - bare JSON array (standard models)
  - fenced JSON array (fence-wrapping models)
  - bare array followed by fenced repeat (reasoning models)
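
A sketch of the scan described above, assuming the function returns None when no balanced array is found (the real return contract is not stated in this message). Only square brackets are counted; brackets inside JSON strings are skipped via string and escape tracking.

```python
import json
from typing import Optional


def extract_first_json_array(text: str) -> Optional[str]:
    start = text.find("[")
    if start == -1:
        return None
    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            if escaped:
                escaped = False      # character after a backslash is literal
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return text[start:i + 1]   # slice ends at the matching bracket
    return None  # unbalanced input


# Reasoning-model style output: bare array, then the same array in a fence.
fence = "`" * 3
array = '[{"title": "disk ] full", "ids": ["c1"]}]'
raw = array + "\n" + fence + "json\n" + array + "\n" + fence
first = json.loads(extract_first_json_array(raw))
```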
2026-05-25 21:01:14 -07:00
2375e073ba feat(pipeline): add TURNSTONE_CLASSIFIER_MODEL env var for Stage 2 ML config
Makes the HuggingFace classifier model for Stage 2 configurable via
TURNSTONE_CLASSIFIER_MODEL. When unset (default), Stage 2 falls back
to pattern_tags then regex — no download required on first run.

Also documents TURNSTONE_MULTI_AGENT_DIAGNOSE, TURNSTONE_CLASSIFIER_MODEL,
TURNSTONE_EMBED_BACKEND/MODEL/DEVICE in .env.example.
2026-05-25 19:11:32 -07:00
85e7a70536 refactor: pipeline cleanup — 6 follow-up fixes (#33-#38)
- #33: Wrap ClassifiedTimeline.cluster_severities in MappingProxyType for
  true immutability (frozen=True only blocks field reassignment, not dict
  mutation).

- #34: Remove dead suppression branch in synthesizer._build_hypothesis_block.
  active[] is already filtered to not rh.suppress, so the 'Yes — suppressed'
  branch was unreachable. Now shows novelty score only.

- #35: Extract shared _llm_client.py with call_llm() + extract_content() +
  strip_json_fences(). Both RootCauseHypothesizer and SummarySynthesizer
  now import from one source. Also strips JSON fences from LLM output before
  parsing in hypothesizer._parse_response.

- #36: Add per-stage try/except in pipeline.run_pipeline(). Unhandled
  stage exceptions now emit {type: 'error'} + {type: 'done'} SSE events
  instead of silently closing the stream.

- #37: Move format_context_block() call inside the legacy LLM branch in
  diagnose/__init__.py — it was being computed unconditionally but only
  used in the non-pipeline path.

- #38: Coerce supporting_cluster_ids items to str() in hypothesizer
  _parse_response to guard against LLMs returning integers instead of
  string cluster IDs.
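
To illustrate #33: a sketch of why frozen=True alone is not enough and how MappingProxyType closes the gap. The class name ClassifiedTimelineSketch is hypothetical and carries only the one field discussed; the real dataclass has more.

```python
from dataclasses import FrozenInstanceError, dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class ClassifiedTimelineSketch:
    cluster_severities: Mapping[str, str]

    def __post_init__(self):
        # Copy the caller's dict, then expose it through a read-only view.
        # object.__setattr__ is needed because the dataclass is frozen.
        object.__setattr__(
            self,
            "cluster_severities",
            MappingProxyType(dict(self.cluster_severities)),
        )


def try_mutations(timeline):
    """Return the exception type raised by each kind of mutation attempt."""
    results = {}
    try:
        timeline.cluster_severities = {}
    except FrozenInstanceError as exc:
        results["reassign_field"] = type(exc).__name__
    try:
        timeline.cluster_severities["c1"] = "INFO"
    except TypeError as exc:
        results["mutate_dict"] = type(exc).__name__
    return results


source = {"c1": "ERROR"}
timeline = ClassifiedTimelineSketch(source)
source["c1"] = "DEBUG"  # later changes to the caller's dict do not leak in
```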
2026-05-25 19:05:56 -07:00
25b7ae340b fix: invert suppress_threshold semantics to similarity_threshold in FalsePositiveSuppressor
Was suppressing when novelty_score < 0.85 (i.e. similarity > 0.15), which
would suppress nearly every hypothesis once embeddings are active.

Now suppresses when max_sim >= similarity_threshold (0.85), meaning only
hypotheses that are 85%+ similar to a resolved incident are suppressed.

Also renames suppress_threshold → similarity_threshold for clarity and
adds a borderline boundary test (0.85 suppressed, 0.84 not suppressed).
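
A sketch of the corrected rule, assuming novelty_score = 1 - max_sim (inferred from the "novelty_score < 0.85 (i.e. similarity > 0.15)" wording above). The function names are illustrative and the vectors are toy stand-ins for real embeddings.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_hypothesis(hyp_vec, resolved_vecs, similarity_threshold=0.85):
    """Return (suppress, novelty_score) for one hypothesis embedding."""
    if not resolved_vecs:
        return False, 1.0  # nothing to compare against: fully novel
    max_sim = max(cosine(hyp_vec, v) for v in resolved_vecs)
    # New rule: suppress only near-duplicates of a resolved incident.
    return max_sim >= similarity_threshold, 1.0 - max_sim


# Similarity of about 0.71: novelty is about 0.29, so the old rule
# (novelty < 0.85) would have suppressed it; the new rule keeps it.
suppress, novelty = score_hypothesis([1.0, 0.0], [[1.0, 1.0]])
```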

Closes: #29
2026-05-25 18:58:52 -07:00
1b949337da fix: tighten suppression_reason display guard, document unused since/until params
2026-05-25 15:02:48 -07:00
1865ba1f02 feat: Stage 5 synthesizer + pipeline orchestrator + feature flag wiring (issue #29)
- Add app/services/diagnose/synthesizer.py: SummarySynthesizer (Stage 5)
  - Builds structured LLM prompt from ranked hypotheses, timeline, RAG context
  - Excludes suppressed hypotheses from the narrative prompt
  - Deterministic fallback when no LLM configured or LLM call fails
  - Same cf-orch task endpoint + direct OpenAI-compat fallback pattern as other stages

- Replace pipeline.py stub with full run_pipeline() async generator
  - Orchestrates all 5 stages via asyncio.to_thread for each synchronous stage
  - Yields typed SSE event dicts: status, pipeline_stage (1-4), hypotheses, reasoning, done
  - Suppressor counts (active vs suppressed) reported in stage 4 event message

- Wire MULTI_AGENT_ENABLED feature flag into diagnose_stream()
  - TURNSTONE_MULTI_AGENT_DIAGNOSE=true routes through run_pipeline()
  - pipeline emits its own done event; legacy path unchanged when flag is false
  - Import of run_pipeline added to __init__.py

- Add 21 new tests (350 -> 371 passing):
  - tests/test_diagnose_synthesizer.py: 8 tests (with/without LLM, suppressed,
    empty ranked, LLM failure fallback)
  - tests/test_diagnose_pipeline.py: 13 tests (flag off, flag on event sequence,
    empty entries, no LLM, stage 1 cluster count message)

Closes: #29
2026-05-25 14:56:25 -07:00
54d4ec5325 refactor: extract _score_hypothesis helper, fix exception types, pass device in suppressor
2026-05-25 14:41:33 -07:00
84e0cf5245 feat: Stage 4 — FalsePositiveSuppressor for multi-agent diagnose pipeline (issue #29)
- Implements FalsePositiveSuppressor using embedding cosine similarity
- Lazy corpus embedding via get_embedder() with module-level cache keyed by db_path
- Cache invalidated automatically when the resolved incident corpus changes
- Suppresses hypotheses with novelty_score below configurable threshold (default 0.85)
- Full fallback path (novelty=1.0, no suppression) when model_id empty, embedding
  service unavailable, or no resolved incidents found in DB
- Graceful handling of missing incidents table and DB query failures
- Numpy bool_ leakage prevented by explicit float()/bool() coercion at assignment
- Pure-Python cosine fallback for environments without numpy
- 9 new tests (all mocked, no real model downloads): passthrough, suppress, no-suppress,
  empty list, ranking, empty corpus, DB failure, service unavailable, cache invalidation
- 350 total tests passing (341 pre-existing + 9 new)

Closes: #29
2026-05-25 14:28:31 -07:00
a2916f958a fix: defensive coercion for LLM confidence and cluster fields in hypothesizer
- Add _coerce_float() module-level helper: catches TypeError/ValueError from
  non-numeric LLM output (e.g. 'high', 'N/A') and returns a caller-supplied
  default instead of raising.
- Replace float(item.get('confidence', 0.5)) with
  _coerce_float(item.get('confidence'), 0.5) in _parse_response.
- Guard supporting_cluster_ids: tuple(item.get(...) or []) so a JSON null
  from the LLM does not cause TypeError('NoneType is not iterable').
- runbook_refs is hardcoded as () and not sourced from LLM output; no change
  needed there.
- Add test_non_numeric_confidence_uses_default (Test 10) to cover the 'high'
  string case: asserts no exception and confidence == 0.5.
- 341 tests passing (+1).
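
A sketch of the two guards described above. The helper body is a plausible reconstruction, not the repo's code, and parse_item() is a hypothetical stand-in for the relevant part of _parse_response.

```python
def coerce_float(value, default):
    # float(None) raises TypeError; float("high") raises ValueError.
    try:
        return float(value)
    except (TypeError, ValueError):
        return default


def parse_item(item):
    confidence = coerce_float(item.get("confidence"), 0.5)
    # "or []" turns a JSON null (None) into an empty iterable before tuple().
    cluster_ids = tuple(item.get("supporting_cluster_ids") or [])
    return confidence, cluster_ids


parsed = parse_item({"confidence": "high", "supporting_cluster_ids": None})
```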

Closes: #29
2026-05-25 14:00:30 -07:00
34fb8f501d feat: Stage 3 — RootCauseHypothesizer for multi-agent diagnose pipeline (issue #29)
- Add app/services/diagnose/hypothesizer.py with RootCauseHypothesizer class
- Stage 3 of the multi-agent diagnose pipeline: accepts ClassifiedTimeline +
  RetrievedContext, builds a structured JSON prompt, calls the LLM via the
  same cf-orch task → OpenAI-compat fallback pattern used by llm.py
- Parses JSON array response into list[Hypothesis] dataclasses with UUID ids,
  severity validation (WARNING→WARN, unknown→ERROR), confidence coercion
- Gracefully returns [] when llm_url/llm_model absent or clusters empty
- Add tests/test_diagnose_hypothesizer.py: 12 tests, all mocked, no LLM I/O
  covering: valid response, UUID generation, malformed JSON, non-list JSON,
  empty clusters, missing URL/model, max_hypotheses cap, severity mapping,
  confidence string coercion
- 340 tests passing (328 prior + 12 new)

Closes: #29
2026-05-25 13:49:18 -07:00
6ea8fbfec1 feat: Stage 2 — SeverityClassifier for multi-agent diagnose pipeline (issue #29)
Three-path classification: ML (transformers pipeline, lazy singleton) →
pattern_tags (YAML pattern severity dict) → regex (detect_severity).

- Path A: HF text-classification pipeline loaded lazily on first classify()
  call via module-level singleton; shim promotes ERROR+keyword hits to CRITICAL
  and demotes low-confidence INFO to DEBUG.
- Path B: maps cluster.pattern_tags through the loaded pattern severity dict;
  picks the highest severity across matching tags.
- Path C: falls back to detect_severity() regex scan on representative_text;
  defaults to INFO when no keyword matches.
- Pattern file resolved from constructor arg or TURNSTONE_PATTERNS env var
  (mirrors app/rest.py convention).
- No crash when transformers is not installed; ImportError on per-cluster ML
  inference triggers clean per-cluster fallback to pattern_tags/regex.
- ClassifiedTimeline.classifier_used reflects the primary session path.
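
The fallback chain above can be sketched as follows. This is an illustration under assumptions: the severity ordering, the keyword regex standing in for detect_severity(), and the ml_classify callable are all hypothetical, and the CRITICAL/DEBUG shim from Path A is omitted.

```python
import re

# Assumed ordering, lowest to highest.
SEVERITY_RANK = {"DEBUG": 0, "INFO": 1, "WARN": 2, "ERROR": 3, "CRITICAL": 4}
_KEYWORDS = re.compile(r"\b(critical|error|warning|warn|debug)\b", re.IGNORECASE)


def classify(text, pattern_tags, pattern_severities, ml_classify=None):
    """Return (severity, path_used) for one cluster."""
    # Path A: ML, skipped when no model is configured or the import fails.
    if ml_classify is not None:
        try:
            return ml_classify(text), "ml"
        except ImportError:
            pass
    # Path B: highest severity among the cluster's known pattern tags.
    hits = [pattern_severities[t] for t in pattern_tags if t in pattern_severities]
    if hits:
        return max(hits, key=SEVERITY_RANK.__getitem__), "pattern_tags"
    # Path C: keyword scan on the representative text, INFO when nothing matches.
    match = _KEYWORDS.search(text)
    if match:
        word = match.group(1).upper()
        return ("WARN" if word == "WARNING" else word), "regex"
    return "INFO", "regex"


def missing_transformers(text):
    raise ImportError("transformers is not installed")


severities = {"nvidia_xid": "CRITICAL", "dns_degraded": "WARN"}
```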

Tests (10 new, 328 total, all passing):
- ML ERROR, CRITICAL promotion, DEBUG demotion, WARNING→WARN
- pattern_tags resolution from YAML fixture
- regex ERROR detection and INFO default
- ImportError clean fallback
- empty timeline no-crash
- ClassifiedTimeline FrozenInstanceError on mutation

Closes: #29
2026-05-25 13:27:17 -07:00
7abb76e628 refactor: split TimelineReconstructor.reconstruct into helpers, fix magic number + error handling
- Add gap_significance_seconds constructor param (default 30) to replace hardcoded magic number in gap_count computation
- _parse_iso now returns datetime | None with try/except on ValueError; all callers handle None return by treating malformed timestamps as absent
- Extract reconstruct into four private helpers: _sort_entries, _group_into_raw_clusters, _build_cluster, _dominant_sources_tuple
- Promote _sort_key to module-level function (was nested inside reconstruct)
- Rename old module-level _build_cluster to _make_event_cluster to avoid name collision with new instance method
- Add explanatory comment to type: ignore[arg-type] at _highest_severity call site
- Black-formatted
2026-05-25 13:22:18 -07:00
f7429ee963 feat: Stage 1 — TimelineReconstructor for multi-agent diagnose pipeline (issue #29)
- Add app/services/diagnose/timeline.py: pure-Python TimelineReconstructor
  - Sorts entries by timestamp_iso (None entries appended at end)
  - Sliding-window clustering anchored to first entry in each cluster
  - Computes cluster_id (sha1[:12]), severity (highest wins), burst flag,
    gap_before_seconds, representative_text (highest rank, longest text tiebreak)
  - Builds TimelineResult with dominant_sources sorted by entry count descending
- Update pipeline.py stub to import TimelineReconstructor (Task 6 wiring prep)
- Add tests/test_diagnose_timeline.py: 15 tests covering all 13 required cases
  plus null-timestamp edge case variant; all 318 tests passing
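
A sketch of the anchored clustering step, assuming a 60-second window and dict entries with a timestamp_iso key; both are illustrative, as is returning undated entries separately. The point is that each entry is compared with the first entry of the current cluster, not with its immediate predecessor.

```python
from datetime import datetime


def cluster_by_anchor(entries, window_seconds=60):
    """Group entries whose timestamp is within window_seconds of the cluster's first entry."""
    dated = sorted(
        (e for e in entries if e.get("timestamp_iso")),
        key=lambda e: datetime.fromisoformat(e["timestamp_iso"]),
    )
    undated = [e for e in entries if not e.get("timestamp_iso")]

    clusters = []
    anchor = None
    for entry in dated:
        ts = datetime.fromisoformat(entry["timestamp_iso"])
        if anchor is None or (ts - anchor).total_seconds() > window_seconds:
            clusters.append([entry])   # start a new cluster anchored here
            anchor = ts
        else:
            clusters[-1].append(entry)
    return clusters, undated


entries = [
    {"timestamp_iso": "2026-05-25T12:01:10", "text": "c"},
    {"timestamp_iso": "2026-05-25T12:00:00", "text": "a"},
    {"timestamp_iso": None, "text": "no timestamp"},
    {"timestamp_iso": "2026-05-25T12:00:50", "text": "b"},
    {"timestamp_iso": "2026-05-25T12:05:00", "text": "d"},
]
clusters, undated = cluster_by_anchor(entries)
```

Entry "c" is only 20 seconds after "b" but 70 seconds after the anchor "a", so it opens a new cluster. A predecessor-chained window would have kept it in the first one.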

Closes: #29
2026-05-25 12:54:15 -07:00