turnstone

Author	SHA1	Message	Date
pyr0ball	5da8db2bcd	fix(diagnose): pass full timeline clusters and hypothesis descriptions to synthesizer LLM Stage 5 (SummarySynthesizer) was only sending aggregate timeline stats to the LLM (cluster count, burst count, gap count) — the actual sequenced cluster data that Stage 1 reconstructed was never included. The LLM had no per-cluster timestamps, severity, burst flags, silence gaps, or representative text to write the TIMELINE section from. Added _build_timeline_block() to emit a numbered per-cluster summary matching the format Stage 3 uses for the hypothesizer, and included it in the user message alongside the hypothesis block. Also fixed _build_hypothesis_block() to include the 2-4 sentence description each hypothesis carries — previously only the title and novelty score reached the LLM. 11 new tests cover _build_timeline_block() directly (burst label, gap threshold, pattern tags, text truncation at 200 chars, null start_iso, multi-cluster numbering). 529 tests passing.	2026-06-16 21:46:01 -07:00
pyr0ball	4c1940d12e	fix: strip reasoning-model thinking tags; surface untracked node names - app/services/diagnose/_llm_client.py: strip <think>…</think> blocks (case-insensitive, multiline) from LLM response content before it reaches the UI or any JSON parser — affects DeepSeek-R1, Qwen QwQ, and any other model that emits chain-of-thought in content - app/rest.py: suggest_sources now also returns untracked_names — query tokens that look like hostnames/service names but don't appear in any monitored source, so the UI can prompt the user to add them - web/src/components/ChatDiagnose.vue: show amber "Not monitoring: X" banner with "Add as a log source →" link when untracked_names present - tests/test_llm_client.py: 13 tests covering think-strip edge cases (single/multi-line, multiple blocks, case-insensitive, only-thinking) plus existing extract_content and JSON-fence helpers	2026-06-16 09:42:44 -07:00
pyr0ball	6039ab2464	feat: incident ticket export — Notion and Jira integration (#12) - app/services/ticket_export.py: plugin-dispatch architecture; Notion exporter (Notion API v1, blocks-based, 50-entry cap, 2000-char truncation per block); Jira exporter (REST API v3, Basic Auth, ADF description, configurable issue type defaulting to Bug) - app/rest.py: POST /api/incidents/{id}/export endpoint; Notion/Jira credential fields added to SettingsBody and PATCH /api/settings handler - web/src/views/IncidentsView.vue: "Export ticket ▾" dropdown in incident detail drawer — click-outside close, inline URL link on success - web/src/views/SettingsView.vue: Ticket Trackers section with Notion token + database ID, Jira URL/email/token/project/issue-type; show/hide for secret fields - tests/test_ticket_export.py: 17 tests covering dispatch, Notion success/error/config/payload/truncation paths, Jira success/error/auth/project/summary/default-issue-type	2026-06-14 15:46:11 -07:00
pyr0ball	b8f766fb74	feat: SSH target manager — GUI editor for remote host configuration (#24) - app/services/ssh_targets.py: full CRUD service with lazy paramiko import, key-path validation, permission warning, and test_connection - app/db/schema.py: ssh_targets table (id, label, host, port, user, key_path, last_tested, last_ok, last_error, timestamps) - app/rest.py: GET/POST /api/ssh-targets, PATCH/DELETE /{id}, POST /{id}/test — key contents never returned in any response - web/src/views/SettingsView.vue: Remote Hosts section with add/edit form, inline connection status badges, test-connection flow, delete with confirmation; new Set() pattern for reactive sshTesting state - tests/test_ssh_targets.py: 22 tests — schema, CRUD, validation, key-warning, serialization, paramiko-absent path	2026-06-14 15:27:12 -07:00
pyr0ball	7a2ab0bb46	feat(orchard): auto-enrollment API for branch node provisioning (#27) Implements the Orchard branch grafting system for harvest.circuitforge.tech: - POST /api/orchard/graft: provisions data dir, starts a new turnstone-submissions-<slug> Docker container on the next free port (ORCHARD_PORT_BASE=8538+), injects a handle_path block into the Caddyfile dynamic-branches marker section, restarts caddy-proxy, returns {submit_endpoint, api_key} - GET /api/orchard/branches: list active/inactive branches (admin-only) - DELETE /api/orchard/branches/<slug>: deactivate branch + stop container - POST /api/orchard/branches/<slug>/anonymize: HMAC-based IP/username pseudonymization worker over a branch DB - POST /api/glean/batch: optional TURNSTONE_BRANCH_KEY auth guard - anonymized column added to log_entries schema (migration-safe) - Updated Caddyfile with /huginn/* route (port 8536), /node2/* (8537), and dynamic-branch marker section - All endpoints admin-gated via TURNSTONE_ORCHARD_ADMIN_KEY Closes: #27	2026-06-14 14:30:18 -07:00
pyr0ball	600e5a9eac	feat(sources): context-aware filesystem log scanner (#23) Add scan_log_directories() to discover.py that recursively walks /var/log and /opt, filters to readable log files, and scores each candidate by recency (mtime, 0.7 weight), file size (0.3), and keyword match against an optional problem-context query (shifts weights to 0.4/0.2/0.4 when a query is provided). - GET /api/setup/scan?query=...&max_results=N — new API endpoint - SourcesView: "Scan" button opens a panel with ranked candidates, checkboxes, and "Add selected" to write to sources.yaml - 13 new unit tests, 466 passing total Closes: #23	2026-06-14 14:01:45 -07:00
pyr0ball	7ed01fbd48	chore: sanitize contributor names and personal node IDs - docker-compose.submissions.yml: rename submissions-contrib1/contrib2 to submissions-contrib1/contrib2; update paths and host env vars - podman-standalone.sh: replace 'Contributor's instance' with generic 'WireGuard-connected Docker hosts' - docker-standalone.sh: replace personal node-id in harvest endpoint	2026-06-13 22:17:38 -07:00
pyr0ball	58680b3b27	chore: replace vendor product name with generic ext_device throughout - Rename _EXT_DEVICE_CODES → _EXT_DEVICE_CODES, gen_ext_device → gen_ext_device - Rename corpus output directory ext_device/ → ext_device/ - Update default.yaml placeholder pattern name and description - Update tests to match new directory and class names - Corresponding Forgejo issue titles updated (#43, #44, #54)	2026-06-13 22:03:26 -07:00
pyr0ball	be134a4465	chore: replace personal node-id in harvest endpoint example	2026-06-13 21:58:22 -07:00
pyr0ball	8006d79a11	Merge feat/42-50-postgres-multitenant: dual-backend + full feature set Brings in 18 commits since v0.6.2: - Dual-backend SQLite/Postgres + multi-tenant source namespacing - Anomaly scoring pipeline + cybersec zero-shot scoring - Security alerts tab — full scorer integration - Audio domain patterns (PipeWire/ALSA xrun, quantum) - Incidents: auto-incident detection, timeline visualizer - Diagnose: conversational chat mode, NL source discovery - Corpus: synthetic log generator, watermark-preserving updates - UI: security alert dedup/collapse, clickable criticals with inline LLM explanation, loading shimmer animations, default diagnose prompt - Backend: DB-lock retry in anomaly scorer, FTS build via get_conn(), timeline_events in stats_summary - Sanitize: internal hostnames and IPs replaced with generic placeholders	2026-06-13 10:02:59 -07:00
pyr0ball	7c76217149	chore: sanitize internal hostnames and IP references - Rename patterns/sources-example-node.yaml → patterns/sources-example.yaml and update header/comments to be host-agnostic - Replace internal node names in gen_corpus.py _HOSTS with generic names - Replace example-node hostname in syslog test fixtures with testhost - Replace example-node example in mcp_server.py doc with myserver - Replace private LAN IP (<YOUR_HOST_IP>) in docker-standalone.sh with <HEIMDALL_LAN_IP> placeholder - Replace private IPs in sources-cluster.yaml comments with <YOUR_HOST_IP> - Remove instance-specific hostname from llm.py fallback comment - Replace Caddy example domain in podman-standalone.sh with placeholder	2026-06-13 10:02:46 -07:00
pyr0ball	502ff54fd0	feat(ui): security alert dedup, clickable criticals, loading shimmer Security Alerts: - Client-side duplicate collapsing via anomaly_label + text fingerprint - ×N count badge chip on collapsed rows; toggle to expand - Skeleton shimmer rows replace "Loading..." text Dashboard: - Clickable Recent Criticals — inline LLM explanation via SSE stream - ±5 min time window scoped to source_id for useful context - Explanation cache keyed by entry_id (no re-fetch on re-expand) - Default diagnose query injected on Diagnose button navigation to prevent local models hallucinating from bare log data - Stat card and source-health skeleton shimmer loading states Backend: - anomaly.py: 4-attempt retry on "database is locked" with 10s backoff - search.py: migrate build_fts_index to get_conn() (WAL race fix); add timeline_events to stats_summary for clickable criticals feature - theme.css: @keyframes shimmer + .loading-shimmer utility; prefers-reduced-motion degrades gracefully to static muted block	2026-06-13 09:32:26 -07:00
pyr0ball	f3d807d991	feat(diagnose): conversational chat mode + NL source discovery - New ChatDiagnose.vue: multi-turn chat UI in the Diagnose tab - Textarea input (auto-grows) for long free-form problem descriptions - Source suggestion pre-flight: debounced POST /api/sources/suggest identifies relevant log sources from the query text and shows them as interactive chips (deselect to exclude before searching) - Conversation history preserved across turns with LLM reasoning, collapsible log entries, and "Save as incident" per turn - Reuses existing /api/diagnose/stream — no new pipeline - DiagnoseView.vue: Chat is now default tab; viewport-height layout - POST /api/sources/suggest: token-overlap source ranking, no LLM - Fix: add missing 'import re' causing 500 on suggest route	2026-06-11 22:04:53 -07:00
pyr0ball	b6b69e2150	feat(incidents): auto-incident detection + example-node Podman setup Auto-incident detector: - New app/tasks/incident_detector.py: post-glean error cluster detector - Sliding window algorithm: flags a source that logs at least N errors within window_s seconds - Deduplication via issue_type='auto:{source_id}' + interval overlap check - Respects TURNSTONE_AUTO_INCIDENT_THRESHOLD (default 5) and TURNSTONE_AUTO_INCIDENT_WINDOW (default 600s) env vars - 20 tests all passing - Wired into glean_scheduler.run_once() and scheduler_loop() - TURNSTONE_AUTO_INCIDENT env var to disable (default enabled) Podman standalone improvements: - REPO_DIR auto-detected from script location (no longer hardcoded to /opt/turnstone) - DATA_DIR/PATTERNS_DIR/HF_CACHE_DIR configurable via env vars - Bootstrap step copies host-specific sources-<hostname>.yaml on first run - Auto-incident env vars passed through example-node sources: - patterns/sources-example-node.yaml: Sonarr, Radarr, Bazarr, Prowlarr, Tautulli, autoscan, organizr, nextcloud, journal export	2026-06-11 18:37:53 -07:00
pyr0ball	74c9de9ccf	fix(corpus): glean_dir now recurses subdirs; fix docker SOURCE prefix - Changed glob → rglob in glean_dir so corpus directories with format subfolders (journald/, docker/, etc.) are fully ingested - Fixed gen_corpus.py docker SOURCE to emit "docker:<service>" prefix so the pipeline correctly detects format as 'docker' not 'plaintext' - 17/17 gen_corpus tests passing Closes: #46	2026-06-11 16:30:28 -07:00
pyr0ball	5816ed69ae	feat(corpus): synthetic log corpus generator for demos and testing Adds scripts/gen_corpus.py that produces realistic-but-artificial log files across all four supported formats (journald JSON, docker envelope, qBittorrent hotio, EXT_DEVICE plaintext). Output feeds directly into glean_corpus.py for demo environments and parser regression tests with no production data required. - Seed-based RNG with independent per-source sub-streams (same seed = same sequence for each file regardless of source count changes) - Controllable time range, event density, and error injection rate - Severity distribution mirrors real infrastructure (70% INFO, ~6% ERROR, ~2% CRITICAL) with adjustable boost via --error-rate - 17 tests covering output structure, reproducibility, format correctness, parser round-trip, and CLI acceptance criteria Also fixes a latent bug in app/glean/plaintext.py: ISO 8601 timestamps were silently failing to parse because the T separator was normalised to space in the input string but the strptime format string still contained T. Fix: apply the same normalisation to the format before calling strptime. Closes: #46	2026-06-11 10:57:20 -07:00
pyr0ball	4dcc1a441a	feat(incidents): incident timeline visualizer + fix entry lookup using wrong DB path Adds IncidentTimeline.vue — a pure SVG time-axis component rendered inside the incident detail drawer when entries are present: - Horizontal strip scaled to incident window (preserveAspectRatio=none) - Event ticks colored by severity, height proportional to severity level - 50-bin density shading shows burst periods as blue bands - Gap markers (dashed lines) for silence > 10% of window or > 60s - Hover tooltip showing nearest entry's severity, time, and truncated text - Click-to-scroll: clicking a tick highlights and scrolls to its entry in the list below - Legend showing only severity levels present in the incident Also fixes a pre-existing bug: get_incident_endpoint and both build_bundle callers were passing INCIDENTS_DB_PATH to get_incident_entries/build_bundle, causing all incident entry lookups to silently search the empty incidents DB instead of the main log DB. This made all incident detail views show "No log entries found". Closes: #57	2026-06-10 16:02:24 -07:00
pyr0ball	5f7296ad6d	chore(corpus): preserve watermark files across updates; document corpus env vars update.sh now backs up data/corpus_watermark.txt and data/incident_watermark.txt before git pull and restores them after, mirroring the existing watch.yaml pattern. Without this, an update would reset watermarks to zero and re-push all corpus entries from the beginning on the next export run. .env.example adds a corpus export section documenting the three env vars needed to opt a node into the Avocet training pipeline. Closes: #6	2026-06-10 15:01:19 -07:00
pyr0ball	313b25e0d0	feat(alerts): security alerts tab — full scorer integration - Fix loadScorerStatus: was spreading data.state + data.config (both undefined); API returns flat object; now uses data directly - Fix v-for to use filteredDetections (was using raw detections array, breaking the Unacknowledged tab filter) - Fix double-prefix URL bug: BASE already contains /turnstone, so fetches to ${BASE}/turnstone/api/... doubled the prefix → returned SPA HTML → silent JSON parse failure. Fixed all fetch URLs to use ${BASE}/api/... in SecurityAlertsView and DashboardView - Add CybersecStatus interface to replace Record<string, unknown> - Add scorer field to Detection interface; show 'cybersec' badge in label cell when scorer !== 'anomaly' - Add cybersecStatus.running to cybersec badge (pulse animation) - Add ANOMALY / CYBERSEC stats rows side-by-side - Add 'Run cybersec' button with cybersecTriggerLoading state and runCybersec() function posting to /api/cybersec/run - Rename 'Run scorer' → 'Run anomaly' for clarity Closes: #11	2026-06-10 14:32:43 -07:00
pyr0ball	61816c26bd	fix(cybersec): clean up debug traceback logging Replaced manual traceback import with exc_info=True, which is the idiomatic logging pattern and produces the same output.	2026-06-10 13:20:56 -07:00
pyr0ball	971a859c0d	fix(watcher): remove per-flush FTS sync to eliminate SQLite write lock contention Each WatchSource was calling build_fts_index() every 3 flushes (~30s). With 70+ active sources, this produced a near-continuous stream of FTS INSERT operations, each holding the SQLite write lock for several seconds while scanning the 5.4GB log_entries table. Every other writer (other watcher flushes, cybersec scorer) timed out with 'database is locked'. FTS index is now only updated by the glean scheduler (every 900s) and the manual `build-fts` command — both already call build_fts_index() through glean_dir(). Real-time freshness of watcher-ingested entries in FTS was ~30s before; it's now up to ~15min, which is acceptable. This is the root cause of the persistent 'database is locked' errors blocking the cybersec scorer (issue #9). Closes: #9	2026-06-10 12:42:24 -07:00
pyr0ball	c17c6c42ea	feat(patterns): add audio domain — PipeWire/ALSA xrun and quantum patterns Six new patterns covering the PipeWire + ALSA audio failure modes that surface as crackling/stuttering on Linux desktops: - pipewire_overflow: protocol-pulse OVERFLOW channel messages (confirmed present in Muninn journal — dozens per incident) - pipewire_underrun: pw.node/spa.alsa underrun messages - alsa_xrun: ALSA-level xrun from kernel or ALSA lib (snd_pcm) - pipewire_quantum_mismatch: sample-rate/quantum mismatch detection - pipewire_node_error: PipeWire node failures (device unavailable) - pipewire_jackdbus_missing: harmless JACK probe at INFO — suppresses false positives from daily PipeWire restarts Also adds 'audio' as a valid domain value in the header comment. Companion Robin knowledge doc: circuitforge-plans/robin/known-issues/pipewire-alsa-quantum-xrun.md	2026-06-10 11:33:19 -07:00
pyr0ball	cffe6bcd31	feat: cybersec zero-shot scoring pipeline (#9) Second-pass cybersec classifier using DeBERTa-v3-base-mnli (already cached — no download required). Runs after each anomaly scoring pass on entries flagged by the anomaly scorer or with pattern matches. Architecture: - app/services/cybersec.py: zero-shot-classification pipeline with 5 cybersec candidate labels (auth failure, privilege escalation, network intrusion, malware, data exfiltration). Writes ml_score/ml_label/ml_scored_at to log_entries; inserts high-confidence hits into detections with scorer='cybersec'. - app/tasks/cybersec_scorer.py: async background task (same shape as anomaly_scorer.py). - REST: GET/POST /turnstone/api/cybersec/status|run|detections. GET /turnstone/api/anomaly/detections now accepts scorer= filter. Schema: ml_score, ml_label, ml_scored_at added to log_entries; scorer column added to detections (idempotent migrations + DDL for both SQLite and Postgres). UI: Security Alerts view gains Source dropdown (All / Anomaly / Cybersec) and cybersec scorer status badge. Label dropdown split into optgroups. Deployment: TURNSTONE_CYBERSEC_MODEL/DEVICE/THRESHOLD vars added to .env.example, docker-compose.yml, docker-standalone.sh. Tests: 10 new tests — no model, no eligible entries, scoring, detection creation, normal label suppression, threshold filtering, pattern-tag filtering, idempotency, list filtering, scorer column filter. 416/416 passing. Closes: #9	2026-06-10 01:03:25 -07:00
pyr0ball	6e228fe0bf	feat: security alerts tab — UI view for anomaly detections (#11) New SecurityAlertsView (/alerts route) surfaces the detections table built in #10. Features: - All / Unacknowledged tab filter with live counts - Label dropdown (SECURITY_ANOMALY, SYSTEM_FAILURE, NETWORK_ANOMALY, etc.) - Score confidence bar per detection (colour-coded by threshold) - Acknowledge drawer: full log text, optional notes, in-place row dim on save - Scorer status badge + manual "Run scorer" button - Config warning when TURNSTONE_ANOMALY_MODEL is unset Dashboard: new "Unreviewed Alerts" stat card (red border when > 0) links to /alerts so alerts surface on the landing page without navigating away. Closes: #11	2026-06-10 00:28:15 -07:00
pyr0ball	40694a30e5	chore: wire anomaly scoring pipeline into deployment config Add TURNSTONE_ANOMALY_* env vars to docker-compose.yml, docker-standalone.sh, and .env.example. Mount shared HF model cache (/Library/Assets/LLM on Heimdall) as read-only bind in both compose and standalone — avoids re-downloading models that are already cached by the diagnose pipeline. Heimdall: byviz/bylastic_classification_logs already cached, threshold 0.80, glean-triggered only (TURNSTONE_ANOMALY_INTERVAL=0).	2026-06-09 23:01:48 -07:00
pyr0ball	0693e1fd54	feat: anomaly scoring pipeline (#10) - Add app/services/anomaly.py: batch scorer using HF text-classification pipeline; rewrites anomaly_score/anomaly_label/anomaly_scored_at on log_entries; inserts high-confidence hits into detections table - Add app/tasks/anomaly_scorer.py: background task (same shape as glean_scheduler); triggered after each glean cycle when TURNSTONE_ANOMALY_MODEL is set - DB schema: add anomaly_score/anomaly_label/anomaly_scored_at columns to log_entries (idempotent ALTER TABLE migration); add detections table - Wire scorer into scheduler_loop and glean_scheduler.run_once; no-op when model env var is empty (safe to leave unconfigured) - REST endpoints: GET/POST /api/anomaly/status, /api/anomaly/run, GET /api/anomaly/detections, POST /api/anomaly/detections/{id}/acknowledge - Reuses Hybrid-BERT label map from diagnose/classifier.py; works with any HF text-classification model - 12 new tests; 406/406 passing Closes: #10	2026-06-09 11:15:13 -07:00
pyr0ball	0311d72e53	feat: dual-backend SQLite/Postgres + multi-tenant source namespacing - Add app/db/ abstraction layer: Backend enum, DbConn wrapper, dialect helper (q() for ? vs %s paramstyle), get_conn(), tenant_id() - Auto-detect backend from DATABASE_URL; SQLite remains default when unset — no config change for local deployments - Add tenant_id column to all three logical DBs (main, context, incidents); idempotent ALTER TABLE migration runs before schema scripts on existing DBs - All INSERTs inject tenant_id; SELECTs use (tenant_id = ? OR tenant_id = '') for backward compat with pre-namespacing rows - Add docker-compose.yml with named volume turnstone_pgdata (survives rebuilds) and optional external Postgres support via DATABASE_URL override - Add scripts/migrate_sqlite_to_postgres.py — one-shot idempotent migration for existing SQLite data; ON CONFLICT DO NOTHING for safe re-runs - Fix SSH glean path in pipeline.py to use ensure_schema + get_conn (was still using raw sqlite3.connect + old _SCHEMA without tenant_id) - Fix FTS5 JOIN ambiguity: qualify repeat_count as f.repeat_count in search - Update all tests to use ensure_*_schema fixtures; add row_factory where needed - 394/394 tests passing Closes: #42 Closes: #50	2026-06-08 08:37:54 -07:00
pyr0ball	1de156ebde	fix: reset browser UA button chrome for dark mode HTML buttons get a ~#efefef background and 2px outset border from the browser UA stylesheet. In light mode these blend in; in dark mode they render as stark white boxes. Adding a global button reset in theme.css clears the UA defaults — explicit bg-* utility classes still win. Affects: theme toggle, hamburger nav button, dashboard diagnose buttons, and all other icon/text buttons that had no explicit bg class. Bumps version to 0.6.2.	2026-06-05 09:55:08 -07:00
pyr0ball	93975dcc0c	fix: settings page CSS — selected card bg and toggle switch thumb - Replace bg-accent/10 with bg-accent-muted on selected radio cards (opacity modifiers on CSS variable colors are silently dropped by UnoCSS, causing full-opacity solid blue backgrounds) - Add explicit left-0.5 to toggle switch thumb and set off-state to translate-x-0 — without an explicit left the browser auto-placed the thumb 18px inside the track, causing 14px overflow when translated on	2026-06-02 11:54:35 -07:00
pyr0ball	876cfb9a63	fix: group journal sources by prefix:host stem in source health source_ids with 3+ colon segments (e.g. muninn-journal:Muninn:ssh.service) are now aggregated by their prefix:host key at the SQL level in both list_sources() and stats_summary(). This collapses ~19K transient systemd unit rows (crash-loop scope entries from Muninn) into ~24 grouped rows. - list_sources: SQL CASE/INSTR group-by stem + unit_count field - stats_summary: same stem grouping for dashboard source health table - delete endpoint: LIKE-based cascade delete covers grouped stems - SourcesView: unit_count badge (e.g. "2686 units") on grouped rows; delete confirmation names the unit count when deleting a group - Bump version to v0.6.1	2026-06-02 04:35:26 -07:00
pyr0ball	9cd7450591	chore: bump version to 0.6.0 Release summary: - #60 split incidents tables to turnstone-incidents.db (eliminates FTS5 write lock starvation) - #41 Hybrid-BERT label mapping shim (7-class vocabulary support in classifier) - #15 hybrid BM25 + vector re-ranking for diagnose search (semantic=True, alpha=0.6/beta=0.4) - #32 domain-view mapping: 42 patterns annotated across 10 domains, by_domain in diagnose summary	2026-06-01 20:52:35 -07:00
pyr0ball	ce2a2b55a6	Merge feat/32-domain-view: domain-view mapping for patterns and diagnose output (#32)	2026-06-01 20:01:19 -07:00
pyr0ball	eac9a4ba28	Merge feat/15-hybrid-rag: hybrid BM25 + vector re-ranking for diagnose search (#15)	2026-06-01 20:00:02 -07:00
pyr0ball	cfddff6a2a	Merge feat/41-hybrid-bert-shim: Hybrid-BERT label mapping shim (#41)	2026-06-01 19:59:34 -07:00
pyr0ball	48816f4ef3	Merge feat/60-incidents-db: split incidents tables to dedicated DB (#60)	2026-06-01 19:58:49 -07:00
pyr0ball	b1f3d68724	feat: domain-view mapping for patterns and diagnose output (#32) Adds a domain: field to the pattern taxonomy and surfaces per-domain hit counts in diagnose summaries for faster triage. Changes: - LogPattern gains domain: str = "" (backward-compatible default) - load_patterns() reads domain from YAML via p.get("domain", "") - All 42 patterns in default.yaml annotated across 10 domains: service_health | networking | auth | storage | memory | kernel | power | web_proxy | media | gpu - _pattern_domain dict built at startup from compiled patterns - _domain_counts() helper: maps matched_patterns tags to domains, counts hits per domain across a result set - diagnose POST: summary includes by_domain: {domain: count} - diagnose stream: summary SSE event includes by_domain when pattern_domain is provided (passed from rest.py at startup) - /api/search gains ?domain= filter: post-filters results to entries whose matched_patterns include at least one tag in the given domain Test fixtures: patch _pattern_domain={} and CONTEXT_DB_PATH in test_blocklist_endpoints.py and test_glean_tautulli.py (worktree has no data/ dir; same fix as feat/60-incidents-db). 372 tests passing. Closes: #32	2026-06-01 19:57:16 -07:00
pyr0ball	1abdcfb1f3	feat: hybrid BM25 + vector re-ranking for diagnose search (#15) Adds late-fusion hybrid search to Turnstone's log retrieval layer: hybrid_score = 0.6 * bm25_normalized + 0.4 * cosine_similarity Implementation: - _bm25_search() extracts the existing FTS5 BM25 path as a named helper - _hybrid_search() fetches an oversized BM25 candidate pool (5x limit, min 100), embeds the query and each candidate text in-process via the existing embeddings service, normalizes BM25 rank to [0,1], combines with cosine similarity, and re-ranks - search() gains a semantic parameter (default False) that dispatches to _hybrid_search() when True; pure BM25 remains the default for all existing call sites - diagnose_stream() enables semantic=True so symptom-based queries ("database connection failed") surface semantically equivalent entries ("ECONNREFUSED", "backend gone away", "max retries exceeded") - /api/search REST endpoint exposes ?semantic=true query param Graceful degradation: falls back silently to pure BM25 when the embedding backend is unavailable (EMBEDDING_AVAILABLE=False) or when embed_batch raises an exception. No new infra — in-process numpy cosine, no vector DB. 11 new tests: BM25 helper, hybrid re-ranking, fallback paths, dispatcher. 372 + 11 = 383 tests passing. Closes: #15	2026-06-01 18:13:09 -07:00
pyr0ball	503a36d76c	feat(classifier): add Hybrid-BERT label mapping shim (#41) Adds _HYBRID_BERT_LABEL_MAP to translate the 7-class output vocabulary of krishnas4415/log-anomaly-detection-models (Hybrid-BERT, MIT) to Turnstone SeverityLabel. _map_label now checks the Hybrid-BERT map before the standard map so either model family works via TURNSTONE_CLASSIFIER_MODEL without any additional code path. Mapping (confirmed from model config.json): normal → INFO, security_anomaly → ERROR, system_failure → CRITICAL, performance_issue → WARN, network_anomaly → WARN, config_error → ERROR, hardware_issue → CRITICAL. Keyword-based CRITICAL promotion and low-confidence DEBUG demotion apply on top of the base mapping (same rules as the standard vocabulary). 11 new tests covering all 7 Hybrid-BERT labels, case-insensitivity, and regression on standard-vocabulary labels. 372 tests passing total. Note: custom loading code for the non-standard .pt checkpoint format is explicitly out of scope — evaluate better-packaged HF alternatives first (see #41 for candidate list). Closes: #41	2026-06-01 16:20:31 -07:00
pyr0ball	bd3923e163	fix: split incidents tables to dedicated turnstone-incidents.db (#60) FTS5 bulk-insert write locks starved the incident API and bundle endpoints during log bursts (sonarr/radarr, high-volume docker sources). Fix mirrors the context_facts split (context -> turnstone-context.db): - Add INCIDENTS_DB_PATH / TURNSTONE_INCIDENTS_DB env var in rest.py - Add _INCIDENTS_SCHEMA, ensure_incidents_schema(), and migrate_incidents_to_dedicated_db() in glean/pipeline.py - Stub out incidents/received_bundles/sent_bundles in _SCHEMA (no-op CREATE IF NOT EXISTS) so legacy single-file deployments still open - Thread incidents_db_path through diagnose_stream -> run_pipeline -> FalsePositiveSuppressor.suppress -> _fetch_resolved_incidents - One-shot migration on startup: copy existing rows from main DB to incidents DB via INSERT OR IGNORE (idempotent, safe to re-run) - Fix test_blocklist_endpoints fixtures to patch CONTEXT_DB_PATH and INCIDENTS_DB_PATH alongside DB_PATH (worktree has no data/ dir) 372 tests passing. Closes: #60	2026-06-01 15:54:23 -07:00
pyr0ball	1131816666	feat: bundle PII sanitization, onboarding wizard, NL source addition (#51, #52, #53) Bundle export (#51): - _redact_text() with 5 compiled regex patterns (IPv4, email, user=, host=, password=) - build_bundle(sanitize=False) — per-entry redaction at export time - sent_bundles table tracks every outgoing export (GET and POST /send) - GET /api/sent-bundles exposes history; SentBundle model added - BundlesView: Received/Sent tabs, sanitized badge, 5-entry preview, re-download - IncidentsView: Sanitize PII checkbox next to Send Bundle Onboarding wizard (#52): - app/services/discover.py: journald/Docker/file detection (best-effort, safe in containers) - GET /api/setup/status, /discover, POST /api/setup/write (additive, appends to existing) - SetupWizard.vue: 3-step Detect → Select → Confirm - Step 1 shows grouped summary (journald/file/docker counts) - Step 2: collapsible groups with All/None section toggles - journald + file: pre-selected; docker: collapsed, none pre-selected - Step 3: YAML preview before write - SourcesView: shows wizard on first run; Add Source button reuses it NL source addition (#53): - app/services/nl_source.py: keyword shortcut (13 well-known apps) + LLM fallback - POST /api/setup/interpret: keyword → LLM → null (graceful fallback) - NL field in wizard step 2; manual form shown when interpretation fails - Added sources appear in grouped list immediately	2026-05-29 14:14:28 -07:00
pyr0ball	054ebfa0e3	feat(diagnose): tech-level post-processor, offline mode, API auth, context harvest - synthesizer: 3 system prompts (sysadmin/homelab/executive) selected by tech_level pref - settings: tech_level selector (UI + backend) persisted in preferences.json - QuickCapture: shows active level label in diagnosis card header - TURNSTONE_OFFLINE_MODE=1: sets HF_HUB_OFFLINE + TRANSFORMERS_OFFLINE before lib load - TURNSTONE_API_KEY: bearer token auth on all /api/ routes (hmac.compare_digest) - /health always open; unset key = no auth (backward compatible) - docs/air-gapped-deployment.md: full offline deployment guide - scripts/harvest_docs.py: generalized context doc bulk-uploader with manifest support - scripts/manifests/: heimdall-devops.yaml (10 docs ingested) + example.yaml template - fix: _ingest_upload -> _glean_upload in context doc upload endpoint (was 500) Closes: #56 Closes: #45 Closes: #47 Closes: #49 Closes: #21	2026-05-28 08:51:05 -07:00
pyr0ball	73a14bd782	fix(diagnose): add max_tokens to all LLM calls; fix reasoning card contrast Truncation fix: call_llm() in _llm_client.py now accepts max_tokens (default 2048) and passes it in both the cf-orch task payload and the OpenAI-compat fallback body. Hypothesizer uses max_tokens=1024 (JSON array output); synthesizer and legacy summarize use 2048 (structured 5-section narrative). Without this, backends use their own default (often 512 tokens), causing mid-sentence truncation of the diagnosis output. UI fix: reasoning card changed from bg-accent/5 border-accent/30 (opacity modifiers on CSS variables don't compose reliably across themes) to the callout pattern: bg-surface-raised with a solid border-l-4 border-accent. Header label changed from text-text-dim to text-accent for visual anchoring. Text remains text-text-primary for guaranteed contrast on both light and dark themes. Tracks: #56 (technical-level post-processor, filed as follow-on feature)	2026-05-27 22:23:36 -07:00
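The truncation fix threads one `max_tokens` value into both request shapes. The sketch below assumes the cf-orch task field names, which the commit does not give. `build_llm_payloads` is a hypothetical stand-in for the payload-building part of `call_llm()`. Only the OpenAI-compatible body uses standard field names.

```python
from typing import Any

DEFAULT_MAX_TOKENS = 2048  # default named in the commit


def build_llm_payloads(
    messages: list[dict[str, str]],
    model: str,
    max_tokens: int = DEFAULT_MAX_TOKENS,
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (cf-orch task payload, OpenAI-compatible body), both carrying max_tokens."""
    # Hypothetical cf-orch task shape; the real field names are an assumption.
    orch_task = {"task": "chat", "model": model, "messages": messages, "max_tokens": max_tokens}
    # OpenAI-compatible chat completions body: model, messages and max_tokens
    # are standard request fields.
    openai_body = {"model": model, "messages": messages, "max_tokens": max_tokens}
    return orch_task, openai_body


# Per-stage budgets from the commit.
HYPOTHESIZER_MAX_TOKENS = 1024  # JSON array output
SYNTHESIZER_MAX_TOKENS = 2048   # five-section narrative
```

Setting the limit explicitly in both paths means neither backend falls back to its own (often smaller) default.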
pyr0ball	7f49961ec4	fix(db): add timeout=30s to all sqlite3.connect() calls across app Watcher, REST endpoints, services (search, incidents, blocklist), MCP server, context retriever, embedder, glean_scheduler, and doc_upload all used the default 5-second SQLite busy timeout. During collect glean write phases, watcher flush threads were hitting 'database is locked' errors when the glean held the write lock longer than 5 seconds. All connections now use timeout=30.0, matching the pipeline fix from commit `5a9281a`. No logic changes.	2026-05-26 23:12:48 -07:00
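The change amounts to passing `timeout=30.0` to every `sqlite3.connect()` call. The commit edits each call site directly. The shared helper below (`open_db`) is a hypothetical way to keep the value consistent, not what the commit does.

```python
import sqlite3

# The commit standardizes on 30 seconds; Python's sqlite3 default is 5.0.
SQLITE_TIMEOUT_S = 30.0


def open_db(db_path: str) -> sqlite3.Connection:
    """Hypothetical shared helper wrapping sqlite3.connect with the longer timeout."""
    # timeout is how long the driver waits on a locked database before raising
    # sqlite3.OperationalError ("database is locked").
    return sqlite3.connect(db_path, timeout=SQLITE_TIMEOUT_S)
```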
pyr0ball	5a9281a686	fix(glean): add timeout=30s to all pipeline DB connections; add --force flag; new patterns. pipeline.py: - Add timeout=30.0 to all sqlite3.connect() calls (5 total). Previously only ensure_context_schema() had it. The main glean writers would fail immediately under lock contention from the live watcher or concurrent manual glean runs. glean_corpus.py: - Add --force flag (passed through to glean_sources/glean_file/glean_dir). Without it, unchanged-fingerprint files were silently skipped even after pattern updates. Use after editing patterns/default.yaml. patterns/default.yaml: - Add 9 new patterns for Muninn / cluster-wide coverage: vpn_tunnel_fail (WireGuard/tunnel service failures), vpn_handshake (WireGuard peer handshake events), dns_degraded (systemd-resolved DNS fallback/degradation), nvidia_api_mismatch (NVIDIA kernel module vs userspace mismatch), nvidia_xid (NVIDIA Xid GPU hardware faults), nvidia_gpu_reset (NVIDIA GPU reset / NVLink faults), acpi_error (ACPI firmware _DSM evaluation failures), thermal_throttle (CPU/GPU thermal throttling / RAPL unavailable), undervoltage (PSU undervoltage / brownout events) - Sync from /devl/turnstone-cluster/patterns/default.yaml (authoritative live copy updated first; repo copy was stale)	2026-05-26 22:36:45 -07:00
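Fingerprint-based skipping explains why pattern edits were ignored until `--force` existed. The sketch below assumes a SHA-256 content fingerprint and an in-memory `seen` map. The real fingerprint and where it is stored are not described. `should_glean` is hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Hypothetical fingerprint: SHA-256 of the source contents."""
    return hashlib.sha256(data).hexdigest()


def should_glean(source_id: str, data: bytes, seen: dict[str, str], force: bool = False) -> bool:
    """Return True if the source should be (re)processed, and record its fingerprint.

    Without force, a source whose fingerprint matches the previous run is skipped,
    even if the pattern file has changed since. force=True bypasses the check so
    the updated patterns are applied to unchanged files.
    """
    fp = fingerprint(data)
    if not force and seen.get(source_id) == fp:
        return False
    seen[source_id] = fp
    return True
```

Because the fingerprint covers only the log file, a pattern change is invisible to the skip check. That is why the commit recommends `--force` after editing `patterns/default.yaml`.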
pyr0ball	09b4912c8e	fix(cluster): add Muninn to SSH collection, fix ingest_corpus → glean_corpus rename - Add [muninn] to NODES map in collect_cluster_logs.sh Muninn is accessible via WireGuard (ssh muninn). One-time 7-day backfill already gleaned: 262,659 entries. - Fix broken script reference: ingest_corpus.py was renamed to glean_corpus.py — ongoing cluster glean was silently broken since the rename	2026-05-26 17:02:53 -07:00
pyr0ball	74e0d5fcd6	docs(container): fix GPU_SERVER_URL for Contributor2 (use public orch.circuitforge.tech). Contributor2's example-node.tv has no WireGuard route to Heimdall's LAN (10.1.10.x), so the <YOUR_HOST_IP>:7700 private address is unreachable from there. Use the public cf-orch endpoint instead: GPU_SERVER_URL=https://orch.circuitforge.tech. Contributor's Huginn does have a WireGuard route to Heimdall's LAN, so <YOUR_HOST_IP>:7700 stays correct there. Added both options to docker-standalone.sh for clarity.	2026-05-26 13:39:38 -07:00
pyr0ball	3a83e0e31d	feat(container): add docker-standalone.sh for Docker hosts (Contributor/Huginn) Mirrors podman-standalone.sh for Docker-native setups. Key differences: - Uses ~/turnstone as default REPO_DIR (no /opt assumption) - -p 8534:8534 port mapping instead of --net=host - No systemd unit generation (Docker --restart=unless-stopped handles reboots) - Volume mounts without :Z (Docker SELinux labeling differs from Podman). Documents the multi-agent setup steps for Huginn: `export GPU_SERVER_URL=http://<YOUR_HOST_IP>:7700`; `export TURNSTONE_MULTI_AGENT_DIAGNOSE=true`; `bash ~/turnstone/docker-standalone.sh`	2026-05-26 13:21:54 -07:00
pyr0ball	2a4a5a5152	feat(container): multi-agent env vars, HF cache mount, and ML deps. podman-standalone.sh: - Add HF_CACHE_DIR=/opt/turnstone/hf-cache with mkdir guard - Mount HF_HOME=/hf-cache so model weights persist across restarts - Forward all multi-agent env vars (TURNSTONE_MULTI_AGENT_DIAGNOSE, GPU_SERVER_URL, TURNSTONE_CLASSIFIER_MODEL, TURNSTONE_EMBED_*) - Add documentation comments for Contributor/Contributor2 remote instance setup. requirements.txt: - Add torch (CPU-only), transformers, sentence-transformers for the 5-stage multi-agent diagnose pipeline (classifier + suppressor stages) - Use --extra-index-url for the CPU wheel to keep the image ~2GB lighter - Both modules keep ImportError guards so the server starts without them, but container images should ship fully capable	2026-05-26 13:20:26 -07:00
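The "ImportError guard" idea lets the server start when the heavy ML packages are missing and enable the ML stages only when they import cleanly. The sketch below centralizes that in a hypothetical `optional_import` helper. The module names are the standard import names for the PyPI packages. `ML_STAGES_AVAILABLE` is an assumed flag name.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(module_name: str) -> Optional[ModuleType]:
    """Import a module if it is installed; return None instead of raising ImportError."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Heavy dependencies for the classifier/suppressor stages.
torch = optional_import("torch")
transformers = optional_import("transformers")

# Hypothetical flag that the pipeline could check before running the ML stages.
ML_STAGES_AVAILABLE = torch is not None and transformers is not None
```

With this design, a container image that ships the packages gets the full pipeline. A bare install degrades to the non-ML path instead of crashing at import time.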
pyr0ball	3cfd587d16	fix: separate context KB into own SQLite file to eliminate write-lock contention context_facts, context_documents, and context_chunks now live in turnstone-context.db (sibling of turnstone.db). The glean scheduler held write locks on the main DB long enough to cause 5-second timeout failures on context fact inserts; separate files have independent WAL write locks so they never contend. Changes: - pipeline.py: extract _CONTEXT_SCHEMA + ensure_context_schema() - rest.py: CONTEXT_DB_PATH (TURNSTONE_CONTEXT_DB env var, defaults to sibling file); init via ensure_context_schema(); all context routes pass CONTEXT_DB_PATH; diagnose_stream receives context_db_path kwarg - diagnose/__init__.py: diagnose_stream() accepts context_db_path (falls back to db_path for backward compat); retrieve_context uses it - store.py: sqlite3.connect() timeout=30.0 — Python driver retry loop is independent of PRAGMA busy_timeout; needed for any remaining contention during test or single-file deployments Closes: #42	2026-05-25 21:19:32 -07:00
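The two path rules in this commit can be sketched as small pure functions. The first rule is that `TURNSTONE_CONTEXT_DB` overrides the location and otherwise a sibling `turnstone-context.db` is used. The second is that `diagnose_stream()` falls back to `db_path` when no context path is passed. Both function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_context_db_path(main_db_path: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Env var override first, else a turnstone-context.db next to the main DB."""
    env = os.environ if env is None else env
    override = env.get("TURNSTONE_CONTEXT_DB")
    if override:
        return override
    return str(Path(main_db_path).with_name("turnstone-context.db"))


def pick_context_db(db_path: str, context_db_path: Optional[str] = None) -> str:
    """Backward compat: callers that pass no context path keep using the main DB."""
    return context_db_path or db_path
```

Keeping the context tables in a separate file gives them an independent SQLite write lock. Long glean writes on `turnstone.db` therefore no longer block context-fact inserts.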
pyr0ball	e851099e5c	fix(hypothesizer): extract first JSON array to handle reasoning model double-output Reasoning models (e.g. foundation-sec-8b) emit valid JSON then repeat it inside a markdown fence block. json.loads() fails on the combined text. extract_first_json_array() scans for the first '[' and walks to its matching ']' with proper string/escape/nesting handling, then returns just that slice. Combined with strip_json_fences(), this handles all observed output patterns: - bare JSON array (standard models) - fenced JSON array (fence-wrapping models) - bare array followed by fenced repeat (reasoning models)	2026-05-25 21:01:14 -07:00
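The commit describes a specific scan: find the first `[` and walk to its matching `]`. Brackets inside string literals are ignored, and backslash escapes are respected. Below is a minimal sketch of that technique. It is not the project's `extract_first_json_array()`, and the `parse_first_json_array` wrapper is hypothetical.

```python
import json
from typing import Optional


def extract_first_json_array(text: str) -> Optional[str]:
    """Return the first balanced [...] slice of text, or None if there is none."""
    start = text.find("[")
    if start == -1:
        return None
    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            if escaped:
                escaped = False          # this character was escaped; skip it
            elif ch == "\\":
                escaped = True           # next character is escaped
            elif ch == '"':
                in_string = False        # end of string literal
            continue
        if ch == '"':
            in_string = True
        elif ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return None  # unterminated array


def parse_first_json_array(text: str) -> list:
    """Hypothetical wrapper: parse only the first array, ignoring any repeated copy."""
    snippet = extract_first_json_array(text)
    if snippet is None:
        raise ValueError("no complete JSON array found")
    return json.loads(snippet)
```

For reasoning-model output of the form `[...]` followed by a fenced repeat of the same array, the scan stops at the first matching `]`. `json.loads()` never sees the duplicate.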

170 commits