Commit graph

11 commits

Author SHA1 Message Date
ce2a2b55a6 Merge feat/32-domain-view: domain-view mapping for patterns and diagnose output (#32) 2026-06-01 20:01:19 -07:00
b1f3d68724 feat: domain-view mapping for patterns and diagnose output (#32)
Adds a domain: field to the pattern taxonomy and surfaces per-domain
hit counts in diagnose summaries for faster triage.

Changes:
- LogPattern gains domain: str = "" (backward-compatible default)
- load_patterns() reads domain from YAML via p.get("domain", "")
- All 42 patterns in default.yaml annotated across 10 domains:
    service_health | networking | auth | storage | memory |
    kernel | power | web_proxy | media | gpu
- _pattern_domain dict built at startup from compiled patterns
- _domain_counts() helper: maps matched_patterns tags to domains,
  counts hits per domain across a result set
- diagnose POST: summary includes by_domain: {domain: count}
- diagnose stream: summary SSE event includes by_domain when
  pattern_domain is provided (passed from rest.py at startup)
- /api/search gains ?domain= filter: post-filters results to entries
  whose matched_patterns include at least one tag in the given domain

Test fixtures: patch _pattern_domain={} and CONTEXT_DB_PATH in
test_blocklist_endpoints.py and test_glean_tautulli.py (worktree
has no data/ dir; same fix as feat/60-incidents-db).

372 tests passing.

Closes: #32
2026-06-01 19:57:16 -07:00
bd3923e163 fix: split incidents tables to dedicated turnstone-incidents.db (#60)
FTS5 bulk-insert write locks starved the incident API and bundle endpoints
during log bursts (sonarr/radarr, high-volume docker sources). Fix mirrors
the context_facts split (context -> turnstone-context.db):

- Add INCIDENTS_DB_PATH / TURNSTONE_INCIDENTS_DB env var in rest.py
- Add _INCIDENTS_SCHEMA, ensure_incidents_schema(), and
  migrate_incidents_to_dedicated_db() in glean/pipeline.py
- Stub out incidents/received_bundles/sent_bundles in _SCHEMA (no-op
  CREATE IF NOT EXISTS) so legacy single-file deployments still open
- Thread incidents_db_path through diagnose_stream -> run_pipeline ->
  FalsePositiveSuppressor.suppress -> _fetch_resolved_incidents
- One-shot migration on startup: copy existing rows from main DB to
  incidents DB via INSERT OR IGNORE (idempotent, safe to re-run)
- Fix test_blocklist_endpoints fixtures to patch CONTEXT_DB_PATH and
  INCIDENTS_DB_PATH alongside DB_PATH (worktree has no data/ dir)

372 tests passing.

Closes: #60
2026-06-01 15:54:23 -07:00
1131816666 feat: bundle PII sanitization, onboarding wizard, NL source addition (#51, #52, #53)
Bundle export (#51):
- _redact_text() with 5 compiled regex patterns (IPv4, email, user=, host=, password=)
- build_bundle(sanitize=False) — per-entry redaction at export time
- sent_bundles table tracks every outgoing export (GET and POST /send)
- GET /api/sent-bundles exposes history; SentBundle model added
- BundlesView: Received/Sent tabs, sanitized badge, 5-entry preview, re-download
- IncidentsView: Sanitize PII checkbox next to Send Bundle

Onboarding wizard (#52):
- app/services/discover.py: journald/Docker/file detection (best-effort, safe in containers)
- GET /api/setup/status, /discover, POST /api/setup/write (additive, appends to existing)
- SetupWizard.vue: 3-step Detect → Select → Confirm
  - Step 1 shows grouped summary (journald/file/docker counts)
  - Step 2: collapsible groups with All/None section toggles
    - journald + file: pre-selected; docker: collapsed, none pre-selected
  - Step 3: YAML preview before write
- SourcesView: shows wizard on first run; Add Source button reuses it

NL source addition (#53):
- app/services/nl_source.py: keyword shortcut (13 well-known apps) + LLM fallback
- POST /api/setup/interpret: keyword → LLM → null (graceful fallback)
- NL field in wizard step 2; manual form shown when interpretation fails
- Added sources appear in grouped list immediately
2026-05-29 14:14:28 -07:00
7f49961ec4 fix(db): add timeout=30s to all sqlite3.connect() calls across app
Watcher, REST endpoints, services (search, incidents, blocklist),
MCP server, context retriever, embedder, glean_scheduler, and
doc_upload all used the default 5-second SQLite busy timeout.
During collect glean write phases, watcher flush threads were hitting
'database is locked' errors when the glean held the write lock longer
than 5 seconds.

All connections now use timeout=30.0, matching the pipeline fix
from commit 5a9281a. No logic changes.
2026-05-26 23:12:48 -07:00
5a9281a686 fix(glean): add timeout=30s to all pipeline DB connections; add --force flag; new patterns
pipeline.py:
- Add timeout=30.0 to all sqlite3.connect() calls (5 total).
  Previously only ensure_context_schema() had it. The main glean
  writers would fail immediately under lock contention from the live
  watcher or concurrent manual glean runs.

glean_corpus.py:
- Add --force flag (passed through to glean_sources/glean_file/glean_dir).
  Without it, unchanged-fingerprint files were silently skipped even
  after pattern updates. Use after editing patterns/default.yaml.

patterns/default.yaml:
- Add 9 new patterns for Muninn / cluster-wide coverage:
    vpn_tunnel_fail     WireGuard/tunnel service failures
    vpn_handshake       WireGuard peer handshake events
    dns_degraded        systemd-resolved DNS fallback/degradation
    nvidia_api_mismatch NVIDIA kernel module vs userspace mismatch
    nvidia_xid          NVIDIA Xid GPU hardware faults
    nvidia_gpu_reset    NVIDIA GPU reset / NVLink faults
    acpi_error          ACPI firmware _DSM evaluation failures
    thermal_throttle    CPU/GPU thermal throttling / RAPL unavailable
    undervoltage        PSU undervoltage / brownout events
- Sync from /devl/turnstone-cluster/patterns/default.yaml (authoritative
  live copy updated first; repo copy was stale)
2026-05-26 22:36:45 -07:00
3cfd587d16 fix: separate context KB into own SQLite file to eliminate write-lock contention
context_facts, context_documents, and context_chunks now live in
turnstone-context.db (sibling of turnstone.db).  The glean scheduler
held write locks on the main DB long enough to cause 5-second timeout
failures on context fact inserts; separate files have independent WAL
write locks so they never contend.

Changes:
- pipeline.py: extract _CONTEXT_SCHEMA + ensure_context_schema()
- rest.py: CONTEXT_DB_PATH (TURNSTONE_CONTEXT_DB env var, defaults to
  sibling file); init via ensure_context_schema(); all context routes
  pass CONTEXT_DB_PATH; diagnose_stream receives context_db_path kwarg
- diagnose/__init__.py: diagnose_stream() accepts context_db_path
  (falls back to db_path for backward compat); retrieve_context uses it
- store.py: sqlite3.connect() timeout=30.0 — Python driver retry loop
  is independent of PRAGMA busy_timeout; needed for any remaining
  contention during test or single-file deployments

Closes: #42
2026-05-25 21:19:32 -07:00
6fec294a53 feat: fingerprint-based incremental glean — skip unchanged files (#30)
- Add glean_fingerprints table to schema (sha256 + mtime + size)
- _fingerprint(), _fp_unchanged(), _save_fingerprint() helpers in pipeline.py
- _glean_files() now checks fingerprint; skips file if hash unchanged
- force=True param threads through glean_dir → glean_file → glean_sources
- POST /api/tasks/glean and POST /api/sources/{id}/glean accept force=true
- 14 unit tests in tests/test_glean_fingerprint.py, all passing

Closes: #30
2026-05-25 11:01:18 -07:00
41fc89c474 feat: SSH remote glean — transport layer, pipeline integration, REST + UI (#22)
Closes turnstone#22.

## Transport layer (app/glean/ssh.py)
- SSHTransport context manager: key-only auth, paramiko backend
- SSHConnectionError / SSHCommandError exception hierarchy
- exec_stream() generator: yields stdout lines, raises SSHCommandError on
  non-zero exit (isinstance(int) guard for test-mock safety)
- Command builders: _build_journald_command, _build_syslog_command,
  _build_plaintext_command, _build_docker_command
- 18 unit tests in tests/test_glean_ssh.py

## Pipeline integration (app/glean/pipeline.py)
- _stream_and_write(): per-item error isolation — SSHCommandError skips
  one glean item without aborting the rest of the host connection
- _glean_ssh_source(): one SSHTransport per host, dispatches all glean
  items (journald/syslog/plaintext/docker); SSHConnectionError aborts host
- glean_sources(): splits local vs SSH sources; local → _glean_files();
  SSH → _glean_ssh_source(); shared compiled patterns and DB connection
- glean_ssh_source(): public wrapper for REST use — manages DB connection,
  pattern compilation, FTS rebuild lifecycle
- 15 integration tests in tests/test_glean_pipeline_ssh.py
- All 285 tests passing

## REST layer (app/rest.py)
- GET /api/sources/configured: reads sources.yaml and enriches with DB
  stats; SSH sources appear before first glean (entry_count=0); sub-source
  IDs (rack01/journald, rack01/docker/myapp) aggregated per host entry
- POST /api/sources/{id}/glean: detects transport:ssh and dispatches to
  glean_ssh_source() wrapper; local sources unchanged
- Import: glean_ssh_source as _glean_ssh_source

## Frontend (web/src/views/SourcesView.vue)
- Fetches /api/sources/configured (primary) + /api/sources (DB-only) in
  parallel; merges into unified SourceRow list
- SSH sources show: ssh badge (with user@host tooltip), glean-type pills
  (journald/syslog/docker/etc.), host subtitle
- SSH sub-source IDs (rack01/journald) suppressed from the DB-only list
  since they are covered by the parent SSH row
- DB-only sources (uploads) appear below configured sources with 'uploaded'
  badge; reglean button disabled (not in sources.yaml)
- Delete zeroes out configured-source stats in-place rather than removing
  the row (so the source remains visible for re-gleaning)
2026-05-21 12:37:30 -07:00
39c13f39ba feat: SSH remote host glean — transport layer and pipeline integration (closes #22, backend)
Adds SSH-based log collection from remote hosts via Paramiko.
One SSH connection per host, multiple log types per connection.

New files:
- app/glean/ssh.py: SSHTransport context manager + command builders
  for journald, syslog, plaintext, and docker log types
- tests/test_glean_ssh.py: 18 tests for transport layer (all mocked)
- tests/test_glean_pipeline_ssh.py: 15 tests for pipeline integration

Pipeline changes (app/glean/pipeline.py):
- glean_sources() now splits sources into local-file and SSH categories
- SSH sources use transport: ssh + glean: list schema in sources.yaml
- _glean_ssh_source(): one SSHTransport per host, N commands per connection
- _stream_and_write(): SSHCommandError caught per-item so one bad
  command does not abort the rest of the host's glean items
- SSHConnectionError skips the entire host with a warning log

SSH source schema (sources.yaml):
  - id: rack01
    transport: ssh
    host: 192.168.1.10
    user: admin
    key_path: ~/.ssh/id_ed25519
    glean:
      - type: journald
        args: [--since, 2 hours ago]
      - type: syslog
        path: /var/log/syslog
      - type: plaintext
        path: /var/log/app/error.log
      - type: docker
        containers: [myapp, nginx]

Key design decisions:
- Key-based auth only (no password prompts in daemon context)
- exit-status check fires after all stdout lines yielded; callers
  drain the iterator to trigger it
- Local file sources path unchanged; SSH sources co-exist in same yaml
- Docker multi-container: one exec_stream call per container,
  source_id scoped as host_id/type/container_name

Remaining for #22: REST endpoint, SourcesView UI, sources.yaml docs.
285 → 285 tests passing (33 new SSH tests).
2026-05-20 23:03:13 -07:00
828b69768a refactor: rename ingest → glean throughout codebase
Renames the app/ingest/ package to app/glean/ and updates all
references across Python modules, shell scripts, Vue components,
tests, and documentation.

Intentionally preserved:
- SQLite column name ingest_time (avoids schema migration)
- RetrievedEntry.ingest_time field (maps to the column above)
- Any public-facing JSON keys that reference ingest_time

Changes by category:
- app/ingest/ → app/glean/ (full package move, all parsers)
- app/tasks/ingest_scheduler.py → app/tasks/glean_scheduler.py
- scripts/ingest_corpus.py → scripts/glean_corpus.py
- tests/test_ingest_*.py → tests/test_glean_*.py
- Docstrings, log messages, comments: ingest → glean
- Env var: TURNSTONE_INGEST_INTERVAL → TURNSTONE_GLEAN_INTERVAL
- Shell scripts: glean.log, glean_corpus.py references
- README.md: multi-source ingest → multi-source glean
- .env.example: updated env var name
- patterns/: new diagnostic patterns from 2026-05-20 SSH incident
  (service_crash_loop, pkg_daemon_restart, ssh_forward_conflict)
- SourcesView.vue: pipeline label updated
- All test import paths updated to app.glean.*

285 tests passing.
2026-05-20 23:02:55 -07:00