turnstone

Author	SHA1	Message	Date
pyr0ball	876cfb9a63	fix: group journal sources by prefix:host stem in source health source_ids with 3+ colon segments (e.g. muninn-journal:Muninn:ssh.service) are now aggregated by their prefix:host key at the SQL level in both list_sources() and stats_summary(). This collapses ~19K transient systemd unit rows (crash-loop scope entries from Muninn) into ~24 grouped rows. - list_sources: SQL CASE/INSTR group-by stem + unit_count field - stats_summary: same stem grouping for dashboard source health table - delete endpoint: LIKE-based cascade delete covers grouped stems - SourcesView: unit_count badge (e.g. "2686 units") on grouped rows; delete confirmation names the unit count when deleting a group - Bump version to v0.6.1	2026-06-02 04:35:26 -07:00
pyr0ball	9cd7450591	chore: bump version to 0.6.0 Release summary: - #60 split incidents tables to turnstone-incidents.db (eliminates FTS5 write lock starvation) - #41 Hybrid-BERT label mapping shim (7-class vocabulary support in classifier) - #15 hybrid BM25 + vector re-ranking for diagnose search (semantic=True, alpha=0.6/beta=0.4) - #32 domain-view mapping: 42 patterns annotated across 10 domains, by_domain in diagnose summary	2026-06-01 20:52:35 -07:00
pyr0ball	ce2a2b55a6	Merge feat/32-domain-view: domain-view mapping for patterns and diagnose output (#32)	2026-06-01 20:01:19 -07:00
pyr0ball	eac9a4ba28	Merge feat/15-hybrid-rag: hybrid BM25 + vector re-ranking for diagnose search (#15)	2026-06-01 20:00:02 -07:00
pyr0ball	cfddff6a2a	Merge feat/41-hybrid-bert-shim: Hybrid-BERT label mapping shim (#41)	2026-06-01 19:59:34 -07:00
pyr0ball	48816f4ef3	Merge feat/60-incidents-db: split incidents tables to dedicated DB (#60)	2026-06-01 19:58:49 -07:00
pyr0ball	b1f3d68724	feat: domain-view mapping for patterns and diagnose output (#32) Adds a domain: field to the pattern taxonomy and surfaces per-domain hit counts in diagnose summaries for faster triage. Changes: - LogPattern gains domain: str = "" (backward-compatible default) - load_patterns() reads domain from YAML via p.get("domain", "") - All 42 patterns in default.yaml annotated across 10 domains: service_health | networking | auth | storage | memory | kernel | power | web_proxy | media | gpu - _pattern_domain dict built at startup from compiled patterns - _domain_counts() helper: maps matched_patterns tags to domains, counts hits per domain across a result set - diagnose POST: summary includes by_domain: {domain: count} - diagnose stream: summary SSE event includes by_domain when pattern_domain is provided (passed from rest.py at startup) - /api/search gains ?domain= filter: post-filters results to entries whose matched_patterns include at least one tag in the given domain Test fixtures: patch _pattern_domain={} and CONTEXT_DB_PATH in test_blocklist_endpoints.py and test_glean_tautulli.py (worktree has no data/ dir; same fix as feat/60-incidents-db). 372 tests passing. Closes: #32	2026-06-01 19:57:16 -07:00
pyr0ball	1abdcfb1f3	feat: hybrid BM25 + vector re-ranking for diagnose search (#15 ) Adds late-fusion hybrid search to Turnstone's log retrieval layer: hybrid_score = 0.6 * bm25_normalized + 0.4 * cosine_similarity Implementation: - _bm25_search() extracts the existing FTS5 BM25 path as a named helper - _hybrid_search() fetches an oversized BM25 candidate pool (5x limit, min 100), embeds the query and each candidate text in-process via the existing embeddings service, normalizes BM25 rank to [0,1], combines with cosine similarity, and re-ranks - search() gets semantic=False param that dispatches to _hybrid_search() when True; pure BM25 remains the default for all existing call sites - diagnose_stream() enables semantic=True so symptom-based queries ("database connection failed") surface semantically equivalent entries ("ECONNREFUSED", "backend gone away", "max retries exceeded") - /api/search REST endpoint exposes ?semantic=true query param Graceful degradation: falls back silently to pure BM25 when the embedding backend is unavailable (EMBEDDING_AVAILABLE=False) or when embed_batch raises an exception. No new infra — in-process numpy cosine, no vector DB. 11 new tests: BM25 helper, hybrid re-ranking, fallback paths, dispatcher. 372 + 11 = 383 tests passing. Closes: #15	2026-06-01 18:13:09 -07:00
pyr0ball	503a36d76c	feat(classifier): add Hybrid-BERT label mapping shim (#41 ) Adds _HYBRID_BERT_LABEL_MAP to translate the 7-class output vocabulary of krishnas4415/log-anomaly-detection-models (Hybrid-BERT, MIT) to Turnstone SeverityLabel. _map_label now checks the Hybrid-BERT map before the standard map so either model family works via TURNSTONE_CLASSIFIER_MODEL without any additional code path. Mapping (confirmed from model config.json): normal → INFO security_anomaly → ERROR system_failure → CRITICAL performance_issue → WARN network_anomaly → WARN config_error → ERROR hardware_issue → CRITICAL Keyword-based CRITICAL promotion and low-confidence DEBUG demotion apply on top of the base mapping (same rules as the standard vocabulary). 11 new tests covering all 7 Hybrid-BERT labels, case-insensitivity, and regression on standard-vocabulary labels. 372 tests passing total. Note: custom loading code for the non-standard .pt checkpoint format is explicitly out of scope — evaluate better-packaged HF alternatives first (see #41 for candidate list). Closes: #41	2026-06-01 16:20:31 -07:00
pyr0ball	bd3923e163	fix: split incidents tables to dedicated turnstone-incidents.db (#60 ) FTS5 bulk-insert write locks starved the incident API and bundle endpoints during log bursts (sonarr/radarr, high-volume docker sources). Fix mirrors the context_facts split (context -> turnstone-context.db): - Add INCIDENTS_DB_PATH / TURNSTONE_INCIDENTS_DB env var in rest.py - Add _INCIDENTS_SCHEMA, ensure_incidents_schema(), and migrate_incidents_to_dedicated_db() in glean/pipeline.py - Stub out incidents/received_bundles/sent_bundles in _SCHEMA (no-op CREATE IF NOT EXISTS) so legacy single-file deployments still open - Thread incidents_db_path through diagnose_stream -> run_pipeline -> FalsePositiveSuppressor.suppress -> _fetch_resolved_incidents - One-shot migration on startup: copy existing rows from main DB to incidents DB via INSERT OR IGNORE (idempotent, safe to re-run) - Fix test_blocklist_endpoints fixtures to patch CONTEXT_DB_PATH and INCIDENTS_DB_PATH alongside DB_PATH (worktree has no data/ dir) 372 tests passing. Closes: #60	2026-06-01 15:54:23 -07:00
pyr0ball	1131816666	feat: bundle PII sanitization, onboarding wizard, NL source addition (#51 , #52 , #53 ) Bundle export (#51): - _redact_text() with 5 compiled regex patterns (IPv4, email, user=, host=, password=) - build_bundle(sanitize=False) — per-entry redaction at export time - sent_bundles table tracks every outgoing export (GET and POST /send) - GET /api/sent-bundles exposes history; SentBundle model added - BundlesView: Received/Sent tabs, sanitized badge, 5-entry preview, re-download - IncidentsView: Sanitize PII checkbox next to Send Bundle Onboarding wizard (#52): - app/services/discover.py: journald/Docker/file detection (best-effort, safe in containers) - GET /api/setup/status, /discover, POST /api/setup/write (additive, appends to existing) - SetupWizard.vue: 3-step Detect → Select → Confirm - Step 1 shows grouped summary (journald/file/docker counts) - Step 2: collapsible groups with All/None section toggles - journald + file: pre-selected; docker: collapsed, none pre-selected - Step 3: YAML preview before write - SourcesView: shows wizard on first run; Add Source button reuses it NL source addition (#53): - app/services/nl_source.py: keyword shortcut (13 well-known apps) + LLM fallback - POST /api/setup/interpret: keyword → LLM → null (graceful fallback) - NL field in wizard step 2; manual form shown when interpretation fails - Added sources appear in grouped list immediately	2026-05-29 14:14:28 -07:00
pyr0ball	054ebfa0e3	feat(diagnose): tech-level post-processor, offline mode, API auth, context harvest - synthesizer: 3 system prompts (sysadmin/homelab/executive) selected by tech_level pref - settings: tech_level selector (UI + backend) persisted in preferences.json - QuickCapture: shows active level label in diagnosis card header - TURNSTONE_OFFLINE_MODE=1: sets HF_HUB_OFFLINE + TRANSFORMERS_OFFLINE before lib load - TURNSTONE_API_KEY: bearer token auth on all /api/ routes (hmac.compare_digest) - /health always open; unset key = no auth (backward compatible) - docs/air-gapped-deployment.md: full offline deployment guide - scripts/harvest_docs.py: generalized context doc bulk-uploader with manifest support - scripts/manifests/: heimdall-devops.yaml (10 docs ingested) + example.yaml template - fix: _ingest_upload -> _glean_upload in context doc upload endpoint (was 500) Closes: #56 Closes: #45 Closes: #47 Closes: #49 Closes: #21	2026-05-28 08:51:05 -07:00
pyr0ball	73a14bd782	fix(diagnose): add max_tokens to all LLM calls; fix reasoning card contrast Truncation fix: call_llm() in _llm_client.py now accepts max_tokens (default 2048) and passes it in both the cf-orch task payload and the OpenAI-compat fallback body. Hypothesizer uses max_tokens=1024 (JSON array output); synthesizer and legacy summarize use 2048 (structured 5-section narrative). Without this, backends use their own default (often 512 tokens), causing mid-sentence truncation of the diagnosis output. UI fix: reasoning card changed from bg-accent/5 border-accent/30 (opacity modifiers on CSS variables don't compose reliably across themes) to the callout pattern: bg-surface-raised with a solid border-l-4 border-accent. Header label changed from text-text-dim to text-accent for visual anchoring. Text remains text-text-primary for guaranteed contrast on both light and dark themes. Tracks: #56 (technical-level post-processor, filed as follow-on feature)	2026-05-27 22:23:36 -07:00
pyr0ball	7f49961ec4	fix(db): add timeout=30s to all sqlite3.connect() calls across app Watcher, REST endpoints, services (search, incidents, blocklist), MCP server, context retriever, embedder, glean_scheduler, and doc_upload all used the default 5-second SQLite busy timeout. During collect glean write phases, watcher flush threads were hitting 'database is locked' errors when the glean held the write lock longer than 5 seconds. All connections now use timeout=30.0, matching the pipeline fix from commit `5a9281a`. No logic changes.	2026-05-26 23:12:48 -07:00
pyr0ball	5a9281a686	fix(glean): add timeout=30s to all pipeline DB connections; add --force flag; new patterns pipeline.py: - Add timeout=30.0 to all sqlite3.connect() calls (5 total). Previously only ensure_context_schema() had it. The main glean writers would fail immediately under lock contention from the live watcher or concurrent manual glean runs. glean_corpus.py: - Add --force flag (passed through to glean_sources/glean_file/glean_dir). Without it, unchanged-fingerprint files were silently skipped even after pattern updates. Use after editing patterns/default.yaml. patterns/default.yaml: - Add 9 new patterns for Muninn / cluster-wide coverage: vpn_tunnel_fail WireGuard/tunnel service failures vpn_handshake WireGuard peer handshake events dns_degraded systemd-resolved DNS fallback/degradation nvidia_api_mismatch NVIDIA kernel module vs userspace mismatch nvidia_xid NVIDIA Xid GPU hardware faults nvidia_gpu_reset NVIDIA GPU reset / NVLink faults acpi_error ACPI firmware _DSM evaluation failures thermal_throttle CPU/GPU thermal throttling / RAPL unavailable undervoltage PSU undervoltage / brownout events - Sync from /devl/turnstone-cluster/patterns/default.yaml (authoritative live copy updated first; repo copy was stale)	2026-05-26 22:36:45 -07:00
pyr0ball	09b4912c8e	fix(cluster): add Muninn to SSH collection, fix ingest_corpus → glean_corpus rename - Add [muninn] to NODES map in collect_cluster_logs.sh Muninn is accessible via WireGuard (ssh muninn). One-time 7-day backfill already gleaned: 262,659 entries. - Fix broken script reference: ingest_corpus.py was renamed to glean_corpus.py — ongoing cluster glean was silently broken since the rename	2026-05-26 17:02:53 -07:00
pyr0ball	74e0d5fcd6	docs(container): fix GPU_SERVER_URL for Contributor2 — use public orch.circuitforge.tech Contributor2's example-node.tv has no WireGuard route to Heimdall's LAN (10.1.10.x), so the <YOUR_HOST_IP>:7700 private address is unreachable from there. Use the public cf-orch endpoint instead: GPU_SERVER_URL=https://orch.circuitforge.tech Contributor's Huginn has WireGuard to Heimdall LAN — <YOUR_HOST_IP>:7700 stays correct. Added both options to docker-standalone.sh for clarity.	2026-05-26 13:39:38 -07:00
pyr0ball	3a83e0e31d	feat(container): add docker-standalone.sh for Docker hosts (Contributor/Huginn) Mirrors podman-standalone.sh for Docker-native setups. Key differences: - Uses ~/turnstone as default REPO_DIR (no /opt assumption) - -p 8534:8534 port mapping instead of --net=host - No systemd unit generation (Docker --restart=unless-stopped handles reboots) - Volume mounts without :Z (Docker SELinux labeling differs from Podman) Documents the multi-agent setup steps for Huginn: export GPU_SERVER_URL=http://<YOUR_HOST_IP>:7700 export TURNSTONE_MULTI_AGENT_DIAGNOSE=true bash ~/turnstone/docker-standalone.sh	2026-05-26 13:21:54 -07:00
pyr0ball	2a4a5a5152	feat(container): multi-agent env vars, HF cache mount, and ML deps podman-standalone.sh: - Add HF_CACHE_DIR=/opt/turnstone/hf-cache with mkdir guard - Mount HF_HOME=/hf-cache so model weights persist across restarts - Forward all multi-agent env vars (TURNSTONE_MULTI_AGENT_DIAGNOSE, GPU_SERVER_URL, TURNSTONE_CLASSIFIER_MODEL, TURNSTONE_EMBED_*) - Add documentation comments for Contributor/Contributor2 remote instance setup requirements.txt: - Add torch (CPU-only), transformers, sentence-transformers for the 5-stage multi-agent diagnose pipeline (classifier + suppressor stages) - Use --extra-index-url for cpu wheel to keep image ~2GB lighter - Both modules keep ImportError guards so server starts without them, but container images should ship fully capable	2026-05-26 13:20:26 -07:00
pyr0ball	3cfd587d16	fix: separate context KB into own SQLite file to eliminate write-lock contention context_facts, context_documents, and context_chunks now live in turnstone-context.db (sibling of turnstone.db). The glean scheduler held write locks on the main DB long enough to cause 5-second timeout failures on context fact inserts; separate files have independent WAL write locks so they never contend. Changes: - pipeline.py: extract _CONTEXT_SCHEMA + ensure_context_schema() - rest.py: CONTEXT_DB_PATH (TURNSTONE_CONTEXT_DB env var, defaults to sibling file); init via ensure_context_schema(); all context routes pass CONTEXT_DB_PATH; diagnose_stream receives context_db_path kwarg - diagnose/__init__.py: diagnose_stream() accepts context_db_path (falls back to db_path for backward compat); retrieve_context uses it - store.py: sqlite3.connect() timeout=30.0 — Python driver retry loop is independent of PRAGMA busy_timeout; needed for any remaining contention during test or single-file deployments Closes: #42	2026-05-25 21:19:32 -07:00
pyr0ball	e851099e5c	fix(hypothesizer): extract first JSON array to handle reasoning model double-output Reasoning models (e.g. foundation-sec-8b) emit valid JSON then repeat it inside a markdown fence block. json.loads() fails on the combined text. extract_first_json_array() scans for the first '[' and walks to its matching ']' with proper string/escape/nesting handling, then returns just that slice. Combined with strip_json_fences(), this handles all observed output patterns: - bare JSON array (standard models) - fenced JSON array (fence-wrapping models) - bare array followed by fenced repeat (reasoning models)	2026-05-25 21:01:14 -07:00
pyr0ball	b19bea8f2a	Merge pull request 'refactor: pipeline cleanup — 6 follow-up fixes (#33–#38)' (#40) from feat/pipeline-cleanup into main	2026-05-25 20:00:11 -07:00
pyr0ball	f302f27350	Merge pull request 'feat(diagnose): 5-stage multi-agent diagnose pipeline (#29)' (#39) from feat/29-multi-agent-diagnose into main	2026-05-25 19:59:34 -07:00
pyr0ball	39ef1320b0	feat(manage): source .env before starting uvicorn Enables TURNSTONE_MULTI_AGENT_DIAGNOSE and other env vars set in .env to reach the running process without manual export. Variables already set in the caller's environment take precedence.	2026-05-25 19:15:33 -07:00
pyr0ball	2375e073ba	feat(pipeline): add TURNSTONE_CLASSIFIER_MODEL env var for Stage 2 ML config Makes the HuggingFace classifier model for Stage 2 configurable via TURNSTONE_CLASSIFIER_MODEL. When unset (default), Stage 2 falls back to pattern_tags then regex — no download required on first run. Also documents TURNSTONE_MULTI_AGENT_DIAGNOSE, TURNSTONE_CLASSIFIER_MODEL, TURNSTONE_EMBED_BACKEND/MODEL/DEVICE in .env.example.	2026-05-25 19:11:32 -07:00
pyr0ball	85e7a70536	refactor: pipeline cleanup — 6 follow-up fixes (#33-#38) - #33: Wrap ClassifiedTimeline.cluster_severities in MappingProxyType for true immutability (frozen=True only blocks field reassignment, not dict mutation). - #34: Remove dead suppression branch in synthesizer._build_hypothesis_block. active[] is already filtered to not rh.suppress, so the 'Yes — suppressed' branch was unreachable. Now shows novelty score only. - #35: Extract shared _llm_client.py with call_llm() + extract_content() + strip_json_fences(). Both RootCauseHypothesizer and SummarySynthesizer now import from one source. Also strips JSON fences from LLM output before parsing in hypothesizer._parse_response. - #36: Add per-stage try/except in pipeline.run_pipeline(). Unhandled stage exceptions now emit {type: 'error'} + {type: 'done'} SSE events instead of silently closing the stream. - #37: Move format_context_block() call inside the legacy LLM branch in diagnose/__init__.py — it was being computed unconditionally but only used in the non-pipeline path. - #38: Coerce supporting_cluster_ids items to str() in hypothesizer _parse_response to guard against LLMs returning integers instead of string cluster IDs.	2026-05-25 19:05:56 -07:00
pyr0ball	25b7ae340b	fix: invert suppress_threshold semantics to similarity_threshold in FalsePositiveSuppressor Was suppressing when novelty_score < 0.85 (i.e. similarity > 0.15), which would suppress nearly every hypothesis once embeddings are active. Now suppresses when max_sim >= similarity_threshold (0.85), meaning only hypotheses that are 85%+ similar to a resolved incident are suppressed. Also renames suppress_threshold → similarity_threshold for clarity and adds a borderline boundary test (0.85 suppressed, 0.84 not suppressed). Closes: #29	2026-05-25 18:58:52 -07:00
pyr0ball	1b949337da	fix: tighten suppression_reason display guard, document unused since/until params	2026-05-25 15:02:48 -07:00
pyr0ball	1865ba1f02	feat: Stage 5 synthesizer + pipeline orchestrator + feature flag wiring (issue #29 ) - Add app/services/diagnose/synthesizer.py: SummarySynthesizer (Stage 5) - Builds structured LLM prompt from ranked hypotheses, timeline, RAG context - Excludes suppressed hypotheses from the narrative prompt - Deterministic fallback when no LLM configured or LLM call fails - Same cf-orch task endpoint + direct OpenAI-compat fallback pattern as other stages - Replace pipeline.py stub with full run_pipeline() async generator - Orchestrates all 5 stages via asyncio.to_thread for each synchronous stage - Yields typed SSE event dicts: status, pipeline_stage (1-4), hypotheses, reasoning, done - Suppressor counts (active vs suppressed) reported in stage 4 event message - Wire MULTI_AGENT_ENABLED feature flag into diagnose_stream() - TURNSTONE_MULTI_AGENT_DIAGNOSE=true routes through run_pipeline() - pipeline emits its own done event; legacy path unchanged when flag is false - Import of run_pipeline added to __init__.py - Add 21 new tests (350 -> 371 passing): - tests/test_diagnose_synthesizer.py: 8 tests (with/without LLM, suppressed, empty ranked, LLM failure fallback) - tests/test_diagnose_pipeline.py: 13 tests (flag off, flag on event sequence, empty entries, no LLM, stage 1 cluster count message) Closes: #29	2026-05-25 14:56:25 -07:00
pyr0ball	54d4ec5325	refactor: extract _score_hypothesis helper, fix exception types, pass device in suppressor	2026-05-25 14:41:33 -07:00
pyr0ball	84e0cf5245	feat: Stage 4 — FalsePositiveSuppressor for multi-agent diagnose pipeline (issue #29 ) - Implements FalsePositiveSuppressor using embedding cosine similarity - Lazy corpus embedding via get_embedder() with module-level cache keyed by db_path - Cache invalidated automatically when the resolved incident corpus changes - Suppresses hypotheses with novelty_score below configurable threshold (default 0.85) - Full fallback path (novelty=1.0, no suppression) when model_id empty, embedding service unavailable, or no resolved incidents found in DB - Graceful handling of missing incidents table and DB query failures - Numpy bool_ leakage prevented by explicit float()/bool() coercion at assignment - Pure-Python cosine fallback for environments without numpy - 9 new tests (all mocked, no real model downloads): passthrough, suppress, no-suppress, empty list, ranking, empty corpus, DB failure, service unavailable, cache invalidation - 350 total tests passing (341 pre-existing + 9 new) Closes: #29	2026-05-25 14:28:31 -07:00
pyr0ball	a2916f958a	fix: defensive coercion for LLM confidence and cluster fields in hypothesizer - Add _coerce_float() module-level helper: catches TypeError/ValueError from non-numeric LLM output (e.g. 'high', 'N/A') and returns a caller-supplied default instead of raising. - Replace float(item.get('confidence', 0.5)) with _coerce_float(item.get('confidence'), 0.5) in _parse_response. - Guard supporting_cluster_ids: tuple(item.get(...) or []) so a JSON null from the LLM does not cause TypeError('NoneType is not iterable'). - runbook_refs is hardcoded as () and not sourced from LLM output; no change needed there. - Add test_non_numeric_confidence_uses_default (Test 10) to cover the 'high' string case: asserts no exception and confidence == 0.5. - 341 tests passing (+1). Closes: #29	2026-05-25 14:00:30 -07:00
pyr0ball	34fb8f501d	feat: Stage 3 — RootCauseHypothesizer for multi-agent diagnose pipeline (issue #29 ) - Add app/services/diagnose/hypothesizer.py with RootCauseHypothesizer class - Stage 3 of the multi-agent diagnose pipeline: accepts ClassifiedTimeline + RetrievedContext, builds a structured JSON prompt, calls the LLM via the same cf-orch task → OpenAI-compat fallback pattern used by llm.py - Parses JSON array response into list[Hypothesis] dataclasses with UUID ids, severity validation (WARNING→WARN, unknown→ERROR), confidence coercion - Gracefully returns [] when llm_url/llm_model absent or clusters empty - Add tests/test_diagnose_hypothesizer.py: 12 tests, all mocked, no LLM I/O covering: valid response, UUID generation, malformed JSON, non-list JSON, empty clusters, missing URL/model, max_hypotheses cap, severity mapping, confidence string coercion - 340 tests passing (328 prior + 12 new) Closes: #29	2026-05-25 13:49:18 -07:00
pyr0ball	6ea8fbfec1	feat: Stage 2 — SeverityClassifier for multi-agent diagnose pipeline (issue #29 ) Three-path classification: ML (transformers pipeline, lazy singleton) → pattern_tags (YAML pattern severity dict) → regex (detect_severity). - Path A: HF text-classification pipeline loaded lazily on first classify() call via module-level singleton; shim promotes ERROR+keyword hits to CRITICAL and demotes low-confidence INFO to DEBUG. - Path B: maps cluster.pattern_tags through the loaded pattern severity dict; picks the highest severity across matching tags. - Path C: falls back to detect_severity() regex scan on representative_text; defaults to INFO when no keyword matches. - Pattern file resolved from constructor arg or TURNSTONE_PATTERNS env var (mirrors app/rest.py convention). - No crash when transformers is not installed; ImportError on per-cluster ML inference triggers clean per-cluster fallback to pattern_tags/regex. - ClassifiedTimeline.classifier_used reflects the primary session path. Tests (10 new, 328 total, all passing): - ML ERROR, CRITICAL promotion, DEBUG demotion, WARNING→WARN - pattern_tags resolution from YAML fixture - regex ERROR detection and INFO default - ImportError clean fallback - empty timeline no-crash - ClassifiedTimeline FrozenInstanceError on mutation Closes: #29	2026-05-25 13:27:17 -07:00
pyr0ball	7abb76e628	refactor: split TimelineReconstructor.reconstruct into helpers, fix magic number + error handling - Add gap_significance_seconds constructor param (default 30) to replace hardcoded magic number in gap_count computation - _parse_iso now returns datetime | None with try/except on ValueError; all callers handle None return by treating malformed timestamps as absent - Extract reconstruct into four private helpers: _sort_entries, _group_into_raw_clusters, _build_cluster, _dominant_sources_tuple - Promote _sort_key to module-level function (was nested inside reconstruct) - Rename old module-level _build_cluster to _make_event_cluster to avoid name collision with new instance method - Add explanatory comment to type: ignore[arg-type] at _highest_severity call site - Black-formatted	2026-05-25 13:22:18 -07:00
pyr0ball	f7429ee963	feat: Stage 1 — TimelineReconstructor for multi-agent diagnose pipeline (issue #29 ) - Add app/services/diagnose/timeline.py: pure-Python TimelineReconstructor - Sorts entries by timestamp_iso (None entries appended at end) - Sliding-window clustering anchored to first entry in each cluster - Computes cluster_id (sha1[:12]), severity (highest wins), burst flag, gap_before_seconds, representative_text (highest rank, longest text tiebreak) - Builds TimelineResult with dominant_sources sorted by entry count descending - Update pipeline.py stub to import TimelineReconstructor (Task 6 wiring prep) - Add tests/test_diagnose_timeline.py: 15 tests covering all 13 required cases plus null-timestamp edge case variant; all 318 tests passing Closes: #29	2026-05-25 12:54:15 -07:00
pyr0ball	afab3ca869	fix: frozen dataclasses, clean __all__, improve exception logging in diagnose package	2026-05-25 12:31:07 -07:00
pyr0ball	da28757a20	refactor: convert diagnose module to package for multi-agent pipeline (issue #29 ) - Move app/services/diagnose.py verbatim to app/services/diagnose/legacy.py - Create app/services/diagnose/__init__.py with full implementation so that patch('app.services.diagnose._HAS_DATEPARSER') targets the correct namespace and all 303 existing tests continue to pass without modification - Add app/services/diagnose/models.py with 5 pipeline dataclasses: EventCluster, TimelineResult, ClassifiedTimeline, Hypothesis, RankedHypothesis - Add app/services/diagnose/pipeline.py with run_pipeline() stub (Task 6) - Add MULTI_AGENT_ENABLED feature flag (off by default via env var) - Zero behavior change; ruff clean Closes: #29	2026-05-25 11:12:39 -07:00
pyr0ball	f7bcc6c9b7	refactor: extract embeddings service layer — decouple context embedder from Ollama - New app/services/embeddings.py: TURNSTONE_EMBED_* env vars, multi-backend support - embedder.py delegates to service layer; re-exports EMBEDDING_AVAILABLE for compat - retriever.py updated to use service layer - Test coverage updated in tests/context/test_embedder.py	2026-05-25 11:01:25 -07:00
pyr0ball	6fec294a53	feat: fingerprint-based incremental glean — skip unchanged files (#30 ) - Add glean_fingerprints table to schema (sha256 + mtime + size) - _fingerprint(), _fp_unchanged(), _save_fingerprint() helpers in pipeline.py - _glean_files() now checks fingerprint; skips file if hash unchanged - force=True param threads through glean_dir → glean_file → glean_sources - POST /api/tasks/glean and POST /api/sources/{id}/glean accept force=true - 14 unit tests in tests/test_glean_fingerprint.py, all passing Closes: #30	2026-05-25 11:01:18 -07:00
pyr0ball	41fc89c474	feat: SSH remote glean — transport layer, pipeline integration, REST + UI (#22) Closes turnstone#22. ## Transport layer (app/glean/ssh.py) - SSHTransport context manager: key-only auth, paramiko backend - SSHConnectionError / SSHCommandError exception hierarchy - exec_stream() generator: yields stdout lines, raises SSHCommandError on non-zero exit (isinstance(int) guard for test-mock safety) - Command builders: _build_journald_command, _build_syslog_command, _build_plaintext_command, _build_docker_command - 18 unit tests in tests/test_glean_ssh.py ## Pipeline integration (app/glean/pipeline.py) - _stream_and_write(): per-item error isolation — SSHCommandError skips one glean item without aborting the rest of the host connection - _glean_ssh_source(): one SSHTransport per host, dispatches all glean items (journald/syslog/plaintext/docker); SSHConnectionError aborts host - glean_sources(): splits local vs SSH sources; local → _glean_files(); SSH → _glean_ssh_source(); shared compiled patterns and DB connection - glean_ssh_source(): public wrapper for REST use — manages DB connection, pattern compilation, FTS rebuild lifecycle - 15 integration tests in tests/test_glean_pipeline_ssh.py - All 285 tests passing ## REST layer (app/rest.py) - GET /api/sources/configured: reads sources.yaml and enriches with DB stats; SSH sources appear before first glean (entry_count=0); sub-source IDs (rack01/journald, rack01/docker/myapp) aggregated per host entry - POST /api/sources/{id}/glean: detects transport:ssh and dispatches to glean_ssh_source() wrapper; local sources unchanged - Import: glean_ssh_source as _glean_ssh_source ## Frontend (web/src/views/SourcesView.vue) - Fetches /api/sources/configured (primary) + /api/sources (DB-only) in parallel; merges into unified SourceRow list - SSH sources show: ssh badge (with user@host tooltip), glean-type pills (journald/syslog/docker/etc.), host subtitle - SSH sub-source IDs (rack01/journald) suppressed from the DB-only list since they are covered by the parent SSH row - DB-only sources (uploads) appear below configured sources with 'uploaded' badge; reglean button disabled (not in sources.yaml) - Delete zeroes out configured-source stats in-place rather than removing the row (so the source remains visible for re-gleaning)	2026-05-21 12:37:30 -07:00
pyr0ball	39c13f39ba	feat: SSH remote host glean — transport layer and pipeline integration (closes #22 , backend) Adds SSH-based log collection from remote hosts via Paramiko. One SSH connection per host, multiple log types per connection. New files: - app/glean/ssh.py: SSHTransport context manager + command builders for journald, syslog, plaintext, and docker log types - tests/test_glean_ssh.py: 18 tests for transport layer (all mocked) - tests/test_glean_pipeline_ssh.py: 15 tests for pipeline integration Pipeline changes (app/glean/pipeline.py): - glean_sources() now splits sources into local-file and SSH categories - SSH sources use transport: ssh + glean: list schema in sources.yaml - _glean_ssh_source(): one SSHTransport per host, N commands per connection - _stream_and_write(): SSHCommandError caught per-item so one bad command does not abort the rest of the host's glean items - SSHConnectionError skips the entire host with a warning log SSH source schema (sources.yaml): - id: rack01 transport: ssh host: 192.168.1.10 user: admin key_path: ~/.ssh/id_ed25519 glean: - type: journald args: [--since, 2 hours ago] - type: syslog path: /var/log/syslog - type: plaintext path: /var/log/app/error.log - type: docker containers: [myapp, nginx] Key design decisions: - Key-based auth only (no password prompts in daemon context) - exit-status check fires after all stdout lines yielded; callers drain the iterator to trigger it - Local file sources path unchanged; SSH sources co-exist in same yaml - Docker multi-container: one exec_stream call per container, source_id scoped as host_id/type/container_name Remaining for #22: REST endpoint, SourcesView UI, sources.yaml docs. 285 → 285 tests passing (33 new SSH tests).	2026-05-20 23:03:13 -07:00
pyr0ball	828b69768a	refactor: rename ingest → glean throughout codebase Renames the app/ingest/ package to app/glean/ and updates all references across Python modules, shell scripts, Vue components, tests, and documentation. Intentionally preserved: - SQLite column name ingest_time (avoids schema migration) - RetrievedEntry.ingest_time field (maps to the column above) - Any public-facing JSON keys that reference ingest_time Changes by category: - app/ingest/ → app/glean/ (full package move, all parsers) - app/tasks/ingest_scheduler.py → app/tasks/glean_scheduler.py - scripts/ingest_corpus.py → scripts/glean_corpus.py - tests/test_ingest_*.py → tests/test_glean_*.py - Docstrings, log messages, comments: ingest → glean - Env var: TURNSTONE_INGEST_INTERVAL → TURNSTONE_GLEAN_INTERVAL - Shell scripts: glean.log, glean_corpus.py references - README.md: multi-source ingest → multi-source glean - .env.example: updated env var name - patterns/: new diagnostic patterns from 2026-05-20 SSH incident (service_crash_loop, pkg_daemon_restart, ssh_forward_conflict) - SourcesView.vue: pipeline label updated - All test import paths updated to app.glean.* 285 tests passing.	2026-05-20 23:02:55 -07:00
pyr0ball	63c742a708	feat: periodic ingest scheduler + Orchard submission pipeline Adds asyncio-native background scheduler (TURNSTONE_INGEST_INTERVAL, default 900s) that runs batch ingest then pushes pattern-matched entries to a remote CF harvest endpoint (TURNSTONE_SUBMIT_ENDPOINT). - app/tasks/ingest_scheduler.py: IngestState, scheduler_loop, run_once, submit_matched, _query_matched_since — asyncio.Lock prevents concurrent runs - app/rest.py: POST /api/ingest/batch (pre-parsed entry receiver), GET /api/tasks/ingest/status, POST /api/tasks/ingest (manual trigger), TURNSTONE_INGEST_INTERVAL + TURNSTONE_SUBMIT_ENDPOINT env wiring in lifespan - docker-compose.submissions.yml: segregated contrib1 (8536) + contrib2 (8537) receiving instances on Heimdall, isolated DBs under /devl/docker/turnstone-submissions/<node>/ - podman-standalone.sh: pass-through for TURNSTONE_SUBMIT_ENDPOINT + TURNSTONE_SOURCE_HOST - app/ingest/mqtt_subscriber.py: MQTT log source adapter - app/ingest/wazuh.py: Wazuh alert JSON adapter - tests/test_ingest_wazuh.py: Wazuh adapter test suite	2026-05-20 08:57:25 -07:00
pyr0ball	6144ba99d9	fix: make sqlite-vec download non-fatal in Dockerfile	2026-05-19 13:02:15 -07:00
pyr0ball	510499aba3	fix: use curl instead of wget for sqlite-vec download in Dockerfile	2026-05-19 13:01:45 -07:00
pyr0ball	ed0a4bb469	feat: Alpha milestone — corpus management, upload ingest, harvester agent Closes #1 (incident tagging — already implemented), #2, #3, #5. - feat(api): DELETE /api/sources/{id} — purge entries + FTS rows for a source - feat(api): POST /api/sources/{id}/ingest — re-ingest from sources.yaml - feat(api): POST /api/ingest/upload — multipart log file upload with auto-detect - feat(ui): SourcesView reingest + delete buttons and upload file input (#2) - feat(harvester): harvester.py push + incident subcommands (#5) - feat(harvester): Dockerfile, docker-compose.yml, harvester.sh (containerless) - feat(config): GPU_SERVER_URL → CF_ORCH_URL resolution + write-back (#20) - docs: .env.example, README Configuration table, version bump to 0.5.0	2026-05-19 07:45:58 -07:00
pyr0ball	1361547c36	docs: bump version badge to match latest Forgejo release	2026-05-17 11:19:13 -07:00
pyr0ball	9f2ae5464a	fix(ui): nested overflow wrapper to prevent overflow-hidden clipping table columns overflow-hidden and overflow-x-auto on the same element conflict in Tailwind's CSS generation order. The shorthand overflow:hidden can override overflow-x:auto, clipping the rightmost column (diagnose buttons). Fix: outer div keeps overflow-hidden for rounded corners, inner div handles overflow-x-auto scrolling.	2026-05-16 09:11:42 -07:00
pyr0ball	0d60533576	feat(ui): mobile fixes for Dashboard and Diagnose views - DashboardView: p-4 sm:p-6 padding, overflow-x-auto on source health table - DiagnoseView: p-4 sm:p-6 padding - QuickCapture: px-4 sm:px-6 + shrink-0 on Search button to avoid input squeeze	2026-05-16 09:04:37 -07:00

141 commits