turnstone

Author	SHA1	Message	Date
pyr0ball	eac9a4ba28	Merge feat/15-hybrid-rag: hybrid BM25 + vector re-ranking for diagnose search (#15)	2026-06-01 20:00:02 -07:00
pyr0ball	cfddff6a2a	Merge feat/41-hybrid-bert-shim: Hybrid-BERT label mapping shim (#41)	2026-06-01 19:59:34 -07:00
pyr0ball	1abdcfb1f3	feat: hybrid BM25 + vector re-ranking for diagnose search (#15) Adds late-fusion hybrid search to Turnstone's log retrieval layer: hybrid_score = 0.6 * bm25_normalized + 0.4 * cosine_similarity Implementation: - _bm25_search() extracts the existing FTS5 BM25 path as a named helper - _hybrid_search() fetches an oversized BM25 candidate pool (5x limit, min 100), embeds the query and each candidate text in-process via the existing embeddings service, normalizes BM25 rank to [0,1], combines with cosine similarity, and re-ranks - search() gets semantic=False param that dispatches to _hybrid_search() when True; pure BM25 remains the default for all existing call sites - diagnose_stream() enables semantic=True so symptom-based queries ("database connection failed") surface semantically equivalent entries ("ECONNREFUSED", "backend gone away", "max retries exceeded") - /api/search REST endpoint exposes ?semantic=true query param Graceful degradation: falls back silently to pure BM25 when the embedding backend is unavailable (EMBEDDING_AVAILABLE=False) or when embed_batch raises an exception. No new infra — in-process numpy cosine, no vector DB. 11 new tests: BM25 helper, hybrid re-ranking, fallback paths, dispatcher. 372 + 11 = 383 tests passing. Closes: #15	2026-06-01 18:13:09 -07:00
pyr0ball	503a36d76c	feat(classifier): add Hybrid-BERT label mapping shim (#41) Adds _HYBRID_BERT_LABEL_MAP to translate the 7-class output vocabulary of krishnas4415/log-anomaly-detection-models (Hybrid-BERT, MIT) to Turnstone SeverityLabel. _map_label now checks the Hybrid-BERT map before the standard map so either model family works via TURNSTONE_CLASSIFIER_MODEL without any additional code path. Mapping (confirmed from model config.json): normal → INFO security_anomaly → ERROR system_failure → CRITICAL performance_issue → WARN network_anomaly → WARN config_error → ERROR hardware_issue → CRITICAL Keyword-based CRITICAL promotion and low-confidence DEBUG demotion apply on top of the base mapping (same rules as the standard vocabulary). 11 new tests covering all 7 Hybrid-BERT labels, case-insensitivity, and regression on standard-vocabulary labels. 372 tests passing total. Note: custom loading code for the non-standard .pt checkpoint format is explicitly out of scope — evaluate better-packaged HF alternatives first (see #41 for candidate list). Closes: #41	2026-06-01 16:20:31 -07:00
pyr0ball	bd3923e163	fix: split incidents tables to dedicated turnstone-incidents.db (#60) FTS5 bulk-insert write locks starved the incident API and bundle endpoints during log bursts (sonarr/radarr, high-volume docker sources). Fix mirrors the context_facts split (context -> turnstone-context.db): - Add INCIDENTS_DB_PATH / TURNSTONE_INCIDENTS_DB env var in rest.py - Add _INCIDENTS_SCHEMA, ensure_incidents_schema(), and migrate_incidents_to_dedicated_db() in glean/pipeline.py - Stub out incidents/received_bundles/sent_bundles in _SCHEMA (no-op CREATE IF NOT EXISTS) so legacy single-file deployments still open - Thread incidents_db_path through diagnose_stream -> run_pipeline -> FalsePositiveSuppressor.suppress -> _fetch_resolved_incidents - One-shot migration on startup: copy existing rows from main DB to incidents DB via INSERT OR IGNORE (idempotent, safe to re-run) - Fix test_blocklist_endpoints fixtures to patch CONTEXT_DB_PATH and INCIDENTS_DB_PATH alongside DB_PATH (worktree has no data/ dir) 372 tests passing. Closes: #60	2026-06-01 15:54:23 -07:00
pyr0ball	25b7ae340b	fix: invert suppress_threshold semantics to similarity_threshold in FalsePositiveSuppressor Was suppressing when novelty_score < 0.85 (i.e. similarity > 0.15), which would suppress nearly every hypothesis once embeddings are active. Now suppresses when max_sim >= similarity_threshold (0.85), meaning only hypotheses that are 85%+ similar to a resolved incident are suppressed. Also renames suppress_threshold → similarity_threshold for clarity and adds a borderline boundary test (0.85 suppressed, 0.84 not suppressed). Closes: #29	2026-05-25 18:58:52 -07:00
pyr0ball	1865ba1f02	feat: Stage 5 synthesizer + pipeline orchestrator + feature flag wiring (issue #29) - Add app/services/diagnose/synthesizer.py: SummarySynthesizer (Stage 5) - Builds structured LLM prompt from ranked hypotheses, timeline, RAG context - Excludes suppressed hypotheses from the narrative prompt - Deterministic fallback when no LLM configured or LLM call fails - Same cf-orch task endpoint + direct OpenAI-compat fallback pattern as other stages - Replace pipeline.py stub with full run_pipeline() async generator - Orchestrates all 5 stages via asyncio.to_thread for each synchronous stage - Yields typed SSE event dicts: status, pipeline_stage (1-4), hypotheses, reasoning, done - Suppressor counts (active vs suppressed) reported in stage 4 event message - Wire MULTI_AGENT_ENABLED feature flag into diagnose_stream() - TURNSTONE_MULTI_AGENT_DIAGNOSE=true routes through run_pipeline() - pipeline emits its own done event; legacy path unchanged when flag is false - Import of run_pipeline added to __init__.py - Add 21 new tests (350 -> 371 passing): - tests/test_diagnose_synthesizer.py: 8 tests (with/without LLM, suppressed, empty ranked, LLM failure fallback) - tests/test_diagnose_pipeline.py: 13 tests (flag off, flag on event sequence, empty entries, no LLM, stage 1 cluster count message) Closes: #29	2026-05-25 14:56:25 -07:00
pyr0ball	54d4ec5325	refactor: extract _score_hypothesis helper, fix exception types, pass device in suppressor	2026-05-25 14:41:33 -07:00
pyr0ball	84e0cf5245	feat: Stage 4 — FalsePositiveSuppressor for multi-agent diagnose pipeline (issue #29) - Implements FalsePositiveSuppressor using embedding cosine similarity - Lazy corpus embedding via get_embedder() with module-level cache keyed by db_path - Cache invalidated automatically when the resolved incident corpus changes - Suppresses hypotheses with novelty_score below configurable threshold (default 0.85) - Full fallback path (novelty=1.0, no suppression) when model_id empty, embedding service unavailable, or no resolved incidents found in DB - Graceful handling of missing incidents table and DB query failures - Numpy bool_ leakage prevented by explicit float()/bool() coercion at assignment - Pure-Python cosine fallback for environments without numpy - 9 new tests (all mocked, no real model downloads): passthrough, suppress, no-suppress, empty list, ranking, empty corpus, DB failure, service unavailable, cache invalidation - 350 total tests passing (341 pre-existing + 9 new) Closes: #29	2026-05-25 14:28:31 -07:00
pyr0ball	a2916f958a	fix: defensive coercion for LLM confidence and cluster fields in hypothesizer - Add _coerce_float() module-level helper: catches TypeError/ValueError from non-numeric LLM output (e.g. 'high', 'N/A') and returns a caller-supplied default instead of raising. - Replace float(item.get('confidence', 0.5)) with _coerce_float(item.get('confidence'), 0.5) in _parse_response. - Guard supporting_cluster_ids: tuple(item.get(...) or []) so a JSON null from the LLM does not cause TypeError('NoneType is not iterable'). - runbook_refs is hardcoded as () and not sourced from LLM output; no change needed there. - Add test_non_numeric_confidence_uses_default (Test 10) to cover the 'high' string case: asserts no exception and confidence == 0.5. - 341 tests passing (+1). Closes: #29	2026-05-25 14:00:30 -07:00
pyr0ball	34fb8f501d	feat: Stage 3 — RootCauseHypothesizer for multi-agent diagnose pipeline (issue #29) - Add app/services/diagnose/hypothesizer.py with RootCauseHypothesizer class - Stage 3 of the multi-agent diagnose pipeline: accepts ClassifiedTimeline + RetrievedContext, builds a structured JSON prompt, calls the LLM via the same cf-orch task → OpenAI-compat fallback pattern used by llm.py - Parses JSON array response into list[Hypothesis] dataclasses with UUID ids, severity validation (WARNING→WARN, unknown→ERROR), confidence coercion - Gracefully returns [] when llm_url/llm_model absent or clusters empty - Add tests/test_diagnose_hypothesizer.py: 12 tests, all mocked, no LLM I/O covering: valid response, UUID generation, malformed JSON, non-list JSON, empty clusters, missing URL/model, max_hypotheses cap, severity mapping, confidence string coercion - 340 tests passing (328 prior + 12 new) Closes: #29	2026-05-25 13:49:18 -07:00
pyr0ball	6ea8fbfec1	feat: Stage 2 — SeverityClassifier for multi-agent diagnose pipeline (issue #29) Three-path classification: ML (transformers pipeline, lazy singleton) → pattern_tags (YAML pattern severity dict) → regex (detect_severity). - Path A: HF text-classification pipeline loaded lazily on first classify() call via module-level singleton; shim promotes ERROR+keyword hits to CRITICAL and demotes low-confidence INFO to DEBUG. - Path B: maps cluster.pattern_tags through the loaded pattern severity dict; picks the highest severity across matching tags. - Path C: falls back to detect_severity() regex scan on representative_text; defaults to INFO when no keyword matches. - Pattern file resolved from constructor arg or TURNSTONE_PATTERNS env var (mirrors app/rest.py convention). - No crash when transformers is not installed; ImportError on per-cluster ML inference triggers clean per-cluster fallback to pattern_tags/regex. - ClassifiedTimeline.classifier_used reflects the primary session path. Tests (10 new, 328 total, all passing): - ML ERROR, CRITICAL promotion, DEBUG demotion, WARNING→WARN - pattern_tags resolution from YAML fixture - regex ERROR detection and INFO default - ImportError clean fallback - empty timeline no-crash - ClassifiedTimeline FrozenInstanceError on mutation Closes: #29	2026-05-25 13:27:17 -07:00
pyr0ball	f7429ee963	feat: Stage 1 — TimelineReconstructor for multi-agent diagnose pipeline (issue #29) - Add app/services/diagnose/timeline.py: pure-Python TimelineReconstructor - Sorts entries by timestamp_iso (None entries appended at end) - Sliding-window clustering anchored to first entry in each cluster - Computes cluster_id (sha1[:12]), severity (highest wins), burst flag, gap_before_seconds, representative_text (highest rank, longest text tiebreak) - Builds TimelineResult with dominant_sources sorted by entry count descending - Update pipeline.py stub to import TimelineReconstructor (Task 6 wiring prep) - Add tests/test_diagnose_timeline.py: 15 tests covering all 13 required cases plus null-timestamp edge case variant; all 318 tests passing Closes: #29	2026-05-25 12:54:15 -07:00
pyr0ball	f7bcc6c9b7	refactor: extract embeddings service layer — decouple context embedder from Ollama - New app/services/embeddings.py: TURNSTONE_EMBED_* env vars, multi-backend support - embedder.py delegates to service layer; re-exports EMBEDDING_AVAILABLE for compat - retriever.py updated to use service layer - Test coverage updated in tests/context/test_embedder.py	2026-05-25 11:01:25 -07:00
pyr0ball	6fec294a53	feat: fingerprint-based incremental glean — skip unchanged files (#30) - Add glean_fingerprints table to schema (sha256 + mtime + size) - _fingerprint(), _fp_unchanged(), _save_fingerprint() helpers in pipeline.py - _glean_files() now checks fingerprint; skips file if hash unchanged - force=True param threads through glean_dir → glean_file → glean_sources - POST /api/tasks/glean and POST /api/sources/{id}/glean accept force=true - 14 unit tests in tests/test_glean_fingerprint.py, all passing Closes: #30	2026-05-25 11:01:18 -07:00
pyr0ball	39c13f39ba	feat: SSH remote host glean — transport layer and pipeline integration (closes #22, backend) Adds SSH-based log collection from remote hosts via Paramiko. One SSH connection per host, multiple log types per connection. New files: - app/glean/ssh.py: SSHTransport context manager + command builders for journald, syslog, plaintext, and docker log types - tests/test_glean_ssh.py: 18 tests for transport layer (all mocked) - tests/test_glean_pipeline_ssh.py: 15 tests for pipeline integration Pipeline changes (app/glean/pipeline.py): - glean_sources() now splits sources into local-file and SSH categories - SSH sources use transport: ssh + glean: list schema in sources.yaml - _glean_ssh_source(): one SSHTransport per host, N commands per connection - _stream_and_write(): SSHCommandError caught per-item so one bad command does not abort the rest of the host's glean items - SSHConnectionError skips the entire host with a warning log SSH source schema (sources.yaml): - id: rack01 transport: ssh host: 192.168.1.10 user: admin key_path: ~/.ssh/id_ed25519 glean: - type: journald args: [--since, 2 hours ago] - type: syslog path: /var/log/syslog - type: plaintext path: /var/log/app/error.log - type: docker containers: [myapp, nginx] Key design decisions: - Key-based auth only (no password prompts in daemon context) - exit-status check fires after all stdout lines yielded; callers drain the iterator to trigger it - Local file sources path unchanged; SSH sources co-exist in same yaml - Docker multi-container: one exec_stream call per container, source_id scoped as host_id/type/container_name Remaining for #22: REST endpoint, SourcesView UI, sources.yaml docs. 285 → 285 tests passing (33 new SSH tests).	2026-05-20 23:03:13 -07:00
pyr0ball	828b69768a	refactor: rename ingest → glean throughout codebase Renames the app/ingest/ package to app/glean/ and updates all references across Python modules, shell scripts, Vue components, tests, and documentation. Intentionally preserved: - SQLite column name ingest_time (avoids schema migration) - RetrievedEntry.ingest_time field (maps to the column above) - Any public-facing JSON keys that reference ingest_time Changes by category: - app/ingest/ → app/glean/ (full package move, all parsers) - app/tasks/ingest_scheduler.py → app/tasks/glean_scheduler.py - scripts/ingest_corpus.py → scripts/glean_corpus.py - tests/test_ingest_.py → tests/test_glean_.py - Docstrings, log messages, comments: ingest → glean - Env var: TURNSTONE_INGEST_INTERVAL → TURNSTONE_GLEAN_INTERVAL - Shell scripts: glean.log, glean_corpus.py references - README.md: multi-source ingest → multi-source glean - .env.example: updated env var name - patterns/: new diagnostic patterns from 2026-05-20 SSH incident (service_crash_loop, pkg_daemon_restart, ssh_forward_conflict) - SourcesView.vue: pipeline label updated - All test import paths updated to app.glean.* 285 tests passing.	2026-05-20 23:02:55 -07:00
pyr0ball	63c742a708	feat: periodic ingest scheduler + Orchard submission pipeline Adds asyncio-native background scheduler (TURNSTONE_INGEST_INTERVAL, default 900s) that runs batch ingest then pushes pattern-matched entries to a remote CF harvest endpoint (TURNSTONE_SUBMIT_ENDPOINT). - app/tasks/ingest_scheduler.py: IngestState, scheduler_loop, run_once, submit_matched, _query_matched_since — asyncio.Lock prevents concurrent runs - app/rest.py: POST /api/ingest/batch (pre-parsed entry receiver), GET /api/tasks/ingest/status, POST /api/tasks/ingest (manual trigger), TURNSTONE_INGEST_INTERVAL + TURNSTONE_SUBMIT_ENDPOINT env wiring in lifespan - docker-compose.submissions.yml: segregated contrib1 (8536) + contrib2 (8537) receiving instances on Heimdall, isolated DBs under /devl/docker/turnstone-submissions/<node>/ - podman-standalone.sh: pass-through for TURNSTONE_SUBMIT_ENDPOINT + TURNSTONE_SOURCE_HOST - app/ingest/mqtt_subscriber.py: MQTT log source adapter - app/ingest/wazuh.py: Wazuh alert JSON adapter - tests/test_ingest_wazuh.py: Wazuh adapter test suite	2026-05-20 08:57:25 -07:00
pyr0ball	1e186591d7	feat(blocklist): 6 REST endpoints + Pi-hole settings fields Add blocklist candidate listing, scan trigger, status update, push/unblock to Pi-hole, and connection test endpoints. Add pihole_url/version/api_key and router_source_ids/device_names fields to SettingsBody and prefs handling in patch_settings. Add PiholeClient.__post_init__ validation so 503 fires naturally when url/api_key are unconfigured (mock-safe: bypassed in tests).	2026-05-15 21:15:09 -07:00
pyr0ball	aa55a1ce24	feat(blocklist): extraction scan + candidate CRUD + full test suite	2026-05-15 21:05:49 -07:00
pyr0ball	38138dc0c0	fix(blocklist): validate _v6_auth session JSON, add auth-failure test	2026-05-15 21:03:03 -07:00
pyr0ball	dceb2d30ca	feat(blocklist): Pi-hole v5/v6 API client + tests PiholeClient dataclass supporting both Pi-hole v5 (PHP /admin/api.php) and v6 (REST /api/) with public block/unblock/test_connection methods. 9 tests covering both API versions, auth flow, and error handling.	2026-05-15 21:00:01 -07:00
pyr0ball	f469692c52	feat(blocklist): telemetry YAML list + loader + domain matcher Adds patterns/telemetry.yaml with 6 rule groups (samsung, belkin, roku, lg, amazon, advertising). Adds app/services/blocklist.py with TelemetryRule and BlocklistCandidate dataclasses, load_telemetry_rules(), and matches_telemetry() with exact and subdomain matching. 6 new TestTelemetry tests pass; 199 total passing.	2026-05-15 20:54:40 -07:00
pyr0ball	4d7c436721	feat(blocklist): blocklist_candidates schema + tests Add blocklist_candidates table and indexes to _SCHEMA in pipeline.py. Add TestSchema tests verifying table existence, column set, and status/hit_count defaults. All 193 tests pass.	2026-05-15 20:51:00 -07:00
pyr0ball	279b01902f	fix: tautulli — hmac token compare, public pattern loader, startup cache, endpoint tests	2026-05-13 19:08:49 -07:00
pyr0ball	581e0314b4	fix: tautulli — entry_id collision on missing ts, token settings, test coverage	2026-05-13 19:04:07 -07:00
pyr0ball	4fbac2554e	feat: Tautulli webhook ingest endpoint — plex events -> log_entries POST /turnstone/api/ingest/tautulli accepts Tautulli notification agent payloads and stores them as log_entries under source 'tautulli'. Severity maps error->CRITICAL, buffer->WARN, all others->None. Optional bearer token auth via X-Tautulli-Token header + tautulli_token pref. FTS index rebuilt as a background task after each write. 28 new tests, all passing.	2026-05-13 18:41:03 -07:00
pyr0ball	0b3d95cd26	fix: ingestors treat naive log timestamps as local time, not UTC All five parsers (plex, syslog, servarr, qbittorrent, plaintext) were using .replace(tzinfo=timezone.utc) on naive datetimes parsed from log files, which slaps a UTC label on what is actually local-time data. On a UTC-7 system a 2pm entry was stored as 14:00Z instead of 21:00Z, causing time-window searches to return zero results. Fix: use .astimezone(timezone.utc) instead, which treats the naive datetime as local time and converts correctly. Tests updated to round-trip back to local time for assertion so they pass on any timezone, not just UTC.	2026-05-13 18:16:33 -07:00
pyr0ball	e0bfa11642	feat: optional sqlite-vec embedding pipeline for Paid-tier RAG	2026-05-13 16:32:57 -07:00
pyr0ball	b5ce0a24b2	feat: inject environment context into diagnose pipeline and LLM prompt - Add context_block param to summarize() and thread it into _PROMPT_TEMPLATE - Wire retrieve_context/format_context_block into diagnose_stream() before log search; emit context SSE event (facts + chunks) to the client - 3 new tests covering prompt injection and SSE event emission (155 total, all pass)	2026-05-13 16:29:26 -07:00
pyr0ball	783edbe496	feat: wizard state machine — structured Q&A writes context facts and source config	2026-05-13 16:25:52 -07:00
pyr0ball	ef8d164188	feat: context retriever — keyword fact lookup and chunk search	2026-05-13 16:23:54 -07:00
pyr0ball	ebbb1af32d	feat: doc upload adapter — writes facts, document, and chunks to context store	2026-05-13 16:21:55 -07:00
pyr0ball	b23a60a602	feat: context chunker — type detection, YAML extraction, text chunking - Implement document type detection for yaml/json/markdown/text - Extract service facts from docker-compose YAML (names, images, ports) - Split text into overlapping word chunks (300-word default with 50-word overlap) - Enforce 5 MB file size limit - Comprehensive TDD test suite: 15 tests passing	2026-05-13 15:54:51 -07:00
pyr0ball	54c756dfe8	feat: context store — fact and document CRUD	2026-05-13 15:53:03 -07:00
pyr0ball	7461953021	feat: add context_facts, context_documents, context_chunks tables to schema	2026-05-13 15:51:19 -07:00
pyr0ball	7d46314e86	feat: switch LLM backend to OpenAI-compat; add cf-orch remote inference support Turnstone now calls /v1/chat/completions instead of Ollama's /api/generate. This format works with both local Ollama (>=0.1.24) and a remote cf-orch coordinator, enabling GPU-less nodes like Contributor2's to route diagnoses through the cluster without any local model. - llm.py: OpenAI-compat messages format, optional Bearer auth header - diagnose.py: thread llm_api_key through the call chain - rest.py: llm_api_key pref (default empty), SettingsBody field, passed to diagnose - SettingsView.vue: API Key field, label updated from "Ollama URL" to "LLM Endpoint URL" - tests: updated mocks for new response shape; added bearer token assertion test	2026-05-12 12:58:38 -07:00
pyr0ball	afcac6ff05	feat: periodic corpus export — push ERROR/CRITICAL entries and incidents to Avocet Watermark-based batch export script (scripts/export_corpus.py) pushes up to 500 ERROR/CRITICAL entries and labeled incidents per run to AVOCET_CORPUS_ENDPOINT. Uses SQLite rowid watermark (entry log) and ISO timestamp watermark (incidents). Skips silently when AVOCET_CORPUS_ENDPOINT is not set. 19 tests. Closes turnstone#6.	2026-05-11 17:08:35 -07:00
pyr0ball	9cc8bf3662	feat: add file tail source type; configure example-node watchers - type: file uses tail -F (handles rotation) with auto-format detection - _parse_lines dispatches to journald/servarr/qbit/caddy/syslog/plaintext based on first-line format detection — same logic as batch ingest - watch.yaml updated with file type docs and example-node-specific example - scripts/journal-bridge.sh + .service written directly to example-node Contributor2's watch.yaml covers: system-journal-live (via bridge file), sonarr, radarr, lidarr, prowlarr, bazarr, qbittorrent, nzbget, tautulli	2026-05-11 15:44:10 -07:00
pyr0ball	3fd81e5ab1	feat: live watch mode — tail journald/docker/podman sources continuously (#4) Adds background watcher that tails active log sources and ingests entries in near-real-time, keeping the DB fresh without manual ingest runs. - app/watch/watcher.py: Watcher + WatchSource using subprocess + select loop; flushes every 10s or 100 lines; syncs FTS index every 3 flushes - patterns/watch.yaml: declarative source config (journald/docker/podman) - app/rest.py: lifespan context manager starts/stops watcher on app startup/shutdown; GET /api/watch/status + POST /api/watch/reload - web/src/views/DashboardView.vue: live/manual indicator chip + stale banner copy adapts to whether live watching is active - tests/test_watch_watcher.py: 16 tests covering config load, command building, docker timestamp stripping, orchestrator lifecycle Closes #4	2026-05-11 15:34:13 -07:00
pyr0ball	0882083755	feat: LLM reasoning layer — Ollama summarization on diagnose results	2026-05-11 11:35:07 -07:00
pyr0ball	ca0cb1361e	fix: correct time_detected logic, immutable sort pattern, add diagnose() test	2026-05-11 09:08:24 -07:00
pyr0ball	abd142addf	feat: add diagnose service with NL time extraction via dateparser Adds app/services/diagnose.py with parse_time_window() (dateparser-backed NL time phrase extraction with 60-min fallback) and diagnose() (layered FTS + window search returning severity/source summary). Includes 5 TDD tests.	2026-05-11 09:04:50 -07:00
pyr0ball	346ea6e0c6	feat: syslog and dmesg parsers with graceful journald fallback - Add syslog.py — RFC 3164 parser for /var/log/syslog, /var/log/messages, auth.log, kern.log; ident prepended to message text for searchability - Add dmesg_log.py — handles both relative [secs.usecs] and human-readable [Dow Mon DD HH:MM:SS YYYY] formats; relative timestamps preserved as raw - Wire both into pipeline.py auto-detection (before plaintext fallback) - Update export_journal.sh: checks for journalctl availability, falls back gracefully on non-systemd systems; adds dmesg -T export (falls back to plain dmesg on older kernels) - Add syslog entries (commented) + dmesg source to sources.yaml - 30 tests covering both parsers (detection + parse correctness)	2026-05-11 06:57:38 -07:00
pyr0ball	40bbc9225d	fix: support hotio qBittorrent 5.x log format (N/I/W/C single-char level) Contributor2's ghcr.io/hotio/qbittorrent:latest container uses a different format than the classic GUI build: `(N) 2026-04-26T03:32:59 - message` with a single-char level code before an ISO timestamp, not inside parens. Added _HOTIO_RE alongside _CLASSIC_RE; unified via _match_line() helper so parse() loop is unchanged. 28 tests passing, both formats covered.	2026-05-11 05:55:40 -07:00
pyr0ball	a3c0962277	feat: qBittorrent log ingestor with 8 diagnostic patterns Adds app/ingest/qbittorrent.py — auto-detected by the pipeline on the (YYYY/MM/DD HH:MM:SS) timestamp fingerprint. Handles both slash and dash date separators, optional [Warning|Critical] bracket levels, and multi-line continuations (Qt stack traces). patterns/default.yaml: 8 new qbit_ patterns covering tracker errors, port bind failures, disk errors, hash check failures, peer bans, download completion, ratio limits, and session errors. manage.sh: ingest-qbit [HOST] command mirrors ingest-plex — probes known default log paths locally or via SSH, ingests, restarts server. 14 tests covering format detection, severity mapping, multiline handling, and timestamp normalization.	2026-05-10 08:21:16 -07:00
pyr0ball	bbe4b1e360	feat: initial Turnstone POC — ingest, FTS search, MCP server Ingest pipeline (journald / Caddy / Docker-wrapped formats) with per-source state tracking (repeat dedup, out-of-order detection), named pattern tagging at ingest time, and idempotent SHA1-keyed writes. FTS5 search layer with porter stemmer, severity/source/pattern/time filters, and BM25 ranking. MCP server (FastMCP stdio) with three tools: search_logs, diagnose, list_log_sources — compatible with both Claude Code and Copilot CLI. WAL mode enabled on all connections. FTS index auto-built after ingest. MCP configs included for Claude Code (.mcp.json) and Copilot CLI (.github/copilot/mcp.json).	2026-05-08 12:12:34 -07:00
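
Illustrative sketch for commit 1abdcfb1f3 (hybrid BM25 + vector re-ranking). This is not the repository's code. It is a minimal pure-Python rendering of the formula and pool size stated in the commit message, under two assumptions the message does not confirm: BM25 scores are min-max scaled, and raw scores follow the SQLite FTS5 convention where a lower (more negative) value is a better match. The names candidate_pool_size, normalize_bm25, cosine and hybrid_rerank are hypothetical.

```python
import math


def candidate_pool_size(limit):
    # "5x limit, min 100" from the commit message.
    return max(5 * limit, 100)


def cosine(a, b):
    # Plain cosine similarity; returns 0.0 for a zero-length vector.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def normalize_bm25(raw_scores):
    # ASSUMPTION: min-max scaling, with lower raw score = better match,
    # so the best candidate maps to 1.0 and the worst to 0.0.
    lo, hi = min(raw_scores), max(raw_scores)
    if hi == lo:
        return [1.0] * len(raw_scores)
    return [(hi - s) / (hi - lo) for s in raw_scores]


def hybrid_rerank(candidates, query_vec, limit, w_bm25=0.6, w_vec=0.4):
    # candidates: list of (text, raw_bm25_score, embedding) tuples.
    if not candidates:
        return []
    bm25_norm = normalize_bm25([raw for _, raw, _ in candidates])
    scored = [
        (w_bm25 * norm + w_vec * cosine(query_vec, emb), text)
        for norm, (text, _, emb) in zip(bm25_norm, candidates)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


pool = [
    ("disk full", -3.0, [0.0, 1.0]),          # best keyword match, unrelated meaning
    ("ECONNREFUSED", -2.5, [1.0, 0.0]),       # weaker keyword match, same meaning
    ("max retries exceeded", -1.0, [1.0, 1.0]),
]
ranked = hybrid_rerank(pool, query_vec=[1.0, 0.0], limit=2)
```

In this toy pool the second candidate overtakes the top BM25 hit because its embedding matches the query direction, which is the behaviour the commit describes for symptom-based queries.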

47 commits
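
Illustrative sketch for commit 0b3d95cd26 (naive log timestamps treated as local time). The snippet below is not taken from the parsers. It only contrasts the two standard-library calls named in the commit message. The function names label_as_utc and local_to_utc are hypothetical, and the sample timestamp is arbitrary.

```python
from datetime import datetime, timezone

# A timestamp parsed from a log line, with no tzinfo attached.
naive = datetime(2026, 5, 13, 14, 0, 0)


def label_as_utc(dt):
    # Old behaviour: keeps the wall-clock digits and merely attaches a UTC label.
    return dt.replace(tzinfo=timezone.utc)


def local_to_utc(dt):
    # Fixed behaviour: a naive datetime is interpreted as system local time
    # and then converted, so the stored instant is correct.
    return dt.astimezone(timezone.utc)


wrong = label_as_utc(naive)
right = local_to_utc(naive)

# Round-trip back to local time, as the updated tests do, so the check
# holds on any system timezone and not only on UTC hosts.
round_trip = right.astimezone().replace(tzinfo=None)
```

On a host whose local zone is UTC the two results coincide, which is why the bug only shows up on machines with a non-zero offset.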