Commit graph

5 commits

Author SHA1 Message Date
828b69768a refactor: rename ingest → glean throughout codebase
Renames the app/ingest/ package to app/glean/ and updates all
references across Python modules, shell scripts, Vue components,
tests, and documentation.

Intentionally preserved:
- SQLite column name ingest_time (avoids schema migration)
- RetrievedEntry.ingest_time field (maps to the column above)
- Any public-facing JSON keys that reference ingest_time

Changes by category:
- app/ingest/ → app/glean/ (full package move, all parsers)
- app/tasks/ingest_scheduler.py → app/tasks/glean_scheduler.py
- scripts/ingest_corpus.py → scripts/glean_corpus.py
- tests/test_ingest_*.py → tests/test_glean_*.py
- Docstrings, log messages, comments: ingest → glean
- Env var: TURNSTONE_INGEST_INTERVAL → TURNSTONE_GLEAN_INTERVAL
- Shell scripts: glean.log, glean_corpus.py references
- README.md: multi-source ingest → multi-source glean
- .env.example: updated env var name
- patterns/: new diagnostic patterns from 2026-05-20 SSH incident
  (service_crash_loop, pkg_daemon_restart, ssh_forward_conflict)
- SourcesView.vue: pipeline label updated
- All test import paths updated to app.glean.*

285 tests passing.
2026-05-20 23:02:55 -07:00
63c742a708 feat: periodic ingest scheduler + Orchard submission pipeline
Adds asyncio-native background scheduler (TURNSTONE_INGEST_INTERVAL,
default 900s) that runs batch ingest then pushes pattern-matched entries
to a remote CF harvest endpoint (TURNSTONE_SUBMIT_ENDPOINT).

- app/tasks/ingest_scheduler.py: IngestState, scheduler_loop, run_once,
  submit_matched, _query_matched_since — asyncio.Lock prevents concurrent runs
- app/rest.py: POST /api/ingest/batch (pre-parsed entry receiver),
  GET /api/tasks/ingest/status, POST /api/tasks/ingest (manual trigger),
  TURNSTONE_INGEST_INTERVAL + TURNSTONE_SUBMIT_ENDPOINT env wiring in lifespan
- docker-compose.submissions.yml: segregated contrib1 (8536) + contrib2 (8537)
  receiving instances on Heimdall, isolated DBs under
  /devl/docker/turnstone-submissions/<node>/
- podman-standalone.sh: pass-through for TURNSTONE_SUBMIT_ENDPOINT +
  TURNSTONE_SOURCE_HOST
- app/ingest/mqtt_subscriber.py: MQTT log source adapter
- app/ingest/wazuh.py: Wazuh alert JSON adapter
- tests/test_ingest_wazuh.py: Wazuh adapter test suite
2026-05-20 08:57:25 -07:00
346ea6e0c6 feat: syslog and dmesg parsers with graceful journald fallback
- Add syslog.py — RFC 3164 parser for /var/log/syslog, /var/log/messages,
  auth.log, kern.log; ident prepended to message text for searchability
- Add dmesg_log.py — handles both relative [secs.usecs] and human-readable
  [Dow Mon DD HH:MM:SS YYYY] formats; relative timestamps preserved as raw
- Wire both into pipeline.py auto-detection (before plaintext fallback)
- Update export_journal.sh: checks for journalctl availability, falls back
  gracefully on non-systemd systems; adds dmesg -T export (falls back to
  plain dmesg on older kernels)
- Add syslog entries (commented) + dmesg source to sources.yaml
- 30 tests covering both parsers (detection + parse correctness)
2026-05-11 06:57:38 -07:00
286778d6a9 feat: journald export + system failure patterns
- Add scripts/export_journal.sh — dumps recent journal (priority 0-5,
  20min window) to /opt/turnstone/data/journal-export.jsonl; idempotent
  via entry_id deduplication so overlap is safe
- Add system-journal source to sources.yaml pointing at the export file
- Add 9 system-level patterns to default.yaml:
  systemd_fail, oom_kill, disk_hw_error, fs_error, kernel_error,
  ssh_brute, container_crash, smart_error, nfs_error
2026-05-11 06:54:42 -07:00
f46aba8165 feat: multi-source ingest via sources.yaml + servarr parser
- Add servarr.py parser for all *arr services (sonarr/radarr/lidarr/
  prowlarr/readarr/whisparr/bazarr) — pipe-delimited format with
  component prefix prepended for searchability
- Add ingest_sources() to pipeline.py; reads sources.yaml, skips
  missing paths with a warning so cron keeps running if a service
  is down
- Add --sources mode to ingest_corpus.py CLI; legacy positional args
  unchanged for backward compat
- Add patterns/sources.yaml with all of Contributor2's discovered service
  log paths (qbit, 7 servarr services, nzbget, tautulli, jellyseerr)
- Replace per-service volume mounts in podman-standalone.sh with
  /opt:/opt:ro + /var/log:/var/log:ro; adding a new source now
  requires only editing sources.yaml — no container restart
2026-05-11 06:26:32 -07:00