turnstone/scripts/export_journal.sh
pyr0ball a37ec559aa feat: journald export + system failure patterns
- Add scripts/export_journal.sh — dumps recent journal (priority 0-5,
  20min window) to /opt/turnstone/data/journal-export.jsonl; idempotent
  via entry_id deduplication so overlap is safe
- Add system-journal source to sources.yaml pointing at the export file
- Add 9 system-level patterns to default.yaml:
  systemd_fail, oom_kill, disk_hw_error, fs_error, kernel_error,
  ssh_brute, container_crash, smart_error, nfs_error
2026-05-11 06:54:42 -07:00
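
The commit message relies on entry_id deduplication to make overlapping exports safe. The ingest code is not shown here, so the following is only a minimal sketch of the idea. It assumes, hypothetically, that an entry's ID is the SHA-256 of its raw JSON line; `entry_id`, `ingest_once`, and the seen-ID file are illustrative names, not part of Turnstone.

```shell
#!/usr/bin/env bash
set -euo pipefail

# entry_id LINE
# Hypothetical ID: SHA-256 of the raw JSON line. Turnstone's real entry_id
# derivation is not shown in this file and may differ.
entry_id() {
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# ingest_once SEEN_FILE < export.jsonl
# Prints only the lines whose ID is not yet recorded in SEEN_FILE, and
# records each new ID so a later overlapping export skips it.
ingest_once() {
  local seen=$1 line id
  touch "${seen}"
  while IFS= read -r line; do
    id=$(entry_id "${line}")
    if ! grep -qxF "${id}" "${seen}"; then
      printf '%s\n' "${id}" >> "${seen}"
      printf '%s\n' "${line}"
    fi
  done
}
```

Feeding two exports that share entries through `ingest_once` with the same seen-ID file emits each shared entry only once, which is why a 20-minute window on a 15-minute schedule does not produce duplicates.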


#!/usr/bin/env bash
# Export recent journald entries to a JSONL file the Turnstone container can ingest.
#
# Run this on the HOST before the container ingest step. The output file lands in
# /opt/turnstone/data/ which is bind-mounted at /data/ inside the container.
#
# Priority filter 0-5 (emerg through notice) drops info (6) and debug (7)
# noise while keeping warnings, errors, and any service lifecycle events
# logged at notice or above.
#
# Usage (standalone):
#   sudo bash /opt/turnstone/scripts/export_journal.sh
#
# Typical cron entry, combining export and ingest. It is wrapped here for
# readability; crontab does not support backslash line continuation, so
# enter it as a single line:
#   */15 * * * * bash /opt/turnstone/scripts/export_journal.sh && \
#     podman exec turnstone python scripts/ingest_corpus.py \
#       --sources /patterns/sources.yaml --db /data/turnstone.db \
#       >> /var/log/turnstone-ingest.log 2>&1
set -euo pipefail
OUT=/opt/turnstone/data/journal-export.jsonl
# 20-minute window (slightly wider than the 15-min cron interval) ensures no
# gaps between runs. Ingest is idempotent via entry_id hash, so overlap is safe.
journalctl \
  --output=json \
  --priority=0..5 \
  --since "20 minutes ago" \
  --no-pager \
  > "${OUT}"
echo "Exported $(wc -l < "${OUT}") journal entries to ${OUT}"
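
The script above redirects straight into the export file, so a reader that opens the file mid-run could see a truncated export. One possible hardening, sketched here under the assumption that the temp file and the destination sit on the same filesystem, is to write to a temp file and rename it into place. `atomic_export` is a hypothetical helper, not part of the original script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# atomic_export DEST CMD [ARGS...]
# Runs CMD with stdout captured in a temp file next to DEST, then renames
# the temp file over DEST. A rename within one filesystem is atomic, so a
# reader sees either the previous export or the complete new one.
# If CMD fails, the temp file is removed and DEST is left untouched.
atomic_export() {
  local dest=$1
  shift
  local tmp
  tmp=$(mktemp "${dest}.XXXXXX")
  if "$@" > "${tmp}"; then
    # mktemp creates the file with mode 0600; widening it to 0644 is an
    # assumption about what the container user needs in order to read it.
    chmod 644 "${tmp}"
    mv -f "${tmp}" "${dest}"
  else
    rm -f "${tmp}"
    return 1
  fi
}

# Possible use in place of the plain redirect (not executed here):
#   atomic_export /opt/turnstone/data/journal-export.jsonl \
#     journalctl --output=json --priority=0..5 \
#     --since "20 minutes ago" --no-pager
```

The trade-off is a brief moment where two copies of the export exist on disk. In exchange, a failed `journalctl` run no longer leaves an empty or partial file for the ingest step to pick up.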