turnstone/harvester/sources.example.yaml
pyr0ball aa80f307fe refactor: rename ingest → glean throughout codebase
Renames the app/ingest/ package to app/glean/ and updates all
references across Python modules, shell scripts, Vue components,
tests, and documentation.

Intentionally preserved:
- SQLite column name ingest_time (avoids schema migration)
- RetrievedEntry.ingest_time field (maps to the column above)
- Any public-facing JSON keys that reference ingest_time

Changes by category:
- app/ingest/ → app/glean/ (full package move, all parsers)
- app/tasks/ingest_scheduler.py → app/tasks/glean_scheduler.py
- scripts/ingest_corpus.py → scripts/glean_corpus.py
- tests/test_ingest_*.py → tests/test_glean_*.py
- Docstrings, log messages, comments: ingest → glean
- Env var: TURNSTONE_INGEST_INTERVAL → TURNSTONE_GLEAN_INTERVAL
- Shell scripts: glean.log, glean_corpus.py references
- README.md: multi-source ingest → multi-source glean
- .env.example: updated env var name
- patterns/: new diagnostic patterns from 2026-05-20 SSH incident
  (service_crash_loop, pkg_daemon_restart, ssh_forward_conflict)
- SourcesView.vue: pipeline label updated
- All test import paths updated to app.glean.*

285 tests passing.
2026-05-20 23:02:55 -07:00


# Turnstone Harvester — sources.example.yaml
# Copy to sources.yaml and adjust paths for your system.
# The harvester reads this file and POSTs each log file to Turnstone.
#
# Each source needs:
#   id:    Short identifier (used as source_id in Turnstone)
#   path:  Absolute path to the log file on the host
sources:
  # System journal (export with: journalctl -o json > /var/log/journal-export.jsonl)
  # - id: system-journal
  #   path: /var/log/journal-export.jsonl
  # Syslog
  - id: syslog
    path: /var/log/syslog

  # Docker daemon log
  # - id: docker
  #   path: /var/log/docker.log

  # Podman events (rootful)
  # - id: podman
  #   path: /var/log/podman-events.log

  # Caddy access log
  # - id: caddy
  #   path: /var/log/caddy/access.log

  # Arr stack — adjust container paths to match your setup
  # - id: sonarr
  #   path: /opt/sonarr/config/logs/sonarr.0.txt
  # - id: radarr
  #   path: /opt/radarr/config/logs/radarr.0.txt
  # - id: prowlarr
  #   path: /opt/prowlarr/config/logs/prowlarr.0.txt

  # qBittorrent
  # - id: qbittorrent
  #   path: /opt/qbittorrent/config/data/logs/qbittorrent.log

  # Jellyfin
  # - id: jellyfin
  #   path: /opt/jellyfin/log/jellyfin.log

  # Wazuh SIEM — alerts.json on the Wazuh manager
  # Turnstone auto-detects this format; source_id is qualified per agent automatically.
  # For push-based ingestion from Wazuh custom integrations, use:
  #   POST /api/glean/wazuh/alert  (single alert JSON body)
  # - id: wazuh
  #   path: /var/ossec/logs/alerts/alerts.json
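
The header comments above say each source needs an `id` and an absolute `path`. As a minimal sketch of how those two requirements could be checked before the harvester runs, the hypothetical helper below validates an already-parsed config mapping. The function name `validate_sources`, the error wording, and the duplicate-id check are assumptions for illustration and are not taken from the Turnstone codebase.

```python
from pathlib import PurePosixPath


def validate_sources(config):
    """Return a list of problems found in a parsed sources.yaml mapping.

    Hypothetical helper: `config` is assumed to be the dict a YAML loader
    would produce for this file, e.g. {"sources": [{"id": ..., "path": ...}]}.
    """
    sources = config.get("sources") if isinstance(config, dict) else None
    if not isinstance(sources, list) or not sources:
        # Also covers the case where every entry is commented out,
        # in which case a YAML loader yields `sources: None`.
        return ["'sources' must be a non-empty list"]

    problems = []
    seen_ids = set()
    for index, entry in enumerate(sources):
        if not isinstance(entry, dict):
            problems.append(f"sources[{index}]: expected a mapping with 'id' and 'path'")
            continue

        source_id = entry.get("id")
        if not isinstance(source_id, str) or not source_id.strip():
            problems.append(f"sources[{index}]: missing or empty 'id'")
        elif source_id in seen_ids:
            # Assumption: ids should be unique because they become source_id.
            problems.append(f"sources[{index}]: duplicate id {source_id!r}")
        else:
            seen_ids.add(source_id)

        path = entry.get("path")
        if not isinstance(path, str) or not path:
            problems.append(f"sources[{index}]: missing or empty 'path'")
        elif not PurePosixPath(path).is_absolute():
            problems.append(f"sources[{index}]: path {path!r} is not absolute")

    return problems
```

With the default file, where only the `syslog` entry is active, the parsed mapping would be `{"sources": [{"id": "syslog", "path": "/var/log/syslog"}]}` and the helper would report no problems. Loading the file itself (for example with a YAML library's safe loader) is left out so the sketch stays dependency-free.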