Commit graph

4 commits

Author SHA1 Message Date
12cd0a23d5 refactor: rename ingest → glean throughout codebase
Renames the app/ingest/ package to app/glean/ and updates all
references across Python modules, shell scripts, Vue components,
tests, and documentation.

Intentionally preserved:
- SQLite column name ingest_time (avoids schema migration)
- RetrievedEntry.ingest_time field (maps to the column above)
- Any public-facing JSON keys that reference ingest_time

Changes by category:
- app/ingest/ → app/glean/ (full package move, all parsers)
- app/tasks/ingest_scheduler.py → app/tasks/glean_scheduler.py
- scripts/ingest_corpus.py → scripts/glean_corpus.py
- tests/test_ingest_*.py → tests/test_glean_*.py
- Docstrings, log messages, comments: ingest → glean
- Env var: TURNSTONE_INGEST_INTERVAL → TURNSTONE_GLEAN_INTERVAL
- Shell scripts: glean.log, glean_corpus.py references
- README.md: multi-source ingest → multi-source glean
- .env.example: updated env var name
- patterns/: new diagnostic patterns from 2026-05-20 SSH incident
  (service_crash_loop, pkg_daemon_restart, ssh_forward_conflict)
- SourcesView.vue: pipeline label updated
- All test import paths updated to app.glean.*

285 tests passing.
2026-05-20 23:02:55 -07:00
ec931598ca feat: multi-source ingest via sources.yaml + servarr parser
- Add servarr.py parser for all *arr services (sonarr/radarr/lidarr/
  prowlarr/readarr/whisparr/bazarr) — pipe-delimited format with
  component prefix prepended for searchability
- Add ingest_sources() to pipeline.py; reads sources.yaml, skips
  missing paths with a warning so cron keeps running if a service
  is down
- Add --sources mode to ingest_corpus.py CLI; legacy positional args
  unchanged for backward compat
- Add patterns/sources.yaml with all of Xander's discovered service
  log paths (qbit, 7 servarr services, nzbget, tautulli, jellyseerr)
- Replace per-service volume mounts in podman-standalone.sh with
  /opt:/opt:ro + /var/log:/var/log:ro; adding a new source now
  requires only editing sources.yaml — no container restart
2026-05-11 06:26:32 -07:00
cbdd681faf feat: plain-text and Plex log ingestors
- app/ingest/plex.py: Plex Media Server log parser
  Regex-based line parser for 'Mon DD, YYYY HH:MM:SS.mmm [pid] LEVEL - msg'
  format. Handles multi-line entries (stack traces). Detects plex_eae_failure
  and all other patterns via shared pattern library.
- app/ingest/plaintext.py: generic fallback parser for unrecognized formats
  Extracts timestamps (ISO 8601, syslog, common log) and severity via regex.
- pipeline.py: detect plex format via is_plex_log(); fall back to plaintext
  instead of skipping; process *.log files alongside *.jsonl; add ingest_file()
  for single-file ingestion.
- scripts/ingest_corpus.py: accept single file or directory as target
- manage.sh: ingest-plex command SSHes to Cass (or HOST arg), pulls
  Plex Media Server.log, and ingests it directly
2026-05-08 17:50:01 -07:00
3e6eabb7ce feat: initial Turnstone POC — ingest, FTS search, MCP server
Ingest pipeline (journald / Caddy / Docker-wrapped formats) with
per-source state tracking (repeat dedup, out-of-order detection),
named pattern tagging at ingest time, and idempotent SHA1-keyed writes.

FTS5 search layer with porter stemmer, severity/source/pattern/time
filters, and BM25 ranking. MCP server (FastMCP stdio) with three tools:
search_logs, diagnose, list_log_sources — compatible with both
Claude Code and Copilot CLI.

WAL mode enabled on all connections. FTS index auto-built after ingest.
MCP configs included for Claude Code (.mcp.json) and Copilot CLI
(.github/copilot/mcp.json).
2026-05-08 12:12:34 -07:00