turnstone/scripts
pyr0ball 5816ed69ae feat(corpus): synthetic log corpus generator for demos and testing
Adds scripts/gen_corpus.py that produces realistic-but-artificial log
files across all four supported formats (journald JSON, docker envelope,
qBittorrent hotio, EXT_DEVICE plaintext). Output feeds directly into
glean_corpus.py for demo environments and parser regression tests with
no production data required.

- Seed-based RNG with independent per-source sub-streams (the same seed
  yields the same sequence for each file, even when sources are added or removed)
- Controllable time range, event density, and error injection rate
- Severity distribution mirrors real infrastructure (70% INFO, ~6% ERROR,
  ~2% CRITICAL) with adjustable boost via --error-rate
- 17 tests covering output structure, reproducibility, format correctness,
  parser round-trip, and CLI acceptance criteria
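
The per-source sub-stream idea can be sketched as follows. This is a minimal illustration, not the code in scripts/gen_corpus.py: the helper names `source_rng` and `sample_levels`, the SHA-256 seed derivation, and the DEBUG/WARNING weights are assumptions made for the example. Only the 70% INFO, ~6% ERROR, and ~2% CRITICAL figures come from the commit message.

```python
import hashlib
import random

# Severity weights: INFO/ERROR/CRITICAL follow the commit message;
# DEBUG and WARNING are assumed values that fill the remaining 22%.
LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]
WEIGHTS = [12, 70, 10, 6, 2]


def source_rng(seed: int, source: str) -> random.Random:
    """Return an RNG whose stream depends only on (seed, source).

    Hypothetical helper: hashing the pair gives each source its own
    sub-stream, so adding or removing other sources cannot shift it.
    """
    digest = hashlib.sha256(f"{seed}:{source}".encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def sample_levels(seed: int, sources: list[str], n: int = 5) -> dict[str, list[str]]:
    """Draw n severity levels per source, each from its own sub-stream."""
    return {
        name: source_rng(seed, name).choices(LEVELS, weights=WEIGHTS, k=n)
        for name in sources
    }


if __name__ == "__main__":
    one = sample_levels(42, ["journald"])
    two = sample_levels(42, ["journald", "docker"])
    # The journald sequence is identical in both runs.
    print(one["journald"] == two["journald"])
```

Drawing every source from one shared RNG would instead couple the files together: each extra source would consume random numbers and change the output of the sources generated after it.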

Also fixes a latent bug in app/glean/plaintext.py: ISO 8601 timestamps
were silently failing to parse because the T separator was normalised to
space in the input string but the strptime format string still contained T.
Fix: apply the same normalisation to the format before calling strptime.
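
The failure mode and the fix can be reproduced in isolation. The sketch below is an assumption-laden illustration, not the code in app/glean/plaintext.py: the function name `parse_timestamp` and the constant `ISO_FORMAT` are hypothetical, and the real parser may handle more formats.

```python
from datetime import datetime
from typing import Optional

# Hypothetical format constant; the real module may define several.
ISO_FORMAT = "%Y-%m-%dT%H:%M:%S"


def parse_timestamp(raw: str, fmt: str = ISO_FORMAT) -> Optional[datetime]:
    """Parse a timestamp, treating 'T' and ' ' as the same separator."""
    # The input is normalised so that 'T' and space variants look alike.
    normalised = raw.replace("T", " ", 1)
    try:
        # The fix: normalise the format string the same way, so that the
        # literal separator in the format matches the normalised input.
        return datetime.strptime(normalised, fmt.replace("T", " ", 1))
    except ValueError:
        return None


if __name__ == "__main__":
    print(parse_timestamp("2026-06-11T10:57:20"))
    # Before the fix, the normalised input was matched against the
    # un-normalised format, which always raises ValueError:
    try:
        datetime.strptime("2026-06-11 10:57:20", ISO_FORMAT)
    except ValueError as exc:
        print("old behaviour:", exc)
```

Because the original code swallowed the parse error, the mismatch surfaced only as missing timestamps rather than as an exception.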

Closes: #46
2026-06-11 10:57:20 -07:00
manifests feat(diagnose): tech-level post-processor, offline mode, API auth, context harvest 2026-05-28 08:51:05 -07:00
build_fts_index.py refactor: rename ingest → glean throughout codebase 2026-05-20 23:02:55 -07:00
collect_cluster_logs.sh fix(db): add timeout=30s to all sqlite3.connect() calls across app 2026-05-26 23:12:48 -07:00
docker-cluster.sh fix: correct cf-orch port to 7700; fix relative time parsing in diagnose; fix syslog PRI prefix 2026-05-13 05:33:41 -07:00
export_corpus.py feat: periodic corpus export — push ERROR/CRITICAL entries and incidents to Avocet 2026-05-11 17:08:35 -07:00
export_journal.sh refactor: rename ingest → glean throughout codebase 2026-05-20 23:02:55 -07:00
gen_corpus.py feat(corpus): synthetic log corpus generator for demos and testing 2026-06-11 10:57:20 -07:00
glean_corpus.py fix(glean): add timeout=30s to all pipeline DB connections; add --force flag; new patterns 2026-05-26 22:36:45 -07:00
harvest_docs.py feat(diagnose): tech-level post-processor, offline mode, API auth, context harvest 2026-05-28 08:51:05 -07:00
migrate_sqlite_to_postgres.py feat: dual-backend SQLite/Postgres + multi-tenant source namespacing 2026-06-08 08:37:54 -07:00
syslog_receiver.py feat: add UDP syslog receiver for network device log collection 2026-05-13 04:58:51 -07:00
turnstone-cluster-collect.service fix: run collect service as alan user; call ingest directly without Docker 2026-05-13 05:17:43 -07:00
turnstone-cluster-collect.timer refactor: use live watcher + systemd timer instead of cron for cluster ingest 2026-05-13 04:55:25 -07:00
turnstone-cluster.service feat: source-scoped diagnose; multi-node Docker log collection 2026-05-13 08:10:42 -07:00
turnstone-syslog-receiver.service feat: add UDP syslog receiver for network device log collection 2026-05-13 04:58:51 -07:00
update.sh chore(corpus): preserve watermark files across updates; document corpus env vars 2026-06-10 15:01:19 -07:00
watch_plex.py feat: plex EAE watchdog and plex_eae_failure pattern 2026-05-08 13:41:34 -07:00