Circuit-Forge/turnstone

Commit graph

Author: pyr0ball
SHA1: 99b44ddb81

feat(corpus): synthetic log corpus generator for demos and testing

Adds scripts/gen_corpus.py that produces realistic-but-artificial log
files across all four supported formats (journald JSON, docker envelope,
qBittorrent hotio, AVCX plaintext). Output feeds directly into
glean_corpus.py for demo environments and parser regression tests with
no production data required.
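For orientation, here is a minimal sketch of what single records in two of the four formats can look like. It assumes the standard `journalctl -o json` field names and Docker's json-file logging-driver envelope, and it uses hypothetical helper names (`journald_line`, `docker_line`). It is not taken from scripts/gen_corpus.py. The qBittorrent hotio and AVCX plaintext layouts are not described in the commit, so they are left out.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# syslog/journald numeric priorities (0 = emerg ... 7 = debug).
PRIORITY = {"CRITICAL": 2, "ERROR": 3, "WARNING": 4, "INFO": 6, "DEBUG": 7}


def journald_line(ts: datetime, unit: str, severity: str, message: str) -> str:
    """One record shaped like `journalctl -o json` output (small subset of fields)."""
    return json.dumps({
        # journald exports wall-clock time as microseconds since the epoch, as a string.
        "__REALTIME_TIMESTAMP": str((ts - EPOCH) // timedelta(microseconds=1)),
        "_SYSTEMD_UNIT": unit,
        "PRIORITY": str(PRIORITY[severity]),
        "MESSAGE": message,
    })


def docker_line(ts: datetime, message: str, stream: str = "stdout") -> str:
    """One record in Docker's json-file envelope; `ts` must be timezone-aware UTC."""
    return json.dumps({
        "log": message + "\n",  # the driver keeps the trailing newline
        "stream": stream,
        # Microsecond precision here; Docker itself writes nanoseconds.
        "time": ts.strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
    })
```

Each helper returns one line, so a generator can write a file by joining records with newlines.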

- Seed-based RNG with independent per-source sub-streams (the same seed
  yields the same sequence for each file, even if the number of sources
  changes)
- Controllable time range, event density, and error injection rate
- Severity distribution mirrors real infrastructure (70% INFO, ~6% ERROR,
  ~2% CRITICAL) with adjustable boost via --error-rate
- 17 tests covering output structure, reproducibility, format correctness,
  parser round-trip, and CLI acceptance criteria
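The seeding, density, and severity points above can be sketched as follows. Everything here is an assumption rather than the script's actual code. Per-source generators are derived by hashing `(seed, source name)`. Event times come from a Poisson process at `per_hour` events per hour. `error_rate` is read as the combined ERROR + CRITICAL share. The WARNING/DEBUG split and all function names are hypothetical.

```python
import hashlib
import random
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Hypothetical baseline weights. INFO, ERROR and CRITICAL follow the commit
# message; the WARNING/DEBUG split is an illustrative assumption.
BASE_WEIGHTS = {"INFO": 0.70, "WARNING": 0.12, "DEBUG": 0.10,
                "ERROR": 0.06, "CRITICAL": 0.02}
ERROR_LEVELS = ("ERROR", "CRITICAL")


def source_rng(seed: int, source: str) -> random.Random:
    """Derive one source's RNG from the global seed plus the source name.

    Hashing (seed, name) means a source's stream never depends on how many
    other sources exist or in what order they are generated.
    """
    digest = hashlib.sha256(f"{seed}:{source}".encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def severity_weights(error_rate: Optional[float] = None) -> dict:
    """Return level weights. If error_rate is given, ERROR + CRITICAL make up
    that share (keeping their 3:1 ratio) and the other levels are rescaled
    proportionally so the total stays 1."""
    if error_rate is None:
        return dict(BASE_WEIGHTS)
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    base_err = sum(BASE_WEIGHTS[lvl] for lvl in ERROR_LEVELS)
    err_scale = error_rate / base_err
    rest_scale = (1.0 - error_rate) / (1.0 - base_err)
    return {lvl: w * (err_scale if lvl in ERROR_LEVELS else rest_scale)
            for lvl, w in BASE_WEIGHTS.items()}


def event_times(rng: random.Random, start: datetime, end: datetime,
                per_hour: float) -> List[datetime]:
    """Poisson-process arrival times in [start, end) at `per_hour` events/hour."""
    times, t = [], start
    while True:
        t += timedelta(hours=rng.expovariate(per_hour))
        if t >= end:
            return times
        times.append(t)


def generate(seed: int, source: str, start: datetime, end: datetime,
             per_hour: float, error_rate: Optional[float] = None
             ) -> List[Tuple[datetime, str]]:
    """(timestamp, severity) pairs for one source, fully determined by the seed."""
    rng = source_rng(seed, source)
    weights = severity_weights(error_rate)
    times = event_times(rng, start, end, per_hour)
    levels = rng.choices(list(weights), weights=list(weights.values()), k=len(times))
    return list(zip(times, levels))
```

Because nothing else draws from the "sshd" generator, adding a third source to a run leaves the "sshd" events unchanged. A single shared `random.Random` consumed source by source would not have that property.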

Also fixes a latent bug in app/glean/plaintext.py: ISO 8601 timestamps
were silently failing to parse because the T separator was normalised to
a space in the input string while the strptime format string still
contained T. Fix: apply the same normalisation to the format before
calling strptime.
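The bug and the fix can be reproduced in isolation. This is a minimal sketch, not the code in app/glean/plaintext.py. The format string `ISO_FMT` and the names `parse_buggy` and `parse_fixed` are hypothetical. The failure is "silent" only when a caller catches the ValueError.

```python
from datetime import datetime

ISO_FMT = "%Y-%m-%dT%H:%M:%S"  # assumed format string


def parse_buggy(raw: str, fmt: str = ISO_FMT) -> datetime:
    # Input is normalised ("T" -> " ") but the format still expects a literal
    # "T", so this always raises ValueError.
    return datetime.strptime(raw.replace("T", " "), fmt)


def parse_fixed(raw: str, fmt: str = ISO_FMT) -> datetime:
    # Apply the same normalisation to the format so both sides agree.
    return datetime.strptime(raw.replace("T", " "), fmt.replace("T", " "))
```

With the fix, both `2026-06-11T10:57:20` and `2026-06-11 10:57:20` parse to the same datetime.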

Closes: #46

Date: 2026-06-11 10:57:20 -07:00