Adds scripts/gen_corpus.py that produces realistic-but-artificial log
files across all four supported formats (journald JSON, docker envelope,
qBittorrent hotio, AVCX plaintext). Output feeds directly into
glean_corpus.py for demo environments and parser regression tests with
no production data required.
- Seed-based RNG with independent per-source sub-streams (same seed =
same sequence for each file regardless of source count changes)
- Controllable time range, event density, and error injection rate
- Severity distribution mirrors real infrastructure (70% INFO, ~6% ERROR,
~2% CRITICAL) with adjustable boost via --error-rate
- 17 tests covering output structure, reproducibility, format correctness,
parser round-trip, and CLI acceptance criteria
Also fixes a latent bug in app/glean/plaintext.py: ISO 8601 timestamps
were silently failing to parse because the T separator was normalised to
space in the input string but the strptime format string still contained T.
Fix: apply the same normalisation to the format before calling strptime.
Closes: #46