Commit graph

2 commits

Author SHA1 Message Date
58680b3b27 chore: replace vendor product name with generic ext_device throughout
- Rename _EXT_DEVICE_CODES → _EXT_DEVICE_CODES, gen_ext_device → gen_ext_device
- Rename corpus output directory ext_device/ → ext_device/
- Update default.yaml placeholder pattern name and description
- Update tests to match new directory and class names
- Corresponding Forgejo issue titles updated (#43, #44, #54)
2026-06-13 22:03:26 -07:00
5816ed69ae feat(corpus): synthetic log corpus generator for demos and testing
Adds scripts/gen_corpus.py that produces realistic-but-artificial log
files across all four supported formats (journald JSON, docker envelope,
qBittorrent hotio, EXT_DEVICE plaintext). Output feeds directly into
glean_corpus.py for demo environments and parser regression tests with
no production data required.

- Seed-based RNG with independent per-source sub-streams (same seed =
  same sequence for each file regardless of source count changes)
- Controllable time range, event density, and error injection rate
- Severity distribution mirrors real infrastructure (70% INFO, ~6% ERROR,
  ~2% CRITICAL) with adjustable boost via --error-rate
- 17 tests covering output structure, reproducibility, format correctness,
  parser round-trip, and CLI acceptance criteria

Also fixes a latent bug in app/glean/plaintext.py: ISO 8601 timestamps
were silently failing to parse because the T separator was normalised to
space in the input string but the strptime format string still contained T.
Fix: apply the same normalisation to the format before calling strptime.

Closes: #46
2026-06-11 10:57:20 -07:00