turnstone/tests
pyr0ball 5816ed69ae feat(corpus): synthetic log corpus generator for demos and testing
Adds scripts/gen_corpus.py that produces realistic-but-artificial log
files across all four supported formats (journald JSON, docker envelope,
qBittorrent hotio, EXT_DEVICE plaintext). Output feeds directly into
glean_corpus.py for demo environments and parser regression tests with
no production data required.

- Seed-based RNG with independent per-source sub-streams (the same seed
  yields the same sequence for each file, even when sources are added or
  removed)
- Controllable time range, event density, and error injection rate
- Severity distribution modelled on real infrastructure logs (70% INFO,
  ~6% ERROR, ~2% CRITICAL), with an adjustable error boost via --error-rate
- 17 tests covering output structure, reproducibility, format correctness,
  parser round-trip, and CLI acceptance criteria
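A minimal sketch of the seeding and severity ideas above, assuming Python. It is not the code in scripts/gen_corpus.py. The following are all assumptions made for illustration:

- the source names
- the DEBUG/WARNING split of the remaining 22%
- the multiplicative interpretation of --error-rate

Each source gets its own random.Random, seeded from a hash of (seed, source name). Its stream therefore does not depend on how many other sources exist.

```python
import hashlib
import random

# Hypothetical source identifiers, one per supported format.
SOURCES = ["journald", "docker", "qbittorrent", "ext_device"]

# INFO/ERROR/CRITICAL shares follow the commit message; splitting the
# remaining 22% into DEBUG and WARNING is an assumption.
BASE_WEIGHTS = {
    "DEBUG": 0.08,
    "INFO": 0.70,
    "WARNING": 0.14,
    "ERROR": 0.06,
    "CRITICAL": 0.02,
}


def source_rng(seed: int, source: str) -> random.Random:
    """Return an RNG whose stream depends only on (seed, source)."""
    digest = hashlib.sha256(f"{seed}:{source}".encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def severity_weights(error_rate: float = 0.0) -> dict[str, float]:
    """Scale ERROR/CRITICAL by (1 + error_rate), then renormalise to sum to 1.

    The multiplicative boost is an assumed interpretation of --error-rate.
    """
    weights = dict(BASE_WEIGHTS)
    for level in ("ERROR", "CRITICAL"):
        weights[level] *= 1.0 + error_rate
    total = sum(weights.values())
    return {level: w / total for level, w in weights.items()}


def sample_severities(rng: random.Random, n: int, error_rate: float = 0.0) -> list[str]:
    """Draw n severity labels from the (optionally boosted) distribution."""
    weights = severity_weights(error_rate)
    return rng.choices(list(weights), weights=list(weights.values()), k=n)


# One independent stream per source; removing "docker" from SOURCES would
# not change the sequence generated for "journald".
streams = {name: source_rng(42, name) for name in SOURCES}
print(sample_severities(streams["journald"], 5))
```

Hashing the source name into each sub-seed is what keeps a file stable when the source list changes. The alternative, drawing sub-seeds in order from one master RNG, would shift the seed of every later source whenever a source is inserted.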

Also fixes a latent bug in app/glean/plaintext.py: ISO 8601 timestamps
were silently failing to parse because the T separator was normalised to
space in the input string but the strptime format string still contained T.
Fix: apply the same normalisation to the format before calling strptime.
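A sketch of the bug and the fix, assuming a plain strptime-based parser. The names parse_buggy, parse_fixed, normalise_input, normalise_format and ISO_FMT are hypothetical, not the actual identifiers in app/glean/plaintext.py.

The regexes are deliberately narrow, which is an assumption beyond the commit message:

- In the input, only a T sitting between two digits is rewritten.
- In the format, a %-prefixed T is left alone.

```python
import re
from datetime import datetime

# Hypothetical format string for an ISO 8601 timestamp without timezone.
ISO_FMT = "%Y-%m-%dT%H:%M:%S"


def normalise_input(raw: str) -> str:
    """Turn the ISO 8601 date/time 'T' (between two digits) into a space."""
    return re.sub(r"(?<=\d)T(?=\d)", " ", raw)


def normalise_format(fmt: str) -> str:
    """Apply the same T -> space rewrite to the format, skipping '%T'."""
    return re.sub(r"(?<!%)T", " ", fmt)


def parse_buggy(raw: str, fmt: str = ISO_FMT) -> datetime:
    # Input is normalised but the format still expects a literal 'T',
    # so strptime raises ValueError (which a caller may swallow silently).
    return datetime.strptime(normalise_input(raw), fmt)


def parse_fixed(raw: str, fmt: str = ISO_FMT) -> datetime:
    # Input and format are normalised identically, so they match again.
    return datetime.strptime(normalise_input(raw), normalise_format(fmt))


print(parse_fixed("2026-06-11T10:57:20"))  # 2026-06-11 10:57:20
```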

Closes: #46
2026-06-11 10:57:20 -07:00
context feat: dual-backend SQLite/Postgres + multi-tenant source namespacing 2026-06-08 08:37:54 -07:00
__init__.py feat: initial Turnstone POC — ingest, FTS search, MCP server 2026-05-08 12:12:34 -07:00
test_anomaly.py feat: anomaly scoring pipeline (#10) 2026-06-09 11:15:13 -07:00
test_blocklist_endpoints.py Merge feat/32-domain-view: domain-view mapping for patterns and diagnose output (#32) 2026-06-01 20:01:19 -07:00
test_cybersec.py feat: cybersec zero-shot scoring pipeline (#9) 2026-06-10 01:03:25 -07:00
test_diagnose_classifier.py feat(classifier): add Hybrid-BERT label mapping shim (#41) 2026-06-01 16:20:31 -07:00
test_diagnose_hypothesizer.py fix: defensive coercion for LLM confidence and cluster fields in hypothesizer 2026-05-25 14:00:30 -07:00
test_diagnose_pipeline.py feat: Stage 5 synthesizer + pipeline orchestrator + feature flag wiring (issue #29) 2026-05-25 14:56:25 -07:00
test_diagnose_suppressor.py fix: invert suppress_threshold semantics to similarity_threshold in FalsePositiveSuppressor 2026-05-25 18:58:52 -07:00
test_diagnose_synthesizer.py feat: Stage 5 synthesizer + pipeline orchestrator + feature flag wiring (issue #29) 2026-05-25 14:56:25 -07:00
test_diagnose_timeline.py feat: Stage 1 — TimelineReconstructor for multi-agent diagnose pipeline (issue #29) 2026-05-25 12:54:15 -07:00
test_export_corpus.py feat: periodic corpus export — push ERROR/CRITICAL entries and incidents to Avocet 2026-05-11 17:08:35 -07:00
test_gen_corpus.py feat(corpus): synthetic log corpus generator for demos and testing 2026-06-11 10:57:20 -07:00
test_glean_dmesg.py refactor: rename ingest → glean throughout codebase 2026-05-20 23:02:55 -07:00
test_glean_fingerprint.py feat: dual-backend SQLite/Postgres + multi-tenant source namespacing 2026-06-08 08:37:54 -07:00
test_glean_pipeline_ssh.py feat: SSH remote host glean — transport layer and pipeline integration (closes #22, backend) 2026-05-20 23:03:13 -07:00
test_glean_qbittorrent.py refactor: rename ingest → glean throughout codebase 2026-05-20 23:02:55 -07:00
test_glean_ssh.py feat: SSH remote host glean — transport layer and pipeline integration (closes #22, backend) 2026-05-20 23:03:13 -07:00
test_glean_syslog.py refactor: rename ingest → glean throughout codebase 2026-05-20 23:02:55 -07:00
test_glean_tautulli.py feat: domain-view mapping for patterns and diagnose output (#32) 2026-06-01 19:57:16 -07:00
test_glean_wazuh.py refactor: rename ingest → glean throughout codebase 2026-05-20 23:02:55 -07:00
test_hybrid_search.py feat: dual-backend SQLite/Postgres + multi-tenant source namespacing 2026-06-08 08:37:54 -07:00
test_service_blocklist.py refactor: rename ingest → glean throughout codebase 2026-05-20 23:02:55 -07:00
test_service_pihole.py fix(blocklist): validate _v6_auth session JSON, add auth-failure test 2026-05-15 21:03:03 -07:00
test_services_diagnose.py refactor: rename ingest → glean throughout codebase 2026-05-20 23:02:55 -07:00
test_services_llm.py feat: switch LLM backend to OpenAI-compat; add cf-orch remote inference support 2026-05-12 12:58:38 -07:00
test_watch_watcher.py feat: add file tail source type; configure example-node watchers 2026-05-11 15:44:10 -07:00