Second-pass cybersec classifier using DeBERTa-v3-base-mnli (already
cached — no download required). Runs after each anomaly scoring pass on
entries flagged by the anomaly scorer or with pattern matches.
Architecture:
- app/services/cybersec.py: zero-shot-classification pipeline with 5
cybersec candidate labels (auth failure, privilege escalation, network
intrusion, malware, data exfiltration). Writes ml_score/ml_label/
ml_scored_at to log_entries; inserts high-confidence hits into
detections with scorer='cybersec'.
- app/tasks/cybersec_scorer.py: async background task (same shape as
anomaly_scorer.py).
- REST: GET/POST /turnstone/api/cybersec/status|run|detections.
GET /turnstone/api/anomaly/detections now accepts scorer= filter.
Schema: ml_score, ml_label, ml_scored_at added to log_entries; scorer
column added to detections (idempotent migrations + DDL for both SQLite
and Postgres).
UI: Security Alerts view gains Source dropdown (All / Anomaly / Cybersec)
and cybersec scorer status badge. Label dropdown split into optgroups.
Deployment: TURNSTONE_CYBERSEC_MODEL/DEVICE/THRESHOLD vars added to
.env.example, docker-compose.yml, docker-standalone.sh.
Tests: 10 new tests — no model, no eligible entries, scoring, detection
creation, normal label suppression, threshold filtering, pattern-tag
filtering, idempotency, list filtering, scorer column filter.
416/416 passing.
Closes: #9
Add TURNSTONE_ANOMALY_* env vars to docker-compose.yml, docker-standalone.sh,
and .env.example. Mount shared HF model cache (/Library/Assets/LLM on Heimdall)
as read-only bind in both compose and standalone — avoids re-downloading models
that are already cached by the diagnose pipeline.
Heimdall: byviz/bylastic_classification_logs already cached, threshold 0.80,
glean-triggered only (TURNSTONE_ANOMALY_INTERVAL=0).
- Add app/db/ abstraction layer: Backend enum, DbConn wrapper,
dialect helper (q() for ? vs %s paramstyle), get_conn(), tenant_id()
- Auto-detect backend from DATABASE_URL; SQLite remains default when
unset — no config change for local deployments
- Add tenant_id column to all three logical DBs (main, context, incidents);
idempotent ALTER TABLE migration runs before schema scripts on existing DBs
- All INSERTs inject tenant_id; SELECTs use (tenant_id = ? OR tenant_id = '')
for backward compat with pre-namespacing rows
- Add docker-compose.yml with named volume turnstone_pgdata (survives rebuilds)
and optional external Postgres support via DATABASE_URL override
- Add scripts/migrate_sqlite_to_postgres.py — one-shot idempotent migration
for existing SQLite data; ON CONFLICT DO NOTHING for safe re-runs
- Fix SSH glean path in pipeline.py to use ensure_schema + get_conn
(was still using raw sqlite3.connect + old _SCHEMA without tenant_id)
- Fix FTS5 JOIN ambiguity: qualify repeat_count as f.repeat_count in search
- Update all tests to use ensure_*_schema fixtures; add row_factory where needed
- 394/394 tests passing
Closes: #42
Closes: #50