Second-pass cybersec classifier using DeBERTa-v3-base-mnli (already cached, no download required).

Runs after each anomaly scoring pass on entries flagged by the anomaly scorer or with pattern matches.

Architecture:
- app/services/cybersec.py: zero-shot-classification pipeline with 5 cybersec candidate labels (auth failure, privilege escalation, network intrusion, malware, data exfiltration). Writes ml_score/ml_label/ml_scored_at to log_entries; inserts high-confidence hits into detections with scorer='cybersec'.
- app/tasks/cybersec_scorer.py: async background task (same shape as anomaly_scorer.py).
- REST: GET/POST /turnstone/api/cybersec/status|run|detections. GET /turnstone/api/anomaly/detections now accepts a scorer= filter.

Schema: ml_score, ml_label, and ml_scored_at added to log_entries; scorer column added to detections (idempotent migrations + DDL for both SQLite and Postgres).

UI: Security Alerts view gains a Source dropdown (All / Anomaly / Cybersec) and a cybersec scorer status badge. Label dropdown split into optgroups.

Deployment: TURNSTONE_CYBERSEC_MODEL/DEVICE/THRESHOLD vars added to .env.example, docker-compose.yml, docker-standalone.sh.

Tests: 10 new (no model, no eligible entries, scoring, detection creation, normal label suppression, threshold filtering, pattern-tag filtering, idempotency, list filtering, scorer column filter). 416/416 passing.

Closes: #9
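The zero-shot step in app/services/cybersec.py might look roughly like the sketch below. This is not the code from this commit, and several parts are assumptions:

- The function name `classify_entry` is hypothetical.
- The label strings are paraphrased from the list above.
- The extra benign label `"normal activity"` is assumed. The normal-label-suppression test suggests such a label exists, but its real text is not given.

`classify` stands in for a Hugging Face `zero-shot-classification` pipeline. That pipeline returns a dict whose `labels` and `scores` lists are sorted by descending score.

```python
from typing import Callable

# Candidate labels paraphrased from the commit message; exact strings may differ.
CYBERSEC_LABELS = [
    "auth failure",
    "privilege escalation",
    "network intrusion",
    "malware",
    "data exfiltration",
]
# Hypothetical benign label so routine log lines have somewhere to land.
NORMAL_LABEL = "normal activity"


def classify_entry(
    message: str,
    classify: Callable[..., dict],
    threshold: float = 0.60,
) -> tuple[str, float, bool]:
    """Return (top_label, top_score, is_detection) for one log message.

    `classify` is expected to behave like a Hugging Face zero-shot pipeline:
    classify(text, candidate_labels=[...]) -> {"labels": [...], "scores": [...]},
    with both lists sorted by descending score.
    """
    result = classify(message, candidate_labels=CYBERSEC_LABELS + [NORMAL_LABEL])
    top_label = result["labels"][0]
    top_score = float(result["scores"][0])
    # Every scored entry keeps its label/score (ml_label/ml_score); only a
    # confident, non-normal top label should become a detection row.
    is_detection = top_label != NORMAL_LABEL and top_score >= threshold
    return top_label, top_score, is_detection
```

In practice `classify` would be something like `transformers.pipeline("zero-shot-classification", model="MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli")`. With the default `multi_label=False`, that pipeline normalizes scores across the candidate labels so they sum to 1. This is plausibly part of why a zero-shot threshold behaves differently from a fine-tuned classifier's confidence.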
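The standard library can sketch three of the behaviours listed above against SQLite:

- the eligibility rule (flagged by the anomaly scorer, or carrying pattern matches);
- the idempotent write-back to `ml_score`/`ml_label`/`ml_scored_at`, with cybersec hits inserted under `scorer='cybersec'`;
- the `scorer=` listing filter.

Everything else here is hypothetical: the table layouts, the `pattern_tags` format, the `entry_id`/`label`/`score` columns on `detections`, and the helper names. `score` is any callable returning `(label, score, is_detection)`, for example a wrapper around a zero-shot classifier.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable, Optional

# Hypothetical minimal schema; the real tables carry more columns.
SCHEMA = """
CREATE TABLE log_entries (
    id INTEGER PRIMARY KEY,
    message TEXT NOT NULL,
    pattern_tags TEXT,            -- NULL or '' when no pattern matched
    ml_score REAL,
    ml_label TEXT,
    ml_scored_at TEXT
);
CREATE TABLE detections (
    id INTEGER PRIMARY KEY,
    entry_id INTEGER NOT NULL REFERENCES log_entries(id),
    label TEXT NOT NULL,
    score REAL NOT NULL,
    scorer TEXT NOT NULL DEFAULT 'anomaly'
);
"""

# Eligible: not yet cybersec-scored AND (anomaly-flagged OR pattern-matched).
ELIGIBLE_SQL = """
SELECT e.id, e.message
FROM log_entries AS e
WHERE e.ml_scored_at IS NULL
  AND (
        EXISTS (SELECT 1 FROM detections AS d
                WHERE d.entry_id = e.id AND d.scorer = 'anomaly')
        OR COALESCE(e.pattern_tags, '') <> ''
      )
ORDER BY e.id
"""

Scorer = Callable[[str], tuple[str, float, bool]]


def run_cybersec_pass(conn: sqlite3.Connection, score: Scorer) -> int:
    """Score each eligible entry once; return how many detections were inserted."""
    rows = conn.execute(ELIGIBLE_SQL).fetchall()
    scored_at = datetime.now(timezone.utc).isoformat()
    inserted = 0
    with conn:  # commit the whole batch atomically
        for entry_id, message in rows:
            label, prob, is_detection = score(message)
            # Setting ml_scored_at is what makes a re-run skip this entry.
            conn.execute(
                "UPDATE log_entries SET ml_score = ?, ml_label = ?, ml_scored_at = ? "
                "WHERE id = ?",
                (prob, label, scored_at, entry_id),
            )
            if is_detection:
                conn.execute(
                    "INSERT INTO detections (entry_id, label, score, scorer) "
                    "VALUES (?, ?, ?, 'cybersec')",
                    (entry_id, label, prob),
                )
                inserted += 1
    return inserted


def list_detections(conn: sqlite3.Connection, scorer: Optional[str] = None) -> list:
    """List detections, optionally filtered by scorer (mirrors ?scorer=...)."""
    sql = "SELECT id, entry_id, label, score, scorer FROM detections"
    params: tuple = ()
    if scorer is not None:
        sql += " WHERE scorer = ?"
        params = (scorer,)
    return conn.execute(sql + " ORDER BY id", params).fetchall()
```

Keying idempotency on `ml_scored_at IS NULL` means a second run over the same data is a no-op. This mirrors the idempotency test named above, though the real implementation may track it differently.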
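SQLite's `ALTER TABLE ... ADD COLUMN` has no `IF NOT EXISTS` clause; PostgreSQL has supported one since 9.6. An idempotent migration on SQLite therefore usually checks `PRAGMA table_info` first.

The sketch below assumes the column types and the `'anomaly'` default for pre-existing detections rather than taking them from the commit. `migrate` and `existing_columns` are hypothetical names.

```python
import sqlite3

# Hypothetical column declarations; the real migration may use other types.
NEW_COLUMNS = {
    "log_entries": [
        ("ml_score", "REAL"),
        ("ml_label", "TEXT"),
        ("ml_scored_at", "TEXT"),
    ],
    # Assumed: rows that predate the cybersec scorer default to 'anomaly'.
    # SQLite requires a non-NULL default when adding a NOT NULL column.
    "detections": [("scorer", "TEXT NOT NULL DEFAULT 'anomaly'")],
}


def existing_columns(conn: sqlite3.Connection, table: str) -> set:
    # PRAGMA table_info yields one row per column; field 1 is the column name.
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def migrate(conn: sqlite3.Connection) -> list:
    """Add any missing columns; safe to run on every startup."""
    applied = []
    for table, columns in NEW_COLUMNS.items():
        present = existing_columns(conn, table)
        for name, decl in columns:
            if name not in present:
                # Table/column names come from the constant above, never user input.
                conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {decl}")
                applied.append(f"{table}.{name}")
    conn.commit()
    return applied
```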
79 lines
4 KiB
Text
# Turnstone environment variables
# Copy to .env and adjust for your setup. All variables are optional unless noted.

# --- Database & paths ---
# TURNSTONE_DB=/data/turnstone.db
# TURNSTONE_PATTERNS=/patterns
# TURNSTONE_SOURCE_HOST=my-server

# --- GPU / LLM inference ---
# GPU_SERVER_URL: URL of your GPU inference server (Ollama, vLLM, or cf-orch coordinator).
# Paid+ users: leave unset and set CF_LICENSE_KEY; it then defaults to https://orch.circuitforge.tech.
# Local Ollama (default if unset and no CF_LICENSE_KEY is set): http://localhost:11434
# Local cf-orch coordinator: http://<YOUR_HOST_IP>:7700
# CF_ORCH_URL is also accepted as a backward-compatible alias.
# GPU_SERVER_URL=http://localhost:11434

# --- CircuitForge license (Paid+) ---
# Enables cloud GPU inference and premium features.
# When set, GPU_SERVER_URL defaults to https://orch.circuitforge.tech automatically.
# CF_LICENSE_KEY=CFG-TRSN-XXXX-XXXX-XXXX

# --- Bundle endpoint (optional) ---
# Remote endpoint that diagnostic bundles are pushed to for escalation.
# TURNSTONE_BUNDLE_ENDPOINT=https://example.com/api/bundles

# --- Periodic batch glean ---
# Seconds between automatic glean runs from sources.yaml. Set to 0 to disable.
# TURNSTONE_GLEAN_INTERVAL=900

# --- Multi-agent diagnose pipeline (experimental) ---
# Enable the 5-stage ML pipeline instead of the single-LLM summarize() call.
# TURNSTONE_MULTI_AGENT_DIAGNOSE=true

# Stage 2: ML severity classifier (optional; falls back to pattern_tags, then regex).
# Recommended: byviz/bylastic_classification_logs (~300 MB, downloaded from HuggingFace)
# TURNSTONE_CLASSIFIER_MODEL=byviz/bylastic_classification_logs

# Stage 4: embedding backend for false-positive suppression.
# sentence_transformers: in-process local model (downloaded on first use)
# ollama: uses a running Ollama instance (no download needed if the model is already pulled)
# TURNSTONE_EMBED_BACKEND=sentence_transformers
# TURNSTONE_EMBED_MODEL=BAAI/bge-small-en-v1.5
# TURNSTONE_EMBED_DEVICE=cpu

# --- Cybersec scoring pipeline (zero-shot, second pass on flagged entries) ---
# Runs a zero-shot classifier on entries already flagged by the anomaly scorer
# or that have pattern matches: a focused second opinion using cybersec vocabulary.
# The DeBERTa-v3-base-mnli model (required by the diagnose pipeline) is the recommended
# zero-shot classifier; it produces human-readable cybersec labels with no fine-tuning.
# TURNSTONE_CYBERSEC_MODEL=MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli
# TURNSTONE_CYBERSEC_DEVICE=cpu
# TURNSTONE_CYBERSEC_THRESHOLD=0.60 # lower than the anomaly threshold; zero-shot scores are calibrated differently

# --- Anomaly scoring pipeline (IDS / watchdog) ---
# Batch-scores every ingested log entry after each glean cycle.
# Any HuggingFace text-classification model works; the byviz classifier (already
# required by the diagnose pipeline) is the recommended starting point.
# Entries scoring above the threshold are inserted into the detections table and
# surfaced in the Security Alerts tab.
#
# Set TURNSTONE_ANOMALY_MODEL to enable; leave unset to disable (safe default).
# TURNSTONE_ANOMALY_MODEL=byviz/bylastic_classification_logs
# TURNSTONE_ANOMALY_DEVICE=cpu # or "cuda" / "mps" for GPU inference
# TURNSTONE_ANOMALY_THRESHOLD=0.80 # confidence floor for detection insertion
# TURNSTONE_ANOMALY_INTERVAL=0 # standalone loop interval (0 = glean-triggered only)
#
# HuggingFace model cache: share with the host to avoid re-downloading models.
# HF_HOME=/hf_cache # inside container (set in docker-compose)
# HF_CACHE_PATH=/Library/Assets/LLM # host bind-mount source (docker-compose only)

# --- Air-gapped / offline deployment ---
# Set to 1 to block all HuggingFace hub network access at runtime.
# Pre-download models to ~/.cache/huggingface/ before deploying; see docs/air-gapped-deployment.md.
# TURNSTONE_OFFLINE_MODE=1

# --- API authentication ---
# When set, all /api/ requests require: Authorization: Bearer <token>
# Generate a token: python -c "import secrets; print(secrets.token_urlsafe(32))"
# TURNSTONE_API_KEY=your-secret-token-here
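Read together, the GPU_SERVER_URL, CF_ORCH_URL, and CF_LICENSE_KEY comments in the .env file imply a resolution order for the inference endpoint. The hypothetical `resolve_gpu_server_url` below sketches one reading of it. The precedence between GPU_SERVER_URL and its CF_ORCH_URL alias is an assumption, since the file does not say which wins when both are set.

```python
from typing import Mapping

ORCH_CLOUD_URL = "https://orch.circuitforge.tech"
LOCAL_OLLAMA_URL = "http://localhost:11434"


def resolve_gpu_server_url(env: Mapping[str, str]) -> str:
    """Pick the inference endpoint from environment variables (hypothetical helper)."""
    # 1. An explicit URL wins; CF_ORCH_URL is the backward-compatible alias.
    #    Checking GPU_SERVER_URL before the alias is an assumption.
    for key in ("GPU_SERVER_URL", "CF_ORCH_URL"):
        value = env.get(key, "").strip()
        if value:
            return value
    # 2. A Paid+ license key switches the default to the hosted coordinator.
    if env.get("CF_LICENSE_KEY", "").strip():
        return ORCH_CLOUD_URL
    # 3. Otherwise fall back to a local Ollama instance.
    return LOCAL_OLLAMA_URL
```

Because `os.environ` is a `Mapping`, the call site would simply be `resolve_gpu_server_url(os.environ)`.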
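The .env file describes two timing knobs:

- TURNSTONE_GLEAN_INTERVAL: 0 disables periodic glean.
- TURNSTONE_ANOMALY_INTERVAL: 0 means the anomaly scorer is glean-triggered only.

The commit message adds the ordering: anomaly scoring runs after each glean, and the cybersec pass runs after each anomaly pass. A minimal asyncio sketch of that flow follows. The names `glean_cycle` and `run_periodically` are hypothetical, and the three stages are passed in as coroutine functions.

```python
import asyncio
from typing import Awaitable, Callable

Job = Callable[[], Awaitable[None]]


async def glean_cycle(glean: Job, anomaly_pass: Job, cybersec_pass: Job) -> None:
    """One cycle: ingest, score all new entries, then second-pass the flagged subset."""
    await glean()
    await anomaly_pass()   # batch-scores every newly ingested entry
    await cybersec_pass()  # only entries flagged by the anomaly pass or by patterns


async def run_periodically(interval_s: float, job: Job) -> None:
    """Run `job` every `interval_s` seconds; 0 (or less) disables the loop."""
    if interval_s <= 0:
        return
    while True:
        await job()
        await asyncio.sleep(interval_s)
```

A caller might start it with `asyncio.create_task(run_periodically(interval, lambda: glean_cycle(glean, anomaly, cybersec)))`. Here `interval` would be parsed from TURNSTONE_GLEAN_INTERVAL; treating 900 as its default is an assumption based on the example value.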
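TURNSTONE_OFFLINE_MODE is Turnstone's own switch, and the .env file does not show how it is wired internally. One plausible mapping, sketched below as an assumption, sets the Hugging Face libraries' own offline variables (`HF_HUB_OFFLINE`, `TRANSFORMERS_OFFLINE`). They would need to be set before `huggingface_hub` or `transformers` is imported, because those libraries typically read them when first loaded. `apply_offline_mode` is a hypothetical name.

```python
from typing import MutableMapping


def apply_offline_mode(env: MutableMapping[str, str]) -> bool:
    """Translate TURNSTONE_OFFLINE_MODE=1 into the Hugging Face offline switches.

    Hypothetical helper: call it before huggingface_hub/transformers are imported.
    """
    if env.get("TURNSTONE_OFFLINE_MODE", "").strip() != "1":
        return False
    env["HF_HUB_OFFLINE"] = "1"        # huggingface_hub: serve from the local cache only
    env["TRANSFORMERS_OFFLINE"] = "1"  # transformers' older offline switch
    return True
```

On a connected machine, models can be pre-fetched into the cache first with `huggingface_hub.snapshot_download(repo_id="byviz/bylastic_classification_logs")` (and likewise for the other model IDs above). At startup the call would be `apply_offline_mode(os.environ)`.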
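For TURNSTONE_API_KEY, a check along the following lines accepts every request when the variable is unset and otherwise requires `Authorization: Bearer <token>`. It compares tokens in constant time with `secrets.compare_digest`. `is_authorized` is a hypothetical helper, not Turnstone's actual middleware; in the real app it would sit wherever the web framework applies checks to `/api/` routes.

```python
import secrets
from typing import Optional


def is_authorized(authorization: Optional[str], api_key: Optional[str]) -> bool:
    """Hypothetical check of an Authorization header against TURNSTONE_API_KEY."""
    if not api_key:
        return True  # variable unset: authentication disabled
    if not authorization:
        return False
    scheme, _, token = authorization.partition(" ")
    # HTTP auth schemes are case-insensitive, so accept "Bearer" or "bearer".
    if scheme.lower() != "bearer" or not token:
        return False
    # Constant-time comparison so response timing does not leak the key.
    return secrets.compare_digest(token.encode(), api_key.encode())
```

The key itself can be generated with `secrets.token_urlsafe(32)`, as the one-liner in the .env file does.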