Second-pass cybersec classifier using DeBERTa-v3-base-mnli (already cached; no download required).

Runs after each anomaly scoring pass on entries flagged by the anomaly scorer or with pattern matches.

Architecture:
- app/services/cybersec.py: zero-shot-classification pipeline with 5 cybersec candidate labels (auth failure, privilege escalation, network intrusion, malware, data exfiltration). Writes ml_score/ml_label/ml_scored_at to log_entries; inserts high-confidence hits into detections with scorer='cybersec'.
- app/tasks/cybersec_scorer.py: async background task (same shape as anomaly_scorer.py).
- REST: GET/POST /turnstone/api/cybersec/status|run|detections. GET /turnstone/api/anomaly/detections now accepts a scorer= filter.

Schema: ml_score, ml_label, ml_scored_at added to log_entries; scorer column added to detections (idempotent migrations + DDL for both SQLite and Postgres).

UI: Security Alerts view gains a Source dropdown (All / Anomaly / Cybersec) and a cybersec scorer status badge. Label dropdown split into optgroups.

Deployment: TURNSTONE_CYBERSEC_MODEL/DEVICE/THRESHOLD vars added to .env.example, docker-compose.yml, docker-standalone.sh.

Tests: 10 new tests (no model, no eligible entries, scoring, detection creation, normal label suppression, threshold filtering, pattern-tag filtering, idempotency, list filtering, scorer column filter); 416/416 passing.

Closes: #9
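A minimal sketch of calling the new endpoints from the Docker host, assuming the default `-p 8534:8534` mapping used by docker-standalone.sh. The commit lists `GET/POST` for status|run|detections without pairing methods to paths, so using GET for status and detections and POST for run is an assumption. The helper names `turnstone_url`, `detections_url` and `cybersec_check` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: query the cybersec scorer endpoints listed in this commit.
# Assumptions: base URL from the script's -p 8534:8534 mapping; GET for
# status/detections and POST for run. Helper names are hypothetical.

TURNSTONE_BASE="${TURNSTONE_BASE:-http://localhost:8534}"

# turnstone_url PATH -> full URL under the /turnstone/api/ prefix.
# Tolerates a trailing slash on the base and a leading slash on PATH.
turnstone_url() {
  printf '%s/turnstone/api/%s\n' "${TURNSTONE_BASE%/}" "${1#/}"
}

# detections_url [SCORER] -> anomaly detections URL; a non-empty SCORER
# (e.g. "cybersec") adds the scorer= filter, otherwise no filter is applied.
detections_url() {
  local scorer="${1:-}"
  local url
  url="$(turnstone_url anomaly/detections)"
  if [[ -n "${scorer}" ]]; then
    url+="?scorer=${scorer}"
  fi
  printf '%s\n' "${url}"
}

# cybersec_check: show scorer status, trigger a run, then list only the
# detections written by the cybersec scorer. Needs a running container.
cybersec_check() {
  curl -fsS "$(turnstone_url cybersec/status)"; echo
  curl -fsS -X POST "$(turnstone_url cybersec/run)"; echo
  curl -fsS "$(detections_url cybersec)"; echo
}
```

Usage: save it as a file of your choosing, source it (for example `. ./turnstone-api.sh`, a hypothetical name) and call `cybersec_check` once the container is up. Calling `detections_url` with no argument gives the unfiltered detections list, which is what the UI's "All" source corresponds to.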
#!/usr/bin/env bash
# docker-standalone.sh: Turnstone Docker setup (no Compose)
#
# For hosts running Docker (not Podman). The container restarts automatically
# on boot via Docker's built-in restart policy; no systemd unit is needed.
# Turnstone is a diagnostic log intelligence layer: glean service logs,
# search by symptom, and view incidents in a lightweight web UI.
#
# ── Prerequisites ────────────────────────────────────────────────────────────
# 1. Clone the repo:
#      git clone https://git.opensourcesolarpunk.com/Circuit-Forge/turnstone.git ~/turnstone
#    (or wherever you prefer; update REPO_DIR below)
#
# 2. Build the image (optional: this script rebuilds it on every run):
#      cd ~/turnstone && docker build -t localhost/turnstone:latest .
#
# 3. Edit the log sources. patterns/ ships with the clone and is bind-mounted
#    in place (PATTERNS_DIR below), so default.yaml and sources.yaml need no
#    copying; the script creates data/ itself:
#      # Edit ~/turnstone/patterns/sources.yaml: set log paths that exist on this host.
#
# 4. Set any env vars (see sections below), then run this script:
#      bash ~/turnstone/docker-standalone.sh
#
# ── After setup ──────────────────────────────────────────────────────────────
# The container starts with --restart=unless-stopped so it survives reboots.
# To upgrade: cd ~/turnstone && git pull && bash ~/turnstone/docker-standalone.sh
#
# ── Gleaning logs ─────────────────────────────────────────────────────────────
# /opt and /var/log are bind-mounted read-only, so service logs under either
# are readable inside the container. Sources are configured in
# patterns/sources.yaml (bind-mounted at /patterns/).
#
# To glean all sources (run manually or via cron):
#
#   docker exec turnstone python scripts/glean_corpus.py \
#     --sources /patterns/sources.yaml --db /data/turnstone.db
#
# Example cron entry (every 15 minutes; add with crontab -e). crontab has no
# backslash line continuation, so the entry must stay on ONE line, and
# appending to /var/log usually requires root's crontab:
#
#   */15 * * * * docker exec turnstone python scripts/glean_corpus.py --sources /patterns/sources.yaml --db /data/turnstone.db >> /var/log/turnstone-glean.log 2>&1
#
# To add a new log source, edit patterns/sources.yaml; no restart needed.
#
# ── Adding Caddy reverse proxy ────────────────────────────────────────────────
# Add to /etc/caddy/Caddyfile on this host. "protected" and "cloudflare" refer
# to snippets (or files) your Caddyfile must already define; drop those import
# lines if you don't have them:
#
#   turnstone.yourdomain.tld {
#       import protected
#       reverse_proxy 127.0.0.1:8534
#       import cloudflare
#   }
#
# Then: sudo systemctl reload caddy
#
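# ── Ports ────────────────────────────────────────────────────────────────────
# Turnstone UI → http://localhost:8534/turnstone/
# -p 8534:8534 publishes the port on all host interfaces, so clients can skip
# the Caddy route above; use -p 127.0.0.1:8534:8534 if only Caddy should reach it.
#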
set -euo pipefail

# ── Paths: update to match your clone location ───────────────────────────────
REPO_DIR="${HOME}/turnstone"
DATA_DIR="${REPO_DIR}/data"
PATTERNS_DIR="${REPO_DIR}/patterns"
# HF_CACHE_DIR: override to a shared cache directory to avoid re-downloading models.
# Example (Heimdall, where byviz/bylastic_classification_logs is already cached):
#   export HF_CACHE_DIR=/Library/Assets/LLM
HF_CACHE_DIR="${HF_CACHE_DIR:-${REPO_DIR}/hf-cache}"

TZ="${TZ:-America/Los_Angeles}"

# ── Bundle push configuration ────────────────────────────────────────────────
# Set TURNSTONE_BUNDLE_ENDPOINT to enable the "Send Bundle" button in the
# Incidents UI:
#
#   export TURNSTONE_BUNDLE_ENDPOINT=https://turnstone.circuitforge.tech/turnstone/api/bundles
#   bash ~/turnstone/docker-standalone.sh
#

# ── Orchard submission (opt-in telemetry) ────────────────────────────────────
# Set TURNSTONE_SUBMIT_ENDPOINT to push pattern-matched log entries to a CF
# receiving instance after each glean run. Only matched entries are sent;
# no raw log content is uploaded. Submissions are used to build Avocet
# training data.
#
#   export TURNSTONE_SUBMIT_ENDPOINT=https://harvest.circuitforge.tech/contrib1
#   bash ~/turnstone/docker-standalone.sh
#

# ── Anomaly scoring pipeline (IDS / watchdog) ────────────────────────────────
# Set TURNSTONE_ANOMALY_MODEL to enable automatic anomaly scoring after each
# glean run. The byviz classifier (already used by the diagnose pipeline) is
# a good default; it's cached alongside the other models.
#
#   export TURNSTONE_ANOMALY_MODEL=byviz/bylastic_classification_logs
#   export TURNSTONE_ANOMALY_THRESHOLD=0.80   # confidence floor (default 0.75)
#   bash ~/turnstone/docker-standalone.sh
#

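# ── Cybersec classifier (second pass) ────────────────────────────────────────
# Set TURNSTONE_CYBERSEC_MODEL to enable the zero-shot cybersec classifier.
# It runs after each anomaly scoring pass on entries flagged by the anomaly
# scorer or with pattern matches; high-confidence hits are stored as
# detections with scorer=cybersec. Because it follows the anomaly pass, it
# most likely also needs TURNSTONE_ANOMALY_MODEL set (an assumption).
# The model ID below is a placeholder for the cached DeBERTa-v3-base-mnli
# checkpoint; the other values match the docker run defaults (cpu, 0.60).
#
#   export TURNSTONE_ANOMALY_MODEL=byviz/bylastic_classification_logs
#   export TURNSTONE_CYBERSEC_MODEL=<deberta-v3-base-mnli-model-id>
#   export TURNSTONE_CYBERSEC_THRESHOLD=0.60   # assumed confidence floor (default 0.60)
#   bash ~/turnstone/docker-standalone.sh
#
# Then list only its detections (assumes a plain GET on the filtered endpoint):
#   curl -s 'http://localhost:8534/turnstone/api/anomaly/detections?scorer=cybersec'
#
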
# ── Multi-agent diagnose pipeline ────────────────────────────────────────────
# Enable the 5-stage ML pipeline to improve diagnose results.
#
# If your host has WireGuard to Heimdall's LAN (e.g. Huginn):
#   export GPU_SERVER_URL=http://<YOUR_HOST_IP>:7700
#   export TURNSTONE_MULTI_AGENT_DIAGNOSE=true
#   bash ~/turnstone/docker-standalone.sh
#
# If your host has no WireGuard to Heimdall, use the public cf-orch endpoint:
#   export GPU_SERVER_URL=https://orch.circuitforge.tech
#   export TURNSTONE_MULTI_AGENT_DIAGNOSE=true
#   bash ~/turnstone/docker-standalone.sh
#
# ML models are downloaded on the first diagnose run and cached in HF_CACHE_DIR
# (default: ${REPO_DIR}/hf-cache). The first run takes a few minutes (about
# 400 MB of CPU-only models); later runs skip the download and load the
# models from the cache.
#

# ── Build image from current source ─────────────────────────────────────────
echo "Building Turnstone image..."
docker build -t localhost/turnstone:latest "${REPO_DIR}"

# Create HF model cache dir if not present (persists across container rebuilds)
mkdir -p "${HF_CACHE_DIR}"
mkdir -p "${DATA_DIR}" "${PATTERNS_DIR}"

# Remove existing container if present (safe re-run)
docker rm -f turnstone 2>/dev/null || true

docker run -d \
  --name=turnstone \
  --restart=unless-stopped \
  -p 8534:8534 \
  -v "${DATA_DIR}:/data" \
  -v "${PATTERNS_DIR}:/patterns" \
  -v "${HF_CACHE_DIR}:/hf-cache" \
  -v /opt:/opt:ro \
  -v /var/log:/var/log:ro \
  -e TURNSTONE_DB=/data/turnstone.db \
  -e TURNSTONE_SOURCE_HOST="$(hostname)" \
  -e TURNSTONE_BUNDLE_ENDPOINT="${TURNSTONE_BUNDLE_ENDPOINT:-}" \
  -e TURNSTONE_SUBMIT_ENDPOINT="${TURNSTONE_SUBMIT_ENDPOINT:-}" \
  -e PYTHONUNBUFFERED=1 \
  -e TZ="${TZ}" \
  -e TURNSTONE_MULTI_AGENT_DIAGNOSE="${TURNSTONE_MULTI_AGENT_DIAGNOSE:-false}" \
  -e GPU_SERVER_URL="${GPU_SERVER_URL:-}" \
  -e HF_HOME=/hf-cache \
  -e TURNSTONE_CLASSIFIER_MODEL="${TURNSTONE_CLASSIFIER_MODEL:-byviz/bylastic_classification_logs}" \
  -e TURNSTONE_EMBED_BACKEND="${TURNSTONE_EMBED_BACKEND:-sentence_transformers}" \
  -e TURNSTONE_EMBED_MODEL="${TURNSTONE_EMBED_MODEL:-sentence-transformers/all-MiniLM-L6-v2}" \
  -e TURNSTONE_EMBED_DEVICE="${TURNSTONE_EMBED_DEVICE:-cpu}" \
  -e TURNSTONE_CYBERSEC_MODEL="${TURNSTONE_CYBERSEC_MODEL:-}" \
  -e TURNSTONE_CYBERSEC_DEVICE="${TURNSTONE_CYBERSEC_DEVICE:-cpu}" \
  -e TURNSTONE_CYBERSEC_THRESHOLD="${TURNSTONE_CYBERSEC_THRESHOLD:-0.60}" \
  -e TURNSTONE_ANOMALY_MODEL="${TURNSTONE_ANOMALY_MODEL:-}" \
  -e TURNSTONE_ANOMALY_DEVICE="${TURNSTONE_ANOMALY_DEVICE:-cpu}" \
  -e TURNSTONE_ANOMALY_THRESHOLD="${TURNSTONE_ANOMALY_THRESHOLD:-0.75}" \
  -e TURNSTONE_ANOMALY_INTERVAL="${TURNSTONE_ANOMALY_INTERVAL:-0}" \
  localhost/turnstone:latest

echo ""
echo "Turnstone is starting up."
echo "  UI: http://localhost:8534/turnstone/"
echo ""
echo "Check container health with:"
echo "  docker ps"
echo "  docker logs turnstone"
echo ""
echo "To glean all sources now:"
echo "  docker exec turnstone python scripts/glean_corpus.py \\"
echo "    --sources /patterns/sources.yaml --db /data/turnstone.db"
echo ""
echo "To add a new source: edit ${PATTERNS_DIR}/sources.yaml (no restart needed)."
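docker-standalone.sh passes TURNSTONE_CYBERSEC_THRESHOLD and TURNSTONE_ANOMALY_THRESHOLD straight through to the container, so a typo such as `75` or `0,8` is not caught before the container starts. Below is a minimal pre-flight sketch, assuming a valid threshold is a decimal in [0, 1] (as the 0.60/0.75 defaults and the "confidence floor" comment suggest). `is_unit_interval`, `check_threshold` and `check_thresholds` are hypothetical helpers, not part of Turnstone.

```shell
#!/usr/bin/env bash
# Sketch: pre-flight check for the threshold variables docker-standalone.sh
# forwards unchecked. Assumption: a valid threshold is a decimal in [0, 1].
# Function names are hypothetical, not part of Turnstone.

# is_unit_interval VALUE -> status 0 if VALUE is a plain decimal in [0, 1].
is_unit_interval() {
  # Digits with an optional fraction, or a bare fraction like ".5".
  # Rejects "", "abc", "0,8", "-0.1" and exponent forms such as "1e-1".
  local re='^([0-9]+(\.[0-9]+)?|\.[0-9]+)$'
  [[ "$1" =~ ${re} ]] || return 1
  # bash arithmetic is integer-only, so let awk compare the float.
  awk -v x="$1" 'BEGIN { exit !(x + 0 >= 0 && x + 0 <= 1) }'
}

# check_threshold NAME VALUE -> print an error and fail if VALUE is invalid.
check_threshold() {
  if ! is_unit_interval "$2"; then
    printf '%s=%s is not a decimal in [0, 1]\n' "$1" "$2" >&2
    return 1
  fi
}

# check_thresholds: validate both thresholds, falling back to the same
# defaults as the docker run line (0.60 cybersec, 0.75 anomaly).
# Reports every bad value before failing instead of stopping at the first.
check_thresholds() {
  local rc=0
  check_threshold TURNSTONE_CYBERSEC_THRESHOLD "${TURNSTONE_CYBERSEC_THRESHOLD:-0.60}" || rc=1
  check_threshold TURNSTONE_ANOMALY_THRESHOLD "${TURNSTONE_ANOMALY_THRESHOLD:-0.75}" || rc=1
  return "${rc}"
}
```

Usage, with the sketch saved under a hypothetical name such as `preflight.sh`: `( . ./preflight.sh && check_thresholds ) && bash ~/turnstone/docker-standalone.sh`. Running it in a subshell keeps the helper functions out of your interactive shell, and the deployment only starts if both values pass.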