turnstone/.env.example
pyr0ball 674e945004 chore(corpus): preserve watermark files across updates; document corpus env vars
update.sh now backs up data/corpus_watermark.txt and data/incident_watermark.txt
before git pull and restores them after, mirroring the existing watch.yaml pattern.
Without this, an update would reset watermarks to zero and re-push all corpus
entries from the beginning on the next export run.

.env.example adds a corpus export section documenting the three env vars
needed to opt a node into the Avocet training pipeline.

Closes: #6
2026-06-10 15:01:19 -07:00
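
The backup-and-restore step described in the commit message can be sketched as follows. This is not the project's actual update.sh: the function names `backup_watermarks` and `restore_watermarks` and the temporary backup directory are assumptions made for illustration. Only the two watermark paths and the use of `git pull` come from the commit message.

```shell
# Hypothetical sketch; the real update.sh may be structured differently.
WATERMARKS="data/corpus_watermark.txt data/incident_watermark.txt"

# Copy each existing watermark into the backup directory given as $1.
backup_watermarks() {
  for f in $WATERMARKS; do
    if [ -f "$f" ]; then
      mkdir -p "$1/$(dirname "$f")"
      cp -p "$f" "$1/$f"
    fi
  done
}

# Copy saved watermarks from the backup directory given as $1 back into place.
restore_watermarks() {
  for f in $WATERMARKS; do
    if [ -f "$1/$f" ]; then
      mkdir -p "$(dirname "$f")"
      cp -p "$1/$f" "$f"
    fi
  done
}

# Intended order inside an update script (left commented out here):
#   backup_dir="$(mktemp -d)"
#   backup_watermarks "$backup_dir"
#   git pull
#   restore_watermarks "$backup_dir"
#   rm -rf "$backup_dir"
```

Restoring only files that were actually backed up means a node that never opted into corpus export does not get empty watermark files created for it.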


# Turnstone environment variables
# Copy to .env and adjust for your setup. All variables are optional unless noted.
# --- Database & paths ---
# TURNSTONE_DB=/data/turnstone.db
# TURNSTONE_PATTERNS=/patterns
# TURNSTONE_SOURCE_HOST=my-server
# --- GPU / LLM inference ---
# GPU_SERVER_URL — URL of your GPU inference server (Ollama, vLLM, or cf-orch coordinator).
# Paid+ users: leave unset to auto-default to https://orch.circuitforge.tech via CF_LICENSE_KEY.
# Local Ollama (default if unset): http://localhost:11434
# Local cf-orch coordinator: http://<YOUR_HOST_IP>:7700
# CF_ORCH_URL is also accepted as a backward-compatible alias.
# GPU_SERVER_URL=http://localhost:11434
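# Sketch of the lookup order the notes above imply, written as shell parameter
# expansion. Assumptions: GPU_SERVER_URL takes precedence over the CF_ORCH_URL
# alias, and the variable name "url" is illustrative only.
#   url="${GPU_SERVER_URL:-${CF_ORCH_URL:-}}"
#   if [ -z "$url" ] && [ -n "${CF_LICENSE_KEY:-}" ]; then url="https://orch.circuitforge.tech"; fi
#   url="${url:-http://localhost:11434}"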
# --- CircuitForge license (Paid+) ---
# Enables cloud GPU inference and premium features.
# When set, GPU_SERVER_URL defaults to https://orch.circuitforge.tech automatically.
# CF_LICENSE_KEY=CFG-TRSN-XXXX-XXXX-XXXX
# --- Bundle endpoint (optional) ---
# Remote endpoint to push diagnostic bundles for escalation.
# TURNSTONE_BUNDLE_ENDPOINT=https://example.com/api/bundles
# --- Log corpus export to Avocet (optional) ---
# Push ERROR/CRITICAL entries and labeled incidents to the Avocet corpus endpoint
# for logreading fine-tune training. Requires a consent token issued by CircuitForge (CF).
# Contact alan@circuitforge.tech to register your node and receive a token.
# Watermarks are stored at data/corpus_watermark.txt and data/incident_watermark.txt.
# AVOCET_CORPUS_ENDPOINT=https://avocet.circuitforge.tech/api/corpus/log-batch
# AVOCET_CONSENT_TOKEN=your-uuid-token-here
# TURNSTONE_SOURCE_HOST=my-server-name # same variable as under "Database & paths"; defaults to system hostname if unset
# --- Periodic batch glean ---
# Seconds between automatic glean runs from sources.yaml. Set to 0 to disable.
# TURNSTONE_GLEAN_INTERVAL=900
# --- Multi-agent diagnose pipeline (experimental) ---
# Enable the 5-stage ML pipeline instead of the single-LLM summarize() call.
# TURNSTONE_MULTI_AGENT_DIAGNOSE=true
# Stage 2 — ML severity classifier (optional; falls back to pattern_tags then regex).
# Recommended: byviz/bylastic_classification_logs (~300MB, downloaded from HuggingFace)
# TURNSTONE_CLASSIFIER_MODEL=byviz/bylastic_classification_logs
# Stage 4 — Embedding backend for false-positive suppression.
# sentence_transformers: in-process local model (downloads on first use)
# ollama: uses a running Ollama instance (no download needed if model is already pulled)
# TURNSTONE_EMBED_BACKEND=sentence_transformers
# TURNSTONE_EMBED_MODEL=BAAI/bge-small-en-v1.5
# TURNSTONE_EMBED_DEVICE=cpu
# --- Cybersec scoring pipeline (zero-shot, second-pass on flagged entries) ---
# Runs a zero-shot classifier on entries already flagged by the anomaly scorer
# or that have pattern matches — a focused second opinion using cybersec vocabulary.
# MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli (required by the diagnose pipeline) is the
# recommended zero-shot classifier: it produces human-readable cybersec labels with no fine-tuning.
# TURNSTONE_CYBERSEC_MODEL=MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli
# TURNSTONE_CYBERSEC_DEVICE=cpu
# TURNSTONE_CYBERSEC_THRESHOLD=0.60 # lower than the 0.80 anomaly threshold; zero-shot scores are calibrated differently
# --- Anomaly scoring pipeline (IDS / watchdog) ---
# Batch-scores every ingested log entry after each glean cycle.
# Any HuggingFace text-classification model works; the byviz classifier (the same model
# suggested for Stage 2 of the diagnose pipeline) is the recommended starting point.
# Detections above the threshold are inserted into the detections table and
# surfaced in the Security Alerts tab.
#
# Set TURNSTONE_ANOMALY_MODEL to enable; leave unset to disable (safe default).
# TURNSTONE_ANOMALY_MODEL=byviz/bylastic_classification_logs
# TURNSTONE_ANOMALY_DEVICE=cpu # or "cuda" / "mps" for GPU inference
# TURNSTONE_ANOMALY_THRESHOLD=0.80 # confidence floor for detection insertion
# TURNSTONE_ANOMALY_INTERVAL=0 # interval for a standalone scoring loop; 0 = run only after each glean cycle
#
# HuggingFace model cache — share with the host to avoid re-downloading models.
# HF_HOME=/hf_cache # inside container (set in docker-compose)
# HF_CACHE_PATH=/Library/Assets/LLM # host bind-mount source (docker-compose only)
# --- Air-gapped / offline deployment ---
# Set to 1 to block all HuggingFace hub network access at runtime.
# Pre-download models to ~/.cache/huggingface/ before deploying — see docs/air-gapped-deployment.md.
# TURNSTONE_OFFLINE_MODE=1
# --- API authentication ---
# When set, all /api/ requests require: Authorization: Bearer <token>
# Generate a token: python -c "import secrets; print(secrets.token_urlsafe(32))"
# TURNSTONE_API_KEY=your-secret-token-here
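
The token and header format described in the API authentication comments can be exercised from a shell. This is a minimal sketch: `TURNSTONE_BASE_URL` and the `<endpoint>` path are hypothetical placeholders, since the file does not state the server's address or routes. Only the `Authorization: Bearer <token>` format and the Python one-liner come from the comments above.

```shell
# Generate a token with the one-liner from the comments (assumes python3 is on PATH).
TURNSTONE_API_KEY="$(python3 -c 'import secrets; print(secrets.token_urlsafe(32))')"

# Build the header in the documented format.
auth_header="Authorization: Bearer ${TURNSTONE_API_KEY}"

# Hypothetical request: TURNSTONE_BASE_URL and <endpoint> are placeholders, not documented values.
# curl -fsS -H "$auth_header" "${TURNSTONE_BASE_URL}/api/<endpoint>"
```

The same token value would need to be written to `TURNSTONE_API_KEY` in `.env` so that the server and the client agree on it.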