pyr0ball ae922ef6c6 feat(diagnose): tech-level post-processor, offline mode, API auth, context harvest

- synthesizer: 3 system prompts (sysadmin/homelab/executive) selected by tech_level pref
- settings: tech_level selector (UI + backend) persisted in preferences.json
- QuickCapture: shows active level label in diagnosis card header
- TURNSTONE_OFFLINE_MODE=1: sets HF_HUB_OFFLINE + TRANSFORMERS_OFFLINE before lib load
- TURNSTONE_API_KEY: bearer token auth on all /api/ routes (hmac.compare_digest)
- /health always open; unset key = no auth (backward compatible)
- docs/air-gapped-deployment.md: full offline deployment guide
- scripts/harvest_docs.py: generalized context doc bulk-uploader with manifest support
- scripts/manifests/: heimdall-devops.yaml (10 docs ingested) + example.yaml template
- fix: _ingest_upload -> _glean_upload in context doc upload endpoint (was 500)

Closes: #56
Closes: #45
Closes: #47
Closes: #49
Closes: #21

2026-05-28 08:51:05 -07:00


Air-Gapped Deployment Guide

Turnstone can run entirely without internet access. This guide covers pre-downloading all model weights, configuring offline mode, and verifying that no outbound connections are made at runtime.

What requires network access by default

| Component | When | What it downloads |
|---|---|---|
| Stage 2 ML classifier | First diagnose run (if `TURNSTONE_CLASSIFIER_MODEL` is set) | HuggingFace model weights (~300 MB) |
| Stage 4 sentence-transformers embedder | First diagnose run (if `TURNSTONE_EMBED_BACKEND=sentence_transformers`) | Embedding model (~130 MB) |
| LLM inference | Every diagnose run | Nothing (calls your configured `GPU_SERVER_URL` only) |
| Log glean | Every glean run | Nothing (reads local files or SSH sources) |

If neither the classifier nor the sentence-transformers embedder is enabled, Turnstone downloads nothing at runtime. The only network traffic is to your configured LLM endpoint and any SSH log sources; everything else is local SQLite reads and writes.

Step 1 — Pre-download models (on an internet-connected machine)

Run these commands in the cf conda environment before moving to the air-gapped host:

# Stage 2 ML classifier (only needed if TURNSTONE_CLASSIFIER_MODEL is set)
conda run -n cf python -c "
from transformers import pipeline
pipeline('text-classification', model='byviz/bylastic_classification_logs')
print('classifier cached')
"

# Stage 4 sentence-transformers embedder (only if TURNSTONE_EMBED_BACKEND=sentence_transformers)
conda run -n cf python -c "
from sentence_transformers import SentenceTransformer
SentenceTransformer('BAAI/bge-small-en-v1.5')
print('embedder cached')
"

Models are cached to ~/.cache/huggingface/. Copy that directory to the air-gapped host at the same path before deployment.
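
Before copying the directory, it can help to confirm that both models actually landed in the cache. The sketch below uses only the standard library and assumes the usual Hugging Face hub cache layout (`<HF_HOME>/hub/models--<org>--<name>/snapshots/<revision>/`). The helper names are illustrative and not part of Turnstone, and older sentence-transformers releases may cache elsewhere, so a MISSING result for the embedder is a prompt to look further rather than proof.

```python
import os
from pathlib import Path


def hf_hub_cache() -> Path:
    """Resolve the hub cache directory (simplified: ignores XDG_CACHE_HOME)."""
    if os.environ.get("HF_HUB_CACHE"):
        return Path(os.environ["HF_HUB_CACHE"]).expanduser()
    hf_home = os.environ.get("HF_HOME", "~/.cache/huggingface")
    return Path(hf_home).expanduser() / "hub"


def cached_snapshots(repo_id: str, cache_dir: Path) -> list[str]:
    """Return the cached revision names for repo_id (empty list if none)."""
    repo_dir = cache_dir / ("models--" + repo_id.replace("/", "--"))
    snapshots = repo_dir / "snapshots"
    if not snapshots.is_dir():
        return []
    # A snapshot only counts if it actually contains files.
    return sorted(
        p.name for p in snapshots.iterdir() if p.is_dir() and any(p.iterdir())
    )


if __name__ == "__main__":
    # Model IDs taken from the pre-download commands in this guide.
    for repo in ("byviz/bylastic_classification_logs", "BAAI/bge-small-en-v1.5"):
        found = cached_snapshots(repo, hf_hub_cache())
        print(f"{repo}: {'cached' if found else 'MISSING'} {found}")
```

Running the same script on the air-gapped host after the copy gives a quick before/after comparison.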

Step 2 — Pre-ingest your documentation corpus

On the internet-connected machine, or before cutting the network:

# Write your manifest (see scripts/manifests/example.yaml)
# Then bulk-upload to the context DB:
conda run -n cf python scripts/harvest_docs.py --manifest scripts/manifests/your-site.yaml

The context DB (turnstone-context.db) is a plain SQLite file — copy it to the air-gapped host alongside turnstone.db.
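
Because both databases are plain SQLite files, a copy can be sanity-checked on the target host with Python's built-in `sqlite3` module. This is a minimal sketch: `check_sqlite_copy` is an illustrative name, and no Turnstone table names are assumed; it only runs SQLite's own integrity check and lists whatever tables exist.

```python
import sqlite3
from pathlib import Path


def check_sqlite_copy(db_path: str) -> dict:
    """Open a copied SQLite file read-only and report integrity and tables."""
    # mode=ro guarantees the check itself never modifies the file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        status = conn.execute("PRAGMA integrity_check").fetchone()[0]
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
            )
        ]
    finally:
        conn.close()
    return {"ok": status == "ok", "tables": tables}


if __name__ == "__main__":
    for name in ("turnstone-context.db", "turnstone.db"):
        if Path(name).exists():
            print(name, check_sqlite_copy(name))
        else:
            print(name, "not found in current directory")
```

Copying the files while Turnstone is stopped avoids capturing a write that is still in flight.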

Step 3 — Set offline environment variables

Add to your .env file (copy from .env.example):

# Block all HuggingFace hub network access
TURNSTONE_OFFLINE_MODE=1

# Point models at the pre-downloaded cache (usually the default)
# HF_HOME=/home/youruser/.cache/huggingface

TURNSTONE_OFFLINE_MODE=1 sets both HF_HUB_OFFLINE=1 and TRANSFORMERS_OFFLINE=1 before any model library loads. If the cache is missing or incomplete, the classifier falls back to the pattern-tag / regex path and embedding is skipped — diagnose still works, just without ML-assisted severity or suppression.
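
The ordering matters because the Hugging Face libraries typically read these variables when they are first imported. The sketch below illustrates the idea only: `apply_offline_mode` is a hypothetical name, not Turnstone's actual function, and it assumes that only the literal value `1` enables offline mode.

```python
import os


def apply_offline_mode(environ=os.environ) -> bool:
    """Export the Hugging Face offline flags if TURNSTONE_OFFLINE_MODE=1.

    Must run before `transformers` or `sentence_transformers` is imported.
    """
    if environ.get("TURNSTONE_OFFLINE_MODE") != "1":
        return False
    environ["HF_HUB_OFFLINE"] = "1"
    environ["TRANSFORMERS_OFFLINE"] = "1"
    return True


if __name__ == "__main__":
    enabled = apply_offline_mode()
    print("offline mode:", "on" if enabled else "off")
    # Model-library imports would go here, after the flags are in place, e.g.:
    # from transformers import pipeline
```

Setting the flags in the process environment rather than per call means every later model load in that process sees them.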

Step 4 — Configure a local LLM endpoint

Turnstone's LLM reasoning calls your GPU_SERVER_URL. On an air-gapped host this must be a local endpoint — either Ollama or a local cf-orch coordinator:

# Local Ollama
GPU_SERVER_URL=http://localhost:11434

# Local cf-orch coordinator
GPU_SERVER_URL=http://localhost:7700

Pull the Ollama model before cutting network access:

ollama pull llama3.1:8b
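
A quick way to catch a leftover cloud URL in `GPU_SERVER_URL` is to check that its host is `localhost` or a loopback/private IP literal. The sketch below is one possible check, not something Turnstone ships; `is_local_endpoint` is an illustrative name, and it assumes a private LAN address counts as local. It does no DNS lookups, so any other hostname is reported as not local.

```python
import ipaddress
import os
from urllib.parse import urlparse


def is_local_endpoint(url: str) -> bool:
    """True if the URL's host is localhost or a loopback/private IP literal."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: cannot be judged without resolving it, so treat as not local.
        return False
    return addr.is_loopback or addr.is_private


if __name__ == "__main__":
    url = os.environ.get("GPU_SERVER_URL", "")
    verdict = "local" if is_local_endpoint(url) else "NOT local (or unresolved name)"
    print(url or "(GPU_SERVER_URL unset)", "->", verdict)
```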

Step 5 — Verify no outbound connections at runtime

Start Turnstone and run a diagnose query, then check for unexpected outbound connections:

# Watch for any connection to HuggingFace, PyPI, or other external hosts
ss -tp | grep python
# or
lsof -i -n -P | grep python | grep ESTABLISHED

Expected: only connections to your GPU_SERVER_URL and any SSH log sources. No connections to huggingface.co, cdn-lfs.huggingface.co, or pypi.org.
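
Scanning `ss` output by eye gets tedious on a busy host. The sketch below filters `ss -tnp` output for established connections whose peer is neither loopback nor a private address. It assumes the usual column order (State, Recv-Q, Send-Q, Local, Peer, Process) and is not part of Turnstone; `external_peers` is an illustrative name. An SSH log source on a public address would be flagged too, so treat the output as a list to review.

```python
import ipaddress
import shutil
import subprocess


def external_peers(ss_output: str, process: str = "") -> list[str]:
    """Return peer address:port entries that are not loopback/private."""
    flagged = []
    for line in ss_output.splitlines():
        if process and process not in line:
            continue
        fields = line.split()
        # Data rows: State Recv-Q Send-Q Local:Port Peer:Port [Process]
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue
        host = fields[4].rsplit(":", 1)[0].strip("[]")
        host = host.split("%", 1)[0]  # drop an IPv6 zone id such as %eth0
        try:
            addr = ipaddress.ip_address(host)
        except ValueError:
            continue
        # Judge IPv4-mapped IPv6 peers by the embedded IPv4 address.
        mapped = getattr(addr, "ipv4_mapped", None)
        if mapped is not None:
            addr = mapped
        if not (addr.is_loopback or addr.is_private):
            flagged.append(fields[4])
    return flagged


if __name__ == "__main__" and shutil.which("ss"):
    out = subprocess.run(["ss", "-tnp"], capture_output=True, text=True).stdout
    for peer in external_peers(out, process="python"):
        print("unexpected external peer:", peer)
```

No output means no Python process held an established connection to a public address at the moment of the check, so it is worth running while a diagnose query is in progress.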

Deployment checklist

- [ ] ~/.cache/huggingface/ copied to air-gapped host (if using ML classifier or embedder)
- [ ] TURNSTONE_OFFLINE_MODE=1 set in .env
- [ ] GPU_SERVER_URL points to a local inference endpoint
- [ ] Ollama model pulled locally (if using Ollama)
- [ ] Context DB pre-populated with runbooks via harvest_docs.py
- [ ] No internet access verified with ss -tp during a diagnose run
- [ ] TURNSTONE_API_KEY set if the host is accessible over the network (see API auth docs)
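
The `.env`-related checklist items can be checked mechanically. The sketch below is a hypothetical helper, not part of Turnstone. It assumes simple `KEY=VALUE` lines and does not handle `export` prefixes or inline comments.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def checklist(env: dict[str, str]) -> dict[str, bool]:
    """Map each .env checklist item to whether it is satisfied."""
    return {
        "TURNSTONE_OFFLINE_MODE=1": env.get("TURNSTONE_OFFLINE_MODE") == "1",
        "GPU_SERVER_URL set": bool(env.get("GPU_SERVER_URL")),
        "TURNSTONE_API_KEY set": bool(env.get("TURNSTONE_API_KEY")),
    }


if __name__ == "__main__":
    env_file = Path(".env")
    text = env_file.read_text() if env_file.exists() else ""
    for item, ok in checklist(parse_env(text)).items():
        print("[x]" if ok else "[ ]", item)
```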

Troubleshooting

"OSError: We couldn't connect to huggingface.co…"

The model is not in the local cache. Either download it on a connected machine and copy ~/.cache/huggingface/, or unset TURNSTONE_CLASSIFIER_MODEL to fall back to the pattern-based classifier.

Diagnose still works but no ML severity in pipeline stages

Expected when running offline without a pre-cached model. Stage 2 falls back to pattern_tags → regex severity detection automatically.

LLM reasoning missing from diagnose output

Check that GPU_SERVER_URL is reachable from the air-gapped host and that your local Ollama/vLLM has the configured model pulled.
