# Air-Gapped Deployment Guide

Turnstone can run entirely without internet access. This guide covers pre-downloading
all model weights, configuring offline mode, and verifying that no outbound connections
are made at runtime.

## What requires network access by default

| Component | When | What it downloads |
|-----------|------|------------------|
| Stage 2 ML classifier | First diagnose run (if `TURNSTONE_CLASSIFIER_MODEL` is set) | HuggingFace model weights (~300 MB) |
| Stage 4 sentence-transformers embedder | First diagnose run (if `TURNSTONE_EMBED_BACKEND=sentence_transformers`) | Embedding model (~130 MB) |
| LLM inference | Every diagnose run | Nothing — calls your configured `GPU_SERVER_URL` only |
| Log glean | Every glean run | Nothing — reads local files or SSH sources |

If neither the classifier nor the sentence-transformers embedder is enabled, Turnstone
makes no outbound network calls at runtime (only local SQLite reads/writes and your
configured LLM endpoint).
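
The table's two download conditions can be expressed as a small check. This is a minimal sketch, not Turnstone code: the function name `network_dependencies` and the labels it returns are hypothetical, and only the two environment variable names come from the table.

```python
def network_dependencies(env):
    """Return the components that would download model weights on first run.

    `env` is any mapping of environment variable names to values,
    for example os.environ or a dict parsed from a .env file.
    """
    needed = []
    # Stage 2 classifier: downloads weights only when a model name is configured.
    if env.get("TURNSTONE_CLASSIFIER_MODEL"):
        needed.append("classifier")
    # Stage 4 embedder: downloads weights only for the sentence_transformers backend.
    if env.get("TURNSTONE_EMBED_BACKEND") == "sentence_transformers":
        needed.append("embedder")
    return needed
```

An empty result means Step 1 can be skipped for that configuration.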

## Step 1 — Pre-download models (on an internet-connected machine)

Run these commands in the `cf` conda environment before moving to the air-gapped host:

```bash
# Stage 2 ML classifier (only needed if TURNSTONE_CLASSIFIER_MODEL is set)
conda run -n cf python -c "
from transformers import pipeline
pipeline('text-classification', model='byviz/bylastic_classification_logs')
print('classifier cached')
"

# Stage 4 sentence-transformers embedder (only if TURNSTONE_EMBED_BACKEND=sentence_transformers)
conda run -n cf python -c "
from sentence_transformers import SentenceTransformer
SentenceTransformer('BAAI/bge-small-en-v1.5')
print('embedder cached')
"
```

Models are cached to `~/.cache/huggingface/`. Copy that directory to the air-gapped host
at the same path before deployment.
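
After copying, it may help to confirm that the expected models are actually present. The sketch below assumes both libraries store weights in the standard Hugging Face hub layout (`hub/models--<org>--<name>/snapshots/<revision>/` under the cache root); older library versions may use a different location. The helper names are hypothetical.

```python
from pathlib import Path


def cached_model_dir(repo_id, hf_home):
    """Return the hub cache directory expected for `repo_id` under `hf_home`."""
    # The hub cache stores each repo as hub/models--<org>--<name>.
    return Path(hf_home) / "hub" / ("models--" + repo_id.replace("/", "--"))


def missing_models(repo_ids, hf_home):
    """List the repo ids that have no snapshot directory in the cache."""
    missing = []
    for repo_id in repo_ids:
        snapshots = cached_model_dir(repo_id, hf_home) / "snapshots"
        # A usable cache entry has at least one snapshot subdirectory.
        if not snapshots.is_dir() or not any(snapshots.iterdir()):
            missing.append(repo_id)
    return missing
```

On the air-gapped host, calling `missing_models([...], Path.home() / ".cache" / "huggingface")` with the two model names from the commands above should return an empty list.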

## Step 2 — Pre-ingest your documentation corpus

On the internet-connected machine, or before cutting the network:

```bash
# Write your manifest (see scripts/manifests/example.yaml)
# Then bulk-upload to the context DB:
conda run -n cf python scripts/harvest_docs.py --manifest scripts/manifests/your-site.yaml
```

The context DB (`turnstone-context.db`) is a plain SQLite file — copy it to the
air-gapped host alongside `turnstone.db`.
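
Because the context DB is an ordinary SQLite file, a copied file can be sanity-checked with the standard library. This is a sketch under that assumption only; the helper name `sqlite_file_ok` is hypothetical and nothing here depends on Turnstone's schema.

```python
import sqlite3
from pathlib import Path


def sqlite_file_ok(path):
    """Return True if `path` opens as SQLite and passes an integrity check."""
    # mode=ro avoids silently creating an empty database when the path is wrong.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
    except sqlite3.Error:
        return False
    try:
        row = conn.execute("PRAGMA integrity_check").fetchone()
        return row is not None and row[0] == "ok"
    except sqlite3.Error:
        return False
    finally:
        conn.close()
```

Running it against both `turnstone-context.db` and `turnstone.db` after the copy catches truncated transfers and wrong paths before the first diagnose run.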

## Step 3 — Set offline environment variables

Add to your `.env` file (copy from `.env.example`):

```bash
# Block all HuggingFace hub network access
TURNSTONE_OFFLINE_MODE=1

# Point models at the pre-downloaded cache (usually the default)
# HF_HOME=/home/youruser/.cache/huggingface
```

`TURNSTONE_OFFLINE_MODE=1` sets both `HF_HUB_OFFLINE=1` and `TRANSFORMERS_OFFLINE=1`
before any model library loads. If the cache is missing or incomplete, the classifier
falls back to the pattern-tag / regex path and embedding is skipped — diagnose still
works, just without ML-assisted severity or suppression.
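
The ordering matters because the Hugging Face libraries read these flags when they are imported. A minimal sketch of the idea follows; the function `apply_offline_mode` is a hypothetical illustration, not Turnstone's actual startup code.

```python
import os


def apply_offline_mode(env=os.environ):
    """Set the Hugging Face offline flags when TURNSTONE_OFFLINE_MODE=1.

    Returns True if offline mode was applied, False otherwise.
    """
    if env.get("TURNSTONE_OFFLINE_MODE") != "1":
        return False
    env["HF_HUB_OFFLINE"] = "1"
    env["TRANSFORMERS_OFFLINE"] = "1"
    return True


# Call apply_offline_mode() before importing transformers or
# sentence_transformers, since those libraries read the flags at import time.
```

Setting the variables after the libraries are already imported would likely have no effect on them.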

## Step 4 — Configure a local LLM endpoint

Turnstone's LLM reasoning calls your `GPU_SERVER_URL`. On an air-gapped host this
must be a local endpoint — either Ollama or a local cf-orch coordinator:

```bash
# Local Ollama
GPU_SERVER_URL=http://localhost:11434

# Local cf-orch coordinator
GPU_SERVER_URL=http://localhost:7700
```

Pull the Ollama model before cutting network access:

```bash
ollama pull llama3.1:8b
```
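
A quick way to catch a `GPU_SERVER_URL` that still points at a remote service is to inspect its host. The sketch below is an assumption-laden helper, not part of Turnstone: it accepts `localhost` and loopback or private IP literals, and deliberately does not resolve other hostnames, so an internal DNS name is reported as unverified.

```python
import ipaddress
from urllib.parse import urlsplit


def is_local_endpoint(url):
    """Return True if the URL's host is localhost or a loopback/private IP literal."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Other hostnames would need DNS to classify; treat them as not verified.
        return False
    return addr.is_loopback or addr.is_private
```

Both example URLs above pass this check; a public hostname or public IP address does not.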

## Step 5 — Verify no outbound connections at runtime

Start Turnstone and run a diagnose query, then check for unexpected outbound connections:

```bash
# Watch for any connection to HuggingFace, PyPI, or other external hosts
ss -tp | grep python
# or
lsof -i -n -P | grep python | grep ESTABLISHED
```

Expected: only connections to your `GPU_SERVER_URL` and any SSH log sources.
No connections to `huggingface.co`, `cdn-lfs.huggingface.co`, or `pypi.org`.
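
The manual check can be scripted by comparing peer addresses against an allow-list. This sketch assumes `ss -tnp` style output (numeric addresses, with columns State, Recv-Q, Send-Q, Local, Peer, Process); the function names and the sample lines in the usage note are illustrative, not captured from a real run.

```python
def peer_hosts(ss_output):
    """Extract peer hosts from `ss -tnp` style lines for established connections."""
    hosts = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Expected columns: State Recv-Q Send-Q Local:Port Peer:Port [Process]
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue
        # Split on the last colon so IPv6 peers like [::1]:11434 are handled.
        host, _, _port = fields[4].rpartition(":")
        hosts.append(host.strip("[]"))
    return hosts


def unexpected_peers(ss_output, allowed_hosts):
    """Return peer hosts that are not on the allow-list."""
    allowed = set(allowed_hosts)
    return [host for host in peer_hosts(ss_output) if host not in allowed]
```

Feeding it the captured output together with the host of your `GPU_SERVER_URL` and your SSH log sources should yield an empty list on a correctly configured host.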

## Deployment checklist

- [ ] `~/.cache/huggingface/` copied to air-gapped host (if using ML classifier or embedder)
- [ ] `TURNSTONE_OFFLINE_MODE=1` set in `.env`
- [ ] `GPU_SERVER_URL` points to a local inference endpoint
- [ ] Ollama model pulled locally (if using Ollama)
- [ ] Context DB pre-populated with runbooks via `harvest_docs.py`
- [ ] No internet access verified with `ss -tp` during a diagnose run
- [ ] `TURNSTONE_API_KEY` set if the host is accessible over the network (see API auth docs)

## Troubleshooting

**"OSError: We couldn't connect to huggingface.co…"**
The model is not in the local cache. Either download it on a connected machine and copy
`~/.cache/huggingface/`, or unset `TURNSTONE_CLASSIFIER_MODEL` to fall back to the
pattern-based classifier.

**Diagnose still works but no ML severity in pipeline stages**
Expected when running offline without a pre-cached model. Stage 2 falls back to
`pattern_tags` → regex severity detection automatically.
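
The fallback behaviour can be pictured roughly as follows. This is an illustrative sketch only: the severity patterns, the `classify` function, and the `load_model` callable are all hypothetical, and Turnstone's real pattern tags and pipeline internals are not shown in this guide.

```python
import re

# Hypothetical severity patterns, checked from most to least severe.
SEVERITY_PATTERNS = [
    ("critical", re.compile(r"\b(panic|fatal|out of memory)\b", re.IGNORECASE)),
    ("error", re.compile(r"\b(error|failed|exception)\b", re.IGNORECASE)),
    ("warning", re.compile(r"\bwarn(ing)?\b", re.IGNORECASE)),
]


def regex_severity(line):
    """Return the first matching severity label, or 'info' if none match."""
    for label, pattern in SEVERITY_PATTERNS:
        if pattern.search(line):
            return label
    return "info"


def classify(line, load_model=None):
    """Use the ML classifier if it loads, otherwise fall back to regex severity.

    `load_model` is a callable returning a function that maps a line to a label.
    """
    if load_model is not None:
        try:
            model = load_model()
            return model(line), "ml"
        except OSError:
            # Raised when offline and the weights are not in the local cache.
            pass
    return regex_severity(line), "regex"
```

The second element of the returned tuple records which path produced the label, which mirrors the symptom described above: results still appear, but without an ML-derived severity.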

**LLM reasoning missing from diagnose output**
Check that `GPU_SERVER_URL` is reachable from the air-gapped host and that your local
Ollama/vLLM has the configured model pulled.