turnstone/README.md
pyr0ball aa80f307fe refactor: rename ingest → glean throughout codebase
Renames the app/ingest/ package to app/glean/ and updates all
references across Python modules, shell scripts, Vue components,
tests, and documentation.

Intentionally preserved:
- SQLite column name ingest_time (avoids schema migration)
- RetrievedEntry.ingest_time field (maps to the column above)
- Any public-facing JSON keys that reference ingest_time

Changes by category:
- app/ingest/ → app/glean/ (full package move, all parsers)
- app/tasks/ingest_scheduler.py → app/tasks/glean_scheduler.py
- scripts/ingest_corpus.py → scripts/glean_corpus.py
- tests/test_ingest_*.py → tests/test_glean_*.py
- Docstrings, log messages, comments: ingest → glean
- Env var: TURNSTONE_INGEST_INTERVAL → TURNSTONE_GLEAN_INTERVAL
- Shell scripts: glean.log, glean_corpus.py references
- README.md: multi-source ingest → multi-source glean
- .env.example: updated env var name
- patterns/: new diagnostic patterns from 2026-05-20 SSH incident
  (service_crash_loop, pkg_daemon_restart, ssh_forward_conflict)
- SourcesView.vue: pipeline label updated
- All test import paths updated to app.glean.*

285 tests passing.
2026-05-20 23:02:55 -07:00


# Turnstone

> **Diagnostic log intelligence for self-hosted infrastructure.**

[![Status](https://img.shields.io/badge/status-beta-blue)](https://git.opensourcesolarpunk.com/Circuit-Forge/turnstone)
[![Version](https://img.shields.io/badge/version-0.5.0-green)](https://git.opensourcesolarpunk.com/Circuit-Forge/turnstone/releases)
[![License](https://img.shields.io/badge/license-private-red)](LICENSE)
[![Python](https://img.shields.io/badge/python-3.11%2B-blue)](requirements.txt)

Turnstone ingests logs from your services, indexes them for full-text and pattern search, and lets you tag incidents, build diagnostic bundles, and query across your infrastructure, from a web UI or an MCP-compatible agent client.

---
## What it does
```
Service logs (journald, Docker, syslog, Caddy, Plex, arr stack, qBittorrent, dmesg)
→ Ingest pipeline (auto-detect format, parse, deduplicate, pattern-tag)
→ SQLite + FTS index
→ REST API → Vue web UI / MCP server → agent clients (Orchard)
```
**Human workflow:** Search logs by symptom or time window, create incidents, attach relevant log entries, and bundle everything into a diagnostic package for hand-off or archival.

**Agent workflow:** MCP tools expose search, incident management, and diagnosis over a standard protocol, so Orchard agents can query Turnstone as part of automated triage and resolution pipelines.

---
## Features
- **Multi-source glean** — journald, Docker, syslog, Caddy, dmesg, Plex, Servarr (arr stack), qBittorrent, plaintext; paths configured in `patterns/sources.yaml`
- **Pattern tagging** — named regex patterns applied at glean time (`service_restart`, `auth_failure`, `oom`, `segfault`, `disk_full`, `timeout`, …); extend in `patterns/default.yaml`
- **Full-text search** — SQLite FTS5 index across all ingested entries; filter by source, severity, time window
- **Natural-language time queries** — "what happened yesterday morning", "show me errors from the last 3 hours"; powered by dateparser
- **Incident management** — create, label, and track incidents; attach supporting log entries
- **Diagnostic bundles** — group log entries + incident metadata into a shareable bundle for escalation or archival
- **MCP server** — exposes search, incident, and diagnose tools to MCP-compatible agent clients
- **Dark/light theme** — Vue 3 + UnoCSS, system-aware
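
To make the full-text search feature concrete, here is a minimal sketch of an FTS5 index queried with source, severity, and time-window filters. The table name `entries_fts`, its columns, the sample rows, and the `search` helper are all hypothetical and are not taken from Turnstone's actual schema or API. The sketch also assumes your Python build ships SQLite with FTS5 enabled.

```python
import sqlite3

# Hypothetical schema for illustration only; Turnstone's real tables may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE entries_fts USING fts5("
    "message, source UNINDEXED, severity UNINDEXED, ts UNINDEXED)"
)
conn.executemany(
    "INSERT INTO entries_fts (message, source, severity, ts) VALUES (?, ?, ?, ?)",
    [
        ("container killed: out of memory", "docker", "ERROR", "2026-05-20T10:00:00"),
        ("GET /turnstone/ 200", "caddy", "INFO", "2026-05-20T11:00:00"),
        ("Out of memory: killed process 1234", "journald", "CRITICAL", "2026-05-19T09:00:00"),
    ],
)


def search(conn, query, source=None, severity=None, since=None):
    """Full-text match on message, with optional exact-match and time filters."""
    sql = (
        "SELECT source, severity, ts, message "
        "FROM entries_fts WHERE entries_fts MATCH ?"
    )
    params = [query]
    for column, value in (("source", source), ("severity", severity)):
        if value is not None:
            sql += f" AND {column} = ?"
            params.append(value)
    if since is not None:
        # ISO-8601 timestamps sort lexicographically, so string comparison works.
        sql += " AND ts >= ?"
        params.append(since)
    sql += " ORDER BY rank"  # FTS5's built-in relevance ordering
    return conn.execute(sql, params).fetchall()


# Entries mentioning "memory" since the start of 2026-05-20.
print(search(conn, "memory", since="2026-05-20T00:00:00"))
```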
---
## Quick start (Docker)
```bash
git clone https://git.opensourcesolarpunk.com/Circuit-Forge/turnstone.git
cd turnstone
# Edit sources to match your paths
cp patterns/sources.yaml.example patterns/sources.yaml
$EDITOR patterns/sources.yaml
docker build -t turnstone:latest .
docker run -d --name turnstone \
  -p 8534:8534 \
  -v "$(pwd)/data:/data" \
  -v "$(pwd)/patterns:/patterns" \
  turnstone:latest
```
Open `http://localhost:8534/turnstone/` in a browser.

---
## Quick start (dev)
```bash
# Backend
conda run -n cf pip install -r requirements.txt
conda run -n cf bash manage.sh start
# Frontend (separate terminal, hot-reload)
cd web && npm install && npm run dev
```
- API: `http://localhost:8534/turnstone/docs`
- UI: `http://localhost:5174/`

---
## Deployment (Podman + systemd)
See [`podman-standalone.sh`](podman-standalone.sh) for rootful Podman setup with systemd unit generation. Suitable for hosts that run system Podman rather than Docker Compose.

For Caddy reverse-proxy setup (e.g. `menagerie.circuitforge.tech/turnstone`), see [`docs/caddy-routing-pattern.md`](docs/caddy-routing-pattern.md). All routes are pre-mounted at `/turnstone`, so no prefix stripping is needed.

---
## Log source configuration
Edit `patterns/sources.yaml` to tell Turnstone where your logs live (container-side paths):
```yaml
sources:
  - id: system-journal
    path: /data/journal-export.jsonl   # exported by export_journal.sh on host
  - id: docker-logs
    path: /var/log/docker              # bind-mounted from host
  - id: caddy
    path: /var/log/caddy/access.log
```
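Configured paths that do not exist are skipped rather than treated as errors. A minimal sketch of that check follows, assuming the `sources` list has already been parsed from YAML into dictionaries. The function name `resolve_sources` and the logger name are hypothetical, not Turnstone's actual code.

```python
import logging
from pathlib import Path

log = logging.getLogger("turnstone.sources")  # hypothetical logger name


def resolve_sources(sources):
    """Return the entries whose path exists; warn about and skip the rest."""
    available = []
    for src in sources:
        path = Path(src["path"])
        if path.exists():
            available.append(src)
        else:
            # A missing path is not fatal: the service may simply be down.
            log.warning("source %s: path %s not found, skipping", src["id"], path)
    return available


# Example: the first path exists (current directory), the second does not.
configured = [
    {"id": "local", "path": "."},
    {"id": "caddy", "path": "/no/such/dir/access.log"},
]
usable = resolve_sources(configured)
```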
For `journald` sources, run `scripts/export_journal.sh` on the host before each glean (e.g. via cron). Missing paths are skipped with a warning, so it is safe to leave entries in place for services that are temporarily down.

---
## Pattern library
Named patterns in `patterns/default.yaml` are matched against every log entry at glean time. Matched pattern names are stored and used to boost search relevance for diagnostic queries.
```yaml
patterns:
  - name: oom
    pattern: "(out of memory|OOM|killed process|cannot allocate)"
    severity: CRITICAL
    description: Out-of-memory condition
```
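The matching step can be sketched in a few lines of Python. This is an illustration under assumptions, not Turnstone's implementation: the `tag_entry` helper is hypothetical, the `timeout` regex is invented for the example (only the `oom` pattern above comes from this README), and case-insensitive matching is an assumption.

```python
import re

# In-memory stand-in for entries loaded from patterns/default.yaml.
PATTERNS = [
    {"name": "oom",
     "pattern": r"(out of memory|OOM|killed process|cannot allocate)"},
    # Invented regex for illustration; the real `timeout` pattern may differ.
    {"name": "timeout", "pattern": r"(timed out|timeout)"},
]

# Compile once so every log entry reuses the same regex objects.
# Case-insensitive matching is an assumption made for this sketch.
COMPILED = [(p["name"], re.compile(p["pattern"], re.IGNORECASE)) for p in PATTERNS]


def tag_entry(message):
    """Return the names of all patterns that match the message."""
    return [name for name, rx in COMPILED if rx.search(message)]


print(tag_entry("kernel: Out of memory: Killed process 1234 (java)"))
print(tag_entry("upstream timed out; cannot allocate buffer"))
```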
Add domain-specific patterns for your stack. Multiple patterns can match a single entry.

---
## MCP server
Turnstone exposes an MCP (Model Context Protocol) server for agent clients. Start it alongside the REST API:
```bash
conda run -n cf python -m app.mcp_server
```
Tools exposed: `search`, `diagnose`, `create_incident`, `list_incidents`, `build_bundle`.

---
## Manage script
```bash
bash manage.sh start # start API (and Vite dev server if --dev)
bash manage.sh stop # stop API
bash manage.sh restart # restart
bash manage.sh status # show process state and port bindings
bash manage.sh logs # tail API log
```
---
## Configuration
Copy `.env.example` to `.env` (or pass as `-e` flags to Docker/Podman). All variables are optional.

| Variable | Default | Description |
|----------|---------|-------------|
| `GPU_SERVER_URL` | `http://localhost:11434` | GPU inference server (Ollama, vLLM, or cf-orch). `CF_ORCH_URL` is accepted as a backward-compat alias. Paid+ users: leave unset — auto-defaults to `https://orch.circuitforge.tech` when `CF_LICENSE_KEY` is present. |
| `CF_LICENSE_KEY` | — | CircuitForge Paid+ license key. Enables cloud GPU inference and premium features. |
| `TURNSTONE_DB` | `/data/turnstone.db` | Path to the SQLite database. |
| `TURNSTONE_PATTERNS` | `./patterns` | Pattern directory (default.yaml, sources.yaml, watch.yaml). |
| `TURNSTONE_SOURCE_HOST` | `unknown` | Host identifier stamped on ingested entries. |
| `TURNSTONE_BUNDLE_ENDPOINT` | — | Remote URL to push diagnostic bundles for escalation. |
| `TURNSTONE_GLEAN_INTERVAL` | `900` | Seconds between automatic batch glean runs. Set to `0` to disable. |
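
The fallback rules in the table can be read as the following sketch. The helper names are hypothetical, and the precedence of `GPU_SERVER_URL` over its `CF_ORCH_URL` alias when both are set is an assumption; the table does not say which one wins.

```python
import os


def resolve_gpu_url(env=os.environ):
    """Pick the inference server URL following the defaults in the table."""
    # Assumption: GPU_SERVER_URL takes precedence over the CF_ORCH_URL alias.
    explicit = env.get("GPU_SERVER_URL") or env.get("CF_ORCH_URL")
    if explicit:
        return explicit
    if env.get("CF_LICENSE_KEY"):
        # With a license key and no explicit URL, fall back to the hosted endpoint.
        return "https://orch.circuitforge.tech"
    return "http://localhost:11434"


def glean_interval(env=os.environ):
    """Seconds between automatic glean runs, or None when disabled with 0."""
    seconds = int(env.get("TURNSTONE_GLEAN_INTERVAL", "900"))
    return seconds if seconds > 0 else None


print(resolve_gpu_url({"CF_LICENSE_KEY": "example-key"}))
print(glean_interval({"TURNSTONE_GLEAN_INTERVAL": "0"}))
```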
---
## Ports
| Service | Port | Notes |
|---------|------|-------|
| FastAPI + Vue SPA | `8534` | Production: REST API + built frontend |
| Vite HMR | `5174` | Dev only: hot-reload frontend, proxies `/api` → 8534 |
---
## License
Private — CircuitForge internal tooling. Not licensed for redistribution.