FTS5 bulk-insert write locks starved the incident API and bundle endpoints
during log bursts (sonarr/radarr, high-volume docker sources). Fix mirrors
the context_facts split (context -> turnstone-context.db):
- Add INCIDENTS_DB_PATH / TURNSTONE_INCIDENTS_DB env var in rest.py
- Add _INCIDENTS_SCHEMA, ensure_incidents_schema(), and
migrate_incidents_to_dedicated_db() in glean/pipeline.py
- Stub out incidents/received_bundles/sent_bundles in _SCHEMA (no-op
CREATE IF NOT EXISTS) so legacy single-file deployments still open
- Thread incidents_db_path through diagnose_stream -> run_pipeline ->
FalsePositiveSuppressor.suppress -> _fetch_resolved_incidents
- One-shot migration on startup: copy existing rows from main DB to
incidents DB via INSERT OR IGNORE (idempotent, safe to re-run)
- Fix test_blocklist_endpoints fixtures to patch CONTEXT_DB_PATH and
INCIDENTS_DB_PATH alongside DB_PATH (worktree has no data/ dir)
372 tests passing.
Closes: #60
Watcher, REST endpoints, services (search, incidents, blocklist),
MCP server, context retriever, embedder, glean_scheduler, and
doc_upload all used the default 5-second SQLite busy timeout.
During collect glean write phases, watcher flush threads were hitting
'database is locked' errors when the glean held the write lock longer
than 5 seconds.
All connections now use timeout=30.0, matching the pipeline fix
from commit 6882248. No logic changes.
context_facts, context_documents, and context_chunks now live in
turnstone-context.db (sibling of turnstone.db). The glean scheduler
held write locks on the main DB long enough to cause 5-second timeout
failures on context fact inserts; separate files have independent WAL
write locks so they never contend.
Changes:
- pipeline.py: extract _CONTEXT_SCHEMA + ensure_context_schema()
- rest.py: CONTEXT_DB_PATH (TURNSTONE_CONTEXT_DB env var, defaults to
sibling file); init via ensure_context_schema(); all context routes
pass CONTEXT_DB_PATH; diagnose_stream receives context_db_path kwarg
- diagnose/__init__.py: diagnose_stream() accepts context_db_path
(falls back to db_path for backward compat); retrieve_context uses it
- store.py: sqlite3.connect() timeout=30.0 — Python driver retry loop
is independent of PRAGMA busy_timeout; needed for any remaining
contention during test or single-file deployments
Closes: #42
Closes turnstone#22.
## Transport layer (app/glean/ssh.py)
- SSHTransport context manager: key-only auth, paramiko backend
- SSHConnectionError / SSHCommandError exception hierarchy
- exec_stream() generator: yields stdout lines, raises SSHCommandError on
non-zero exit (isinstance(int) guard for test-mock safety)
- Command builders: _build_journald_command, _build_syslog_command,
_build_plaintext_command, _build_docker_command
- 18 unit tests in tests/test_glean_ssh.py
## Pipeline integration (app/glean/pipeline.py)
- _stream_and_write(): per-item error isolation — SSHCommandError skips
one glean item without aborting the rest of the host connection
- _glean_ssh_source(): one SSHTransport per host, dispatches all glean
items (journald/syslog/plaintext/docker); SSHConnectionError aborts host
- glean_sources(): splits local vs SSH sources; local → _glean_files();
SSH → _glean_ssh_source(); shared compiled patterns and DB connection
- glean_ssh_source(): public wrapper for REST use — manages DB connection,
pattern compilation, FTS rebuild lifecycle
- 15 integration tests in tests/test_glean_pipeline_ssh.py
- All 285 tests passing
## REST layer (app/rest.py)
- GET /api/sources/configured: reads sources.yaml and enriches with DB
stats; SSH sources appear before first glean (entry_count=0); sub-source
IDs (rack01/journald, rack01/docker/myapp) aggregated per host entry
- POST /api/sources/{id}/glean: detects transport:ssh and dispatches to
glean_ssh_source() wrapper; local sources unchanged
- Import: glean_ssh_source as _glean_ssh_source
## Frontend (web/src/views/SourcesView.vue)
- Fetches /api/sources/configured (primary) + /api/sources (DB-only) in
parallel; merges into unified SourceRow list
- SSH sources show: ssh badge (with user@host tooltip), glean-type pills
(journald/syslog/docker/etc.), host subtitle
- SSH sub-source IDs (rack01/journald) suppressed from the DB-only list
since they are covered by the parent SSH row
- DB-only sources (uploads) appear below configured sources with 'uploaded'
badge; reglean button disabled (not in sources.yaml)
- Delete zeroes out configured-source stats in-place rather than removing
the row (so the source remains visible for re-gleaning)
Add blocklist candidate listing, scan trigger, status update,
push/unblock to Pi-hole, and connection test endpoints.
Add pihole_url/version/api_key and router_source_ids/device_names
fields to SettingsBody and prefs handling in patch_settings.
Add PiholeClient.__post_init__ validation so 503 fires naturally
when url/api_key are unconfigured (mock-safe: bypassed in tests).
POST /turnstone/api/ingest/tautulli accepts Tautulli notification agent
payloads and stores them as log_entries under source 'tautulli'. Severity
maps error->CRITICAL, buffer->WARN, all others->None. Optional bearer token
auth via X-Tautulli-Token header + tautulli_token pref. FTS index rebuilt
as a background task after each write. 28 new tests, all passing.
Wires the context/RAG layer into FastAPI via a dedicated _ctx router
(/turnstone/api/context/*): document upload (POST/GET/DELETE /docs),
fact CRUD (POST/GET/DELETE /facts), wizard state machine
(/wizard/schema, /wizard/step, /wizard/apply), and a debug search
endpoint (/debug/search). All blocking DB calls are dispatched via
asyncio.to_thread to keep the event loop free.
- Diagnose: add source_filter param threaded through entries_in_window,
search, _diagnose, and DiagnoseRequest — clicking diagnose on a
dashboard source now scopes both keyword and window hits to that source
- QuickCapture: read route.query.source; show scope badge with clear ✕;
auto-run when source param is present without a query
- DashboardView: pass source= (not q=) when navigating to diagnose
- collect_cluster_logs.sh: auto-discover Docker containers on all nodes
(Heimdall non-watched, Navi, Strahl via SSH); collect Cass Plex logs
via SSH; write to per-node dirs for directory-mode ingest
- turnstone-cluster.service: add --reload for hot-reload during dev
Turnstone now calls /v1/chat/completions instead of Ollama's /api/generate.
This format works with both local Ollama (>=0.1.24) and a remote cf-orch
coordinator, enabling GPU-less nodes like Xander's to route diagnoses through
the cluster without any local model.
- llm.py: OpenAI-compat messages format, optional Bearer auth header
- diagnose.py: thread llm_api_key through the call chain
- rest.py: llm_api_key pref (default empty), SettingsBody field, passed to diagnose
- SettingsView.vue: API Key field, label updated from "Ollama URL" to "LLM Endpoint URL"
- tests: updated mocks for new response shape; added bearer token assertion test
Adds background watcher that tails active log sources and ingests entries
in near-real-time, keeping the DB fresh without manual ingest runs.
- app/watch/watcher.py: Watcher + WatchSource using subprocess + select
loop; flushes every 10s or 100 lines; syncs FTS index every 3 flushes
- patterns/watch.yaml: declarative source config (journald/docker/podman)
- app/rest.py: lifespan context manager starts/stops watcher on app
startup/shutdown; GET /api/watch/status + POST /api/watch/reload
- web/src/views/DashboardView.vue: live/manual indicator chip + stale
banner copy adapts to whether live watching is active
- tests/test_watch_watcher.py: 16 tests covering config load, command
building, docker timestamp stripping, orchestrator lifecycle
Closes#4
Turnstone incidents now carry an issue_type tag (free-text with datalist
suggestions) used to categorize patterns for signature building.
Backend:
- Incident model gains issue_type; additive ALTER TABLE migration keeps
existing DBs working without a full schema rebuild
- New received_bundles table stores incoming JSON bundles with indexes on
bundled_at and issue_type
- build_bundle() assembles incident + related log entries into a versioned
bundle dict; store_bundle()/list_bundles()/get_bundle() for the receiver
- POST /api/incidents/{id}/send — pushes bundle to TURNSTONE_BUNDLE_ENDPOINT
- GET /api/incidents/{id}/bundle — export without sending
- POST /api/bundles — receive and store an incoming bundle
- GET /api/bundles — list all received bundles
- TURNSTONE_SOURCE_HOST and TURNSTONE_BUNDLE_ENDPOINT env vars; auto-set
source host from hostname in podman-standalone.sh
Frontend:
- Incidents form: issue_type field with datalist suggestions; Type column
in the table; Send Bundle button + status feedback in the detail drawer
- New BundlesView: collapsible bundle rows, inline JSON parse (no extra
round-trip), Export JSON download button
- Router and nav updated with /bundles route
- Add GET /api/stats endpoint with 24h windowed aggregation (criticals,
errors, per-source health, recent criticals list)
- Fix timestamp format bug: strftime('%Y-%m-%dT%H:%M:%S', ...) to match
stored ISO-8601 T-separated timestamps (datetime('now') uses space)
- Add composite index idx_ts_repeat(timestamp_iso, repeat_count) — drops
stats query from 3.5 s to <1 ms by resolving both WHERE conditions
from the index without table row fetches
- New DashboardView: 3 stat cards, source health table with health dots,
diagnose-per-source button, recent criticals panel, zero-state card
- Router default / → /dashboard; Dashboard first in nav
- DiagnoseView: reads ?q= query param on mount and auto-runs; shows
formatted LLM summary block
- LogEntryRow: expand/collapse for long entries (>200 chars or multiline)
When diagnose() auto-detects a source name, FTS keyword scoring can
bury real errors whose text doesn't match the symptom query. Add
recent_source_errors() — a plain-SQL scan ordered by timestamp — so
the most recent errors from a known service always surface regardless
of keyword overlap.
- Add `incidents` table to SQLite schema (id, label, started_at, ended_at,
notes, created_at, severity)
- Extract `ensure_schema()` from ingest pipeline so tables are always
created at startup, not only during ingest
- New `app/services/incidents.py`: create/list/get/delete + time-window
entry association (FTS keyword search + raw window fallback)
- New `entries_in_window()` in search.py: plain SQL scan for incident
detail when keyword FTS returns nothing
- REST endpoints: POST/GET /api/incidents, GET/DELETE /api/incidents/{id}
- Incident detail returns up to 100 associated log entries sorted by
timestamp, prioritising FTS keyword hits then ERROR/CRITICAL then all
Vite builds with base='/turnstone/' so asset paths in index.html are
/turnstone/assets/*. Serving FastAPI at root / meant direct hits to
port 8534 got index.html for asset requests (blank page).
- All routes now under /turnstone (APIRouter prefix + StaticFiles mount
at /turnstone/assets + SPA catch-all at /turnstone/{path})
- Root / redirects to /turnstone/
- Caddy block reverted to no-strip: both direct LAN and Caddy access
hit the same paths, no per-host routing differences
Python http.server can't do SPA routing and Caddy was forwarding
/turnstone/* paths that the static server couldn't resolve.
- app/rest.py: mount web/dist/assets as StaticFiles; add SPA catch-all
route that serves index.html for any unmatched path
- manage.sh: start/stop/status simplified to single process on :8534;
remove UI_PORT / UI_PID_FILE; drop http.server invocation
- Caddyfile: replace split API/:8534 + SPA/:8535 block with a single
strip_prefix + reverse_proxy to :8534