- New app/services/embeddings.py: TURNSTONE_EMBED_* env vars, multi-backend support
- embedder.py delegates to service layer; re-exports EMBEDDING_AVAILABLE for compat
- retriever.py updated to use service layer
- Test coverage updated in tests/context/test_embedder.py
Adds SSH-based log collection from remote hosts via Paramiko.
One SSH connection per host, multiple log types per connection.
New files:
- app/glean/ssh.py: SSHTransport context manager + command builders
for journald, syslog, plaintext, and docker log types
- tests/test_glean_ssh.py: 18 tests for transport layer (all mocked)
- tests/test_glean_pipeline_ssh.py: 15 tests for pipeline integration
Pipeline changes (app/glean/pipeline.py):
- glean_sources() now splits sources into local-file and SSH categories
- SSH sources use transport: ssh + glean: list schema in sources.yaml
- _glean_ssh_source(): one SSHTransport per host, N commands per connection
- _stream_and_write(): SSHCommandError caught per-item so one bad
command does not abort the rest of the host's glean items
- SSHConnectionError skips the entire host with a warning log
SSH source schema (sources.yaml):
- id: rack01
transport: ssh
host: 192.168.1.10
user: admin
key_path: ~/.ssh/id_ed25519
glean:
- type: journald
args: [--since, 2 hours ago]
- type: syslog
path: /var/log/syslog
- type: plaintext
path: /var/log/app/error.log
- type: docker
containers: [myapp, nginx]
Key design decisions:
- Key-based auth only (no password prompts in daemon context)
- exit-status check fires after all stdout lines yielded; callers
drain the iterator to trigger it
- Local file sources path unchanged; SSH sources co-exist in same yaml
- Docker multi-container: one exec_stream call per container,
source_id scoped as host_id/type/container_name
Remaining for #22: REST endpoint, SourcesView UI, sources.yaml docs.
285 → 285 tests passing (33 new SSH tests).
Add blocklist candidate listing, scan trigger, status update,
push/unblock to Pi-hole, and connection test endpoints.
Add pihole_url/version/api_key and router_source_ids/device_names
fields to SettingsBody and prefs handling in patch_settings.
Add PiholeClient.__post_init__ validation so 503 fires naturally
when url/api_key are unconfigured (mock-safe: bypassed in tests).
PiholeClient dataclass supporting both Pi-hole v5 (PHP /admin/api.php)
and v6 (REST /api/) with public block/unblock/test_connection methods.
9 tests covering both API versions, auth flow, and error handling.
Adds patterns/telemetry.yaml with 6 rule groups (samsung, belkin, roku, lg, amazon, advertising).
Adds app/services/blocklist.py with TelemetryRule and BlocklistCandidate dataclasses, load_telemetry_rules(), and matches_telemetry() with exact and subdomain matching.
6 new TestTelemetry tests pass; 199 total passing.
Add blocklist_candidates table and indexes to _SCHEMA in pipeline.py.
Add TestSchema tests verifying table existence, column set, and status/hit_count defaults.
All 193 tests pass.
POST /turnstone/api/ingest/tautulli accepts Tautulli notification agent
payloads and stores them as log_entries under source 'tautulli'. Severity
maps error->CRITICAL, buffer->WARN, all others->None. Optional bearer token
auth via X-Tautulli-Token header + tautulli_token pref. FTS index rebuilt
as a background task after each write. 28 new tests, all passing.
All five parsers (plex, syslog, servarr, qbittorrent, plaintext) were
using .replace(tzinfo=timezone.utc) on naive datetimes parsed from log
files, which slaps a UTC label on what is actually local-time data.
On a UTC-7 system a 2pm entry was stored as 14:00Z instead of 21:00Z,
causing time-window searches to return zero results.
Fix: use .astimezone(timezone.utc) instead, which treats the naive
datetime as local time and converts correctly.
Tests updated to round-trip back to local time for assertion so they
pass on any timezone, not just UTC.
- Add context_block param to summarize() and thread it into _PROMPT_TEMPLATE
- Wire retrieve_context/format_context_block into diagnose_stream() before
log search; emit context SSE event (facts + chunks) to the client
- 3 new tests covering prompt injection and SSE event emission (155 total, all pass)
- Implement document type detection for yaml/json/markdown/text
- Extract service facts from docker-compose YAML (names, images, ports)
- Split text into overlapping word chunks (300-word default with 50-word overlap)
- Enforce 5 MB file size limit
- Comprehensive TDD test suite: 15 tests passing
Turnstone now calls /v1/chat/completions instead of Ollama's /api/generate.
This format works with both local Ollama (>=0.1.24) and a remote cf-orch
coordinator, enabling GPU-less nodes like Contributor2's to route diagnoses through
the cluster without any local model.
- llm.py: OpenAI-compat messages format, optional Bearer auth header
- diagnose.py: thread llm_api_key through the call chain
- rest.py: llm_api_key pref (default empty), SettingsBody field, passed to diagnose
- SettingsView.vue: API Key field, label updated from "Ollama URL" to "LLM Endpoint URL"
- tests: updated mocks for new response shape; added bearer token assertion test
Watermark-based batch export script (scripts/export_corpus.py) pushes up to 500
ERROR/CRITICAL entries and labeled incidents per run to AVOCET_CORPUS_ENDPOINT.
Uses SQLite rowid watermark (entry log) and ISO timestamp watermark (incidents).
Skips silently when AVOCET_CORPUS_ENDPOINT is not set. 19 tests. Closes turnstone#6.
- type: file uses tail -F (handles rotation) with auto-format detection
- _parse_lines dispatches to journald/servarr/qbit/caddy/syslog/plaintext
based on first-line format detection — same logic as batch ingest
- watch.yaml updated with file type docs and example-node-specific example
- scripts/journal-bridge.sh + .service written directly to example-node
Contributor2's watch.yaml covers: system-journal-live (via bridge file),
sonarr, radarr, lidarr, prowlarr, bazarr, qbittorrent, nzbget, tautulli
Adds background watcher that tails active log sources and ingests entries
in near-real-time, keeping the DB fresh without manual ingest runs.
- app/watch/watcher.py: Watcher + WatchSource using subprocess + select
loop; flushes every 10s or 100 lines; syncs FTS index every 3 flushes
- patterns/watch.yaml: declarative source config (journald/docker/podman)
- app/rest.py: lifespan context manager starts/stops watcher on app
startup/shutdown; GET /api/watch/status + POST /api/watch/reload
- web/src/views/DashboardView.vue: live/manual indicator chip + stale
banner copy adapts to whether live watching is active
- tests/test_watch_watcher.py: 16 tests covering config load, command
building, docker timestamp stripping, orchestrator lifecycle
Closes#4
Contributor2's ghcr.io/hotio/qbittorrent:latest container uses a different format
than the classic GUI build: `(N) 2026-04-26T03:32:59 - message` with a
single-char level code before an ISO timestamp, not inside parens.
Added _HOTIO_RE alongside _CLASSIC_RE; unified via _match_line() helper so
parse() loop is unchanged. 28 tests passing, both formats covered.
Adds app/ingest/qbittorrent.py — auto-detected by the pipeline on the
(YYYY/MM/DD HH:MM:SS) timestamp fingerprint. Handles both slash and dash
date separators, optional [Warning|Critical] bracket levels, and
multi-line continuations (Qt stack traces).
patterns/default.yaml: 8 new qbit_ patterns covering tracker errors,
port bind failures, disk errors, hash check failures, peer bans, download
completion, ratio limits, and session errors.
manage.sh: ingest-qbit [HOST] command mirrors ingest-plex — probes known
default log paths locally or via SSH, ingests, restarts server.
14 tests covering format detection, severity mapping, multiline handling,
and timestamp normalization.
Ingest pipeline (journald / Caddy / Docker-wrapped formats) with
per-source state tracking (repeat dedup, out-of-order detection),
named pattern tagging at ingest time, and idempotent SHA1-keyed writes.
FTS5 search layer with porter stemmer, severity/source/pattern/time
filters, and BM25 ranking. MCP server (FastMCP stdio) with three tools:
search_logs, diagnose, list_log_sources — compatible with both
Claude Code and Copilot CLI.
WAL mode enabled on all connections. FTS index auto-built after ingest.
MCP configs included for Claude Code (.mcp.json) and Copilot CLI
(.github/copilot/mcp.json).