Commit graph

88 commits

Author SHA1 Message Date
0b3d95cd26 fix: ingestors treat naive log timestamps as local time, not UTC
All five parsers (plex, syslog, servarr, qbittorrent, plaintext) were
using .replace(tzinfo=timezone.utc) on naive datetimes parsed from log
files, which slaps a UTC label on what is actually local-time data.
On a UTC-7 system a 2pm entry was stored as 14:00Z instead of 21:00Z,
causing time-window searches to return zero results.

Fix: use .astimezone(timezone.utc) instead, which treats the naive
datetime as local time and converts correctly.

Tests updated to round-trip back to local time for assertion so they
pass on any timezone, not just UTC.
2026-05-13 18:16:33 -07:00
251109ae96 fix: final review fixes — port guard, network error handling, wizard back nav, tablist arrow keys, dialog focus trap
- wizard.py: wrap syslog_port int() in try/except to default 514 on non-numeric input
- ContextView: add try/catch to doDelete, doDeleteFact, addFact for network errors
- ContextView: arrow-key navigation for tablist (ArrowLeft/ArrowRight)
- DiagnoseView: arrow-key navigation for tablist (ArrowLeft/ArrowRight)
- WizardOverlay: reset current_step to last schema step when clicking 'Go back and edit'
- WizardOverlay: focus trap on Tab/Shift+Tab within dialog element
2026-05-13 17:40:40 -07:00
e0bfa11642 feat: optional sqlite-vec embedding pipeline for Paid-tier RAG 2026-05-13 16:32:57 -07:00
d8c3eba0f8 feat: context REST API — docs, facts, wizard, and debug endpoints
Wires the context/RAG layer into FastAPI via a dedicated _ctx router
(/turnstone/api/context/*): document upload (POST/GET/DELETE /docs),
fact CRUD (POST/GET/DELETE /facts), wizard state machine
(/wizard/schema, /wizard/step, /wizard/apply), and a debug search
endpoint (/debug/search). All blocking DB calls are dispatched via
asyncio.to_thread to keep the event loop free.
2026-05-13 16:31:07 -07:00
b5ce0a24b2 feat: inject environment context into diagnose pipeline and LLM prompt
- Add context_block param to summarize() and thread it into _PROMPT_TEMPLATE
- Wire retrieve_context/format_context_block into diagnose_stream() before
  log search; emit context SSE event (facts + chunks) to the client
- 3 new tests covering prompt injection and SSE event emission (155 total, all pass)
2026-05-13 16:29:26 -07:00
783edbe496 feat: wizard state machine — structured Q&A writes context facts and source config 2026-05-13 16:25:52 -07:00
ef8d164188 feat: context retriever — keyword fact lookup and chunk search 2026-05-13 16:23:54 -07:00
ebbb1af32d feat: doc upload adapter — writes facts, document, and chunks to context store 2026-05-13 16:21:55 -07:00
b23a60a602 feat: context chunker — type detection, YAML extraction, text chunking
- Implement document type detection for yaml/json/markdown/text
- Extract service facts from docker-compose YAML (names, images, ports)
- Split text into overlapping word chunks (300-word default with 50-word overlap)
- Enforce 5 MB file size limit
- Comprehensive TDD test suite: 15 tests passing
2026-05-13 15:54:51 -07:00
54c756dfe8 feat: context store — fact and document CRUD 2026-05-13 15:53:03 -07:00
7461953021 feat: add context_facts, context_documents, context_chunks tables to schema 2026-05-13 15:51:19 -07:00
784a4072b4 feat: SSE streaming diagnose, severity filter pills, per-source-cap search
- diagnose_stream() async generator: status/summary/entries/reasoning/done events
- POST /api/diagnose/stream SSE endpoint wired in rest.py
- entries_in_window() gains per_source_cap to prevent high-volume sources crowding results
- QuickCapture: severity filter pills, filtered entries view, pipeline status spinner
- llm.py: remove overly broad HTTPStatusError re-raise
2026-05-13 15:45:35 -07:00
b70c89e7b5 feat: try cf-orch task endpoint first; fall back to direct model call
POST /api/inference/task with product=turnstone task=log_analysis routes to
the security reasoning model assigned in cf-orch. Falls back to the OpenAI-
compat /v1/chat/completions path on 404 (no assignment) or if the task
endpoint is absent (local instances, example-node).
2026-05-13 08:20:29 -07:00
caa85b3d30 feat: source-scoped diagnose; multi-node Docker log collection
- Diagnose: add source_filter param threaded through entries_in_window,
  search, _diagnose, and DiagnoseRequest — clicking diagnose on a
  dashboard source now scopes both keyword and window hits to that source
- QuickCapture: read route.query.source; show scope badge with clear ✕;
  auto-run when source param is present without a query
- DashboardView: pass source= (not q=) when navigating to diagnose
- collect_cluster_logs.sh: auto-discover Docker containers on all nodes
  (Heimdall non-watched, Navi, Strahl via SSH); collect Cass Plex logs
  via SSH; write to per-node dirs for directory-mode ingest
- turnstone-cluster.service: add --reload for hot-reload during dev
2026-05-13 08:10:42 -07:00
53fa350adf fix: correct cf-orch port to 7700; fix relative time parsing in diagnose; fix syslog PRI prefix 2026-05-13 05:33:41 -07:00
a4ec5a6951 feat: add UDP syslog receiver for network device log collection
scripts/syslog_receiver.py: asyncio UDP server listening on port 5140,
appends raw syslog lines to network-syslog.txt for the Turnstone live
watcher to tail. Requires no root — port 5140 is non-privileged.

scripts/turnstone-syslog-receiver.service: systemd unit for auto-start.

app/ingest/syslog.py: strip optional RFC 3164 <PRI> prefix before
parsing so network-forwarded syslog (OpenWRT logd, Arista EOS, etc.)
is handled correctly without the PRI value breaking the regex.
2026-05-13 04:58:51 -07:00
a21c158917 fix: increase LLM summarize timeout to 120s for remote cf-orch routing
20s was too tight for first-request model swaps in Ollama (model cold load
can take 30-60s). 120s matches coordinator inference timeout.
2026-05-12 18:27:52 -07:00
7d46314e86 feat: switch LLM backend to OpenAI-compat; add cf-orch remote inference support
Turnstone now calls /v1/chat/completions instead of Ollama's /api/generate.
This format works with both local Ollama (>=0.1.24) and a remote cf-orch
coordinator, enabling GPU-less nodes like Contributor2's to route diagnoses through
the cluster without any local model.

- llm.py: OpenAI-compat messages format, optional Bearer auth header
- diagnose.py: thread llm_api_key through the call chain
- rest.py: llm_api_key pref (default empty), SettingsBody field, passed to diagnose
- SettingsView.vue: API Key field, label updated from "Ollama URL" to "LLM Endpoint URL"
- tests: updated mocks for new response shape; added bearer token assertion test
2026-05-12 12:58:38 -07:00
9cc8bf3662 feat: add file tail source type; configure example-node watchers
- type: file uses tail -F (handles rotation) with auto-format detection
- _parse_lines dispatches to journald/servarr/qbit/caddy/syslog/plaintext
  based on first-line format detection — same logic as batch ingest
- watch.yaml updated with file type docs and example-node-specific example
- scripts/journal-bridge.sh + .service written directly to example-node

Contributor2's watch.yaml covers: system-journal-live (via bridge file),
sonarr, radarr, lidarr, prowlarr, bazarr, qbittorrent, nzbget, tautulli
2026-05-11 15:44:10 -07:00
3fd81e5ab1 feat: live watch mode — tail journald/docker/podman sources continuously (#4)
Adds background watcher that tails active log sources and ingests entries
in near-real-time, keeping the DB fresh without manual ingest runs.

- app/watch/watcher.py: Watcher + WatchSource using subprocess + select
  loop; flushes every 10s or 100 lines; syncs FTS index every 3 flushes
- patterns/watch.yaml: declarative source config (journald/docker/podman)
- app/rest.py: lifespan context manager starts/stops watcher on app
  startup/shutdown; GET /api/watch/status + POST /api/watch/reload
- web/src/views/DashboardView.vue: live/manual indicator chip + stale
  banner copy adapts to whether live watching is active
- tests/test_watch_watcher.py: 16 tests covering config load, command
  building, docker timestamp stripping, orchestrator lifecycle

Closes #4
2026-05-11 15:34:13 -07:00
c12cc6d68a feat: severity overrides + last_ingested timestamp on dashboard 2026-05-11 13:00:11 -07:00
0882083755 feat: LLM reasoning layer — Ollama summarization on diagnose results 2026-05-11 11:35:07 -07:00
05ad314ed5 feat: add POST /api/diagnose and GET/PATCH /api/settings endpoints 2026-05-11 09:10:58 -07:00
ca0cb1361e fix: correct time_detected logic, immutable sort pattern, add diagnose() test 2026-05-11 09:08:24 -07:00
abd142addf feat: add diagnose service with NL time extraction via dateparser
Adds app/services/diagnose.py with parse_time_window() (dateparser-backed
NL time phrase extraction with 60-min fallback) and diagnose() (layered
FTS + window search returning severity/source summary). Includes 5 TDD tests.
2026-05-11 09:04:50 -07:00
346ea6e0c6 feat: syslog and dmesg parsers with graceful journald fallback
- Add syslog.py — RFC 3164 parser for /var/log/syslog, /var/log/messages,
  auth.log, kern.log; ident prepended to message text for searchability
- Add dmesg_log.py — handles both relative [secs.usecs] and human-readable
  [Dow Mon DD HH:MM:SS YYYY] formats; relative timestamps preserved as raw
- Wire both into pipeline.py auto-detection (before plaintext fallback)
- Update export_journal.sh: checks for journalctl availability, falls back
  gracefully on non-systemd systems; adds dmesg -T export (falls back to
  plain dmesg on older kernels)
- Add syslog entries (commented) + dmesg source to sources.yaml
- 30 tests covering both parsers (detection + parse correctness)
2026-05-11 06:57:38 -07:00
f46aba8165 feat: multi-source ingest via sources.yaml + servarr parser
- Add servarr.py parser for all *arr services (sonarr/radarr/lidarr/
  prowlarr/readarr/whisparr/bazarr) — pipe-delimited format with
  component prefix prepended for searchability
- Add ingest_sources() to pipeline.py; reads sources.yaml, skips
  missing paths with a warning so cron keeps running if a service
  is down
- Add --sources mode to ingest_corpus.py CLI; legacy positional args
  unchanged for backward compat
- Add patterns/sources.yaml with all of Contributor2's discovered service
  log paths (qbit, 7 servarr services, nzbget, tautulli, jellyseerr)
- Replace per-service volume mounts in podman-standalone.sh with
  /opt:/opt:ro + /var/log:/var/log:ro; adding a new source now
  requires only editing sources.yaml — no container restart
2026-05-11 06:26:32 -07:00
40bbc9225d fix: support hotio qBittorrent 5.x log format (N/I/W/C single-char level)
Contributor2's ghcr.io/hotio/qbittorrent:latest container uses a different format
than the classic GUI build: `(N) 2026-04-26T03:32:59 - message` with a
single-char level code before an ISO timestamp, not inside parens.

Added _HOTIO_RE alongside _CLASSIC_RE; unified via _match_line() helper so
parse() loop is unchanged. 28 tests passing, both formats covered.
2026-05-11 05:55:40 -07:00
457b4fd7ae feat: incident labeling, bundle export, and push/receive flow
Turnstone incidents now carry an issue_type tag (free-text with datalist
suggestions) used to categorize patterns for signature building.

Backend:
- Incident model gains issue_type; additive ALTER TABLE migration keeps
  existing DBs working without a full schema rebuild
- New received_bundles table stores incoming JSON bundles with indexes on
  bundled_at and issue_type
- build_bundle() assembles incident + related log entries into a versioned
  bundle dict; store_bundle()/list_bundles()/get_bundle() for the receiver
- POST /api/incidents/{id}/send — pushes bundle to TURNSTONE_BUNDLE_ENDPOINT
- GET  /api/incidents/{id}/bundle — export without sending
- POST /api/bundles — receive and store an incoming bundle
- GET  /api/bundles — list all received bundles
- TURNSTONE_SOURCE_HOST and TURNSTONE_BUNDLE_ENDPOINT env vars; auto-set
  source host from hostname in podman-standalone.sh

Frontend:
- Incidents form: issue_type field with datalist suggestions; Type column
  in the table; Send Bundle button + status feedback in the detail drawer
- New BundlesView: collapsible bundle rows, inline JSON parse (no extra
  round-trip), Export JSON download button
- Router and nav updated with /bundles route
2026-05-11 05:23:55 -07:00
f5893c6003 feat: dashboard view, stats API, and composite index for query perf
- Add GET /api/stats endpoint with 24h windowed aggregation (criticals,
  errors, per-source health, recent criticals list)
- Fix timestamp format bug: strftime('%Y-%m-%dT%H:%M:%S', ...) to match
  stored ISO-8601 T-separated timestamps (datetime('now') uses space)
- Add composite index idx_ts_repeat(timestamp_iso, repeat_count) — drops
  stats query from 3.5 s to <1 ms by resolving both WHERE conditions
  from the index without table row fetches
- New DashboardView: 3 stat cards, source health table with health dots,
  diagnose-per-source button, recent criticals panel, zero-state card
- Router default / → /dashboard; Dashboard first in nav
- DiagnoseView: reads ?q= query param on mount and auto-runs; shows
  formatted LLM summary block
- LogEntryRow: expand/collapse for long entries (>200 chars or multiline)
2026-05-11 03:41:55 -07:00
a3c0962277 feat: qBittorrent log ingestor with 8 diagnostic patterns
Adds app/ingest/qbittorrent.py — auto-detected by the pipeline on the
(YYYY/MM/DD HH:MM:SS) timestamp fingerprint. Handles both slash and dash
date separators, optional [Warning|Critical] bracket levels, and
multi-line continuations (Qt stack traces).

patterns/default.yaml: 8 new qbit_ patterns covering tracker errors,
port bind failures, disk errors, hash check failures, peer bans, download
completion, ratio limits, and session errors.

manage.sh: ingest-qbit [HOST] command mirrors ingest-plex — probes known
default log paths locally or via SSH, ingests, restarts server.

14 tests covering format detection, severity mapping, multiline handling,
and timestamp normalization.
2026-05-10 08:21:16 -07:00
19d3827e2d fix: bypass FTS ranking for named-source error retrieval
When diagnose() auto-detects a source name, FTS keyword scoring can
bury real errors whose text doesn't match the symptom query. Add
recent_source_errors() — a plain-SQL scan ordered by timestamp — so
the most recent errors from a known service always surface regardless
of keyword overlap.
2026-05-10 08:14:23 -07:00
d63dc2a714 feat: incident tagging — DB schema, CRUD service, REST API (#1)
- Add `incidents` table to SQLite schema (id, label, started_at, ended_at,
  notes, created_at, severity)
- Extract `ensure_schema()` from ingest pipeline so tables are always
  created at startup, not only during ingest
- New `app/services/incidents.py`: create/list/get/delete + time-window
  entry association (FTS keyword search + raw window fallback)
- New `entries_in_window()` in search.py: plain SQL scan for incident
  detail when keyword FTS returns nothing
- REST endpoints: POST/GET /api/incidents, GET/DELETE /api/incidents/{id}
- Incident detail returns up to 100 associated log entries sorted by
  timestamp, prioritising FTS keyword hits then ERROR/CRITICAL then all
2026-05-09 15:37:14 -07:00
f6b261280c feat: plain-text and Plex log ingestors
- app/ingest/plex.py: Plex Media Server log parser
  Regex-based line parser for 'Mon DD, YYYY HH:MM:SS.mmm [pid] LEVEL - msg'
  format. Handles multi-line entries (stack traces). Detects plex_eae_failure
  and all other patterns via shared pattern library.
- app/ingest/plaintext.py: generic fallback parser for unrecognized formats
  Extracts timestamps (ISO 8601, syslog, common log) and severity via regex.
- pipeline.py: detect plex format via is_plex_log(); fall back to plaintext
  instead of skipping; process *.log files alongside *.jsonl; add ingest_file()
  for single-file ingestion.
- scripts/ingest_corpus.py: accept single file or directory as target
- manage.sh: ingest-plex command SSHes to Cass (or HOST arg), pulls
  Plex Media Server.log, and ingests it directly
2026-05-08 17:50:01 -07:00
a3e8ebb0d9 fix: mount all routes at /turnstone prefix for direct LAN access
Vite builds with base='/turnstone/' so asset paths in index.html are
/turnstone/assets/*. Serving FastAPI at root / meant direct hits to
port 8534 got index.html for asset requests (blank page).

- All routes now under /turnstone (APIRouter prefix + StaticFiles mount
  at /turnstone/assets + SPA catch-all at /turnstone/{path})
- Root / redirects to /turnstone/
- Caddy block reverted to no-strip: both direct LAN and Caddy access
  hit the same paths, no per-host routing differences
2026-05-08 17:45:34 -07:00
9e46cd4c7f fix: serve Vue SPA from FastAPI, drop separate port 8535
Python http.server can't do SPA routing and Caddy was forwarding
/turnstone/* paths that the static server couldn't resolve.

- app/rest.py: mount web/dist/assets as StaticFiles; add SPA catch-all
  route that serves index.html for any unmatched path
- manage.sh: start/stop/status simplified to single process on :8534;
  remove UI_PORT / UI_PID_FILE; drop http.server invocation
- Caddyfile: replace split API/:8534 + SPA/:8535 block with a single
  strip_prefix + reverse_proxy to :8534
2026-05-08 17:27:46 -07:00
eef84d55be feat: Vue 3 frontend and FastAPI REST layer
- app/rest.py: FastAPI app wrapping search/diagnose/sources with CORS
- web/: Vue 3 + Vite + UnoCSS + Pinia frontend at port 8535
  - LogSearchView: sidebar filters (source, severity, limit) + FTS search
  - DiagnoseView: layered symptom investigation matching MCP diagnose tool
  - SourcesView: corpus table with entry count, error count, time range
  - LogEntryRow: severity badge, pattern chips, repeat count, timestamp
  - StatusDot: live API health indicator in nav
- scripts/start_dev.sh: launch FastAPI (:8534) + Vite dev server (:8535)
- .gitignore: add web/node_modules/ and web/dist/
- Caddy: /turnstone* route added to menagerie.circuitforge.tech block
  (API → :8534 with /turnstone strip, SPA fallback → :8535)
2026-05-08 16:27:59 -07:00
bbe4b1e360 feat: initial Turnstone POC — ingest, FTS search, MCP server
Ingest pipeline (journald / Caddy / Docker-wrapped formats) with
per-source state tracking (repeat dedup, out-of-order detection),
named pattern tagging at ingest time, and idempotent SHA1-keyed writes.

FTS5 search layer with porter stemmer, severity/source/pattern/time
filters, and BM25 ranking. MCP server (FastMCP stdio) with three tools:
search_logs, diagnose, list_log_sources — compatible with both
Claude Code and Copilot CLI.

WAL mode enabled on all connections. FTS index auto-built after ingest.
MCP configs included for Claude Code (.mcp.json) and Copilot CLI
(.github/copilot/mcp.json).
2026-05-08 12:12:34 -07:00