source_ids with 3+ colon segments (e.g. muninn-journal:Muninn:ssh.service)
are now aggregated by their prefix:host key at the SQL level in both
list_sources() and stats_summary(). This collapses ~19K transient systemd
unit rows (crash-loop scope entries from Muninn) into ~24 grouped rows.
- list_sources: SQL CASE/INSTR group-by stem + unit_count field
- stats_summary: same stem grouping for dashboard source health table
- delete endpoint: LIKE-based cascade delete covers grouped stems
- SourcesView: unit_count badge (e.g. "2686 units") on grouped rows;
delete confirmation names the unit count when deleting a group
- Bump version to v0.6.1
Adds late-fusion hybrid search to Turnstone's log retrieval layer:
hybrid_score = 0.6 * bm25_normalized + 0.4 * cosine_similarity
Implementation:
- _bm25_search() extracts the existing FTS5 BM25 path as a named helper
- _hybrid_search() fetches an oversized BM25 candidate pool (5x limit,
min 100), embeds the query and each candidate text in-process via the
existing embeddings service, normalizes BM25 rank to [0,1], combines
with cosine similarity, and re-ranks
- search() gets semantic=False param that dispatches to _hybrid_search()
when True; pure BM25 remains the default for all existing call sites
- diagnose_stream() enables semantic=True so symptom-based queries
("database connection failed") surface semantically equivalent entries
("ECONNREFUSED", "backend gone away", "max retries exceeded")
- /api/search REST endpoint exposes ?semantic=true query param
Graceful degradation: falls back silently to pure BM25 when the embedding
backend is unavailable (EMBEDDING_AVAILABLE=False) or when embed_batch
raises an exception. No new infra — in-process numpy cosine, no vector DB.
11 new tests: BM25 helper, hybrid re-ranking, fallback paths, dispatcher.
372 + 11 = 383 tests passing.
Closes: #15
Watcher, REST endpoints, services (search, incidents, blocklist),
MCP server, context retriever, embedder, glean_scheduler, and
doc_upload all used the default 5-second SQLite busy timeout.
During collect glean write phases, watcher flush threads were hitting
'database is locked' errors when the glean held the write lock longer
than 5 seconds.
All connections now use timeout=30.0, matching the pipeline fix
from commit 6882248. No logic changes.
- Diagnose: add source_filter param threaded through entries_in_window,
search, _diagnose, and DiagnoseRequest — clicking diagnose on a
dashboard source now scopes both keyword and window hits to that source
- QuickCapture: read route.query.source; show scope badge with clear ✕;
auto-run when source param is present without a query
- DashboardView: pass source= (not q=) when navigating to diagnose
- collect_cluster_logs.sh: auto-discover Docker containers on all nodes
(Heimdall non-watched, Navi, Strahl via SSH); collect Cass Plex logs
via SSH; write to per-node dirs for directory-mode ingest
- turnstone-cluster.service: add --reload for hot-reload during dev
- Add GET /api/stats endpoint with 24h windowed aggregation (criticals,
errors, per-source health, recent criticals list)
- Fix timestamp format bug: strftime('%Y-%m-%dT%H:%M:%S', ...) to match
stored ISO-8601 T-separated timestamps (datetime('now') uses space)
- Add composite index idx_ts_repeat(timestamp_iso, repeat_count) — drops
stats query from 3.5 s to <1 ms by resolving both WHERE conditions
from the index without table row fetches
- New DashboardView: 3 stat cards, source health table with health dots,
diagnose-per-source button, recent criticals panel, zero-state card
- Router default / → /dashboard; Dashboard first in nav
- DiagnoseView: reads ?q= query param on mount and auto-runs; shows
formatted LLM summary block
- LogEntryRow: expand/collapse for long entries (>200 chars or multiline)
When diagnose() auto-detects a source name, FTS keyword scoring can
bury real errors whose text doesn't match the symptom query. Add
recent_source_errors() — a plain-SQL scan ordered by timestamp — so
the most recent errors from a known service always surface regardless
of keyword overlap.
- Add `incidents` table to SQLite schema (id, label, started_at, ended_at,
notes, created_at, severity)
- Extract `ensure_schema()` from ingest pipeline so tables are always
created at startup, not only during ingest
- New `app/services/incidents.py`: create/list/get/delete + time-window
entry association (FTS keyword search + raw window fallback)
- New `entries_in_window()` in search.py: plain SQL scan for incident
detail when keyword FTS returns nothing
- REST endpoints: POST/GET /api/incidents, GET/DELETE /api/incidents/{id}
- Incident detail returns up to 100 associated log entries sorted by
timestamp, prioritising FTS keyword hits then ERROR/CRITICAL then all
Ingest pipeline (journald / Caddy / Docker-wrapped formats) with
per-source state tracking (repeat dedup, out-of-order detection),
named pattern tagging at ingest time, and idempotent SHA1-keyed writes.
FTS5 search layer with porter stemmer, severity/source/pattern/time
filters, and BM25 ranking. MCP server (FastMCP stdio) with three tools:
search_logs, diagnose, list_log_sources — compatible with both
Claude Code and Copilot CLI.
WAL mode enabled on all connections. FTS index auto-built after ingest.
MCP configs included for Claude Code (.mcp.json) and Copilot CLI
(.github/copilot/mcp.json).