Adds late-fusion hybrid search to Turnstone's log retrieval layer:
hybrid_score = 0.6 * bm25_normalized + 0.4 * cosine_similarity
Implementation:
- _bm25_search() extracts the existing FTS5 BM25 path as a named helper
- _hybrid_search() fetches an oversized BM25 candidate pool (5x limit,
min 100), embeds the query and each candidate text in-process via the
existing embeddings service, normalizes BM25 rank to [0,1], combines
with cosine similarity, and re-ranks
- search() gets semantic=False param that dispatches to _hybrid_search()
when True; pure BM25 remains the default for all existing call sites
- diagnose_stream() enables semantic=True so symptom-based queries
("database connection failed") surface semantically equivalent entries
("ECONNREFUSED", "backend gone away", "max retries exceeded")
- /api/search REST endpoint exposes ?semantic=true query param
Graceful degradation: falls back silently to pure BM25 when the embedding
backend is unavailable (EMBEDDING_AVAILABLE=False) or when embed_batch
raises an exception. No new infra — in-process numpy cosine, no vector DB.
11 new tests: BM25 helper, hybrid re-ranking, fallback paths, dispatcher.
372 + 11 = 383 tests passing.
Closes: #15
Adds _HYBRID_BERT_LABEL_MAP to translate the 7-class output vocabulary of
krishnas4415/log-anomaly-detection-models (Hybrid-BERT, MIT) to Turnstone
SeverityLabel. _map_label now checks the Hybrid-BERT map before the standard
map so either model family works via TURNSTONE_CLASSIFIER_MODEL without any
additional code path.
Mapping (confirmed from model config.json):
normal → INFO
security_anomaly → ERROR
system_failure → CRITICAL
performance_issue → WARN
network_anomaly → WARN
config_error → ERROR
hardware_issue → CRITICAL
Keyword-based CRITICAL promotion and low-confidence DEBUG demotion apply on
top of the base mapping (same rules as the standard vocabulary).
11 new tests covering all 7 Hybrid-BERT labels, case-insensitivity, and
regression on standard-vocabulary labels. 372 tests passing total.
Note: custom loading code for the non-standard .pt checkpoint format is
explicitly out of scope — evaluate better-packaged HF alternatives first
(see #41 for candidate list).
Closes: #41
FTS5 bulk-insert write locks starved the incident API and bundle endpoints
during log bursts (sonarr/radarr, high-volume docker sources). Fix mirrors
the context_facts split (context -> turnstone-context.db):
- Add INCIDENTS_DB_PATH / TURNSTONE_INCIDENTS_DB env var in rest.py
- Add _INCIDENTS_SCHEMA, ensure_incidents_schema(), and
migrate_incidents_to_dedicated_db() in glean/pipeline.py
- Stub out incidents/received_bundles/sent_bundles in _SCHEMA (no-op
CREATE IF NOT EXISTS) so legacy single-file deployments still open
- Thread incidents_db_path through diagnose_stream -> run_pipeline ->
FalsePositiveSuppressor.suppress -> _fetch_resolved_incidents
- One-shot migration on startup: copy existing rows from main DB to
incidents DB via INSERT OR IGNORE (idempotent, safe to re-run)
- Fix test_blocklist_endpoints fixtures to patch CONTEXT_DB_PATH and
INCIDENTS_DB_PATH alongside DB_PATH (worktree has no data/ dir)
372 tests passing.
Closes: #60
Truncation fix: call_llm() in _llm_client.py now accepts max_tokens (default
2048) and passes it in both the cf-orch task payload and the OpenAI-compat
fallback body. Hypothesizer uses max_tokens=1024 (JSON array output);
synthesizer and legacy summarize use 2048 (structured 5-section narrative).
Without this, backends use their own default (often 512 tokens), causing
mid-sentence truncation of the diagnosis output.
UI fix: reasoning card changed from bg-accent/5 border-accent/30 (opacity
modifiers on CSS variables don't compose reliably across themes) to the
callout pattern: bg-surface-raised with a solid border-l-4 border-accent.
Header label changed from text-text-dim to text-accent for visual anchoring.
Text remains text-text-primary for guaranteed contrast on both light and dark
themes.
Tracks: #56 (technical-level post-processor, filed as follow-on feature)
Watcher, REST endpoints, services (search, incidents, blocklist),
MCP server, context retriever, embedder, glean_scheduler, and
doc_upload all used the default 5-second SQLite busy timeout.
During collect glean write phases, watcher flush threads were hitting
'database is locked' errors when the glean held the write lock longer
than 5 seconds.
All connections now use timeout=30.0, matching the pipeline fix
from commit 5a9281a. No logic changes.
context_facts, context_documents, and context_chunks now live in
turnstone-context.db (sibling of turnstone.db). The glean scheduler
held write locks on the main DB long enough to cause 5-second timeout
failures on context fact inserts; separate files have independent WAL
write locks so they never contend.
Changes:
- pipeline.py: extract _CONTEXT_SCHEMA + ensure_context_schema()
- rest.py: CONTEXT_DB_PATH (TURNSTONE_CONTEXT_DB env var, defaults to
sibling file); init via ensure_context_schema(); all context routes
pass CONTEXT_DB_PATH; diagnose_stream receives context_db_path kwarg
- diagnose/__init__.py: diagnose_stream() accepts context_db_path
(falls back to db_path for backward compat); retrieve_context uses it
- store.py: sqlite3.connect() timeout=30.0 — Python driver retry loop
is independent of PRAGMA busy_timeout; needed for any remaining
contention during test or single-file deployments
Closes: #42
Reasoning models (e.g. foundation-sec-8b) emit valid JSON then repeat it
inside a markdown fence block. json.loads() fails on the combined text.
extract_first_json_array() scans for the first '[' and walks to its
matching ']' with proper string/escape/nesting handling, then returns
just that slice. Combined with strip_json_fences(), this handles all
observed output patterns:
- bare JSON array (standard models)
- fenced JSON array (fence-wrapping models)
- bare array followed by fenced repeat (reasoning models)
Makes the HuggingFace classifier model for Stage 2 configurable via
TURNSTONE_CLASSIFIER_MODEL. When unset (default), Stage 2 falls back
to pattern_tags then regex — no download required on first run.
Also documents TURNSTONE_MULTI_AGENT_DIAGNOSE, TURNSTONE_CLASSIFIER_MODEL,
TURNSTONE_EMBED_BACKEND/MODEL/DEVICE in .env.example.
- #33: Wrap ClassifiedTimeline.cluster_severities in MappingProxyType for
true immutability (frozen=True only blocks field reassignment, not dict
mutation).
- #34: Remove dead suppression branch in synthesizer._build_hypothesis_block.
active[] is already filtered to not rh.suppress, so the 'Yes — suppressed'
branch was unreachable. Now shows novelty score only.
- #35: Extract shared _llm_client.py with call_llm() + extract_content() +
strip_json_fences(). Both RootCauseHypothesizer and SummarySynthesizer
now import from one source. Also strips JSON fences from LLM output before
parsing in hypothesizer._parse_response.
- #36: Add per-stage try/except in pipeline.run_pipeline(). Unhandled
stage exceptions now emit {type: 'error'} + {type: 'done'} SSE events
instead of silently closing the stream.
- #37: Move format_context_block() call inside the legacy LLM branch in
diagnose/__init__.py — it was being computed unconditionally but only
used in the non-pipeline path.
- #38: Coerce supporting_cluster_ids items to str() in hypothesizer
_parse_response to guard against LLMs returning integers instead of
string cluster IDs.
Was suppressing when novelty_score < 0.85 (i.e. similarity > 0.15), which
would suppress nearly every hypothesis once embeddings are active.
Now suppresses when max_sim >= similarity_threshold (0.85), meaning only
hypotheses that are 85%+ similar to a resolved incident are suppressed.
Also renames suppress_threshold → similarity_threshold for clarity and
adds a borderline boundary test (0.85 suppressed, 0.84 not suppressed).
Closes: #29
- Implements FalsePositiveSuppressor using embedding cosine similarity
- Lazy corpus embedding via get_embedder() with module-level cache keyed by db_path
- Cache invalidated automatically when the resolved incident corpus changes
- Suppresses hypotheses with novelty_score below configurable threshold (default 0.85)
- Full fallback path (novelty=1.0, no suppression) when model_id empty, embedding
service unavailable, or no resolved incidents found in DB
- Graceful handling of missing incidents table and DB query failures
- Numpy bool_ leakage prevented by explicit float()/bool() coercion at assignment
- Pure-Python cosine fallback for environments without numpy
- 9 new tests (all mocked, no real model downloads): passthrough, suppress, no-suppress,
empty list, ranking, empty corpus, DB failure, service unavailable, cache invalidation
- 350 total tests passing (341 pre-existing + 9 new)
Closes: #29
- Add _coerce_float() module-level helper: catches TypeError/ValueError from
non-numeric LLM output (e.g. 'high', 'N/A') and returns a caller-supplied
default instead of raising.
- Replace float(item.get('confidence', 0.5)) with
_coerce_float(item.get('confidence'), 0.5) in _parse_response.
- Guard supporting_cluster_ids: tuple(item.get(...) or []) so a JSON null
from the LLM does not cause TypeError('NoneType is not iterable').
- runbook_refs is hardcoded as () and not sourced from LLM output; no change
needed there.
- Add test_non_numeric_confidence_uses_default (Test 10) to cover the 'high'
string case: asserts no exception and confidence == 0.5.
- 341 tests passing (+1).
Closes: #29
Three-path classification: ML (transformers pipeline, lazy singleton) →
pattern_tags (YAML pattern severity dict) → regex (detect_severity).
- Path A: HF text-classification pipeline loaded lazily on first classify()
call via module-level singleton; shim promotes ERROR+keyword hits to CRITICAL
and demotes low-confidence INFO to DEBUG.
- Path B: maps cluster.pattern_tags through the loaded pattern severity dict;
picks the highest severity across matching tags.
- Path C: falls back to detect_severity() regex scan on representative_text;
defaults to INFO when no keyword matches.
- Pattern file resolved from constructor arg or TURNSTONE_PATTERNS env var
(mirrors app/rest.py convention).
- No crash when transformers is not installed; ImportError on per-cluster ML
inference triggers clean per-cluster fallback to pattern_tags/regex.
- ClassifiedTimeline.classifier_used reflects the primary session path.
Tests (10 new, 328 total, all passing):
- ML ERROR, CRITICAL promotion, DEBUG demotion, WARNING→WARN
- pattern_tags resolution from YAML fixture
- regex ERROR detection and INFO default
- ImportError clean fallback
- empty timeline no-crash
- ClassifiedTimeline FrozenInstanceError on mutation
Closes: #29
- Add gap_significance_seconds constructor param (default 30) to replace hardcoded magic number in gap_count computation
- _parse_iso now returns datetime | None with try/except on ValueError; all callers handle None return by treating malformed timestamps as absent
- Extract reconstruct into four private helpers: _sort_entries, _group_into_raw_clusters, _build_cluster, _dominant_sources_tuple
- Promote _sort_key to module-level function (was nested inside reconstruct)
- Rename old module-level _build_cluster to _make_event_cluster to avoid name collision with new instance method
- Add explanatory comment to type: ignore[arg-type] at _highest_severity call site
- Black-formatted
- Move app/services/diagnose.py verbatim to app/services/diagnose/legacy.py
- Create app/services/diagnose/__init__.py with full implementation so that
patch('app.services.diagnose._HAS_DATEPARSER') targets the correct namespace
and all 303 existing tests continue to pass without modification
- Add app/services/diagnose/models.py with 5 pipeline dataclasses:
EventCluster, TimelineResult, ClassifiedTimeline, Hypothesis, RankedHypothesis
- Add app/services/diagnose/pipeline.py with run_pipeline() stub (Task 6)
- Add MULTI_AGENT_ENABLED feature flag (off by default via env var)
- Zero behavior change; ruff clean
Closes: #29
- New app/services/embeddings.py: TURNSTONE_EMBED_* env vars, multi-backend support
- embedder.py delegates to service layer; re-exports EMBEDDING_AVAILABLE for compat
- retriever.py updated to use service layer
- Test coverage updated in tests/context/test_embedder.py
Add blocklist candidate listing, scan trigger, status update,
push/unblock to Pi-hole, and connection test endpoints.
Add pihole_url/version/api_key and router_source_ids/device_names
fields to SettingsBody and prefs handling in patch_settings.
Add PiholeClient.__post_init__ validation so 503 fires naturally
when url/api_key are unconfigured (mock-safe: bypassed in tests).
PiholeClient dataclass supporting both Pi-hole v5 (PHP /admin/api.php)
and v6 (REST /api/) with public block/unblock/test_connection methods.
9 tests covering both API versions, auth flow, and error handling.
Adds patterns/telemetry.yaml with 6 rule groups (samsung, belkin, roku, lg, amazon, advertising).
Adds app/services/blocklist.py with TelemetryRule and BlocklistCandidate dataclasses, load_telemetry_rules(), and matches_telemetry() with exact and subdomain matching.
6 new TestTelemetry tests pass; 199 total passing.
The relative-time regex only matched digits between 'last/past' and
the unit, so 'last few hours' fell through to dateparser which then
found the bare word 'hours' and resolved it as midnight local time.
Extended the regex to capture 'few', 'couple of', 'several', 'a few'
as approximate quantifiers, mapped to 3 units each. Numeric expressions
and bare 'last hour' still work as before.
- Add context_block param to summarize() and thread it into _PROMPT_TEMPLATE
- Wire retrieve_context/format_context_block into diagnose_stream() before
log search; emit context SSE event (facts + chunks) to the client
- 3 new tests covering prompt injection and SSE event emission (155 total, all pass)
POST /api/inference/task with product=turnstone task=log_analysis routes to
the security reasoning model assigned in cf-orch. Falls back to the OpenAI-
compat /v1/chat/completions path on 404 (no assignment) or if the task
endpoint is absent (local instances, example-node).
- Diagnose: add source_filter param threaded through entries_in_window,
search, _diagnose, and DiagnoseRequest — clicking diagnose on a
dashboard source now scopes both keyword and window hits to that source
- QuickCapture: read route.query.source; show scope badge with clear ✕;
auto-run when source param is present without a query
- DashboardView: pass source= (not q=) when navigating to diagnose
- collect_cluster_logs.sh: auto-discover Docker containers on all nodes
(Heimdall non-watched, Navi, Strahl via SSH); collect Cass Plex logs
via SSH; write to per-node dirs for directory-mode ingest
- turnstone-cluster.service: add --reload for hot-reload during dev
Turnstone now calls /v1/chat/completions instead of Ollama's /api/generate.
This format works with both local Ollama (>=0.1.24) and a remote cf-orch
coordinator, enabling GPU-less nodes like Contributor2's to route diagnoses through
the cluster without any local model.
- llm.py: OpenAI-compat messages format, optional Bearer auth header
- diagnose.py: thread llm_api_key through the call chain
- rest.py: llm_api_key pref (default empty), SettingsBody field, passed to diagnose
- SettingsView.vue: API Key field, label updated from "Ollama URL" to "LLM Endpoint URL"
- tests: updated mocks for new response shape; added bearer token assertion test
Turnstone incidents now carry an issue_type tag (free-text with datalist
suggestions) used to categorize patterns for signature building.
Backend:
- Incident model gains issue_type; additive ALTER TABLE migration keeps
existing DBs working without a full schema rebuild
- New received_bundles table stores incoming JSON bundles with indexes on
bundled_at and issue_type
- build_bundle() assembles incident + related log entries into a versioned
bundle dict; store_bundle()/list_bundles()/get_bundle() for the receiver
- POST /api/incidents/{id}/send — pushes bundle to TURNSTONE_BUNDLE_ENDPOINT
- GET /api/incidents/{id}/bundle — export without sending
- POST /api/bundles — receive and store an incoming bundle
- GET /api/bundles — list all received bundles
- TURNSTONE_SOURCE_HOST and TURNSTONE_BUNDLE_ENDPOINT env vars; auto-set
source host from hostname in podman-standalone.sh
Frontend:
- Incidents form: issue_type field with datalist suggestions; Type column
in the table; Send Bundle button + status feedback in the detail drawer
- New BundlesView: collapsible bundle rows, inline JSON parse (no extra
round-trip), Export JSON download button
- Router and nav updated with /bundles route
- Add GET /api/stats endpoint with 24h windowed aggregation (criticals,
errors, per-source health, recent criticals list)
- Fix timestamp format bug: strftime('%Y-%m-%dT%H:%M:%S', ...) to match
stored ISO-8601 T-separated timestamps (datetime('now') uses space)
- Add composite index idx_ts_repeat(timestamp_iso, repeat_count) — drops
stats query from 3.5 s to <1 ms by resolving both WHERE conditions
from the index without table row fetches
- New DashboardView: 3 stat cards, source health table with health dots,
diagnose-per-source button, recent criticals panel, zero-state card
- Router default / → /dashboard; Dashboard first in nav
- DiagnoseView: reads ?q= query param on mount and auto-runs; shows
formatted LLM summary block
- LogEntryRow: expand/collapse for long entries (>200 chars or multiline)
When diagnose() auto-detects a source name, FTS keyword scoring can
bury real errors whose text doesn't match the symptom query. Add
recent_source_errors() — a plain-SQL scan ordered by timestamp — so
the most recent errors from a known service always surface regardless
of keyword overlap.
- Add `incidents` table to SQLite schema (id, label, started_at, ended_at,
notes, created_at, severity)
- Extract `ensure_schema()` from ingest pipeline so tables are always
created at startup, not only during ingest
- New `app/services/incidents.py`: create/list/get/delete + time-window
entry association (FTS keyword search + raw window fallback)
- New `entries_in_window()` in search.py: plain SQL scan for incident
detail when keyword FTS returns nothing
- REST endpoints: POST/GET /api/incidents, GET/DELETE /api/incidents/{id}
- Incident detail returns up to 100 associated log entries sorted by
timestamp, prioritising FTS keyword hits then ERROR/CRITICAL then all