Both log_fts and log_entries have timestamp_iso, severity, source_id, and
matched_patterns columns. After the JOIN, unqualified references to any of
these caused SQLite to raise 'ambiguous column name', silently falling back
to the non-FTS scan path on every time-filtered or severity-filtered query.
Prefix all filter conditions that touch FTS-mirror columns with f. to
resolve the ambiguity. The e. prefix on tenant_id was already correct since
tenant_id is not present in the FTS virtual table.
Stage 5 (SummarySynthesizer) was only sending aggregate timeline stats to the
LLM (cluster count, burst count, gap count) — the actual sequenced cluster data
that Stage 1 reconstructed was never included. The LLM had no per-cluster
timestamps, severity, burst flags, silence gaps, or representative text to
write the TIMELINE section from.
Added _build_timeline_block() to emit a numbered per-cluster summary matching
the format Stage 3 uses for the hypothesizer, and included it in the user
message alongside the hypothesis block.
Also fixed _build_hypothesis_block() to include the 2-4 sentence description
each hypothesis carries — previously only the title and novelty score reached
the LLM.
11 new tests cover _build_timeline_block() directly (burst label, gap threshold,
pattern tags, text truncation at 200 chars, null start_iso, multi-cluster
numbering). 529 tests passing.
- app/services/diagnose/_llm_client.py: strip <think>…</think> blocks
(case-insensitive, multiline) from LLM response content before it
reaches the UI or any JSON parser — affects DeepSeek-R1, Qwen QwQ,
and any other model that emits chain-of-thought in content
- app/rest.py: suggest_sources now also returns untracked_names — query
tokens that look like hostnames/service names but don't appear in any
monitored source, so the UI can prompt the user to add them
- web/src/components/ChatDiagnose.vue: show amber "Not monitoring: X"
banner with "Add as a log source →" link when untracked_names present
- tests/test_llm_client.py: 13 tests covering think-strip edge cases
(single/multi-line, multiple blocks, case-insensitive, only-thinking)
plus existing extract_content and JSON-fence helpers
Implements the Orchard branch grafting system for harvest.circuitforge.tech:
- POST /api/orchard/graft: provisions data dir, starts a new
turnstone-submissions-<slug> Docker container on the next free port
(ORCHARD_PORT_BASE=8538+), injects a handle_path block into the
Caddyfile dynamic-branches marker section, restarts caddy-proxy,
returns {submit_endpoint, api_key}
- GET /api/orchard/branches: list active/inactive branches (admin-only)
- DELETE /api/orchard/branches/<slug>: deactivate branch + stop container
- POST /api/orchard/branches/<slug>/anonymize: HMAC-based IP/username
pseudonymization worker over a branch DB
- POST /api/glean/batch: optional TURNSTONE_BRANCH_KEY auth guard
- anonymized column added to log_entries schema (migration-safe)
- Updated Caddyfile with /huginn/* route (port 8536), /node2/* (8537),
and dynamic-branch marker section
- All endpoints admin-gated via TURNSTONE_ORCHARD_ADMIN_KEY
Closes: #27
Add scan_log_directories() to discover.py that recursively walks
/var/log and /opt, filters to readable log files, and scores each
candidate by recency (mtime, 0.7 weight), file size (0.3), and
keyword match against an optional problem-context query (shifts
weights to 0.4/0.2/0.4 when a query is provided).
- GET /api/setup/scan?query=...&max_results=N — new API endpoint
- SourcesView: "Scan" button opens a panel with ranked candidates,
checkboxes, and "Add selected" to write to sources.yaml
- 13 new unit tests, 466 passing total
Closes: #23
- Rename patterns/sources-example-node.yaml → patterns/sources-example.yaml
and update header/comments to be host-agnostic
- Replace internal node names in gen_corpus.py _HOSTS with generic names
- Replace example-node hostname in syslog test fixtures with testhost
- Replace example-node example in mcp_server.py doc with myserver
- Replace private LAN IP (<YOUR_HOST_IP>) in docker-standalone.sh with
<HEIMDALL_LAN_IP> placeholder
- Replace private IPs in sources-cluster.yaml comments with <YOUR_HOST_IP>
- Remove instance-specific hostname from llm.py fallback comment
- Replace Caddy example domain in podman-standalone.sh with placeholder
Security Alerts:
- Client-side duplicate collapsing via anomaly_label + text fingerprint
- ×N count badge chip on collapsed rows; toggle to expand
- Skeleton shimmer rows replace "Loading..." text
Dashboard:
- Clickable Recent Criticals — inline LLM explanation via SSE stream
- ±5 min time window scoped to source_id for useful context
- Explanation cache keyed by entry_id (no re-fetch on re-expand)
- Default diagnose query injected on Diagnose button navigation to
prevent local models hallucinating from bare log data
- Stat card and source-health skeleton shimmer loading states
Backend:
- anomaly.py: 4-attempt retry on "database is locked" with 10s backoff
- search.py: migrate build_fts_index to get_conn() (WAL race fix);
add timeline_events to stats_summary for clickable criticals feature
- theme.css: @keyframes shimmer + .loading-shimmer utility;
prefers-reduced-motion degrades gracefully to static muted block
Second-pass cybersec classifier using DeBERTa-v3-base-mnli (already
cached — no download required). Runs after each anomaly scoring pass on
entries flagged by the anomaly scorer or with pattern matches.
Architecture:
- app/services/cybersec.py: zero-shot-classification pipeline with 5
cybersec candidate labels (auth failure, privilege escalation, network
intrusion, malware, data exfiltration). Writes ml_score/ml_label/
ml_scored_at to log_entries; inserts high-confidence hits into
detections with scorer='cybersec'.
- app/tasks/cybersec_scorer.py: async background task (same shape as
anomaly_scorer.py).
- REST: GET/POST /turnstone/api/cybersec/status|run|detections.
GET /turnstone/api/anomaly/detections now accepts scorer= filter.
Schema: ml_score, ml_label, ml_scored_at added to log_entries; scorer
column added to detections (idempotent migrations + DDL for both SQLite
and Postgres).
UI: Security Alerts view gains Source dropdown (All / Anomaly / Cybersec)
and cybersec scorer status badge. Label dropdown split into optgroups.
Deployment: TURNSTONE_CYBERSEC_MODEL/DEVICE/THRESHOLD vars added to
.env.example, docker-compose.yml, docker-standalone.sh.
Tests: 10 new tests — no model, no eligible entries, scoring, detection
creation, normal label suppression, threshold filtering, pattern-tag
filtering, idempotency, list filtering, scorer column filter.
416/416 passing.
Closes: #9
- Add app/services/anomaly.py: batch scorer using HF text-classification
pipeline; rewrites anomaly_score/anomaly_label/anomaly_scored_at on
log_entries; inserts high-confidence hits into detections table
- Add app/tasks/anomaly_scorer.py: background task (same shape as
glean_scheduler); triggered after each glean cycle when
TURNSTONE_ANOMALY_MODEL is set
- DB schema: add anomaly_score/anomaly_label/anomaly_scored_at columns to
log_entries (idempotent ALTER TABLE migration); add detections table
- Wire scorer into scheduler_loop and glean_scheduler.run_once; no-op when
model env var is empty (safe to leave unconfigured)
- REST endpoints: GET/POST /api/anomaly/status, /api/anomaly/run,
GET /api/anomaly/detections, POST /api/anomaly/detections/{id}/acknowledge
- Reuses Hybrid-BERT label map from diagnose/classifier.py; works with any
HF text-classification model
- 12 new tests; 406/406 passing
Closes: #10
- Add app/db/ abstraction layer: Backend enum, DbConn wrapper,
dialect helper (q() for ? vs %s paramstyle), get_conn(), tenant_id()
- Auto-detect backend from DATABASE_URL; SQLite remains default when
unset — no config change for local deployments
- Add tenant_id column to all three logical DBs (main, context, incidents);
idempotent ALTER TABLE migration runs before schema scripts on existing DBs
- All INSERTs inject tenant_id; SELECTs use (tenant_id = ? OR tenant_id = '')
for backward compat with pre-namespacing rows
- Add docker-compose.yml with named volume turnstone_pgdata (survives rebuilds)
and optional external Postgres support via DATABASE_URL override
- Add scripts/migrate_sqlite_to_postgres.py — one-shot idempotent migration
for existing SQLite data; ON CONFLICT DO NOTHING for safe re-runs
- Fix SSH glean path in pipeline.py to use ensure_schema + get_conn
(was still using raw sqlite3.connect + old _SCHEMA without tenant_id)
- Fix FTS5 JOIN ambiguity: qualify repeat_count as f.repeat_count in search
- Update all tests to use ensure_*_schema fixtures; add row_factory where needed
- 394/394 tests passing
Closes: #42
Closes: #50
source_ids with 3+ colon segments (e.g. muninn-journal:Muninn:ssh.service)
are now aggregated by their prefix:host key at the SQL level in both
list_sources() and stats_summary(). This collapses ~19K transient systemd
unit rows (crash-loop scope entries from Muninn) into ~24 grouped rows.
- list_sources: SQL CASE/INSTR group-by stem + unit_count field
- stats_summary: same stem grouping for dashboard source health table
- delete endpoint: LIKE-based cascade delete covers grouped stems
- SourcesView: unit_count badge (e.g. "2686 units") on grouped rows;
delete confirmation names the unit count when deleting a group
- Bump version to v0.6.1
Adds a domain: field to the pattern taxonomy and surfaces per-domain
hit counts in diagnose summaries for faster triage.
Changes:
- LogPattern gains domain: str = "" (backward-compatible default)
- load_patterns() reads domain from YAML via p.get("domain", "")
- All 42 patterns in default.yaml annotated across 10 domains:
service_health | networking | auth | storage | memory |
kernel | power | web_proxy | media | gpu
- _pattern_domain dict built at startup from compiled patterns
- _domain_counts() helper: maps matched_patterns tags to domains,
counts hits per domain across a result set
- diagnose POST: summary includes by_domain: {domain: count}
- diagnose stream: summary SSE event includes by_domain when
pattern_domain is provided (passed from rest.py at startup)
- /api/search gains ?domain= filter: post-filters results to entries
whose matched_patterns include at least one tag in the given domain
Test fixtures: patch _pattern_domain={} and CONTEXT_DB_PATH in
test_blocklist_endpoints.py and test_glean_tautulli.py (worktree
has no data/ dir; same fix as feat/60-incidents-db).
372 tests passing.
Closes: #32
Adds late-fusion hybrid search to Turnstone's log retrieval layer:
hybrid_score = 0.6 * bm25_normalized + 0.4 * cosine_similarity
Implementation:
- _bm25_search() extracts the existing FTS5 BM25 path as a named helper
- _hybrid_search() fetches an oversized BM25 candidate pool (5x limit,
min 100), embeds the query and each candidate text in-process via the
existing embeddings service, normalizes BM25 rank to [0,1], combines
with cosine similarity, and re-ranks
- search() gets semantic=False param that dispatches to _hybrid_search()
when True; pure BM25 remains the default for all existing call sites
- diagnose_stream() enables semantic=True so symptom-based queries
("database connection failed") surface semantically equivalent entries
("ECONNREFUSED", "backend gone away", "max retries exceeded")
- /api/search REST endpoint exposes ?semantic=true query param
Graceful degradation: falls back silently to pure BM25 when the embedding
backend is unavailable (EMBEDDING_AVAILABLE=False) or when embed_batch
raises an exception. No new infra — in-process numpy cosine, no vector DB.
11 new tests: BM25 helper, hybrid re-ranking, fallback paths, dispatcher.
372 + 11 = 383 tests passing.
Closes: #15
Adds _HYBRID_BERT_LABEL_MAP to translate the 7-class output vocabulary of
krishnas4415/log-anomaly-detection-models (Hybrid-BERT, MIT) to Turnstone
SeverityLabel. _map_label now checks the Hybrid-BERT map before the standard
map so either model family works via TURNSTONE_CLASSIFIER_MODEL without any
additional code path.
Mapping (confirmed from model config.json):
normal → INFO
security_anomaly → ERROR
system_failure → CRITICAL
performance_issue → WARN
network_anomaly → WARN
config_error → ERROR
hardware_issue → CRITICAL
Keyword-based CRITICAL promotion and low-confidence DEBUG demotion apply on
top of the base mapping (same rules as the standard vocabulary).
11 new tests covering all 7 Hybrid-BERT labels, case-insensitivity, and
regression on standard-vocabulary labels. 372 tests passing total.
Note: custom loading code for the non-standard .pt checkpoint format is
explicitly out of scope — evaluate better-packaged HF alternatives first
(see #41 for candidate list).
Closes: #41
FTS5 bulk-insert write locks starved the incident API and bundle endpoints
during log bursts (sonarr/radarr, high-volume docker sources). Fix mirrors
the context_facts split (context -> turnstone-context.db):
- Add INCIDENTS_DB_PATH / TURNSTONE_INCIDENTS_DB env var in rest.py
- Add _INCIDENTS_SCHEMA, ensure_incidents_schema(), and
migrate_incidents_to_dedicated_db() in glean/pipeline.py
- Stub out incidents/received_bundles/sent_bundles in _SCHEMA (no-op
CREATE IF NOT EXISTS) so legacy single-file deployments still open
- Thread incidents_db_path through diagnose_stream -> run_pipeline ->
FalsePositiveSuppressor.suppress -> _fetch_resolved_incidents
- One-shot migration on startup: copy existing rows from main DB to
incidents DB via INSERT OR IGNORE (idempotent, safe to re-run)
- Fix test_blocklist_endpoints fixtures to patch CONTEXT_DB_PATH and
INCIDENTS_DB_PATH alongside DB_PATH (worktree has no data/ dir)
372 tests passing.
Closes: #60
Truncation fix: call_llm() in _llm_client.py now accepts max_tokens (default
2048) and passes it in both the cf-orch task payload and the OpenAI-compat
fallback body. Hypothesizer uses max_tokens=1024 (JSON array output);
synthesizer and legacy summarize use 2048 (structured 5-section narrative).
Without this, backends use their own default (often 512 tokens), causing
mid-sentence truncation of the diagnosis output.
UI fix: reasoning card changed from bg-accent/5 border-accent/30 (opacity
modifiers on CSS variables don't compose reliably across themes) to the
callout pattern: bg-surface-raised with a solid border-l-4 border-accent.
Header label changed from text-text-dim to text-accent for visual anchoring.
Text remains text-text-primary for guaranteed contrast on both light and dark
themes.
Tracks: #56 (technical-level post-processor, filed as follow-on feature)
Watcher, REST endpoints, services (search, incidents, blocklist),
MCP server, context retriever, embedder, glean_scheduler, and
doc_upload all used the default 5-second SQLite busy timeout.
During collect glean write phases, watcher flush threads were hitting
'database is locked' errors when the glean held the write lock longer
than 5 seconds.
All connections now use timeout=30.0, matching the pipeline fix
from commit 5a9281a. No logic changes.
context_facts, context_documents, and context_chunks now live in
turnstone-context.db (sibling of turnstone.db). The glean scheduler
held write locks on the main DB long enough to cause 5-second timeout
failures on context fact inserts; separate files have independent WAL
write locks so they never contend.
Changes:
- pipeline.py: extract _CONTEXT_SCHEMA + ensure_context_schema()
- rest.py: CONTEXT_DB_PATH (TURNSTONE_CONTEXT_DB env var, defaults to
sibling file); init via ensure_context_schema(); all context routes
pass CONTEXT_DB_PATH; diagnose_stream receives context_db_path kwarg
- diagnose/__init__.py: diagnose_stream() accepts context_db_path
(falls back to db_path for backward compat); retrieve_context uses it
- store.py: sqlite3.connect() timeout=30.0 — Python driver retry loop
is independent of PRAGMA busy_timeout; needed for any remaining
contention during test or single-file deployments
Closes: #42
Reasoning models (e.g. foundation-sec-8b) emit valid JSON then repeat it
inside a markdown fence block. json.loads() fails on the combined text.
extract_first_json_array() scans for the first '[' and walks to its
matching ']' with proper string/escape/nesting handling, then returns
just that slice. Combined with strip_json_fences(), this handles all
observed output patterns:
- bare JSON array (standard models)
- fenced JSON array (fence-wrapping models)
- bare array followed by fenced repeat (reasoning models)
Makes the HuggingFace classifier model for Stage 2 configurable via
TURNSTONE_CLASSIFIER_MODEL. When unset (default), Stage 2 falls back
to pattern_tags then regex — no download required on first run.
Also documents TURNSTONE_MULTI_AGENT_DIAGNOSE, TURNSTONE_CLASSIFIER_MODEL,
TURNSTONE_EMBED_BACKEND/MODEL/DEVICE in .env.example.
- #33: Wrap ClassifiedTimeline.cluster_severities in MappingProxyType for
true immutability (frozen=True only blocks field reassignment, not dict
mutation).
- #34: Remove dead suppression branch in synthesizer._build_hypothesis_block.
active[] is already filtered to not rh.suppress, so the 'Yes — suppressed'
branch was unreachable. Now shows novelty score only.
- #35: Extract shared _llm_client.py with call_llm() + extract_content() +
strip_json_fences(). Both RootCauseHypothesizer and SummarySynthesizer
now import from one source. Also strips JSON fences from LLM output before
parsing in hypothesizer._parse_response.
- #36: Add per-stage try/except in pipeline.run_pipeline(). Unhandled
stage exceptions now emit {type: 'error'} + {type: 'done'} SSE events
instead of silently closing the stream.
- #37: Move format_context_block() call inside the legacy LLM branch in
diagnose/__init__.py — it was being computed unconditionally but only
used in the non-pipeline path.
- #38: Coerce supporting_cluster_ids items to str() in hypothesizer
_parse_response to guard against LLMs returning integers instead of
string cluster IDs.
Was suppressing when novelty_score < 0.85 (i.e. similarity > 0.15), which
would suppress nearly every hypothesis once embeddings are active.
Now suppresses when max_sim >= similarity_threshold (0.85), meaning only
hypotheses that are 85%+ similar to a resolved incident are suppressed.
Also renames suppress_threshold → similarity_threshold for clarity and
adds a borderline boundary test (0.85 suppressed, 0.84 not suppressed).
Closes: #29
- Implements FalsePositiveSuppressor using embedding cosine similarity
- Lazy corpus embedding via get_embedder() with module-level cache keyed by db_path
- Cache invalidated automatically when the resolved incident corpus changes
- Suppresses hypotheses with novelty_score below configurable threshold (default 0.85)
- Full fallback path (novelty=1.0, no suppression) when model_id empty, embedding
service unavailable, or no resolved incidents found in DB
- Graceful handling of missing incidents table and DB query failures
- Numpy bool_ leakage prevented by explicit float()/bool() coercion at assignment
- Pure-Python cosine fallback for environments without numpy
- 9 new tests (all mocked, no real model downloads): passthrough, suppress, no-suppress,
empty list, ranking, empty corpus, DB failure, service unavailable, cache invalidation
- 350 total tests passing (341 pre-existing + 9 new)
Closes: #29
- Add _coerce_float() module-level helper: catches TypeError/ValueError from
non-numeric LLM output (e.g. 'high', 'N/A') and returns a caller-supplied
default instead of raising.
- Replace float(item.get('confidence', 0.5)) with
_coerce_float(item.get('confidence'), 0.5) in _parse_response.
- Guard supporting_cluster_ids: tuple(item.get(...) or []) so a JSON null
from the LLM does not cause TypeError('NoneType is not iterable').
- runbook_refs is hardcoded as () and not sourced from LLM output; no change
needed there.
- Add test_non_numeric_confidence_uses_default (Test 10) to cover the 'high'
string case: asserts no exception and confidence == 0.5.
- 341 tests passing (+1).
Closes: #29
Three-path classification: ML (transformers pipeline, lazy singleton) →
pattern_tags (YAML pattern severity dict) → regex (detect_severity).
- Path A: HF text-classification pipeline loaded lazily on first classify()
call via module-level singleton; shim promotes ERROR+keyword hits to CRITICAL
and demotes low-confidence INFO to DEBUG.
- Path B: maps cluster.pattern_tags through the loaded pattern severity dict;
picks the highest severity across matching tags.
- Path C: falls back to detect_severity() regex scan on representative_text;
defaults to INFO when no keyword matches.
- Pattern file resolved from constructor arg or TURNSTONE_PATTERNS env var
(mirrors app/rest.py convention).
- No crash when transformers is not installed; ImportError on per-cluster ML
inference triggers clean per-cluster fallback to pattern_tags/regex.
- ClassifiedTimeline.classifier_used reflects the primary session path.
Tests (10 new, 328 total, all passing):
- ML ERROR, CRITICAL promotion, DEBUG demotion, WARNING→WARN
- pattern_tags resolution from YAML fixture
- regex ERROR detection and INFO default
- ImportError clean fallback
- empty timeline no-crash
- ClassifiedTimeline FrozenInstanceError on mutation
Closes: #29
- Add gap_significance_seconds constructor param (default 30) to replace hardcoded magic number in gap_count computation
- _parse_iso now returns datetime | None with try/except on ValueError; all callers handle None return by treating malformed timestamps as absent
- Extract reconstruct into four private helpers: _sort_entries, _group_into_raw_clusters, _build_cluster, _dominant_sources_tuple
- Promote _sort_key to module-level function (was nested inside reconstruct)
- Rename old module-level _build_cluster to _make_event_cluster to avoid name collision with new instance method
- Add explanatory comment to type: ignore[arg-type] at _highest_severity call site
- Black-formatted
- Move app/services/diagnose.py verbatim to app/services/diagnose/legacy.py
- Create app/services/diagnose/__init__.py with full implementation so that
patch('app.services.diagnose._HAS_DATEPARSER') targets the correct namespace
and all 303 existing tests continue to pass without modification
- Add app/services/diagnose/models.py with 5 pipeline dataclasses:
EventCluster, TimelineResult, ClassifiedTimeline, Hypothesis, RankedHypothesis
- Add app/services/diagnose/pipeline.py with run_pipeline() stub (Task 6)
- Add MULTI_AGENT_ENABLED feature flag (off by default via env var)
- Zero behavior change; ruff clean
Closes: #29
- New app/services/embeddings.py: TURNSTONE_EMBED_* env vars, multi-backend support
- embedder.py delegates to service layer; re-exports EMBEDDING_AVAILABLE for compat
- retriever.py updated to use service layer
- Test coverage updated in tests/context/test_embedder.py
Add blocklist candidate listing, scan trigger, status update,
push/unblock to Pi-hole, and connection test endpoints.
Add pihole_url/version/api_key and router_source_ids/device_names
fields to SettingsBody and prefs handling in patch_settings.
Add PiholeClient.__post_init__ validation so 503 fires naturally
when url/api_key are unconfigured (mock-safe: bypassed in tests).
PiholeClient dataclass supporting both Pi-hole v5 (PHP /admin/api.php)
and v6 (REST /api/) with public block/unblock/test_connection methods.
9 tests covering both API versions, auth flow, and error handling.
Adds patterns/telemetry.yaml with 6 rule groups (samsung, belkin, roku, lg, amazon, advertising).
Adds app/services/blocklist.py with TelemetryRule and BlocklistCandidate dataclasses, load_telemetry_rules(), and matches_telemetry() with exact and subdomain matching.
6 new TestTelemetry tests pass; 199 total passing.