- Implements FalsePositiveSuppressor using embedding cosine similarity
- Lazy corpus embedding via get_embedder() with module-level cache keyed by db_path
- Cache invalidated automatically when the resolved incident corpus changes
- Suppresses hypotheses with novelty_score below configurable threshold (default 0.85)
- Full fallback path (novelty=1.0, no suppression) when model_id empty, embedding
service unavailable, or no resolved incidents found in DB
- Graceful handling of missing incidents table and DB query failures
- Numpy bool_ leakage prevented by explicit float()/bool() coercion at assignment
- Pure-Python cosine fallback for environments without numpy
- 9 new tests (all mocked, no real model downloads): passthrough, suppress, no-suppress,
empty list, ranking, empty corpus, DB failure, service unavailable, cache invalidation
- 350 total tests passing (341 pre-existing + 9 new)
Closes: #29
- Add _coerce_float() module-level helper: catches TypeError/ValueError from
non-numeric LLM output (e.g. 'high', 'N/A') and returns a caller-supplied
default instead of raising.
- Replace float(item.get('confidence', 0.5)) with
_coerce_float(item.get('confidence'), 0.5) in _parse_response.
- Guard supporting_cluster_ids: tuple(item.get(...) or []) so a JSON null
from the LLM does not cause TypeError('NoneType is not iterable').
- runbook_refs is hardcoded as () and not sourced from LLM output; no change
needed there.
- Add test_non_numeric_confidence_uses_default (Test 10) to cover the 'high'
string case: asserts no exception and confidence == 0.5.
- 341 tests passing (+1).
Closes: #29
Three-path classification: ML (transformers pipeline, lazy singleton) →
pattern_tags (YAML pattern severity dict) → regex (detect_severity).
- Path A: HF text-classification pipeline loaded lazily on first classify()
call via module-level singleton; shim promotes ERROR+keyword hits to CRITICAL
and demotes low-confidence INFO to DEBUG.
- Path B: maps cluster.pattern_tags through the loaded pattern severity dict;
picks the highest severity across matching tags.
- Path C: falls back to detect_severity() regex scan on representative_text;
defaults to INFO when no keyword matches.
- Pattern file resolved from constructor arg or TURNSTONE_PATTERNS env var
(mirrors app/rest.py convention).
- No crash when transformers is not installed; ImportError on per-cluster ML
inference triggers clean per-cluster fallback to pattern_tags/regex.
- ClassifiedTimeline.classifier_used reflects the primary session path.
Tests (10 new, 328 total, all passing):
- ML ERROR, CRITICAL promotion, DEBUG demotion, WARNING→WARN
- pattern_tags resolution from YAML fixture
- regex ERROR detection and INFO default
- ImportError clean fallback
- empty timeline no-crash
- ClassifiedTimeline FrozenInstanceError on mutation
Closes: #29
- Add gap_significance_seconds constructor param (default 30) to replace hardcoded magic number in gap_count computation
- _parse_iso now returns datetime | None with try/except on ValueError; all callers handle None return by treating malformed timestamps as absent
- Extract reconstruct into four private helpers: _sort_entries, _group_into_raw_clusters, _build_cluster, _dominant_sources_tuple
- Promote _sort_key to module-level function (was nested inside reconstruct)
- Rename old module-level _build_cluster to _make_event_cluster to avoid name collision with new instance method
- Add explanatory comment to type: ignore[arg-type] at _highest_severity call site
- Black-formatted
- Move app/services/diagnose.py verbatim to app/services/diagnose/legacy.py
- Create app/services/diagnose/__init__.py with full implementation so that
patch('app.services.diagnose._HAS_DATEPARSER') targets the correct namespace
and all 303 existing tests continue to pass without modification
- Add app/services/diagnose/models.py with 5 pipeline dataclasses:
EventCluster, TimelineResult, ClassifiedTimeline, Hypothesis, RankedHypothesis
- Add app/services/diagnose/pipeline.py with run_pipeline() stub (Task 6)
- Add MULTI_AGENT_ENABLED feature flag (off by default via env var)
- Zero behavior change; ruff clean
Closes: #29
- New app/services/embeddings.py: TURNSTONE_EMBED_* env vars, multi-backend support
- embedder.py delegates to service layer; re-exports EMBEDDING_AVAILABLE for compat
- retriever.py updated to use service layer
- Test coverage updated in tests/context/test_embedder.py
Closes turnstone#22.
## Transport layer (app/glean/ssh.py)
- SSHTransport context manager: key-only auth, paramiko backend
- SSHConnectionError / SSHCommandError exception hierarchy
- exec_stream() generator: yields stdout lines, raises SSHCommandError on
non-zero exit (isinstance(int) guard for test-mock safety)
- Command builders: _build_journald_command, _build_syslog_command,
_build_plaintext_command, _build_docker_command
- 18 unit tests in tests/test_glean_ssh.py
## Pipeline integration (app/glean/pipeline.py)
- _stream_and_write(): per-item error isolation — SSHCommandError skips
one glean item without aborting the rest of the host connection
- _glean_ssh_source(): one SSHTransport per host, dispatches all glean
items (journald/syslog/plaintext/docker); SSHConnectionError aborts host
- glean_sources(): splits local vs SSH sources; local → _glean_files();
SSH → _glean_ssh_source(); shared compiled patterns and DB connection
- glean_ssh_source(): public wrapper for REST use — manages DB connection,
pattern compilation, FTS rebuild lifecycle
- 15 integration tests in tests/test_glean_pipeline_ssh.py
- All 285 tests passing
## REST layer (app/rest.py)
- GET /api/sources/configured: reads sources.yaml and enriches with DB
stats; SSH sources appear before first glean (entry_count=0); sub-source
IDs (rack01/journald, rack01/docker/myapp) aggregated per host entry
- POST /api/sources/{id}/glean: detects transport:ssh and dispatches to
glean_ssh_source() wrapper; local sources unchanged
- Import: glean_ssh_source as _glean_ssh_source
## Frontend (web/src/views/SourcesView.vue)
- Fetches /api/sources/configured (primary) + /api/sources (DB-only) in
parallel; merges into unified SourceRow list
- SSH sources show: ssh badge (with user@host tooltip), glean-type pills
(journald/syslog/docker/etc.), host subtitle
- SSH sub-source IDs (rack01/journald) suppressed from the DB-only list
since they are covered by the parent SSH row
- DB-only sources (uploads) appear below configured sources with 'uploaded'
badge; reglean button disabled (not in sources.yaml)
- Delete zeroes out configured-source stats in-place rather than removing
the row (so the source remains visible for re-gleaning)
Adds SSH-based log collection from remote hosts via Paramiko.
One SSH connection per host, multiple log types per connection.
New files:
- app/glean/ssh.py: SSHTransport context manager + command builders
for journald, syslog, plaintext, and docker log types
- tests/test_glean_ssh.py: 18 tests for transport layer (all mocked)
- tests/test_glean_pipeline_ssh.py: 15 tests for pipeline integration
Pipeline changes (app/glean/pipeline.py):
- glean_sources() now splits sources into local-file and SSH categories
- SSH sources use transport: ssh + glean: list schema in sources.yaml
- _glean_ssh_source(): one SSHTransport per host, N commands per connection
- _stream_and_write(): SSHCommandError caught per-item so one bad
command does not abort the rest of the host's glean items
- SSHConnectionError skips the entire host with a warning log
SSH source schema (sources.yaml):
- id: rack01
transport: ssh
host: 192.168.1.10
user: admin
key_path: ~/.ssh/id_ed25519
glean:
- type: journald
args: [--since, 2 hours ago]
- type: syslog
path: /var/log/syslog
- type: plaintext
path: /var/log/app/error.log
- type: docker
containers: [myapp, nginx]
Key design decisions:
- Key-based auth only (no password prompts in daemon context)
- exit-status check fires after all stdout lines yielded; callers
drain the iterator to trigger it
- Local file sources path unchanged; SSH sources co-exist in same yaml
- Docker multi-container: one exec_stream call per container,
source_id scoped as host_id/type/container_name
Remaining for #22: REST endpoint, SourcesView UI, sources.yaml docs.
285 → 285 tests passing (33 new SSH tests).
overflow-hidden and overflow-x-auto on the same element conflict in Tailwind's
CSS generation order. The shorthand overflow:hidden can override overflow-x:auto,
clipping the rightmost column (diagnose buttons). Fix: outer div keeps
overflow-hidden for rounded corners, inner div handles overflow-x-auto scrolling.
- App: hamburger menu on mobile, nav links hidden below md breakpoint
- LogSearch: collapsible sidebar on mobile, stacks above results vertically
- Incidents/Sources: overflow-x-auto on table containers, min-w to preserve
column layout on desktop; drawer action buttons flex-wrap on small screens
- Bundles: flex-wrap on header row, hide source_host + timestamp below sm
- General: p-4 sm:p-6 padding on all standard views
Add blocklist candidate listing, scan trigger, status update,
push/unblock to Pi-hole, and connection test endpoints.
Add pihole_url/version/api_key and router_source_ids/device_names
fields to SettingsBody and prefs handling in patch_settings.
Add PiholeClient.__post_init__ validation so 503 fires naturally
when url/api_key are unconfigured (mock-safe: bypassed in tests).
PiholeClient dataclass supporting both Pi-hole v5 (PHP /admin/api.php)
and v6 (REST /api/) with public block/unblock/test_connection methods.
9 tests covering both API versions, auth flow, and error handling.
Adds patterns/telemetry.yaml with 6 rule groups (samsung, belkin, roku, lg, amazon, advertising).
Adds app/services/blocklist.py with TelemetryRule and BlocklistCandidate dataclasses, load_telemetry_rules(), and matches_telemetry() with exact and subdomain matching.
6 new TestTelemetry tests pass; 199 total passing.
Add blocklist_candidates table and indexes to _SCHEMA in pipeline.py.
Add TestSchema tests verifying table existence, column set, and status/hit_count defaults.
All 193 tests pass.
POST /turnstone/api/ingest/tautulli accepts Tautulli notification agent
payloads and stores them as log_entries under source 'tautulli'. Severity
maps error->CRITICAL, buffer->WARN, all others->None. Optional bearer token
auth via X-Tautulli-Token header + tautulli_token pref. FTS index rebuilt
as a background task after each write. 28 new tests, all passing.
The relative-time regex only matched digits between 'last/past' and
the unit, so 'last few hours' fell through to dateparser which then
found the bare word 'hours' and resolved it as midnight local time.
Extended the regex to capture 'few', 'couple of', 'several', 'a few'
as approximate quantifiers, mapped to 3 units each. Numeric expressions
and bare 'last hour' still work as before.
All five parsers (plex, syslog, servarr, qbittorrent, plaintext) were
using .replace(tzinfo=timezone.utc) on naive datetimes parsed from log
files, which slaps a UTC label on what is actually local-time data.
On a UTC-7 system a 2pm entry was stored as 14:00Z instead of 21:00Z,
causing time-window searches to return zero results.
Fix: use .astimezone(timezone.utc) instead, which treats the naive
datetime as local time and converts correctly.
Tests updated to round-trip back to local time for assertion so they
pass on any timezone, not just UTC.
- wizard.py: wrap syslog_port int() in try/except to default 514 on non-numeric input
- ContextView: add try/catch to doDelete, doDeleteFact, addFact for network errors
- ContextView: arrow-key navigation for tablist (ArrowLeft/ArrowRight)
- DiagnoseView: arrow-key navigation for tablist (ArrowLeft/ArrowRight)
- WizardOverlay: reset current_step to last schema step when clicking 'Go back and edit'
- WizardOverlay: focus trap on Tab/Shift+Tab within dialog element
Adds /context route with tabbed UI for managing uploaded documents and
manually-entered environment facts. Includes inline confirm-before-delete,
add-fact form with category/key/value fields, wizard CTA panel, and
stub components for DocUploadZone and WizardOverlay (Task 14).