FTS5 bulk-insert write locks starved the incident API and bundle endpoints
during log bursts (sonarr/radarr, high-volume docker sources). Fix mirrors
the context_facts split (context -> turnstone-context.db):
- Add INCIDENTS_DB_PATH / TURNSTONE_INCIDENTS_DB env var in rest.py
- Add _INCIDENTS_SCHEMA, ensure_incidents_schema(), and
migrate_incidents_to_dedicated_db() in glean/pipeline.py
- Stub out incidents/received_bundles/sent_bundles in _SCHEMA (no-op
CREATE IF NOT EXISTS) so legacy single-file deployments still open
- Thread incidents_db_path through diagnose_stream -> run_pipeline ->
FalsePositiveSuppressor.suppress -> _fetch_resolved_incidents
- One-shot migration on startup: copy existing rows from main DB to
incidents DB via INSERT OR IGNORE (idempotent, safe to re-run)
- Fix test_blocklist_endpoints fixtures to patch CONTEXT_DB_PATH and
INCIDENTS_DB_PATH alongside DB_PATH (worktree has no data/ dir)
372 tests passing.
Closes: #60
Watcher, REST endpoints, services (search, incidents, blocklist),
MCP server, context retriever, embedder, glean_scheduler, and
doc_upload all used the default 5-second SQLite busy timeout.
During collect glean write phases, watcher flush threads were hitting
'database is locked' errors when the glean held the write lock longer
than 5 seconds.
All connections now use timeout=30.0, matching the pipeline fix
from commit 6882248. No logic changes.
Was suppressing when novelty_score < 0.85 (i.e. similarity > 0.15), which
would suppress nearly every hypothesis once embeddings are active.
Now suppresses when max_sim >= similarity_threshold (0.85), meaning only
hypotheses that are 85%+ similar to a resolved incident are suppressed.
Also renames suppress_threshold → similarity_threshold for clarity and
adds a borderline boundary test (0.85 suppressed, 0.84 not suppressed).
Closes: #29
- Implements FalsePositiveSuppressor using embedding cosine similarity
- Lazy corpus embedding via get_embedder() with module-level cache keyed by db_path
- Cache invalidated automatically when the resolved incident corpus changes
- Suppresses hypotheses with novelty_score below configurable threshold (default 0.85)
- Full fallback path (novelty=1.0, no suppression) when model_id empty, embedding
service unavailable, or no resolved incidents found in DB
- Graceful handling of missing incidents table and DB query failures
- Numpy bool_ leakage prevented by explicit float()/bool() coercion at assignment
- Pure-Python cosine fallback for environments without numpy
- 9 new tests (all mocked, no real model downloads): passthrough, suppress, no-suppress,
empty list, ranking, empty corpus, DB failure, service unavailable, cache invalidation
- 350 total tests passing (341 pre-existing + 9 new)
Closes: #29