- Add app/db/ abstraction layer: Backend enum, DbConn wrapper,
dialect helper (q() for ? vs %s paramstyle), get_conn(), tenant_id()
- Auto-detect backend from DATABASE_URL; SQLite remains default when
unset — no config change for local deployments
- Add tenant_id column to all three logical DBs (main, context, incidents);
idempotent ALTER TABLE migration runs before schema scripts on existing DBs
- All INSERTs inject tenant_id; SELECTs use (tenant_id = ? OR tenant_id = '')
for backward compat with pre-namespacing rows
- Add docker-compose.yml with named volume turnstone_pgdata (survives rebuilds)
and optional external Postgres support via DATABASE_URL override
- Add scripts/migrate_sqlite_to_postgres.py — one-shot idempotent migration
for existing SQLite data; ON CONFLICT DO NOTHING for safe re-runs
- Fix SSH glean path in pipeline.py to use ensure_schema + get_conn
(was still using raw sqlite3.connect + old _SCHEMA without tenant_id)
- Fix FTS5 JOIN ambiguity: qualify repeat_count as f.repeat_count in search
- Update all tests to use ensure_*_schema fixtures; add row_factory where needed
- 394/394 tests passing
Closes: #42
Closes: #50
Adds late-fusion hybrid search to Turnstone's log retrieval layer:
hybrid_score = 0.6 * bm25_normalized + 0.4 * cosine_similarity
Implementation:
- _bm25_search() extracts the existing FTS5 BM25 path as a named helper
- _hybrid_search() fetches an oversized BM25 candidate pool (5x limit,
min 100), embeds the query and each candidate text in-process via the
existing embeddings service, normalizes BM25 rank to [0,1], combines
with cosine similarity, and re-ranks
- search() gets semantic=False param that dispatches to _hybrid_search()
when True; pure BM25 remains the default for all existing call sites
- diagnose_stream() enables semantic=True so symptom-based queries
("database connection failed") surface semantically equivalent entries
("ECONNREFUSED", "backend gone away", "max retries exceeded")
- /api/search REST endpoint exposes ?semantic=true query param
Graceful degradation: falls back silently to pure BM25 when the embedding
backend is unavailable (EMBEDDING_AVAILABLE=False) or when embed_batch
raises an exception. No new infra — in-process numpy cosine, no vector DB.
11 new tests: BM25 helper, hybrid re-ranking, fallback paths, dispatcher.
372 + 11 = 383 tests passing.
Closes: #15