turnstone/app/services
pyr0ball 3155bde4ce feat: hybrid BM25 + vector re-ranking for diagnose search (#15)
Adds late-fusion hybrid search to Turnstone's log retrieval layer:

  hybrid_score = 0.6 * bm25_normalized + 0.4 * cosine_similarity
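As a worked example of the fusion formula, the sketch below scores two hypothetical log entries for the same query. The input scores are invented for illustration and the weights come from the formula above. The function name hybrid_score is an assumption, not the code in search.py.

```python
def hybrid_score(bm25_normalized: float, cosine_similarity: float) -> float:
    """Late fusion of a [0, 1] BM25 score with a cosine similarity."""
    return 0.6 * bm25_normalized + 0.4 * cosine_similarity


# Hypothetical entries for the query "database connection failed":
keyword_heavy = hybrid_score(0.7, 0.1)   # 0.42 + 0.04 = 0.46
semantic_match = hybrid_score(0.4, 0.9)  # 0.24 + 0.36 = 0.60
print(semantic_match > keyword_heavy)    # True
```

Because BM25 carries 0.6 of the weight, a semantic match overtakes a keyword match only when its cosine advantage exceeds 1.5x its BM25 deficit. Here (0.9 - 0.1) * 0.4 = 0.32 outweighs (0.7 - 0.4) * 0.6 = 0.18.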

Implementation:
- _bm25_search() extracts the existing FTS5 BM25 path as a named helper
- _hybrid_search() fetches an oversized BM25 candidate pool (5x limit,
  min 100), embeds the query and each candidate text in-process via the
  existing embeddings service, normalizes BM25 rank to [0,1], combines
  with cosine similarity, and re-ranks
- search() gains a semantic parameter (default False); when True it
  dispatches to _hybrid_search(), and pure BM25 remains the default for
  all existing call sites
- diagnose_stream() enables semantic=True so symptom-based queries
  ("database connection failed") surface semantically equivalent entries
  ("ECONNREFUSED", "backend gone away", "max retries exceeded")
- /api/search REST endpoint exposes ?semantic=true query param
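Here is a minimal pure-Python sketch of the _hybrid_search() re-ranking steps listed above. Several details are assumptions rather than facts from the commit. Candidates are assumed to arrive as (text, raw_bm25) pairs from FTS5, whose bm25() function returns numerically smaller values for better matches. The [0, 1] normalization is assumed to be min-max scaling with that order inverted. embed_batch is treated as any callable that maps a list of strings to a list of vectors. All helper names are hypothetical. The commit computes cosine in numpy rather than with the plain loops used here.

```python
import math
from typing import Callable, Sequence

# Hypothetical types: a candidate is (log text, raw FTS5 bm25() score).
Candidate = tuple[str, float]
EmbedBatch = Callable[[list[str]], list[Sequence[float]]]


def candidate_pool_size(limit: int) -> int:
    """Oversized BM25 pool: 5x the requested limit, never fewer than 100."""
    return max(5 * limit, 100)


def normalize_bm25(raw_scores: Sequence[float]) -> list[float]:
    """Min-max scale to [0, 1], inverting FTS5's lower-is-better order."""
    best, worst = min(raw_scores), max(raw_scores)
    if best == worst:
        return [1.0] * len(raw_scores)
    return [(worst - s) / (worst - best) for s in raw_scores]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_rerank(
    query: str,
    candidates: list[Candidate],
    embed_batch: EmbedBatch,
    limit: int,
    bm25_weight: float = 0.6,
    vector_weight: float = 0.4,
) -> list[str]:
    """Re-rank a BM25 candidate pool by 0.6 * bm25_norm + 0.4 * cosine."""
    if not candidates:
        return []
    texts = [text for text, _ in candidates]
    # One batched call embeds the query and every candidate text.
    vectors = embed_batch([query, *texts])
    query_vec, doc_vecs = vectors[0], vectors[1:]
    bm25_norm = normalize_bm25([raw for _, raw in candidates])
    scored = [
        (bm25_weight * b + vector_weight * cosine(query_vec, v), text)
        for text, b, v in zip(texts, bm25_norm, doc_vecs)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:limit]]


# Usage with a toy 2-D "embedding" table standing in for a real backend.
TOY_VECTORS = {
    "database connection failed": [1.0, 0.0],
    "db connection pool stats": [0.2, 1.0],
    "connection error: ECONNREFUSED": [0.95, 0.1],
    "connection closed by client": [0.3, 0.9],
}


def toy_embed(texts: list[str]) -> list[Sequence[float]]:
    return [TOY_VECTORS[t] for t in texts]


pool = [
    ("db connection pool stats", -4.0),        # best BM25 match
    ("connection error: ECONNREFUSED", -3.5),
    ("connection closed by client", -1.0),     # weakest BM25 match
]
top = hybrid_rerank("database connection failed", pool, toy_embed, limit=2)
print(top)  # ['connection error: ECONNREFUSED', 'db connection pool stats']
```

Because the BM25 term is min-max scaled within each pool, the strongest keyword match always scores 1.0 and the weakest 0.0. The 0.4 vector weight then lets a semantically closer entry such as the ECONNREFUSED line climb past a better keyword match.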

Graceful degradation: falls back silently to pure BM25 when the embedding
backend is unavailable (EMBEDDING_AVAILABLE=False) or when embed_batch
raises an exception. No new infra — in-process numpy cosine, no vector DB.
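The dispatch and fallback behavior can be sketched as follows. This is an illustrative shape, not the actual search() signature. The BM25 and re-rank steps are passed in as callables (hypothetical parameters) so the example runs on its own, and embedding_available stands in for the EMBEDDING_AVAILABLE flag. When re-ranking raises, the sketch returns the already-fetched pool truncated to limit. That assumes the pool comes back sorted best-first.

```python
import logging
from typing import Callable

log = logging.getLogger(__name__)

Candidate = tuple[str, float]  # (log text, raw BM25 score), best-first


def search(
    query: str,
    limit: int,
    bm25_search: Callable[[str, int], list[Candidate]],
    rerank: Callable[[str, list[Candidate], int], list[str]],
    semantic: bool = False,
    embedding_available: bool = True,
) -> list[str]:
    """Pure BM25 by default; hybrid re-ranking only when semantic=True."""
    if not semantic or not embedding_available:
        return [text for text, _ in bm25_search(query, limit)]
    # Oversized pool (5x limit, min 100) so re-ranking has room to reorder.
    pool = bm25_search(query, max(5 * limit, 100))
    try:
        return rerank(query, pool, limit)
    except Exception:
        # Silent to the caller: degrade to the BM25 order of the same pool.
        log.debug("hybrid re-rank failed; falling back to BM25", exc_info=True)
        return [text for text, _ in pool[:limit]]


# Usage with stub callables standing in for FTS5 and the embedding backend.
ROWS = [("a", -3.0), ("b", -2.0), ("c", -1.0)]


def stub_bm25(query: str, n: int) -> list[Candidate]:
    return ROWS[:n]


def reverse_rerank(query: str, pool: list[Candidate], limit: int) -> list[str]:
    return [text for text, _ in reversed(pool)][:limit]


def failing_rerank(query: str, pool: list[Candidate], limit: int) -> list[str]:
    raise RuntimeError("embedding backend unavailable")


print(search("q", 2, stub_bm25, reverse_rerank))                 # ['a', 'b']
print(search("q", 2, stub_bm25, reverse_rerank, semantic=True))  # ['c', 'b']
print(search("q", 2, stub_bm25, failing_rerank, semantic=True))  # ['a', 'b']
```

Reusing the fetched pool on failure avoids a second FTS5 query. Re-running _bm25_search() with the original limit would be an equally valid design.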

11 new tests: BM25 helper, hybrid re-ranking, fallback paths, dispatcher.
372 + 11 = 383 tests passing.

Closes: #15
2026-06-01 18:13:09 -07:00
diagnose feat: hybrid BM25 + vector re-ranking for diagnose search (#15) 2026-06-01 18:13:09 -07:00
__init__.py feat: initial Turnstone POC — ingest, FTS search, MCP server 2026-05-08 12:12:34 -07:00
blocklist.py fix(db): add timeout=30s to all sqlite3.connect() calls across app 2026-05-26 23:12:48 -07:00
discover.py feat: bundle PII sanitization, onboarding wizard, NL source addition (#51, #52, #53) 2026-05-29 14:14:28 -07:00
embeddings.py refactor: extract embeddings service layer — decouple context embedder from Ollama 2026-05-25 11:01:25 -07:00
incidents.py feat: bundle PII sanitization, onboarding wizard, NL source addition (#51, #52, #53) 2026-05-29 14:14:28 -07:00
llm.py fix(diagnose): add max_tokens to all LLM calls; fix reasoning card contrast 2026-05-27 22:23:36 -07:00
models.py feat: bundle PII sanitization, onboarding wizard, NL source addition (#51, #52, #53) 2026-05-29 14:14:28 -07:00
nl_source.py feat: bundle PII sanitization, onboarding wizard, NL source addition (#51, #52, #53) 2026-05-29 14:14:28 -07:00
pihole.py feat(blocklist): 6 REST endpoints + Pi-hole settings fields 2026-05-15 21:15:09 -07:00
search.py feat: hybrid BM25 + vector re-ranking for diagnose search (#15) 2026-06-01 18:13:09 -07:00