turnstone/tests
pyr0ball 1abdcfb1f3 feat: hybrid BM25 + vector re-ranking for diagnose search (#15)
Adds late-fusion hybrid search to Turnstone's log retrieval layer:

  hybrid_score = 0.6 * bm25_normalized + 0.4 * cosine_similarity

Implementation:
- _bm25_search() extracts the existing FTS5 BM25 path as a named helper
- _hybrid_search() fetches an oversized BM25 candidate pool (5x limit,
  min 100), embeds the query and each candidate text in-process via the
  existing embeddings service, normalizes BM25 rank to [0,1], combines
  with cosine similarity, and re-ranks
- search() gets semantic=False param that dispatches to _hybrid_search()
  when True; pure BM25 remains the default for all existing call sites
- diagnose_stream() enables semantic=True so symptom-based queries
  ("database connection failed") surface semantically equivalent entries
  ("ECONNREFUSED", "backend gone away", "max retries exceeded")
- /api/search REST endpoint exposes ?semantic=true query param

Graceful degradation: falls back silently to pure BM25 when the embedding
backend is unavailable (EMBEDDING_AVAILABLE=False) or when embed_batch
raises an exception. No new infra — in-process numpy cosine, no vector DB.

11 new tests: BM25 helper, hybrid re-ranking, fallback paths, dispatcher.
372 + 11 = 383 tests passing.

Closes: #15
2026-06-01 18:13:09 -07:00
..
context refactor: extract embeddings service layer — decouple context embedder from Ollama 2026-05-25 11:01:25 -07:00
__init__.py feat: initial Turnstone POC — ingest, FTS search, MCP server 2026-05-08 12:12:34 -07:00
test_blocklist_endpoints.py refactor: rename ingest → glean throughout codebase 2026-05-20 23:02:55 -07:00
test_diagnose_classifier.py feat: Stage 2 — SeverityClassifier for multi-agent diagnose pipeline (issue #29) 2026-05-25 13:27:17 -07:00
test_diagnose_hypothesizer.py fix: defensive coercion for LLM confidence and cluster fields in hypothesizer 2026-05-25 14:00:30 -07:00
test_diagnose_pipeline.py feat: Stage 5 synthesizer + pipeline orchestrator + feature flag wiring (issue #29) 2026-05-25 14:56:25 -07:00
test_diagnose_suppressor.py fix: invert suppress_threshold semantics to similarity_threshold in FalsePositiveSuppressor 2026-05-25 18:58:52 -07:00
test_diagnose_synthesizer.py feat: Stage 5 synthesizer + pipeline orchestrator + feature flag wiring (issue #29) 2026-05-25 14:56:25 -07:00
test_diagnose_timeline.py feat: Stage 1 — TimelineReconstructor for multi-agent diagnose pipeline (issue #29) 2026-05-25 12:54:15 -07:00
test_export_corpus.py feat: periodic corpus export — push ERROR/CRITICAL entries and incidents to Avocet 2026-05-11 17:08:35 -07:00
test_glean_dmesg.py refactor: rename ingest → glean throughout codebase 2026-05-20 23:02:55 -07:00
test_glean_fingerprint.py feat: fingerprint-based incremental glean — skip unchanged files (#30) 2026-05-25 11:01:18 -07:00
test_glean_pipeline_ssh.py feat: SSH remote host glean — transport layer and pipeline integration (closes #22, backend) 2026-05-20 23:03:13 -07:00
test_glean_qbittorrent.py refactor: rename ingest → glean throughout codebase 2026-05-20 23:02:55 -07:00
test_glean_ssh.py feat: SSH remote host glean — transport layer and pipeline integration (closes #22, backend) 2026-05-20 23:03:13 -07:00
test_glean_syslog.py refactor: rename ingest → glean throughout codebase 2026-05-20 23:02:55 -07:00
test_glean_tautulli.py refactor: rename ingest → glean throughout codebase 2026-05-20 23:02:55 -07:00
test_glean_wazuh.py refactor: rename ingest → glean throughout codebase 2026-05-20 23:02:55 -07:00
test_hybrid_search.py feat: hybrid BM25 + vector re-ranking for diagnose search (#15) 2026-06-01 18:13:09 -07:00
test_service_blocklist.py refactor: rename ingest → glean throughout codebase 2026-05-20 23:02:55 -07:00
test_service_pihole.py fix(blocklist): validate _v6_auth session JSON, add auth-failure test 2026-05-15 21:03:03 -07:00
test_services_diagnose.py refactor: rename ingest → glean throughout codebase 2026-05-20 23:02:55 -07:00
test_services_llm.py feat: switch LLM backend to OpenAI-compat; add cf-orch remote inference support 2026-05-12 12:58:38 -07:00
test_watch_watcher.py feat: add file tail source type; configure example-node watchers 2026-05-11 15:44:10 -07:00