feat: hybrid BM25 + vector re-ranking for diagnose search (#15) #63
Reference: Circuit-Forge/turnstone#63
Branch: feat/15-hybrid-rag
Summary
- Adds _hybrid_search() to search.py: late-fusion BM25 + cosine re-ranking with alpha=0.6 / beta=0.4 weights
- search(semantic=True) dispatches to the hybrid path; pure BM25 remains the default
- diagnose_stream() enables semantic=True so symptom queries surface semantically equivalent entries (ECONNREFUSED, backend gone away, max retries exceeded)
- /api/search?semantic=true exposes the hybrid path on the REST endpoint

Test plan
- ?semantic=true on /api/search works with BAAI/bge-small-en-v1.5 loaded

Adds late-fusion hybrid search to Turnstone's log retrieval layer:

    hybrid_score = 0.6 * bm25_normalized + 0.4 * cosine_similarity

Implementation:

- _bm25_search() extracts the existing FTS5 BM25 path into a named helper.
- _hybrid_search() fetches an oversized BM25 candidate pool (5x limit, min 100), embeds the query and each candidate text in-process via the existing embeddings service, normalizes the BM25 rank to [0, 1], combines it with cosine similarity, and re-ranks.
- search() gets a semantic=False parameter that dispatches to _hybrid_search() when True; pure BM25 remains the default for all existing call sites.
- diagnose_stream() enables semantic=True so symptom-based queries ("database connection failed") surface semantically equivalent entries ("ECONNREFUSED", "backend gone away", "max retries exceeded").
- The /api/search REST endpoint exposes a ?semantic=true query parameter.

Graceful degradation: falls back silently to pure BM25 when the embedding backend is unavailable (EMBEDDING_AVAILABLE=False) or when embed_batch raises an exception.

No new infrastructure: cosine similarity is computed in-process with numpy, and no vector DB is required.

11 new tests cover the BM25 helper, hybrid re-ranking, the fallback paths, and the dispatcher; 372 + 11 = 383 tests passing.

Closes: #15
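The fusion formula can be illustrated with a minimal numpy sketch. It assumes min-max normalization of FTS5 bm25() values, because the PR says only that the BM25 rank is normalized to [0, 1]. The names normalize_bm25, cosine_similarity and hybrid_rerank are hypothetical and are not Turnstone's API; only the 0.6 / 0.4 weights come from the description.

```python
import numpy as np

# Weights quoted in the PR description.
ALPHA = 0.6  # weight on normalized BM25
BETA = 0.4   # weight on cosine similarity


def normalize_bm25(ranks) -> np.ndarray:
    """Min-max map FTS5 bm25() values to [0, 1], with 1 = best match.

    FTS5's bm25() is more negative for better matches, so the scale is
    flipped. Assumption: the PR does not state its normalization method.
    """
    r = np.asarray(ranks, dtype=float)
    lo, hi = r.min(), r.max()
    if hi == lo:  # single candidate or all tied: treat as equally relevant
        return np.ones_like(r)
    return (hi - r) / (hi - lo)


def cosine_similarity(query_vec, cand_vecs) -> np.ndarray:
    """Cosine similarity of one query vector against each candidate row."""
    q = np.asarray(query_vec, dtype=float)
    c = np.asarray(cand_vecs, dtype=float)
    # Guard against zero-length vectors instead of dividing by zero.
    q = q / max(np.linalg.norm(q), 1e-12)
    c = c / np.maximum(np.linalg.norm(c, axis=1, keepdims=True), 1e-12)
    return c @ q


def hybrid_rerank(bm25_ranks, query_vec, cand_vecs, limit):
    """Return candidate indices (best first) and their fused scores."""
    scores = ALPHA * normalize_bm25(bm25_ranks) + BETA * cosine_similarity(query_vec, cand_vecs)
    # Stable sort keeps BM25 pool order among exact ties.
    order = np.argsort(-scores, kind="stable")[:limit]
    return order.tolist(), scores[order].tolist()
```

As an example, with BM25 ranks [-10.0, -5.0, -1.0], query vector [1.0, 0.0] and candidate vectors [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]], the order comes out [1, 0, 2]. The second candidate's semantic match (0.6 * 4/9 + 0.4 ≈ 0.667) overtakes the first candidate's purely lexical lead (0.6).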
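The control flow described here includes an oversized candidate pool, a silent fallback to BM25, and opt-in dispatch from search(). It can be sketched under assumptions as follows. The Backends bundle, the (text, rank) row shape and the injected rerank callable are hypothetical stand-ins for Turnstone's _bm25_search(), embed_batch and re-ranking code, because the PR does not show their real signatures.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical row shape: (message_text, bm25_rank).
Row = Tuple[str, float]


@dataclass
class Backends:
    """Hypothetical bundle of the pieces the hybrid path depends on."""
    bm25_search: Callable[[str, int], List[Row]]
    embed_batch: Callable[[Sequence[str]], list]
    rerank: Callable[[List[Row], list], List[Row]]
    embedding_available: bool = True  # mirrors EMBEDDING_AVAILABLE


def candidate_pool_size(limit: int) -> int:
    """Oversized BM25 pool: 5x the requested limit, but at least 100."""
    return max(5 * limit, 100)


def hybrid_search(query: str, limit: int, b: Backends) -> List[Row]:
    pool = b.bm25_search(query, candidate_pool_size(limit))
    if not b.embedding_available:
        return pool[:limit]  # no embedding backend: plain BM25 order
    try:
        # First vector is the query; the rest align with the pool rows.
        vectors = b.embed_batch([query] + [text for text, _ in pool])
    except Exception:
        return pool[:limit]  # backend error: degrade silently to BM25
    return b.rerank(pool, vectors)[:limit]


def search(query: str, limit: int, b: Backends, semantic: bool = False) -> List[Row]:
    """Pure BM25 by default; semantic=True opts into the hybrid path."""
    if semantic:
        return hybrid_search(query, limit, b)
    return b.bm25_search(query, limit)
```

Passing the backends in explicitly makes the fallback branches easy to exercise with fakes, without loading an embedding model.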
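A BM25 query against an SQLite FTS5 index might look roughly like the sketch below. The table name logs_fts, the message column, and both helper names are hypothetical, since the PR does not show Turnstone's schema. The query uses FTS5's built-in bm25() function, which returns negative scores where smaller means a better match. This is why BM25 values need flipping before they can be normalized to [0, 1] and fused with cosine similarity. The sketch also assumes the SQLite library that Python links against was built with FTS5, which is true of most distributions.

```python
import sqlite3


def build_demo_index(messages):
    """In-memory FTS5 index over log messages (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE logs_fts USING fts5(message)")
    conn.executemany("INSERT INTO logs_fts(message) VALUES (?)", [(m,) for m in messages])
    return conn


def bm25_search(conn: sqlite3.Connection, query: str, limit: int):
    """Return (rowid, message, score) rows, best lexical match first.

    bm25() is negative and smaller means more relevant, so ascending
    order puts the best match first.
    """
    return conn.execute(
        "SELECT rowid, message, bm25(logs_fts) AS score "
        "FROM logs_fts WHERE logs_fts MATCH ? "
        "ORDER BY score LIMIT ?",
        (query, limit),
    ).fetchall()
```

A purely lexical query such as "connection" never returns a row like "ECONNREFUSED 127.0.0.1:5432". That gap is the one the embedding re-rank in diagnose_stream() is meant to close.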