Hybrid Retrieval: Combining Semantic and Keyword Search for RAG

Master hybrid retrieval for RAG — combining BM25 and vector search with RRF fusion, score normalization, and implementation in Weaviate, Qdrant, and LangChain.

Hybrid Retrieval: Getting the Best of Both Search Worlds

If you’ve ever wondered why production RAG systems at companies like Microsoft, Elastic, and Weaviate consistently outperform simple vector-only approaches, hybrid retrieval is a big part of the answer.

The core idea: run both a semantic (dense) search and a keyword (sparse/BM25) search on the same query, then combine the ranked result lists into a single unified ranking. The combined ranking captures both the semantic intent that vector search is good at and the exact vocabulary matching that keyword search excels at.

Why Hybrid Works

Dense and sparse retrieval have complementary failure modes:

Query: "What is the return policy for defective iPhone 15 Pro Max units?"

Semantic search:
  ✓ Understands "return policy" semantics (warranty, refund, replacement)
  ✓ Captures intent beyond exact words
  ✗ May not match "iPhone 15 Pro Max" precisely (product name)

Keyword (BM25) search:
  ✓ Exact match on "iPhone 15 Pro Max" and "return policy"
  ✓ Handles precise product names and model numbers
  ✗ Misses synonyms ("defective" vs "broken" vs "faulty")

Hybrid:
  ✓ Both exact product match AND semantic understanding
  ✓ Best of both worlds
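
To make the keyword side concrete, here is a minimal pure-Python sketch of BM25 scoring. The helper names (`tokenize`, `bm25_scores`), the toy documents, and the parameter values k1=1.5 and b=0.75 are illustrative assumptions, and the whitespace tokenizer is far simpler than the analyzers real search engines use.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with the standard BM25 formula."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


docs = [
    "return policy for iphone 15 pro max",
    "refund rules for faulty phones",
    "shipping times for accessories",
]
scores = bm25_scores("defective iphone 15 pro max", docs)
print(scores)
```

Only the first document scores above zero. The second one is about the same topic but shares no token with the query ("faulty" is not "defective"), which is exactly the synonym gap that the semantic side is meant to cover.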

Empirically, hybrid search often outperforms either approach alone on BEIR benchmark tasks, with reported gains of roughly 5–25% in NDCG@10. The size of the gain depends on the dataset and on how strong the dense and sparse baselines are.

Reciprocal Rank Fusion (RRF)

The most widely used fusion method is Reciprocal Rank Fusion (RRF), introduced by Cormack et al. (2009). It’s robust, needs only ranks rather than comparable scores, and has a single constant k that rarely needs tuning.

RRF score for a document d:

  RRF(d) = Σ 1 / (k + rank(d, result_list)),  summed over each result list
  k = 60 (constant that prevents high ranks from dominating)

Example with 2 result lists (semantic + keyword):

  Document A: rank 1 in semantic, rank 5 in keyword
    → 1/(60+1) + 1/(60+5) = 0.01639 + 0.01538 = 0.03177
  Document B: rank 3 in semantic, rank 2 in keyword
    → 1/(60+3) + 1/(60+2) = 0.01587 + 0.01613 = 0.03200
  Document C: rank 2 in semantic, rank 50 in keyword
    → 1/(60+2) + 1/(60+50) = 0.01613 + 0.00909 = 0.02522

  Final ranking: B (0.032) > A (0.0318) > C (0.025)

With k=60, RRF smooths over rank differences: a document ranked 1st contributes 1/61, only about twice the 1/120 contributed by a document ranked 60th, rather than 60× as much. Documents that appear in multiple lists accumulate score from each one, which rewards consistent presence across retrieval methods.
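
The smoothing effect of k is easy to check numerically. The short sketch below recomputes the worked example and then compares how much a rank-1 hit contributes relative to a rank-60 hit for a few values of k. The helper `rrf_score` and the alternative k values are assumptions made for illustration.

```python
def rrf_score(ranks: list[int], k: int = 60) -> float:
    # Sum of reciprocal-rank contributions across result lists (1-based ranks).
    return sum(1.0 / (k + r) for r in ranks)


# Ranks from the worked example: [semantic rank, keyword rank].
example = {"A": [1, 5], "B": [3, 2], "C": [2, 50]}
ranking = sorted(example, key=lambda d: rrf_score(example[d]), reverse=True)
print(ranking)  # ['B', 'A', 'C']

# Contribution of a rank-1 hit relative to a rank-60 hit, for several k.
for k in (1, 10, 60):
    ratio = rrf_score([1], k) / rrf_score([60], k)
    print(f"k={k:>2}: rank 1 counts {ratio:.2f}x as much as rank 60")
```

With k=1 the top hit counts about 30 times as much as the 60th, while with k=60 the ratio drops to roughly 2. A larger k therefore makes agreement between lists matter more than a single very high rank.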

Python Implementation of RRF

from collections import defaultdict


def reciprocal_rank_fusion(
    result_lists: list[list[str]],  # each list is doc IDs in rank order
    k: int = 60,
) -> list[tuple[str, float]]:
    """Fuse ranked lists of doc IDs into one list of (doc_id, RRF score)."""
    scores = defaultdict(float)
    for result_list in result_lists:
        for rank, doc_id in enumerate(result_list, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending RRF score
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)


# Example usage
semantic_results = ["doc_42", "doc_7", "doc_891", "doc_3", "doc_55"]
bm25_results = ["doc_7", "doc_42", "doc_233", "doc_891", "doc_91"]
fused_results = reciprocal_rank_fusion([semantic_results, bm25_results])
# → [("doc_42", ...), ("doc_7", ...), ("doc_891", ...), ...]

Score Normalization (Alternative to RRF)

Instead of using ranks, you can normalize similarity scores to [0,1] and combine them with a weighted average:

def score_normalization_fusion(
    semantic_results: list[tuple[str, float]],
    keyword_results: list[tuple[str, float]],
    alpha: float = 0.5,  # weight for semantic (0=all keyword, 1=all semantic)
) -> list[tuple[str, float]]:
    # Normalize each score list to [0, 1]
    def normalize(results):
        if not results:
            return {}
        min_score = min(s for _, s in results)
        max_score = max(s for _, s in results)
        if max_score == min_score:
            return {doc_id: 1.0 for doc_id, _ in results}
        return {
            doc_id: (score - min_score) / (max_score - min_score)
            for doc_id, score in results
        }

    norm_sem = normalize(semantic_results)
    norm_kw = normalize(keyword_results)
    all_docs = set(norm_sem.keys()) | set(norm_kw.keys())
    # A document missing from one list contributes 0 for that list
    combined = {
        doc: alpha * norm_sem.get(doc, 0) + (1 - alpha) * norm_kw.get(doc, 0)
        for doc in all_docs
    }
    return sorted(combined.items(), key=lambda x: x[1], reverse=True)

Score normalization lets you tune the alpha weight to favor semantic or keyword search based on your query distribution. For general RAG, a balanced to moderately semantic-leaning alpha (roughly 0.5–0.7) is a reasonable starting point before tuning.
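
To see how alpha moves the ranking, the sketch below re-implements min-max fusion compactly and runs it at three weights. The helpers `minmax` and `fuse`, the document IDs, and the toy scores are all made up for illustration; the only point is that cosine similarities and BM25 scores live on different scales until they are normalized.

```python
def minmax(scores: dict[str, float]) -> dict[str, float]:
    # Rescale one retriever's scores to [0, 1].
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def fuse(semantic: dict[str, float], keyword: dict[str, float], alpha: float) -> list[str]:
    sem, kw = minmax(semantic), minmax(keyword)
    docs = set(sem) | set(kw)
    combined = {d: alpha * sem.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0) for d in docs}
    # Sort by score, then by ID so that ties are deterministic.
    return sorted(combined, key=lambda d: (-combined[d], d))


# Toy scores: cosine similarities vs. BM25 scores.
semantic = {"faq_returns": 0.82, "iphone_specs": 0.61, "blog_post": 0.40}
keyword = {"iphone_specs": 14.2, "faq_returns": 6.1, "press_release": 3.0}

for alpha in (0.2, 0.5, 0.8):
    print(alpha, fuse(semantic, keyword, alpha))
```

At alpha=0.2 and alpha=0.5 the keyword favorite `iphone_specs` ranks first, while at alpha=0.8 the semantic favorite `faq_returns` takes over.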

Implementing Hybrid Search with Weaviate

Weaviate has mature, native hybrid search support:

import weaviate
import weaviate.classes.query as wq

# Assumes a local Weaviate instance with an existing "Documents" collection;
# query_embedding is assumed to be computed elsewhere.
client = weaviate.connect_to_local()
try:
    collection = client.collections.get("Documents")

    # Hybrid search: Weaviate runs BM25 and vector search and fuses the results internally
    results = collection.query.hybrid(
        query="transformer attention mechanism",  # used for both BM25 and vector
        vector=query_embedding,  # optional: pre-computed embedding
        alpha=0.5,  # 0=pure BM25, 1=pure vector
        fusion_type=wq.HybridFusion.RELATIVE_SCORE,  # score-based; RANKED is rank-based (RRF-style)
        limit=10,
        return_metadata=wq.MetadataQuery(score=True, explain_score=True),
    )
    for r in results.objects:
        print(r.properties["text"][:100], "Score:", r.metadata.score)
finally:
    client.close()  # release the connection

Implementing Hybrid Search with Qdrant

Qdrant supports hybrid search through sparse + dense vector combination:

from qdrant_client import QdrantClient, models

# Assumes a local Qdrant instance and a "documents" collection with named
# vectors "dense" and "sparse"; dense_embedding, sparse_indices, and
# sparse_values are assumed to be computed elsewhere.
client = QdrantClient(url="http://localhost:6333")

# The query needs both a dense and a sparse vector
results = client.query_points(
    collection_name="documents",
    prefetch=[
        # Dense (semantic) retrieval
        models.Prefetch(query=dense_embedding, using="dense", limit=50),
        # Sparse (BM25-like) retrieval
        models.Prefetch(
            query=models.SparseVector(indices=sparse_indices, values=sparse_values),
            using="sparse",
            limit=50,
        ),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),  # merge with RRF
    limit=10,
)
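
The snippet above takes `sparse_indices` and `sparse_values` as given. As a rough illustration of what such a pair can look like, the sketch below hashes tokens into a fixed index space and uses raw term counts as values. The helper `to_sparse`, the hashing scheme, and the index-space size are assumptions made for this example, not Qdrant's own encoding; in practice a BM25 tokenizer or a learned sparse encoder would produce these numbers.

```python
import zlib
from collections import Counter


def to_sparse(text: str, dim: int = 2**20) -> tuple[list[int], list[float]]:
    """Turn text into (indices, values) using hashed term counts."""
    counts = Counter()
    for token in text.lower().split():
        # crc32 is stable across runs, unlike Python's built-in hash().
        # Tokens that collide share an index and their counts add up.
        counts[zlib.crc32(token.encode("utf-8")) % dim] += 1
    indices = sorted(counts)
    values = [float(counts[i]) for i in indices]
    return indices, values


sparse_indices, sparse_values = to_sparse("iphone 15 pro max return policy")
print(list(zip(sparse_indices, sparse_values)))
```

Documents and queries must go through the same function so that identical tokens land on identical indices.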

Hybrid Search in LangChain

# Import path as in LangChain 0.x; newer releases may have moved EnsembleRetriever
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever  # requires the rank_bm25 package
from langchain_community.vectorstores import FAISS

# `documents` (a list of Document objects) and `embeddings` (an embedding model)
# are assumed to be defined elsewhere.

# Build both retrievers
bm25_retriever = BM25Retriever.from_documents(documents, k=10)
faiss_retriever = FAISS.from_documents(
    documents, embeddings
).as_retriever(search_kwargs={"k": 10})

# Combine with unequal weights, slightly favoring the semantic retriever
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, faiss_retriever],
    weights=[0.4, 0.6],  # 40% BM25, 60% semantic
)
results = ensemble_retriever.invoke("What is BERT?")
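
The weights in an ensemble like this are applied to rank-based contributions rather than to raw scores. The sketch below shows that idea in isolation as a weighted variant of RRF. `weighted_rrf` is a hypothetical helper written for this article, not LangChain's actual code, and the document IDs are made up.

```python
from collections import defaultdict


def weighted_rrf(
    result_lists: list[list[str]],
    weights: list[float],
    k: int = 60,
) -> list[tuple[str, float]]:
    """RRF where each result list's contribution is scaled by a weight."""
    if len(result_lists) != len(weights):
        raise ValueError("need exactly one weight per result list")
    scores: dict[str, float] = defaultdict(float)
    for results, weight in zip(result_lists, weights):
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += weight / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


bm25_hits = ["doc_7", "doc_42", "doc_233"]
dense_hits = ["doc_42", "doc_7", "doc_891"]
fused = weighted_rrf([bm25_hits, dense_hits], weights=[0.4, 0.6])
print([doc_id for doc_id, _ in fused])  # ['doc_42', 'doc_7', 'doc_891', 'doc_233']
```

With equal weights `doc_7` and `doc_42` would tie, since each is first in one list and second in the other. The 0.6 weight on the dense list breaks the tie in favor of its top hit.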

Tuning the Alpha Parameter

The alpha (semantic weight) isn’t universal. Your query distribution determines the optimal value:

| Query Type | Recommended Alpha |
| --- | --- |
| Factual Q&A with named entities | 0.3–0.5 (favor keyword) |
| Conceptual / open-ended questions | 0.7–0.8 (favor semantic) |
| Code search | 0.2–0.4 (keyword often wins) |
| General enterprise RAG | 0.5–0.6 (balanced) |
| Multi-lingual queries | 0.7–0.9 (semantic scales cross-language) |

Build a test query set covering different query types, then tune alpha using NDCG@10 on each subset.
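
A tuning loop of this kind can be sketched in a few lines. Everything below is an assumption for illustration: the helper names (`ndcg_at_k`, `fuse`, `mean_ndcg`), the two-query toy test set with already-normalized scores, and the 0.1 step of the alpha grid. The NDCG variant shown uses the common (2^rel − 1) / log2(rank + 1) formulation.

```python
import math


def ndcg_at_k(ranked_ids: list[str], relevance: dict[str, int], k: int = 10) -> float:
    def dcg(gains: list[int]) -> float:
        return sum((2**g - 1) / math.log2(i + 2) for i, g in enumerate(gains))

    gains = [relevance.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def fuse(sem: dict[str, float], kw: dict[str, float], alpha: float) -> list[str]:
    # Assumes both score dicts are already normalized to [0, 1].
    docs = set(sem) | set(kw)
    combined = {d: alpha * sem.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0) for d in docs}
    return sorted(combined, key=lambda d: (-combined[d], d))


# Hypothetical labeled queries: (semantic scores, keyword scores, graded relevance).
test_set = [
    ({"a": 1.0, "b": 0.6, "c": 0.0}, {"b": 1.0, "c": 0.5, "a": 0.0}, {"b": 2, "a": 1}),
    ({"x": 1.0, "y": 0.4, "z": 0.0}, {"y": 1.0, "z": 0.7, "x": 0.0}, {"y": 2}),
]


def mean_ndcg(alpha: float) -> float:
    total = sum(ndcg_at_k(fuse(s, kw, alpha), rel) for s, kw, rel in test_set)
    return total / len(test_set)


alphas = [i / 10 for i in range(11)]
best_alpha = max(alphas, key=mean_ndcg)
for alpha in alphas:
    print(f"alpha={alpha:.1f}  mean NDCG@10={mean_ndcg(alpha):.3f}")
print("best alpha:", best_alpha)
```

On this toy set the metric plateaus at 1.0 for alpha between 0.4 and 0.6, and the sweep reports 0.4 because `max` returns the first of several tied values. To tune per query type, run the same sweep separately on each subset of the test set.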

2025 Trend: Learned Hybrid Weights

Instead of a fixed alpha, some systems train a lightweight model to predict the best weight for each query. The model takes the query as input and outputs the fusion weight, effectively doing “meta-retrieval”: deciding how to retrieve before retrieving. Early reported results suggest a 5–10% improvement over the best fixed-alpha baseline.
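
As a purely illustrative sketch of the idea, the code below maps a few surface features of the query through a logistic function to get a per-query alpha. The features, the helper names (`query_features`, `predict_alpha`), and the hand-picked coefficients are all hypothetical; a real system would fit the coefficients on labeled queries and would likely use richer inputs.

```python
import math
import re


def query_features(query: str) -> list[float]:
    tokens = query.split()
    n = max(len(tokens), 1)
    # Share of tokens containing a digit (model numbers, versions, IDs).
    has_digit = sum(any(ch.isdigit() for ch in t) for t in tokens) / n
    # Share of capitalized tokens after the first word (likely proper nouns).
    capitalized = sum(t[0].isupper() for t in tokens[1:]) / n
    # Whether the query contains an exact phrase in double quotes.
    quoted = 1.0 if re.search(r'"[^"]+"', query) else 0.0
    return [has_digit, capitalized, quoted]


def predict_alpha(query: str, weights: list[float], bias: float) -> float:
    # Logistic output lies in (0, 1), so it can be used directly as alpha.
    z = bias + sum(w * x for w, x in zip(weights, query_features(query)))
    return 1.0 / (1.0 + math.exp(-z))


# Placeholder coefficients chosen by hand; not the result of any training run.
WEIGHTS = [-3.0, -2.0, -1.5]
BIAS = 1.0

alpha_entity = predict_alpha("return policy for iPhone 15 Pro Max", WEIGHTS, BIAS)
alpha_concept = predict_alpha("why do transformers generalize so well", WEIGHTS, BIAS)
print(round(alpha_entity, 2), round(alpha_concept, 2))
```

With these placeholder coefficients the entity-heavy query gets a lower alpha (more keyword weight) than the conceptual one, which is the behavior a trained predictor would be expected to learn.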

Hybrid retrieval is the pragmatic upgrade from pure semantic search. Adding BM25 to your RAG pipeline is cheap and well supported, and it tends to improve recall across query types.