Multi-Hop Retrieval: Following the Evidence Chain
“Which investor funded both the company that acquired DeepMind and the startup that later became OpenAI?”
This question can’t be answered by retrieving a single document. You need to:
- Find who acquired DeepMind (Google)
- Find Google’s investors (Sequoia, Kleiner Perkins, etc.)
- Find who funded early OpenAI (Y Combinator, Peter Thiel, Reid Hoffman, etc.)
- Find the intersection
Later steps depend on the results of earlier ones: you cannot look up Google’s investors until you know that Google is the acquirer. That’s multi-hop retrieval.
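The dependency between the steps above can be sketched with plain dictionary lookups. This is only an illustration: the `ACQUIRER` and `INVESTORS` tables and the function name are hypothetical, and the investor names are placeholders rather than real funding data.

```python
# Toy knowledge base. Every entry is a placeholder, not real funding data.
ACQUIRER = {"DeepMind": "Google"}
INVESTORS = {
    "Google": {"Investor A", "Investor B"},
    "OpenAI": {"Investor B", "Investor C"},
}


def common_investors(acquired: str, startup: str) -> set[str]:
    # Hop 1: resolve the acquirer; the next lookup cannot run without it.
    acquirer = ACQUIRER[acquired]
    # Hop 2: keyed on the result of hop 1.
    acquirer_investors = INVESTORS[acquirer]
    # Hop 3: independent of hops 1 and 2, so it could run in parallel.
    startup_investors = INVESTORS[startup]
    # Final step: intersect the intermediate results.
    return acquirer_investors & startup_investors


print(common_investors("DeepMind", "OpenAI"))  # {'Investor B'}
```

In a real system each dictionary lookup would be a retrieval call, and the key for hop 2 would have to be extracted from the text returned by hop 1.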
What Makes a Query Multi-Hop
Single-hop (standard RAG):
Q: "What is the capital of France?"
Retrieval: one document → answer

Two-hop:
Q: "What is the capital of the country that hosts the FIFA World Cup 2030?"
Hop 1: Find FIFA World Cup 2030 host → Spain, Portugal, Morocco
Hop 2: Find capitals of Spain, Portugal, Morocco → Madrid, Lisbon, Rabat

Four-hop:
Q: "Who founded the company that makes the software used by the hospital that treated the president of the entity that first deployed GPT-4?"
Hop 1: Find who first deployed GPT-4 → Microsoft
Hop 2: Find hospital that treated Microsoft's president → [specific hospital]
Hop 3: Find software used by that hospital → [specific medical software]
Hop 4: Find founder of that software company → [answer]
Iterative Retrieval: The Core Algorithm
The fundamental multi-hop algorithm uses each retrieval’s result to formulate the next query:
import anthropic

client = anthropic.Anthropic()

def multi_hop_retrieve(
    initial_query: str,
    vectorstore,
    max_hops: int = 4,
) -> tuple[list[str], list[str]]:
    """Returns (all_retrieved_docs, reasoning_chain)"""
    all_docs = []
    reasoning_chain = []
    current_query = initial_query

    for hop in range(max_hops):
        # Retrieve for current query, skipping documents already collected
        results = vectorstore.similarity_search(current_query, k=3)
        new_docs = [r.page_content for r in results if r.page_content not in all_docs]
        all_docs.extend(new_docs)

        # Build context from all retrieved docs so far
        context = "\n\n".join(all_docs)

        # Ask LLM: do we have enough to answer? If not, what's the next hop?
        response = client.messages.create(
            model="claude-haiku-4-5-20251001",
            max_tokens=300,
            messages=[{
                "role": "user",
                "content": f"""Original question: {initial_query}

Retrieved information so far:
{context}

Can you answer the original question with the current information?
- If YES: respond with ANSWER: [your answer]
- If NO: respond with NEXT_QUERY: [what to search for next to complete the answer]"""
            }]
        )

        response_text = response.content[0].text.strip()
        reasoning_chain.append(f"Hop {hop+1}: queried '{current_query}'\n→ {response_text[:200]}")

        if response_text.startswith("ANSWER:"):
            break
        elif response_text.startswith("NEXT_QUERY:"):
            # Strip only the leading marker, not later occurrences in the text
            current_query = response_text.removeprefix("NEXT_QUERY:").strip()
        else:
            # Unrecognized response format: stop rather than guess
            break

    return all_docs, reasoning_chain
Bridge Entity Extraction
A common multi-hop pattern involves a “bridge entity” — an intermediate entity that connects the question’s subject to its answer:
Q: "What nationality is the CEO of the company that makes Claude?"
Bridge entity: "the company that makes Claude" = Anthropic
Hop 1: Who makes Claude? → Anthropic
Hop 2: Who is the CEO of Anthropic? → Dario Amodei
Hop 3: What nationality is Dario Amodei? → American
Explicitly extracting bridge entities and searching for them improves precision:
def extract_bridge_entities(query: str, initial_docs: list[str]) -> list[str]:
    # Use only the first two documents as partial context
    context = "\n".join(initial_docs[:2])

    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=200,
        messages=[{
            "role": "user",
            "content": f"""Given this question and partial context, identify the key intermediate entities
that need to be looked up to fully answer the question.
List them one per line.

Question: {query}
Context: {context}

Bridge entities to look up:"""
        }]
    )

    # One entity per line; drop blank lines
    entities = response.content[0].text.strip().split('\n')
    return [e.strip() for e in entities if e.strip()]

# Usage (initial_docs holds the documents returned by the first retrieval)
bridge_entities = extract_bridge_entities(
    "What is the headquarters city of the company that acquired DeepMind?",
    initial_docs
)
# → e.g. ["company that acquired DeepMind", "Google"]
# → second hop: search("Google headquarters city")
LangChain Multi-Hop with MRKL
MRKL (Modular Reasoning, Knowledge, and Language) systems decompose complex questions into modular steps:
from langchain.agents import create_react_agent, AgentExecutor
from langchain.tools import Tool
from langchain_openai import ChatOpenAI

# Assumes `vectorstore` is an already-initialized LangChain vector store.

def search_with_context(query_and_context: str) -> str:
    """Search that accepts context from previous hops."""
    # Split on the first separator only, so the context itself may contain "|||"
    parts = query_and_context.split("|||", 1)
    query = parts[0].strip()
    prev_context = parts[1].strip() if len(parts) > 1 else ""

    # Use previous context to refine search if available
    if prev_context:
        enhanced_query = f"{query} (context: {prev_context[:200]})"
    else:
        enhanced_query = query

    results = vectorstore.similarity_search(enhanced_query, k=3)
    return "\n".join([r.page_content for r in results])

multi_hop_tool = Tool(
    name="contextual_search",
    func=search_with_context,
    description="""Search the knowledge base. For multi-hop queries, pass context
from previous searches using format: 'current query ||| previous context'"""
)
Comparison: Single-Hop vs Multi-Hop Performance
HotpotQA Benchmark Results (requiring 2-hop reasoning):
Approach                    | Exact Match | F1 Score
----------------------------|-------------|----------
Standard single-hop RAG     | 31.2%       | 43.8%
Multi-hop retrieval (2 hop) | 48.7%       | 61.3%
Graph RAG                   | 52.1%       | 64.9%
Multi-hop + reranking       | 54.3%       | 67.2%
LLM + internet search       | 61.8%       | 74.1%
On these figures, multi-hop retrieval improves Exact Match over single-hop by about 56% in relative terms (31.2% → 48.7%) and F1 by about 40% (43.8% → 61.3%).
Failure Modes and Mitigations
Error propagation: If Hop 1 retrieves the wrong entity, Hop 2 compounds the error. A wrong “bridge” leads to an entirely wrong answer chain.
Mitigation: Retrieve top-3 candidates at each hop and maintain parallel reasoning paths. Prune paths where intermediate results are low-confidence.
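One way to keep parallel reasoning paths is a small beam search over candidate bridge entities. The sketch below is an assumption about how that could look, not a reference implementation: the `Path` class, the `Retriever` signature (query in, scored candidates out), and the thresholds are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical retriever: query -> list of (candidate_entity, score in [0, 1]).
Retriever = Callable[[str], list[tuple[str, float]]]


@dataclass
class Path:
    entities: list[str] = field(default_factory=list)
    confidence: float = 1.0


def expand_paths(
    paths: list[Path],
    make_query: Callable[[Path], str],
    retrieve: Retriever,
    top_k: int = 3,
    min_confidence: float = 0.2,
    beam_width: int = 3,
) -> list[Path]:
    """Run one hop for every live path, then prune weak paths."""
    expanded = []
    for path in paths:
        candidates = sorted(retrieve(make_query(path)), key=lambda c: c[1], reverse=True)
        for entity, score in candidates[:top_k]:
            # Path confidence is the product of per-hop scores, so an
            # uncertain early hop penalizes everything built on it.
            conf = path.confidence * score
            if conf >= min_confidence:
                expanded.append(Path(path.entities + [entity], conf))
    # Keep only the strongest paths for the next hop.
    expanded.sort(key=lambda p: p.confidence, reverse=True)
    return expanded[:beam_width]
```

Calling `expand_paths` once per hop, starting from `[Path()]`, keeps at most `beam_width` hypotheses alive, which bounds the number of retrievals at each hop to `beam_width` regardless of depth.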
Infinite loops: The agent keeps searching because it can’t find the answer, cycling through similar queries.
Mitigation: Hard iteration limit (max_hops=4 in practice), detect repeated queries, and implement a “best effort” fallback when the hop limit is reached.
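A minimal sketch of these guards, assuming a hypothetical `step` callable that performs one hop and returns either an answer or the next query. The returned stop reason lets the caller decide when to trigger a best-effort fallback, for example generating an answer from whatever was retrieved so far.

```python
from typing import Callable, Optional


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so near-identical queries match."""
    return " ".join(query.lower().split())


def run_hops(
    first_query: str,
    step: Callable[[str], tuple[Optional[str], Optional[str]]],
    max_hops: int = 4,
) -> tuple[Optional[str], str]:
    """`step` is a hypothetical callable: query -> (answer, next_query).

    Returns (answer, stop_reason).
    """
    seen: set[str] = set()
    query = first_query
    for _ in range(max_hops):
        key = normalize(query)
        if key in seen:
            # The agent is cycling: stop instead of repeating the search.
            return None, "repeated_query"
        seen.add(key)
        answer, next_query = step(query)
        if answer is not None:
            return answer, "answered"
        if not next_query:
            return None, "no_next_query"
        query = next_query
    # Hard limit reached without an answer.
    return None, "hop_limit"
```

Exact-match detection after normalization only catches literal repeats. Catching paraphrased queries would need a similarity check, which is left out here.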
Context window explosion: 4 hops × 3 documents × 500 tokens = 6,000 tokens of context before generation. Context grows linearly with the number of hops, and beyond a few hops it can exceed what the LLM reasons over reliably.
Mitigation: Apply context compression to each hop’s retrieved documents before accumulating them. Only carry forward the most relevant sentences from each hop.
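As a rough illustration of per-hop compression, the sketch below keeps only the sentences that share the most words with the hop's query. The word-overlap scoring is an assumption chosen to keep the example dependency-free; a production system would more likely use an embedding model or a reranker. The function name and parameters are hypothetical.

```python
import re


def compress_hop(docs: list[str], query: str, max_sentences: int = 3) -> str:
    """Keep the sentences that share the most words with the query."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    scored = []
    for doc in docs:
        # Naive sentence split on terminal punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", doc.strip()):
            if not sentence:
                continue
            terms = set(re.findall(r"\w+", sentence.lower()))
            overlap = len(terms & query_terms)
            if overlap:
                scored.append((overlap, sentence))
    # sorted() is stable, so ties keep their original document order.
    best = sorted(scored, key=lambda s: s[0], reverse=True)[:max_sentences]
    return " ".join(sentence for _, sentence in best)


docs = [
    "Google acquired DeepMind in 2014. The weather was mild.",
    "DeepMind is based in London.",
]
print(compress_hop(docs, "Who acquired DeepMind?", max_sentences=1))
# Google acquired DeepMind in 2014.
```

Accumulating the compressed string from each hop, rather than the raw documents, keeps the context passed to the LLM bounded by roughly `max_sentences` sentences per hop.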
2025 Trend: Learned Multi-Hop Planners
Rather than having the LLM decide dynamically whether another hop is needed, newer systems train a lightweight “hop planner” model that predicts the query decomposition upfront:
Input: "Which investor funded both Google and Tesla's early stage?"
Plan: [
  {"hop": 1, "query": "Google early investors founding round"},
  {"hop": 2, "query": "Tesla early investors Series A"},
  {"hop": 3, "query": "intersection of Google and Tesla investors"}
]
This pre-planned approach reduces per-query LLM calls and produces more predictable execution paths. It’s being developed by several research groups as part of broader “structured reasoning” for RAG frameworks.
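Once a plan in this shape exists, executing it needs no LLM call between hops. The following is a minimal sketch under that assumption: `execute_plan` and the `search` callable are hypothetical names, and the stub search stands in for a real vector store query.

```python
from typing import Callable


def execute_plan(
    plan: list[dict],
    search: Callable[[str], list[str]],
) -> dict[int, list[str]]:
    """Run each planned hop in order; returns {hop_number: retrieved_docs}."""
    results: dict[int, list[str]] = {}
    for step in sorted(plan, key=lambda s: s["hop"]):
        results[step["hop"]] = search(step["query"])
    return results


plan = [
    {"hop": 1, "query": "Google early investors founding round"},
    {"hop": 2, "query": "Tesla early investors Series A"},
]


def stub_search(query: str) -> list[str]:
    # Placeholder retriever; replace with a real similarity search.
    return [f"doc for: {query}"]


retrieved = execute_plan(plan, stub_search)
print(sorted(retrieved))  # [1, 2]
```

The trade-off is that a fixed plan cannot react to what a hop actually returns, so queries that depend on an unknown bridge entity still need some form of dynamic rewriting.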
Multi-hop retrieval is essential for knowledge-intensive applications — legal research, scientific literature review, business intelligence — where single retrievals consistently fail to connect disparate facts. Design your RAG system with multi-hop capability when your query distribution includes complex compound questions.