Query Rewriting for RAG: Transform User Input for Better Retrieval

Learn query rewriting for RAG systems — LLM-based rewriting, step-back prompting, conversation context integration, and ambiguity resolution techniques.

Query Rewriting: Turning Messy User Input Into Retrieval-Optimized Queries

Users don’t write queries optimized for retrieval. They write the way they talk — conversational, contextual, sometimes ambiguous. “What about the European version?” makes perfect sense in a conversation but is a terrible standalone retrieval query without context.

Query rewriting transforms the user’s input into a form that retrieval systems handle better. It’s different from query expansion (which adds to a query) — rewriting replaces or restructures the query while preserving (and clarifying) the original intent.

Why User Queries Need Rewriting

Several structural issues make raw user queries poor retrieval inputs:

Conversational context dependency:

Turn 1: "Tell me about the Transformer architecture"
Turn 2: "How does its attention mechanism work?" ← "its" refers to Transformer
For retrieval, Turn 2 needs to be:
"How does the Transformer attention mechanism work?"

Vague pronouns and implicit references:

User: "What are its side effects?"
System doesn't know: whose side effects? Of what medication? From what earlier context?

Colloquial phrasing:

User: "Can you help me understand why my RAG keeps hallucinating?"
Rewritten: "Causes of hallucination in RAG systems and mitigation strategies"

Overly broad queries:

User: "Tell me everything about machine learning"
Rewritten into focused sub-queries based on what information is being sought
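
Not every query has these problems, and each rewrite costs an LLM call, so it can help to flag the likely candidates first. The sketch below is one rough heuristic, not something prescribed by any library: the word list, the follow-up prefixes, and the `min_words` threshold are all illustrative assumptions you would tune on your own traffic.

```python
import re

# Assumed, illustrative list of words that usually point back to earlier turns.
CONTEXT_WORDS = {"it", "its", "they", "them", "their", "this", "that",
                 "these", "those", "one", "ones"}
# Assumed openers that typically signal a follow-up rather than a fresh question.
FOLLOW_UP_PREFIXES = ("what about", "how about", "and ", "but ")


def likely_needs_rewrite(query: str, has_history: bool, min_words: int = 4) -> bool:
    """Heuristically flag queries that probably depend on earlier turns."""
    if not has_history:
        # With no earlier turns there is nothing for a rewrite to resolve.
        return False
    lowered = query.lower().strip()
    words = re.findall(r"[a-z']+", lowered)
    if not words:
        return False
    refers_back = any(w in CONTEXT_WORDS for w in words)
    is_follow_up = lowered.startswith(FOLLOW_UP_PREFIXES)
    too_short = len(words) < min_words  # e.g. "And the pricing?"
    return refers_back or is_follow_up or too_short
```

A false positive here only costs one extra rewrite call, so the check deliberately errs on the side of flagging too much.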

Standalone Query Rewriting (Conversation History)

The most common use case: make a query from a conversation chain self-contained for retrieval.

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def rewrite_for_retrieval(
    query: str,
    chat_history: list[dict],
) -> str:
    # Flatten the most recent messages into "ROLE: content" lines
    history_text = "\n".join(
        f"{msg['role'].upper()}: {msg['content']}"
        for msg in chat_history[-6:]  # last 6 messages (3 user/assistant turns)
    )
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=200,
        messages=[{
            "role": "user",
            "content": f"""Given a chat history and a follow-up question, rewrite the question
to be a standalone search query that captures all context needed.
Do not answer the question. Just rewrite it as a clear, self-contained query.
Chat History:
{history_text}
Follow-up Question: {query}
Standalone Query:""",
        }],
    )
    return response.content[0].text.strip()


# Example
history = [
    {"role": "user", "content": "What are the main vector database options?"},
    {"role": "assistant", "content": "The main options are Pinecone, Weaviate, Qdrant, and Milvus..."},
]
original = "Which one is best for filtering?"
rewritten = rewrite_for_retrieval(original, history)
# → e.g. "Which vector database has the best metadata filtering capabilities: Pinecone, Weaviate, Qdrant, or Milvus?"
# (illustrative; the exact wording varies between calls)
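
In practice a rewriter like this tends to sit behind a small guard, because the model can occasionally answer the question instead of rewriting it, or the call can fail. The wrapper below is a minimal sketch under stated assumptions: `rewrite_fn` stands for any function with the same signature as `rewrite_for_retrieval`, and `max_chars` is an arbitrary cutoff, not a recommended value.

```python
from typing import Callable


def safe_standalone_query(
    query: str,
    chat_history: list[dict],
    rewrite_fn: Callable[[str, list[dict]], str],
    max_chars: int = 300,  # assumed cutoff; tune for your queries
) -> str:
    """Rewrite a follow-up, falling back to the original query if the rewrite looks unusable."""
    if not chat_history:
        # First turn: nothing to resolve, so skip the LLM call entirely.
        return query
    try:
        rewritten = rewrite_fn(query, chat_history).strip()
    except Exception:
        # Retrieving with the raw query is better than failing the request.
        return query
    # An empty or very long reply usually means the model answered instead of rewriting.
    if not rewritten or len(rewritten) > max_chars:
        return query
    return rewritten
```

Falling back to the raw query keeps the pipeline no worse than it was before rewriting was added.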

Step-Back Prompting

Step-back prompting (Zheng et al., 2023) addresses a subtle problem: users ask specific questions, but the relevant information in documents is often stated at a higher level of abstraction.

Specific query: "What is the learning rate used in GPT-3 training?"
Problem: Documents about GPT-3 training discuss schedules, not just a single number.
         The embedding for this specific query may not match passages about
         learning rate schedules.
Step-back query: "What are the training hyperparameters and learning rate schedule
                 for large language models like GPT-3?"
Better: This matches the level of abstraction in source documents.

# Reuses the `client` created in the previous example
def step_back_rewrite(specific_query: str) -> tuple[str, str]:
    """Returns (original_query, step_back_query)"""
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=150,
        messages=[{
            "role": "user",
            "content": f"""Rewrite this specific question as a more general,
higher-level question that would appear in a reference document.
The rewritten question should be broader but still relevant.
Specific question: {specific_query}
General question:""",
        }],
    )
    general = response.content[0].text.strip()
    return specific_query, general


# Run retrieval for BOTH queries, merge results
specific, general = step_back_rewrite("What dropout rate does ResNet-50 use?")
# specific → "What dropout rate does ResNet-50 use?"
# general → e.g. "What regularization techniques are used in ResNet architectures?"
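
The article leaves the merge step open. One common choice is reciprocal rank fusion (RRF), which combines rankings using only each document's position, so scores from different queries never need to be comparable. The sketch below assumes each retrieval returns an ordered list of document IDs; `k=60` is the constant usually used with RRF, and the function name is just illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into a single ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each appearance contributes more the higher the document was ranked.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Usage: IDs returned for the specific query and for the step-back query
specific_hits = ["a", "b", "c"]
general_hits = ["b", "d", "a"]
merged = reciprocal_rank_fusion([specific_hits, general_hits])
# → ["b", "a", "d", "c"]
```

Documents found by both queries rise to the top, while documents found by only one still make it into the merged list.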

Query Disambiguation

When a query has multiple possible interpretations, a rewriting system should clarify or generate multiple targeted variants:

import re


def disambiguate_query(ambiguous_query: str) -> list[str]:
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=300,
        messages=[{
            "role": "user",
            "content": f"""This query may have multiple interpretations.
List up to 3 distinct, specific queries that represent different possible
user intents. One per line.
Ambiguous query: {ambiguous_query}
Disambiguated queries:""",
        }],
    )
    lines = response.content[0].text.strip().split("\n")
    # Remove a leading bullet ("-", "•", "*") or list number ("1.", "2)") from each line.
    # A plain lstrip('•-123. ') would also eat digits that belong to the query itself.
    cleaned = [re.sub(r"^\s*(?:[-•*]|\d+[.)])\s+", "", line).strip() for line in lines]
    return [q for q in cleaned if q]


# Example
disambiguated = disambiguate_query("Python class inheritance")
# → e.g. [
#     "Python class inheritance syntax and __init__ super()",
#     "Python multiple inheritance MRO method resolution order",
#     "Python abstract base classes and class hierarchy design"
# ]

Rewriting for Different Retrieval Modes

The same user query should be rewritten differently depending on whether you’re doing keyword or semantic search:

def dual_rewrite(query: str) -> dict[str, str]:
    """Rewrite for both keyword and semantic retrieval."""
    keyword_response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=100,
        messages=[{
            "role": "user",
            "content": f"""Convert this question into a short keyword-style search query
with key technical terms. No question words, just terms.
Question: {query}
Keywords:""",
        }],
    )
    semantic_response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=100,
        messages=[{
            "role": "user",
            "content": f"""Rephrase this question to be a clear, specific search query.
Keep it as a natural-language question.
Question: {query}
Rephrased:""",
        }],
    )
    return {
        "keyword": keyword_response.content[0].text.strip(),
        "semantic": semantic_response.content[0].text.strip(),
    }
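
Before paying for an LLM call on the keyword side, it may be worth comparing against a trivial baseline that just drops question words and stopwords. The sketch below is that baseline, not a replacement for the LLM rewrite: the stopword list is a small illustrative assumption, and it cannot add missing technical terms the way a model can.

```python
import re

# Small illustrative stopword list (an assumption; real systems use a fuller one).
STOPWORDS = {
    "a", "an", "the", "is", "are", "was", "were", "do", "does", "did", "can",
    "could", "should", "would", "you", "me", "my", "i", "we", "to", "of", "in",
    "on", "for", "and", "or", "how", "what", "why", "when", "where", "which", "who",
}


def keyword_baseline(query: str) -> str:
    """Drop question words and stopwords, keeping the remaining terms in order."""
    # Keep hyphenated terms such as "ResNet-50" together as one token.
    tokens = re.findall(r"[A-Za-z0-9_]+(?:-[A-Za-z0-9_]+)*", query)
    return " ".join(t for t in tokens if t.lower() not in STOPWORDS)


print(keyword_baseline("How does the Transformer attention mechanism work?"))
# prints: Transformer attention mechanism work
```

If the LLM keyword rewrite does not beat this baseline on your test queries, the extra call is probably not earning its cost for keyword search.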

LangChain Query Rewriting Integration

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

rewrite_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are an expert at reformulating search queries for a technical knowledge base.
Rewrite the user's question to maximize retrieval effectiveness.
- Make it standalone (no conversational context assumed)
- Use precise technical vocabulary
- Keep it concise but specific"""),
    ("human", "{question}"),
])

# Prompt → model → plain string (requires OPENAI_API_KEY in the environment)
rewriter_chain = rewrite_prompt | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser()

# Use in RAG chain
# `user_query` and `retriever` are assumed to be defined elsewhere in your pipeline
rewritten_query = rewriter_chain.invoke({"question": user_query})
retrieved_docs = retriever.invoke(rewritten_query)

Measuring Rewriting Quality

Don’t assume rewriting helps — measure it:

def evaluate_rewriting(
    test_queries: list[dict],  # [{"original": ..., "expected_doc_id": ...}, ...]
    retriever,
    rewriter,
) -> dict:
    if not test_queries:
        raise ValueError("test_queries must not be empty")
    original_recalls = []
    rewritten_recalls = []
    for item in test_queries:
        # Recall with original query (assumes each result exposes an `id` attribute)
        orig_results = retriever.invoke(item["original"])
        orig_recall = item["expected_doc_id"] in [r.id for r in orig_results]
        # Recall with rewritten query
        rewritten = rewriter(item["original"])
        rewrite_results = retriever.invoke(rewritten)
        rewrite_recall = item["expected_doc_id"] in [r.id for r in rewrite_results]
        original_recalls.append(orig_recall)
        rewritten_recalls.append(rewrite_recall)
    # Fraction of queries whose expected document was retrieved
    original_recall = sum(original_recalls) / len(original_recalls)
    rewritten_recall = sum(rewritten_recalls) / len(rewritten_recalls)
    return {
        "original_recall": original_recall,
        "rewritten_recall": rewritten_recall,
        "improvement": rewritten_recall - original_recall,
    }
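
An average can hide regressions: rewriting might fix ten queries and break eight, and the headline number would still look like a small win. A per-query breakdown makes that visible. The helper below is a sketch that assumes you keep the two per-query hit lists (such as `original_recalls` and `rewritten_recalls` above); the function name and outcome labels are illustrative.

```python
def paired_outcomes(original_hits: list[bool], rewritten_hits: list[bool]) -> dict[str, int]:
    """Count queries that rewriting fixed, broke, or left unchanged."""
    if len(original_hits) != len(rewritten_hits):
        raise ValueError("hit lists must be the same length")
    outcomes = {"fixed": 0, "broken": 0, "both_hit": 0, "both_miss": 0}
    for orig, rew in zip(original_hits, rewritten_hits):
        if rew and not orig:
            outcomes["fixed"] += 1      # only the rewritten query found the document
        elif orig and not rew:
            outcomes["broken"] += 1     # rewriting lost a document the original found
        elif orig:
            outcomes["both_hit"] += 1
        else:
            outcomes["both_miss"] += 1
    return outcomes
```

The "broken" bucket is usually the most useful one to read through by hand, since it shows where the rewriter is dropping or distorting intent.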

2025 Trend: Online Query Rewriting with Feedback

Emerging systems track which retrieved documents the user engaged with (clicked on, referenced in follow-up questions) and use that signal to fine-tune the rewriting model. Queries that led to “good” retrievals reinforce those rewriting patterns; queries that led to irrelevant results penalize them. This creates a self-improving rewriting loop over time.
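
The first step toward such a loop is simply logging the signal. The sketch below shows one possible record shape and a crude reward, under loud assumptions: the class, the field names, the "fraction of retrieved documents engaged with" reward, and the `threshold` value are all hypothetical choices, and the actual fine-tuning step is out of scope here.

```python
from dataclasses import dataclass, field


@dataclass
class RewriteFeedback:
    """One logged interaction: the rewrite used and what the user engaged with."""
    original: str
    rewritten: str
    retrieved_ids: list[str]
    engaged_ids: set[str] = field(default_factory=set)

    def reward(self) -> float:
        """Fraction of retrieved documents the user engaged with (0.0 if none retrieved)."""
        if not self.retrieved_ids:
            return 0.0
        hits = sum(1 for doc_id in self.retrieved_ids if doc_id in self.engaged_ids)
        return hits / len(self.retrieved_ids)


def split_by_reward(log: list[RewriteFeedback], threshold: float = 0.3):
    """Separate (original, rewritten) pairs into positive and negative examples."""
    good = [(fb.original, fb.rewritten) for fb in log if fb.reward() >= threshold]
    bad = [(fb.original, fb.rewritten) for fb in log if fb.reward() < threshold]
    return good, bad
```

Engagement is a noisy proxy for relevance, so pairs collected this way are better treated as candidates for review than as ground truth.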

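Query rewriting is often the cheapest high-value improvement you can add to an existing RAG pipeline. A single LLM call to clean up the query can improve retrieval recall by 15–30% on conversational use cases where context dependency is common, though the actual gain depends on your corpus and query mix, so confirm it against your own test set before relying on that range.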