Query Rewriting: Turning Messy User Input Into Retrieval-Optimized Queries
Users don’t write queries optimized for retrieval. They write the way they talk — conversational, contextual, sometimes ambiguous. “What about the European version?” makes perfect sense in a conversation but is a terrible standalone retrieval query without context.
Query rewriting transforms the user’s input into a form that retrieval systems handle better. It’s different from query expansion (which adds to a query) — rewriting replaces or restructures the query while preserving (and clarifying) the original intent.
Why User Queries Need Rewriting
Several structural issues make raw user queries poor retrieval inputs:
Conversational context dependency:
Turn 1: "Tell me about the Transformer architecture"
Turn 2: "How does its attention mechanism work?" ← "its" refers to Transformer
For retrieval, Turn 2 needs to be: "How does the Transformer attention mechanism work?"

Vague pronouns and implicit references:
User: "What are its side effects?"
System doesn't know: whose side effects? Of what medication? From what earlier context?

Colloquial phrasing:
User: "Can you help me understand why my RAG keeps hallucinating?"
Rewritten: "Causes of hallucination in RAG systems and mitigation strategies"

Overly broad queries:
User: "Tell me everything about machine learning"
Rewritten into focused sub-queries based on what information is being sought

Standalone Query Rewriting (Conversation History)
The most common use case: make a query from a conversation chain self-contained for retrieval.
import anthropic

client = anthropic.Anthropic()

def rewrite_for_retrieval(
    query: str,
    chat_history: list[dict],
) -> str:
    """Rewrite a follow-up question as a standalone search query."""
    # Keep only the most recent messages so the prompt stays short.
    history_text = "\n".join([
        f"{msg['role'].upper()}: {msg['content']}"
        for msg in chat_history[-6:]  # last 3 turns (one user + one assistant message each)
    ])

    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=200,
        messages=[{
            "role": "user",
            "content": f"""Given a chat history and a follow-up question, rewrite the question
to be a standalone search query that captures all context needed.
Do not answer the question; just rewrite it as a clear, self-contained query.

Chat History:
{history_text}

Follow-up Question: {query}

Standalone Query:"""
        }]
    )
    return response.content[0].text.strip()

# Example
history = [
    {"role": "user", "content": "What are the main vector database options?"},
    {"role": "assistant", "content": "The main options are Pinecone, Weaviate, Qdrant, and Milvus..."},
]
original = "Which one is best for filtering?"
rewritten = rewrite_for_retrieval(original, history)
# Example output (exact wording will vary between runs):
# "Which vector database has the best metadata filtering capabilities: Pinecone, Weaviate, Qdrant, or Milvus?"

Step-Back Prompting
Step-back prompting (Zheng et al., 2023) addresses a subtle problem: users ask specific questions, but the relevant information in documents is often stated at a higher level of abstraction.
Specific query: "What is the learning rate used in GPT-3 training?"
Problem: Documents about GPT-3 training discuss schedules, not just a single number. The embedding for this specific query may not match passages about learning rate schedules.

Step-back query: "What are the training hyperparameters and learning rate schedule for large language models like GPT-3?"
Better: This matches the level of abstraction in source documents.

def step_back_rewrite(specific_query: str) -> tuple[str, str]:
    """Returns (original_query, step_back_query)"""
    # Reuses the `client` created in the standalone rewriting example.
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=150,
        messages=[{
            "role": "user",
            "content": f"""Rewrite this specific question as a more general,
higher-level question that would appear in a reference document.
The rewritten question should be broader but still relevant.

Specific question: {specific_query}
General question:"""
        }]
    )
    general = response.content[0].text.strip()
    return specific_query, general

# Run retrieval for BOTH queries, merge results
specific, general = step_back_rewrite("What dropout rate does ResNet-50 use?")
# specific → "What dropout rate does ResNet-50 use?"
# general → e.g. "What regularization techniques are used in ResNet architectures?"

Query Disambiguation
When a query has multiple possible interpretations, a rewriting system should clarify or generate multiple targeted variants:
import re

def disambiguate_query(ambiguous_query: str) -> list[str]:
    """Return up to 3 specific queries, one per plausible user intent."""
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=300,
        messages=[{
            "role": "user",
            "content": f"""This query may have multiple interpretations.
List up to 3 distinct, specific queries that represent different possible
user intents. One per line.

Ambiguous query: {ambiguous_query}

Disambiguated queries:"""
        }]
    )
    lines = response.content[0].text.strip().split("\n")
    # Strip a leading bullet or "1." / "1)" prefix only. A plain character-set
    # lstrip would also eat the first digit of a query such as "3D rendering".
    prefix = re.compile(r"^\s*(?:[•\-*]+|\d+[.)])\s*")
    return [prefix.sub("", line).strip() for line in lines if line.strip()]

# Example
disambiguated = disambiguate_query("Python class inheritance")
# Example output (exact wording will vary between runs):
# [
#   "Python class inheritance syntax and __init__ super()",
#   "Python multiple inheritance MRO method resolution order",
#   "Python abstract base classes and class hierarchy design"
# ]

Rewriting for Different Retrieval Modes
The same user query should be rewritten differently depending on whether you’re doing keyword or semantic search:
def dual_rewrite(query: str) -> dict[str, str]:
    """Rewrite for both keyword and semantic retrieval."""

    keyword_response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=100,
        messages=[{
            "role": "user",
            "content": f"""Convert this question into a short keyword-style search query
with key technical terms. No question words, just terms.

Question: {query}
Keywords:"""
        }]
    )

    semantic_response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=100,
        messages=[{
            "role": "user",
            "content": f"""Rephrase this question to be a clear, specific search query.
Keep it as a natural-language question.

Question: {query}
Rephrased:"""
        }]
    )

    return {
        "keyword": keyword_response.content[0].text.strip(),
        "semantic": semantic_response.content[0].text.strip(),
    }

LangChain Query Rewriting Integration
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

rewrite_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are an expert at reformulating search queries for a technical knowledge base.
Rewrite the user's question to maximize retrieval effectiveness.
- Make it standalone (no conversational context assumed)
- Use precise technical vocabulary
- Keep it concise but specific"""),
    ("human", "{question}"),
])

rewriter_chain = rewrite_prompt | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser()

# Use in RAG chain
# `user_query` and `retriever` are assumed to be defined elsewhere in your pipeline.
rewritten_query = rewriter_chain.invoke({"question": user_query})
retrieved_docs = retriever.invoke(rewritten_query)

Measuring Rewriting Quality
Don’t assume rewriting helps — measure it:
def evaluate_rewriting(
    test_queries: list[dict],  # [{"original": ..., "expected_doc_id": ...}, ...]
    retriever,
    rewriter,
) -> dict:
    """Compare hit rate for the expected document with and without rewriting."""
    if not test_queries:
        raise ValueError("test_queries must not be empty")

    original_recalls = []
    rewritten_recalls = []

    for item in test_queries:
        # Recall with original query
        # Assumes each result exposes an `id` attribute matching expected_doc_id.
        orig_results = retriever.invoke(item["original"])
        orig_recall = item["expected_doc_id"] in [r.id for r in orig_results]

        # Recall with rewritten query
        rewritten = rewriter(item["original"])
        rewrite_results = retriever.invoke(rewritten)
        rewrite_recall = item["expected_doc_id"] in [r.id for r in rewrite_results]

        original_recalls.append(orig_recall)
        rewritten_recalls.append(rewrite_recall)

    original_rate = sum(original_recalls) / len(original_recalls)
    rewritten_rate = sum(rewritten_recalls) / len(rewritten_recalls)
    return {
        "original_recall": original_rate,
        "rewritten_recall": rewritten_rate,
        "improvement": rewritten_rate - original_rate,
    }

2025 Trend: Online Query Rewriting with Feedback
Emerging systems track which retrieved documents the user engaged with (clicked on, referenced in follow-up questions) and use that signal to fine-tune the rewriting model. Queries that led to “good” retrievals reinforce those rewriting patterns; queries that led to irrelevant results penalize them. This creates a self-improving rewriting loop over time.
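As a rough illustration of the signal-collection half of that loop, the sketch below turns logged retrievals into labeled examples that a later fine-tuning step could consume. Everything here is an assumption made for illustration: the `FeedbackEvent` record, the `label_events` and `engagement_rate` helpers, and the binary reward (1.0 if the user engaged with at least one retrieved document, else 0.0). The fine-tuning step itself is not shown.

```python
from dataclasses import dataclass


@dataclass
class FeedbackEvent:
    """One logged retrieval (hypothetical schema, not from any library)."""
    original_query: str
    rewritten_query: str
    retrieved_ids: list[str]   # documents returned for the rewritten query
    engaged_ids: list[str]     # documents the user clicked or referenced later


def label_events(events: list[FeedbackEvent]) -> list[tuple[str, str, float]]:
    """Turn events into (original, rewritten, reward) training examples.

    Reward is 1.0 when the user engaged with at least one retrieved document
    and 0.0 otherwise. This is a deliberately crude signal.
    """
    examples = []
    for event in events:
        engaged = set(event.retrieved_ids) & set(event.engaged_ids)
        examples.append(
            (event.original_query, event.rewritten_query, 1.0 if engaged else 0.0)
        )
    return examples


def engagement_rate(events: list[FeedbackEvent]) -> float:
    """Fraction of rewritten queries that led to any engagement."""
    if not events:
        return 0.0
    rewards = [reward for _, _, reward in label_events(events)]
    return sum(rewards) / len(rewards)
```

Tracking `engagement_rate` over time gives a cheap check on whether the rewriter is drifting, even before any of the labeled examples are used for training.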
Query rewriting is often the cheapest high-value improvement you can add to an existing RAG pipeline. A single LLM call to clean up the query can improve retrieval recall by 15–30% on conversational use cases where context dependency is common.