Query Rewriting: Turning Messy User Input Into Retrieval-Optimized Queries

Users don’t write queries optimized for retrieval. They write the way they talk: conversational, contextual, sometimes ambiguous. “What about the European version?” makes perfect sense mid-conversation, but as a standalone retrieval query it is nearly useless because the thing it refers to is missing.

Query rewriting transforms the user’s input into a form that retrieval systems handle better. It differs from query expansion, which adds terms to a query: rewriting replaces or restructures the query while preserving and clarifying the original intent.

Why User Queries Need Rewriting

Several structural issues make raw user queries poor retrieval inputs:

Conversational context dependency:

Turn 1: "Tell me about the Transformer architecture"
Turn 2: "How does its attention mechanism work?"  ← "its" refers to Transformer

For retrieval, Turn 2 needs to be:
"How does the Transformer attention mechanism work?"

Vague pronouns and implicit references:

User: "What are its side effects?"
System doesn't know: whose side effects? Of what medication? From what earlier context?

Colloquial phrasing:

User: "Can you help me understand why my RAG keeps hallucinating?"
Rewritten: "Causes of hallucination in RAG systems and mitigation strategies"

Overly broad queries:

User: "Tell me everything about machine learning"
Rewritten into focused sub-queries based on what information is being sought
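
Not every query shows these symptoms, so a cheap pre-check can decide whether a rewriting call is worth making. The following is a minimal sketch under stated assumptions: the function name `needs_rewrite`, the word list, and the length threshold are all illustrative and not taken from any library. The heuristic deliberately over-triggers, since a false positive costs only one extra LLM call.

```python
import re

# Hypothetical list of words that usually point back to earlier context.
CONTEXT_DEPENDENT_WORDS = {
    "it", "its", "they", "them", "their", "this", "these", "those", "one",
}

def needs_rewrite(query: str, min_words: int = 4) -> bool:
    """Flag queries that are very short or contain an unresolved reference."""
    words = re.findall(r"[a-z']+", query.lower())
    if len(words) < min_words:
        return True  # e.g. "European version?" carries too little on its own
    return any(word in CONTEXT_DEPENDENT_WORDS for word in words)
```

Queries that pass the check unflagged can go straight to retrieval; the rest are candidates for the rewriting techniques that follow.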

Standalone Query Rewriting (Conversation History)

The most common use case: make a query from a conversation chain self-contained for retrieval.

import anthropic

client = anthropic.Anthropic()

def rewrite_for_retrieval(
    query: str,
    chat_history: list[dict],
) -> str:
    history_text = "\n".join([
        f"{msg['role'].upper()}: {msg['content']}"
        for msg in chat_history[-6:]  # last 6 messages (roughly 3 user/assistant exchanges)
    ])

    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=200,
        messages=[{
            "role": "user",
            "content": f"""Given a chat history and a follow-up question, rewrite the question
to be a standalone search query that captures all context needed.
Do not answer the question — just rewrite it as a clear, self-contained query.

Chat History:
{history_text}

Follow-up Question: {query}

Standalone Query:"""
        }]
    )
    return response.content[0].text.strip()

# Example
history = [
    {"role": "user", "content": "What are the main vector database options?"},
    {"role": "assistant", "content": "The main options are Pinecone, Weaviate, Qdrant, and Milvus..."},
]
original = "Which one is best for filtering?"
rewritten = rewrite_for_retrieval(original, history)
# → "Which vector database has the best metadata filtering capabilities: Pinecone, Weaviate, Qdrant, or Milvus?"
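
A rewriting call can fail, return nothing, or answer the question instead of rewriting it. The wrapper below is a minimal sketch of one way to guard against that, assuming you prefer falling back to the raw query over blocking retrieval. The name `safe_rewrite`, the `rewrite_fn` parameter, and the length threshold are hypothetical choices for illustration.

```python
from typing import Callable

def safe_rewrite(
    query: str,
    chat_history: list[dict],
    rewrite_fn: Callable[[str, list[dict]], str],
    max_length_ratio: float = 5.0,
) -> str:
    """Call rewrite_fn, falling back to the original query if the result looks unusable."""
    if not chat_history:
        return query  # no prior turns, so there is nothing to resolve
    try:
        rewritten = rewrite_fn(query, chat_history).strip()
    except Exception:
        # Assumption: any rewriter failure should degrade to the raw query.
        return query
    if not rewritten:
        return query
    # A rewrite far longer than the input often means the model answered
    # the question rather than rewriting it. The threshold is a guess.
    if len(rewritten) > max_length_ratio * max(len(query), 40):
        return query
    return rewritten
```

It could be called as `safe_rewrite(original, history, rewrite_for_retrieval)`, so the rest of the pipeline always receives some query.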

Step-Back Prompting

Step-back prompting (Zheng et al., 2023) addresses a subtle problem: users ask specific questions, but the relevant information in documents is often stated at a higher level of abstraction.

Specific query: "What is the learning rate used in GPT-3 training?"
Problem: Documents about GPT-3 training discuss schedules, not just a single number.
         The embedding for this specific query may not match passages about
         learning rate schedules.

Step-back query: "What are the training hyperparameters and learning rate schedule
                 for large language models like GPT-3?"
Better: This matches the level of abstraction in source documents.

def step_back_rewrite(specific_query: str) -> tuple[str, str]:
    """Returns (original_query, step_back_query)"""
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=150,
        messages=[{
            "role": "user",
            "content": f"""Rewrite this specific question as a more general,
higher-level question that would appear in a reference document.
The rewritten question should be broader but still relevant.

Specific question: {specific_query}
General question:"""
        }]
    )
    general = response.content[0].text.strip()
    return specific_query, general

# Run retrieval for BOTH queries, merge results
specific, general = step_back_rewrite("What dropout rate does ResNet-50 use?")
# specific → "What dropout rate does ResNet-50 use?"
# general  → "What regularization techniques are used in ResNet architectures?"
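
One common way to merge the two result lists is reciprocal rank fusion (RRF), which scores each document by the sum of 1 / (k + rank) across the lists it appears in. This is a sketch of that technique, not something the step-back method itself prescribes. It assumes each retrieval returns an ordered list of document IDs, and `k=60` is a frequently used default rather than a tuned value.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs; documents ranked high in several lists win."""
    scores: dict[str, float] = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, best first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

specific_hits = ["d1", "d2", "d3"]   # hypothetical results for the specific query
general_hits = ["d2", "d4", "d1"]    # hypothetical results for the step-back query
merged = reciprocal_rank_fusion([specific_hits, general_hits])
# merged == ["d2", "d1", "d4", "d3"]
```

Documents found by both queries rise to the top, which suits step-back prompting because both queries target the same underlying intent.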

Query Disambiguation

When a query has multiple possible interpretations, a rewriting system should clarify or generate multiple targeted variants:

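import re

def disambiguate_query(ambiguous_query: str) -> list[str]:
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=300,
        messages=[{
            "role": "user",
            "content": f"""This query may have multiple interpretations.
List up to 3 distinct, specific queries that represent different possible
user intents. One per line.

Ambiguous query: {ambiguous_query}

Disambiguated queries:"""
        }]
    )
    lines = response.content[0].text.strip().split('\n')
    # Strip only a leading bullet or list number ("- ", "• ", "1. ", "2) ").
    # str.lstrip('•-123. ') would also eat digits that belong to the query
    # itself (e.g. "3D rendering") and would miss list numbers above 3.
    cleaned = [re.sub(r'^\s*(?:[•\-*]|\d+[.)])\s+', '', line).strip() for line in lines]
    return [q for q in cleaned if q]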

# Example
disambiguated = disambiguate_query("Python class inheritance")
# → [
#     "Python class inheritance syntax and __init__ super()",
#     "Python multiple inheritance MRO method resolution order",
#     "Python abstract base classes and class hierarchy design"
#   ]
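
When each variant represents a different intent, rank fusion is not the obvious choice for combining results, because a document matching only one interpretation is still valuable. A round-robin interleave keeps every interpretation represented near the top. The sketch below assumes each variant has already been retrieved into an ordered list of document IDs; the function name and `limit` parameter are illustrative.

```python
from itertools import zip_longest

def interleave_results(result_lists: list[list[str]], limit: int = 10) -> list[str]:
    """Take the top hit of each list, then the second of each, skipping duplicates."""
    seen: set[str] = set()
    merged: list[str] = []
    for tier in zip_longest(*result_lists):  # shorter lists are padded with None
        for doc_id in tier:
            if len(merged) >= limit:
                return merged
            if doc_id is None or doc_id in seen:
                continue
            seen.add(doc_id)
            merged.append(doc_id)
    return merged
```

The alternative named in the text, asking the user which meaning they intended, remains preferable when the interface allows a clarifying turn.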

Rewriting for Different Retrieval Modes

The same user query should be rewritten differently depending on whether you’re doing keyword or semantic search:

def dual_rewrite(query: str) -> dict[str, str]:
    """Rewrite for both keyword and semantic retrieval."""

    keyword_response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=100,
        messages=[{
            "role": "user",
            "content": f"""Convert this question into a short keyword-style search query
with key technical terms. No question words, just terms.

Question: {query}
Keywords:"""
        }]
    )

    semantic_response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=100,
        messages=[{
            "role": "user",
            "content": f"""Rephrase this question to be a clear, specific search query.
Keep it as a natural-language question.

Question: {query}
Rephrased:"""
        }]
    )

    return {
        "keyword": keyword_response.content[0].text.strip(),
        "semantic": semantic_response.content[0].text.strip(),
    }
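
The two calls in `dual_rewrite` are independent, so running them one after the other roughly doubles rewriting latency. A minimal sketch of running them concurrently with a thread pool follows. It assumes the rewriting logic has been split into one function per retrieval mode, and that the API client in use is safe to share across threads; the function name `run_rewrites_concurrently` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def run_rewrites_concurrently(
    query: str,
    rewriters: dict[str, Callable[[str], str]],
) -> dict[str, str]:
    """Run each named rewriter on the query in its own thread and collect the results."""
    if not rewriters:
        return {}  # ThreadPoolExecutor rejects max_workers=0
    with ThreadPoolExecutor(max_workers=len(rewriters)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in rewriters.items()}
        # result() blocks until that rewriter finishes and re-raises its exception, if any.
        return {name: future.result() for name, future in futures.items()}
```

The `"keyword"` output would then go to the lexical index (BM25 or similar) and the `"semantic"` output to the embedding-based retriever.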

LangChain Query Rewriting Integration

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

rewrite_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are an expert at reformulating search queries for a technical knowledge base.
Rewrite the user's question to maximize retrieval effectiveness.
- Make it standalone (no conversational context assumed)
- Use precise technical vocabulary
- Keep it concise but specific"""),
    ("human", "{question}"),
])

rewriter_chain = rewrite_prompt | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser()

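# Use in a RAG chain. `retriever` is assumed to be an existing LangChain retriever
# (for example, vector_store.as_retriever()) defined elsewhere.
# Note: this chain sees only the current question. To resolve follow-ups such as
# "Which one is best?", the chat history must also be passed into the prompt.
user_query = "why does my RAG keep hallucinating?"
rewritten_query = rewriter_chain.invoke({"question": user_query})
retrieved_docs = retriever.invoke(rewritten_query)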

Measuring Rewriting Quality

Don’t assume rewriting helps; measure it on a labeled set of queries:

def evaluate_rewriting(
    test_queries: list[dict],  # [{"original": ..., "expected_doc_id": ...}, ...]
    retriever,
    rewriter,
) -> dict:
    original_recalls = []
    rewritten_recalls = []

    for item in test_queries:
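        # Recall with original query.
        # Assumes each result exposes an `id` attribute; adapt this lookup
        # (e.g. to a metadata field) to whatever your retriever returns.
        orig_results = retriever.invoke(item["original"])
        orig_recall = item["expected_doc_id"] in [r.id for r in orig_results]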

        # Recall with rewritten query
        rewritten = rewriter(item["original"])
        rewrite_results = retriever.invoke(rewritten)
        rewrite_recall = item["expected_doc_id"] in [r.id for r in rewrite_results]

        original_recalls.append(orig_recall)
        rewritten_recalls.append(rewrite_recall)

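    # With one expected document per query, "recall" here is the hit rate:
    # the fraction of queries whose expected document was retrieved.
    n = len(test_queries)
    if n == 0:
        raise ValueError("test_queries is empty")
    original_recall = sum(original_recalls) / n
    rewritten_recall = sum(rewritten_recalls) / n
    return {
        "original_recall": original_recall,
        "rewritten_recall": rewritten_recall,
        "improvement": rewritten_recall - original_recall,
    }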
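
An aggregate improvement can hide regressions: rewriting may fix some queries while breaking others that worked before. The sketch below counts both cases, assuming you also keep the per-query booleans that `evaluate_rewriting` builds in `original_recalls` and `rewritten_recalls`. The function name and the outcome labels are illustrative.

```python
def compare_per_query(original_hits: list[bool], rewritten_hits: list[bool]) -> dict[str, int]:
    """Count queries that rewriting fixed, broke, or left unchanged."""
    if len(original_hits) != len(rewritten_hits):
        raise ValueError("hit lists must have the same length")
    outcome = {"fixed": 0, "broken": 0, "both_hit": 0, "both_miss": 0}
    for orig, rewritten in zip(original_hits, rewritten_hits):
        if rewritten and not orig:
            outcome["fixed"] += 1      # rewriting found a document the raw query missed
        elif orig and not rewritten:
            outcome["broken"] += 1     # rewriting lost a document the raw query found
        elif orig:
            outcome["both_hit"] += 1
        else:
            outcome["both_miss"] += 1
    return outcome
```

A high `broken` count suggests rewriting only selected queries, such as follow-ups with unresolved references, rather than every query.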

2025 Trend: Online Query Rewriting with Feedback

Emerging systems track which retrieved documents the user engaged with (clicked on, referenced in follow-up questions) and use that signal to fine-tune the rewriting model. Queries that led to “good” retrievals reinforce those rewriting patterns; queries that led to irrelevant results penalize them. This creates a self-improving rewriting loop over time.
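
The data-collection half of such a loop might look like the sketch below. It is a hedged illustration, not a description of any particular system: the class name, the field names, and the engagement threshold are all assumptions, and it covers only the positive signal. Penalizing bad rewrites would need preference pairs or a reward model on top of this.

```python
from dataclasses import dataclass, field

@dataclass
class RewriteFeedbackLog:
    """Records how users engaged with the documents each rewrite retrieved."""
    records: list[dict] = field(default_factory=list)

    def log(self, original: str, rewritten: str,
            retrieved_ids: list[str], engaged_ids: list[str]) -> None:
        # Count only engagement with documents this rewrite actually retrieved.
        engaged = set(engaged_ids) & set(retrieved_ids)
        rate = len(engaged) / len(retrieved_ids) if retrieved_ids else 0.0
        self.records.append(
            {"original": original, "rewritten": rewritten, "engagement_rate": rate}
        )

    def training_pairs(self, min_rate: float = 0.2) -> list[tuple[str, str]]:
        """Return (original, rewritten) pairs whose retrievals users engaged with."""
        return [
            (r["original"], r["rewritten"])
            for r in self.records
            if r["engagement_rate"] >= min_rate
        ]
```

The exported pairs could serve as supervised fine-tuning examples for the rewriting model. Engagement is a noisy proxy for relevance, so a manual spot check is worth doing before training on it.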

Query rewriting is often the cheapest high-value improvement you can add to an existing RAG pipeline, since it costs a single LLM call per query. On conversational use cases where context dependency is common, it can improve retrieval recall by 15–30%, though the gain depends heavily on the corpus and query mix, so confirm it on your own evaluation set.