Agentic RAG: When One Retrieval Step Isn’t Enough
Standard RAG is a single-shot pipeline: retrieve once, generate once. For simple factual questions, that works well. But a complex query usually can’t be answered from a single retrieval step. Consider: “Compare the pricing models of our three enterprise plans, identify which best suits a customer with 500 users who needs SSO and must stay under $10K/year, and explain the upgrade path.”
Agentic RAG lets the LLM decide when to retrieve and what to retrieve, evaluate whether the retrieved information is sufficient, and decide whether to retrieve again. Instead of a fixed pipeline, you have an autonomous loop.
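The loop can be sketched in a few lines. This is a minimal illustration, not a production design: `retrieve`, `next_query`, and `is_sufficient` are hypothetical stand-ins for a vector search and two LLM decisions, and the tiny in-memory corpus is made up for the example.

```python
from typing import Callable

# Toy corpus standing in for a real knowledge base (hypothetical data).
CORPUS = {
    "pricing": "Plan B costs $149 per user per month.",
    "sso": "SSO is available on Plan A and Plan B.",
}

def retrieve(query: str) -> list[str]:
    """Stand-in for a vector search: return passages whose key appears in the query."""
    return [text for key, text in CORPUS.items() if key in query.lower()]

def agentic_retrieve(
    question: str,
    next_query: Callable[[str, list[str]], str],
    is_sufficient: Callable[[str, list[str]], bool],
    max_steps: int = 5,
) -> list[str]:
    """Retrieve, evaluate, and repeat until the evidence is judged sufficient."""
    evidence: list[str] = []
    for _ in range(max_steps):  # hard cap so the loop always terminates
        query = next_query(question, evidence)  # decide WHAT to retrieve
        for passage in retrieve(query):
            if passage not in evidence:
                evidence.append(passage)
        if is_sufficient(question, evidence):  # decide WHETHER to stop
            break
    return evidence

# Rule-based stand-ins for the two LLM decisions (assumptions for the demo).
def next_query(question: str, evidence: list[str]) -> str:
    return "pricing" if not evidence else "sso"

def is_sufficient(question: str, evidence: list[str]) -> bool:
    return len(evidence) >= 2

evidence = agentic_retrieve(
    "Which plan has SSO and what does it cost?", next_query, is_sufficient
)
print(evidence)
```

In a real system the two callables would be LLM calls, and the `max_steps` cap is what keeps a confused model from retrieving forever.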
The Agentic Mindset: Plan, Retrieve, Evaluate, Repeat
```text
User query: "Compare our enterprise plans and find the best fit for 500 users, SSO required, budget under $10K/year"

Agent Planning:
  Step 1: Retrieve enterprise plan details (features, pricing)
  Step 2: Retrieve SSO documentation and requirements
  Step 3: Retrieve pricing calculator or per-user pricing information
  Step 4: If budget information found, evaluate against $10K constraint
  Step 5: If upgrade path needed, retrieve upgrade documentation
  Step 6: Synthesize comparison and recommendation

Agent Execution:
  Iteration 1: "search('enterprise plans features pricing')" → gets plan overview
  Iteration 2: "search('SSO authentication setup enterprise')" → gets SSO docs
  Iteration 3: "search('per user pricing 500 users calculation')" → gets pricing
  Evaluation: "Do I have enough to answer?" → Yes
  Generation: Final comparative answer with recommendation
```
ReAct Framework for RAG Agents
ReAct (Reasoning + Acting) is a widely used pattern for agentic RAG. The LLM alternates between reasoning steps (Thought) and tool calls (Action), reading each tool result (Observation) before deciding on the next step:
```text
Thought: I need to find enterprise pricing to compare plans.
Action: search("enterprise plan pricing features comparison")
Observation: [Plan A: $199/user/mo, Plan B: $149/user/mo, Plan C: $99/user/mo...]

Thought: I have pricing. Now I need SSO availability per plan.
Action: search("SSO SAML authentication which enterprise plans")
Observation: [SSO available on Plan A and Plan B. Plan C has OAuth only...]

Thought: Plans A and B have SSO. I need to calculate 500-user costs.
Action: calculate("500 * 199")
Observation: $99,500 (per month)

Thought: Now the same calculation for Plan B.
Action: calculate("500 * 149")
Observation: $74,500 (per month)

Thought: Both plans exceed $10K in a single month, let alone a year. I need to check if annual contracts reduce cost.
Action: search("annual contract discount enterprise plans")
Observation: [Annual billing: 20% discount on all plans...]

Thought: With the 20% annual discount:
  Plan A: $99,500 * 12 * 0.8 = $955,200/year → still over $10K/year
  Plan B: $74,500 * 12 * 0.8 = $715,200/year → still over $10K/year
I should check if there's a smaller tier or startup pricing.
Action: search("startup pricing small team under 10000 annual")
Observation: [Startup plan: $8,000/year for up to 500 users with SSO...]

Thought: Found the answer. The Startup plan meets all criteria.
Answer: The Startup plan at $8,000/year supports up to 500 users with SSO included...
```
Implementation with LangChain Agents
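Before handing this loop to a framework, it can help to see the mechanics by hand. The sketch below shows roughly what a ReAct runtime does with each model turn: parse the `Action:` line, run the named tool, and return an `Observation:` line to append to the prompt. The regex format, the `TOOLS` registry, and both tool bodies are assumptions made for illustration; frameworks such as LangChain handle this parsing for you.

```python
import ast
import operator
import re

# Arithmetic operators the calculator is allowed to evaluate.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def calculate(expression: str) -> str:
    """Evaluate a basic arithmetic expression without using eval()."""
    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        raise ValueError(f"Unsupported expression: {expression!r}")
    return str(_eval(ast.parse(expression, mode="eval").body))

def search(query: str) -> str:
    """Placeholder for a real knowledge-base search (hypothetical)."""
    return f"[no index configured; query was: {query}]"

TOOLS = {"search": search, "calculate": calculate}

# Matches lines such as: Action: calculate("500 * 199")
ACTION_RE = re.compile(r'^Action:\s*(\w+)\("(.*)"\)\s*$', re.MULTILINE)

def run_step(llm_output: str) -> str:
    """Parse one model turn and return the Observation line for the next prompt."""
    match = ACTION_RE.search(llm_output)
    if match is None:
        return "Observation: could not parse an action"
    tool_name, tool_input = match.groups()
    tool = TOOLS.get(tool_name)
    if tool is None:
        return f"Observation: unknown tool '{tool_name}'"
    return f"Observation: {tool(tool_input)}"

turn = 'Thought: I need the monthly cost for 500 users.\nAction: calculate("500 * 199")'
print(run_step(turn))  # Observation: 99500
```

Returning parse failures as observations, rather than raising, gives the model a chance to correct its own formatting on the next turn.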
```python
from langchain import hub
from langchain.agents import AgentExecutor, create_react_agent
from langchain.tools import Tool
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI

# Assumes `vectorstore` and `policy_store` are vector stores built elsewhere
# (for example, FAISS indexes loaded at application startup).

def _format(docs) -> str:
    """Join retrieved documents into one string for the agent to read."""
    return "\n\n".join(doc.page_content for doc in docs)

# Define retrieval tools
def search_knowledge_base(query: str) -> str:
    return _format(vectorstore.similarity_search(query, k=3))

def search_policies(query: str) -> str:
    return _format(policy_store.similarity_search(query, k=2))

tools = [
    Tool(
        name="search",
        func=search_knowledge_base,
        description="Search the company knowledge base for information about products, pricing, policies, and documentation. Use this to find specific facts.",
    ),
    Tool(
        name="search_policies",
        func=search_policies,
        description="Search specifically for company policies, terms of service, and compliance documents.",
    ),
]

# ReAct prompt from LangChain Hub
prompt = hub.pull("hwchase17/react")

llm = ChatOpenAI(model="gpt-4o", temperature=0)
agent = create_react_agent(llm=llm, tools=tools, prompt=prompt)

agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    max_iterations=8,  # prevent infinite loops
    handle_parsing_errors=True,
)

result = agent_executor.invoke({
    "input": "Compare enterprise plans for 500 users with SSO under $10K/year"
})
print(result["output"])  # the agent's final answer
```
Tool Design for Agentic RAG
The tools available to the agent determine what it can do. Well-designed tools, with typed arguments, metadata filters, and source-labeled output, make agents more effective:
```python
from langchain.tools import StructuredTool
from pydantic import BaseModel, Field

class SearchInput(BaseModel):
    query: str = Field(description="The search query")
    doc_type: str = Field(
        default="all",
        description="Filter by document type: 'pricing', 'technical', 'policy', 'all'",
    )
    max_results: int = Field(default=3, description="Number of results to return (1-10)")

def structured_search(query: str, doc_type: str = "all", max_results: int = 3) -> str:
    # Pass no filter for "all" rather than relying on how a store handles an empty dict.
    metadata_filter = None if doc_type == "all" else {"doc_type": doc_type}
    # Enforce the 1-10 range promised in the schema, whatever the model sends.
    k = max(1, min(max_results, 10))
    # Assumes `vectorstore` is defined as in the previous example.
    results = vectorstore.similarity_search(query, k=k, filter=metadata_filter)
    if not results:
        return "No results found for this query."
    return "\n\n---\n\n".join(
        f"Source: {doc.metadata.get('source', 'unknown')}\n{doc.page_content}"
        for doc in results
    )

search_tool = StructuredTool.from_function(
    func=structured_search,
    name="search_knowledge_base",
    description="Search for information in the knowledge base with optional filters",
    args_schema=SearchInput,
)
```
Verification and Self-Critique Loop
A key capability of agentic RAG is self-verification: the agent checks its own answer against the retrieved context before returning it, using a prompt such as:
```python
VERIFICATION_PROMPT = """You just generated this answer:
{answer}

Based on the retrieved context:
{context}

User question:
{question}

Evaluate:
1. Does the answer directly address the question?
2. Is every factual claim supported by the retrieved context?
3. Are there any claims that seem uncertain or unsupported?

If the answer is complete and accurate, respond with: VERIFIED
If the answer needs more information, respond with: NEED_MORE_INFO: [what specifically is missing]
If the answer contains unsupported claims, respond with: INCORRECT: [what's wrong]"""
```
This self-critique loop can catch hallucinations before they reach the user, and it triggers additional retrieval when needed.
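One way to wire the verifier's verdict back into the pipeline is sketched below. The `Verdict` type, `parse_verdict`, and `answer_with_verification` are hypothetical names, and `retrieve`, `generate`, and `verify` are passed in as plain callables so the sketch stays independent of any particular LLM client.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Verdict:
    status: str       # "VERIFIED", "NEED_MORE_INFO", or "INCORRECT"
    detail: str = ""  # what is missing or wrong, if anything

def parse_verdict(raw: str) -> Verdict:
    """Map the verifier's free-text reply onto one of the three expected verdicts."""
    text = raw.strip()
    for status in ("NEED_MORE_INFO", "INCORRECT"):
        if text.startswith(status):
            return Verdict(status, text[len(status):].lstrip(": ").strip())
    if text.startswith("VERIFIED"):
        return Verdict("VERIFIED")
    # Treat anything unexpected as a failed check rather than trusting it.
    return Verdict("INCORRECT", f"unparseable verifier reply: {text[:80]}")

def answer_with_verification(
    question: str,
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str, list[str]], str],
    verify: Callable[[str, list[str], str], str],
    max_rounds: int = 3,
) -> tuple[str, bool]:
    """Generate, verify, and retrieve again when the verifier asks for more."""
    context = retrieve(question)
    answer = ""
    for _ in range(max_rounds):
        answer = generate(question, context)
        verdict = parse_verdict(verify(question, context, answer))
        if verdict.status == "VERIFIED":
            return answer, True
        if verdict.status == "NEED_MORE_INFO":
            # Use the verifier's description of the gap as the next query.
            context = context + retrieve(verdict.detail)
        # On INCORRECT, regenerate from the same context in the next round.
    return answer, False  # caller decides how to handle an unverified answer
```

Returning the boolean alongside the answer lets the caller choose a fallback, such as declining to answer, when verification never succeeds within `max_rounds`.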
Agent Architectures Comparison
| Architecture | Use Case | Complexity | Latency |
|---|---|---|---|
| Single-hop RAG | Simple factual Q&A | Low | ~1s |
| ReAct agent | Multi-step research | Medium | 5–30s |
| Plan-and-Execute | Complex structured tasks | High | 15–60s |
| Multi-agent (specialist agents) | Large-scale enterprise | Very High | Variable |
Multi-Agent RAG for Complex Workflows
For enterprise use cases, multiple specialized agents collaborate:
```text
Orchestrator Agent
├── Retrieval Agent (search and fetch documents)
├── Analysis Agent (analyze retrieved information)
├── Calculation Agent (numerical reasoning)
├── Synthesis Agent (combine information from multiple sources)
└── Verification Agent (fact-check the final answer)
```
The orchestrator routes subtasks to specialist agents and combines their outputs. This separation of concerns improves reliability: each agent is optimized for its specific task.
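A bare-bones sketch of that routing is shown below. The specialists here are ordinary Python functions standing in for LLM-backed agents, and the fixed `plan` list stands in for the orchestrator's own planning step; all names and the sample data are hypothetical.

```python
from typing import Any, Callable

# Each specialist takes the shared state and returns an updated copy.
State = dict[str, Any]
Agent = Callable[[State], State]

def retrieval_agent(state: State) -> State:
    # Stand-in for document search; the passages are made-up sample data.
    return {**state, "documents": ["Plan B: $149/user/month", "SSO: Plans A and B"]}

def calculation_agent(state: State) -> State:
    # Numerical reasoning done in code: users * monthly price * 12 months.
    annual = state["users"] * state["price_per_user_month"] * 12
    return {**state, "annual_cost": annual}

def synthesis_agent(state: State) -> State:
    summary = f"Annual cost for {state['users']} users: ${state['annual_cost']:,}"
    return {**state, "answer": summary}

def verification_agent(state: State) -> State:
    # Minimal check: the answer must mention the computed figure.
    ok = f"{state['annual_cost']:,}" in str(state["answer"])
    return {**state, "verified": ok}

AGENTS: dict[str, Agent] = {
    "retrieve": retrieval_agent,
    "calculate": calculation_agent,
    "synthesize": synthesis_agent,
    "verify": verification_agent,
}

def orchestrate(state: State, plan: list[str]) -> State:
    """Run the named specialists in order, threading shared state through each."""
    for step in plan:
        state = AGENTS[step](state)
    return state

result = orchestrate(
    {"users": 500, "price_per_user_month": 149},
    plan=["retrieve", "calculate", "synthesize", "verify"],
)
print(result["answer"])  # Annual cost for 500 users: $894,000
```

Keeping each specialist behind the same state-in, state-out signature makes it straightforward to test agents in isolation or swap one implementation for another.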
2025 Trend: Long-Context vs Agentic RAG
With Claude 3.5’s 200K-token context and GPT-4o’s 128K-token context, some argue that long-context models reduce the need for agentic retrieval: just put everything in the context window.
In practice, long-context models still benefit from selective retrieval:
- Cost: Processing 200K tokens on every query is expensive
- Accuracy: LLMs still show “lost in the middle” degradation at extreme lengths
- Scale and freshness: Large, frequently changing corpora can’t fit in any context window
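As a rough worked example of the cost point, the sketch below compares input-token spend per query. The price per million tokens, the chunk size, and the number of retrieval steps are placeholder assumptions, not quoted rates; substitute your provider's numbers.

```python
def input_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of sending `tokens` input tokens at the given rate."""
    return tokens / 1_000_000 * usd_per_million_tokens

def agentic_input_tokens(steps: int, chunks_per_step: int, tokens_per_chunk: int) -> int:
    """Total input tokens when every step re-sends all chunks retrieved so far."""
    per_step = chunks_per_step * tokens_per_chunk
    return sum(per_step * i for i in range(1, steps + 1))

PRICE = 3.00  # placeholder: USD per million input tokens (assumption)

# Long-context approach: the full 200K-token window on every query.
long_context_cost = input_cost_usd(200_000, PRICE)

# Agentic approach: assume 4 steps, 3 chunks per step, ~500 tokens per chunk.
agentic_tokens = agentic_input_tokens(steps=4, chunks_per_step=3, tokens_per_chunk=500)
agentic_cost = input_cost_usd(agentic_tokens, PRICE)

print(f"long context: ${long_context_cost:.3f} per query")
print(f"agentic:      ${agentic_cost:.3f} per query ({agentic_tokens} input tokens)")
```

The estimate ignores system prompts, reasoning text, and output tokens, so it understates the agentic total, but the order-of-magnitude gap in input tokens is the point.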
The sweet spot: use long context for within-document comprehension, use agentic retrieval for across-document synthesis.
Agentic RAG is the architecture for hard problems. Start simple (single-hop RAG) and escalate to agents only when the complexity of your queries genuinely requires multi-step reasoning. Agents are powerful but add latency, cost, and unpredictability that simpler pipelines don’t have.