AI Agents
An AI agent is a system that uses a language model not just to respond to a single prompt, but to autonomously plan a course of action, execute steps, observe results, and iterate — all to accomplish a goal.
This is qualitatively different from a chatbot. A chatbot responds. An agent acts.
What Makes Something an Agent?
Three properties distinguish an AI agent from a simple LLM call:
- Tool use: The agent can take actions in the world — search the web, run code, call APIs, read/write files, control browsers
- Planning: The agent breaks goals into sub-tasks and sequences them
- Iteration: The agent observes the results of its actions and adjusts its plan accordingly
Goal: "Research the top 5 AI papers from this month and write a summary"
Simple LLM: Generates a response based on training data (possibly outdated)
Agent: Step 1: Search arxiv.org for recent AI papers → observe 20 results Step 2: Filter to most cited in last 30 days → observe 5 papers Step 3: Retrieve and read each paper abstract → observe content Step 4: Synthesize findings → generate summary Step 5: Review summary quality → doneThe ReAct Pattern: Reason + Act
The most foundational agentic prompting pattern is ReAct (Reasoning + Acting). The agent alternates between thinking about what to do and actually doing it.
Thought: I need to find the current stock price of Apple. Let me search for it.Action: search("AAPL stock price today")Observation: AAPL is trading at $191.45 as of 2:30 PM EST
Thought: Now I need the price-to-earnings ratio. I'll calculate it using the reported EPS.Action: search("Apple Inc. trailing EPS 2024")Observation: Apple trailing twelve months EPS: $6.43
Thought: P/E = Price / EPS = 191.45 / 6.43 = 29.77Action: calculate(191.45 / 6.43)Observation: 29.77
Thought: I have both values. I can now answer the question.Final Answer: Apple's current P/E ratio is approximately 29.8, based on a stock price of $191.45 and trailing EPS of $6.43.The interleaved reasoning and action pattern is more reliable than trying to plan everything upfront, because the model can adapt based on what it observes.
Agent Memory: Four Types
Effective agents need memory at different timescales:
┌──────────────────────────────────────────────────────┐│ Agent Memory Types │├──────────────────────────────────────────────────────┤│ IN-CONTEXT (Working) ││ Current conversation + tool outputs in the context ││ window. Volatile — cleared when session ends. │├──────────────────────────────────────────────────────┤│ EXTERNAL (Episodic + Semantic) ││ Vector DB / key-value store. Past conversations, ││ documents, user preferences. Survives sessions. │├──────────────────────────────────────────────────────┤│ PROCEDURAL ││ System prompts and fine-tuning encode how to ││ behave. "Baked in" during training/prompting. │└──────────────────────────────────────────────────────┘Most current agent implementations use in-context memory (the conversation history) plus an external memory layer (vector DB for long-term recall).
Tool Design: The Interface Between Agent and World
The tools you give an agent define what it can do. Tool design is critical — a poorly designed tool causes more harm than no tool.
# Good tool design example (using OpenAI / Anthropic function calling)tools = [ { "name": "search_codebase", "description": "Search the codebase for files, functions, or patterns. " "Use this to find where specific functionality is implemented. " "Returns file paths and line numbers.", "input_schema": { "type": "object", "properties": { "query": { "type": "string", "description": "Search query — can be a function name, error message, or keyword" }, "file_type": { "type": "string", "enum": ["python", "javascript", "all"], "description": "Filter to specific file types" } }, "required": ["query"] } }]Tool design principles:
- Clear description of what the tool does AND when to use it
- Well-typed inputs with descriptions for each parameter
- Idempotent where possible (safe to call multiple times)
- Returns structured, parseable outputs
- Has reasonable rate limits and timeouts
- Fails gracefully with useful error messages
Agentic Frameworks (2025–2026)
Several frameworks simplify building multi-step agents:
LangGraph (LangChain)
Defines agents as graphs of nodes and edges. Each node is an LLM call or tool execution. Edges are conditional logic. Excellent for complex, branching workflows.
from langgraph.graph import StateGraph, END
workflow = StateGraph(AgentState)workflow.add_node("reason", agent_reasoning_node)workflow.add_node("call_tool", tool_execution_node)workflow.add_conditional_edges("reason", should_continue, {"continue": "call_tool", "end": END})workflow.add_edge("call_tool", "reason")AutoGen (Microsoft)
Multi-agent framework where multiple specialized agents converse to solve tasks. A “planner” agent assigns subtasks to “executor” agents with different capabilities.
CrewAI
High-level multi-agent framework focused on “crew” metaphor — roles, goals, backstories for each agent. Good for automated research, report generation pipelines.
Claude’s Native Tool Use
Anthropic’s Claude has strong native tool use without frameworks. For simple single-agent workflows, the raw API with well-designed tools often outperforms framework overhead.
Common Agent Failure Modes
Infinite Loops
The agent keeps retrying a failing tool call without recognizing the pattern. Fix: maximum step limits, loop detection.
Tool Misuse
The agent calls a write tool when it should read, or uses the wrong API endpoint. Fix: clear tool descriptions, validation on tool outputs.
Context Overload
After 10+ tool calls, the context fills with tool outputs. The model loses track of the original goal. Fix: summarize tool outputs before adding to context, truncate verbose outputs.
Hallucinated Tool Calls
The model fabricates tool results instead of actually calling the tool. Fix: use frameworks that enforce actual tool execution, not just format validation.
Sycophantic Planning
The agent convinces itself early steps worked even when they produced errors, then builds on a faulty foundation. Fix: explicit output validation, mandatory error handling.
Agent Reliability in Practice
Here’s an honest assessment: autonomous agents fail more than demos suggest. Getting a ReAct agent to solve a 5-step task reliably in production requires:
- Retry logic with exponential backoff
- Human-in-the-loop checkpoints for irreversible actions
- Comprehensive logging of every thought and action
- Hard limits on max steps and token budget
- Fallback to human if the task can’t be completed
The “fully autonomous AI agent” is a goal, not today’s reality for high-stakes tasks. The sweet spot for 2026 production agents: well-defined, bounded tasks with limited tool sets and human approval gates for consequential actions.
The Agent Trend in 2026
The shift happening across the industry: AI products are moving from “chatbots that respond” to “agents that complete tasks.” Examples:
- Coding agents: Cursor, Devin, GitHub Copilot Workspace — create full features from specifications
- Research agents: Perplexity’s Deep Research, OpenAI’s Deep Research — 30-minute autonomous internet research sessions
- Workflow agents: Anthropic Claude’s computer use, OpenAI Operator — control browsers and applications
- DevOps agents: Automatically investigate alerts, roll back deployments, file PRs
Each of these is a specialized agent with a carefully scoped set of tools, not a general-purpose autonomous AI. That scoping is what makes them reliable enough to ship.