AI Agents

An AI agent is a system that uses a language model not just to respond to a single prompt, but to autonomously plan a course of action, execute steps, observe results, and iterate — all to accomplish a goal.

This is qualitatively different from a chatbot. A chatbot responds. An agent acts.

What Makes Something an Agent?

Three properties distinguish an AI agent from a simple LLM call:

Tool use: The agent can take actions in the world — search the web, run code, call APIs, read/write files, control browsers
Planning: The agent breaks goals into sub-tasks and sequences them
Iteration: The agent observes the results of its actions and adjusts its plan accordingly

Goal: "Research the top 5 AI papers from this month and write a summary"

Simple LLM: Generates a response based on training data (possibly outdated)

Agent:
  Step 1: Search arxiv.org for recent AI papers → observe 20 results
  Step 2: Filter to most cited in last 30 days → observe 5 papers
  Step 3: Retrieve and read each paper abstract → observe content
  Step 4: Synthesize findings → generate summary
  Step 5: Review summary quality → done

The ReAct Pattern: Reason + Act

The most foundational agentic prompting pattern is ReAct (Reasoning + Acting). The agent alternates between thinking about what to do and actually doing it.

Thought: I need to find the current stock price of Apple. Let me search for it.
Action: search("AAPL stock price today")
Observation: AAPL is trading at $191.45 as of 2:30 PM EST

Thought: Now I need the price-to-earnings ratio. I'll calculate it using the
         reported EPS.
Action: search("Apple Inc. trailing EPS 2024")
Observation: Apple trailing twelve months EPS: $6.43

Thought: P/E = Price / EPS = 191.45 / 6.43 = 29.77
Action: calculate(191.45 / 6.43)
Observation: 29.77

Thought: I have both values. I can now answer the question.
Final Answer: Apple's current P/E ratio is approximately 29.8, based on a stock
              price of $191.45 and trailing EPS of $6.43.

The interleaved reasoning and action pattern is more reliable than trying to plan everything upfront, because the model can adapt based on what it observes.

Agent Memory: Four Types

Effective agents need memory at different timescales:

┌──────────────────────────────────────────────────────┐
│                  Agent Memory Types                   │
├──────────────────────────────────────────────────────┤
│ IN-CONTEXT (Working)                                 │
│ Current conversation + tool outputs in the context  │
│ window. Volatile — cleared when session ends.        │
├──────────────────────────────────────────────────────┤
│ EXTERNAL (Episodic + Semantic)                       │
│ Vector DB / key-value store. Past conversations,     │
│ documents, user preferences. Survives sessions.      │
├──────────────────────────────────────────────────────┤
│ PROCEDURAL                                           │
│ System prompts and fine-tuning encode how to        │
│ behave. "Baked in" during training/prompting.        │
└──────────────────────────────────────────────────────┘

Most current agent implementations use in-context memory (the conversation history) plus an external memory layer (vector DB for long-term recall).

Tool Design: The Interface Between Agent and World

The tools you give an agent define what it can do. Tool design is critical — a poorly designed tool causes more harm than no tool.

# Good tool design example (using OpenAI / Anthropic function calling)
tools = [
    {
        "name": "search_codebase",
        "description": "Search the codebase for files, functions, or patterns. "
                       "Use this to find where specific functionality is implemented. "
                       "Returns file paths and line numbers.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query — can be a function name, error message, or keyword"
                },
                "file_type": {
                    "type": "string",
                    "enum": ["python", "javascript", "all"],
                    "description": "Filter to specific file types"
                }
            },
            "required": ["query"]
        }
    }
]

Tool design principles:

Clear description of what the tool does AND when to use it
Well-typed inputs with descriptions for each parameter
Idempotent where possible (safe to call multiple times)
Returns structured, parseable outputs
Has reasonable rate limits and timeouts
Fails gracefully with useful error messages

Agentic Frameworks (2025–2026)

Several frameworks simplify building multi-step agents:

LangGraph (LangChain)

Defines agents as graphs of nodes and edges. Each node is an LLM call or tool execution. Edges are conditional logic. Excellent for complex, branching workflows.

from langgraph.graph import StateGraph, END

workflow = StateGraph(AgentState)
workflow.add_node("reason", agent_reasoning_node)
workflow.add_node("call_tool", tool_execution_node)
workflow.add_conditional_edges("reason", should_continue,
                                {"continue": "call_tool", "end": END})
workflow.add_edge("call_tool", "reason")

AutoGen (Microsoft)

Multi-agent framework where multiple specialized agents converse to solve tasks. A “planner” agent assigns subtasks to “executor” agents with different capabilities.

CrewAI

High-level multi-agent framework focused on “crew” metaphor — roles, goals, backstories for each agent. Good for automated research, report generation pipelines.

Claude’s Native Tool Use

Anthropic’s Claude has strong native tool use without frameworks. For simple single-agent workflows, the raw API with well-designed tools often outperforms framework overhead.

Common Agent Failure Modes

Infinite Loops

The agent keeps retrying a failing tool call without recognizing the pattern. Fix: maximum step limits, loop detection.

Tool Misuse

The agent calls a write tool when it should read, or uses the wrong API endpoint. Fix: clear tool descriptions, validation on tool outputs.

Context Overload

After 10+ tool calls, the context fills with tool outputs. The model loses track of the original goal. Fix: summarize tool outputs before adding to context, truncate verbose outputs.

Hallucinated Tool Calls

The model fabricates tool results instead of actually calling the tool. Fix: use frameworks that enforce actual tool execution, not just format validation.

Sycophantic Planning

The agent convinces itself early steps worked even when they produced errors, then builds on a faulty foundation. Fix: explicit output validation, mandatory error handling.

Agent Reliability in Practice

Here’s an honest assessment: autonomous agents fail more than demos suggest. Getting a ReAct agent to solve a 5-step task reliably in production requires:

Retry logic with exponential backoff
Human-in-the-loop checkpoints for irreversible actions
Comprehensive logging of every thought and action
Hard limits on max steps and token budget
Fallback to human if the task can’t be completed

The “fully autonomous AI agent” is a goal, not today’s reality for high-stakes tasks. The sweet spot for 2026 production agents: well-defined, bounded tasks with limited tool sets and human approval gates for consequential actions.

The Agent Trend in 2026

The shift happening across the industry: AI products are moving from “chatbots that respond” to “agents that complete tasks.” Examples:

Coding agents: Cursor, Devin, GitHub Copilot Workspace — create full features from specifications
Research agents: Perplexity’s Deep Research, OpenAI’s Deep Research — 30-minute autonomous internet research sessions
Workflow agents: Anthropic Claude’s computer use, OpenAI Operator — control browsers and applications
DevOps agents: Automatically investigate alerts, roll back deployments, file PRs

Each of these is a specialized agent with a carefully scoped set of tools, not a general-purpose autonomous AI. That scoping is what makes them reliable enough to ship.