What is an AI Agent
From Tool to Autonomous Executor
You've probably used ChatGPT or Claude — type a question, get an answer. That's the most basic way to use an LLM: human asks, model answers, one turn at a time.
But what if you want AI to complete a real task? Something like:
"Find the cause of this bug, locate the relevant code, write a fix, and run the tests to make sure nothing broke."
That's not a single question — it's a series of steps requiring autonomous decision-making. The model needs to figure out what to do, which tools to use, and what to do next based on results.
That's an Agent — an AI system that can autonomously plan and execute tasks.
The Core Agent Loop
An agent works in a loop:
Perceive → Reason → Act → Observe → Reason → Act → ... → Done
Let's walk through a concrete example. You tell a coding agent: "Find all unused dependencies in the project and remove them."
- Reason: I need to check what dependencies are in package.json
- Act: Read package.json
- Observe: The file lists 15 dependencies
- Reason: I need to search the code for references to each dependency
- Act: Search src/ for "lodash" references
- Observe: No references found
- Reason: lodash is unused, continue checking the next one...
- ...(multiple rounds)
- Act: Run `npm uninstall lodash moment` to remove unused dependencies
- Act: Run tests to confirm nothing broke
The key insight: every decision is made by the model itself. You didn't write code telling it "first read the file, then search" — it planned that workflow on its own.
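The loop above can be sketched in a few lines of Python. This is a minimal, runnable illustration, not a real implementation: `fake_model` is a hypothetical stand-in that scripts the decisions an actual LLM would make dynamically, and `run_tool` returns canned data instead of touching the filesystem.

```python
def fake_model(messages):
    # Reason: decide the next step from the conversation so far.
    # A real agent would call an LLM API here instead of scripting this.
    if not any(m["role"] == "tool" for m in messages):
        return {"action": "read_file", "args": {"path": "package.json"}}
    return {"done": "lodash is unused; removed it and tests pass"}

def run_tool(action, args):
    # Act: a real runtime would read files, run shell commands, etc.
    tools = {"read_file": lambda path: '{"dependencies": {"lodash": "^4"}}'}
    return tools[action](**args)

def agent_loop(task, model, max_steps=20):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):           # cap steps so the loop always terminates
        decision = model(messages)       # Reason: the model picks the next step
        if "done" in decision:
            return decision["done"]      # the model judged the task complete
        result = run_tool(decision["action"], decision["args"])   # Act
        messages.append({"role": "tool", "content": result})      # Observe
    raise RuntimeError("step budget exhausted")
```

Note that the control flow is generic: the host program only runs the loop; every decision about *what* to do next comes from the model.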
Agent vs Chatbot vs Workflow
These three concepts are often confused, but they're fundamentally different:
Chatbot
User: What's wrong with this code?
AI: There's a null pointer exception. I'd suggest adding a null check.
User: Can you fix it?
AI: Here's the modified code...
Humans drive every step. Without your input, it does nothing.
Workflow
Trigger: New Pull Request received
Step 1: Run code style check → Pass
Step 2: Run unit tests → Pass
Step 3: Generate code review comments
Step 4: Send notification
A predefined, fixed process. What happens at each step is hardcoded — no dynamic decision-making involved.
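The contrast is visible in code: a workflow is just hardcoded control flow. In the sketch below (the step functions are hypothetical placeholders), the sequence never changes no matter what each step returns — there is no model deciding anything.

```python
# Each step is a fixed function; the order is baked into the code.
def run_style_check(pr): return True        # placeholder checks
def run_unit_tests(pr): return True
def generate_review(pr): return "LGTM"
def notify(msg): return f"sent: {msg}"

def on_pull_request(pr):
    if not run_style_check(pr):             # Step 1: always first
        return notify("style check failed")
    if not run_unit_tests(pr):              # Step 2: always second
        return notify("tests failed")
    review = generate_review(pr)            # Step 3
    return notify(review)                   # Step 4
```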
Agent
User: Fix this CI failure for me
Agent thinks: Let me look at the CI error logs
Agent acts: Fetch CI logs → finds test failure
Agent thinks: Let me check the failing test case
Agent acts: Read test file → API response format changed
Agent thinks: Need to check the API changes
Agent acts: View recent API commits
Agent thinks: Got it — API response added a nesting level, tests need updating
Agent acts: Modify test code
Agent acts: Re-run tests → all passing
The AI decides each step. It dynamically adjusts based on what it observes, rather than following a pre-written script.
The core distinction:
| | Chatbot | Workflow | Agent |
|---|---|---|---|
| Who decides | Human | Predefined code | AI model |
| Flexibility | High (but needs human) | Low (fixed process) | High (autonomous) |
| Task type | Single-turn Q&A | Fixed-process tasks | Open-ended complex tasks |
| Predictability | Depends on model | High | Low |
What Makes Agents Possible
Agents didn't appear out of nowhere. They depend on several capabilities maturing:
1. Tool Use
LLMs can only generate text by themselves. But if you tell them "you can call these tools," they can:
- Read and write files
- Execute code
- Search the web
- Call APIs
- Query databases
Tools transform LLMs from "can only talk" to "can actually do." This is the foundational capability for agents — we'll cover it in detail in the next chapter.
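In practice, "telling the model it can call tools" means handing it a list of tool descriptions and having it reply with a structured call instead of prose. The shape below is illustrative, not any specific vendor's API; the host program executes the call and feeds the result back into the loop.

```python
# Tool descriptions the model sees (names and schema are illustrative).
tools = [
    {"name": "read_file",
     "description": "Read a file from the project",
     "parameters": {"path": "string"}},
    {"name": "run_command",
     "description": "Execute a shell command and return its output",
     "parameters": {"cmd": "string"}},
]

# Instead of free text, the model responds with structured data like this:
model_output = {"tool": "read_file", "arguments": {"path": "package.json"}}

def dispatch(call, handlers):
    # The host program, not the model, actually executes the tool call.
    return handlers[call["tool"]](**call["arguments"])

result = dispatch(model_output, {"read_file": lambda path: f"<contents of {path}>"})
```

The division of labor matters: the model only *chooses* calls; the host program runs them and controls what the model can reach.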
2. Strong Enough Reasoning
An agent needs the model to:
- Break complex tasks into sub-steps
- Adjust plans based on tool results
- Know when a task is complete
- Recognize and correct its own mistakes
This requires strong reasoning ability, which is why agents only became practical with GPT-4-class models. Smaller models tend to lose their way in multi-step reasoning.
3. Long Enough Context Windows
A complex task might involve a dozen tool calls, each producing results. All that intermediate information needs to stay in context so the model can make correct next-step decisions.
Early models had 4K-8K token context windows, severely limiting what agents could do. Today's 100K-1M token windows let agents handle far more complex tasks.
Real-World Agents
Agents are not just a concept; they're shipping products:
Coding Agents
- Claude Code: Autonomously reads code, writes code, runs tests, and fixes bugs in the terminal
- Cursor / Windsurf: Understands entire projects in the IDE, autonomously makes code changes
- GitHub Copilot Agent: Automatically handles Issues and submits Pull Requests
General-Purpose Agents
- Computer Use: Directly operates computer screens — clicking, typing, taking screenshots, using software like a human
- Deep Research: Given a research topic, autonomously searches, reads, synthesizes information, and produces a full report
Specialized Agents
- Customer Service Agents: Understand user issues, query knowledge bases, escalate to humans when needed
- Data Analysis Agents: Given an analysis request, autonomously write SQL, generate charts, and draw conclusions
What these products share: you provide a goal, and the agent figures out how to achieve it — no step-by-step guidance needed.
Agent Limitations
Agents are powerful but far from perfect:
Unpredictability: The same task may lead to completely different execution paths. This is problematic for scenarios requiring consistency.
Error Accumulation: Each step can go wrong, and more steps mean higher overall failure probability. A 10-step task with 95% accuracy per step yields only ~60% overall success rate.
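The compounding math is simple: if each step succeeds independently with probability p, an n-step task succeeds with probability p^n.

```python
def overall_success(p_step: float, n_steps: int) -> float:
    # Independent per-step success probabilities multiply.
    return p_step ** n_steps

print(round(overall_success(0.95, 10), 2))  # → 0.6
```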
Cost: Agents make multiple model calls and tool invocations, consuming far more tokens than a single conversation turn. A complex task might cost several dollars or more.
Safety: Letting AI execute actions autonomously means it might do things you didn't expect — deleting files, sending requests, modifying data. Controlling the agent's permission boundaries is a critical engineering challenge.
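One common way to draw that permission boundary is to gate every proposed action through a policy: read-only actions run freely, destructive ones require human approval, and anything unknown is denied. The sketch below is illustrative, not any specific product's design.

```python
# Illustrative action policy: allowlist for safe ops, approval for risky ops.
SAFE_ACTIONS = {"read_file", "search", "list_dir"}
NEEDS_APPROVAL = {"write_file", "run_command", "delete_file"}

def authorize(action: str, approver) -> bool:
    if action in SAFE_ACTIONS:
        return True                 # read-only actions run without asking
    if action in NEEDS_APPROVAL:
        return approver(action)     # destructive actions ask a human first
    return False                    # default deny for anything unrecognized
```

Defaulting to "deny" for unrecognized actions is the key design choice: the agent can only do what the host program explicitly permits.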
Key Takeaways
- Agent = LLM + Tools + Autonomous Decision Loop. It doesn't just answer questions — it plans and executes tasks on its own.
- The fundamental difference between Agents and Chatbots/Workflows is who makes decisions. Agents use AI models for dynamic decision-making, not humans or predefined code.
- Three key technologies make agents viable: tool use, strong reasoning, and long context windows.
- Agents are already deployed in coding, research, customer service, and more, but still face challenges in predictability, error accumulation, cost, and safety.
- This is a directional shift in AI applications — from "AI as a tool" to "AI as an executor." Understanding agents is key to understanding the next generation of AI applications.