Building Your First Agent
From Theory to Practice
In previous chapters, we covered the core concepts — reasoning loops, tool use, architecture patterns, memory management. Now let's put it all together and build a working agent from scratch.
We'll build a file analysis agent: give it a directory path, and it will autonomously browse files, analyze code structure, and answer your questions about the project.
Minimal Agent Loop Implementation
The core of an agent is a while loop. Here's the minimal implementation with Python + Anthropic SDK:
```python
import anthropic
import json
import os

client = anthropic.Anthropic()

# Define tools
tools = [
    {
        "name": "list_directory",
        "description": "List files and subdirectories in a directory",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Directory path"}
            },
            "required": ["path"]
        }
    },
    {
        "name": "read_file",
        "description": "Read file contents. Works with text files like code, config files, docs, etc.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "File path"},
                "limit": {"type": "integer", "description": "Max lines to read, default 100"}
            },
            "required": ["path"]
        }
    },
    {
        "name": "search_files",
        "description": "Search for files containing specified text in a directory",
        "input_schema": {
            "type": "object",
            "properties": {
                "directory": {"type": "string", "description": "Directory to search"},
                "pattern": {"type": "string", "description": "Text pattern to search for"}
            },
            "required": ["directory", "pattern"]
        }
    }
]

# Tool execution
def execute_tool(name, params):
    if name == "list_directory":
        path = params["path"]
        try:
            entries = os.listdir(path)
            return json.dumps(entries[:50])  # Limit count
        except Exception as e:
            return json.dumps({"error": str(e)})
    elif name == "read_file":
        try:
            with open(params["path"], "r") as f:
                lines = f.readlines()[:params.get("limit", 100)]
            return "".join(lines)
        except Exception as e:
            return json.dumps({"error": str(e)})
    elif name == "search_files":
        results = []
        for root, dirs, files in os.walk(params["directory"]):
            for file in files:
                filepath = os.path.join(root, file)
                try:
                    with open(filepath, "r") as f:
                        for i, line in enumerate(f, 1):
                            if params["pattern"] in line:
                                results.append(f"{filepath}:{i}: {line.strip()}")
                except (OSError, UnicodeDecodeError):
                    continue  # Skip unreadable or binary files
        return json.dumps(results[:20])  # Limit results
    return json.dumps({"error": f"Unknown tool: {name}"})

# Agent main loop
def run_agent(user_message):
    messages = [{"role": "user", "content": user_message}]
    system = "You are a file analysis assistant. Use the provided tools to browse and analyze project files, answering the user's questions."
    max_iterations = 20
    iteration = 0
    while iteration < max_iterations:
        iteration += 1
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            system=system,
            tools=tools,
            messages=messages
        )
        # Add model's response to message history
        messages.append({"role": "assistant", "content": response.content})
        # If model stops calling tools, task is done
        if response.stop_reason == "end_turn":
            for block in response.content:
                if hasattr(block, "text"):
                    return block.text
            return "Task complete."
        # Handle tool calls
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result
                })
        messages.append({"role": "user", "content": tool_results})
    return "Maximum iterations reached. Task incomplete."

# Usage
answer = run_agent("Analyze the project structure of ./my-project and tell me what kind of project this is")
print(answer)
```
These ~100 lines of code form a complete agent. The core is the while loop: continuously letting the model think and act until it decides to give a final answer or hits the iteration limit.
Key Design Decisions
Tool Granularity
The example above has three tools: list directory, read file, search. This granularity is intentional:
- Too few (a single "analyze_project" tool): the model loses flexibility, can't explore on demand
- Too many (20 fine-grained tools): the model picks wrong tools, context gets filled with tool definitions
- Just right (3-8): the model can clearly distinguish each tool's purpose
A practical rule: start with 3-5 core tools, then adjust based on actual model performance.
Tool Descriptions
Compare these two descriptions:
❌ "Read a file"
✅ "Read file contents. Works with text files like code, config files, docs, etc."
Good descriptions tell the model three things: what it does, when to use it, when not to use it. Agent-facing tool descriptions need to be more specific than human-facing ones.
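As a sketch, the what / when / when-not pattern can be made explicit with a small helper. The make_tool function and the exact wording below are illustrative assumptions, not part of any SDK:

```python
def make_tool(name, what, when, when_not, schema):
    """Assemble a tool definition whose description covers all three parts:
    what it does, when to use it, and when not to use it."""
    return {
        "name": name,
        "description": f"{what} {when} {when_not}",
        "input_schema": schema,
    }

read_file_tool = make_tool(
    name="read_file",
    what="Read the contents of a text file.",
    when="Use for source code, config files, and docs.",
    when_not="Do not use for binary files or directories; use list_directory for those.",
    schema={
        "type": "object",
        "properties": {"path": {"type": "string", "description": "File path"}},
        "required": ["path"],
    },
)
```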
Error Handling
Notice that every tool execution in the code above is wrapped in try/except and returns a clear error message. This is because:
- Agents will encounter errors (path doesn't exist, insufficient permissions, file too large)
- Clear error messages enable self-correction
- If tools throw unhandled exceptions that crash the program, the agent loop breaks
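A minimal sketch of the self-correction point: a read tool that returns a structured error plus a hint the model can act on in its next step. The hint wording is an assumption, not a fixed convention:

```python
import json
import os

def read_file_safely(path, limit=100):
    """Return file contents, or a structured error the model can act on."""
    if not os.path.exists(path):
        # Include a hint so the model can self-correct on the next step
        return json.dumps({
            "error": f"File not found: {path}",
            "hint": "Check the path with list_directory before reading.",
        })
    try:
        with open(path, "r") as f:
            return "".join(f.readlines()[:limit])
    except UnicodeDecodeError:
        return json.dumps({
            "error": f"{path} is not a text file",
            "hint": "This tool only reads text files.",
        })
    except OSError as e:
        return json.dumps({"error": str(e)})
```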
Safety Considerations
Letting AI execute actions autonomously makes safety the top concern.
Principle of Least Privilege
Only give the agent capabilities it needs — nothing more. The example above can only read files and search — it can't write files, execute commands, or access the network.
```python
# Dangerous: giving the agent arbitrary command execution
def execute_command(cmd):
    return os.popen(cmd).read()  # Never do this

# Safe: read-only operations within a specific directory
def read_file(path):
    # Verify path is within allowed directory
    allowed_dir = "/home/user/projects"
    real_path = os.path.realpath(path)
    # Appending os.sep blocks prefix tricks like /home/user/projects-evil
    if not real_path.startswith(allowed_dir + os.sep):
        return {"error": "Path is outside allowed directory"}
    # ... read file
```
Path Validation
Agents might construct unexpected paths — like ../../etc/passwd. Always validate that paths are within allowed boundaries.
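One way to sketch such a check is with os.path.commonpath, which is more robust than a bare string-prefix test (a plain startswith would wrongly accept /home/user/projects-evil when the allowed directory is /home/user/projects):

```python
import os

def is_within(base_dir, path):
    """Return True if path resolves to a location inside base_dir.

    os.path.realpath normalizes ".." segments and resolves symlinks,
    so traversal attempts like ../../etc/passwd are caught.
    """
    base = os.path.realpath(base_dir)
    target = os.path.realpath(path)
    return os.path.commonpath([base, target]) == base
```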
Operation Confirmation
For operations with side effects (writing files, sending requests, deleting data), add a human confirmation layer:
```python
def execute_tool_with_confirmation(name, params):
    if name in ["write_file", "delete_file", "send_request"]:
        print(f"Agent wants to execute: {name}({params})")
        confirm = input("Allow? (y/n): ")
        if confirm != "y":
            return {"error": "User denied this operation"}
    return execute_tool(name, params)
```
Resource Limits
Prevent agents from running out of control:
- Maximum iterations: max_iterations = 20 in the code above
- Token budget: Limit maximum token consumption per task
- Timeouts: Tasks running too long should be terminated
- File size limits: Avoid reading huge files that blow up the context
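As one way to enforce a token budget, here is a sketch that accumulates usage across API calls. The 200_000 limit is an arbitrary example; response.usage.input_tokens and output_tokens are the usage fields returned by the Anthropic SDK:

```python
class TokenBudget:
    """Track cumulative token usage against a per-task limit (example sketch)."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def charge(self, input_tokens, output_tokens):
        """Record usage; return False once the budget is exhausted."""
        self.used += input_tokens + output_tokens
        return self.used <= self.limit

budget = TokenBudget(limit=200_000)

# Inside the agent loop, after each API call:
# if not budget.charge(response.usage.input_tokens, response.usage.output_tokens):
#     return "Token budget exhausted. Task incomplete."
```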
Debugging and Evaluation
Agents are harder to debug than regular LLM calls — behavior is non-deterministic, and the same input might take completely different paths.
Logging Is Your Most Important Tool
Record every agent step: what it thought, what tools it called, what results it got.
```python
# Add logging to the agent loop
for block in response.content:
    if hasattr(block, "text"):
        print(f"[Thought] {block.text}")
    if block.type == "tool_use":
        print(f"[Tool] {block.name}({json.dumps(block.input)})")
```
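Printing works during development; for replaying a run afterwards it helps to persist each step. A sketch that appends steps as JSON lines, where the file name and entry fields are assumptions:

```python
import json
import time

def log_step(logfile, step_type, payload):
    """Append one agent step as a JSON line for later replay and debugging."""
    entry = {"ts": time.time(), "type": step_type, **payload}
    with open(logfile, "a") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")

# Inside the agent loop:
# log_step("agent.jsonl", "thought", {"text": block.text})
# log_step("agent.jsonl", "tool_call", {"name": block.name, "input": block.input})
```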
Evaluation Criteria
How do you know if the agent is working correctly?
- Task completion rate: Given a set of test tasks, what percentage are completed successfully?
- Step efficiency: How many steps did it take? Were there redundant steps?
- Error recovery: Did it successfully recover when encountering errors?
- Cost: How many tokens were consumed per task?
Don't aim for 100% success rate. The value of an agent is handling most situations autonomously, with humans as the fallback for the rest.
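These criteria can be measured with a small harness. A sketch, where run_fn and check_fn are placeholders for your own agent runner and a task-specific success check:

```python
def evaluate(tasks, run_fn, check_fn):
    """Run each task, record success and step count, return summary metrics."""
    results = []
    for task in tasks:
        answer, steps = run_fn(task)
        results.append({"task": task, "ok": check_fn(task, answer), "steps": steps})
    completed = [r for r in results if r["ok"]]
    return {
        "completion_rate": len(completed) / len(results),
        # Average steps over successful runs only
        "avg_steps": sum(r["steps"] for r in completed) / max(len(completed), 1),
    }
```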
Popular Agent Frameworks
We built an agent from scratch to understand the principles. In real projects, you'll likely use existing frameworks:
Anthropic Agent SDK: Official from Anthropic, lightweight, provides the agent loop, tool registration, multi-agent orchestration (handoffs), and other foundational capabilities. Best suited for scenarios using Claude models directly.
LangGraph: Part of the LangChain ecosystem, uses graphs to define agent workflows. Suited for scenarios requiring complex flow control. Flexible but has a steep learning curve.
CrewAI: Focuses on multi-agent collaboration, organizing agent teams with "roles" and "tasks." Good for simulating team collaboration scenarios.
Mastra: A TypeScript-ecosystem agent framework with built-in workflow engine, RAG, evaluation, and more. Good fit for Node.js stack projects.
The principle for choosing a framework is the same as choosing a model: understand your needs first, then pick the tool. If your scenario can be implemented in 100 lines of code, you don't need a framework.
Key Takeaways
- An agent's core is a while loop — call the model, execute tools, feed results back, repeat until done. ~100 lines of code can implement a working agent.
- Tool design determines the agent's ceiling: 3-8 core tools, clear descriptions, meaningful error messages.
- Safety is the top priority: least privilege, path validation, operation confirmation, resource limits. Autonomous AI execution ≠ unrestricted AI execution.
- Logging and evaluation are essential: record every step, measure agent quality by task completion rate and step efficiency.
- Frameworks are optional — understand the principles first, then choose a framework. Simple scenarios don't need one.