Agent Building
Build autonomous AI agents and pipelines
AI agents are autonomous systems that perceive their environment, reason about goals, and take actions using tools to accomplish tasks. Modern agent architectures combine large language models with function calling, memory systems, and orchestration frameworks to enable multi-step problem solving. Understanding agent design patterns, from simple tool-using loops to sophisticated multi-agent systems, is essential for building reliable, production-grade AI applications.
Key Concepts
Agentic Loops and the ReAct Pattern
The core of modern AI agents is the agentic loop: a cycle of reasoning, acting, and observing. The ReAct (Reasoning + Acting) pattern interleaves thoughts (internal reasoning) with actions (tool calls) and observations (tool outputs). This pattern enables agents to break down complex tasks, make decisions dynamically, and recover from errors. The loop continues until the agent determines the task is complete or reaches a maximum iteration limit. Key considerations include preventing infinite loops, managing token budgets, and ensuring the agent has sufficient context to make good decisions at each step.
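The loop described above can be sketched in a few lines of Python. The `call_llm` and `run_tool` helpers below are stand-ins for a real model API and tool runtime; this stub version finishes immediately so the sketch is self-contained.

```python
def call_llm(messages):
    # Stub: a real implementation would call an LLM API and return either
    # a tool-call request or a final answer. Here we finish immediately.
    return {"type": "final", "content": "done"}

def run_tool(name, args):
    # Stub tool executor standing in for real tool integrations.
    return f"ran {name} with {args}"

def agent_loop(task, max_iterations=10):
    """ReAct-style loop: reason, act, observe, repeat until done or capped."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_iterations):
        step = call_llm(messages)            # THOUGHT (+ possible ACTION)
        if step["type"] == "final":          # agent decided the task is done
            return step["content"]
        observation = run_tool(step["tool"], step["args"])   # OBSERVATION
        messages.append({"role": "tool", "content": observation})
    return "Stopped: iteration limit reached"
```

Note the hard iteration cap: even with a well-behaved model, the loop must have a termination guarantee that does not depend on the model deciding to stop.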
Function Calling and Tool Use
Function calling allows LLMs to invoke external tools through structured outputs. Modern APIs (OpenAI, Anthropic, Google) support native function calling where the model outputs a function name and arguments in a structured format. Tools are defined with JSON schemas specifying their name, description, and parameter types. The agent runtime parses the function call, executes it, and returns the result as an observation. Best practices include writing clear tool descriptions, validating inputs before execution, handling errors gracefully, and limiting the number of available tools to prevent choice paralysis. Tools can be anything from simple calculators to complex API integrations, database queries, or file system operations.
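A tool definition in the JSON-schema style used by native function-calling APIs might look like the following. The `lookup_weather` tool and its fields are illustrative; exact field names vary by provider, so check your API's documentation. The validator shows the "validate inputs before execution" practice.

```python
# Hypothetical tool definition in the common JSON-schema shape.
lookup_weather = {
    "name": "lookup_weather",
    "description": "Get current weather for a city. Use only for weather questions.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Paris'"},
            "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

def validate_args(schema, args):
    """Reject a model-proposed call that is missing required parameters."""
    missing = [p for p in schema["parameters"]["required"] if p not in args]
    return (len(missing) == 0, missing)
```

The runtime would call `validate_args` on the parsed function call before executing the tool, returning the error message as an observation so the model can correct itself.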
Multi-Agent Orchestration
Multi-agent systems coordinate multiple specialized agents to solve complex problems. Common patterns include: (1) Hierarchical: a supervisor agent delegates tasks to worker agents; (2) Sequential: agents process in a pipeline where each agent's output feeds the next; (3) Parallel: multiple agents work independently on subtasks and the results are aggregated; (4) Debate: agents with different perspectives argue to reach consensus. Frameworks like LangChain's LangGraph, CrewAI, AutoGen, and Anthropic's Claude Agent SDK provide abstractions for building these systems. Key challenges include managing inter-agent communication, handling conflicting outputs, and ensuring the overall system remains debuggable. Each agent should have a clear role and expertise domain.
Memory Systems: Short-term and Long-term
Agents require memory to maintain context across conversations and tasks. Short-term memory (working memory) holds recent conversation history and is typically managed through the message array passed to the LLM. Long-term memory persists information across sessions using vector databases (Pinecone, Chroma, pgvector) for semantic search. Memory architectures include: (1) Episodic: records of past experiences and actions; (2) Semantic: facts and knowledge extracted from interactions; (3) Procedural: learned skills and successful action sequences. Effective memory systems balance retrieval relevance with token efficiency, using techniques like summarization, chunking, and relevance scoring to surface only the most useful context.
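Relevance scoring with a recency bias can be illustrated with a toy store. The keyword-overlap similarity below is a stand-in for embedding similarity; a production system would use a vector database instead of an in-memory list.

```python
import time

class MemoryStore:
    """Toy long-term memory with recency-weighted keyword scoring."""

    def __init__(self):
        self.records = []  # list of (timestamp, text)

    def store(self, text):
        self.records.append((time.time(), text))

    def retrieve(self, query, k=3, half_life=86400.0):
        now = time.time()

        def score(record):
            ts, text = record
            # Word overlap stands in for embedding cosine similarity.
            overlap = len(set(query.lower().split()) & set(text.lower().split()))
            # Exponential decay: a memory half_life seconds old scores half.
            recency = 0.5 ** ((now - ts) / half_life)
            return overlap * recency

        ranked = sorted(self.records, key=score, reverse=True)
        return [text for _, text in ranked[:k]]
```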
Planning and Task Decomposition
Planning enables agents to reason about future actions before executing them. Approaches include: (1) Plan-and-Execute: first generate a complete plan, then execute step by step; (2) ReAct-style interleaving: plan one step at a time based on observations; (3) Tree-of-Thought: explore multiple reasoning paths and select the best; (4) Self-Reflection: after execution, evaluate results and potentially revise the plan. Planning is especially important for complex, multi-step tasks where naive execution might lead to dead ends. The agent should be able to replan when unexpected observations occur. Planning can also involve decomposing high-level goals into concrete, actionable subtasks.
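A minimal Plan-and-Execute skeleton with replanning might look like this. Both `plan_task` and `do_step` are stubs standing in for LLM calls; the control flow (execute the plan, replan on an unexpected result, degrade gracefully when replans are exhausted) is the point.

```python
def plan_task(goal):
    # Stub planner; a real one would prompt an LLM for a step list.
    return [f"step 1 for {goal}", f"step 2 for {goal}"]

def do_step(step):
    # Stub executor; returns whether the step succeeded and its result.
    return {"ok": True, "result": f"did {step}"}

def plan_and_execute(goal, max_replans=1):
    plan, results, replans, i = plan_task(goal), [], 0, 0
    while i < len(plan):
        outcome = do_step(plan[i])
        if outcome["ok"]:
            results.append(outcome["result"])
            i += 1
        elif replans < max_replans:
            # Unexpected observation: discard progress and make a fresh plan.
            plan, results, i, replans = plan_task(goal), [], 0, replans + 1
        else:
            break  # graceful degradation: return partial results
    return results
```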
RAG Agents and Knowledge Integration
RAG (Retrieval-Augmented Generation) agents combine information retrieval with agentic capabilities. Unlike basic RAG, which does a single retrieval, RAG agents can iteratively search, evaluate results, and retrieve more information as needed. Key components: (1) Query rewriting: transform user queries for better retrieval; (2) Multi-hop retrieval: follow references to find related documents; (3) Source verification: cross-check information across multiple sources; (4) Citation tracking: maintain provenance for generated claims. Vector databases store document embeddings, and the agent retrieves relevant chunks based on semantic similarity. Advanced RAG agents use tools like web search, database queries, and API calls to supplement their knowledge base.
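The iterative search-evaluate-retrieve cycle can be sketched as a loop with a sufficiency check and a hop limit. The `search`, `rewrite_query`, and `is_sufficient` functions below are stand-ins; in practice the rewrite and sufficiency judgment would themselves be LLM calls.

```python
def search(query):
    # Stub retriever standing in for a vector-store or web-search call.
    return [f"doc about {query}"]

def rewrite_query(query, gathered):
    # Stub query rewriter; a real one would prompt an LLM with the
    # original query plus what has been gathered so far.
    return query + " details"

def is_sufficient(gathered, needed=2):
    # Stub sufficiency check; a real one might use embedding coverage
    # of the question or an LLM self-assessment.
    return len(gathered) >= needed

def agentic_rag(query, max_hops=5):
    gathered = []
    for _ in range(max_hops):          # hop limit stops endless searching
        gathered.extend(search(query))
        if is_sufficient(gathered):
            break
        query = rewrite_query(query, gathered)
    return gathered
```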
Error Recovery and Robustness
Production agents must handle failures gracefully. Error sources include: tool execution failures, API rate limits, model hallucinations, and unexpected user inputs. Recovery strategies: (1) Retry with exponential backoff: transient failures often resolve; (2) Alternative tools: keep backup tools for critical functions; (3) Graceful degradation: complete partial work when full completion isn't possible; (4) Human escalation: detect when the agent is stuck and request human intervention; (5) Self-correction: include reflection steps where the agent critiques its own outputs. Agents should log all actions and errors for debugging. Maximum iteration limits prevent infinite loops. Input validation and output parsing robustness prevent malformed data from crashing the system.
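Strategy (1) is a standard pattern worth showing concretely. This helper retries a callable with exponentially growing delays plus random jitter (which prevents many agents from retrying in lockstep); names and defaults are illustrative.

```python
import random
import time

def with_retries(fn, max_attempts=4, base_delay=0.5):
    """Call fn, retrying on exception with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # attempts exhausted: surface the error to the caller
            # Delay doubles each attempt; jitter spreads out concurrent retries.
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            time.sleep(delay)
```

In an agent runtime, wrap each tool execution in `with_retries` and convert the final exception into an observation (e.g. "tool X failed: rate limited") so the model can choose an alternative tool or escalate.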
Agent Frameworks and SDKs (2025–2026)
The 2025–2026 ecosystem offers multiple frameworks for building agents. LangChain + LangGraph provides a flexible graph-based orchestration system with streaming and persistence. CrewAI offers a high-level abstraction for role-based multi-agent teams. Anthropic's Claude Agent SDK provides native integration with Claude's tool use capabilities. OpenAI's Assistants API handles conversation state and file attachments. Google's Vertex AI Agent Builder integrates with Google's ecosystem. Key considerations when choosing: (1) Model flexibility: can you use multiple LLM providers? (2) Tool integration: how easy is it to add custom tools? (3) State management: is conversation state handled automatically? (4) Streaming: can you stream intermediate steps? (5) Production readiness: does it handle scaling, monitoring, and error recovery? Frameworks significantly reduce development time but may constrain flexibility.
Solved Examples
Problem 1:
Design a customer support agent that can look up order information, process refunds, and escalate to humans. Define the necessary tools and the agentic loop.
Solution:
Step 1: Define the tools the agent needs.
- lookup_order(order_id: str) → Returns order details, status, items, and history.
- process_refund(order_id: str, amount: float, reason: str) → Initiates refund, returns confirmation or error.
- create_ticket(user_id: str, issue_summary: str, priority: str) → Creates escalation ticket, returns ticket ID.
- check_inventory(product_id: str) → Checks if replacement item is available.
Step 2: Define the agent loop (ReAct pattern).
1. Receive user query with authentication context (user_id).
2. THOUGHT: Analyze the query to determine intent and required information.
3. ACTION: Call appropriate tool(s) based on intent.
4. OBSERVATION: Receive tool output.
5. THOUGHT: Evaluate if the result satisfies the user's need or if more actions are needed.
6. If task complete: Respond to user with summary.
7. If stuck after 3 iterations: Escalate to human using create_ticket.
Step 3: Add guardrails.
- Require order_id verification before processing refunds.
- Cap the refund amount that can be processed without supervisor approval (e.g., a $500 threshold).
- Log all actions for audit trail.
- Maximum 10 tool calls per conversation to prevent runaway loops.
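The refund guardrails from Step 3 can be sketched as a precondition check the runtime runs before ever calling `process_refund`. The threshold, field names, and the "delivered" status check are illustrative assumptions, not a fixed policy.

```python
REFUND_LIMIT = 500.0  # illustrative supervisor-approval threshold

def can_process_refund(order, amount):
    """Guardrail: verify the order before allowing an automated refund."""
    if order is None or order.get("status") != "delivered":
        return (False, "order not verified or not refundable")
    if amount > REFUND_LIMIT:
        return (False, "requires supervisor approval")
    return (True, "ok")
```

When the check fails, the agent returns the reason as an observation, which naturally steers it toward the `create_ticket` escalation path instead of retrying the refund.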
Problem 2:
A research agent needs to answer questions by searching the web and synthesizing findings. Implement a multi-hop retrieval pattern where the agent can follow citations.
Solution:
Step 1: Define the tools.
- web_search(query: str) → Returns top search results with URLs and snippets.
- fetch_page(url: str) → Returns full text content of a webpage.
- extract_citations(text: str) → Returns URLs referenced in the text.
Step 2: Implement the multi-hop retrieval loop.
1. Initial query: User asks 'What are the latest advances in quantum error correction?'
2. ACTION: web_search('latest advances quantum error correction 2025')
3. OBSERVATION: Results include academic papers, news articles, and blog posts.
4. THOUGHT: The top result from Nature looks most authoritative. Let me read it.
5. ACTION: fetch_page(nature_article_url)
6. OBSERVATION: Article discusses new surface code implementations and cites relevant papers.
7. THOUGHT: This paper cites the original surface code paper; I should read that for context.
8. ACTION: extract_citations(current_text) → Returns URLs including the foundational paper.
9. ACTION: fetch_page(foundational_paper_url)
10. OBSERVATION: Explains the theoretical basis for surface codes.
11. THOUGHT: Now I have enough context to synthesize a comprehensive answer.
12. OUTPUT: Summarize findings with citations to both sources.
Step 3: Add stopping conditions.
- Maximum 5 hops to prevent infinite citation chasing.
- Track visited URLs to avoid re-fetching.
- Use embedding similarity to assess when gathered information is sufficient.
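The hop limit and visited-URL tracking from Steps 2 and 3 can be sketched as a small traversal. The `PAGES` dictionary is a toy stand-in for `fetch_page` plus `extract_citations`; a real implementation would fetch live URLs.

```python
# Toy corpus standing in for fetch_page/extract_citations results.
PAGES = {
    "a": {"text": "survey of surface codes", "cites": ["b"]},
    "b": {"text": "foundational paper", "cites": ["a"]},
}

def multi_hop(start_url, max_hops=5):
    """Follow citations breadth-first, bounded by max_hops, skipping repeats."""
    visited, frontier, texts = set(), [start_url], []
    while frontier and len(visited) < max_hops:
        url = frontier.pop(0)
        if url in visited:               # avoid re-fetching the same page
            continue
        visited.add(url)
        page = PAGES[url]
        texts.append(page["text"])
        frontier.extend(page["cites"])   # queue cited pages for later hops
    return texts
```

Note that page "b" cites "a" back; without the visited set this loop would fetch the same two pages until the hop budget ran out.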
Problem 3:
Design a multi-agent code review system where one agent writes code, another reviews it, and a third runs tests. Implement the orchestration pattern.
Solution:
Step 1: Define agent roles.
- Coder Agent: Receives specifications, generates code, can iterate based on feedback.
- Reviewer Agent: Analyzes code for bugs, security issues, and style violations.
- Tester Agent: Runs test suites, reports failures, suggests fixes.
- Orchestrator Agent: Coordinates the workflow, tracks progress, makes final decisions.
Step 2: Define the orchestration workflow.
1. Orchestrator receives task: 'Implement a user authentication module.'
2. Orchestrator creates a plan: code → review → test → fix → review → test.
3. Orchestrator sends specification to Coder.
4. Coder generates code and returns to Orchestrator.
5. Orchestrator sends code to Reviewer.
6. Reviewer analyzes and returns: 'Found potential SQL injection in line 45, missing input validation.'
7. Orchestrator sends issues to Coder with instructions to fix.
8. Coder fixes issues and returns updated code.
9. Orchestrator sends code to Tester.
10. Tester runs tests and returns: 'All tests pass except edge case for empty passwords.'
11. Orchestrator sends test failure to Coder.
12. Coder fixes edge case.
13. Orchestrator re-runs review and test cycle.
14. When all checks pass, Orchestrator outputs final code.
Step 3: Communication protocol.
- Use structured messages with fields: sender, recipient, content, status.
- Maintain shared state: current code version, pending issues, iteration count.
- Set maximum iterations (e.g., 3 review cycles) to prevent endless loops.
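The communication protocol from Step 3 can be written down as two small dataclasses: one for structured messages and one for the shared state that enforces the review-cycle cap. Field names follow the fields listed above but are otherwise an illustrative choice.

```python
from dataclasses import dataclass, field

@dataclass
class AgentMessage:
    """Structured message exchanged between agents via the Orchestrator."""
    sender: str
    recipient: str
    content: str
    status: str = "pending"

@dataclass
class SharedState:
    """Shared workflow state: code version, open issues, iteration budget."""
    code_version: int = 0
    pending_issues: list = field(default_factory=list)
    iteration: int = 0
    max_iterations: int = 3

    def next_cycle(self):
        # Returns False once the review-cycle budget is exhausted,
        # telling the Orchestrator to stop looping and report out.
        self.iteration += 1
        return self.iteration <= self.max_iterations
```

The Orchestrator calls `next_cycle()` before each review/test round; when it returns False, the system outputs its best current version along with any remaining `pending_issues` rather than looping forever.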
Problem 4:
Implement a memory system for an agent that helps users plan meals. The system should remember dietary preferences, past meals, and shopping lists across sessions.
Solution:
Step 1: Design the memory architecture.
- Short-term memory: Current conversation history (last 20 messages).
- Long-term memory: Vector database storing user preferences and history.
Step 2: Define memory schemas.
- DietaryPreference: { user_id, preference_type, value, timestamp }
Types: 'allergies', 'dislikes', 'favorites', 'dietary_restrictions'
- MealHistory: { user_id, meal_name, date, rating, notes }
- ShoppingList: { user_id, items: [{ item, quantity, purchased }], created_date }
Step 3: Implement the memory tool.
- store_memory(user_id: str, memory_type: str, content: dict) → Confirms storage.
- retrieve_memories(user_id: str, query: str, memory_types: list) → Returns relevant memories.
- update_preference(user_id: str, preference_type: str, old_value: str, new_value: str) → Updates stored preference.
Step 4: Integration into agent loop.
1. User says: 'I'm allergic to shellfish. Plan dinner for tonight.'
2. BEFORE reasoning: Retrieve memories for user.
- retrieve_memories(user_id, 'dinner planning', ['DietaryPreference', 'MealHistory'])
3. OBSERVATION: Found preferences: allergic to shellfish, likes Italian cuisine, last 3 meals were pasta dishes.
4. THOUGHT: User is allergic to shellfish and has had pasta recently. I should suggest something different but still Italian-adjacent.
5. ACTION: Suggest risotto with chicken and vegetables.
6. AFTER response: Store this interaction.
- store_memory(user_id, 'MealHistory', { meal_name: 'risotto', date: today })
Step 5: Embedding and retrieval.
- Use a text embedding model (e.g., text-embedding-3-small) to embed memories.
- Store embeddings in vector database with metadata.
- Retrieve using semantic similarity to current query.
- Include memory age in relevance scoring (recent memories weighted higher).
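The `retrieve_memories` tool from Step 3 can be sketched as a type filter plus a ranked similarity search. The word-overlap `similarity` function is a stub for embedding cosine similarity, and the in-memory `MEMORIES` list stands in for a vector database; both are illustrative.

```python
# Toy memory records standing in for rows in a vector database.
MEMORIES = [
    {"user_id": "u1", "type": "DietaryPreference", "text": "allergic to shellfish"},
    {"user_id": "u1", "type": "MealHistory", "text": "pasta carbonara last night"},
    {"user_id": "u1", "type": "ShoppingList", "text": "buy rice and tomatoes"},
]

def similarity(query, text):
    # Stub: word overlap in place of embedding cosine similarity.
    return len(set(query.lower().split()) & set(text.lower().split()))

def retrieve_memories(user_id, query, memory_types, k=2):
    """Filter by user and memory type, then rank by similarity to the query."""
    candidates = [m for m in MEMORIES
                  if m["user_id"] == user_id and m["type"] in memory_types]
    candidates.sort(key=lambda m: similarity(query, m["text"]), reverse=True)
    return [m["text"] for m in candidates[:k]]
```

In the meal-planning flow above, the agent would call this before reasoning, so the shellfish allergy and recent pasta meals are in context when it generates a suggestion.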
Tips & Tricks
- Always set a maximum iteration limit for agent loops; without it, an agent can get stuck in an infinite reasoning cycle, consuming tokens indefinitely. A good starting point is 10–20 iterations depending on task complexity.
- Write tool descriptions as if explaining to a human colleague: the LLM relies entirely on these descriptions to decide when and how to use each tool. Include preconditions, parameter constraints, and examples of good inputs.
- Implement structured output parsing for tool calls; don't rely on regex to extract function names from free text. Use native function calling APIs or JSON mode to guarantee parseable outputs.
- Log every agent action, thought, and observation with timestamps. This audit trail is invaluable for debugging why an agent made a particular decision and for improving prompts.
- When using RAG with agents, let the agent control retrieval rather than doing a single upfront fetch. The agent can decide when it has enough information or when it needs to search differently.
- For multi-agent systems, keep each agent focused on a narrow domain of expertise. A 'master of all trades' agent often performs worse than several specialized agents that collaborate.
Ready to practice?
Test your understanding with questions and get instant feedback.