AI Agents: From Chatbots to Autonomous Colleagues
A Chatbot talks. An Agent acts. How to build ReAct loops (Reason + Act) that can browse the web, query databases, and book flights autonomously.
The Evolution of the Interface
For the last 40 years, the fundamental paradigm of Human-Computer Interaction (HCI) has been “Command and Control.” You click a button, the computer does exactly one thing. You type a command, the computer executes exactly one function. Even with the rise of GenAI and ChatGPT, the paradigm remained largely conversational. You ask a question, the machine retrieves an answer. It is a fancy search engine.
AI Agents represent a shift from “Chat” to “Action.” An Agent doesn’t just know things. It does things. It is the difference between a Wikipedia that can answer “How do I book a flight?” and a Concierge that says “I have booked your flight.”
Why Maison Code Discusses This
We believe the “Website” as we know it is dying. In 5 years, users won’t click through 10 pages to find a product. They will tell their Personal Agent: “Buy me a blue shirt.” We build Agent-Ready APIs. We ensure our clients’ data is structured so that AI Agents (Google Gemini, OpenAI, Siri) can read and transact with it. If your site is not Agent-Ready, you are invisible to the biggest consumer of the next decade: The AI.
1. The ReAct Pattern: Anatomy of an Agent
The breakthrough paper “ReAct: Synergizing Reasoning and Acting in Language Models” (Yao et al., 2022) changed everything. Before ReAct, LLMs were just text predictors. ReAct gave them an “Inner Monologue” and a set of “Hands” (Tools).
The loop looks like this:
- Thought: The Agent analyzes the user’s request and plans a step. (“The user wants the weather in Paris. I need to find the current temperature.”)
- Action: The Agent selects a tool from its toolbox. (“I will call weather_api.get('Paris').”)
- Observation: The Agent reads the output of the tool. (“The API returned 20°C.”)
- Reasoning: The Agent updates its understanding. (“Okay, I have the temperature. Now I can answer the user.”)
- Final Answer: The Agent responds. (“It is 20°C in Paris.”)
This loop allows the agent to solve multi-step problems that it was never explicitly trained on.
2. Code Implementation (LangChain / TypeScript)
Here is how you implement a basic ReAct agent using LangChain and OpenAI functions.
import { ChatOpenAI } from "@langchain/openai";
import { AgentExecutor, createOpenAIFunctionsAgent } from "langchain/agents";
import { pull } from "langchain/hub";
import { z } from "zod";
import { tool } from "@langchain/core/tools";
import type { ChatPromptTemplate } from "@langchain/core/prompts";

// 1. Define the Tools
// The "Hands" of the agent. Be very descriptive in the 'description' field,
// as the LLM uses this to decide when to call the tool.
const searchTool = tool(
  async ({ query }) => {
    console.log(`Searching for: ${query}`);
    // Simulate a Google Search
    return "Paris weather is 20 degrees Celsius with light rain.";
  },
  {
    name: "search_web",
    description: "Search the internet for current events and data.",
    schema: z.object({
      query: z.string().describe("The search query"),
    }),
  }
);

const emailTool = tool(
  async ({ to, body }) => {
    console.log(`Sending email to ${to}`);
    // Simulate SMTP
    return "Email sent successfully.";
  },
  {
    name: "send_email",
    description: "Send an email to a user.",
    schema: z.object({
      to: z.string().email(),
      body: z.string(),
    }),
  }
);

const tools = [searchTool, emailTool];

// 2. Initialize the Brain (LLM)
const model = new ChatOpenAI({
  modelName: "gpt-4-turbo",
  temperature: 0, // Keep it deterministic for actions
});

// 3. Create the Agent
// We pull a standard agent prompt from the LangChain Hub
const prompt = await pull<ChatPromptTemplate>("hwchase17/openai-functions-agent");

const agent = await createOpenAIFunctionsAgent({
  llm: model,
  tools,
  prompt,
});

const agentExecutor = new AgentExecutor({
  agent,
  tools,
});

// 4. Execution
// User Request: "Find out the weather in Paris and email it to boss@company.com"
const result = await agentExecutor.invoke({
  input: "Find out the weather in Paris and email it to boss@company.com",
});

console.log(result.output);
Walkthrough of Execution:
- Input: “Find out the weather…”
- LLM Decision: The LLM sees it doesn’t know the weather. It looks at its tools, sees search_web, and decides to call it.
- Tool Output: “Paris weather is 20 degrees…”
- LLM Decision: It now has the info, but the request also said “email it”. It looks at its tools and sees send_email.
- Action: It constructs a JSON payload: {"to": "boss@company.com", "body": "The weather is 20C"}.
- Tool Output: “Email sent.”
- Final Answer: “I have sent the email.”
This automates the entire workflow. The developer wrote 0 lines of “Weather to Email” logic. The AI figured it out.
3. The Danger Zone: Infinite Loops and Hallucinations
Giving an AI “Hands” is dangerous. What if it gets stuck in a loop?
- Thought: I need to buy a ticket.
- Action: Buy Ticket.
- Observation: Error: Credit Card Declined.
- Thought: I should try again.
- Action: Buy Ticket.
- Observation: Error…
- (Repeat 1000 times).
Guardrails are mandatory; a minimal sketch follows the list below.
- Max Iterations: Hard limit the ReAct loop to 5 or 10 steps. If it isn’t solved by then, abort.
- Human in the Loop: For sensitive actions (Buying, Deleting, Emailing), force the Agent to ask for confirmation.
- Agent: “I am about to send this email. Proceed? [Y/N]”
- Zod Schema Validation: Force the tool inputs to match rigorous types. If the LLM generates a string where a number is required, throw a validation error before the tool runs.
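Here is a minimal sketch of the first and third guardrails, continuing from the Section 2 snippet (same imports, agent, tools, searchTool, and emailTool). The maxIterations option is a standard AgentExecutor parameter; askHuman is a hypothetical confirmation channel you would replace with a CLI prompt, Slack approval, or dashboard button.

// Guardrail 2: Human in the Loop for sensitive actions.
// `askHuman` is a placeholder for your real confirmation channel.
async function askHuman(question: string): Promise<boolean> {
  console.log(question);
  return true; // stub: auto-approve; replace with real input handling
}

const confirmedEmailTool = tool(
  async ({ to, body }) => {
    const approved = await askHuman(`I am about to send this email to ${to}. Proceed? [Y/N]`);
    if (!approved) return "Action cancelled by the human operator.";
    return emailTool.invoke({ to, body });
  },
  {
    name: "send_email",
    description: "Send an email to a user. Requires human confirmation.",
    // Guardrail 3: the Zod schema rejects malformed arguments (e.g. a number
    // where an email address is required) before the tool body ever runs.
    schema: z.object({
      to: z.string().email(),
      body: z.string(),
    }),
  }
);

// Guardrail 1: hard-limit the ReAct loop so a declined credit card
// can't turn into 1,000 retries.
const guardedExecutor = new AgentExecutor({
  agent,
  tools: [searchTool, confirmedEmailTool],
  maxIterations: 5, // abort and surface a failure instead of looping forever
});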
4. The Memory Problem (Vector State)
Standard LLMs have a Short Term Memory (Context Window). If you talk to an agent for 1 hour, it forgets the start of the conversation. To build a truly useful “Colleague”, it needs Long Term Memory. We use Vector Databases (Pinecone, Milvus) to store “Memories”.
- Action: Agent stores summary of meeting in DB.
- Retrieval: Next week, when you ask “What did we discuss about Project X?”, the Agent queries the Vector DB, retrieves the relevant chunk, and injects it into the context.
This is RAG (Retrieval-Augmented Generation) applied to Agent State.
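A minimal sketch of that write/read cycle, using LangChain’s in-memory vector store as a stand-in for Pinecone or Milvus. The meeting summary is invented for illustration, and agentExecutor is the one from Section 2.

import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";
import { Document } from "@langchain/core/documents";

// Long-term memory: an embedding store (swap in Pinecone/Milvus in production).
const memoryStore = new MemoryVectorStore(new OpenAIEmbeddings());

// Action: the Agent stores a summary of the meeting.
await memoryStore.addDocuments([
  new Document({
    pageContent: "Meeting summary: Project X beta ships in Q3; marketing copy owed to the Writer team.",
    metadata: { type: "meeting_summary" },
  }),
]);

// Retrieval (next week): fetch the relevant chunks and inject them into the context.
const question = "What did we discuss about Project X?";
const memories = await memoryStore.similaritySearch(question, 3);
const context = memories.map((doc) => doc.pageContent).join("\n");

const answer = await agentExecutor.invoke({
  input: `Relevant long-term memory:\n${context}\n\nQuestion: ${question}`,
});
console.log(answer.output);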
5. Case Study: The Autonomous Customer Support Agent
At Maison Code, we deployed an L2 Support Agent for a high-volume Shopify merchant. The Problem: 40% of tickets were “Where is my order?” (WISMO). Human agents spent 5 minutes per ticket:
- Read email.
- Copy Order ID.
- Open Shopify.
- Check status.
- Open Courier Dashboard (FedEx).
- Check tracking.
- Write email.
The Agent Solution: We built an Agent with three tools:
- shopify_lookup_order(id)
- fedex_track_package(tracking_number)
- gmail_reply(text)
The Result:
- Zero Touch Resolution: The Agent autonomously resolved 85% of WISMO tickets in < 30 seconds.
- 24/7 Availability: Customers got answers at 3 AM.
- Cost: $0.05 per ticket vs $2.50 for a human agent.
However, we encountered edge cases. One customer asked “Where is my order?” but hadn’t placed one yet. The Agent hallucinated an Order ID. We fixed this by adding a “Verify User Identity” step before any lookups.
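In tool form, that fix looks roughly like the sketch below, reusing the tool() and Zod helpers from Section 2. The tool name mirrors the list above; verifyCustomerIdentity and shopifyLookupOrder are placeholders, not a real Shopify client.

// Placeholder identity check -- a real version would match the sender's email
// against the Shopify Customers API before exposing any order data.
async function verifyCustomerIdentity(senderEmail: string): Promise<boolean> {
  return senderEmail.endsWith("@known-customer.example"); // stub
}

// Placeholder for the real Shopify order lookup.
async function shopifyLookupOrder(orderId: string): Promise<string> {
  return `Order ${orderId}: shipped, in transit with FedEx.`; // stub
}

const orderLookupTool = tool(
  async ({ senderEmail, orderId }) => {
    // Verify identity BEFORE any lookup, so the LLM never has a reason
    // to invent an Order ID for someone with no orders.
    if (!(await verifyCustomerIdentity(senderEmail))) {
      return "Identity not verified. Ask the customer to confirm the email used at checkout.";
    }
    if (!orderId) {
      return "No Order ID provided. Do not guess one; ask the customer for it.";
    }
    return shopifyLookupOrder(orderId);
  },
  {
    name: "shopify_lookup_order",
    description: "Look up a Shopify order for a verified customer. Never call this with a guessed Order ID.",
    schema: z.object({
      senderEmail: z.string().email().describe("The email address the ticket came from"),
      orderId: z.string().optional().describe("The customer's Order ID, if they provided one"),
    }),
  }
);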
6. Multi-Agent Systems (LangGraph / CrewAI)
Single agents are powerful. Teams of agents are revolutionary. LangGraph allows you to orchestrate multiple agents with different “Personas”.
- Researcher Agent: Has google_search. Scours the web for data.
- Writer Agent: Has markdown_formatter. Takes the research and writes a blog post.
- Editor Agent: Has critique_tool. Reviews the post and rejects it if it’s too short.
You create a graph:
Researcher -> Writer -> Editor -> (Pass) -> Publish.
Researcher -> Writer -> Editor -> (Reject) -> Writer.
This mimics a real human team workflow. The “Editor” keeps the “Writer” in check.
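Here is a framework-free TypeScript sketch of that graph’s control flow; LangGraph formalizes the same thing as nodes plus conditional edges. The three agent functions are simple stand-ins for agents built the way Section 2 builds one.

// Stand-ins for three single-purpose agents (each would wrap its own AgentExecutor).
const researcherAgent = async (topic: string) =>
  `Research notes on ${topic}: ...`; // would use google_search
const writerAgent = async (research: string, feedback?: string) =>
  `# Draft\n\n${research}\n\n${feedback ? `Expanded per editor feedback: ${feedback}` : ""}`; // would use markdown_formatter
const editorAgent = async (draft: string) =>
  draft.includes("Expanded")
    ? { pass: true, feedback: "" } // stub: approves once the Writer has expanded the draft
    : { pass: false, feedback: "Too short. Expand every section." }; // would use critique_tool

// Researcher -> Writer -> Editor -> (Pass) -> Publish, or (Reject) -> back to the Writer.
async function writeBlogPost(topic: string, maxRevisions = 3): Promise<string> {
  const research = await researcherAgent(topic);
  let feedback: string | undefined;

  for (let attempt = 0; attempt < maxRevisions; attempt++) {
    const draft = await writerAgent(research, feedback);
    const review = await editorAgent(draft);
    if (review.pass) return draft; // Editor approves -> publish
    feedback = review.feedback;    // Editor rejects  -> revise
  }
  throw new Error("Editor rejected every draft; escalate to a human.");
}

console.log(await writeBlogPost("AI Agents in e-commerce"));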
7. The Cost of Autonomy (Token Economics)
Agents are expensive. A single simple request might trigger 10 internal LLM calls (Thought -> Action -> Thought -> Action). With GPT-4, that can easily run $0.30 per request. Optimization Strategy:
- Router Model: Use a cheap model (GPT-3.5) for the routing logic (“Which tool do I use?”).
- Solver Model: Use an expensive model (GPT-4) for the complex generation (“Write the email”).
- Caching: Cache the results of expensive tool calls (e.g., SQL queries).
Economics will dictate adoption.
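One way to wire that router/solver split with the pieces already on this page; the model names, the cache, and the final email prompt are illustrative, while tools, prompt, and searchTool are the ones defined in Section 2.

// Router: a cheap model handles the tool-selection loop.
const routerModel = new ChatOpenAI({ modelName: "gpt-3.5-turbo", temperature: 0 });
// Solver: an expensive model handles the final, user-facing generation.
const solverModel = new ChatOpenAI({ modelName: "gpt-4-turbo", temperature: 0 });

const routingAgent = await createOpenAIFunctionsAgent({ llm: routerModel, tools, prompt });
const routingExecutor = new AgentExecutor({ agent: routingAgent, tools, maxIterations: 5 });

// Naive cache for expensive tool calls (e.g. SQL queries), keyed by tool name + arguments.
const toolCache = new Map<string, string>();
async function cachedToolCall(name: string, args: object, run: () => Promise<string>) {
  const key = `${name}:${JSON.stringify(args)}`;
  if (!toolCache.has(key)) toolCache.set(key, await run());
  return toolCache.get(key)!;
}

// Example: a repeated search costs one tool call instead of two.
const weather = await cachedToolCall("search_web", { query: "Paris weather" }, () =>
  searchTool.invoke({ query: "Paris weather" })
);
console.log(weather);

// Cheap pass: gather the facts.
const facts = await routingExecutor.invoke({ input: "Find out the weather in Paris" });

// Expensive pass: write the polished output.
const email = await solverModel.invoke(
  `Write a short, polite email to boss@company.com containing this information:\n${facts.output}`
);
console.log(email.content);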
8. The Future: 2026 and Beyond
We are moving away from “Prompt Engineering” (talking to the bot) to “Flow Engineering” (designing the graph of agents). In 2026, software will not be a set of static buttons. It will be a “Goal Oriented Operating System.” You will tell your computer: “Plan a trip to Japan under $5,000,” and it will act. It will browse Expedia. It will check your Calendar. It will negotiate via email. We are building the Universal Interface.
9. Conclusion
Building Agents is the most exciting engineering challenge of our decade. It requires a mix of Hard Engineering (API reliability, caching, types) and Soft Psychology (Prompting, reasoning loops). Start small. Give your agent one tool. Watch it work. Then give it another. Soon, you won’t be writing software. You’ll be managing it.
Need an autonomous workforce?
We build secure, deterministic AI Agent fleets to automate Operations, Support, and Sales.