Creating AI agents used to take months of complex coding and optimization. AgentKit has simplified this process into a matter of hours. Developers often struggle to manage models, APIs, workflows, safety filters, and UI setups. The process becomes so complex that building a simple bot feels like creating a small operating system.
AgentKit follows a straightforward principle: AI agents should be simple to build. The platform offers a comprehensive toolkit for creating, testing, and deploying AI agents in one place. Early results support the claim. Klarna's support agent now handles two-thirds of all tickets, and Ramp reports that the visual canvas keeps its product, legal, and engineering teams aligned and has cut iteration cycles by 70%.
This piece walks through AgentKit's core components and shows how to build, embed, and evaluate your first agent using its end-to-end development pipeline.
What is AgentKit and Why It Matters
OpenAI introduced AgentKit at DevDay 2025. It tackles one of AI development's biggest headaches: developers need too many scattered tools to build working agents. Before AgentKit, teams spent months piecing together different frameworks, duct-taping LangChain, ReAct, LlamaIndex, and assorted GitHub repositories just to create basic AI agents.
A quick overview of AgentKit
AgentKit is a comprehensive toolkit for developing agentic systems. It offers a complete set of tools to create, test, and deploy AI agents on one unified platform. The toolkit has four main parts:
- Agent Builder: A visual editor that lets developers create and version multi-agent workflows with a drag-and-drop interface
- ChatKit: A customizable chat interface that embeds directly into applications
- Guardrails: Built-in safety and moderation tools that protect against misuse and unintended behavior
- Evals: A sophisticated system to test and improve agent performance
This unified approach removes the need to juggle multiple tools, handle missing config files, or tackle complex setups. Developers can design, integrate, and fine-tune their agents in one ecosystem.
Why developers are excited about it
AgentKit's ability to speed up development has created a buzz. Ramp, a fintech company, says Agent Builder "transformed what once took months of complex orchestration, custom code, and manual optimizations into just a couple of hours". The visual canvas keeps product, legal, and engineering teams on the same page and cut Ramp's iteration cycles by 70%.
LY Corporation, a leading Japanese tech company, built a work assistant agent in under two hours with Agent Builder. This marks a huge improvement from older methods that took months.
"Agent Builder allowed us to coordinate agents in a whole new way," LY Corporation said, "with engineers and subject matter experts collaborating all in one interface".
Canva also saved two weeks of development time. They built a support agent for their developer community using ChatKit and integrated it in less than an hour. Companies can now launch AI solutions much faster than before.
How it simplifies agent development
Building reliable agents used to mean wrestling with scattered tools. Teams spent time on complex coordination without proper versioning. They created custom connectors, set up manual evaluation pipelines, tuned prompts, and worked weeks on frontend before launch.
AgentKit optimizes these tasks through a better workflow:
- Design: Agent Builder's visual canvas helps create workflows quickly, like Canva does for design
- Test: The built-in Evals system measures performance right away
- Deploy: ChatKit helps add the agent to applications quickly
The platform's standardized approach means developers don't start from scratch on each project. Agent Builder's drag-and-drop interface makes agent development accessible to people with only basic coding skills, which widens the pool of people who can build AI agents.
OpenAI CEO Sam Altman highlighted this simplicity during DevDay: "This is all the stuff that we wished we had when we were trying to build our first agents". An OpenAI engineer proved how easy it was by building an AI workflow with two agents live on stage in eight minutes.
AgentKit makes AI agent development more accessible, efficient, and reliable. It brings scattered tools together into one platform and removes many of the roadblocks that made agent development slow and difficult.
Understanding the Core Components of AgentKit
Let's analyze how AgentKit works through its four core components that handle everything from visual design to performance testing.
Agent Builder: Visual workflow creation
Agent Builder is the heart of the AgentKit experience—a visual, drag-and-drop canvas where developers can map out an agent's logic easily. OpenAI's CEO describes it as "Canva for building agents". Developers can design multi-step workflows in this visual environment by connecting different components, adding various tools, and setting up custom rules that control agent behavior.
The canvas works with both simple linear tasks and complex systems where agents work together. It comes with versioning and inline reviews. Developers can:
- Configure nodes like Agent, Note, File Search, and User Approval
- Preview runs to test workflows before deployment
- Publish completed agents for immediate use
- Start from blank canvases or use prebuilt templates for common use cases
This visual approach has made a significant difference. Ramp noted that Agent Builder "transformed what once took months of complex orchestration, custom code, and manual optimizations into just a couple of hours". LY Corporation built their work assistant agent in under two hours, which sped up their development significantly.
ChatKit: Embeddable chat interface
ChatKit provides the user-facing component after you set up your agent's logic. This accessible, embeddable chat interface connects users with your agent. The toolkit handles complex frontend work that usually takes weeks to develop, including streaming responses, thread management, and brand-consistent UI elements.
ChatKit serves as your AI's "face" while Agent Builder creates the "brain". Users can upload files and images, see the agent's chain-of-thought reasoning, and watch live tool usage. This gives users clear insight into how the agent processes their requests.
Setting up requires creating a ChatKit session, configuring a backend endpoint, and adding the chat interface to your application. The toolkit offers two integration options: OpenAI's recommended hosted backend or an advanced integration using the ChatKit Python SDK on your infrastructure.
Guardrails: Built-in safety and moderation
AgentKit includes Guardrails, an open-source, modular safety layer that protects agents against misuse and unintended behavior. These safety mechanisms plug directly into your agent workflows through the Agent Builder interface.
Each Guardrail has three parts: an input channel (the data being checked), a logic unit (the rule set that reviews the input), and output channels with "Pass" and "Fail" branches that determine the next step. This lets agents handle sensitive situations without developer intervention.
The system screens for personally identifiable information (PII), jailbreak attempts, harmful content, and hallucinations. Developers can place these protection nodes before sensitive components such as connectors or agents with external access, which creates checkpoints throughout the workflow.
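The input, logic, and output structure of a Guardrail can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not the Guardrails library itself: the names `GuardrailResult`, `pii_guardrail`, and `route`, as well as the two regular expressions, are hypothetical and far cruder than a production PII detector.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only; real PII detection is much broader.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
US_SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass
class GuardrailResult:
    passed: bool
    reason: str


def pii_guardrail(text: str) -> GuardrailResult:
    """Logic unit: inspect the input and decide Pass or Fail."""
    if EMAIL_RE.search(text):
        return GuardrailResult(False, "email address detected")
    if US_SSN_RE.search(text):
        return GuardrailResult(False, "SSN-like number detected")
    return GuardrailResult(True, "no PII pattern matched")


def route(text: str) -> str:
    """Output channels: pick the next step based on the guardrail result."""
    result = pii_guardrail(text)
    if result.passed:
        return "agent"    # Pass branch: hand the input to the agent
    return "refusal"      # Fail branch: stop before the agent sees the input


if __name__ == "__main__":
    print(route("What is your return policy?"))
    print(route("My email is jane@example.com"))
```

Keeping the check (`pii_guardrail`) separate from the routing decision (`route`) mirrors the way a Guardrail node exposes distinct Pass and Fail outputs that you wire to different downstream nodes.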
Evals: Performance testing and grading
AgentKit has a complete evaluation framework called Evals that helps build reliable, production-ready agents through testing. Evals offers four key capabilities:
- Datasets: Create datasets to test agent responses systematically
- Trace grading: Review end-to-end workflows and find problems
- Automated prompt optimization: Improve prompts automatically based on human annotations and grader outputs
- Third-party model support: Test models from other providers within the OpenAI platform
These evaluation tools make a real difference—one company found that "the evaluation platform cut development time on their multi-agent due diligence framework by over 50%, and increased agent accuracy 30%".
These four components create a complete system that manages the entire agent development lifecycle—from visual creation and UI implementation to safety controls and performance optimization.
Step-by-Step: Building Your First Agent with Agent Builder
AgentKit's capabilities are clear, so let's create an actual agent using the visual canvas. Tasks that previously took months of complex orchestration now take just a few hours. Companies like LY Corporation have built functional work assistant agents in less than two hours.
Start with a blank canvas or template
Your first step is to open Agent Builder from the OpenAI platform dashboard. You'll need to make your first choice: start fresh or use a template.
Templates give newcomers a solid starting point. The platform comes with pre-built workflows for simple use cases like Customer Service, Q&A Agents, or Data Enrichment. These examples work right out of the box and you can adapt them to your needs.
You might want to pick "Blank workflow" if you have specific requirements or want total control. Next, give your workflow a name that shows what it does—like "Research Summarizer" or "GitHub Issue Assistant".
The workspace appears with a Start node on the canvas. This node is where users begin interacting with your agent and lets you set up input variables for user text.
Add nodes for logic, tools, and memory
The canvas is ready for you to add functionality through different node types. Agent Builder workflows use four main categories of nodes:
- Core Nodes: The foundation of every workflow
  - Start Node: Takes user input and converts it to usable text
  - Agent Node: Defines the model, instructions, and tools
  - Note Node: Adds comments or explanations (doesn't affect functionality)
- Tool Nodes: Connect your agent to external tools and data
  - File Search: Performs semantic search across vector stores
  - Guardrails: Adds safety checks to block harmful content or PII
  - MCP (Model Context Protocol): Connects to third-party services
- Logic Nodes: Control the flow and decision-making
  - If/Else: Adds conditional branching based on specific criteria
  - While: Creates loops for repetitive tasks
  - User Approval: Allows human review before proceeding
- Data Nodes: Handle information storage and transformation
  - Transform: Changes data structure or type
  - Set State: Stores values as global variables for later use
Adding a node is straightforward: drag it from the sidebar onto your canvas. Double-click any node to open its parameters. The Agent node needs a name, behavior instructions, and a list of enabled tools.
Use drag-and-drop to connect your workflow
The next step is connecting your nodes into a working flow. Picture it as a flowchart where information moves from left to right between nodes. Each connection shows how data flows through your agent's decision process.
A simple research agent might flow like this: Start → Guardrail → Agent → Tool → Output. The Guardrail checks if user inputs are safe, the Agent processes queries, the Tool gets information, and the Output shows results.
Connections are made by clicking and dragging from one node's output to another node's input. Nodes like Guardrails can have different paths for "Pass" and "Fail" conditions, so your workflow handles various scenarios.
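If it helps to think of the canvas in code terms, a workflow is a small directed graph whose edges are selected by each node's output branch. The sketch below models the Start → Guardrail → Agent → Tool → Output flow in plain Python. Everything in it is an assumption made for illustration: the node functions are stand-ins for real model and tool calls, and the `NODES`, `EDGES`, and `run_workflow` names do not come from Agent Builder.

```python
from typing import Callable, Dict, List, Tuple

# Each node takes the current text and returns (branch label, new text).
NodeFn = Callable[[str], Tuple[str, str]]


def guardrail(text: str) -> Tuple[str, str]:
    # Toy rule for illustration: fail anything that mentions "password".
    return ("fail" if "password" in text.lower() else "pass", text)


def agent(text: str) -> Tuple[str, str]:
    # Stand-in for a model call that turns the request into a search query.
    return ("next", f"search: {text}")


def tool(text: str) -> Tuple[str, str]:
    # Stand-in for a File Search or MCP tool.
    return ("next", f"results for [{text}]")


def blocked(text: str) -> Tuple[str, str]:
    return ("next", "Sorry, I can't help with that request.")


NODES: Dict[str, NodeFn] = {
    "guardrail": guardrail,
    "agent": agent,
    "tool": tool,
    "blocked": blocked,
}

# Edges are keyed by (node, branch); reaching "output" ends the run.
EDGES: Dict[Tuple[str, str], str] = {
    ("guardrail", "pass"): "agent",
    ("guardrail", "fail"): "blocked",
    ("agent", "next"): "tool",
    ("tool", "next"): "output",
    ("blocked", "next"): "output",
}


def run_workflow(user_input: str) -> Tuple[List[str], str]:
    """Walk from the first node to the output, recording the path taken."""
    current, text, path = "guardrail", user_input, ["start"]
    while current != "output":
        path.append(current)
        branch, text = NODES[current](text)
        current = EDGES[(current, branch)]
    path.append("output")
    return path, text


if __name__ == "__main__":
    print(run_workflow("latest AI news"))
    print(run_workflow("what is my password"))
```

The recorded path plays the same role as the Preview view: it shows which branch each input actually took, which is usually the first thing to check when a workflow misbehaves.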
The Preview button lets you test your workflow. A chat window opens where you can interact with your agent and see it work live. You'll see exactly how information moves through each node—which helps with debugging and making improvements.
This visual development approach has helped companies reduce their iteration cycles by 70%. Teams now deploy agents in weeks instead of months.
Embedding Your Agent into an App Using ChatKit
Your agent workflow is ready in Agent Builder. The next vital step brings it to your users. ChatKit makes this possible with an embeddable chat interface that connects users to your agent without weeks of frontend development.
How ChatKit works
ChatKit acts as the user-facing part of AgentKit. It handles complex frontend tasks that usually take extensive development time. The system manages streaming responses, thread history, message input, and visual elements that create a conversational experience.
At its core, ChatKit is an embeddable UI component that connects users to your Agent Builder workflow. Agent Builder creates the "brain" while ChatKit provides the "face" of your AI assistant.
ChatKit's strength lies in its simplicity. You won't need to build a chat interface from scratch. There's no need to manage websockets for streaming, implement token-by-token rendering, or create message threading logic. This saves weeks of frontend development time while delivering a professional, responsive chat experience.
Setting up a session and client secret
The ChatKit setup follows a three-step process that keeps your API key on the server while the browser talks to your agent:
- Create a server endpoint: Set up a backend route that talks to OpenAI's ChatKit Sessions API. This endpoint creates sessions and generates client tokens for your frontend.
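import os

from fastapi import FastAPI
from openai import OpenAI

app = FastAPI()
# The API key stays on the server and is never sent to the browser.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

@app.post("/api/chatkit/session")
def create_chatkit_session():
    # Assumption: recent openai-python releases expose ChatKit under the beta
    # namespace; check your SDK version if this attribute is missing.
    session = client.beta.chatkit.sessions.create(
        workflow={"id": os.environ["CHATKIT_WORKFLOW_ID"]},  # placeholder env var for your workflow ID
        user="user-123",  # placeholder: a stable identifier for the end user
    )
    return {"client_secret": session.client_secret}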
- Install ChatKit libraries: Install the ChatKit React components in your project directory:
npm install @openai/chatkit-react
- Add the ChatKit script: Add the ChatKit JavaScript to your application's HTML:
<script src="https://cdn.platform.openai.com/deployments/chatkit/chatkit.js" async></script>
This security model keeps your OpenAI API key safe on your server. It provides short-lived client tokens to the browser. ChatKit uses these tokens to establish secure connections to OpenAI's backend without exposing your credentials.
Customizing the chat UI for your brand
ChatKit provides many ways to match your brand identity. You can switch between light and dark modes to match your site's theme.
ChatKit lets you adjust:
- Primary accent colors for buttons and links
- Corner radius for UI elements
- Spacing density between components
- Header buttons for navigation or actions
- Starter prompts to guide user interaction
You have two options for integration:
- Recommended integration: OpenAI hosts and scales the backend from Agent Builder. You customize the frontend appearance. This needs minimal setup but requires a development server.
- Advanced integration: Run ChatKit on your infrastructure using the ChatKit Python SDK. This gives full control but needs more setup.
Canva reported a quick integration when building their developer support agent: they integrated ChatKit "in less than an hour" and saved "over two weeks of time".
The final step renders ChatKit in your application:
import { ChatKit, useChatKit } from '@openai/chatkit-react';

export function MyChat() {
  const { control } = useChatKit({
    api: {
      async getClientSecret(existing) {
        // Fetch client secret from your endpoint
        const res = await fetch('/api/chatkit/session', {
          method: 'POST',
        });
        const { client_secret } = await res.json();
        return client_secret;
      },
    },
  });

  return <ChatKit control={control} className="h-[600px] w-[320px]" />;
}
This approach lets you focus on your agent's intelligence while ChatKit handles the chat infrastructure.
Testing and Improving Agent Performance with Evals
Building and integrating your agent is just the first step. The real challenge lies in making sure it works reliably. AgentKit's Evals component offers great tools that help you assess and refine your agent's performance.
Creating datasets and test cases
Simple previews won't cut it for production agents. AgentKit's Evals lets you build test datasets that act as your agent's proving ground. Here's how to create good test cases:
Start by heading to the Evals section to create a new dataset. Add different test scenarios and their expected results. Customer service agents need test queries about return policies that include timeframes and conditions. You should also add edge cases like unreasonable refund requests where the agent should know to escalate the issue.
These datasets become your quality control system and help spot problems before users run into them. Think of it as a controlled space where you can put your agent through hundreds of different situations.
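Before entering cases in the Evals UI, it can help to draft them as structured data. The sketch below is one possible shape, not the format Evals requires: the `EvalCase` fields, the sample queries, and the "30 days" policy are all invented for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class EvalCase:
    query: str             # what the user asks
    expected: str          # key fact or action the response must contain
    should_escalate: bool  # True for edge cases that need a human


# Invented examples: two ordinary questions and one escalation edge case.
DATASET: List[EvalCase] = [
    EvalCase("What is your return policy?", "30 days", False),
    EvalCase("Can I return an opened item?", "original packaging", False),
    EvalCase("Refund my order from five years ago, now.", "escalate", True),
]


def validate(cases: List[EvalCase]) -> List[str]:
    """Return a list of problems; an empty list means the dataset looks sane."""
    problems: List[str] = []
    seen = set()
    for i, case in enumerate(cases):
        if not case.query.strip():
            problems.append(f"case {i}: empty query")
        if case.query in seen:
            problems.append(f"case {i}: duplicate query")
        seen.add(case.query)
    if not any(case.should_escalate for case in cases):
        problems.append("dataset has no escalation edge cases")
    return problems


def to_jsonl(cases: List[EvalCase]) -> str:
    """Serialize one case per line, a common interchange format for datasets."""
    return "\n".join(json.dumps(asdict(case)) for case in cases)


if __name__ == "__main__":
    print(validate(DATASET))
    print(to_jsonl(DATASET))
```

The `validate` step is a cheap way to catch duplicates and to confirm the dataset actually includes the edge cases you care about before you spend evaluation runs on it.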
Using graders to evaluate responses
After setting up test cases, AgentKit lets you create custom graders to check how well your agent performs. These graders look at:
- The agent's use of correct internal tools
- Its logical reasoning process
- How well it makes use of available user information
The system runs these graders on thousands of interactions and gives you evidence-based insights into how your agent behaves. This ongoing cycle of testing and analysis helps improve quality with real measurements instead of gut feelings.
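Conceptually, a grader is a function from a recorded interaction to a score, and the insight comes from aggregating those scores. The following sketch shows that idea with two toy graders covering tool use and use of user information. The trace fields (`tools_called`, `expected_tool`, `answer`, `user_name`) and the sample traces are assumptions made for this example, not the trace schema AgentKit uses.

```python
from typing import Any, Callable, Dict, List

Trace = Dict[str, Any]


def used_expected_tool(trace: Trace) -> bool:
    """Did the agent call the tool this case was expected to need?"""
    return trace["expected_tool"] in trace["tools_called"]


def used_user_info(trace: Trace) -> bool:
    """Did the answer make use of the user's name from the context?"""
    return trace["user_name"] in trace["answer"]


GRADERS: Dict[str, Callable[[Trace], bool]] = {
    "tool_use": used_expected_tool,
    "personalization": used_user_info,
}


def grade(traces: List[Trace]) -> Dict[str, float]:
    """Return the pass rate of each grader across all traces."""
    return {
        name: sum(fn(t) for t in traces) / len(traces)
        for name, fn in GRADERS.items()
    }


# Invented traces standing in for recorded agent runs.
TRACES: List[Trace] = [
    {
        "tools_called": ["file_search"],
        "expected_tool": "file_search",
        "answer": "Hi Ana, returns are accepted.",
        "user_name": "Ana",
    },
    {
        "tools_called": [],
        "expected_tool": "file_search",
        "answer": "Returns are accepted.",
        "user_name": "Ben",
    },
]

if __name__ == "__main__":
    print(grade(TRACES))
```

Reporting a pass rate per grader, rather than one blended score, makes it easier to see whether a regression came from tool selection or from how the agent used the context it was given.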
Automating prompt optimization
AgentKit's most impressive feature is its automated prompt optimization based on test results. The system can suggest better prompts based on human feedback and grader results, instead of you spending countless hours tweaking them manually.
The results speak for themselves. One company cut their multi-agent framework development time in half while their agent accuracy jumped by 30%. The system also lets you test third-party models on the same platform, so you can compare different providers without building separate test setups.
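The selection step behind prompt optimization can be pictured as scoring each candidate prompt against the same test queries and keeping the winner. The sketch below is a deliberately simplified illustration and makes no claim about how OpenAI's optimizer works internally: `fake_agent` is a stand-in for a real model call, and `cites_policy` is a made-up grader.

```python
from statistics import mean
from typing import List


def fake_agent(prompt: str, query: str) -> str:
    """Stand-in for a model call: it only cites policy when the prompt asks."""
    if "cite the policy" in prompt.lower():
        return f"Per policy section 4: answer to '{query}'"
    return f"Answer to '{query}'"


def cites_policy(answer: str) -> float:
    """Toy grader: 1.0 if the answer references a policy section, else 0.0."""
    return 1.0 if "policy section" in answer else 0.0


def score_prompt(prompt: str, queries: List[str]) -> float:
    """Average grader score for one prompt over all test queries."""
    return mean(cites_policy(fake_agent(prompt, q)) for q in queries)


def best_prompt(candidates: List[str], queries: List[str]) -> str:
    """Pick the candidate prompt with the highest average score."""
    return max(candidates, key=lambda p: score_prompt(p, queries))


if __name__ == "__main__":
    candidates = [
        "You are a helpful support agent.",
        "You are a helpful support agent. Always cite the policy.",
    ]
    queries = ["Can I return shoes?", "How long do refunds take?"]
    print(best_prompt(candidates, queries))
```

The same loop also covers provider comparison: swap the list of prompts for a list of models and keep the dataset and graders fixed.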
Advanced Features for Scaling and Customization
AgentKit provides powerful capabilities that go beyond simple toolkit components. These features enable enterprise-level deployment and customization to expand your agent's functionality.
Using Connector Registry and MCPs
The Connector Registry serves as a central control hub for enterprises that manage data across multiple workspaces. This unified admin panel combines all connections, which include pre-built integrations like Dropbox, Google Drive, SharePoint, and Microsoft Teams. The system also supports Model Context Protocol (MCP) servers that let agents connect with external data sources and tools through a standardized interface. Your organization can simplify governance while keeping security controls over sensitive information through this centralized approach.
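The "standardized interface" in MCP is JSON-RPC 2.0: a client asks a server to run a tool by sending a `tools/call` request that carries the tool name and its arguments. The sketch below only builds such a message so you can see its shape. The tool name `search_files` and its `query` argument are hypothetical, and a real client would also handle initialization, transport, and responses.

```python
import json
from itertools import count
from typing import Any, Dict

# JSON-RPC requests carry an id so responses can be matched to requests.
_request_ids = count(1)


def tools_call_request(tool_name: str, arguments: Dict[str, Any]) -> str:
    """Build a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # "search_files" is a made-up tool name for illustration.
    print(tools_call_request("search_files", {"query": "Q3 report"}))
```

Because every MCP server accepts the same message shape, an agent can talk to a new data source without a custom connector for each one.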
Exporting workflows to code with the Agents SDK
Once your visual workflow is where you want it, AgentKit lets you export it as Python or TypeScript code built on the Agents SDK. This code-first library gives you complete control to define agents, tools, handoffs, and guardrails. You can access the generated code by going to Code → Agents SDK → Python/TypeScript. Developers can use this feature to customize their agents beyond the visual interface's capabilities.
Custom tool calls and third-party model support
AgentKit's Reinforcement Fine-Tuning (RFT) feature helps models use the right tools at the right time. RFT is currently available for o4-mini and exists in private beta for GPT-5. You can customize model behavior through feedback loops. The evaluation framework supports testing non-OpenAI models and helps you compare performance across different providers effectively.
Conclusion
AgentKit marks a major step forward in AI agent development. As this piece has shown, the toolkit turns agent building into a more natural, accessible process. Developers can now create working agents in hours instead of spending months stitching together tools and writing custom code.
The four main components work together to solve common challenges. Agent Builder gives you a visual canvas where workflows come together through drag-and-drop design. ChatKit takes care of user-facing elements without requiring weeks of frontend development. Guardrails protect against misuse and unintended behavior. Evals helps you verify that your agents perform reliably through systematic testing.
Companies have seen amazing results already. Klarna created a support agent that handles two-thirds of all tickets. Other organizations cut their iteration cycles by 70%. These wins show how AgentKit helps technical teams of all sizes create AI agents.
AgentKit gives you the flexibility and power you need for your project, whether you're building your first agent or scaling up to enterprise level. You don't need to piece together different tools and frameworks anymore. Building a bot no longer has to feel like building a small operating system.
Start with a template or blank canvas today. Your production-ready AI agent is just a few hours away, not months of complex work.