June 10, 2026
Should Your AI Agent Get Its Own AI Agent? Decagon Thinks Yes
Your AI agent now has a colleague, and it is also an AI agent. Decagon’s new tool, Duet Autopilot, takes over the job a human team usually does on a customer-service agent: finding the faults, drafting the fixes, testing them, and passing them to a person to approve.
Decagon, which makes conversational AI agents for customer service, calls Autopilot “the first agent that improves customer experience agents on its own” and says it then proves the improvement worked.
An Agent That Works on Other Agents
Plenty of vendors already sell agents that learn from customer conversations. Zendesk and Ada, among others, market versions that get better over time. What sets this release apart is who Autopilot serves. Its user is not a customer, but another agent.
Autopilot finds faults in a live customer-service agent, drafts the edit, runs the test, and sends the result to a person for sign-off. Because Autopilot is itself one of Decagon’s agents, it also runs on itself, so every correction a reviewer makes feeds back into how it works the next time around. Gartner has predicted that agentic AI will autonomously resolve 80 percent of routine customer service issues by 2029, and has noted that the parties service teams deal with will not always be human. An agent built to serve agents fits that direction of travel.
Diagnose, Test, Approve
Autopilot reads the signals coming out of production conversations and turns them into proposed updates, from the highest-priority fixes down to small tweaks. It then checks its own work. Every proposed change runs against the original conversation that raised the problem, a set of regression tests, and a curated group of examples built around real customer types and intents. If a change fails, Autopilot keeps reworking it until it passes.
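For readers who think in code, the check-and-rework cycle described above can be sketched roughly as follows. This is an illustration only, not Decagon's implementation: the function and type names are hypothetical, the three checks stand in for the original conversation, the regression tests and the curated examples, and the `max_attempts` cap is an assumption added so the loop always ends.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical types for illustration; none of these names come from Decagon.
Check = Tuple[str, Callable[[str], bool]]      # (check name, pass/fail function)
Rework = Callable[[str, List[str]], str]       # (change, failed check names) -> new change


@dataclass
class ValidationResult:
    change: Optional[str]   # the change that passed, or None if none did
    attempts: int           # how many versions were tested
    passed: bool


def validate_with_rework(draft: str, checks: List[Check], rework: Rework,
                         max_attempts: int = 5) -> ValidationResult:
    """Test a proposed change against every check, reworking it on failure."""
    change = draft
    for attempt in range(1, max_attempts + 1):
        failed = [name for name, check in checks if not check(change)]
        if not failed:
            return ValidationResult(change, attempt, True)
        if attempt < max_attempts:
            # Feed the failing check names back into the next revision.
            change = rework(change, failed)
    # Assumed cap reached: give up rather than loop forever.
    return ValidationResult(None, max_attempts, False)


# Toy usage with stand-in checks (assumptions, not real test suites).
checks: List[Check] = [
    ("original_conversation", lambda c: "refund" in c),
    ("regression_tests", lambda c: len(c) < 40),
    ("curated_examples", lambda c: c.endswith(".")),
]
result = validate_with_rework(
    "Explain policy", checks, lambda c, failed: "Explain refund policy."
)
```

In this toy run the first draft fails two checks, the reworked version passes all three, and only the passing version would move on to a reviewer.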
Teams set the rules up front around brand voice, writing standards and off-limits topics. Every change then arrives as a versioned update showing the issue found, the test results and the exact edits, and a person approves it before it goes live. As Alan Yiu, VP of Product at Decagon, put it, “Teams set the direction and review the work.”
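The approval step can be pictured the same way. The sketch below is a guess at the shape of such a record, not Decagon's data model: the `ProposedUpdate` fields mirror what the article says each update shows (the issue found, the test results and the exact edits), while the field names, the `approve` helper and the rule that failing tests block release are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProposedUpdate:
    """Hypothetical record for one versioned change awaiting review."""
    version: int
    issue: str                      # the problem found in production
    test_results: Dict[str, bool]   # check name -> passed?
    edits: List[str]                # the exact edits being proposed
    approved_by: str = ""           # empty until a person signs off


def approve(update: ProposedUpdate, reviewer: str) -> ProposedUpdate:
    """Record a human sign-off on the update."""
    update.approved_by = reviewer
    return update


def can_go_live(update: ProposedUpdate) -> bool:
    """An update ships only with passing tests and a named approver (assumed rule)."""
    return bool(update.approved_by) and all(update.test_results.values())


# Toy usage: the update stays blocked until a reviewer approves it.
update = ProposedUpdate(
    version=3,
    issue="Agent gave an outdated refund window",
    test_results={"original_conversation": True, "regression_tests": True},
    edits=["Replace the old refund window with the current one"],
)
blocked_before_review = not can_go_live(update)
approve(update, "reviewer@example.com")
```

The point of the gate is that passing tests alone are not enough: nothing reaches a customer without a person's name attached to the change.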
This human-in-the-loop design matches a pattern across the market, where the strongest no-code agent platforms test changes against thousands of scenarios before anything reaches a customer.
A Benchmark Built for the Job
To show that Autopilot improves agents rather than just producing edits that look right, Decagon built DuetBench, “the first benchmark to measure agent self-improvement from end to end.” Existing benchmarks check whether an agent can resolve a fixed set of issues. They do not check whether an agent can improve another agent. On DuetBench, Autopilot passed 93 percent of diagnostic tasks, beating the average human score.
Decagon says it is running Autopilot with enterprise customers and design partners across financial services, retail and consumer technology, who are tracking its effect on resolution rates, escalation rates and coverage. Matt McCollum, senior manager of customer experience at Opendoor, said reviewing conversations by hand “isn’t an option” at his company’s scale, and that the tool lets his team “focus on decisions rather than digging through logs.”
So should your AI agent get its own AI agent? Finding faults and writing fixes by hand can eat a week, and handing that work to a machine frees people to decide what good looks like. The answer will rest on the numbers early users report, and on how often reviewers reject what Autopilot proposes.
