The Developers Who Trust AI Agents Most Also Interrupt Them Most

The developers most comfortable with AI agents are simultaneously granting them more autonomy and interrupting them more often. Anthropic published research on agent autonomy patterns across millions of real Claude Code sessions, and both numbers climb in the same direction as experience accumulates — which is exactly backwards from what most people expect.

The data: new users (around 10 sessions in) interrupt roughly 5% of turns. Experienced users (750+ sessions) interrupt around 9% of turns. Over that same range, full auto-approve usage grows from about 20% to over 40%. Trust went up. Interruptions went up. If trust meant "hands off," this wouldn't make sense.

Two Skills, Not One

The intuitive model for trust in AI systems is that it scales inversely with supervision. The more you trust the agent, the less you watch it. We reason this way about trust in people — once a colleague earns your confidence, you stop reviewing all their work. The model doesn't transfer cleanly to agents.

Auto-approve rate and interrupt rate are measuring different skills, and they develop somewhat independently.

Auto-approve is pattern recognition for "this is fine" — the ability to identify a category of actions that historically complete correctly without your involvement, and let them run. Over hundreds of sessions, you accumulate evidence about which actions are safe to batch. Bash commands on files you're not worried about. Dependency installs. Test runs on isolated code. Experienced users have built a map of the safe zone.

Interrupt rate is pattern recognition for "this is going wrong" — the ability to notice, midway through a long agent run, that the trajectory has drifted from what you wanted. Not at the end, when you review the output. During. Experienced users have developed a feel for the early signals: the agent asking about a file you didn't expect it to touch, a clarification question that reveals it misunderstood the scope, an intermediate commit message that's solving the wrong thing.

A beginner with 20% auto-approve and 5% interruptions is not calibrating trust and monitoring to match. They're mostly at default settings and missing drift. The low auto-approve isn't caution; it's not yet knowing what to trust. The low interrupt rate isn't good oversight; it's not yet knowing what a failing trajectory looks like before it bottoms out.

An experienced user with 40% auto-approve and 9% interruptions has learned both skills. They know which routine actions can run without hand-holding. They also know what going-wrong looks like early enough to redirect rather than recover.

The gap between a 5% and a 9% interrupt rate sounds small. In a turn-intensive agentic session, it means actively redirecting the agent roughly once every eleven turns versus once every twenty. Over a day of agentic work, that's a materially different level of continuous steering.
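To make the arithmetic concrete, here is a minimal sketch that converts a per-turn interrupt rate into the average gap between interruptions. The 5% and 9% rates are the figures quoted above; the 200-turn workday is a made-up number for illustration only, not something from the Anthropic data.

```python
def turns_per_interrupt(interrupt_rate: float) -> float:
    """Average number of turns between interruptions for a per-turn rate."""
    if not 0 < interrupt_rate <= 1:
        raise ValueError("interrupt_rate must be in (0, 1]")
    return 1 / interrupt_rate


def expected_interrupts(interrupt_rate: float, turns: int) -> float:
    """Expected number of interruptions over a given number of turns."""
    return interrupt_rate * turns


# Hypothetical workload, chosen only to make the comparison tangible.
HYPOTHETICAL_TURNS_PER_DAY = 200

for label, rate in [("new user", 0.05), ("experienced user", 0.09)]:
    gap = round(turns_per_interrupt(rate), 1)
    per_day = round(expected_interrupts(rate, HYPOTHETICAL_TURNS_PER_DAY))
    print(f"{label}: one interrupt every ~{gap} turns, ~{per_day} per day")
```

Under that assumed workload, the experienced profile works out to nearly twice as many redirections per day as the beginner profile.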

The Distribution Nobody Tells You About

The Anthropic data shows something else worth sitting with: the median turn duration in Claude Code is roughly 45 seconds, and has barely moved in months. The 99.9th percentile nearly doubled — from under 25 minutes in late September 2025 to over 45 minutes in early January 2026.

Almost every agent interaction is still short: quick lookups, single-file edits, test runs, brief clarifications. The long tail of complex, multi-file tasks that run for tens of minutes is where the most ambitious uses of agentic coding live. And the far end of that tail nearly doubled in under four months.
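A small sketch shows why a median can sit still while the extreme tail moves. The turn durations below are synthetic, built only to mirror the shape described above (a 45-second bulk plus a handful of long runs); they are not the Anthropic dataset, and the nearest-rank percentile is just one common definition, chosen here for simplicity.

```python
import math
import statistics


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of
    the data at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    # round() guards against floating-point noise before taking the ceiling.
    rank = max(1, math.ceil(round(pct * len(ordered) / 100, 9)))
    return ordered[rank - 1]


# Synthetic durations in seconds: 9,980 short turns and 20 long ones.
# Only the 20 long turns differ between the two samples.
SYNTHETIC_BEFORE = [45] * 9980 + [1400] * 20  # long runs of about 23 minutes
SYNTHETIC_AFTER = [45] * 9980 + [2800] * 20   # long runs of about 47 minutes

for name, sample in [("before", SYNTHETIC_BEFORE), ("after", SYNTHETIC_AFTER)]:
    median = statistics.median(sample)
    p999 = nearest_rank_percentile(sample, 99.9)
    print(f"{name}: median={median}s, p99.9={p999 / 60:.1f} min")
```

In this toy sample, 0.2% of turns changed and the median did not notice, which is why a developer living mostly in the bulk of the distribution can miss the shift entirely.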

For a beginner, this distribution is mostly invisible. Most of what they experience is the 45-second median: quick, manageable, low-stakes. When long-tail tasks appear, they often run to completion or failure without meaningful midcourse intervention because the developer hasn't learned what midcourse failure looks like yet.

Experienced users treat long-running tasks differently. They don't walk away and come back. They monitor more actively, which is part of what drives the higher interrupt rate. The auto-approve trust applies to the short, routine bulk of the distribution. The heightened vigilance applies to the long tail. These are two different behavioral modes applied to two different parts of the same distribution.

The skill of agentic development, in part, is learning to sort the distribution correctly.

The Agent Also Asks More Than You Think

One more finding from the research: on complex tasks, agent-initiated clarification questions occur more than twice as often as human interruptions.

Put differently, on hard tasks the agent asks for help more than twice as often as developers stop it. The agent is surfacing ambiguity on its own; the communication runs in both directions, and the developer is not the only one initiating it.

For beginners, this asymmetry probably means agent clarifications are doing most of the oversight work. The agent asks, the developer answers, the task continues. The developer isn't reading the trajectory; they're responding to explicit prompts. That's a functional workflow, but it's not what expertise looks like.

Expertise looks like catching things the agent didn't flag — the trajectory that looks wrong even though the agent hasn't raised a question, the architectural choice that's locally reasonable but globally inconsistent, the scope that's expanding past what you actually wanted. That's what the 9% interrupt rate is capturing. Those are the interruptions that aren't the agent asking for help. They're the developer seeing something the agent missed.

What You'd Actually Track

The raw metrics most tools surface — coding time, commit count, lines generated — don't distinguish between someone developing agent fluency and someone running more hours without updating their operational model.

The signals that describe where you are on this curve are different:

Auto-approve rate over time. Are you identifying more categories of actions you trust, or has the number been flat for months? Flat for a year probably means you're not expanding the map. Rising fast without a rising interrupt rate suggests you might be granting trust you haven't earned through experience.

Interrupt rate on long-running tasks. How often are you catching problems before the agent completes? This requires intention to track — most coding analytics don't surface it at all. Even rough journaling works: sessions where you redirected mid-run versus sessions you reviewed only at the end.

Recovery rate after interruption. When you redirect, how often does the subsequent run succeed? This is the cleanest measure of whether your interruptions are genuine pattern recognition or anxiety. If you are interrupting purposefully, the next turn should usually work.

These aren't metrics most productivity tools are built to collect. Acceptance rate and suggestion count measure the tool's output. These measure your operational skill with the tool — which is a different thing.
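The rough journaling mentioned above could be as simple as one record per session. The sketch below is one possible shape, not an existing tool's schema: `SessionNote`, its field names, and `summarize` are all hypothetical, and the five sample entries are invented to exercise the code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionNote:
    """One hand-written journal entry per agent session (hypothetical schema)."""
    auto_approve: bool                         # ran with auto-approve on
    long_running: bool                         # your own judgment of "long"
    redirected_mid_run: bool                   # you interrupted before completion
    next_run_succeeded: Optional[bool] = None  # only recorded after a redirect


def rate(hits: int, total: int) -> Optional[float]:
    """Fraction of hits, or None when there is nothing to measure yet."""
    return hits / total if total else None


def summarize(notes):
    """Compute the three signals described above from a list of notes."""
    long_runs = [n for n in notes if n.long_running]
    redirects = [
        n for n in notes
        if n.redirected_mid_run and n.next_run_succeeded is not None
    ]
    return {
        "auto_approve_rate": rate(sum(n.auto_approve for n in notes), len(notes)),
        "long_run_interrupt_rate": rate(
            sum(n.redirected_mid_run for n in long_runs), len(long_runs)
        ),
        "recovery_rate": rate(
            sum(n.next_run_succeeded for n in redirects), len(redirects)
        ),
    }


# Invented sample journal, for illustration only.
journal = [
    SessionNote(auto_approve=True, long_running=False, redirected_mid_run=False),
    SessionNote(False, True, True, next_run_succeeded=True),
    SessionNote(True, True, False),
    SessionNote(False, True, True, next_run_succeeded=False),
    SessionNote(True, False, False),
]
print(summarize(journal))
```

Comparing these numbers month over month, rather than reading any single value, is what would show whether the map of trusted actions is actually expanding.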

The Trust Is Built by You, Not by Anthropic

The Anthropic analysis notes that the growth in auto-approve rate is smooth across model releases. If autonomy were purely a function of model capability, you'd see step changes when new models ship. Instead the curve is gradual, tied to session count, not release dates.

The trust is being built by the user. Accumulated experience gradually resolves which actions are reliably handled and which warrant more attention. Each session where a long-tail task succeeds expands the map of what's safe to trust. Each session where something goes sideways and gets caught early sharpens the pattern sense for what drift looks like.

This is a learnable skill in a way most AI productivity discourse misses. The conversation is dominated by which tool to use, which model to run, whether benchmarks translate to real tasks. Those questions matter. But underneath them is a quieter capability — reading your agent's behavior well enough to trust the right things and catch the right failures — that compounds across hundreds of sessions and doesn't show up in any dashboard anyone is currently shipping.

The experienced user's 40% auto-approve / 9% interrupt profile is a behavioral fingerprint. Roughly 750 sessions to develop. Most developers aren't there yet. The question worth asking isn't "which agent tool should I use" but whether you're running sessions in a way that builds the pattern recognition, or just accumulating hours without developing the skill.

The developers who trust AI agents most are also the ones watching them most carefully. That's not a contradiction. It's what expertise with a new kind of tool actually looks like.

Written by Kevin — builder of xeve
