developer productivity

AI Makes You Feel Faster. Controlled Studies Disagree.

7 min read

You almost certainly believe AI coding tools make you more productive. Most developers do — surveys consistently put self-reported productivity gains from tools like Copilot, Cursor, and Claude somewhere between 20% and 55%. The problem is that when researchers run actual controlled experiments, the numbers fall apart.

The METR Finding

Last year METR published a randomized controlled trial on how AI tools affect experienced developers working on real open-source codebases. The sample was not juniors hacking on toy projects: median 10 years of experience, contributors to repositories they had been committing to for years (averaging 1,500 commits each), on codebases averaging over a million lines of code.

The headline result: developers using AI tools took 19% longer to complete tasks than developers working without them.

Before the study, these same developers predicted AI would cut their completion time by 24%. After the study — having just used the tools and completed the tasks — they estimated AI had made them 20% faster. They were sitting inside a 19% slowdown and experiencing it as a speedup.

METR's researchers are careful not to overextend this. Early-2025 AI tools on complex, mature codebases are exactly the conditions under which AI is most likely to underperform. The tools are better now, and simpler tasks on greenfield codebases probably look different. But the perception gap is what matters here. Developers are not just wrong about AI's impact. They are wrong in a specific, consistent direction: they feel faster when they use AI, regardless of what the output data shows.

Why the Perception Gap Exists

This April, the JetBrains HAX team published a paper at ICSE 2026 that starts to explain why. They analyzed 151 million logged IDE events from 800 developers and ran a separate survey asking professionals how AI had changed their workflows.

The core finding: AI redistributes and reshapes developer workflows in ways that often elude developers' own perceptions.

In practice, it works like this. When an AI assistant autocompletes a function, you experience the immediate gratification of code appearing quickly. That moment is vivid and memorable. The slower parts — reading what the AI produced, catching the subtle off-by-one it introduced, rethinking the approach when the generated code does not quite fit the existing architecture — feel like normal development work, not like overhead that the AI created.

The 151 million events show that behavior is changing in measurable ways. But when you ask developers how AI has affected them, their answers describe a different reality — one shaped by the fast, visible wins rather than the slower, invisible costs.

This is the same mechanism that makes developers consistently underestimate their context switches. Tracked automatically, the average developer switches apps 300 to 500 times a day. When asked, they guess 30 to 50. The switch itself is fast and forgettable. The recovery cost — the roughly 23 minutes that University of California, Irvine research found it takes to fully regain focus — is invisible. You do not experience it as time lost to the switch. You experience it as ordinary work.

AI tools introduce the same kind of invisible cost. The generation is instant and satisfying. The verification, debugging, and rework are normal-feeling. Your intuition gives you credit for the first and does not charge you for the second.

The Downstream Problem

Even where AI does improve individual output, research from Faros.ai across 22,000 developers identified a different failure mode: the gains do not reach business outcomes.

High-AI-adoption teams completed 21% more tasks and merged 98% more pull requests than lower-adoption teams. But PR review time increased by 91%, and AI adoption was associated with a 9% increase in bugs per developer and a 154% increase in average PR size. Individual developers are generating more code, faster. The PRs are bigger, buggier, and slower to clear review. No significant correlation appeared between AI adoption and company-level delivery improvements.

The individual developer feels productive. The organization does not ship faster.

This is not an argument against AI tools. It is an argument about where the bottleneck has moved. Before AI, the bottleneck for many teams was writing code. After AI, for many of those same teams, the bottleneck has shifted to reviewing it — and that bottleneck is harder to automate away. If you measure productivity by how quickly you produce code, AI looks great. If you measure by how much value ships, the picture is more complicated and depends heavily on whether your team has adapted its review process to the new volume.

What to Actually Track

The reason any of this matters is that developers are making tooling decisions based on how productive they feel, not on what the data shows. When you feel faster, you recommend the tool, renew the subscription, and attribute your results to it. This is rational given the information you have. The problem is that your primary data source is your own intuition, and there is now solid evidence that intuition is systematically miscalibrated in a specific direction.

What would it actually look like to measure whether AI tools are helping you?

Track coding output, not coding activity. Time spent in your editor is not the same as value produced. Connect your coding hours to your commit and PR data — GitHub activity alongside time-in-editor gives you a ratio you can actually track over time. If your committed code per coding hour dropped after you started using Copilot, that is information worth having.
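As a rough sketch of what that ratio could look like in practice: the function below buckets daily editor hours and daily commit counts by ISO week and divides one by the other. The input dictionaries are hypothetical exports — the article does not prescribe a format — but any time tracker plus `git log` can produce something equivalent.

```python
def weekly_output_ratio(coding_hours, commits):
    """Commits per coding hour, bucketed by ISO week.

    coding_hours: {date: hours spent in the editor that day}
    commits:      {date: number of commits that day}
    Both formats are illustrative, assumed to come from your own
    time tracker and `git log` export.
    """
    hours_by_week, commits_by_week = {}, {}
    for d, h in coding_hours.items():
        wk = d.isocalendar()[:2]  # (ISO year, ISO week number)
        hours_by_week[wk] = hours_by_week.get(wk, 0.0) + h
    for d, c in commits.items():
        wk = d.isocalendar()[:2]
        commits_by_week[wk] = commits_by_week.get(wk, 0) + c
    # Only report weeks where you actually logged coding time.
    return {wk: commits_by_week.get(wk, 0) / h
            for wk, h in hours_by_week.items() if h > 0}
```

Watching this single number drift week over week — especially across the boundary where you adopted a new tool — is the simplest version of the comparison this section argues for.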

Watch your review load. If you are generating more PRs, are you also spending more time reviewing others'? Faros.ai's data suggests review is where the AI productivity gains go to die at the team level. Track both sides of that equation.
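Tracking both sides of that equation can be as simple as tagging each block of PR time with a role. The event format below is hypothetical; the point is the ratio, not the schema.

```python
def review_load(pr_events):
    """Split PR-related time into authoring vs reviewing.

    pr_events: iterable of (role, minutes) pairs, where role is
    "author" or "reviewer" -- an assumed format, standing in for
    whatever your tracker or PR timeline data actually emits.
    """
    authored = sum(m for role, m in pr_events if role == "author")
    reviewed = sum(m for role, m in pr_events if role == "reviewer")
    # A rising ratio means review is absorbing your AI-generated volume.
    ratio = reviewed / authored if authored else float("inf")
    return {"authored_min": authored, "reviewed_min": reviewed,
            "review_to_author_ratio": ratio}
```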

Measure focus block length. AI tools generate code quickly, but reviewing and integrating that code requires sustained attention. If your average uninterrupted coding session has gotten shorter since you adopted AI tools — more snippet generation, more context-checking, more small corrections — that matters for your actual output even if the activity level looks higher.
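One way to operationalize "uninterrupted session" from logged activity: treat any gap longer than a threshold as the end of a block. The five-minute threshold below is an assumption, not a standard — tune it to your own logging granularity.

```python
from datetime import timedelta

def focus_blocks(event_times, max_gap_min=5):
    """Group timestamped editor events into uninterrupted sessions.

    event_times: sorted list of datetimes, one per logged event
    (keystroke, save, etc.). A silence longer than max_gap_min
    minutes ends the current block. Returns block lengths in minutes.
    """
    if not event_times:
        return []
    gap = timedelta(minutes=max_gap_min)
    blocks, start, prev = [], event_times[0], event_times[0]
    for t in event_times[1:]:
        if t - prev > gap:
            blocks.append((prev - start).total_seconds() / 60)
            start = t
        prev = t
    blocks.append((prev - start).total_seconds() / 60)
    return blocks
```

If the average of these block lengths shrinks after you adopt an AI assistant, that is exactly the "more activity, less sustained attention" pattern the paragraph above describes.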

Run a real comparison. Take two weeks with your current AI setup and two weeks without — or find weeks in your history that approximate this. Look at commits, bug rate, PR merge time. Anecdote beats nothing, but your own historical data beats anecdote.
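The comparison itself is just per-metric means and a delta. This sketch assumes you can express each day of each period as a dict of metrics — the field names are illustrative, not a real export format.

```python
def compare_windows(with_ai, without_ai):
    """Compare two periods on the same per-day metrics.

    with_ai / without_ai: lists of per-day metric dicts, e.g.
    {"commits": 5, "bugs": 1, "merge_hours": 6.0}. Field names are
    hypothetical; use whatever your own data actually contains.
    Returns the mean per metric for each period plus the delta.
    """
    def means(days):
        return {k: sum(d[k] for d in days) / len(days)
                for k in days[0]}
    a, b = means(with_ai), means(without_ai)
    return {k: {"with_ai": a[k], "without_ai": b[k],
                "delta": a[k] - b[k]} for k in a}
```

Two weeks per arm is a small sample, so treat the deltas as a signal to investigate, not a verdict — but even this beats recommending a tool purely on feel.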

Most developers cannot answer any of these questions with data right now. They could track app usage automatically and see how their time actually breaks down. They could correlate GitHub commit data with coding hours. They could look at whether deep focus blocks have gotten longer or shorter since adopting new tools. The technology to do this exists. Most people just have not set it up.

The Pattern

Every significant shift in developer tooling — IDEs, version control, containerization — has produced a wave of genuine believers alongside a wave of people experiencing placebo gains. The tools that survived were the ones that genuinely improved output, not just the ones that felt like they did. AI is going through the same sorting process right now.

The METR finding is not a verdict. It is a specific result from a specific context — experienced developers, mature codebases, 2025-era tools. The JetBrains finding is not a reason to distrust AI. It is a reason to distrust your own perception of AI's impact without external data to check it against.

You cannot feel your way to an accurate productivity baseline. You need measurement. The developers who will get the most out of AI tooling long-term are the ones who track what is actually happening — output, review load, focus time, bug rate — and adjust based on data instead of vibes.

The feeling will settle. The data will not lie.

Written by Kevin — builder of xeve

Track your apps, coding, music, and health — all in one place.

try xeve free