AI Sped You Up 20%. It Actually Slowed You Down 19%.

The hardest productivity problem is not distraction or meetings or context switching. It is that humans are genuinely bad at assessing their own cognitive performance — and the AI coding boom has made this gap wider than it has ever been.

In mid-2025, METR published a controlled study measuring the actual productivity impact of AI tools on 16 experienced open-source developers. These were not beginners experimenting with Copilot for the first time. The participants averaged five years on their respective repositories, roughly 1,500 commits each, and were working on mature codebases averaging over a million lines of code. Before the study, they predicted AI would speed them up by 24%. After completing tasks with and without AI assistance, they reported feeling 20% faster. The actual measured result: they were 19% slower.

The perception gap was roughly 39 percentage points. They thought they got faster. They got slower. And they had no idea.

Why Experts Get Hit Hardest

It is tempting to read the METR result as evidence that AI tools are net negative. That is probably not the right conclusion. The researchers themselves believe developers are more sped up today than during the early-2025 study, as models have improved significantly in the months since. The newer 2026 iteration expanded to 57 developers across 143 repositories, and early signs suggest a better picture than the original numbers showed.

But the finding reveals something important about who benefits from AI assistance and under what conditions.

Experts who have spent years on a single codebase carry dense, hard-won mental models. They know which abstractions are fragile. They know where the state mutations happen, which functions have hidden side effects, which "simple" changes cascade into failures in adjacent modules. When they write code, they are navigating a three-dimensional map built over thousands of hours.

AI suggestions interrupt that navigation. Less than 44% of generated completions in the METR study were accepted. The rest were reviewed, tested, sometimes partially integrated, then discarded. For a developer who already knows what they need to write, evaluating AI output costs more time than the output saves.

For a less experienced developer working in an unfamiliar codebase, the equation flips. The baseline mental model is thinner, so AI suggestions land on more fertile ground. There is less to interrupt and more to fill in.
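A hypothetical back-of-envelope makes the flip concrete. The only figure below taken from the study is the sub-44% acceptance rate; every time cost is invented for illustration:

```python
# Back-of-envelope: is a stream of AI suggestions net positive?
# Only the ~44% acceptance rate comes from the METR study; the
# per-suggestion time costs below are invented for illustration.

def net_seconds_per_suggestion(accept_rate: float,
                               review_cost_s: float,
                               time_saved_s: float) -> float:
    """Expected time saved per suggestion, minus the review cost
    paid on every suggestion, accepted or not."""
    return accept_rate * time_saved_s - review_cost_s

# An expert who already knows what to type: each accepted
# suggestion saves little, but every suggestion costs a review.
print(net_seconds_per_suggestion(0.44, review_cost_s=30, time_saved_s=45))
# -10.2 -> net loss on every suggestion shown

# A newcomer in an unfamiliar codebase: an accepted suggestion
# saves far more, so the same acceptance rate flips the sign.
print(net_seconds_per_suggestion(0.44, review_cost_s=30, time_saved_s=120))
# 22.8 -> net gain
```

Same acceptance rate, same review cost; only the value of an accepted suggestion changes, and that alone flips the sign.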

This is not intuitive. We assume more skill means more ability to extract value from sophisticated tools. For AI coding assistance, the opposite can be true — at least on complex, familiar codebases with high stakes for correctness.

The Perception Problem Is Structural

JetBrains published related research in April 2026, combining IDE log data from AI users with self-reported interviews. Their finding echoed METR's: developers perceived their productivity as increasing with AI assistance even when behavioral data said otherwise. AI, they found, redistributes and reshapes workflows in ways that often elude developers' own perceptions.

This is the structural problem. AI assistance changes the texture of work — tasks feel smoother, blocks feel shorter, cognitive friction is reduced in the moment. These are real effects. But smoother is not the same as faster, and faster is not the same as more output.

The subjective experience of using AI well is confidence. The code comes quickly, suggestions arrive before you finish typing, and the cursor moves through the file at a pace that feels productive. That feeling is not nothing. Reduced friction has real value. But it does not tell you whether you shipped more, debugged better, or wrote code that required fewer revisions downstream.

You cannot feel a 19% slowdown. It does not announce itself.

Memory Is Worse Than You Think

Most developers measure AI's impact the way they measure everything else about their work: by feel. They notice that repetitive tasks feel easier. They remember the time a completion wrote an entire test suite. They forget the hours spent reviewing unhelpful suggestions on the genuinely hard parts.

Memory is selective in exactly the wrong direction. We remember wins vividly and discount the friction cost of everything we discarded.

The METR researchers made this point explicitly: before the study, developers estimated AI would improve their productivity by 24%. After completing the tasks, having experienced the sessions directly, they still estimated a 20% improvement. The perception did not update even in the presence of completed work they could reflect on. Subjective assessment is not just imprecise — it is systematically biased toward confirming expectations.

We expect AI to help, so we perceive it helping. The stopwatch disagrees.

Measuring It Properly

The only way around this is tracking actual output alongside actual time. Not surveying yourself at the end of the week. Not checking your commit count on Friday afternoon. Automatic, continuous measurement: how many hours did you spend in coding tools this period versus last, how many commits shipped, how many PRs went out, how much net code changed.

When you have that data, you can run the comparison METR ran. Look at a month before you started using Copilot or Cursor heavily versus a month after. Look at weeks when your AI tool was unavailable versus weeks when it was front and center. The ratio of coding hours to shipped output will tell you more than your gut ever will.
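A minimal sketch of that comparison, assuming you have exported per-period totals from your tracker and version control (every name and number below is invented):

```python
# Before/after comparison in the spirit of the METR setup.
# All data here is hypothetical; substitute your own exports.

from dataclasses import dataclass

@dataclass
class Period:
    coding_hours: float  # tracked hours in coding tools
    commits: int         # commits shipped in the period
    prs_merged: int      # pull requests merged

    @property
    def output_per_hour(self) -> float:
        # One crude proxy; use whichever output unit you trust.
        return self.commits / self.coding_hours

before_ai = Period(coding_hours=120, commits=85, prs_merged=14)
with_ai   = Period(coding_hours=118, commits=78, prs_merged=12)

change = (with_ai.output_per_hour / before_ai.output_per_hour - 1) * 100
print(f"Output per coding hour changed by {change:+.1f}%")
# In this invented example: -6.7%, slower despite feeling similar.
```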

The number will be different for you than for the METR participants. Your codebase, your experience level, your specific tools are different. Maybe AI is helping you 30%. Maybe it is costing you 15%. Both are plausible depending on the work. But the only way to know is to look.

Track your time across every tool — not just your editor, but your terminal, your browser, AI chat interfaces, documentation. System-level tracking captures all of it automatically without requiring you to start and stop timers. Connect that to your GitHub activity and you have the full picture: time invested on one side, output on the other. Correlate them over weeks. Watch whether the ratio improves.
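The GitHub half of that picture is scriptable. Here is a sketch using the standard GitHub REST commits endpoint, where the owner, repo, author, dates, and token are all placeholders:

```python
# Count your commits in a date window via the GitHub REST API.
# The endpoint and its since/until/author parameters are standard
# GitHub REST v3; everything else here is a placeholder.

import requests

def count_commits(owner: str, repo: str, author: str,
                  since: str, until: str, token: str) -> int:
    url = f"https://api.github.com/repos/{owner}/{repo}/commits"
    headers = {"Authorization": f"Bearer {token}"}
    params = {"author": author, "since": since, "until": until,
              "per_page": 100}
    total = 0
    while url:
        resp = requests.get(url, headers=headers, params=params)
        resp.raise_for_status()
        total += len(resp.json())
        # GitHub paginates via the Link header; requests parses it.
        url = resp.links.get("next", {}).get("url")
        params = None  # the "next" URL already embeds the query
    return total

march = count_commits("you", "your-repo", "you",
                      since="2026-03-01T00:00:00Z",
                      until="2026-03-31T23:59:59Z",
                      token="ghp_...")
print(march)
```

Divide that by the tracked hours for the same window and you have one point on the ratio; repeat per week and the trend emerges.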

If it is improving, AI is earning its place in your workflow. If it is flat or degrading, something is slipping — review overhead, more iteration cycles, downstream bugs that require revisiting.
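A least-squares slope over those weekly ratios is enough to tell the two cases apart (the numbers below are invented):

```python
# Is output-per-hour trending up or down? Fit a least-squares
# line through (week index, ratio) and read the sign of the slope.

def trend(ratios: list[float]) -> float:
    n = len(ratios)
    mean_x = (n - 1) / 2
    mean_y = sum(ratios) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ratios))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var

weekly_output_per_hour = [0.71, 0.69, 0.66, 0.67, 0.64, 0.63]
print(f"{trend(weekly_output_per_hour):+.3f} per week")
# -0.015 per week in this invented series: the ratio is degrading.
```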

This Is Not Anti-AI

AI coding tools are getting better fast. The METR team's own read on 2026 is more optimistic than their 2025 numbers suggested. Producing useful completions on complex, mature codebases is a harder problem than greenfield assistance, and that capability gap is closing.

But "improving fast" is not the same as "definitely helping you right now." These are separate questions requiring separate evidence.

The developers in the METR study were not using AI carelessly. They were experienced, thoughtful engineers using current tools on real work. And they were 19% slower while believing they were faster.

The goal is not to feel productive. It is to be productive. Measure your work — not because the data will necessarily be bad, but because you cannot close a gap you cannot see, and you cannot see it by feel alone.

Written by Kevin — builder of xeve

Track your apps, coding, music, and health — all in one place.

try xeve free