AI Won't Hollow Out Your Skills. Your Behavior Will.

In February, Anthropic published a study that got summarized as "AI hurts developer learning." That summary is technically accurate and functionally useless. The useful finding is buried one layer deeper: the average 17% comprehension gap is not the story. The 65% versus 40% split is.

What the Study Actually Measured

Researchers at Anthropic ran a randomized controlled trial with 52 junior engineers — all experienced Python developers, none familiar with Trio, an async concurrency library. Half learned Trio with AI assistance freely available. Half learned without.

The headline: developers with AI access scored 17% lower on a comprehension test taken after the learning period, with no statistically significant productivity benefit during the tasks themselves. Standard "AI bad for learning" result.

Except that is not quite what the data says when you look at what the AI-assisted group actually did.

The AI group split into two distinct behavioral patterns. One group used AI primarily for code generation: describe a task, get code, accept it, move on. These developers finished faster, felt confident, and scored below 40% on the comprehension test. The other group used AI differently — they generated code too, but then asked follow-up questions. "Why does this work?" "What happens if I change this argument?" "What's actually going on under the hood here?" This group encountered more errors and took longer. They scored above 65%.
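
To make the two patterns concrete, here is a minimal sketch of the kind of Trio code a study task might produce (the task itself is invented for illustration), with interrogation-style questions and their answers inlined as comments:

    import trio

    async def fetch(name: str, delay: float) -> None:
        # Why does trio.sleep need await? Because it is a checkpoint:
        # it yields control to Trio's scheduler instead of blocking.
        await trio.sleep(delay)
        print(f"{name} done")

    async def main() -> None:
        # What is the nursery actually doing under the hood? It scopes
        # child tasks: this async with block does not exit until every
        # task started inside it has finished or raised.
        async with trio.open_nursery() as nursery:
            nursery.start_soon(fetch, "a", 0.1)
            nursery.start_soon(fetch, "b", 0.2)

    trio.run(main)

A delegator accepts code like this and moves on. An interrogator asks the questions in the comments, and the answers become the mental model.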

Same tools. Same library. Same novice baseline. Radically different outcomes, determined entirely by behavior, not by the presence of AI.

Why Delegation Is the Default

The two patterns are not equally likely. One has immediate, visible feedback. The other requires you to choose friction.

When you generate code and accept it, you feel flow. The task completes. The file looks right. Your editor is green. This is a genuinely satisfying cognitive loop: generation, completion, next task. It rewards you in every way that short-term feedback can.

When you stop to interrogate the output — understand why the type annotation works that way, trace what the async context manager is actually doing, ask what would happen on unexpected input — you are introducing friction deliberately. Tasks take longer. You feel slower. Your metrics look worse.

Most teams do not measure comprehension. They measure velocity: PRs merged, tickets closed, sprints delivered. If these are the signals, delegation is the rational choice. You are not making a mistake. You are optimizing correctly for the incentives in front of you.

The problem is that comprehension is what makes you able to debug.

The Debugging Gap

The Anthropic study broke out results by question type, and the gaps were not evenly distributed. The biggest differences appeared in debugging questions, not in "write this function" questions.

This is not a coincidence. Debugging requires a mental model of why code does what it does — not just that it works, but what mechanism it uses, what could make it fail, what subtle wrong input would produce a silent error. When you generate code without understanding it, you build none of that model. You have working code and no theory about it.
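
Here is one concrete shape that silent failure takes, a minimal sketch built on a documented Trio pitfall (the request helper is a stand-in):

    import trio

    async def do_request(url: str) -> bytes:
        # Stand-in for a real network call; any Trio-aware I/O
        # behaves the same way at this point.
        await trio.sleep(10)
        return b"ok"

    async def fetch_with_retry(url: str) -> bytes:
        while True:
            try:
                return await do_request(url)
            except BaseException:
                # The trap: trio.Cancelled inherits from BaseException,
                # so this "retry on any error" handler also swallows the
                # cancellation Trio delivers when a timeout fires, and
                # the loop spins forever. except Exception would be safe.
                continue

    async def main() -> None:
        with trio.move_on_after(5):  # intended 5-second cap
            await fetch_with_retry("https://example.com")
        # Never reached: the timeout fires, but its signal is eaten.

    trio.run(main)

The handler passes review and works on the happy path. It fails only when the timeout matters, and only a model of how Trio delivers cancellation makes it look like a bug rather than defensive programming.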

For routine tasks in a stable environment, working code without a theory is fine. But software does not stay in stable environments. Libraries update. Edge cases surface. A parameter that never got unusual values starts getting them. The moment debugging is required, the developer who interrogated the AI's output will move through it in an hour. The developer who delegated will spend days, or never find it.

There is a harder version of this. As more code is written by AI, the work that developers are hired to maintain and debug is increasingly code they did not write and do not fully understand. Debugging AI-generated code you did not interrogate when it was produced is not hypothetical — it is already the job description for a large and growing share of working developers. Delegation behavior specifically fails to build the skill the job requires.

The Cohort Problem

The study's finding has a slow-moving consequence that extends beyond individual developers.

The cohort learning to code from 2024 to 2026, shaped by AI assistance as the default, becomes the mid-level engineers of 2027 to 2029 and the seniors of 2029 to 2032. If most of that cohort spent its formative years in delegation mode, what arrives at senior level has learned to orchestrate AI generation fluently but has not built the debugging and mental-model depth that senior work requires.

This is not panic material. It is a structural signal. Organizations have assumed that juniors who spend five years struggling through errors naturally develop into engineers who can reason about complex systems. That development is not automatic when the errors are being resolved for you before you encounter them.

The gap does not announce itself early. It shows up when the edge case arrives, when the production incident requires someone to reason from first principles about code nobody fully understands.

What to Watch in Your Own Work

There is no existing metric that tracks delegation versus interrogation directly. Your editor does not know if you accepted a completion thoughtlessly or spent ten minutes understanding why it works.

But the proxies are visible if you track time.

The ratio between generation and review matters. If you spend roughly equal time generating code and thinking about what it does, you are probably building some mental model. If your AI sessions are rapid generation followed by immediate acceptance, the behavior is probably delegation.

How long it takes you to debug code you did not write is a ground-truth signal. If your debugging sessions on AI-generated code are consistently much longer than on code you wrote, you have built a comprehension gap somewhere. Looking at past sessions — which tasks took far longer than expected, which debugging chains stalled without resolution — will usually show the pattern.
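
If you log your sessions, both proxies reduce to simple arithmetic. A minimal sketch, assuming a hypothetical per-session log; the field names, sample numbers, and thresholds are invented for illustration, not taken from the study:

    from dataclasses import dataclass

    @dataclass
    class Session:
        generating_minutes: float  # prompting and accepting output
        reviewing_minutes: float   # reading, questioning, tracing
        debug_minutes: float       # debugging this code later
        ai_generated: bool         # was the code AI-written?

    def generation_review_ratio(sessions: list[Session]) -> float:
        gen = sum(s.generating_minutes for s in sessions)
        rev = sum(s.reviewing_minutes for s in sessions)
        return gen / rev if rev else float("inf")

    def mean_debug_minutes(sessions: list[Session], ai: bool) -> float:
        picked = [s.debug_minutes for s in sessions if s.ai_generated == ai]
        return sum(picked) / len(picked) if picked else 0.0

    week = [
        Session(40, 5, 90, True),
        Session(25, 20, 15, False),
        Session(30, 8, 70, True),
    ]
    print(f"generation:review = {generation_review_ratio(week):.1f}")
    print(f"debug, AI code: {mean_debug_minutes(week, True):.0f} min")
    print(f"debug, own code: {mean_debug_minutes(week, False):.0f} min")

A ratio far above one, combined with a wide gap between the two debug averages, is the delegation signature described above.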

The distinction matters most in the first few years of working with an unfamiliar system. Once a strong mental model is in place, accepting more AI output without interrogating every line is reasonable — you have the theory to evaluate what the code is doing. But building that initial model through pure delegation means building it on a foundation that does not hold under debugging conditions.

The Neutral Tool That Is Not Neutral in Practice

Across all the debate about whether AI tools improve or hurt productivity, the Anthropic study points to something more fundamental: the tools themselves are neutral, but the behavior they make easy is not.

Delegation is the path of least resistance. Interrogation requires deliberately choosing the slower, more effortful path — asking a question you could skip, building a model you might not need this week, spending time on comprehension that no metric will credit.

One pattern builds the kind of developer who can debug systems under pressure. The other does not.

The tools are the same in both cases. The only variable is whether you stop to ask why.

Written by Kevin — builder of xeve
