
Coding Agents Speed Up Projects That Haven't Used AI Before

July 4, 2026 · 6 min read

If you've been using Copilot or Cursor for a year and you're adding Claude Code expecting a second round of speedups, the data says you're more likely to get the quality costs without the velocity gains.

That's the headline finding from a study presented at MSR 2026 in April, one of the first peer-reviewed causal analyses of coding agent adoption in real projects. The researchers tracked agent adoption across 400+ open-source repositories using a staggered difference-in-differences design, defining "adoption" as the first agent-generated pull request, and measured monthly outcomes for both velocity and quality. The methodology is careful enough that the findings are hard to explain away.
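If difference-in-differences is unfamiliar, here is a toy sketch of the idea in Python. It is not the study's estimator: the repositories, commit counts, and the two-month window are all made up for illustration, and real staggered designs handle weighting and statistical inference far more carefully. The core move is the same, though. For each adopter, compare its before/after change with the change over the same months in repositories that had not adopted yet.

```python
from statistics import mean

# Hypothetical monthly commit counts per repository (months 0..5).
# adoption_month is None for repositories that never adopt an agent.
repos = {
    "a": {"adoption_month": 2, "commits": [10, 10, 16, 15, 14, 13]},
    "b": {"adoption_month": 4, "commits": [20, 21, 20, 21, 28, 27]},
    "c": {"adoption_month": None, "commits": [30, 31, 30, 31, 30, 31]},
}


def did_estimate(repos, window=2):
    """Average of (adopter change - control change) around each adoption month."""
    effects = []
    for name, repo in repos.items():
        t0 = repo["adoption_month"]
        # Skip never-adopters and adopters without a full pre/post window.
        if t0 is None or t0 - window < 0 or t0 + window > len(repo["commits"]):
            continue
        pre, post = slice(t0 - window, t0), slice(t0, t0 + window)
        treated_change = mean(repo["commits"][post]) - mean(repo["commits"][pre])
        # Controls: repositories that have not adopted by the end of the post window.
        control_changes = [
            mean(other["commits"][post]) - mean(other["commits"][pre])
            for other_name, other in repos.items()
            if other_name != name
            and (other["adoption_month"] is None
                 or other["adoption_month"] >= t0 + window)
        ]
        if control_changes:
            effects.append(treated_change - mean(control_changes))
    return mean(effects) if effects else None


print(did_estimate(repos))
```

Subtracting the control change is what separates "the agent sped this project up" from "everything sped up that quarter."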

The velocity result: large, front-loaded gains when the agent is the first observable AI tool in a project. For repositories that already had AI IDE usage, adding an agent produced minimal or short-lived throughput increases.

The quality result: persistent degradation across all settings. Static analysis warnings up 18%. Cognitive complexity up 39%. It didn't matter whether the team was agent-first or had stacked an agent onto existing AI tools. The complexity cost accrued regardless of where the velocity was.

Why Velocity Saturates

The mechanism behind this finding makes intuitive sense once you see the data.

The first AI coding tool in a workflow accelerates the categories of work that were previously pure friction: boilerplate, mechanical refactors, test scaffolding, documentation, the first draft of anything repetitive. Those gains are real and they're immediate. Before AI assistance, a developer writing a CRUD endpoint was writing a CRUD endpoint. After, they're reviewing and adjusting one. That's a compression with a visible ceiling.

When you add a second tool — an autonomous agent on top of an IDE assistant — you're not entering a fresh domain of untapped acceleration. The AI-compressible work was already being compressed. What remains is either tasks that require judgment and context that neither tool handles well, or tasks where the first tool already captured most of the available gain.

The MSR study operationalizes this as "diminishing returns to AI assistance." The phrase is accurate but undersells the asymmetry. The velocity returns diminish. The quality costs don't.

What Cognitive Complexity Actually Measures

Cognitive complexity, as defined in the academic software engineering literature and implemented in tools like SonarQube, measures how hard code is to understand: not how long it is, but how many mental jumps a reader has to make to follow the logic. Nesting depth, branching conditions, unusual control flow, and interactions between distant parts of the same function all contribute.
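To make the nesting part concrete, here is a deliberately simplified score built on Python's standard `ast` module. It is an approximation for illustration, not SonarQube's actual rule set. The function name `nesting_weighted_score` and the choice of which nodes count are assumptions of this sketch. Each control-flow structure costs 1 plus its current nesting depth, and each boolean expression costs a flat 1.

```python
import ast

# Structures treated as breaking linear flow (a simplification: real tools
# have more rules, e.g. they do not penalize `elif` for nesting).
NESTING_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler)


def nesting_weighted_score(source: str) -> int:
    """Sum of (1 + nesting depth) per control structure, +1 per boolean expression."""
    def visit(node, depth):
        score = 0
        for child in ast.iter_child_nodes(node):
            if isinstance(child, NESTING_NODES):
                score += 1 + depth + visit(child, depth + 1)
            elif isinstance(child, ast.BoolOp):
                score += 1 + visit(child, depth)
            else:
                score += visit(child, depth)
        return score
    return visit(ast.parse(source), 0)


FLAT = """
def f(items):
    for x in items:
        print(x)
    if items:
        print("done")
    while items:
        items.pop()
"""

NESTED = """
def f(items):
    for x in items:
        if x:
            while x > 0:
                x -= 1
"""

print(nesting_weighted_score(FLAT))    # 3
print(nesting_weighted_score(NESTED))  # 6
```

Both functions contain three control structures. The nested one scores twice as high because every level forces the reader to hold more context at once.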

A 39% increase means AI-generated and AI-revised code is structurally harder to reason about than the code it replaced, even when it passes tests and functions correctly. The code works. Reading it is harder work. Extending it is higher-risk.

This matters in a specific way for projects where AI tools are stacked. When both your IDE assistant and your coding agent have contributed to a codebase over months, the cognitive complexity has accumulated from both sources. The agent-generated commits add their portion. The AI-IDE-assisted commits add theirs. The study found 18% more static analysis warnings, the tool-detectable surface of this problem, but cognitive complexity is the harder cost to quantify and the easier one to overlook.

The signal shows up when you try to add a feature to something you wrote six months ago with heavy AI assistance. The code doesn't feel like yours. The logic paths require reconstruction rather than recollection. You trace through it the way you'd trace through code a contractor left. It's not a documentation failure. It's a cognitive complexity problem: the structure of the code doesn't reflect how a person would have organized their thinking.

The Realistic Population Reading This

Almost no one using AI coding tools today is in the "agent as first AI tool" category. Copilot launched at scale in 2022. Cursor became mainstream in 2023. By the time Claude Code, Windsurf, or Antigravity 2.0 entered developer workflows, the base of "projects with no prior AI tool exposure" had shrunk significantly.

That means most of the developers who switched to or added an agent in 2025 or 2026 were not in the high-velocity-gain group the MSR study identified. They were in the minimal-throughput-increase group — which is also the persistent-quality-degradation group.

The benchmark demonstrations that drove adoption looked compelling in part because they were often run on fresh codebases or specific tasks, not on mature projects already carrying a year of AI-IDE-assisted history. The front-loading effect means the demos were accurate for the "first AI tool" case. That's not the case most developers were actually in.

What This Doesn't Mean

None of this says agents are the wrong tool. The velocity gain is real for the projects where it applies — agent-first codebases see measurable throughput increases, and for specific classes of work (environment setup, test coverage, boilerplate, single-purpose scripts), agents outperform IDE assistants by enough to justify the stacking cost.

The finding is more specific: the expectation that stacking AI tools produces compounding velocity is wrong. The first tool captures most of the available speed. The second tool does different work, and measuring it with the same "how fast am I generating output" frame misses what the second tool is actually doing — which is more about task delegation than task acceleration.

The 39% cognitive complexity increase is the thing worth worrying about, not because it makes the code nonfunctional, but because it has a compounding maintenance cost that doesn't show up in sprint velocity. Code that's harder to understand gets touched less often, extended more carefully, and sometimes avoided entirely. The team routes around it. Features accumulate adjacent to the complex area rather than inside it. The codebase develops load-bearing code that nobody fully understands — not because it's undocumented but because the structure makes it hard to hold in mind.

The Measurement That Would Tell You If It's Happening

There's a proxy signal for cognitive complexity accumulation in your own codebase: how long do you spend reading code before writing any for a given task?

If you track active time at the session level (which application was in focus, and for how long, before a commit appeared in version control), the ratio of reading time to writing time on files you haven't modified recently is a rough proxy for the cognitive overhead those files carry. A rising ratio in AI-assisted areas relative to areas you've maintained manually is the ground-level signal that the MSR study's aggregate numbers describe.
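A minimal sketch of that ratio, assuming you can export focus time per file and split it into "read" (time in the file before the first edit) and "write" (time spent editing). The log format, the file paths, and the `ai_assisted_prefixes` labeling are all hypothetical. How you decide which areas count as AI-assisted is up to you.

```python
from collections import defaultdict

# Hypothetical session log: (file path, activity, seconds of focused time).
sessions = [
    ("billing/invoice.py", "read", 900),
    ("billing/invoice.py", "write", 300),
    ("auth/tokens.py", "read", 200),
    ("auth/tokens.py", "write", 400),
    ("billing/tax.py", "read", 600),
    ("billing/tax.py", "write", 200),
]

# Hypothetical labeling: directories that were mostly AI-assisted.
ai_assisted_prefixes = ("billing/",)


def read_write_ratio(sessions, ai_prefixes):
    """Reading seconds divided by writing seconds, per group of files."""
    totals = defaultdict(lambda: {"read": 0, "write": 0})
    for path, activity, seconds in sessions:
        group = "ai_assisted" if path.startswith(ai_prefixes) else "manual"
        totals[group][activity] += seconds
    # Groups with no writing time are skipped to avoid dividing by zero.
    return {
        group: t["read"] / t["write"]
        for group, t in totals.items()
        if t["write"] > 0
    }


print(read_write_ratio(sessions, ai_assisted_prefixes))
```

With this made-up data the AI-assisted group comes out at 3.0 and the manual group at 0.5. A single snapshot says little. The thing to watch is whether the gap widens from month to month.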

Static analysis warnings are easier to track directly. Most CI pipelines already run them. A month-over-month trend on warning counts, broken out by files touched predominantly by AI-assisted work versus files maintained primarily by hand, would tell you whether the 18% figure from the study holds for your specific codebase. Some codebases will see less degradation, some more. The important thing is to check rather than assume the benchmark numbers don't apply to your project.
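Here is one rough way to compute that trend, assuming you already export a monthly warning count per file group from CI. The counts below are invented, and the `ai_assisted` / `manual` split is a labeling you would have to maintain yourself.

```python
# Hypothetical monthly warning counts exported from CI, oldest month first.
warnings_by_month = {
    "ai_assisted": [100, 103, 109, 115],
    "manual": [80, 80, 82, 84],
}


def percent_change(series):
    """Percent change from the first month to the last month."""
    first, last = series[0], series[-1]
    return (last - first) * 100 / first


def month_over_month(series):
    """Percent change between each pair of consecutive months."""
    return [(b - a) * 100 / a for a, b in zip(series, series[1:])]


for group, series in warnings_by_month.items():
    steps = ", ".join(f"{p:+.1f}%" for p in month_over_month(series))
    print(f"{group}: {percent_change(series):+.1f}% overall ({steps})")
```

Raw counts grow as a codebase grows, so dividing each month's count by lines of code in the group would give a fairer comparison. That step is left out here to keep the sketch short.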

The study's authors call for "quality safeguards, provenance tracking, and selective deployment of autonomous agents." That's the institutional recommendation. The individual-developer version is simpler: if you're adding an agent to a project that already runs on AI IDE assistance, don't measure the addition by how fast you're generating code. Measure it by whether the areas the agent touches become harder to work in over the following two months.

That's the question the velocity number doesn't answer. The MSR study suggests it's the right one to ask.

Written by Kevin — builder of xeve
