developer productivity

Refactoring Fell 60% Under AI. Nobody Tracked It.

June 4, 2026 · 6 min read

GitClear tracked 211 million changed lines of code across thousands of repositories from 2020 to 2024. In 2020, 24.1% of all changed lines involved moving or consolidating existing code, the signal of active refactoring. By 2024, that share had fallen to 9.5%, a relative decline of about 60%. And in 2024, for the first time in the dataset's history, copy-pasted lines outnumbered refactored lines.
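For readers who want to check the headline figure, here is the arithmetic as a short Python snippet. It uses only the two percentages quoted above; the variable names are illustrative.

```python
# Share of changed lines classified as moved, per the GitClear figures quoted above.
moved_share_2020 = 24.1
moved_share_2024 = 9.5

# Drop in percentage points versus drop relative to the 2020 level.
absolute_drop = moved_share_2020 - moved_share_2024
relative_drop = absolute_drop / moved_share_2020 * 100

print(f"{absolute_drop:.1f} points, {relative_drop:.1f}% relative decline")
# prints: 14.6 points, 60.6% relative decline
```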

This shift happened as AI coding tools went from niche to mainstream. The two trends aren't unrelated.

What Moved Lines Actually Measure

In GitClear's taxonomy, a "moved" line is code relocated from one part of a codebase to another — an extracted function, a consolidated utility, a pattern pulled into a shared module. It's the signal that a developer read their existing code, recognized something worth centralizing, and did the work to improve the structure rather than just add to it.

Moved lines are how codebases stay coherent. They're the mechanism by which duplicated logic gets resolved, by which a function written in one place becomes accessible everywhere it's needed, by which a codebase's internal API improves alongside its features.
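To make the idea concrete, here is a rough sketch of how a moved line could be spotted in a unified diff: a line whose content is removed in one place and added in another. This is an assumption-laden approximation for illustration, not GitClear's actual classifier, and the `classify_diff` function and sample diff are hypothetical.

```python
from collections import Counter

def classify_diff(diff_text: str) -> dict:
    """Count moved vs. genuinely added or deleted lines in a unified diff."""
    added, removed = Counter(), Counter()
    for line in diff_text.splitlines():
        # Skip file headers like '+++ b/file.py'. Limitation of this sketch:
        # a removed line whose content starts with '--' is skipped too.
        if line.startswith(("+++", "---")):
            continue
        body = line[1:].strip()
        if not body:
            continue  # blank lines would match each other trivially
        if line.startswith("+"):
            added[body] += 1
        elif line.startswith("-"):
            removed[body] += 1
    # Content that was both removed and added counts as moved.
    moved = sum((added & removed).values())
    return {
        "moved": moved,
        "added": sum(added.values()) - moved,
        "deleted": sum(removed.values()) - moved,
    }

# Hypothetical diff: normalize() is extracted from orders.py into utils.py.
sample = """\
diff --git a/orders.py b/orders.py
--- a/orders.py
+++ b/orders.py
@@ -1,3 +1,2 @@
-def normalize(email):
-    return email.strip().lower()
 def create_order(email):
+    validate(email)
diff --git a/utils.py b/utils.py
--- /dev/null
+++ b/utils.py
@@ -0,0 +1,2 @@
+def normalize(email):
+    return email.strip().lower()
"""

print(classify_diff(sample))
# prints: {'moved': 2, 'added': 1, 'deleted': 0}
```

Exact-content matching is the simplest possible heuristic. It misses moves where the code was also edited, and it can mistake common one-liners for moves, so treat the counts as a rough signal.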

The 24% baseline from 2020 wasn't an anomaly. It reflected a workflow where writing new code and improving existing code were interleaved. Developers read their codebases constantly — to understand what already existed, to find the right place to put something, to recognize when they were about to duplicate something that already worked. That reading generated natural refactoring pressure. Not as a scheduled activity, just as a byproduct of paying attention.

What AI Does Instead

When you prompt an AI assistant to write a function, it generates the function. It doesn't scan your codebase for something similar. It doesn't know your module structure well enough to extract the right abstraction. It writes what you asked for, usually correctly, and usually without knowing what you've already written.

This is the structural problem. AI tools optimize for answering the prompt. They don't optimize for improving the system the code lives in. And because the output is fast and functional, developers rarely stop to do what they used to do naturally: read around the new code and ask whether the structure still makes sense.

The result was a slow drift that accelerated as adoption grew. Code duplication rose from 8.3% of changed lines in 2021 to 12.3% in 2024. Code churn — new code revised within two weeks of its initial commit — climbed steadily across the same period. Neither metric is catastrophic in isolation. Together they describe a codebase accumulating structural debt, quietly, underneath metrics that look fine.

The Short-Term / Long-Term Split

This is where the refactoring collapse becomes a forecasting problem rather than a code quality one.

Short-term defect rates appear to improve slightly under AI adoption. AI tools catch obvious errors at generation time, suggest type-safe patterns, surface common mistakes. The feedback loop from writing to initial testing is faster.

Six months later, the picture changes. Research on AI-generated code finds that long-term bug rates run about 12% higher than for human-written code reviewed to the same standard. The bugs aren't usually in what the AI wrote in isolation. They're in the places where AI-generated code collides with the rest of the system in ways nobody modeled at generation time — a shared assumption that turned out not to be shared, a constraint in a distant module that the generated code quietly violated.

Refactoring is the activity that surfaces those collisions proactively. When a developer consolidates two similar functions, they find the edge case each was handling differently. When they extract a pattern into a shared module, they discover it was being called with inconsistent assumptions in both places it lived. Moved lines are bug prevention. Their decline means the collisions accumulate until production finds them.
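A small, made-up Python example of the kind of collision consolidation can surface. The function names and the pricing scenario are hypothetical. Two near-duplicate helpers look interchangeable until you try to merge them and notice that only one of them handles a missing amount.

```python
# Two near-duplicates, as might accumulate when each was written in isolation.
def format_price_checkout(amount):
    return f"${amount:.2f}"  # raises TypeError if amount is None

def format_price_invoice(amount):
    if amount is None:  # edge case only this copy handles
        return "n/a"
    return f"${amount:.2f}"

# Consolidated version: merging the two forces an explicit decision
# about what a missing amount should look like everywhere.
def format_price(amount, missing="n/a"):
    if amount is None:
        return missing
    return f"${amount:.2f}"

print(format_price(3.5))   # prints: $3.50
print(format_price(None))  # prints: n/a
```

Until the two helpers are merged, the checkout path fails on a missing amount and the invoice path does not, and nothing in either file says so.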

The Measurement Problem

Almost nobody tracks their refactoring ratio.

Standard productivity metrics count commits, pull requests, lines added, features shipped. None of these distinguish between "code that adds something new to the system" and "code that improves how the system is organized." A PR that consolidates three duplicated implementations into one shared function might touch more files and change more lines than a PR that ships a new feature — and it will produce zero new functionality, zero new user-facing behaviors, and might even reduce your net line count. By every metric most teams track, it looks like less was done.

The GitClear data captures exactly this blind spot. The drop from 24.1% to 9.5% didn't show up in any productivity dashboard. Deployment frequency kept improving. PR counts went up. Cycle time fell. The moves-versus-copies ratio drifted unnoticed because nobody was watching it.

If you track your own coding time — whether through an IDE plugin, a tool like xeve, or just your commit history — look at how your commit patterns have changed since you started using AI tools heavily. The question isn't whether you're shipping more. It's what fraction of what you're shipping improves code you already wrote versus adds code you haven't written yet. That ratio tells you something your commit count doesn't.
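One way to put a number on that ratio, sketched in Python. It assumes you already have per-commit counts of moved, added, and deleted lines from some classifier. The `CommitStats` type, the cutoff date, and the sample history are all invented for illustration and are not real measurements.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class CommitStats:
    day: date
    moved: int    # lines relocated within the codebase
    added: int    # genuinely new lines
    deleted: int  # lines removed outright

def moved_share(commits):
    """Fraction of all changed lines that were moves (0.0 if nothing changed)."""
    moved = sum(c.moved for c in commits)
    changed = sum(c.moved + c.added + c.deleted for c in commits)
    return moved / changed if changed else 0.0

def compare_periods(commits, cutoff):
    """Moved share before and from a cutoff, e.g. when you adopted AI tools."""
    before = [c for c in commits if c.day < cutoff]
    after = [c for c in commits if c.day >= cutoff]
    return moved_share(before), moved_share(after)

# Invented history, for illustration only.
history = [
    CommitStats(date(2023, 3, 1), moved=30, added=60, deleted=10),
    CommitStats(date(2023, 5, 1), moved=20, added=70, deleted=10),
    CommitStats(date(2024, 9, 1), moved=10, added=80, deleted=10),
    CommitStats(date(2024, 11, 1), moved=5, added=90, deleted=5),
]

before, after = compare_periods(history, cutoff=date(2024, 1, 1))
print(f"before: {before:.1%}, after: {after:.1%}")
# prints: before: 25.0%, after: 7.5%
```

The absolute values matter less than the direction. A share that keeps shrinking while commit counts grow is the pattern described above.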

Why This Compounds

The refactoring collapse isn't a one-time debt. It compounds.

Every week that duplication rises and refactoring falls, the codebase becomes marginally harder to understand. The harder it is to understand, the less naturally refactoring happens — even for developers who want to do it. The more unfamiliar the code feels, the more you lean on AI to generate new code rather than improving what's there. The pattern reinforces itself.

GitClear's dataset ends at 2024. If the trend continued through 2025, the numbers are worse now. Codebases that were 18 months deep into heavy AI use at the end of 2024 are now approaching three years in. The structural debt from a sustained 9.5% refactoring rate compounds across every quarter it goes unmeasured and unaddressed.

What to Do With This

The refactoring work isn't coming back on its own. AI tools are getting better at whole-codebase awareness — Cursor, Augment, and Sourcegraph all made moves toward multi-file reasoning in 2025 — but none of them yet generate the same refactoring pressure that a developer reading their own code does. That reading is still the irreplaceable input.

The developers who will have the most workable codebases in a year or two are the ones who noticed this pattern early and treated "improve existing code" as a first-class work category rather than something that happens between features. It used to happen naturally. It doesn't anymore — not because developers stopped caring, but because the workflow AI enables makes it feel optional in a way it never quite did before.

One useful habit: before opening a new file to add something, spend five minutes in the files you touched last week. Not to change anything specific. Just to read. The refactoring pressure that used to be automatic doesn't disappear if you remove the automatic part. It just needs to be deliberate instead.

Written by Kevin — builder of xeve
