developer productivity

The Bill from Your AI Sprint Arrives Three Weeks Later

June 21, 2026 · 5 min read

Faros's 2026 engineering report tracked two years of telemetry across 22,000 developers and found that code churn — the ratio of lines deleted to lines added for merged pull requests — increased 861% as teams moved to high AI adoption. That's not a quality warning or a failure rate. It's a second shift: the engineering time spent in the weeks after a sprint removing the code the sprint shipped.

The first shift is visible. You write code, open PRs, get reviews, merge. Every productivity dashboard captures this. The second shift happens three weeks later, in a PR nobody connects to the original, removing code that turned out to be wrong in a way the review didn't catch. Same engineering time. Different column in the spreadsheet — or more accurately, no column at all.

What Code Churn Actually Measures

Code churn, as Faros defines it, is distinct from what you delete before committing. It's not the exploratory typing that gets thrown away in a session. It's code that made it through your quality process — review, tests, merge — and then needed to come back out.

At an 861% increase, teams with high AI adoption are deleting nearly ten times as much recently merged code, relative to what they add, as they did before AI tools became dominant. If your baseline churn ratio was 10% (ten lines deleted for every hundred added), an 861% increase puts you near 96% (10% × 9.61). For every hundred lines you added last sprint, you'll spend this sprint removing nearly all of them.

That calculation is rough — the starting ratio matters — but the direction is clear. More code is making it through review and then coming back for removal at a rate that didn't exist two years ago.
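The rough calculation above fits in a few lines. This is a minimal sketch, not Faros's methodology: `churn_ratio` and `after_increase` are hypothetical helpers that apply the definition given earlier (lines deleted divided by lines added) and scale a baseline ratio by a percentage increase. The baselines in the loop are illustrative assumptions, not figures from the report.

```python
def churn_ratio(lines_deleted: int, lines_added: int) -> float:
    """Churn as defined above: lines deleted divided by lines added."""
    if lines_added == 0:
        raise ValueError("churn ratio is undefined when no lines were added")
    return lines_deleted / lines_added


def after_increase(baseline: float, pct_increase: float) -> float:
    """Scale a baseline ratio by a percentage increase (861 means +861%)."""
    return baseline * (1 + pct_increase / 100)


# The same +861% lands in very different places depending on the baseline.
for baseline in (0.02, 0.05, 0.10):
    new_ratio = after_increase(baseline, 861)
    print(f"baseline {baseline:.0%} -> {new_ratio:.1%}")
```

A 10% baseline lands at about 96%; a 2% baseline lands near 19%. The headline percentage alone can't tell you where any particular team ended up.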

Why This Is Different from the Review Bottleneck

The code review problem and the churn problem are related but distinct.

The review bottleneck is a queue: PRs sit waiting, take longer to get picked up, and take more reviewer time per line. That delay is real and expensive, but it happens before merge. The code either gets in or it doesn't.

Churn is what happens after merge. It's code that passed review. Code that a developer, possibly a senior one, looked at and approved. Code that your test suite ran against. Code in production, doing whatever it was supposed to do. And three weeks later, it's coming out.

The failure mode here isn't sloppy review. AI-generated code is often superficially convincing — idiomatic, well-named, consistent with surrounding style. The problems that cause churn are structural: a function that solves the stated requirement while violating an architectural constraint the model didn't know about, an integration that works under expected conditions and fails under a different one that only emerged in production, a refactor that was correct in isolation and broke something two layers up.

Review catches some of this. The 861% rise in churn is a measure of what doesn't get caught.

The Metric That Doesn't Connect the Events

Here's the specific measurement failure: standard engineering analytics see two separate events with no connection between them.

PR #412 adds 200 lines. It merges. It ships. Your productivity dashboard logs 200 lines added, one PR closed, one task complete.

Three weeks later, PR #431 removes 180 of those 200 lines. Your dashboard logs 180 lines removed, one PR closed, another task complete.

Two PRs. Two productivity events. The dashboard never connects them. The net contribution of those 380 lines of activity was 20 lines of durable code, but the metric saw 380 lines moved and two tasks closed.
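Connecting the two PRs means knowing where each removed line came from. The sketch below assumes every merged line has a stable identifier (a hypothetical simplification; in practice tools approximate line origin with something like `git blame`) and that PRs arrive in merge order. `PullRequest` and `attribute_removals` are illustrative names, not part of any analytics product.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    number: int
    # Hypothetical stable line IDs. Real tools approximate line origin
    # with something like `git blame`.
    added: set[str] = field(default_factory=set)
    removed: set[str] = field(default_factory=set)


def attribute_removals(prs: list[PullRequest]) -> dict[int, dict[str, int]]:
    """Credit each later removal back to the PR that originally added the line.

    `prs` is assumed to be in merge order.
    """
    origin: dict[str, int] = {}  # line ID -> number of the PR that added it
    report: dict[int, dict[str, int]] = {}
    for pr in prs:
        report[pr.number] = {"added": len(pr.added), "removed_later": 0}
        # Removals are processed first, so they only match lines from earlier PRs.
        for line_id in pr.removed:
            source = origin.pop(line_id, None)
            if source is not None:
                report[source]["removed_later"] += 1
        for line_id in pr.added:
            origin[line_id] = pr.number
    for stats in report.values():
        stats["durable"] = stats["added"] - stats["removed_later"]
    return report


# The example from the text: PR #412 adds 200 lines, PR #431 removes 180 of them.
pr412 = PullRequest(412, added={f"line-{i}" for i in range(200)})
pr431 = PullRequest(431, removed={f"line-{i}" for i in range(180)})
report = attribute_removals([pr412, pr431])
print(report[412])  # {'added': 200, 'removed_later': 180, 'durable': 20}
```

Keyed by origin rather than by event, the same two PRs collapse from 380 lines of activity to 20 durable lines credited to #412.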

This is the measurement gap. Code churn is invisible at the metric level precisely because the events are separated by time and context. Nobody is tracking "how much of what merged in sprint N was removed in sprint N+1." If they were, the 861% figure would be showing up in every team's weekly standup.

It isn't, because nobody built the query.
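Here is one way that query could look, assuming a hypothetical `line_history` table that records, for each merged line, the sprint it was added in and the sprint it was removed in (NULL while it survives). Building that table is the hard part, since it needs line-level attribution across history; once it exists, the question is a single `GROUP BY`. The sketch uses Python's built-in `sqlite3`, and the sample rows are made up to exercise the query, not measurements.

```python
import sqlite3

# Hypothetical schema: one row per merged line, recording the sprint it was
# added in and, if it has since been deleted, the sprint it was removed in.
SCHEMA = """
CREATE TABLE line_history (
    line_id        TEXT PRIMARY KEY,
    sprint_added   INTEGER NOT NULL,
    sprint_removed INTEGER  -- NULL while the line survives
);
"""

# "How much of what merged in sprint N was removed in sprint N+1?"
NEXT_SPRINT_REMOVAL = """
SELECT sprint_added,
       COUNT(*),
       SUM(CASE WHEN sprint_removed = sprint_added + 1 THEN 1 ELSE 0 END)
FROM line_history
GROUP BY sprint_added
ORDER BY sprint_added;
"""


def next_sprint_removal(conn: sqlite3.Connection) -> list[tuple[int, int, int, float]]:
    """Return (sprint, lines_added, removed_next_sprint, share) per sprint."""
    rows = conn.execute(NEXT_SPRINT_REMOVAL).fetchall()
    return [(sprint, added, removed, removed / added) for sprint, added, removed in rows]


# Made-up rows to exercise the query; not measurements.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
rows = (
    [(f"a{i}", 1, None) for i in range(90)]
    + [(f"b{i}", 1, 2) for i in range(10)]
    + [(f"c{i}", 2, None) for i in range(40)]
    + [(f"d{i}", 2, 3) for i in range(60)]
)
conn.executemany("INSERT INTO line_history VALUES (?, ?, ?)", rows)
for sprint, added, removed, share in next_sprint_removal(conn):
    print(f"sprint {sprint}: {removed} of {added} lines removed in sprint {sprint + 1} ({share:.0%})")
```

Changing the condition to `sprint_removed <= sprint_added + 3` counts everything removed within three sprints instead of only the next one.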

The Solo Builder Version

If you're building solo, the second shift is particularly hard to escape or attribute elsewhere.

On a team, code churn gets distributed. Developer A generates the code in week one. Developer B writes the removal PR in week three. Both PRs look like normal work. Neither developer may even know the code they're removing came from an AI session three weeks earlier.

When you're the only developer, you're both. You write the AI-assisted code in week one. You come back three weeks later and realize it was the wrong approach, the integration broke something you didn't account for, or the architecture can't support what you added without scaffolding that makes the whole thing wrong.

The second-shift PR is just another PR. It doesn't feel like a cost accounting event. It feels like maintenance. But the time you spent writing and reviewing the original code, plus the time spent removing it, adds up to time with zero net contribution to the codebase. You're back to where you started.

At an 861% increase in churn, this isn't the occasional wrong turn. It's a structurally larger fraction of your engineering week than it used to be.

What You'd See If You Tracked It

Churn is measurable at the session level if you're tracking the right things.

The signal is return visits that reduce file size. When you open a file you committed to heavily two or three weeks ago, and your session ends with fewer lines than when you opened it, that's churn. It doesn't show up in your editor's "lines written" count. It doesn't show up in commit history as connected to the original session. But it's visible as a timestamp, a file path, and a line delta that's negative.

Across a month of sessions, the ratio of reductions to files you worked on heavily a few weeks ago versus additions to files you haven't recently touched is a rough proxy for your personal churn rate. If your AI-heavy sprints are followed by net-negative sessions on the same files two to three weeks later, you're seeing the second shift in your own data.
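The signal described above can be approximated from a session log. This sketch assumes a hypothetical `Session` record (end time, file path, net line delta) and three tunable guesses: what counts as heavy work (50 lines added), what "a few weeks ago" means (14 to 28 days), and what "recently touched" means (7 days). It divides lines removed on return visits by lines added to files with no recent session, mirroring the deleted-over-added shape of the churn ratio. None of these thresholds come from Faros or from xeve.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Session:
    # Hypothetical session record: when it ended, which file, net line change.
    ended: datetime
    path: str
    line_delta: int


def churn_proxy(
    sessions: list[Session],
    heavy: int = 50,       # assumed threshold for "committed to heavily"
    min_days: int = 14,    # assumed start of the "a few weeks ago" window
    max_days: int = 28,    # assumed end of that window
    recent_days: int = 7,  # assumed meaning of "recently touched"
) -> float:
    """Lines removed on return visits divided by lines added to untouched files."""
    window_start = timedelta(days=min_days)
    window_end = timedelta(days=max_days)
    recent = timedelta(days=recent_days)
    removed_on_return = 0
    added_fresh = 0
    history: list[Session] = []
    for s in sorted(sessions, key=lambda x: x.ended):
        prior = [p for p in history if p.path == s.path]
        if s.line_delta < 0:
            # Net-negative session on a file that saw heavy work weeks earlier.
            if any(p.line_delta >= heavy and window_start <= s.ended - p.ended <= window_end
                   for p in prior):
                removed_on_return += -s.line_delta
        elif s.line_delta > 0:
            # Additions count as fresh work only if the file had no recent session.
            if not any(s.ended - p.ended <= recent for p in prior):
                added_fresh += s.line_delta
        history.append(s)
    return removed_on_return / added_fresh if added_fresh else 0.0


t0 = datetime(2026, 5, 1)
sessions = [
    Session(t0, "api/handlers.py", 200),                        # AI-heavy week
    Session(t0 + timedelta(days=1), "api/routes.py", 100),
    Session(t0 + timedelta(days=21), "api/handlers.py", -150),  # return visit, net negative
]
print(churn_proxy(sessions))  # 0.5
```

The pairwise scan over prior sessions is quadratic, which is fine for a month of one person's data; a larger log would want sessions indexed by path first.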

Most developers don't have this picture. Session-level tracking that connects file activity across time isn't what editor plugins report — they report activity per session. The connection between session N and session N+k on the same files requires tracking across the full history.

We built session logging in xeve partly because editor metrics miss this. Coding sessions at the OS level have timestamps, durations, and — through version control — net file-level impact. Sessions that look like removal work after an AI-heavy week are the second shift made visible.
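For the version-control half of that picture, `git log --numstat` already reports added and deleted line counts per file per commit. The parser below is a minimal sketch, not xeve's implementation: it assumes the log was produced with `git log --numstat --format="commit %H %ct"` (full hash and committer Unix timestamp on a header line) and turns it into `(timestamp, path, net_delta)` records that could be lined up against sessions. Binary files, which git reports with `-` instead of counts, are skipped. The sample log is invented, hashes included.

```python
from typing import Optional


def numstat_records(log_text: str) -> list[tuple[int, str, int]]:
    """Parse `git log --numstat --format="commit %H %ct"` output.

    Returns one (commit_unix_time, path, added_minus_deleted) tuple per file
    per commit.
    """
    records: list[tuple[int, str, int]] = []
    commit_time: Optional[int] = None
    for line in log_text.splitlines():
        if line.startswith("commit "):
            _, _sha, timestamp = line.split()
            commit_time = int(timestamp)
            continue
        parts = line.split("\t")
        if len(parts) != 3 or commit_time is None:
            continue  # blank separator lines between commits
        added, deleted, path = parts
        if added == "-" or deleted == "-":
            continue  # binary file: git reports no line counts
        records.append((commit_time, path, int(added) - int(deleted)))
    return records


# Invented sample in the assumed format.
sample = (
    "commit 1111111 1714550400\n"
    "\n"
    "200\t0\tapi/handlers.py\n"
    "commit 2222222 1716364800\n"
    "\n"
    "10\t160\tapi/handlers.py\n"
    "-\t-\tassets/logo.png\n"
)
print(numstat_records(sample))
# [(1714550400, 'api/handlers.py', 200), (1716364800, 'api/handlers.py', -150)]
```

Adding `--no-renames` keeps git's `old => new` rename notation out of the path column, at the cost of recording a rename as a deletion plus an addition.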

The Half the Throughput Numbers Don't Show

Faros's same report found 66% more epics completed per developer and 34% higher task completion under high AI adoption. Those numbers are real. AI is accelerating certain categories of output.

But the number of epics completed in sprint N doesn't tell you how much of what those epics shipped is still in the codebase by sprint N+3. An 861% increase in churn means a significant fraction won't be. The throughput went up. The durability of that throughput went down. The net contribution (durable code that shipped and stayed) is lower than either throughput number alone suggests.

This is the accounting problem. Sprint metrics track what was shipped. They don't track whether it lasted. Those are different questions, and they're getting more different as AI makes generation faster and churn higher.

The first shift is getting faster. The bill arrives in the second.

Written by Kevin — builder of xeve

