You Don't Trust It. You Commit It Anyway.

SonarSource surveyed 1,100 developers and published a specific number in January that has not gotten enough attention: 96% of developers do not fully trust that AI-generated code is functionally correct. Only 48% always verify it before committing.

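Do the arithmetic. If 48% always verify, 52% do not, and nearly all of that 52% sit inside the 96% who do not fully trust the output. Roughly half of developers are, at least some of the time, committing code they themselves do not trust, without checking it. AI now accounts for 42% of all committed code. By 2027, SonarSource estimates that figure rises to 65%.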
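
For anyone who wants the arithmetic spelled out, here is a minimal sketch. The survey figures quoted here are two separate percentages, so the overlap between "does not fully trust" and "does not always verify" can only be bounded, not pinned down. The function name `overlap_bounds` is illustrative and not something from the survey.

```python
def overlap_bounds(p_a: float, p_b: float) -> tuple[float, float]:
    """Bounds on the share of people in both groups, given only each group's share."""
    lower = max(0.0, p_a + p_b - 1.0)  # the groups must overlap at least this much
    upper = min(p_a, p_b)              # the overlap cannot exceed the smaller group
    return lower, upper


distrust = 0.96                # do not fully trust AI-generated code
not_always_verify = 1 - 0.48   # do not always verify before committing

low, high = overlap_bounds(distrust, not_always_verify)
print(f"{low:.0%} to {high:.0%}")  # prints "48% to 52%"
```

Whatever the exact overlap is, it sits between 48% and 52% of developers.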

The verification gap is not a theory about what might go wrong at scale. It is a description of what is already happening, at scale, on your team.

Why the Gap Exists

The obvious answer is speed. AI generates code fast, and the whole point is to move faster. Stopping to verify feels like friction on the very tool you adopted to reduce friction.

But there is a subtler reason buried in the same survey: 38% of developers say that reviewing AI-generated code requires more effort than reviewing code written by a human colleague.

That number explains the behavior. When verification takes longer than the generation phase — when checking is harder than writing — the rational response under time pressure is to skip it. You generated the function in 12 seconds. Verifying it properly takes four minutes. If you are in a sprint and the function looks plausible, the commit happens.

The speed of generation creates a psychological credit. The code appeared almost instantly, which makes it feel low-cost. The review cost is invisible until you decide to do it, and even then it doesn't feel proportionate to something that took 12 seconds to produce. But the generation speed is the model's speed. The verification cost is yours — it lands entirely in your working hours, your cognitive load, your accountability when it breaks in production.

Generation and verification are decoupled: the model pays for the first and you pay for the second. That split is what makes glance-and-commit feel reasonable at the commit stage.

What You're Actually Shipping

The math on unverified AI code in production is worth making explicit.

If 42% of committed code is AI-generated, and 52% of developers don't always verify before committing, you still can't calculate an exact share of unverified code without knowing how much code each group commits and how often the check is skipped. But the share is unlikely to be small. Some fraction of the AI-generated code in any given codebase was committed by someone who did not fully check it. In a team of ten developers using AI tools daily, this is not an edge case.
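
To get a feel for the range, here is a rough sketch under loudly stated assumptions: every developer commits the same amount of AI-generated code, and the developers who don't always verify skip the check on some fixed fraction of their commits. That `skip_rate` is hypothetical. The survey as quoted here does not report it, so the sketch sweeps a few values instead of picking one.

```python
def unverified_ai_share(ai_share: float, non_verifier_share: float, skip_rate: float) -> float:
    """Share of all committed code that is AI-generated and went in unchecked.

    Assumes equal AI-code output per developer (an assumption, not survey data).
    """
    return ai_share * non_verifier_share * skip_rate


AI_SHARE = 0.42            # AI-generated share of committed code
NON_VERIFIER_SHARE = 0.52  # developers who don't always verify

# skip_rate values are hypothetical illustrations, not measured figures
for skip_rate in (0.25, 0.50, 0.75):
    share = unverified_ai_share(AI_SHARE, NON_VERIFIER_SHARE, skip_rate)
    print(f"skip rate {skip_rate:.0%}: {share:.1%} of committed code")
```

Under these assumptions the result runs from about 5% to about 16% of all committed code. The real figure depends on distributions nobody has published, which is exactly why it is worth measuring on your own team.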

The PR review process is supposed to catch what the original author missed. The problem is that the reviewer is in the same position. They are also reviewing code that was AI-generated, and they know it. CodeRabbit's analysis of 470 pull requests found that AI-coauthored PRs contained 1.7 times more issues than human-written ones — 75% more logic errors, three times more readability issues, 2.74 times more security vulnerabilities. Reviewers are processing harder PRs, in larger volumes, in less time. The verification burden did not disappear at the PR stage. It concentrated there while simultaneously growing harder to execute.

SonarQube users in the same survey were 44% less likely to experience outages caused by AI-generated code. That is the tooling gap, and it is real. But the behavioral gap — the decision not to verify at all — is not fixed by static analysis alone. It is a choice made before the code hits any automated check.

The Pressure Creating This

There is a specific production context worth naming.

Teams that have adopted AI coding tools are generating significantly more code per developer. Faros.ai's longitudinal study of 22,000 developers found that high-AI teams merge nearly twice as many pull requests as low-AI teams. That is not a failure — it is the point. The tools are working.

But review expectations often haven't updated. You are generating twice as much code as you were 18 months ago, and the unspoken assumption is that you are also reviewing it to the same standard. You are not. Nobody is. The math does not allow it.

The verification gap is partly a time allocation problem. Developers are generating more than they can verify. The path of least resistance is to generate, glance, commit, and let PR review catch whatever slipped through. Except the PR reviewer is also generating more than they can verify, reviewing your code on top of their own backlog.

The 52% who don't always verify are not careless. They are optimizing rationally under constraints that have not been redesigned for the new throughput.

What the 35% Shadow-IT Number Adds

One more piece from the SonarSource survey belongs alongside the verification gap: 35% of developers access AI coding tools through personal accounts rather than work-sanctioned channels.

This matters for verification in a specific way. Work-sanctioned tools often sit inside governance infrastructure — automated testing, code scanners, audit logs, the organizational plumbing that catches what individual verification misses. Personal-account access typically bypasses all of that. You get the generation speed. You lose the institutional safety net. And if you are in the 52% who skip the manual check, there is nothing left catching it.

The combination of "I'll skip the manual check" and "I'm not on a company account so there's no automated check either" is the verification gap at its most exposed.

What to Actually Track

There is no single metric that captures verification behavior directly, but there are proxy signals worth watching.

Post-merge revision rate is the downstream record of what slipped. Jellyfish's Q1 2026 data found that nominal AI code acceptance rates of 80-90% drop to 10-30% when measured two weeks post-merge. That gap is where unverified code lives. The code looked right at merge time and required revision shortly after.

Capturing that rate requires connecting your commit history to your bug and rework tracking — linking the original merge to the subsequent fix. Most teams have both data sources. Few have wired them together.
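
Wiring the two sources together can be sketched roughly like this. The `Merge` and `Fix` records are hypothetical stand-ins for whatever your commit history and rework tracker export, and the 14-day window simply mirrors the two-week figure above. How a fix gets linked back to its original merge (`fixes_sha` here) is the part you would have to supply yourself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Merge:
    sha: str
    merged_at: datetime


@dataclass(frozen=True)
class Fix:
    fixes_sha: str       # the merge this fix revises (hypothetical link field)
    fixed_at: datetime


def post_merge_revision_rate(merges, fixes, window=timedelta(days=14)):
    """Share of merges that needed at least one fix within `window` of merging."""
    if not merges:
        return 0.0
    merged_at = {m.sha: m.merged_at for m in merges}
    revised = set()
    for fix in fixes:
        start = merged_at.get(fix.fixes_sha)
        if start is not None and start <= fix.fixed_at <= start + window:
            revised.add(fix.fixes_sha)  # a merge counts once, however many fixes it needed
    return len(revised) / len(merges)


t0 = datetime(2026, 1, 5, 9, 0)
merges = [Merge("a1", t0), Merge("b2", t0), Merge("c3", t0), Merge("d4", t0)]
fixes = [
    Fix("a1", t0 + timedelta(days=3)),   # inside the window
    Fix("a1", t0 + timedelta(days=5)),   # same merge, counted once
    Fix("b2", t0 + timedelta(days=30)),  # outside the window
    Fix("zz", t0 + timedelta(days=1)),   # unknown merge, ignored
]
rate = post_merge_revision_rate(merges, fixes)
print(f"{rate:.0%}")  # prints "25%"
```

Splitting the same calculation by whether a merge was AI-assisted would need a tag on each merge, which this sketch leaves out.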

Session timing is a more accessible proxy. If you track your active coding time and can see what fraction of sessions ended in a commit without a meaningful review window — a period of read-only activity before the push — you have a rough behavioral signal. The verification gap shows up as sessions where generation time and commit time are close together, with no intermediate pause.
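
A minimal sketch of that signal, assuming your tracker can export two timestamps per session: the last edit or generation activity, and the commit. The `Session` shape and the 120-second threshold are both assumptions made for illustration. What counts as a meaningful review window is a judgment call for your team.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Session:
    last_edit_ts: float          # seconds; last write or generation activity
    commit_ts: Optional[float]   # seconds; None if the session ended without a commit


def unreviewed_commit_share(sessions, min_review_seconds: float = 120.0) -> float:
    """Share of committing sessions whose edit-to-commit gap is below the threshold."""
    committed = [s for s in sessions if s.commit_ts is not None]
    if not committed:
        return 0.0
    quick = [s for s in committed if s.commit_ts - s.last_edit_ts < min_review_seconds]
    return len(quick) / len(committed)


sessions = [
    Session(last_edit_ts=0, commit_ts=30),     # committed 30 s after the last edit
    Session(last_edit_ts=0, commit_ts=600),    # ten-minute gap before the commit
    Session(last_edit_ts=0, commit_ts=None),   # no commit, excluded
    Session(last_edit_ts=100, commit_ts=150),  # committed 50 s after the last edit
]
share = unreviewed_commit_share(sessions)
print(f"{share:.0%} of committing sessions had no review window")
```

The gap only shows that time passed, not that anyone was reading the diff, so treat the result as a rough behavioral signal and nothing more precise.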

The developers in the SonarSource data who always verify are spending more time per commit, not less. That friction is not something to optimize away. It is the part of the workflow that makes the rest of it safe.

The Number That Should Be Uncomfortable

Forty-two percent of committed code today was generated by AI. Fifty-two percent of developers don't always verify before committing. The expectation that PR review will catch what individual verification misses is baked into a workflow designed for human-generated code at human generation rates.

At twice the generation rate, with PRs that contain 1.7 times more issues (roughly 3.4 times the issue load, if the two multipliers compound), and reviewers who are in the same throughput bind, the assumption that review scales is not holding.

The verification gap is not going to close as AI adoption accelerates. It will widen unless teams explicitly track what fraction of AI-generated code reaches production without meaningful verification — and decide whether that fraction is acceptable.

Ninety-six percent of developers don't fully trust the output. The question is what they do about it at the moment that actually matters: before they push.

Written by Kevin — builder of xeve
