AI Coding Output Is More Unequal Than Global Income

June 9, 2026 · 5 min read

The average developer saves about 4 hours per week from AI coding tools. The P99 Cursor user produces 46 times more AI-assisted code than the median user. These two statements are not in conflict — the second explains why the first is almost useless as a planning number.

Cursor published their Spring 2026 developer habits report with a data point that reframes the entire AI productivity conversation: the Gini coefficient for AI-generated lines of code across their user base is 0.77. P90 developers produce 10x more AI-assisted lines per week than the median. P99 developers produce 46x more. The top 1% merge 15x more PRs per week than the median user.

The global income Gini coefficient is around 0.65. By that measure, AI-assisted code output among Cursor users is more concentrated at the top than global income.
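For readers who want to run the same measure on their own team, here is a minimal sketch of a Gini calculation. The `gini` function and the `team` numbers are illustrative assumptions, not Cursor's method or data; it uses the standard rank-weighted formula on sorted values.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = equal, near 1 = concentrated)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted formula on sorted data:
    # G = 2 * sum(rank * x) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical weekly AI-assisted line counts for a ten-person team.
team = [200, 300, 400, 500, 600, 700, 800, 1200, 3000, 9000]
print(round(gini(team), 2))
```

A team where everyone produces the same amount scores 0. A team where one person produces everything approaches 1 as the team grows.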

This is not a story about AI making everyone more productive. It's a story about a small fraction of developers operating at a scale that wasn't previously possible, while the median captures something significantly more modest, with reported time savings that have been flat for roughly a year.

What the Median Actually Looks Like

Cursor's data puts the median developer at 712 lines per week in May 2026, up from somewhere around 400 a year prior. Real increase. The P90 developer produced 8,800 lines per week in the same period.

Those two developers are not on the same curve. A median Cursor user is probably accepting inline completions, using AI to scaffold functions, generating test cases. The P90 user is running agents for larger task blocks, accepting output with minimal review, committing at a pace that the older copilot workflow cannot produce.

The behavioral data confirms it. Auto-accepted agent changes — code that reaches a commit without the developer manually reviewing the diff — rose from 7% to 36.3% between January and May 2026. Five-fold increase in five months. The median user is not driving that number. You cannot get to 36.3% aggregate auto-accept without a significant population that has made this their default.

The Pre-Selection Problem

A longitudinal study from NAV IT, published on arxiv in January 2026, analyzed two years of commit data across 703 repositories: 25 Copilot users versus 14 non-users. The finding: no statistically significant change in commit-based activity after Copilot adoption.

The same study found something critical: Copilot users were consistently more active than non-users before Copilot's introduction. The more productive developers adopted the tool first. The treatment group pre-selected for higher engagement.

This applies to any AI tool comparison between adopters and non-adopters. The developers who adopt new tools early were already pulling ahead. Comparing them to non-adopters overstates the tool's impact, because the gap existed before the tool did. Comparing them to their own earlier numbers has a related problem: it credits the tool with a trajectory they were already on. They would have kept pulling ahead regardless.
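The three comparisons can be sketched side by side. The numbers below are invented for illustration and are not the NAV IT data, and difference-in-differences is a standard technique swapped in here, not something the study is described as using.

```python
from statistics import mean

# Hypothetical weekly commit counts per developer (illustrative only).
adopters_before = [12, 14, 15, 13]
adopters_after = [15, 17, 18, 16]
non_adopters_before = [8, 9, 7, 8]
non_adopters_after = [10, 11, 9, 10]


def naive_gap(treated_after, control_after):
    """Adopters vs non-adopters after rollout; includes the pre-existing gap."""
    return mean(treated_after) - mean(control_after)


def before_after(before, after):
    """A group vs its own past; includes any trend it was already on."""
    return mean(after) - mean(before)


def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change among adopters minus change among non-adopters."""
    return before_after(treated_before, treated_after) - before_after(
        control_before, control_after
    )


print(naive_gap(adopters_after, non_adopters_after))
print(before_after(adopters_before, adopters_after))
print(
    diff_in_diff(
        adopters_before, adopters_after, non_adopters_before, non_adopters_after
    )
)
```

With these made-up numbers the naive gap is the largest figure, the before/after change is smaller, and the difference-in-differences estimate is smaller still. Even that last number assumes both groups would have followed the same trend without the tool, which is exactly what the pre-selection finding calls into question.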

The Cursor power-law data tells you what happened inside the user group: even among developers who all chose to use AI tooling, gains concentrated at the top. The P90 is pulling away from the median at the same time AI adopters pull away from non-adopters. The divergence compounds.

The 36% Who Aren't Reading the Diff

The auto-accept number is worth sitting with.

36.3% of agent changes are now reaching commits without manual review. The CHI 2026 study from Carnegie Mellon found that 60% of developers said they preferred copilots to coding agents specifically because agents felt "kind of hidden" — outputs that were hard to understand and hard to control. 55% reported better comprehension of copilot output versus agent output.

That is stated preference. What Cursor's behavioral data shows is that in actual usage, more than a third of agent changes are skipping the comprehension step entirely.

There is probably a correct regime for skipping review: scaffolding, boilerplate, test fixtures, migration files. Code with a clear completion state and limited ownership surface. For that category, spending time reading a diff before merging is overhead that adds less than it costs.

But the rate rose 5x in five months. You cannot produce 8,800 lines per week while reading every diff — the math doesn't work. The category of code accepted without review is expanding at the pace the top-decile output numbers require.

What you don't know from outside your own data is which code is moving into the skip-review category that shouldn't be there. You can answer that if you track which files and functions come back for edits within two weeks of merge, and whether the revisit rate correlates with how the code was generated. That's a different kind of productivity accounting than lines or hours — it's a cost that appears in week five, not week one.
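A rough sketch of that revisit-rate accounting, assuming you can export merges and later edits as simple tuples. The record shapes, the method labels, and the file paths are all hypothetical, and matching at file level is a simplification of "files and functions".

```python
from collections import defaultdict
from datetime import date, timedelta

REVISIT_WINDOW = timedelta(days=14)  # "within two weeks of merge"


def revisit_rates(merges, edits):
    """Share of merged changes whose file was edited again within the window,
    grouped by how the code was generated.

    merges: iterable of (file_path, merged_on, method); method labels are hypothetical
    edits:  iterable of (file_path, edited_on) for later commits touching the file
    """
    edits_by_file = defaultdict(list)
    for path, edited_on in edits:
        edits_by_file[path].append(edited_on)

    merged = defaultdict(int)
    revisited = defaultdict(int)
    for path, merged_on, method in merges:
        merged[method] += 1
        # Count an edit only if it lands after the merge and inside the window.
        if any(merged_on < d <= merged_on + REVISIT_WINDOW for d in edits_by_file[path]):
            revisited[method] += 1
    return {m: revisited[m] / merged[m] for m in merged}


# Hypothetical sample data.
merges = [
    ("api/users.py", date(2026, 5, 4), "agent_auto_accept"),
    ("api/orders.py", date(2026, 5, 4), "agent_auto_accept"),
    ("db/migrate_012.py", date(2026, 5, 5), "agent_reviewed"),
    ("core/billing.py", date(2026, 5, 6), "manual"),
]
edits = [
    ("api/users.py", date(2026, 5, 12)),    # 8 days after merge: counts
    ("core/billing.py", date(2026, 6, 1)),  # 26 days after merge: does not
]
rates = revisit_rates(merges, edits)
print(rates)
```

A small sample like this proves nothing on its own. The point is the grouping: if the auto-accepted bucket keeps coming back for edits at a higher rate than the reviewed one over several months, that is the week-five cost showing up.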

What This Means for Your Own Numbers

The AI productivity conversation has been running on averages. DX's 4-hour weekly savings number is real — it represents something close to the median, and it has been flat for four quarters because the median has plateaued. Cursor's Gini of 0.77 shows what's above that plateau: a distribution where most of the headline growth is driven by the top decile.

Comparing yourself to the average tells you almost nothing. Two developers at the same company, using the same tools, can sit at 4 hours saved per week and 40 hours saved per week — with completely different implications for what to optimize. The median Cursor user uses AI tools daily. The P90 user also uses AI tools daily. They are doing categorically different things.

Some developers are at the median because they have hit the natural ceiling of the copilot-style workflow and haven't restructured their approach. Some are there because their work — complex refactoring, debugging distributed systems, architecture decisions — doesn't extract well from current agent workflows regardless of effort. Some are at the median and don't know they have more headroom.

You cannot tell which case applies to you from vibes. The P50 developer feels like they are using AI productively. The P90 developer probably also feels like they are using AI productively. The feelings are the same; the output numbers differ by a factor of about 12 (8,800 versus 712 lines per week in May 2026).

What you need is your own longitudinal data: focus session length, commit frequency, PR cycle time, revisit rate on recently merged code. Not because any single metric is definitive, but because they're something other than the average. At a Gini of 0.77, the average describes almost nobody's actual experience — and certainly not yours specifically.
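As one example of turning raw history into a longitudinal series, here is a small sketch that buckets PR cycle time by ISO week. The `(opened_at, merged_at)` input shape and the sample values are assumptions; adapt them to whatever your Git host exports.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def weekly_median_cycle_hours(prs):
    """Median PR cycle time in hours, keyed by (ISO year, ISO week) of the merge.

    prs: iterable of (opened_at, merged_at) datetimes.
    """
    by_week = defaultdict(list)
    for opened_at, merged_at in prs:
        year, week, _weekday = merged_at.isocalendar()
        hours = (merged_at - opened_at).total_seconds() / 3600
        by_week[(year, week)].append(hours)
    # Sort by week so the result reads as a time series.
    return {wk: median(hours) for wk, hours in sorted(by_week.items())}


# Hypothetical sample: two PRs merged in one week, one in the next.
prs = [
    (datetime(2026, 5, 4, 9), datetime(2026, 5, 4, 15)),
    (datetime(2026, 5, 5, 10), datetime(2026, 5, 6, 10)),
    (datetime(2026, 5, 11, 9), datetime(2026, 5, 11, 12)),
]
series = weekly_median_cycle_hours(prs)
print(series)
```

The same bucketing works for commit counts or revisit rates. What matters is that the series is yours and runs long enough to show a trend.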

The distribution is wide enough that "I use AI tools regularly" doesn't place you anywhere useful in it. Your history does.

Written by Kevin — builder of xeve
