developer productivity

Gartner's AI Cost Prediction Is Right. The Fix Isn't.

June 27, 2026 · 6 min read

On June 24, Gartner published a prediction that AI coding costs will surpass the average developer's salary by 2028. The current numbers make it plausible. Organizations are spending $200-500 per developer per month on AI tooling today, with some power users hitting $2,500 or more. If token consumption continues growing faster than prices fall, Gartner's math lands somewhere around 2028. No one who's watched a Cursor or Claude Code session spin up an agentic run for thirty minutes will find this projection surprising.
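The crossover date depends heavily on how fast consumption grows relative to how fast prices fall. Here is a minimal sketch of that compounding arithmetic. The $120,000 salary, the 4x yearly growth in tokens used, and the halving of per-token prices are placeholder assumptions for illustration, not Gartner's inputs.

```python
import math


def years_to_crossover(monthly_spend: float, annual_salary: float,
                       consumption_growth: float, price_decline: float) -> float:
    """Years until annual AI spend per developer reaches annual_salary.

    consumption_growth: yearly multiplier on tokens used (4.0 = quadrupling).
    price_decline: yearly multiplier on price per token (0.5 = halving).
    All inputs are illustrative assumptions, not Gartner's model.
    """
    net_growth = consumption_growth * price_decline  # yearly multiplier on spend
    annual_spend = monthly_spend * 12
    if annual_spend >= annual_salary:
        return 0.0  # already at or past the threshold
    if net_growth <= 1.0:
        return math.inf  # spend is flat or shrinking, so it never crosses
    return math.log(annual_salary / annual_spend) / math.log(net_growth)


# Hypothetical power user: $2,500/month, tokens x4 per year, prices halving.
years = years_to_crossover(2500, 120_000, 4.0, 0.5)
```

Under those placeholder numbers, spend doubles each year, so $30,000 a year reaches $120,000 in two years. If consumption only doubles while prices halve, spend stays flat and the function returns infinity, which is the whole argument in one parameter.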

The question is what you're supposed to do about it. Gartner's answer is governance and context engineering — track usage, train developers to write tighter prompts, eliminate unnecessary data from context windows. This is practical advice and most teams should follow it. It's also not the thing that will actually resolve the problem, because it addresses the wrong layer.

The Cost Isn't the Issue. The Mystery Is.

Let's say you have a developer spending $2,500 a month on AI tokens. That's a lot. It's also potentially fine. If they're shipping work that would have required three people without AI, $2,500 is a bargain. If they're running agents that fail 70% of the time and the tokens go toward reruns, corrections, and context bloat on sessions that don't complete, $2,500 is waste with a billing receipt.
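A failure rate translates directly into how much of the bill goes to runs that never complete. A minimal sketch, assuming every run costs the same number of tokens and that attempts fail independently, which makes attempts-until-success a geometric distribution with mean 1 / (1 - failure rate):

```python
def spend_breakdown(monthly_spend: float, failure_rate: float) -> dict:
    """Split spend between completed and failed runs.

    Assumes equal token cost per run and independent attempts; both are
    simplifications for illustration.
    """
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    success_rate = 1.0 - failure_rate
    return {
        # Mean attempts per completed session under a geometric model.
        "runs_per_success": 1.0 / success_rate,
        "spend_on_successful_runs": monthly_spend * success_rate,
        "spend_on_failed_runs": monthly_spend * failure_rate,
    }


breakdown = spend_breakdown(2500, 0.7)
```

For the $2,500 developer with a 70% failure rate, this works out to roughly 3.3 runs per completed session and about $1,750 of the month's spend on runs that didn't complete. If failed runs tend to run longer than successful ones, the real split is worse; if they abort early, it is better.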

You cannot tell which it is from a billing dashboard. The billing dashboard tells you what was consumed. It does not tell you what was produced.

This is the missing layer in every governance conversation about AI coding costs. Governance frameworks track spend. They don't correlate spend to shipped value. When a developer blows through their AI credit allotment, the natural organizational response is to cap them. But capping someone who's running $2,500 in productive agent sessions is a different decision than capping someone running $2,500 in failed ones. Without per-session outcome data, you're making that decision blind.

What Context Engineering Actually Addresses

Gartner's recommended intervention is context engineering — the discipline of giving AI agents exactly the context they need, nothing more, and structuring that context so the model can use it efficiently. The argument is that most token overruns come from bloated context windows, and that developers trained to write concise, targeted prompts will see costs fall without sacrificing output quality.

This is true as far as it goes. A well-specified agent session does cost less than a poorly specified one. And there's a real skill in writing context that a model can act on, which is different from writing context that sounds complete but contains the wrong level of detail for the task.

But context engineering as a cost reduction strategy treats the symptom. The reason agent sessions run long and consume excessive tokens isn't primarily that developers write bad prompts. It's that agent sessions fail, get retried, drift mid-run, and require human correction cycles, and every one of those failure modes consumes tokens at the same rate as a successful run. A developer who runs three failed attempts before a working one has spent four times the tokens for one unit of output. Improving their prompting might help. Knowing that this is their pattern would help more.

The distinction matters because context engineering is a skill you train into developers on the assumption that better prompting reduces failure rates. That's a reasonable assumption. But it's not validated at the team level, and the training is expensive. If some developers already have high first-pass success rates and others don't, the leverage is very different across the team. You wouldn't know that without measuring what "success" looks like at the session level.
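Measuring that spread doesn't require anything elaborate once session records exist. A minimal sketch, assuming a hypothetical `Session` record that captures who ran the session, how many agent runs it took, and whether it ended with usable output. "First pass" here means completed on the first run, which is one possible definition of success among several.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical per-session record; no specific tool's schema is implied."""
    developer: str
    agent_runs: int   # attempts made during the session
    completed: bool   # did the session end with usable output?


def first_pass_rates(sessions: list[Session]) -> dict[str, float]:
    """Share of each developer's sessions that completed on the first agent run."""
    totals: dict[str, int] = defaultdict(int)
    first_pass: dict[str, int] = defaultdict(int)
    for s in sessions:
        totals[s.developer] += 1
        if s.completed and s.agent_runs == 1:
            first_pass[s.developer] += 1
    return {dev: first_pass[dev] / totals[dev] for dev in totals}
```

A team where one developer sits at 0.8 and another at 0.1 has a very different training problem than a team where everyone sits at 0.4, even if their token bills look identical.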

The Data That Doesn't Exist

Standard engineering analytics gives you commits, PRs opened, lines of code, time in the editor. Billing dashboards for AI tools now give you tokens consumed and dollar amounts per seat. What almost no team has is the thing that connects them: per-developer, per-session data on whether the AI session produced a usable output.

A session that opens a file, runs an agent, gets a working implementation in one pass, and closes is very different from a session that runs the same agent four times, each time requiring manual correction before the next attempt. Both appear in billing as "X tokens consumed." Neither shows up in standard productivity dashboards at all, because a session doesn't produce a commit on its own; it produces context for the next commit.

The 2028 cost trajectory Gartner is describing is partly a consequence of this invisibility. Teams can see that AI token costs are rising. They can't see whether those costs are rising in the sessions that produce value or in the sessions that produce reruns. Without that distinction, the only available lever is aggregate throttling, which does reduce costs but cuts productive sessions along with wasteful ones, so output falls with the spend. That is not an improvement.

What Would Actually Help

The useful intervention is not prompting discipline or usage caps. It's building the data layer that connects AI session activity to development outcomes.

At the session level, this means tracking: did the session produce a commit? How many agent runs per commit? How much of the produced code survived the next week without being reverted? These aren't billing questions. They're productivity questions. And they require correlating two data sources that currently live in completely different systems: the AI tool's telemetry and the version control history.
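Those three questions can be answered once session telemetry and commit history are joined on a shared key. A minimal sketch, assuming the AI tool can export which commit (if any) a session's output landed in, and that reverts can be matched back to the commit they undo. The `AgentSession` and `Commit` shapes and the `commit_sha` link are hypothetical; no particular tool's export format is implied.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class AgentSession:            # hypothetical export from the AI tool's telemetry
    session_id: str
    agent_runs: int
    commit_sha: Optional[str]  # commit the session's output landed in, if any


@dataclass
class Commit:                  # hypothetical record built from version control history
    sha: str
    committed_at: datetime
    reverted_at: Optional[datetime] = None


def session_outcomes(sessions: list[AgentSession], commits: list[Commit],
                     survival_window: timedelta = timedelta(days=7)) -> dict:
    by_sha = {c.sha: c for c in commits}
    linked = [s for s in sessions if s.commit_sha in by_sha]
    shas = {s.commit_sha for s in linked}                # distinct commits produced
    total_runs = sum(s.agent_runs for s in sessions)     # runs in failed sessions count too

    def survived(c: Commit) -> bool:
        # Survives if never reverted, or reverted only after the window.
        # Caveat: commits younger than the window are counted optimistically.
        return c.reverted_at is None or c.reverted_at - c.committed_at > survival_window

    survivors = [sha for sha in shas if survived(by_sha[sha])]
    return {
        "commit_rate": len(linked) / len(sessions) if sessions else 0.0,
        "runs_per_commit": total_runs / len(shas) if shas else float("inf"),
        "survival_rate": len(survivors) / len(shas) if shas else 0.0,
    }
```

Charging the runs from sessions that never produced a commit against the commits that did land is deliberate: that is the rerun cost the billing dashboard can't attribute.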

Some of this is starting to appear. WakaTime now tracks AI model activity alongside coding time. LinearB's 2026 benchmark report includes AI-assisted PR metrics. But the correlation piece, connecting a developer's agent usage pattern to whether that usage is speeding up or slowing down their delivery cycle, is still largely missing.

At xeve, we track the edges of this. Session-level data shows where time goes across tools; version history shows what shipped. The gap between opening Cursor and the first commit on a feature gives a rough measure of how much of the session went to rework versus first-pass generation. That correlation isn't a billing metric. It's the closest proxy available for whether agent usage is producing leverage or producing churn.
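As a rough illustration of the open-to-first-commit gap, here is a hypothetical sketch, not xeve's implementation. It assumes events have already been attributed to a feature (for example by branch name, which is itself an assumption) and tagged as either `"editor_open"` or `"commit"`.

```python
from datetime import datetime, timedelta
from typing import Iterable


def open_to_first_commit(
        events: Iterable[tuple[str, str, datetime]]) -> dict[str, timedelta]:
    """Per feature: time from the first editor open to the first later commit.

    events: (feature, kind, timestamp) tuples; kinds other than
    "editor_open" and "commit" are ignored.
    """
    first_seen: dict[str, dict[str, datetime]] = {"editor_open": {}, "commit": {}}
    for feature, kind, ts in events:
        seen = first_seen.get(kind)
        if seen is None:
            continue  # not an event type this metric uses
        if feature not in seen or ts < seen[feature]:
            seen[feature] = ts  # keep the earliest occurrence
    opens, commits = first_seen["editor_open"], first_seen["commit"]
    # Features with no commit yet, or a commit predating the open, are skipped.
    return {f: commits[f] - opens[f]
            for f in opens if f in commits and commits[f] >= opens[f]}
```

On its own the gap says nothing about rework; it only becomes a signal when compared across features of similar size or against the same developer's history.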

The Real 2028 Risk

Gartner's projection will probably turn out to be roughly correct in magnitude but early on timing. AI costs may still reach the salary threshold, but model prices have historically fallen faster than usage estimates assume. GPT-3 cost $60 per million tokens in 2020. Comparable capability today costs roughly $1. That deflationary trend is not over, and it will absorb some of the consumption growth before 2028.
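The two price points above imply an average yearly rate of decline. A quick sketch of that arithmetic, treating "today" as 2026 (the post's date) and assuming a smooth constant rate, whereas real price cuts arrived in steps:

```python
def implied_annual_price_factor(start_price: float, end_price: float,
                                years: float) -> float:
    """Constant yearly multiplier that turns start_price into end_price over `years`."""
    if start_price <= 0 or end_price <= 0 or years <= 0:
        raise ValueError("prices and years must be positive")
    return (end_price / start_price) ** (1.0 / years)


# Figures from the post: $60 per million tokens in 2020, roughly $1 today (2026).
factor = implied_annual_price_factor(60.0, 1.0, 2026 - 2020)
yearly_decline = 1.0 - factor  # share of the price lost each year
```

The factor comes out at about 0.505, meaning comparable capability has roughly halved in price every year. Any cost forecast has to assume consumption growth outpaces that rate before spend rises at all.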

The more important risk in the prediction isn't the cost level. It's the invisibility. If organizations reach 2028 spending heavily on AI development tools and still cannot answer the question "which of that spend produced shipped value," they'll have acquired an expensive new line item with the same ROI uncertainty they started with in 2024.

Governance frameworks reduce that cost. They don't reduce that uncertainty. Context engineering helps developers use fewer tokens per session. It doesn't help engineering leaders understand whether the sessions are working.

The fix for AI coding costs by 2028 is not spending less. It's knowing what the spend bought. That requires session-level outcome data that most teams don't track and most tools don't surface. Until that layer exists, the governance conversation is about rationing something nobody has measured.

Written by Kevin — builder of xeve
