GitHub Copilot is moving to usage-based billing on June 1, and most teams are not prepared for the question the change is actually asking: what is AI-assisted development worth, measured against what you're paying for it?
The mechanics are specific. Code completions and Next Edit suggestions stay included in existing plan costs. But agentic sessions, multi-file chat, and code review now consume a monthly AI credit allotment, with overages billed at API token rates. A Copilot Business seat that was a flat $19/user/month becomes $19 plus whatever your team's agents consumed. GitHub is launching a preview billing tool in early May so admins can see projected costs before the switch. That preview exists because, without it, most teams would have no idea what they're about to owe.
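To make the arithmetic concrete, here is a minimal sketch of that per-seat projection. The allotment size and overage rate are placeholders, not GitHub's published numbers; substitute whatever the billing preview reports for your plan.

```python
# Rough per-seat cost projection under usage-based overage billing.
# All numbers below are placeholders; plug in your own plan's
# allotment and overage rate from GitHub's billing preview.

SEAT_PRICE = 19.00          # Copilot Business, USD per user per month
INCLUDED_CREDITS = 300      # hypothetical monthly AI credit allotment
OVERAGE_RATE = 0.04         # hypothetical USD per credit past the allotment

def projected_seat_cost(credits_used: float) -> float:
    """Flat seat price plus overage on credits consumed past the allotment."""
    overage = max(0.0, credits_used - INCLUDED_CREDITS)
    return SEAT_PRICE + overage * OVERAGE_RATE

# A seat that stays inside the allotment costs the old flat rate;
# an agent-heavy seat does not.
print(projected_seat_cost(250))   # 19.00
print(projected_seat_cost(900))   # 19.00 + 600 * 0.04 = 43.00
```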
The developer community's reaction, as Visual Studio Magazine reported it, was blunt: "You will get less, but pay the same price." That framing is fair for teams running intensive agentic workflows. It is also a signal that usage-based billing has moved the ROI question from abstract to financially real.
The Problem With Flat-Rate Pricing
When Copilot cost a flat $10 or $19 per seat, the productivity question was easy to avoid. Is AI coding assistance worth ten dollars a month? For almost any developer, the answer is yes, even if the impact is marginal. The cost is low enough not to require justification.
Usage-based billing changes the calculus at the agentic layer. An agent run that spends two hours debugging a database migration, produces an incorrect fix, and then requires a developer to spend another hour reviewing and correcting the output now has a cost attached to it. Not wasteful in some abstract sense. Measurably expensive.
That is a new kind of accountability. And most teams are entering it without the data to support any particular answer.
What the Data Actually Shows
There is a well-documented gap between how developers perceive AI's impact on their productivity and what controlled measurements find.
METR's study of experienced open-source developers found that participants using AI tools were 19% slower than without them, while the same developers estimated, after completing the tasks, that AI had made them 20% faster. The perception did not update in the presence of contrary evidence. JetBrains published related research in April 2026 that reached a similar conclusion: AI redistributes and reshapes workflows in ways that often elude developers' own perceptions.
None of this means AI coding assistance is net negative. Models have improved substantially since the METR study's early-2025 window, and the 2026 iteration across a broader sample suggests better results. But it does mean you cannot rely on feel to answer the question billing is now asking.
The usual productivity proxies do not help much either. "We shipped 40% more PRs this quarter" tells you throughput went up. It does not tell you whether the AI credits that enabled that throughput were worth what they cost. Nor does it tell you whether the throughput gain is real or whether you are looking at a measurement artifact: faster individual tasks, more tasks taken on, same actual output per developer-hour.
What Measurement Actually Requires
To answer whether AI coding assistance earns its cost, you need three things in one place: AI usage data, developer time data, and output data.
Copilot's new billing interface will give you the first. GitHub gives you the third implicitly, through commit history and PR cycle time. The gap is the second, developer time data: how many hours went into the work the AI touched, broken down by actual activity rather than self-report.
Editor-based time tracking misses too much of it. The overhead of AI-assisted work (the review loops, the correction passes, the re-prompting when an agent heads in the wrong direction) shows up as browsing, reading, or idle time in editor metrics. It is real cognitive work that does not look like coding from inside VS Code's activity tracker. System-level tracking that records every app switch gives you the full picture: the time from when you start a task to when you commit the result, including everything that happens in between.
We built xeve to capture exactly this. App usage, active windows, coding session boundaries, with the full time breakdown across terminal, browser, editor, and AI chat interfaces. When you look at the ratio of total active work time to committed output, you get a measure that is harder to game than tasks closed or prompts submitted.
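As a rough illustration of that kind of measure, the sketch below folds a hypothetical app-switch log into a per-category time breakdown and divides total active hours by commits landed in the same window. The app names, categories, and record shapes are invented for the example; they are not xeve's actual data model.

```python
from collections import defaultdict

# Hypothetical app-switch log entries: (timestamp, app_name) pairs, ordered,
# as a system-level tracker might record them. Mapping is illustrative only.
CATEGORY = {
    "Code": "editor", "iTerm2": "terminal",
    "Safari": "browser", "ChatGPT": "ai_chat", "Copilot Chat": "ai_chat",
}

def time_breakdown(events):
    """Sum active seconds per category from an ordered list of app switches."""
    totals = defaultdict(float)
    for (ts, app), (next_ts, _) in zip(events, events[1:]):
        totals[CATEGORY.get(app, "other")] += (next_ts - ts).total_seconds()
    return dict(totals)

def hours_per_commit(breakdown: dict, commits: int) -> float:
    """Total active work hours divided by commits landed in the same window."""
    total_hours = sum(breakdown.values()) / 3600
    return total_hours / commits if commits else float("inf")
```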
What to Do Before June 1
The billing change is three weeks out. That is enough time to establish a baseline worth comparing against.
Start tracking coding time at the system level now, not just editor activity. Look specifically at active time in AI chat interfaces and agentic sessions relative to what those sessions produced in committed code. The sessions costing the most tokens tend to be the longest agent runs on the most ambiguous prompts. Those are also frequently the sessions with the worst output-to-time ratio.
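One way to surface those sessions is to rank them by time spent per line of code that actually landed. The session records below are invented; the point is the ordering, not the numbers.

```python
# Hypothetical agentic-session records: minutes of active session time and
# lines of code that landed in commits attributable to the session.
sessions = [
    {"id": "fix-login-flow", "minutes": 35, "committed_lines": 120},
    {"id": "refactor-billing", "minutes": 140, "committed_lines": 30},
    {"id": "debug-migration", "minutes": 95, "committed_lines": 0},
]

def minutes_per_committed_line(s):
    # Sessions that produced nothing sort to the top as the most expensive.
    return s["minutes"] / s["committed_lines"] if s["committed_lines"] else float("inf")

for s in sorted(sessions, key=minutes_per_committed_line, reverse=True):
    print(f'{s["id"]:>20}  {s["minutes"]:4d} min  {s["committed_lines"]:4d} lines')
```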
Pull 60 days of GitHub data: commits, PR cycle time, review iterations, files changed per PR. That is your output baseline before usage-based billing takes effect. After June 1, you will have cost data alongside it. Token spend on one side, shipping velocity on the other: that is the actual ROI picture. If costs go up and output metrics improve proportionally, the new model is justified. If costs go up and output is flat or down, you have data for a specific conversation about which workflows are consuming credits without producing results.
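A minimal sketch of that pull, using the public GitHub REST API. The repo name and token handling are placeholders, and pagination plus the per-PR review and file-count endpoints are omitted to keep it short.

```python
import os
from datetime import datetime, timedelta, timezone
import requests

# Pull a 60-day output baseline from the GitHub REST API.
OWNER, REPO = "your-org", "your-repo"   # placeholders for your own repo
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}
SINCE = (datetime.now(timezone.utc) - timedelta(days=60)).isoformat()

commits = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/commits",
    headers=HEADERS, params={"since": SINCE, "per_page": 100},
).json()

pulls = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/pulls",
    headers=HEADERS,
    params={"state": "closed", "sort": "updated", "direction": "desc", "per_page": 100},
).json()

def cycle_hours(pr):
    """Hours from PR creation to merge; None for PRs closed without merging."""
    if not pr.get("merged_at"):
        return None
    created = datetime.fromisoformat(pr["created_at"].replace("Z", "+00:00"))
    merged = datetime.fromisoformat(pr["merged_at"].replace("Z", "+00:00"))
    return (merged - created).total_seconds() / 3600

cycle_times = [h for pr in pulls if (h := cycle_hours(pr)) is not None]
print(f"{len(commits)} commits in the last 60 days")
if cycle_times:
    print(f"median PR cycle time: {sorted(cycle_times)[len(cycle_times) // 2]:.1f} h")
```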
Long agentic sessions on open-ended problems are the usual culprit. They are token-expensive, frequently unproductive, and feel valuable in the moment because they generate a lot of activity. GitHub's billing preview will show you the cost. Only your output data shows you whether the cost was justified.
The Useful Consequence
The developer backlash to usage-based billing is understandable. Flat-rate pricing is easier to budget, and "you will get less but pay the same" is a fair description of the worst-case scenario for agentic-heavy teams.
But flat-rate pricing also let teams deploy AI tools and declare victory without measuring anything. Nobody asked whether the $10 per seat was producing $10 of value because the cost was low enough to not bother. Usage-based billing creates a direct link between AI activity and cost that makes measurement unavoidable.
If your AI-assisted work is producing output proportional to its token cost, you will be able to show that now. If it is not, this is the moment you find out. Both answers are more useful than the uncertainty most teams have been operating with since they first turned Copilot on.
The ROI question was always there. It just became expensive not to answer it.