AI's Context Window Grew 3,906x. Yours Didn't.

June 30, 2026 · 6 min read

A paper submitted to arXiv in March 2026 quantified something most developers have felt but hadn't seen charted: AI context windows have grown by a factor of 3,906 since 2017, while the human capacity to hold and process complex information has moved in the opposite direction. The authors call this the Cognitive Divergence. The mechanism they propose for what makes it worse is the delegation feedback loop: outsourcing context-holding to AI reduces the practice of context-holding, which makes further delegation more attractive, which reduces the practice further.

The paper is an arXiv preprint, not peer-reviewed, and its methodology for converting human attention research into token equivalents is novel enough to warrant skepticism about the specific numbers. But the directional finding is hard to dismiss: two curves, moving in opposite directions, with the gap accelerating.

The AI Side of the Divergence

The context window timeline is well documented. GPT-1 launched in 2018 with a 512-token context. GPT-4 launched in March 2023 with context windows of up to 32K tokens. Gemini 1.5 Pro was announced in February 2024 with a context window of up to 1M tokens. By 2026, frontier models from Anthropic and Google operate at 2M+ tokens, the equivalent of roughly 1,500,000 words, or about 15 full novels in a single context window.

The doubling time the paper calculates is approximately 14 months. The endpoint figures imply a faster pace: growing 3,906x in nine years is about 12 doublings, or one roughly every nine months. By any reasonable projection, context windows will continue growing. The practical implication is that an AI model can now hold, reason over, and cross-reference a codebase many times larger than any individual developer can keep in their head at once.
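A quick check of the arithmetic, using only the endpoint figures quoted above. The words-per-token and words-per-novel factors are rough assumptions for illustration, not figures from the paper, and the doubling time computed here is only what the two endpoints imply, which can differ from a figure fitted across intermediate models.

```python
import math

# Endpoint figures quoted in the text above.
START_TOKENS = 512
END_TOKENS = 2_000_000
YEARS = 9

# Rough conversion assumptions, not figures from the paper.
WORDS_PER_TOKEN = 0.75
WORDS_PER_NOVEL = 100_000

growth = END_TOKENS / START_TOKENS
doublings = math.log2(growth)
implied_doubling_months = YEARS * 12 / doublings
novels = END_TOKENS * WORDS_PER_TOKEN / WORDS_PER_NOVEL

print(f"growth factor:           {growth:,.0f}x")
print(f"doublings:               {doublings:.1f}")
print(f"implied doubling time:   {implied_doubling_months:.1f} months")
print(f"novel-length equivalent: {novels:.0f}")
```

The growth factor and the novel count match the figures in the text. The endpoint-implied doubling time comes out near nine months.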

That's the point. A sufficiently large context window means the model can see your entire project at once — every function, every dependency, every comment — in a way no human can. This is useful. It's also changing what human cognitive capacity is for in a software development context.

The Human Side

The paper's methodology for measuring human attention is where you need to be careful. The authors define an "Effective Context Span" (ECS) as the amount of information a person can sustain meaningful attention over in a single working session, expressed in token equivalents. They derive estimates from neuroimaging studies on working memory capacity, reading comprehension, and sustained attention, then convert those measurements to token equivalents using text-density assumptions.

Their estimate: human ECS was approximately 16,000 tokens in 2004 and has declined to approximately 1,800 tokens in 2026. That's an 89% decline in the cognitive unit the authors are measuring.
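The percentage follows directly from the two ECS estimates. The short check below also derives two figures that are not in the paper and are included only for scale: the implied average annual change, and the ratio between a 2M-token context and the 2026 ECS estimate.

```python
# ECS estimates as quoted in the text above (token equivalents).
ECS_2004 = 16_000
ECS_2026 = 1_800
AI_CONTEXT_2026 = 2_000_000  # "2M+" frontier context, as quoted earlier

# Total decline over the period.
decline = (ECS_2004 - ECS_2026) / ECS_2004

# Implied constant annual rate of change over 22 years (derived here, not from the paper).
years = 2026 - 2004
annual_change = (ECS_2026 / ECS_2004) ** (1 / years) - 1

# How many 2026-sized human spans fit in one 2M-token context (illustrative only).
gap_ratio = AI_CONTEXT_2026 / ECS_2026

print(f"total decline:  {decline:.1%}")
print(f"annual change:  {annual_change:.1%}")
print(f"context / ECS:  {gap_ratio:,.0f}x")
```

Since the ECS conversion itself rests on contested assumptions, these derived numbers inherit the same uncertainty.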

The specific figures should be treated with caution: converting neuroimaging research to token equivalents involves enough assumptions that the numbers shouldn't be cited as precise. But the underlying neuroimaging data the paper draws on is not controversial. Studies from 2015 through 2023 consistently document declining performance on sustained attention tasks, a decline that correlates with increased digital media consumption across populations. The mechanism, shorter attention cycles driven by habitual rapid-response information environments, has been replicated across multiple labs.

What the paper adds is the framing that puts AI context window growth and human attention change on the same axis. When you plot them together, the direction of each curve is the story: AI gets vastly better at holding context while humans, on average, practice holding context less.

The Delegation Feedback Loop

The more interesting part of the paper is the mechanism it proposes for why the divergence accelerates rather than stabilizing.

The hypothesis: every time you delegate a context-holding task to AI, you skip the cognitive work of holding that context yourself. Cognitive capacity, like most forms of capacity, is maintained through practice and lost through disuse. A developer who used to hold a mental model of a 50K-line codebase — understanding how the payment module communicates with the auth layer, how the error handling propagates, where the performance bottlenecks are — now asks the AI to retrieve that information on demand. The retrieval is faster. The comprehension is shallower. The mental model never forms.

This matters most at the level of architectural judgment. The work that AI models handle poorly is not code generation — they're good at that. It's knowing when the generated code is solving the wrong problem, or when an architectural decision made three months ago has now created a constraint that makes this approach fragile, or why a passing test suite still signals an integration risk. That kind of judgment requires the mental model. The mental model requires sustained, deep engagement with the codebase. Sustained engagement is exactly what shortens when you practice delegating the context-holding.

The loop closes: delegation reduces practice, practice atrophy increases the cost of not delegating, which makes delegation more attractive, which further reduces practice.
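The shape of a reinforcing loop like this can be sketched with a toy simulation. This is not the paper's model: the update rule, every parameter value, and the `practice_floor` knob (a hypothetical minimum share of work done by hand) are assumptions chosen only to show the qualitative behavior.

```python
def simulate(steps, capacity=0.9, base_delegation=0.5, sensitivity=0.5,
             gain=0.05, decay=0.05, practice_floor=0.0):
    """Toy delegation loop. Capacity is a unitless value between 0 and 1."""
    history = [capacity]
    for _ in range(steps):
        # Lower capacity makes doing the work yourself costlier, so delegation rises.
        delegation = min(1.0, base_delegation + sensitivity * (1.0 - capacity))
        # Optionally reserve a minimum share of work for hands-on practice.
        delegation = min(delegation, 1.0 - practice_floor)
        practice = 1.0 - delegation
        # Practice builds capacity toward 1; delegation lets it decay toward 0.
        capacity += gain * practice * (1.0 - capacity) - decay * delegation * capacity
        capacity = max(0.0, min(1.0, capacity))
        history.append(capacity)
    return history


no_floor = simulate(200)
with_floor = simulate(200, practice_floor=0.4)

print(f"no practice floor:   {no_floor[-1]:.3f}")
print(f"40% practice floor:  {with_floor[-1]:.3f}")
```

With these arbitrary parameters the unconstrained loop decays toward zero, while reserving a fixed share of hands-on work settles at a stable nonzero level. The numbers mean nothing in themselves. The point is only that a loop of this kind has no natural stopping point unless something holds practice up.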

What Breaks First

The developers most exposed to this feedback loop are the ones who moved fastest into agentic workflows without adjusting what they track about their own cognition.

It shows up in subtle ways before it shows up obviously. You have to re-read a function you wrote three weeks ago because you no longer remember why the logic is structured that way. A code review takes longer because you're not retaining context between the first file and the fourth. Your estimates get worse, because estimation depends on holding the system model clearly enough to reason about dependencies. You feel productive because you're generating more output, but the output requires more correction and the architectural decisions feel harder than they used to.

None of this is visible in commit counts, PR merge rates, or AI token consumption. These metrics measure generation volume, not comprehension depth. You would need to track the signal that's actually degrading: how long you can sustain focused engagement with complex material before your attention resets.

The Measurement That Matters

Focus block length is a reasonable proxy. Not "time spent at a computer" — that can include shallow work, review, context-switching. Focus block length means uninterrupted engagement with a single complex problem, measured from when you open the relevant files to when attention breaks. It's the cognitive unit that corresponds most directly to what the ECS is trying to capture.
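One way to turn that definition into a number is to group timestamped activity events into runs on a single task. The sketch below is a minimal illustration, not a description of how xeve measures focus: the `Event` shape, the `task` label, and the five-minute gap threshold are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    at: datetime
    task: str  # hypothetical label, e.g. derived from the files being edited


def focus_blocks(events, max_gap=timedelta(minutes=5)):
    """Return durations of uninterrupted runs of events on a single task."""
    blocks = []
    start = prev = None
    for ev in sorted(events, key=lambda e: e.at):
        same_run = (prev is not None and ev.task == prev.task
                    and ev.at - prev.at <= max_gap)
        if same_run:
            prev = ev
            continue
        if prev is not None:
            blocks.append(prev.at - start.at)  # close the previous run
        start = prev = ev                      # open a new run
    if prev is not None:
        blocks.append(prev.at - start.at)
    return blocks


def t(hour, minute):
    return datetime(2026, 6, 30, hour, minute)


events = [
    Event(t(9, 0), "payments"), Event(t(9, 3), "payments"), Event(t(9, 7), "payments"),
    Event(t(9, 8), "chat"),
    Event(t(9, 10), "payments"), Event(t(9, 12), "payments"),
    Event(t(9, 40), "payments"),
]
blocks = focus_blocks(events)
longest = max(blocks)
print([int(b.total_seconds() // 60) for b in blocks], "longest:", longest)
```

A task switch or a gap longer than the threshold ends a block, so a one-minute detour into chat splits what would otherwise be a single twelve-minute run.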

At xeve, we track this across users and the pattern is consistent: developers who maintain longer average focus blocks produce code that holds — lower churn rate, fewer post-merge corrections, more successful first-run PRs. The correlation isn't surprising. Sustained attention is what lets you hold enough of the system model to catch the problems before they hit production. The developers who maintain that capacity make better decisions at the architectural level, and that's where most of the value lives now that implementation is increasingly automated.

The practical version of the feedback loop defense is not "use AI less." It's "use AI in a way that preserves the practice of holding context." That means reading the code the agent writes, not just approving it. It means maintaining your own mental model of the systems you own, even if AI can retrieve the details faster. It means treating an extended focus block on a complex problem as a skill to protect, not a chore to outsource.

What the Divergence Actually Means

The AI context window will keep growing. The 2M-token context of 2026 will look small in three years. This is useful: there are genuinely new things you can build when a model can see your entire codebase at once.

But the developers who will use that capability well are the ones who can specify what they want with precision, review what they get with depth, and identify when the generated solution is wrong at a level that requires system-level understanding. All three of those activities depend on the cognitive capacity that the delegation feedback loop erodes.

The divergence the paper documents is real regardless of whether the specific ECS numbers hold up to scrutiny. AI is getting dramatically better at context-holding. The question for individual developers is whether they're treating their own equivalent of that capacity as something to develop or something to replace. Those are very different trajectories, and right now almost nobody is measuring which one they're on.

Your focus block length is a number you can track. It's not the same as ECS, but it's the closest proxy you have. If it's been getting shorter since you started running agents, that's data worth having.
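Checking whether that number is getting shorter can be as simple as fitting a line through weekly averages. The sketch below uses a plain least-squares slope; the weekly values are made-up sample data, and the function name is hypothetical.

```python
def weekly_trend(minutes_per_week):
    """Least-squares slope of average focus block length, in minutes per week."""
    n = len(minutes_per_week)
    if n < 2:
        raise ValueError("need at least two weeks of data")
    mean_x = (n - 1) / 2
    mean_y = sum(minutes_per_week) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(minutes_per_week))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


# Made-up weekly averages (minutes), for illustration only.
sample = [42, 40, 41, 37, 35, 33]
slope = weekly_trend(sample)
print(f"trend: {slope:+.2f} minutes per week")
```

A negative slope over several weeks is the signal to look at. A single short week is noise.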

Written by Kevin — builder of xeve
