developer productivity

Four Agents at Once Will Wipe You Out by 11 AM

6 min read

Simon Willison, co-creator of Django, has 25 years of experience as a software engineer. He told Lenny's Podcast on April 3rd that spinning up four AI coding agents and orchestrating them in parallel has made him dramatically more productive. He also said that by 11 AM, he is wiped out for the day.

These two things are both true, and the gap between them is the most important thing happening in developer productivity right now.

What Parallel Agents Actually Enable

For anyone not running agentic workflows yet: the basic pattern is straightforward. You open four terminals — or four Claude Code sessions, or four Cursor windows — point each at a different task, and let them work while you review, redirect, and unblock each in turn. Instead of writing one feature at a time, you're orchestrating four simultaneous workstreams. Your AI agents are writing code; you're the technical director.
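As a rough sketch of that fan-out, the shape of the workflow looks something like the snippet below. The "agent" command, the task prompts, and the log files are all hypothetical stand-ins, not any particular tool's interface:

```python
import subprocess
from pathlib import Path

# "agent" is a stand-in for whatever CLI you actually use; the tasks are invented.
TASKS = {
    "auth-refactor": "Refactor the session middleware to use the new token store",
    "flaky-tests": "Find and fix the flaky tests in the payments suite",
    "csv-export": "Add CSV export to the reporting endpoint",
    "docs-sync": "Update the API docs to match the v2 schema",
}

procs = {}
for name, prompt in TASKS.items():
    log = Path(f"{name}.log").open("w")
    # Each workstream runs unattended while you rotate through the others.
    procs[name] = subprocess.Popen(["agent", "run", prompt], stdout=log, stderr=log)

# The human loop: see what finished, read the log, decide the next direction.
for name, proc in procs.items():
    status = "done" if proc.poll() is not None else "still working"
    print(f"{name}: {status} (review {name}.log before redirecting)")
```

The script is the easy part. Everything this post is about happens in that last loop, which the computer cannot run for you.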

The output gain is real. Willison described shipping more before noon than he used to ship in a full day. OpenAI's own engineers running parallel Codex sessions in April reported similar patterns — higher throughput, faster cycle times on well-defined tasks. A survey this year found 59% of developers now run three or more AI coding tools simultaneously for exactly this reason.

The problem is that the cognitive model required for this work is fundamentally different from the cognitive model required for programming — and those two models do not mix well inside the same day.

Programmer vs. Orchestrator

When you write code, you're doing depth work. You load a mental model of a system: the data flows, the state transitions, the edge cases, the constraints imposed by the surrounding architecture. You add to it incrementally. Each hour of uninterrupted work builds on the last. The first 20-30 minutes are an investment in loading the model; everything after that is the payoff.

When you orchestrate agents, you're doing breadth work. You maintain awareness of four contexts at once: what each agent has done, what it missed, what review it needs, what the next direction should be. You're not deep in any single codebase; you're shallow in all four. You're reading, evaluating, redirecting. The work is real and cognitively demanding, but it's a different kind of demanding.

These modes are not just different styles of working. They draw on different cognitive resources. Depth work taxes your ability to maintain and extend a complex mental model over time. Orchestration taxes your working memory and attentional switching capacity. The exhaustion they produce is different in kind.

Willison's 25 years of experience, it turns out, don't make the second mode any cheaper. When he said that running four agents well is "taking every inch" of his experience, and that the exhaustion sets in by 11 AM, he was describing what happens when you try to maintain four simultaneous shallow contexts at full fidelity. The cognitive load scales with the number of agents, not the complexity of any individual task. Adding a fifth agent doesn't give you 25% more throughput. It gives you 20% more output and a 50% increase in overhead.

Where the Fatigue Actually Comes From

There's a specific mechanism here that BCG's research on AI cognitive load starts to illuminate. Their study, published in March, found that workers whose AI-related work required high oversight — checking outputs, managing parallel agents, validating generated code — expended 14% more mental effort and reported 12% greater fatigue than baseline. Workers using four or more AI tools simultaneously saw self-reported productivity drop.

The reason is that oversight is not passive. When an agent produces a block of code, you do not simply read it and accept or reject it. You reverse-engineer the logic. You check it against the architectural constraints of the surrounding system, constraints the agent doesn't have access to or has only partially inferred from context. You mentally simulate edge cases. You catch the subtle off-by-one, the security assumption that's wrong for your environment, the API call that will break under load.
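To make "subtle" concrete, here is an entirely invented example of the kind of slip that reads as plausible on a first pass, and that a reviewer only catches by mentally running the edge case:

```python
# Invented example of plausible-looking agent output: split items into fixed-size pages.
def paginate(items, page_size):
    pages = []
    # Reads fine at a glance, but integer division drops the final partial page:
    # 10 items with page_size=3 yields 3 full pages and silently loses the tenth item.
    for i in range(len(items) // page_size):
        pages.append(items[i * page_size:(i + 1) * page_size])
    return pages

print(sum(len(p) for p in paginate(list(range(10)), 3)))  # 9, not 10
```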

This "review" phase is cognitively intensive even for a single agent. Multiplied by four simultaneous workstreams, you are doing four continuous partial reviews, none of which gets your full attention. The work looks like high throughput. The experience is like trying to read four different books at the same time.

What makes it specifically exhausting for experienced developers is that experienced developers read code critically. They spot things that don't fit. A junior developer approves an agent's output because it looks plausible. A senior developer reads the same output and sees the three ways it could fail in production, and then has to think about each of those failure modes before approving. The faster the generation, the faster the review debt accumulates.

What You're Not Tracking

Most developers running parallel agents are measuring what they've always measured: tasks completed, PRs opened, lines of code shipped. These numbers look great. The things that don't show up:

How long was your useful work window today? If you fire up four agents at 8 AM and hit the wall at 11 AM, you got three focused hours — the same as a developer who did one deep coding session. The difference is your three hours produced more output, but they also left you with nothing in reserve for the afternoon. A deep coding day often has a second session.

What's the defect rate on agent-generated code that makes it past your review? BCG found that workers reporting AI cognitive overload made 39% more major errors. The cognitive tax of managing too many simultaneous contexts doesn't just leave you tired — it degrades the quality of the review itself. You catch less when you're reviewing four streams than when you're reviewing one.

What is your actual sustainable output per week, not per morning? Daily peak throughput is not the same as weekly sustainable throughput. The developers who are going to get the most from agentic workflows over months are the ones who treat their cognitive capacity as a resource that depletes and recovers — not as something to max out every morning.
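None of these questions needs heavyweight tooling. As a minimal sketch, assuming you keep (or can export) a timestamped log of completed tasks, the first and third questions come down to a little arithmetic over that log. The format, gap threshold, and numbers below are all invented for illustration:

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed input: one ISO timestamp per completed task, exported from whatever tracker you use.
completions = [
    "2025-05-05T08:10", "2025-05-05T09:02", "2025-05-05T10:40",  # morning burst
    "2025-05-05T16:30",                                          # afternoon straggler
    "2025-05-06T09:15", "2025-05-06T11:05", "2025-05-06T14:20",
]
events = sorted(datetime.fromisoformat(t) for t in completions)

GAP = timedelta(hours=2)  # a pause this long ends the "useful work window"

by_day = defaultdict(list)
for e in events:
    by_day[e.date()].append(e)

weekly = defaultdict(int)
for day, evts in sorted(by_day.items()):
    # The window runs from the first completion until the first long gap.
    end = evts[0]
    for prev, nxt in zip(evts, evts[1:]):
        if nxt - prev > GAP:
            break
        end = nxt
    weekly[day.isocalendar()[1]] += len(evts)
    print(f"{day}: {len(evts)} tasks, useful window {end - evts[0]}")

for week, total in sorted(weekly.items()):
    print(f"week {week}: {total} tasks total (the sustainable number, not the peak morning)")
```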

What Sustainable Orchestration Looks Like

Willison's honest answer is that he's still figuring it out. He does it, it works, and it costs him his afternoon. That's the current state of the art for parallel agentic workflows: people are shipping more, wearing out faster, and not quite sure how to resolve the gap.

A few patterns worth experimenting with, based on what we see in usage data from developers who track their time:

Limit concurrent sessions to three. OpenAI's own internal data suggests three to five parallel Codex sessions is the ceiling before switching cost eats the gains. Most developers hit the wall before five. Three lets you review each stream with meaningful depth rather than a cursory glance.

Separate orchestration days from depth days. The cognitive modes genuinely compete. A morning of parallel agents and an afternoon of architecture review is not a whole day of productivity — the second session will be degraded by the first. If you're doing agent-heavy work, plan for it to be the work of that day, not a warm-up.

Track your afternoon output separately. If you're doing parallel agents in the morning, log what you actually produce in the second half of the day. Most developers assume they're productive for eight hours. The data usually shows the afternoon after a heavy agentic morning is mostly shallow work: meetings, reviews, small fixes. Knowing the real pattern tells you whether the morning gain is net positive across the whole day or just redistributing the same total output into a smaller window.
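The same kind of log answers this one too. A minimal sketch, again with invented data, is just a split at noon:

```python
from collections import Counter
from datetime import datetime

# Same assumed input as the earlier sketch: ISO timestamps of completed tasks.
completions = [
    "2025-05-05T08:10", "2025-05-05T09:02", "2025-05-05T10:40", "2025-05-05T16:30",
    "2025-05-06T09:15", "2025-05-06T11:05", "2025-05-06T14:20", "2025-05-06T15:10",
]

halves = Counter()
for t in completions:
    e = datetime.fromisoformat(t)
    halves[(e.date(), "am" if e.hour < 12 else "pm")] += 1

for day in sorted({d for d, _ in halves}):
    am, pm = halves[(day, "am")], halves[(day, "pm")]
    print(f"{day}: {am} shipped before noon, {pm} after")
```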

The Trade-off Is Real

Parallel agentic workflows are genuinely powerful. They will keep getting more powerful as the models improve, the tooling matures, and developers figure out how to work with them well. None of that is in doubt.

What's also real is the cognitive trade-off Willison described honestly: more output, less runway. You're trading depth of work for breadth of coverage. Some days that's exactly the right call. Some problems are better solved by one four-hour focused session than by four parallel agents.

The developers who will get the most out of this shift are not the ones who run the most agents simultaneously. They're the ones who track what the agentic workflow is actually costing them — in fatigue, in review quality, in afternoon capacity — and calibrate accordingly.

You can fire up four agents at once. Just know what you're spending to do it.

Written by Kevin — builder of xeve

Track your apps, coding, music, and health — all in one place.

try xeve free