
1,000 Agents Is Impressive. 95% of Your Work Doesn't Qualify.


Jarred Sumner ported Bun's entire runtime from Zig to Rust in eleven days. The result was roughly 750,000 lines of Rust, with 99.8% of the existing test suite passing on merge. He did it using Claude Code's dynamic workflows feature, which Anthropic launched on May 28th. The run used multiple overlapping waves of parallel agents. One wave mapped Rust lifetimes for every struct field in the Zig codebase. The next wrote behavior-identical .rs files, with two reviewers per file. A fix loop drove the build and test suite until both ran clean. Finally, an overnight pass opened PRs for each unnecessary data copy identified after the merge.

It is the most technically impressive demonstration of agentic coding I've seen. It is also a batch computing job, and the conditions that made it possible are more restrictive than the coverage suggests.

What Made It Work

The Bun port worked because it had three properties that most software tasks don't.

The transformation was mechanical. Porting behavior-identical code from one language to another is not a design problem. You're not deciding what the system should do; that's already encoded in the Zig. You're re-expressing it in a different language while preserving semantics. The agents were executing a well-understood mapping, not making judgments about architecture or product direction. When the problem is "take this input and produce a behavior-identical version in another language," 1,000 agents can work it in parallel. Each agent is solving a bounded subproblem, and the original code already defines its correct answer.

The work decomposed cleanly at file boundaries. Zig and Rust both organize code in files. Porting file by file is parallelizable in a way that, say, refactoring a tangled dependency graph is not. Each agent got a Zig file and had to produce one Rust file. The units of work were clear, the dependencies across units were manageable, and progress could be tracked and validated per unit. You can't parallelize across 1,000 agents unless you can decompose the problem into 1,000 independent pieces.
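One way to make "the dependencies across units were manageable" concrete is to start from which files import which. With that information, you can group files into waves in which every file depends only on files from earlier waves. Everything inside one wave can then go to parallel workers. The sketch below is a minimal Python illustration under that assumption. `port_waves` and the file names are hypothetical, and this is not a description of how the Bun port was actually scheduled. A dependency cycle means the work does not split cleanly, and in that case the function raises an error instead of guessing.

```python
from collections import defaultdict


def port_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group files into waves that can each be processed in parallel.

    deps maps every file to the set of files it imports (hypothetical input).
    Imports of files outside the mapping are ignored.
    """
    known = set(deps)
    # Count unresolved in-graph dependencies and record reverse edges.
    remaining = {f: len(d & known) for f, d in deps.items()}
    dependents = defaultdict(list)
    for f, d in deps.items():
        for dep in d & known:
            dependents[dep].append(f)

    waves = []
    ready = sorted(f for f, n in remaining.items() if n == 0)
    while ready:
        waves.append(ready)
        next_ready = []
        for f in ready:
            # A file becomes ready once everything it imports is in an earlier wave.
            for g in dependents[f]:
                remaining[g] -= 1
                if remaining[g] == 0:
                    next_ready.append(g)
        ready = sorted(next_ready)

    if sum(len(w) for w in waves) != len(deps):
        raise ValueError("cyclic dependencies: files cannot be split into independent waves")
    return waves
```

For example, `port_waves({"a.zig": set(), "b.zig": {"a.zig"}, "c.zig": {"a.zig"}, "d.zig": {"b.zig", "c.zig"}})` puts `a.zig` first, then `b.zig` and `c.zig` side by side, then `d.zig`. A tangled graph with cycles raises an error instead of producing waves, which is the refactoring case the paragraph above describes.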

The test suite was comprehensive enough to validate the output. This is the number that should get more attention: 99.8% of existing tests passing on merge. The Bun codebase came with a test suite that could determine, mechanically, whether the ported Rust behaved like the Zig it replaced. Without that, you have no signal. You submit an overnight run to 1,000 agents and come back in the morning with 750,000 lines of Rust and no reliable way to know if any of it is right. The test suite isn't a nice-to-have in this workflow. It's the feedback mechanism that makes the whole thing viable.
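Bun's validation came from its own existing test suite. A generic way to get a similar signal for any port is differential testing: you feed the same inputs to the original and the ported implementation and count the disagreements. The following is a minimal Python sketch under that assumption. `differential_pass_rate` and the toy functions in the example are hypothetical stand-ins and are not part of Bun's tooling. An empty set of cases raises an error rather than reporting 100%, because no cases means no signal.

```python
from typing import Any, Callable, Iterable


def _outcome(fn: Callable[[Any], Any], case: Any) -> tuple[str, Any]:
    # Treat "raised exception X" as an observable result, just like a return value.
    try:
        return ("returned", fn(case))
    except Exception as exc:
        return ("raised", type(exc))


def differential_pass_rate(
    original: Callable[[Any], Any],
    ported: Callable[[Any], Any],
    cases: Iterable[Any],
) -> tuple[float, list[Any]]:
    """Return the fraction of cases where both versions agree, plus the failing inputs."""
    cases = list(cases)
    if not cases:
        raise ValueError("no test cases, so no signal about correctness")
    failures = [c for c in cases if _outcome(original, c) != _outcome(ported, c)]
    return (len(cases) - len(failures)) / len(cases), failures
```

With `original = lambda s: s.strip().lower()` and a port that forgot the `strip()`, the inputs `["A", " b", "c"]` give a pass rate of two out of three, and `" b"` is flagged as the failing case.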

What Anthropic's Own Data Says

Here's the tension worth sitting with. Anthropic launched dynamic workflows with a 1,000-subagent ceiling and a marketing pitch about tasks you can "start before bed and wake up to completed." Their own 2026 Agentic Coding Trends Report, published months before the feature, says multi-agent approaches don't make sense for 95% of agent-assisted development tasks.

The company simultaneously shipped the most ambitious agent coordination feature in developer tooling and told you it's the wrong tool for almost everything.

Both of these things are true. The feature is real. The scope of its applicability is narrow. The tension only looks like a contradiction if you assume the launch is a claim about typical usage rather than a demonstration of a ceiling.

What Dynamic Workflows Actually Are

The framing that causes the most confusion is treating dynamic workflows as an upgrade to the existing coding assistant model — a better Copilot, a more capable Claude Code. They're not. They're a different category of tool.

The existing coding assistant model is interactive. You write, the assistant suggests, you accept or reject, you iterate. Feedback is synchronous. You can course-correct in real time. The loop is tight enough that imprecision in what you asked for surfaces immediately, while you still have full context on the task.

Dynamic workflows are asynchronous batch processing. You specify the job. You submit it. Agents run, potentially for hours, without your involvement. You review the output afterward. The feedback loop is hours long, not seconds. You cannot course-correct mid-run. Everything that matters has to be in the specification before you submit, because the agents will execute to completion — and by the time you see the results, they've already made thousands of decisions you didn't supervise.
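The batch shape is easy to see in code. In this minimal Python sketch, threads stand in for agents. `JobSpec`, `run_batch`, and the worker function are hypothetical names and are not Claude Code's API. The spec is frozen before submission, every unit is submitted at once, and the caller gets nothing back until the whole run has finished.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class JobSpec:
    # Everything the workers will ever know about the task; frozen so it
    # cannot be amended once the run has started.
    instructions: str
    units: tuple[str, ...]


def run_batch(
    spec: JobSpec,
    worker: Callable[[str, str], str],
    max_workers: int = 8,
) -> dict[str, str]:
    """Submit every unit up front; return only when all units have finished."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {u: pool.submit(worker, spec.instructions, u) for u in spec.units}
    # Leaving the with-block waited for every future; result() re-raises worker errors.
    return {u: f.result() for u, f in futures.items()}
```

Nothing in `run_batch` lets you change `instructions` halfway through. If the spec was wrong, every unit was done wrong, and you find out only when you read the returned dict.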

This is batch computing. It is a legitimate and powerful paradigm. Mainframes ran payroll and billing jobs overnight for decades. The pattern works when the inputs are well-defined, the transformation is mechanical, and you have validation criteria that can run without human interpretation. It doesn't work when the problem requires judgment, iteration, or understanding of context that isn't expressible in a specification.

The Bun port met all the batch computing preconditions. Most developer work doesn't.

What the 5% Actually Looks Like

Anthropic's own examples for dynamic workflows cluster into a specific category: large-scale investigations and migrations across millions of lines of code. Security audits where you want to know every instance of a vulnerable pattern across an enormous codebase. Schema migrations across thousands of files. Dependency upgrades that require consistent changes in dozens of places. Performance analyses that need to examine every hot path.

These tasks share the structure of the Bun port: what needs to happen is already defined, the units are independent, and you have a way to verify the output mechanically. What makes them intractable is scale, meaning more than any person could review manually in a reasonable timeframe. Agents don't provide the judgment. They provide the parallelism.
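A schema-style migration shows why these tasks fit. The rule is deterministic and each file is an independent unit. The check that nothing was missed is also mechanical, and it does not trust whoever did the rewrite. This is a minimal Python sketch assuming a hypothetical rename of a `user_name` field to `username`. `migrate_all` and `rule_rewrite` are illustrative names. The `rewrite` parameter is a placeholder for whatever does the per-file work, whether a script or an agent.

```python
import re
from typing import Callable

OLD, NEW = "user_name", "username"  # hypothetical field rename
# Whole-word match, so identifiers like user_name_id are left alone.
OLD_REF = re.compile(rf"\b{re.escape(OLD)}\b")


def rule_rewrite(text: str) -> str:
    """The deterministic rule: replace whole-word references to OLD."""
    return OLD_REF.sub(NEW, text)


def migrate_all(
    files: dict[str, str],
    rewrite: Callable[[str], str] = rule_rewrite,
) -> dict[str, str]:
    """Rewrite each file independently, then rescan every output mechanically."""
    out = {path: rewrite(text) for path, text in files.items()}
    leftovers = sorted(path for path, text in out.items() if OLD_REF.search(text))
    if leftovers:
        raise RuntimeError(f"{OLD!r} still referenced in: {leftovers}")
    return out
```

The scan only proves that the old name is gone. Whether the migrated code still behaves correctly is a separate question, and that is what tests are for.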

The 95% that doesn't qualify: novel feature development, debugging production incidents, designing architecture, reviewing anything that requires understanding why the code is structured the way it is. Any task where the answer to "what should it do" is what you're trying to figure out. Agents can't resolve that ambiguity at scale. They amplify it.

The Missing Prerequisite

The 99.8% test pass rate in the Bun port is the number I keep coming back to. Not the 750,000 lines. Not the eleven days.

Before Jarred Sumner submitted that overnight run, the Bun codebase already had a test suite capable of mechanically validating 750,000 lines of ported code against the original behavior. That test suite was the product of years of development. Without it, the dynamic workflow is a submission into a void. You might get back output that looks right, compiles, and fails in production in ways nobody anticipated.

Most developer backlogs are not backed by a test suite like Bun's. Most applications don't have tests capable of validating a mechanical transformation at scale. This is worth naming plainly, because demos of dynamic workflows tend to emphasize the agent count and the turnaround time. The prerequisite that gives the turnaround time any meaning is the validation infrastructure, and that prerequisite is often the hardest part.

If your codebase has comprehensive tests and you're facing a large-scale mechanical transformation, dynamic workflows can compress weeks into hours. If your codebase has patchy test coverage and you're facing a design problem, 1,000 agents will give you 1,000 versions of the wrong answer, fast.

Three Questions Before You Queue the Job

Before treating dynamic workflows as a productivity upgrade for your current backlog:

Is the task a mechanical transformation or a design problem? Mechanical transformations (ports, migrations, refactors with deterministic rules) fit the batch model. Design problems don't. If the output depends on judgments that aren't expressible in your specification, the agents will make those judgments without you — and you won't know until you review the results.

Can the work decompose into independent units? File-level, module-level, endpoint-level — some boundary that agents can work across in parallel without stepping on each other. If your task requires agents to coordinate tightly or share context across everything they're doing simultaneously, the parallelism doesn't help.

Do you have validation that can run without you? The Bun port had a test suite. What's yours? If the answer to "how will I know if this is right" involves reading the output and using your judgment, you're in the validation loop for every unit. That's not a batch job. That's 1,000 tasks waiting for your review.
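The three questions can be written down as a gate. This Python sketch is deliberately trivial, and `Task` and `batch_blockers` are hypothetical names. The answers are booleans that a person fills in, and any false answer becomes a reason not to queue the job.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Answers to the three questions, filled in by a human before queuing.
    mechanical: bool             # deterministic rules, not design judgment
    independent_units: bool      # splits at file, module, or endpoint boundaries
    unattended_validation: bool  # tests or checks that run without you


def batch_blockers(task: Task) -> list[str]:
    """Return the reasons this task should NOT be queued as a batch job."""
    blockers = []
    if not task.mechanical:
        blockers.append("design problem: agents would make judgment calls unsupervised")
    if not task.independent_units:
        blockers.append("no clean boundaries: parallel agents would step on each other")
    if not task.unattended_validation:
        blockers.append("validation needs you: every unit waits for your review")
    return blockers
```

An empty list means you have something shaped like the Bun port. Anything else is a list of things to fix in the specification or the test suite first.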

The answers to those three questions determine whether you have a Bun port or a specification problem dressed up in agent infrastructure. The distinction matters more than the agent count.

Written by Kevin — builder of xeve
