Anthropic's 2026 Agentic Coding Trends Report contains one number that explains most of the frustration in agentic workflows: developers use AI in roughly 60% of their work, but say they can fully delegate only 0-20% of tasks. The gap isn't a model capability problem. It's a specification problem — and it's where most of the productivity gains went.
The bottleneck didn't disappear when agents got good enough to run autonomously for hours. It moved upstream. Writing code was the constraint. Specifying what to write is now the constraint.
What Full Delegation Actually Requires
When you write code yourself, imprecision is cheap. You form a rough idea — batch these database writes, add validation to this endpoint — and resolve the details during implementation. You encounter an edge case, you decide how to handle it, you move on. Vagueness costs you thinking time, which is already yours to spend.
When you delegate to an agent, every vague part of your brief becomes a branch point where the agent guesses. Sometimes it guesses right. More often it makes a locally reasonable choice that's wrong for your specific context: it violates an architectural constraint you didn't mention, misses a requirement you considered obvious, or breaks a downstream integration that wasn't visible in the files it read. This isn't a model failure. It's a spec failure.
Full delegation requires a brief that specifies the objective in terms of observable outcomes, the constraints that must not be violated, the context the agent needs that isn't visible in the codebase, and the success criteria you'll use to evaluate the output. This is more rigorous than how most developers describe tasks to each other, because colleagues share implicit context built over months of working together. Agents have none of that. Everything that matters has to be in the prompt or in the files the agent can see.
Most prompts aren't that. They're how you'd describe a task to a teammate who already understands the architecture, knows why the patterns exist, and will ask a clarifying question if something is unclear. Agents don't ask. They infer.
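To make that concrete, here is a minimal sketch of what a delegable brief can capture, written as a small Python structure. The fields, the example task, and the is_delegable gate are illustrative assumptions, not a format from the report.

from dataclasses import dataclass, field

@dataclass
class TaskBrief:
    """Hypothetical shape of a brief an agent could execute without guessing."""
    objective: str                   # observable outcome, not an implementation plan
    constraints: list[str] = field(default_factory=list)       # rules that must not be violated
    context: list[str] = field(default_factory=list)           # facts not visible in the codebase
    success_criteria: list[str] = field(default_factory=list)  # how the output will be judged

    def is_delegable(self) -> bool:
        # Crude gate: every section filled in, nothing left for the agent to infer.
        return bool(self.objective) and all((self.constraints, self.context, self.success_criteria))

brief = TaskBrief(
    objective="Batch writes to the orders table: at most one INSERT per 100 rows per request.",
    constraints=["Do not change the public signature of OrderRepository.save()"],
    context=["The nightly reconciliation job assumes insertion order is preserved"],
    success_criteria=["Existing integration tests pass",
                      "A new test shows 1,000 rows produce at most 10 INSERT statements"],
)
assert brief.is_delegable()

Whether the brief lives in code, a template, or plain prose matters less than whether every section is actually filled in before the run starts.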
The Cost of Imprecision Moved Upstream
Before agentic coding, the cost of a vague requirement was paid during implementation. You wrote code with a rough idea, hit an edge case, and learned in real time what the requirement actually needed to be. The feedback loop was fast because you were the one doing the work — every ambiguity surfaced while the context was still loaded.
In agentic coding, the cost of imprecision is paid after the agent finishes. The agent runs to completion, which for complex tasks can take hours, as the Anthropic report documents. Then you review the output and find that it went wrong at step three because your brief didn't specify the constraint the agent violated. You fix the brief and re-run, then discover a different problem caused by a different gap in the specification.
This is the reason Anthropic describes the delegation gap as the central problem of the orchestration era. The report documents the shift clearly: the dominant interaction pattern moved from inline autocomplete in 2024 to autonomous execution in 2026, where the developer describes a task and the agent handles it across multiple files. The agent can execute. What's hard is giving it something to execute that succeeds without constant supervision.
A Rakuten team described in the report had an agent implement a feature across a 12.5-million-line codebase in a single seven-hour autonomous run. That wasn't possible because they fired off a vague prompt and hoped. It was possible because someone spent significant time up front writing a specification precise enough that the agent could execute to completion without intervention. The seven-hour run was the visible part. The spec work was invisible.
Why This Doesn't Show Up in the Numbers
Most developers tracking their productivity with agentic tools are measuring editor time and token usage. Neither of these captures what actually determines how productive an agentic workflow is.
The meaningful signal is how much time elapses between starting to specify a task and merging the agent's output. That window includes writing the brief, running the agent, reviewing the result, correcting the spec, re-running, and doing the final review. For a well-specified task, the agent's first run is mergeable and the total window is short. For a poorly specified task, the cycle repeats three or four times.
The ratio of first-run successes to total runs is the metric that tells you whether your delegation quality is improving. It doesn't show up in commit counts or token consumption. It shows up in time, if you're tracking the right window.
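Tracking this doesn't require tooling beyond a simple log. The sketch below assumes a hypothetical record per delegated task, with a timestamp for when spec writing started, a timestamp for when the output merged, and a run count; it computes the first-run success ratio and the spec-to-merge cycle time described above.

from dataclasses import dataclass
from datetime import datetime
from statistics import median

@dataclass
class DelegatedTask:
    spec_started: datetime  # developer began writing the brief
    merged: datetime        # agent output merged
    runs: int               # total agent runs, including re-runs after spec fixes

def delegation_quality(tasks: list[DelegatedTask]) -> dict[str, float]:
    """First-run success ratio and median spec-to-merge cycle time, in hours."""
    first_run_successes = sum(1 for t in tasks if t.runs == 1)
    cycle_hours = [(t.merged - t.spec_started).total_seconds() / 3600 for t in tasks]
    return {
        "first_run_success_ratio": first_run_successes / len(tasks),
        "median_cycle_hours": median(cycle_hours),
    }

Watched week over week, those two numbers moving together is the signal that spec quality, not model capability, is what changed.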
Teams that measure cycle time on agentic tasks — from when a developer starts writing the spec to when the output is merged — see a consistent pattern. Early in agentic adoption, throughput improves and cycle times are noisy. As developers build spec-writing discipline, cycle times shorten. That shortening compounds: cleaner specs produce more first-run successes, which reduces review-correct-rerun overhead, which frees time to write cleaner specs.
The developers who don't measure this assume the bottleneck is the model. They are waiting for the agent to get smarter. What is actually happening is that their briefs are imprecise, their cycle times are long, and the ceiling on their productivity is set by their own specification quality, not by the model.
The Skill That's Actually Scarce
Specification ability is a specific, learnable skill, and most developers haven't had to develop it. The traditional development workflow never required making intent fully explicit — you were the one executing, so you could fill gaps as you encountered them.
The developers who are closing this gap fastest have developed a discipline that looks like this: before delegating a task, they ask whether they could write a test for the desired outcome before seeing any implementation. If the answer is no, the requirement isn't specified well enough to delegate. The agent won't know what success looks like; neither will the reviewer. The brief needs more work before the run starts.
This is a high bar. It forces specificity about outcomes before thinking about implementation, which is actually closer to good software engineering practice than "write code and see what works" — but it's a different kind of rigor than most developers are used to exercising before they write a single line.
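In practice, the gate is just the acceptance test written before the run. Using the endpoint-validation task from earlier as a hypothetical example (the route, fields, response shape, and client fixture are assumptions, not anything from the report), the pre-delegation artifact might be no more than this:

# Written before the agent runs. If these tests can't be written yet, the
# requirement isn't specified well enough to delegate.

def test_create_order_rejects_negative_quantity(client):
    # The hypothetical validation requirement: bad input gets a 422 with a field-level error.
    resp = client.post("/orders", json={"sku": "ABC-123", "quantity": -1})
    assert resp.status_code == 422
    assert "quantity" in resp.json()["errors"]

def test_create_order_accepts_valid_payload(client):
    # Valid input keeps working exactly as it did before the change.
    resp = client.post("/orders", json={"sku": "ABC-123", "quantity": 3})
    assert resp.status_code == 201

The tests then go into the brief as success criteria: the agent's first run either satisfies them, or the gap they expose is in the spec rather than the model.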
The specification muscle develops with practice. Short tasks with narrow scope are easier to spec fully than large cross-cutting changes. Starting with well-bounded tasks, writing explicit success criteria even when they feel obvious, and reviewing what the brief missed when an agent run requires correction — these build the skill incrementally. The developers who've been doing this for six months are noticeably faster than the ones who treat every agent run as an experiment.
The Gap Isn't Closing on Its Own
Anthropic's 0-20% full-delegation number is not a temporary artifact of insufficient model capability. Agents will keep improving, context windows will expand, and models will infer more from less. But the underlying dynamic doesn't go away; it just recurs at higher levels of complexity.
A more capable agent doesn't forgive a vague brief. It executes the vague brief more thoroughly, which means the output of a bad spec is now a more complete, more convincing, harder-to-diagnose version of the wrong thing. Better agents with worse specs produce better-looking failures.
The developers who will get the most from agentic coding over the next few years aren't the ones with access to the biggest models. They're the ones who've treated specification as a first-class engineering skill, measured their delegation quality rigorously, and built the habit of making intent explicit before running the agent.
The brief is the code now. Most developers are still writing it like it's an afterthought.