developer productivity

Kiro Makes Specs Mandatory. Mandatory Isn't Enough.

June 19, 2026 · 6 min read

AWS Summit New York landed two days ago, and the announcement worth thinking carefully about is Kiro: Amazon's new agentic IDE that won't generate code until you've produced three formal documents, requirements.md, design.md, and tasks.md. It uses EARS notation, a requirements format developed at Rolls-Royce for specifying aerospace control system behavior, and it's the mandatory replacement for Q Developer, whose IDE plugins reach end-of-life April 30, 2027. New Q Developer signups have been blocked since May 15.

For AWS shops, this isn't a philosophical discussion about development methodology. It's a migration with a hard deadline.

The discipline behind Kiro is right. What the mechanism actually enforces is more complicated.

What the Three Documents Actually Do

Kiro's workflow is sequential. When you start a feature, the agent doesn't begin generating code. It runs a three-phase process first.

Phase one produces requirements.md: user stories and acceptance criteria written in EARS notation. EARS stands for Easy Approach to Requirements Syntax. The templates are tightly structured: each requirement follows a fixed keyword form, such as "When [trigger], the [system] shall [response]", that makes it machine-readable and, in theory, testable. "When the user submits the filter, the system shall return only sessions matching the selected project within 500ms." That kind of thing.
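To make "machine-readable" a little more concrete, here is a minimal sketch that classifies a sentence by its leading EARS keyword. The five pattern names follow the published EARS templates; the regular expressions and the classify_ears function are my own simplification for illustration, not Kiro's parser.

```python
import re
from typing import Optional

# Leading keyword -> EARS pattern name. The keywords follow the published
# EARS templates; the regexes are a simplified assumption for illustration.
EARS_PATTERNS = [
    ("event-driven", re.compile(r"^when\b.+,\s*the .+ shall .+", re.IGNORECASE)),
    ("state-driven", re.compile(r"^while\b.+,\s*the .+ shall .+", re.IGNORECASE)),
    ("unwanted-behaviour", re.compile(r"^if\b.+,\s*then the .+ shall .+", re.IGNORECASE)),
    ("optional-feature", re.compile(r"^where\b.+,\s*the .+ shall .+", re.IGNORECASE)),
    ("ubiquitous", re.compile(r"^the .+ shall .+", re.IGNORECASE)),
]


def classify_ears(requirement: str) -> Optional[str]:
    """Return the EARS pattern name a sentence matches, or None if it matches none."""
    text = requirement.strip()
    for name, pattern in EARS_PATTERNS:
        if pattern.match(text):
            return name
    return None


example = (
    "When the user submits the filter, the system shall return only "
    "sessions matching the selected project within 500ms."
)
loose = "A developer should be able to filter their session history by project"

example_kind = classify_ears(example)  # matches the "When ..., the ... shall" form
loose_kind = classify_ears(loose)      # plain prose, matches no template
```

A check like this only confirms the shape of the sentence. It says nothing about whether the requirement describes the right behavior.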

Phase two produces design.md: architecture decisions, component boundaries, API contracts, sequence diagrams derived from the approved requirements.

Phase three produces tasks.md: discrete, dependency-sequenced work items derived from the design.
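"Dependency-sequenced" is the kind of ordering a topological sort produces. A minimal sketch using Python's standard graphlib module follows; the task names are hypothetical, and nothing here reflects how Kiro stores or orders tasks internally.

```python
from graphlib import TopologicalSorter

# Hypothetical work items: each task maps to the set of tasks it depends on.
tasks = {
    "define session schema": set(),
    "write filter endpoint": {"define session schema"},
    "add filter UI": {"write filter endpoint"},
}

# static_order() yields every task after all of its dependencies.
order = list(TopologicalSorter(tasks).static_order())
```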

Only after all three documents exist does the agent generate code.

The intent is clear and defensible. A formal spec is the anchor. When the agent has structured context about what it's building, it's less likely to drift during a long autonomous run — one of the real failure modes that shows up in multi-file agentic sessions. The spec documents also serve as persistent context that survives context window limits. Kiro even keeps them live, updating them as the implementation evolves.

The Aerospace Transplant

EARS notation comes from a specific context with specific constraints. Aircraft control systems operate under known physics, in bounded environments, with requirements that are stable between spec and deployment. The failure modes are enumerable. The acceptable states are enumerable. When Rolls-Royce engineers specify the behavior of a flight spoiler, they're describing a physical system where the gap between "requirement stated" and "requirement correct" is detectable before anything flies.

Software requirements for a SaaS feature are different in a specific way: they're often wrong at the time they're written.

Not wrong because the developer was careless. Wrong because the requirements of a software system are frequently discovered through implementation. The feature you thought needed a filtering component turns out to need stateful pagination that wasn't in scope during spec. The API contract you documented doesn't handle the edge case you hit while building the first integration. The user story that seemed complete becomes ambiguous the moment a real user's data violates an assumption you didn't know you were making.

Aerospace specs succeed because the system being specified is closed and the constraints are physical. Software specs fail for the same reason software requirements documents have always failed: the act of building reveals what the requirements needed to say. That's not a tooling problem. It's a property of complex systems where the solution space isn't knowable before you explore it.

EARS notation can precisely specify that a spoiler should deploy when airspeed exceeds a threshold under specific conditions. Whether it adds value to "a developer should be able to filter their session history by project" depends on whether you actually know what "project" means across the data sources a real user connects. Usually you find out during implementation that you didn't.

Structure Is Not Substance

This is the distinction that matters for whether Kiro's approach delivers what it promises.

Kiro enforces spec structure. Structure is a necessary condition for a useful spec, not a sufficient one.

A developer who produces perfectly formatted EARS requirements for the wrong feature, with the wrong acceptance criteria, or missing a critical constraint hasn't written a better spec. They've written a more formally incorrect one. The EARS format is auditable. Spec quality is not.

The developers who've built effective agentic workflows already know this from experience. Spec-writing ability is a skill developed through repeated cycles: write a brief, watch the agent interpret it, notice where the gaps were. The feedback loop builds calibration about what "specified enough to succeed" actually means for a given type of task. That calibration doesn't come from a format requirement. It comes from observing what happens when the format is met but the substance is wrong.

Kiro's bet is that the scaffold is what's missing: that developers are capable of writing good specs and skip them only because nothing in their workflow demands one. That's a reasonable hypothesis. Anthropic's agentic coding research from earlier this year documented the bottleneck clearly: developers use AI in roughly 60% of their work but fully delegate only 0 to 20% of tasks. The gap is attributed to specification quality, and it doesn't close on its own.

If the spec habit is what's missing, Kiro provides it. If the spec skill is what's missing, EARS compliance is a different exercise entirely.

The Migration Pressure

What makes this concrete rather than theoretical is that Amazon is forcing the decision. Claude Opus 4.6 and later frontier models are exclusive to Kiro — teams staying on Q Developer are locked to older model tiers. The enterprise teams that have CI pipelines, workflow rules, or automations built around Q Developer are on a timeline: evaluate and run parallel setups by Q3 2026, full migration before April 2027.

For those teams, the question of whether EARS notation is the right requirements format is secondary to the question of what happens when spec-first becomes mandatory at the tooling level. It's infrastructure pressure, not developer conviction.

The honest position is that nobody has meaningful data on this yet. Kiro launched in preview in May and got its main AWS Summit spotlight June 17. The first real cycle-time data from teams running spec-first workflows at scale won't exist for months. The productivity case for the approach is theoretically grounded in real research about the specification bottleneck, but it hasn't been validated at the level of "teams using Kiro ship more code that doesn't come back for revision."

What Would Tell You If It's Working

If you migrate to Kiro and want to know whether the spec-first workflow is actually producing better outcomes, the metrics the IDE surfaces are the wrong ones to watch.

The meaningful signal is the ratio of first-run agent successes to total runs per feature. When a well-specified task produces agent output that's mergeable the first time, the spec did its job. When the run requires three correction cycles, the spec — regardless of EARS formatting — wasn't precise enough. That ratio tells you whether spec quality is improving as the format habit builds.
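The ratio is simple to compute once you log runs yourself. A minimal sketch, assuming a hypothetical AgentRun record that you populate by hand or from your own tooling (Kiro does not export anything in this shape as far as I know):

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AgentRun:
    feature: str            # hypothetical feature identifier
    correction_cycles: int  # 0 means the output was mergeable on the first run


def first_run_success_ratio(runs):
    """Map each feature to (runs with zero corrections) / (total runs)."""
    totals = defaultdict(int)
    first_try = defaultdict(int)
    for run in runs:
        totals[run.feature] += 1
        if run.correction_cycles == 0:
            first_try[run.feature] += 1
    return {feature: first_try[feature] / totals[feature] for feature in totals}


# Made-up data for illustration only.
runs = [
    AgentRun("session-filter", 0),
    AgentRun("session-filter", 3),
    AgentRun("session-filter", 0),
    AgentRun("export", 2),
]
ratios = first_run_success_ratio(runs)
```

Tracked week over week, a rising ratio for similar kinds of features is the signal worth looking for.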

Kiro gives you the spec documents as artifacts. What it doesn't give you is measurement of whether those specs predicted agent success. The correlation you want is: time spent in spec phase relative to revision cycles per feature, tracked across weeks as the discipline develops. That's what would tell you if the format requirement is building the underlying skill, or just producing compliant documents that fail the same ways undocumented briefs failed.
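One rough way to look at that relationship is a plain Pearson correlation between spec time and revision cycles per feature. The sketch below uses made-up numbers and a hand-rolled pearson helper. With a handful of features this is a directional hint at best, and it says nothing about causation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / sqrt(var_x * var_y)


# Made-up numbers for illustration only: one entry per feature.
spec_minutes = [10, 20, 30, 40]
revision_cycles = [4, 3, 2, 1]
r = pearson(spec_minutes, revision_cycles)
```

A value near -1 would mean features with more spec time needed fewer revisions. A value near 0 would suggest the documents are being produced without changing outcomes.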

System-level session tracking can approximate this without instrumenting the IDE directly. Time from opening a project to first commit captures the spec-writing phase. Revision rate in version history captures whether agent output held. Neither requires Kiro to report it. Both are visible from the development environment regardless of which IDE is running inside it. We track this boundary in xeve — the gap between first file open and first commit on a feature is one of the cleaner proxies for spec investment, and it connects to downstream revision rate in ways that individual session data doesn't.
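Both proxies reduce to a few lines once you have timestamps and commit messages. This is a sketch under stated assumptions, not xeve's implementation: the function names are hypothetical, and treating commits that start with "fix" or "revert" as revisions is an assumed heuristic that depends on your team's commit conventions.

```python
from datetime import datetime, timedelta


def spec_gap(first_file_open: datetime, first_commit: datetime) -> timedelta:
    """Time between opening the project and the first commit on a feature."""
    if first_commit < first_file_open:
        raise ValueError("first commit precedes first file open")
    return first_commit - first_file_open


def revision_rate(commit_messages, revision_markers=("fix", "revert")):
    """Share of commits whose message starts with a revision marker.

    Treating 'fix'/'revert' prefixes as revisions is an assumed heuristic.
    """
    if not commit_messages:
        return 0.0
    revisions = sum(
        1 for msg in commit_messages
        if msg.lower().startswith(revision_markers)
    )
    return revisions / len(commit_messages)


# Made-up values for illustration only.
gap = spec_gap(datetime(2026, 6, 19, 9, 0), datetime(2026, 6, 19, 10, 30))
rate = revision_rate([
    "feat: add session filter",
    "fix: pagination off by one",
    "Revert feat: add session filter",
    "docs: update readme",
])
```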

The Format Needs to Earn Its Place

The aerospace discipline is worth taking seriously. The core argument for EARS notation is sound: requirements written in testable, conditional form are less ambiguous than requirements written as prose. Ambiguity is exactly what causes agentic runs to fail mid-execution. Anything that reduces ambiguity in the brief reduces the agent's need to guess at branch points.

But discipline borrowed from one domain doesn't transfer to another automatically. Aerospace specs succeed in part because the system being specified is bounded and the requirements are stable. Software development has neither of those properties built in. The notation doesn't confer them.

The teams that will get the most from Kiro are the ones that use the mandatory spec process as a real checkpoint — asking before they approve requirements.md whether someone could write a failing test for each acceptance criterion before seeing any implementation. If the answer is no, the requirement isn't specific enough. The document exists; the spec doesn't.
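Here is what that checkpoint can look like for a requirement such as "When the user submits the filter, the system shall return only sessions matching the selected project within 500ms." The sketch writes the check before any implementation exists. The filter_sessions signature and the session dictionaries are hypothetical, chosen only to make the example runnable.

```python
import time


def check_filter_requirement(filter_sessions):
    """Return a list of violations of the example requirement; empty means pass.

    `filter_sessions(sessions, project)` is a hypothetical function under test.
    """
    sessions = [
        {"id": 1, "project": "alpha"},
        {"id": 2, "project": "beta"},
        {"id": 3, "project": "alpha"},
    ]
    start = time.perf_counter()
    try:
        result = filter_sessions(sessions, "alpha")
    except NotImplementedError:
        return ["filter_sessions is not implemented yet"]
    elapsed_ms = (time.perf_counter() - start) * 1000

    violations = []
    if sorted(s["id"] for s in result) != [1, 3]:
        violations.append("result does not match the selected project exactly")
    if elapsed_ms > 500:
        violations.append(f"took {elapsed_ms:.0f}ms, limit is 500ms")
    return violations


def not_yet_written(sessions, project):
    # Stand-in for the state of the codebase when the spec is approved.
    raise NotImplementedError


def simple_filter(sessions, project):
    # One possible implementation, added only to show the check can pass.
    return [s for s in sessions if s["project"] == project]


before = check_filter_requirement(not_yet_written)  # fails, as it should
after = check_filter_requirement(simple_filter)     # passes
```

If you can't write something like check_filter_requirement from the requirement text alone, the requirement is still underspecified, however well it is formatted.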

That's the bar the format is proxying for. For some teams, the mandatory pause will develop the skill. For others, it will produce three documents that look right and fail the same ways undocumented briefs failed.

The format is the easy part. Amazon solved for the easy part.

Written by Kevin — builder of xeve
