Claude Reviews Its Past Work. Most Developers Don't.

On May 6, Anthropic launched Dreaming — a background process for Claude Managed Agents that reads across past sessions, extracts recurring patterns, and writes structured updates to the agent's memory. Each agent session previously started fresh. Dreaming changes that. It's the first time I've seen an AI company frame the learning gap explicitly: doing tasks and learning from tasks are two different things, and they don't collapse automatically into each other.

That framing is the part I found interesting, not the feature itself. Because it applies to developers too, and most haven't solved it either.

The Blank Session Problem

A Claude agent running task 47 has no inherent knowledge of what happened across tasks 1 through 46 unless that information was explicitly stored and retrieved. Sessions are technically discrete. Patterns don't surface on their own.

Anthropic's team showed this concretely with Harvey, a legal AI company using Managed Agents for complex document work. Their agents kept making the same filetype-related errors across sessions — each time starting fresh, each time repeating the same mistake. Once Dreaming was enabled, the recurring pattern was detected, a memory update was written, and the error rate dropped. Harvey's completion rates went up approximately 6x in internal tests.

That's the problem Dreaming solves for AI agents. And it's a precise mirror of something most developers experience but rarely address.

You worked 50 hours last week. You have some intuition about how it went. But that intuition is based on impression, not data. You don't know, without looking at explicit tracking, whether your Tuesday afternoon sessions are consistently less productive than your Tuesday mornings. You can't see whether the project you feel you're neglecting is getting less time or just feeling stale. You can't tell from memory alone whether your deep work blocks are actually protecting the hours you think they're protecting.

Your weeks are also effectively discrete, unless you've built something to connect them.

What the Data Actually Shows

When we built xeve, we expected it to surface the obvious stuff: "you code more in the morning." That's true for most people and usually not actionable. The more useful findings are the ones that contradict what you already believe.

There's a pattern that comes up consistently: most developers have a 90-minute window somewhere in their day that runs at substantially higher output than everything else. They usually have an intuition about when this is. What they don't know, until they look at the data, is how rarely they protect it. Meetings, Slack catch-ups, and shallow tasks eat that window on two or three days per week without the developer noticing. The pattern only becomes visible when you look at a month of session data and see that the overlap between "the time I think I do my best work" and "actual deep coding time" is lower than expected.
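To make that overlap concrete, here is a minimal sketch of how it could be computed. It is an illustration and not xeve's actual code: the function names, the (start, end) session format, and the assumption that deep-coding sessions never overlap each other are all mine.

```python
from datetime import date, datetime, time

def overlap_minutes(a_start, a_end, b_start, b_end):
    """Minutes shared by two intervals; 0 if they do not intersect."""
    latest_start = max(a_start, b_start)
    earliest_end = min(a_end, b_end)
    return max(0.0, (earliest_end - latest_start).total_seconds() / 60)

def protected_fraction(window_start, window_end, days, sessions):
    """Fraction of the believed peak window actually spent in deep coding.

    window_start, window_end: datetime.time bounds of the window you think
        is your best (hypothetical input, e.g. 09:00 to 10:30).
    days: list of datetime.date values to evaluate.
    sessions: list of (start, end) datetimes for deep-coding sessions,
        assumed not to overlap one another.
    """
    window_len = overlap_minutes(
        datetime.combine(date.min, window_start),
        datetime.combine(date.min, window_end),
        datetime.combine(date.min, window_start),
        datetime.combine(date.min, window_end),
    )
    covered = 0.0
    for day in days:
        w_start = datetime.combine(day, window_start)
        w_end = datetime.combine(day, window_end)
        for s_start, s_end in sessions:
            covered += overlap_minutes(w_start, w_end, s_start, s_end)
    return covered / (window_len * len(days))
```

A result near 1.0 would mean the window is almost always spent coding. A result near 0.5 would mean roughly half of it goes elsewhere.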

The second thing: project time distribution. Developers consistently overestimate how much time they spend on their highest-priority work and underestimate maintenance, support, and context-switching overhead. Seeing the actual split — not what you guessed, but what the app-switch and session-length data shows — usually produces a discrepancy large enough to change behavior. Not because the information is conceptually surprising, but because a specific number for this week is different from a general awareness that "support work takes time."
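As a rough sketch of what "a specific number for this week" could look like, the snippet below compares a guessed split against tracked minutes. The function name and both input shapes are assumptions for illustration, not xeve's data model.

```python
def split_discrepancy(estimated_pct, tracked_minutes):
    """Return actual-minus-estimated share per project, in percentage points.

    estimated_pct: dict of project -> the share you believe you spend (0-100).
    tracked_minutes: dict of project -> minutes actually recorded this week.
    A positive value means the project took more of the week than you guessed.
    """
    total = sum(tracked_minutes.values())
    result = {}
    for project in sorted(set(estimated_pct) | set(tracked_minutes)):
        actual = 100 * tracked_minutes.get(project, 0) / total if total else 0.0
        result[project] = round(actual - estimated_pct.get(project, 0.0), 1)
    return result
```

With a guess of 70% feature work and 10% support, and tracked time of 600 and 300 minutes out of 1,200, the feature entry comes out at -20.0 and the support entry at 15.0.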

The third: session efficiency by length. The intuition that longer sessions are more productive isn't uniformly true. Sessions above three hours often show declining output in the final hour, and the overhead to re-establish focus after a long session is higher than after a medium one. Two focused 90-minute blocks frequently produce more committed code than one four-hour marathon. The pattern varies by person, which is exactly why you need your own data rather than research averages.
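One way to check this against your own history is to bucket sessions by length and compare output per hour. This is a sketch under assumptions: "output" is whatever unit you trust (commits, merged changes), and the 90- and 180-minute bucket edges are arbitrary defaults I picked for the example.

```python
from collections import defaultdict

def output_per_hour_by_length(sessions, edges=(90, 180)):
    """Group sessions into short/medium/long and compute output per hour.

    sessions: list of (minutes, output_units) tuples.
    edges: upper bounds in minutes for the "short" and "medium" buckets.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # label -> [minutes, output]
    for minutes, output in sessions:
        if minutes <= edges[0]:
            label = "short"
        elif minutes <= edges[1]:
            label = "medium"
        else:
            label = "long"
        totals[label][0] += minutes
        totals[label][1] += output
    return {
        label: out / (mins / 60)
        for label, (mins, out) in totals.items()
        if mins > 0
    }
```

If the "long" bucket shows a clearly lower rate than the shorter ones over a month or more, that is a signal worth testing with shorter blocks. A handful of sessions is not enough to conclude anything.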

None of this is surprising in the abstract. But abstract knowledge does very little. Seeing your specific numbers, for your actual week, compared against your own recent history — that's a different class of information.

Why Experience Doesn't Produce Learning Automatically

The core insight behind Dreaming is that experience and learning are separate things. A Claude agent can run thousands of tasks without improving on a specific error class if nothing is reading across sessions and extracting the signal. The tasks accumulate. The pattern doesn't surface.

Humans have the same architecture problem. We process experience in real time, but extracting retrospective patterns requires a second system that most people don't build. Journals work for some. Weekly reviews work for others. Most developers do neither consistently, and the patterns (what's working, what's drifting, what they're systematically avoiding) remain invisible.

This isn't a discipline problem. The people who don't do weekly reviews aren't less self-aware than the people who do. The difference is usually whether they have something that makes review cheap and specific. "Reflect on your week" is expensive and vague. "Here's your coding time by project this week versus the prior four weeks" takes 30 seconds to read and produces a specific observation you can act on.
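The "this week versus the prior four weeks" comparison is simple enough to sketch. The input format (one dict of project hours per week, oldest first) and the function name are assumptions made for the example.

```python
def week_vs_baseline(weekly_hours):
    """Compare the latest week against the average of up to four prior weeks.

    weekly_hours: list of dicts (project -> hours), oldest first; the last
        entry is the current week.
    Returns project -> (hours this week, average hours over the baseline).
    """
    if len(weekly_hours) < 2:
        raise ValueError("need the current week plus at least one prior week")
    *prior, current = weekly_hours
    prior = prior[-4:]  # keep only the four most recent prior weeks
    report = {}
    for project in sorted(set(current).union(*prior)):
        baseline = sum(week.get(project, 0.0) for week in prior) / len(prior)
        report[project] = (current.get(project, 0.0), baseline)
    return report
```

A project that averaged 10 hours and dropped to 6 while support rose from 3 to 9 is exactly the kind of specific observation that a vague "reflect on your week" prompt rarely produces.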

Automatic tracking solves the collection side. The review becomes cheap because the data is already there.

What Anthropic Actually Built

Reading the Dreaming documentation carefully: it's not a general-purpose memory dump. It's specifically designed to detect recurring patterns — the same error type appearing across multiple sessions, converging successful approaches, context that keeps getting re-established from scratch. The system writes targeted memory updates based on those patterns rather than just accumulating logs.
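To illustrate the difference between accumulating logs and detecting recurrence, here is a toy sketch. It is not Anthropic's implementation and uses no Anthropic API; the event labels, the function name, and the three-session threshold are all hypothetical.

```python
from collections import Counter

def recurring_patterns(session_events, min_sessions=3):
    """Return event labels that appear in at least min_sessions sessions.

    session_events: one list of event labels per session.
    Each label counts once per session, so a single noisy session
    cannot make a pattern look recurring on its own.
    """
    seen = Counter()
    for events in session_events:
        seen.update(set(events))
    return sorted(label for label, n in seen.items() if n >= min_sessions)
```

The full log is still there, but only the labels that cross the threshold would be candidates for a memory update.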

That specificity matters. A system that records everything doesn't help. A system that extracts patterns and surfaces relevant signal does.

This is the difference between raw tracking and analytics. Xeve doesn't surface every app switch you made in a week. It surfaces whether your deep work blocks are protecting the time you think they're protecting, whether your project time matches your stated priorities, and whether there's a consistent pattern in when your best sessions happen.

Dreaming for Claude agents and personal analytics for developers are solving the same structural problem: the gap between doing a thing and learning from having done it.

The Part You Have to Build

One thing Dreaming doesn't automate is what to do once a pattern is visible. Harvey's agents still needed someone to understand that filetype workarounds were the thing worth retaining. Pattern detection is automated. The decision about what the pattern means and what to change is not.

Same with personal tracking. The data will show you that your Tuesday afternoons are consistently your least productive sessions. What you do with that is still your call: move a standing meeting, reserve the slot for shallow work, or decide the pattern is fine given other constraints. The data removes the ambiguity about what's happening; it doesn't make the decision.

What you can't do is make a well-informed choice without the data. And the data doesn't organize itself from lived experience alone.

Anthropic built Dreaming because a Claude agent working without retrospective pattern extraction is leaving capability on the table. The framing applies directly. A developer working the same way — accumulating sessions, forming impressions, operating from intuition — is in the same position.

The patterns in your work exist whether or not you have a system to surface them. The question is whether you see them before they've been running for six months.

Written by Kevin — builder of xeve
