
Your AI Agents Have a Dashboard. You Don't.


GitHub launched the Copilot app on June 2nd, and it is not a coding tool. It is an observation platform for AI agents. That distinction matters.

The new desktop app replaces scattered chat windows with a unified control center. Each agent session runs in its own isolated git worktree. A "My Work" view shows active sessions, open PRs, issues in flight, and background automations, all in one place. Canvases — bidirectional work surfaces shared between developer and agent — update in real time as each agent makes progress. If you're running three parallel agents on different features, you have a live dashboard showing the state of each one.

This is the most sophisticated developer-facing observability product GitHub has ever shipped. And none of it tracks what you are doing.

What the App Is Actually Admitting

The need for a dedicated desktop application to manage AI agent sessions is not a neutral product decision. It's an acknowledgment that orchestrating multiple agents has become cognitively demanding enough to require its own interface.

In early 2025, the main interaction with AI coding tools was inline autocomplete. The model suggested; you accepted or rejected. One stream of attention, one approval decision at a time.

By mid-2026, the default pattern for heavy AI users is parallel agent management: multiple agents running simultaneously across different task scopes, each producing output that needs review, each capable of going wrong in distinct ways, each generating its own stream of commits and PR events. Simon Willison described this workflow in April — spinning up four Claude Code sessions in parallel, orchestrating them simultaneously — and noted he was mentally exhausted by 11 AM.

GitHub is shipping a dedicated app because the orchestration overhead became too complex to manage from inside an editor. That's the implicit product thesis: the job of managing AI agents is now a first-class part of software development, distinct enough from coding that it warrants its own interface.

They're right. And the visibility they've built for the agents is genuinely useful.

What they haven't built is the equivalent for you.

The Observability Asymmetry

When an agent session runs in the Copilot app, GitHub logs exactly what it did: which files it touched, which commits it made, how long it ran, what the PR looks like, whether it was approved or abandoned. Each session has a record.

When you run three parallel agent sessions for four hours, what gets logged about you?

Your IDE records keystrokes and file edits during the periods you're actively editing. Your terminal records commands you ran. GitHub records the commits you pushed. If you use GitHub's billing dashboard, you can see how many credits each session consumed.

None of that tells you how you spent the four hours. How long were you actually directing agents — writing prompts, refining specs, correcting course? How long were you reviewing output critically versus scanning it quickly? What did you do during the gaps between sessions?

Research on developer time allocation during agentic work surfaces something interesting: when an agent is running, developers frequently switch to unrelated tasks rather than waiting. One longitudinal study of developer logs noted that participants found it difficult to report how long tasks took because they would often work on an unrelated task while waiting for agent output. That behavior is rational, since idle waiting time has a high opportunity cost, but it creates a specific measurement problem. A session might take four hours of wall-clock time yet contain only one hour of actual agent direction and one hour of output review, with the other two hours going to miscellaneous work in the gaps. From the outside, it all looks like four hours of agentic development.
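To make that arithmetic concrete, here is a minimal sketch of the four-hour example. The labels and the dictionary layout are illustrative assumptions, not the output of any real tool; the only numbers used are the ones in the paragraph above.

```python
# Hypothetical breakdown of the four-hour session described above, in minutes.
wall_clock_min = 4 * 60
breakdown_min = {
    "directing agents": 60,
    "reviewing output": 60,
    "unrelated work in the gaps": 120,
}

# The pieces should account for the whole wall-clock session.
assert sum(breakdown_min.values()) == wall_clock_min

# Share of the wall-clock session that each activity represents.
shares = {label: minutes / wall_clock_min for label, minutes in breakdown_min.items()}
agentic_share = shares["directing agents"] + shares["reviewing output"]

for label, share in shares.items():
    print(f"{label}: {share:.0%}")
print(f"time actually spent on agentic work: {agentic_share:.0%}")
```

In this example only half of the session was spent directing or reviewing, even though the whole block would be reported as agentic development.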

The Copilot app now makes the agents' portion of that four hours fully legible. The rest remains invisible.

Why This Gap Is Getting Wider

The GitHub Copilot app belongs to a product category that didn't exist six months ago. It was built because the need for agent observability became obvious: developers running parallel sessions needed to see the state of each one without hunting through terminal windows.

Developer self-observability is moving more slowly because the need is less immediately obvious. The work still looks like "coding" from the outside. Your IDE shows you active, your terminal is open, your commit history shows forward motion. The texture of the work has changed fundamentally (you're now spending significant time directing, reviewing, and correcting AI output rather than writing code), but none of the standard productivity proxies registers this shift.

The two patterns that most frequently get lost:

The time spent getting agents back on track. An agent that misunderstands the spec doesn't fail cleanly. It produces output that is plausible, partly functional, and wrong in ways that take careful reading to detect. Developers who track their session-level time consistently report that correcting a misunderstood agent run takes longer than the agent's initial execution. That cost is real, cognitively expensive, and it doesn't show up anywhere in the agent's dashboard.

The quality of the gaps between sessions. When you're running three agents and reviewing their output in rotation, you are doing genuine cognitive work — loading the context of each session, evaluating the output, deciding whether to approve, redirect, or abandon. That work is invisible to everyone, including you. If the gaps are filled with shallow multitasking rather than focused review, agent output quality degrades downstream. The Copilot app shows you what each agent produced. It can't show you whether you read it carefully.

What an Orchestrator Actually Needs to Track

The productivity signal that matters for agent-based development is how you spend the time surrounding each agent session — not what the agent did.

Specifically, the split between four modes of work:

Specifying: writing the brief, the constraints, the acceptance criteria. This is the upstream work that determines how often agents succeed on the first run. Better specs produce more first-run successes, which compounds. Developers who've tracked this consistently find that specifying well for ten minutes prevents a correction loop that costs forty.

Reviewing: reading agent output with genuine critical attention rather than rubber-stamping it. This is the review that catches the logic error or the security assumption that's wrong for your context. It is the highest-value work you do during an agentic session, and it's almost completely invisible in standard metrics.

Correcting: fixing the spec after a failed run, identifying what the agent misunderstood and why. This mode tells you whether your specifying quality is improving over time. If the same class of spec failure keeps appearing, the problem is in your brief, not the model.

Drifting: context-switching to unrelated work during long agent runs. Sometimes this is fine — short tasks in the gaps are efficient. More often it means you're not available to review output when the agent finishes, which introduces latency and increases the chance you approve something you didn't read carefully.

None of those four modes is tracked by GitHub's agent dashboard. All of them can be recovered from system-level time measurement, provided you're logging active app windows and session boundaries.
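As a rough illustration of what that could look like, the sketch below totals logged time by mode. Everything in it is a labeled assumption: the `Interval` record, the activity names such as `prompt_editor` and `diff_viewer`, and the mapping rules are hypothetical and do not describe any particular tracker's format.

```python
from dataclasses import dataclass

# The four modes described above.
MODES = ("specifying", "reviewing", "correcting", "drifting")


@dataclass(frozen=True)
class Interval:
    """One stretch of focused-window time (hypothetical log record)."""
    start_min: int                  # minutes since the start of the work block
    end_min: int
    activity: str                   # assumed label from a window logger
    after_failed_run: bool = False  # True if the preceding agent run was rejected


# Assumed mapping from logged activity to mode; real rules would need tuning.
ACTIVITY_TO_MODE = {
    "prompt_editor": "specifying",
    "diff_viewer": "reviewing",
}


def classify(interval: Interval) -> str:
    """Assign an interval to one of the four modes."""
    # Anything that is not recognizably agent-related counts as drifting.
    mode = ACTIVITY_TO_MODE.get(interval.activity, "drifting")
    # Rewriting the brief after a failed run is correcting, not specifying.
    if mode == "specifying" and interval.after_failed_run:
        return "correcting"
    return mode


def minutes_by_mode(intervals: list[Interval]) -> dict[str, int]:
    """Total the minutes spent in each mode."""
    totals = {mode: 0 for mode in MODES}
    for interval in intervals:
        totals[classify(interval)] += interval.end_min - interval.start_min
    return totals


# Made-up sample log for illustration only.
log = [
    Interval(0, 10, "prompt_editor"),
    Interval(10, 40, "email"),
    Interval(40, 60, "diff_viewer"),
    Interval(60, 75, "prompt_editor", after_failed_run=True),
]
totals = minutes_by_mode(log)
print(totals)
```

The hard part in practice is the classification rule, not the totals: telling careful review apart from a quick scan needs more signal than which window had focus.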

The Product That Got Built First

The GitHub Copilot app is good and necessary. Managing multiple parallel agent sessions without a control plane would be actively worse than having one. The isolated worktrees, the My Work view, the session state tracking — these are genuinely useful for the way the most productive AI-heavy developers work in 2026.

But the product that got built first reflects where the tooling investment went: into observing the agents, because the agents are the new expensive part.

The developer's own patterns — how often they're specifying well versus throwing vague prompts, how carefully they're reviewing, whether their orchestration mode is sustainable — didn't get a dashboard. That work doesn't show up in billing data, which is where most of the observability investment is anchored.

If you're doing serious agentic development, you now have better tools to see what your agents did today than to see what you did. That asymmetry will look increasingly strange as orchestration becomes the primary mode of developer work.

The agents are observable. The orchestrator, so far, is not.

Written by Kevin — builder of xeve
