
The Real Skill in the Age of AI: Knowing When to Stop


The missing skill isn’t prompting. It isn’t knowing which model to use, or how to structure a CLAUDE.md, or when to reach for an MCP tool.

The missing skill is knowing when to stop.

The Async Human

Conscious human attention is a single-threaded async process.

We have background threads — habits, pattern recognition, the kind of low-level processing that happens without us noticing. But our conscious executive function, the part that reads specs and makes architectural decisions and evaluates whether an agent went off the rails, is strictly single-threaded. It cannot genuinely run two complex reasoning tasks in parallel.

What we call “multitasking” is actually context switching. And context switching has overhead — exactly like an OS scheduler running multiple processes on a single core. The CPU doesn’t run them in parallel; it creates the illusion of parallelism by rapidly switching between them, loading and unloading state each time.

We do the same thing. And we pay the same price: every switch costs something, and the more switches you do, the less actual execution you get per unit of time.
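The shape of that cost is easy to make concrete. Here's a toy model — the numbers are invented, and the assumption that reload cost grows with the number of contexts you're carrying (attention residue) is mine — but it shows why adding tasks to a single-threaded executor shrinks the fraction of time spent on real work:

```python
# Toy model: a single "core" round-robins over n tasks.
# Each time slice does slice_ms of real work, then pays a
# context-reload cost. The numbers are made up; the crude
# assumption is that reload cost scales with how many
# contexts you're juggling (attention residue). Only the
# shape of the curve matters.

def useful_fraction(n_tasks: int, slice_ms: float = 25.0,
                    switch_ms: float = 5.0) -> float:
    """Fraction of wall-clock time spent executing, not switching."""
    if n_tasks <= 1:
        return 1.0  # one task: no switching at all
    reload_cost = switch_ms * n_tasks
    return slice_ms / (slice_ms + reload_cost)

for n in (1, 2, 3, 5, 8):
    print(f"{n} tasks -> {useful_fraction(n):.0%} useful work")
```

One task gives you 100% execution; by a handful of interleaved tasks, well under half your time is real work. The exact numbers don't matter — the monotonic decline does.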

This is the foundation everything else rests on.

Two Stopping Problems

Given that model, there are two distinct ways things go wrong when you orchestrate AI agents — and they operate on different axes.

The Vertical Problem: Building on Unverified Ground

You’re in a session. The agent has just finished the authentication module. It looks solid. You skim the output, it seems right, and there’s momentum — so you tell it to start the dashboard.

But you haven’t actually verified the auth module at depth. You’ve seen it. You haven’t tested it.

Now the dashboard is being built on top of assumptions that may be wrong. And the longer you continue before stopping to verify, the more expensive any foundation flaw becomes. If auth has a subtle bug, it might not surface until the dashboard is half-built — at which point you’re not fixing one thing, you’re untangling two.

This is the stopping problem within a session. It’s not about switching between agents. It’s about the pull of momentum inside a single thread of work. The agent keeps going, you keep going with it, and the transition from “build mode” to “verify mode” never happens because it was never explicitly planned.

The discipline: before the session starts, redefine what “done” means. The agent stopping is not done. Done means you’ve confirmed it works and the assumptions it built on are sound. No new feature starts until you’ve reached that bar.
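If it helps to make that bar explicit, "done" can be written as a gate rather than a feeling. A toy sketch — the check names are mine, not a real CI step:

```python
# Toy gate: "done" is not "the agent stopped" — it's
# "stopped AND verified". Field names are illustrative.

from dataclasses import dataclass

@dataclass
class SessionResult:
    agent_finished: bool       # the agent reported completion
    tests_pass: bool           # you ran the tests yourself
    assumptions_checked: bool  # you verified what it built on

def done(r: SessionResult) -> bool:
    """No new feature starts until all three are true."""
    return r.agent_finished and r.tests_pass and r.assumptions_checked
```

The agent finishing flips exactly one of the three flags. The other two are yours.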

The Horizontal Problem: How Many Processes Can You Actually Schedule?

Now add parallel sessions. Agent A is building the API. Agent B is refactoring the data model. Agent C is writing tests.

Each of those is a process your single-threaded attention has to service. You check in on A, switch to B, switch to C, back to A — and with every switch, you load a context, do some work, and unload it. Except you don’t fully unload it. Research on attention residue shows that when you shift focus from one task to another, part of your attention stays on the previous one. That residue consumes working memory, degrading the quality of your engagement with whatever you’ve switched to.

Three overlapping agent sessions means you’re potentially carrying three partial contexts simultaneously, never fully present in any of them. It doesn’t feel like that — it feels like productivity. But the quality of your oversight degrades quietly, and regressions and bad architectural decisions slip through because you weren’t reading deeply enough when it mattered.

The practical limit, in my experience: two to three active sessions per focused block, and only when the tasks are genuinely isolatable. Beyond that, you’re not supervising — you’re skimming. And skimming is where things break.

What Stopping Well Actually Looks Like

Both problems have the same root solution: make stopping a first-class part of the plan, not an afterthought.

A few heuristics that have changed how I work:

Before launching any session, answer two questions: what does done look like, and what does verified look like? If you can’t answer both, the task isn’t scoped well enough to run yet.

Resist the next feature until the previous one is verified. Momentum is the enemy here. The agent finishes something, you feel good about it, and the natural instinct is to keep going. Pause. Check. Test. Confirm the foundation is solid before building on top of it.

Treat session boundaries as hard stops, not suggestions. Decide in advance: after these agents report back, I review, I merge what’s ready, and I stop. Context drift — where sessions keep spawning new sessions without a real break — is how six hours disappear and you end up with a codebase that’s hard to reason about and a brain that’s completely fried.

Classify tasks before parallelizing them. Not all agent work has the same cognitive cost to oversee. Documentation, boilerplate, isolated tests — these are cheap to supervise in parallel. Architectural decisions, cross-cutting refactors, anything with shared state — these deserve your full, sequential attention.
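That last heuristic can be as simple as a lookup plus a cap on parallel slots. A sketch — the categories and the cap of three come from the heuristics above, and everything here should be tuned to your own limits:

```python
# Sketch: split agent tasks into a parallel batch and a
# sequential queue. Category names and the session cap are
# illustrative, not a validated scheduler.

CHEAP_TO_SUPERVISE = {"docs", "boilerplate", "isolated-tests"}
# Everything else — architecture, cross-cutting refactors,
# shared state — gets full, sequential attention.

MAX_PARALLEL = 3  # upper bound on simultaneously supervised sessions

def schedule(tasks: list[str]) -> tuple[list[str], list[str]]:
    """Return (run in parallel, run one at a time)."""
    parallel = [t for t in tasks if t in CHEAP_TO_SUPERVISE][:MAX_PARALLEL]
    sequential = [t for t in tasks if t not in parallel]
    return parallel, sequential
```

So `schedule(["docs", "architecture", "boilerplate", "shared-state"])` puts docs and boilerplate in the parallel batch and queues the other two for sequential, full-attention passes.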

The Closing Condition

A single-threaded process that tries to schedule too many tasks doesn’t get faster — it just spends more time context-switching than executing. The most efficient system isn’t the one that launches the most processes. It’s the one that does the most real work between switches.

That applies to CPUs. It turns out it applies to us too.


What limits have you settled on in practice? I’d be curious to compare notes. Get in touch.