At some point last week I had three Claude sessions running simultaneously. Each one doing something useful. Each one expecting something from me. And I had no system for any of it. I was using the same setup I built back when I was the one writing all the code.
Something was clearly wrong.
The honest version is this: the job has changed. Not “we have a new tool now” changed. The actual nature of the work has shifted, and I believe most developers haven’t processed what that means for their environment. When you were writing all the code yourself, your setup was optimized for one operator at a keyboard. One file, one thought, one change at a time. Editors, terminals, build tools, the whole stack was built around that loop. It worked.
Agentic AI breaks that loop. Not dramatically. Quietly, at first. You stop writing most of the lines. You start reviewing, directing, redirecting. The typing is still there but it’s no longer the bottleneck. Your judgment is. And the environment you built to serve a typist doesn’t know it’s supposed to serve a decision-maker now.
There are four places where I feel that gap most.
You’re orchestrating, not authoring
When I kick off a session to implement a feature, I’m not writing code. I’m issuing instructions to something that will write code. That’s closer to architecture or code review than to editing a file. The mental gear is different. I kept treating sessions like they were autocomplete with extra steps, and I spent two days convinced I needed a new plugin before I realized I needed a new mental model.
The mismatch only becomes obvious when your environment fights the work. Keybindings built for file navigation, when you’re trying to manage running contexts. A terminal multiplexer configured for one active task, when you need three. Nothing in the setup was designed for this, because the setup predates the workflow.
The question worth asking is not “what plugin do I add?” It’s “what does my environment need to support now that it didn’t need to support before?” For me the answer starts with context visibility. Concretely speaking: I name every tmux session after the task it’s running, and I keep a short notes file per session that I update whenever I redirect the AI. Two habits, nothing clever. But before, I was holding all of that in working memory and losing pieces of it constantly. Now I can look at my workspace and see what’s waiting, what’s running, what came back needing a decision. The overhead dropped, and my ability to redirect accurately went up. Turns out the bottleneck was me, not the model.
Reviews are now the primary job
This is the part I didn’t fully process until recently. When AI writes the code, reviewing it becomes the primary work. Not a second pass on something I mostly wrote. Reading something I didn’t author at all, and being responsible for it anyway.
There’s a specific failure mode I keep running into: the session went well, the output looks right, the tests pass. Everything signals “this is fine.” So I skim. I approve. Then two days later I find an architectural decision buried in the middle of the diff that I don’t actually agree with, or a helper function with a subtle assumption that the tests don’t catch because the tests were generated by the same session that wrote the bug. When the session writes both the code and the tests, a green suite isn’t evidence of correctness. It’s evidence of internal consistency. Not the same thing. The output was locally correct. It just didn’t fit the larger context.
That’s approval bias. When everything looks good, you stop reading carefully. And I believe it’s more dangerous with AI-generated code than with a colleague’s PR. A colleague’s bad code has tells. You can see where they were going, even when they went wrong. AI output is clean, confident, and looks like it was written by someone who has read all the documentation and none of the business requirements. There is nothing to push back against, which makes it easy to wave through.
Concretely speaking: I now use delta instead of reading diffs in a chat interface. I keep a separate worktree for anything non-trivial so I can actually run the output before approving it. And I have a short mental checklist I run on anything longer than roughly a hundred lines. Does the approach match how we’d actually have written this, meaning the thing the AI couldn’t know: the half-deprecated module we haven’t fully migrated, the team convention that lives in someone’s head and nowhere in the docs. Are there hidden assumptions. What’s the worst thing this code could silently do. Not because AI is untrustworthy. Because reviewing deserves the same environment quality that writing always had, and that environment doesn’t come for free. I had to build it deliberately. I should have done it sooner.
Parallel work keeps the reviews flowing
The reason to run sessions in parallel isn’t speed for its own sake. It’s so that when you finish reviewing one thing, there’s already something else ready. Reviewing is the bottleneck now. You want to keep it fed.
The natural pattern: kick off session A on the feature, session B on the tests, session C on the backlog item. By the time I’ve reviewed A, B has something. By the time I’ve redirected B, C is done. I’m constantly in motion but none of it is wasted waiting.
That only works if the environment can hold multiple contexts without losing them. Named tmux sessions. Git worktrees, one per active task. A note layer so I’m not carrying everything in working memory. None of these are novel ideas, I was using most of them before. What changed is how much the setup matters. One session at a time, a sloppy environment is a minor annoyance. Three in parallel, it’s the main bottleneck.
Automating the overhead
When I was writing all the code myself, the boring overhead (branching, committing, opening PRs) was annoying but proportional. Once per feature. Now I do it once per session, and the session cadence is much higher.
Every manual step I do predictably and without thinking is a step that interrupts a reviewing workflow running at a higher frequency than it used to. For that reason, the threshold for “worth automating” has dropped a lot. Things I could live with before are now friction I notice every day.
The heuristic I use: if I can describe the exact steps without ambiguity, it’s a script. If it requires reading, judgment, or context, it’s a prompt. The second category is what AI is actually for. The first category is just overhead I’ve decided to stop tolerating.
I have five sessions running right now. Two came back in the last hour. I know exactly what each one is waiting for, because it’s in the notes file.
A few weeks ago, juggling two or three felt like a lot.
Game of Life doesn’t care what you intended to build. The cells follow the rules, not your plans. My rules changed the day I stopped being the one writing all the code. The environment just needs to catch up.