Token Throughput as the New Coding Bottleneck
The inflection
The shift is recent and sharp. Andrej Karpathy identifies December 2024 as his personal inflection point. Before it he wrote roughly 80% of his own code; after it he hasn’t typed a single line. That matches the broader pattern in early 2026: multi-agent coding harnesses moved from demo-ware to daily driver in about six months.
The frame Karpathy uses: “Everything is skill issue. It’s not that the capability is not there.” In other words, when developers complain that agents can’t do X, the bottleneck is usually in how the agent is being asked, not in what the agent can do.
Token throughput, not cognitive bandwidth
The traditional developer productivity model assumed cognitive bandwidth was the constraint. How much of a codebase can you hold in your head, and how fast can you reason about a change, type it, test it, debug it? Every productivity tool for 40 years was designed to reduce the cognitive friction of that loop.
When an agent does most of the typing and debugging, a different bottleneck surfaces. The new productivity question is:
- How many parallel agent sessions can you supervise?
- How cleanly can you specify a task so the agent runs autonomously without coming back for clarification?
- How quickly can you review agent output and redirect when it goes wrong?
- How effectively can you stay out of the agent’s way, rather than interrupting it with course corrections it never asked for?
This is dispatch-queue work, not craftsman work. The cognitive skills are different: problem decomposition, specification writing, output review, orchestration pacing. Some developers take to it naturally. Others find it alienating.
Peter Steinberger mode
Karpathy describes what he calls “Peter Steinberger mode” as an illustration: 10 Codex agents running simultaneously, each assigned a ~20-minute task, while the human cycles between them. One agent researches, one codes, one plans — all in parallel. The human never sits with a single agent long enough to slow it down.
The practical constraint is attention, not compute. A human supervising 10 simultaneous 20-minute tasks has about 2 minutes per task to review output and issue next instructions. Tight budget. The skill is pacing the queue so the agents don’t all hit decision points at once.
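The pacing arithmetic can be sketched directly. If the 10 launches are staggered evenly, completions arrive one at a time, exactly 2 minutes apart — which is the whole review budget. This is a toy scheduler, not any real harness; the function name and numbers are illustrative.

```python
# Hypothetical pacing sketch: stagger N concurrent agent tasks so their
# decision points (task completions) never coincide, giving the supervisor
# a fixed review window per completion. All numbers are illustrative.

def stagger_schedule(n_agents: int, task_minutes: float, horizon_minutes: float):
    """Return the per-completion review budget and a sorted completion timeline."""
    slot = task_minutes / n_agents          # review budget per completion
    completions = []
    for agent in range(n_agents):
        t = agent * slot + task_minutes     # this agent's first completion
        while t <= horizon_minutes:
            completions.append((agent, t))
            t += task_minutes               # agent immediately gets a new task
    return slot, sorted(completions, key=lambda pair: pair[1])

budget, timeline = stagger_schedule(n_agents=10, task_minutes=20, horizon_minutes=60)
print(f"review budget per completion: {budget:.0f} min")  # 2 min
for agent, t in timeline[:5]:
    print(f"t={t:4.0f} min  agent {agent} needs review")
```

With even staggering, every gap between completions is exactly the 2-minute budget; if all 10 agents launch at once instead, all 10 decision points land at minute 20 and the queue jams.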
What this changes about hiring
- Typing speed stops mattering. The old correlation between fast typing and fast coding breaks. A slow typist who writes clean specs can run a much bigger agent fleet than a fast typist who can’t.
- Specification skill becomes dominant. The developers who can cleanly specify a task in 2 minutes outcompete the ones who need 20 minutes of back-and-forth with the agent.
- Orchestration pattern literacy matters. Knowing when to parallelize vs. serialize, when to escalate to a more capable model, when to restart vs. fix — these are harness-level skills, not code-level skills.
- Reviewer taste is the final bottleneck. An agent fleet produces more code than any human can read carefully. The skill of quickly spotting which output is wrong without reading every line becomes the ceiling on output quality.
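The reviewer-taste bottleneck suggests a mechanical first pass: rank agent outputs by cheap risk signals so human attention goes to the riskiest diffs first instead of reading everything in arrival order. A minimal sketch, assuming an invented `AgentResult` shape and scoring weights — not any real harness's API:

```python
# Hypothetical triage sketch: sort agent outputs so the supervisor reads the
# riskiest diffs first. The AgentResult fields and the scoring weights are
# invented for illustration.

from dataclasses import dataclass

@dataclass
class AgentResult:
    task: str
    tests_passed: bool
    diff_lines: int
    touched_files: int

def risk_score(r: AgentResult) -> float:
    score = 0.0
    if not r.tests_passed:
        score += 10.0                  # failing tests dominate everything else
    score += r.diff_lines / 100.0      # big diffs are harder to trust
    score += r.touched_files * 0.5     # wide blast radius needs a closer look
    return score

def review_queue(results):
    return sorted(results, key=risk_score, reverse=True)

results = [
    AgentResult("rename helper", tests_passed=True, diff_lines=12, touched_files=1),
    AgentResult("rework auth flow", tests_passed=False, diff_lines=240, touched_files=9),
    AgentResult("add retry logic", tests_passed=True, diff_lines=80, touched_files=3),
]
for r in review_queue(results):
    print(f"{risk_score(r):5.1f}  {r.task}")
```

The design point is that triage only buys time, not correctness: the ceiling is still the human's taste on the items that score high.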
The honest limits
Not every task parallelizes cleanly. Sequential bug hunts where each step depends on the previous one’s output still benefit from cognitive bandwidth more than throughput. And the “skill issue” framing can be a cop-out when agents genuinely hit capability walls on certain task classes: ill-specified problems, domains with sparse training data, tasks requiring human judgment.
The throughput frame is a better default than the bandwidth frame for 2026 — but it’s a frame, not a universal law.
Related Notes
- Agent Harnesses
- Claws - Persistent Looping Agents as App Replacement
- Auto Research - Agents as Overnight Experimentation Engines
- Karpathy - No Priors Code Agents Autoresearch