Token Throughput as the New Coding Bottleneck
The inflection
The shift is recent and sharp. Andrej Karpathy identifies December 2024 as his personal inflection point. Before it he wrote roughly 80% of his own code; after it he hasn’t typed a single line. That matches the broader pattern in early 2026: multi-agent coding harnesses moved from demo-ware to daily driver in about six months.
The frame Karpathy uses: “Everything is skill issue. It’s not that the capability is not there.” In other words, when developers complain that agents can’t do X, the bottleneck is usually in how the agent is being asked, not in what the agent can do.
Token throughput, not cognitive bandwidth
The traditional developer productivity model assumed cognitive bandwidth was the constraint. How much of a codebase can you hold in your head, and how fast can you reason about a change, type it, test it, debug it? Every productivity tool for 40 years was designed to reduce the cognitive friction of that loop.
When an agent does most of the typing and debugging, a different bottleneck surfaces. The new productivity question is:
- How many parallel agent sessions can you supervise?
- How cleanly can you specify a task so the agent runs autonomously without coming back for clarification?
- How quickly can you review agent output and redirect when it goes wrong?
- How effectively can you stay out of the agent’s way, rather than interrupting it with course corrections it never asked for?
This is dispatch-queue work, not craftsman work. The cognitive skills are different: problem decomposition, specification writing, output review, orchestration pacing. Some developers take to it naturally. Others find it alienating.
Peter Steinberger mode
Karpathy describes what he calls “Peter Steinberger mode” as an illustration: 10 Codex agents running simultaneously, each assigned a ~20-minute task, while the human cycles between them. One agent researches, one codes, one plans — all in parallel. The human never sits with a single agent long enough to slow it down.
The practical constraint is attention, not compute. A human supervising 10 simultaneous 20-minute tasks has about 2 minutes per task to review output and issue next instructions. Tight budget. The skill is pacing the queue so the agents don’t all hit decision points at once.
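The pacing arithmetic can be sketched directly. If the 10 launches are staggered evenly, completions arrive one at a time, exactly 2 minutes apart — which is the whole review budget. This is a toy scheduler, not any real harness; the function name and numbers are illustrative.

```python
# Hypothetical pacing sketch: stagger N concurrent agent tasks so their
# decision points (task completions) never coincide, giving the supervisor
# a fixed review window per completion. All numbers are illustrative.

def stagger_schedule(n_agents: int, task_minutes: float, horizon_minutes: float):
    """Return the per-completion review budget and a sorted completion timeline."""
    slot = task_minutes / n_agents          # review budget per completion
    completions = []
    for agent in range(n_agents):
        t = agent * slot + task_minutes     # this agent's first completion
        while t <= horizon_minutes:
            completions.append((agent, t))
            t += task_minutes               # agent immediately gets a new task
    return slot, sorted(completions, key=lambda pair: pair[1])

budget, timeline = stagger_schedule(n_agents=10, task_minutes=20, horizon_minutes=60)
print(f"review budget per completion: {budget:.0f} min")  # 2 min
for agent, t in timeline[:5]:
    print(f"t={t:4.0f} min  agent {agent} needs review")
```

With even staggering, every gap between completions is exactly the 2-minute budget; if all 10 agents launch at once instead, all 10 decision points land at minute 20 and the queue jams.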
What this changes about hiring
- Typing speed stops mattering. The old correlation between fast typing and fast coding breaks. A slow typist who writes clean specs can run a much bigger agent fleet than a fast typist who can’t.
- Specification skill becomes dominant. The developers who can cleanly specify a task in 2 minutes outcompete the ones who need 20 minutes of back-and-forth with the agent.
- Orchestration pattern literacy matters. Knowing when to parallelize vs. serialize, when to escalate to a more capable model, when to restart vs. fix — these are harness-level skills, not code-level skills.
- Reviewer taste is the final bottleneck. An agent fleet produces more code than any human can read carefully. The skill of quickly spotting which output is wrong without reading every line becomes the ceiling on output quality.
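The reviewer-taste bottleneck suggests a mechanical first pass: rank agent outputs by cheap risk signals so human attention goes to the riskiest diffs first instead of reading everything in arrival order. A minimal sketch, assuming an invented `AgentResult` shape and scoring weights — not any real harness's API:

```python
# Hypothetical triage sketch: sort agent outputs so the supervisor reads the
# riskiest diffs first. The AgentResult fields and the scoring weights are
# invented for illustration.

from dataclasses import dataclass

@dataclass
class AgentResult:
    task: str
    tests_passed: bool
    diff_lines: int
    touched_files: int

def risk_score(r: AgentResult) -> float:
    score = 0.0
    if not r.tests_passed:
        score += 10.0                  # failing tests dominate everything else
    score += r.diff_lines / 100.0      # big diffs are harder to trust
    score += r.touched_files * 0.5     # wide blast radius needs a closer look
    return score

def review_queue(results):
    return sorted(results, key=risk_score, reverse=True)

results = [
    AgentResult("rename helper", tests_passed=True, diff_lines=12, touched_files=1),
    AgentResult("rework auth flow", tests_passed=False, diff_lines=240, touched_files=9),
    AgentResult("add retry logic", tests_passed=True, diff_lines=80, touched_files=3),
]
for r in review_queue(results):
    print(f"{risk_score(r):5.1f}  {r.task}")
```

The design point is that triage only buys time, not correctness: the ceiling is still the human's taste on the items that score high.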
The honest limits
Not every task parallelizes cleanly. Sequential bug hunts where each step depends on the previous one’s output still benefit from cognitive bandwidth more than throughput. And the “skill issue” framing can be a cop-out when agents genuinely hit capability walls on certain task classes: ill-specified problems, domains with sparse training data, tasks requiring human judgment.
The throughput frame is a better default than the bandwidth frame for 2026 — but it’s a frame, not a universal law.
Related Notes
- Agent Harnesses
- Claws - Persistent Looping Agents as App Replacement
- Auto Research - Agents as Overnight Experimentation Engines
- Karpathy - No Priors Code Agents Autoresearch