🌰 seedling
Harness Simplification as Models Improve

The core principle

From Anthropic’s Building Effective Agents: “find the simplest solution possible, and only increase complexity when needed.” This applies not just at design time but continuously — harness components that were essential for one model generation may become dead weight for the next.

Source: Harness design for long-running application development (Prithvi Rajasekaran, Anthropic, March 2026)

A concrete example: context resets to compaction

The original long-running harness used Claude Sonnet 4.5, which exhibited “context anxiety” — premature wrap-up behavior as the context window filled. Context resets (clearing the window entirely, handing off state via structured artifacts) were essential.

When Opus 4.6 shipped, it largely eliminated context anxiety on its own. The harness dropped context resets entirely and moved to automatic compaction (summarizing earlier conversation in place). What had been a load-bearing architectural decision became unnecessary overhead.
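The two strategies can be sketched side by side. This is a hypothetical illustration, not Anthropic's harness code; every function and parameter name here is invented for the sketch:

```python
def context_reset(messages, used_tokens, threshold, summarize, write_artifact):
    """Reset: clear the window entirely, handing state off via an artifact."""
    if used_tokens < threshold:
        return messages
    write_artifact("handoff.json", {"state": summarize(messages)})
    return []  # fresh window; the next session reloads handoff.json

def compaction(messages, used_tokens, threshold, summarize, keep_recent=10):
    """Compaction: summarize earlier turns in place, keep recent ones verbatim."""
    if used_tokens < threshold:
        return messages
    summary = summarize(messages[:-keep_recent])
    compacted = [{"role": "user", "content": f"[Summary] {summary}"}]
    return compacted + messages[-keep_recent:]
```

The structural difference is what made the reset "load-bearing": it forces all state through an explicit artifact, while compaction keeps the session alive and loses nothing but verbatim detail.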

Another example: sprint decomposition

The sprint construct helped Opus 4.5 work coherently by decomposing tasks into manageable chunks. Opus 4.6’s improved planning and long-context capabilities made this decomposition unnecessary — the model could sustain coherent multi-hour builds without explicit task decomposition.

Result: a DAW (Digital Audio Workstation) built in a single continuous session, with the generator running coherently for over two hours without sprint boundaries.

The evaluator boundary shifts

The evaluator’s value is not fixed — it depends on where the current model’s capability boundary sits:

  • Within the boundary — tasks the model handles reliably solo. The evaluator adds cost without catching meaningful issues.
  • At the boundary — tasks at the edge of the model’s capabilities. The evaluator catches real gaps and drives improvement.

As models improve, the boundary moves outward. Tasks that needed the evaluator for one model generation may not need it for the next. Running the evaluator is therefore not a permanent yes/no decision — it is worth the cost only while the task sits beyond what the current model does reliably solo.
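One way to make the boundary explicit in a harness is to gate the evaluator loop on it. A minimal sketch, assuming a `within_boundary` predicate and a `Report`-returning evaluator (both invented names, not from the source):

```python
from collections import namedtuple

Report = namedtuple("Report", "passed feedback")

def run(task, generate, evaluate, within_boundary):
    """Gate the generator-evaluator loop on the capability boundary."""
    result = generate(task)
    if within_boundary(task):
        # within the boundary: the evaluator adds cost without catching issues
        return result
    # at or beyond the boundary: loop until the evaluator passes the result
    report = evaluate(result)
    while not report.passed:
        result = generate(task, feedback=report.feedback)
        report = evaluate(result)
    return result
```

As the boundary moves outward with each model generation, only the predicate has to change; the loop itself stays in place for the tasks that still need it.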

The methodology for simplification

Radical simplification (cutting many components at once) made it hard to tell which pieces were load-bearing. A more methodical approach worked better: remove one component at a time and review the impact on the final result.
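This is ablation in miniature. A hypothetical sketch of the loop (the scoring function and component names are placeholders, not the source's tooling):

```python
def ablate(components, run_harness, baseline_score):
    """Remove one component at a time; score the harness without it.

    Returns {component: baseline_score - score_without_it}. Components
    with near-zero impact are candidates for removal; cutting many at
    once would make these per-component attributions unrecoverable.
    """
    impact = {}
    for c in components:
        trial = [x for x in components if x != c]
        impact[c] = baseline_score - run_harness(trial)
    return impact
```

The cost is one full run per component, which is exactly what the radical many-at-once approach tried to avoid — and why it could not tell which pieces were load-bearing.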

The broader claim

The space of interesting harness combinations doesn’t shrink as models improve. Instead, it moves, and the interesting work for AI engineers is to keep finding the next novel combination.

This means harness engineering is not a temporary discipline that disappears as models get better. It is a permanently moving frontier — each model improvement creates new harness possibilities while making old ones obsolete.


Browser Use: the inversion pattern

From Browser Use’s Bitter Lesson post (Gregor Zunic, April 2026): Browser Use rewrote their agent from thousands of lines of abstractions down to a for-loop with raw CDP access. The design principle — start with maximal capability, then restrict — is simplification taken to its logical extreme. Rather than stripping harness components one at a time, they asked what the minimal harness could be and built up from there. The result: the model routes around failures on its own because the action space is complete.
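The shape of that minimal harness can be sketched in a few lines. This is a hypothetical reconstruction of the pattern, not Browser Use's actual code; `execute_cdp` stands in for a raw Chrome DevTools Protocol dispatcher:

```python
def agent_loop(model, execute_cdp, task, max_steps=50):
    """The whole agent is a for-loop: the model picks the next raw CDP
    command and the harness just executes it and feeds back the result."""
    history = [("task", task)]
    for _ in range(max_steps):
        action = model(history)              # model emits the next CDP command
        if action.get("done"):
            return action.get("result")      # model decides when it is finished
        observation = execute_cdp(action)    # complete action space: raw CDP
        history.append((action, observation))  # errors feed back; model reroutes
    return None
```

Because the action space is the full protocol rather than a curated tool list, a failed action becomes just another observation for the model to route around, rather than a gap the harness author has to patch.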

This complements the incremental approach described above. Both converge on the same insight: every abstraction encodes an assumption about model weakness, and those assumptions expire.


Connected Notes

  • Agent Harnesses — the parent concept; this note describes the lifecycle of harness components
  • GAN-Inspired Agent Architecture - Generator Evaluator Loops — the harness being simplified in this case study
  • Context Engineering — context resets and compaction are context engineering techniques whose necessity changed with model improvements
  • The Folder Is the Agent - Context Accumulation as Specialization — accumulated context in folders persists across model generations even as harness scaffolding changes
  • The Bitter Lesson for Agent Frameworks - Browser Use — Browser Use’s case study of throwing away abstractions and rebuilding from a minimal for-loop