Runtime Shift — A New Form of Distributional Shift for Agents

The mechanism

An agent does not learn an abstract concept of “shell.” It learns the specific shell it trained on — how ls formats output, how the browser resolves redirects, which commands block versus return instantly, which snapshots exist. The model bakes all of this into the policy during training (or the optimizer bakes it into the harness and prompts).

Move to a different runtime and:

Tools that were idempotent become flaky
Commands that were instant now block
Snapshots that existed disappear
Error messages change format

This is not data shift — the input distribution the MLOps community has worried about for a decade. It is environment shift, and it produces silent quality regressions that standard evals never catch because the eval runs in the training environment.

Three honest mitigations

Co-locate train and prod on the same runtime. Pick one sandbox provider, run RL rollouts on it, run production sessions on it, accept the lock-in. The most disciplined teams do this.
Define a runtime contract and enforce it on both sides. A small, versioned interface — “this is the shell, these are the tools, these are the latencies, these are the failure modes” — implemented once on training infrastructure and once on production infrastructure. The contract must cover timing and failure semantics, not just API surface.
Train against production noise. Inject latency, errors, and tool failures during training so the policy is robust to runtime variance. Step-DeepResearch reports tangible gains from 5–10% tool errors during training. This is the runtime analog of dropout.

The wrong answer — the one most teams unconsciously pick — is to choose a sandbox for production as a pure software engineering decision, ignoring the ML requirements, then spend quarters chasing agent flakiness.

Source: Hidden Technical Debt of AI Systems: Agent Runtime — Lee Hanchung, April 2026

Agent Harnesses — the harness is what the runtime hosts; runtime shift means the harness trained in one environment may not transfer cleanly
Context Engineering — context management assumes stable tool behavior; runtime shift breaks that assumption
Hidden Technical Debt of AI Systems - Agent Runtime — source clipping

Runtime Shift — A New Form of Distributional Shift for Agents

The mechanism

Three honest mitigations

Related Notes