Inference Cost Collapse and Frontier Model Margin Expansion
The numbers (as of April 2026)
From Brad Gerstner on All-In E230 - Anthropic Mythos OpenClaw Where AI Value Accrues:
- Anthropic ARR trajectory: $1B (end 2024) → $4B (mid 2025) → $9B (end 2025) → $30B (April 2026)
- Revenue more than tripled in roughly four months ($9B → $30B)
- Added “a combined Databricks-plus-Palantir of annualized revenue in a single month”
- Achieved with only 2,500 employees and ~1.5 gigawatts of compute
- Gerstner’s forecast: $80–100B ARR exiting 2026
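The milestones above imply very different compounding rates between intervals. A quick sketch of the implied compound monthly growth, using the figures as quoted on the episode (the exact month anchors are approximations):

```python
# Implied compound monthly growth from the ARR milestones quoted above.
# Dates are approximate month anchors; dollar figures are as cited.
milestones = [
    ("2024-12", 1e9),   # end 2024
    ("2025-06", 4e9),   # mid 2025
    ("2025-12", 9e9),   # end 2025
    ("2026-04", 30e9),  # April 2026
]

def months_between(a: str, b: str) -> int:
    ya, ma = map(int, a.split("-"))
    yb, mb = map(int, b.split("-"))
    return (yb - ya) * 12 + (mb - ma)

for (d0, v0), (d1, v1) in zip(milestones, milestones[1:]):
    m = months_between(d0, d1)
    cmgr = (v1 / v0) ** (1 / m) - 1  # compound monthly growth rate
    print(f"{d0} -> {d1}: {v1 / v0:.1f}x over {m} mo, ~{cmgr:.0%}/mo compounded")
```

Notably, growth accelerated in the final interval (~35%/month compounded) rather than decaying, which is what makes the trajectory unusual.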
Why cheaper inference grows margins
The intuition that falling prices compress margins assumes variable costs scale with price. In frontier model economics, they don’t:
- Fixed: Compute buildout, power contracts, training runs, research headcount. These costs don’t drop when per-token inference prices drop.
- Falling fast: Per-token inference cost (hardware utilization, model distillation, quantization, batching efficiency, specialized inference hardware).
- Growing faster than prices fall: Demand volume, as cheaper inference unlocks new use cases that weren’t viable at prior price points.
When per-token costs drop ~90% year over year while revenue more than triples in four months, gross margin per dollar of deployed infrastructure expands. Gerstner’s view: burn levels will surprise skeptics on the low side.
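The fixed-vs-variable argument can be made concrete with a toy model. All numbers below are illustrative assumptions, not Anthropic’s actual economics; the point is only that when serving cost falls faster than price while volume grows, revenue and margin rise together:

```python
# Toy model of the fixed-vs-variable cost argument above.
# All inputs are illustrative assumptions, not real unit economics.
def gross_margin(price_per_mtok, cost_per_mtok, volume_mtok):
    """Return (revenue, gross margin) for a period of token serving."""
    revenue = price_per_mtok * volume_mtok
    variable_cost = cost_per_mtok * volume_mtok
    return revenue, (revenue - variable_cost) / revenue

# Year 0: $10 per million tokens, $4 serving cost, 1B Mtok served.
r0, m0 = gross_margin(10.0, 4.0, 1e9)

# Year 1: price cut 50%, serving cost down 90%, volume up 5x
# (demand grows faster than prices fall, the key assumption).
r1, m1 = gross_margin(5.0, 0.4, 5e9)

print(f"year 0: revenue ${r0/1e9:.0f}B, gross margin {m0:.0%}")
print(f"year 1: revenue ${r1/1e9:.0f}B, gross margin {m1:.0%}")
```

Under these assumptions revenue grows 2.5x and gross margin rises from 60% to 92%, even though the sticker price halved: that is the mechanism the bull case rests on.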
The honest counter
Chamath’s objection (same episode) is worth taking seriously: the debate is still being conducted in gross-revenue terms rather than free-cash-flow terms, a classic hype-phase signal. Comparisons are further muddied because Anthropic reports gross revenue while OpenAI reports net revenue.
The resolution: gross margin expansion is probably real, but whether that translates to FCF depends on how much of each incremental revenue dollar gets consumed by the next round of training compute, data center buildout, and talent wars. The 2026-2027 burn numbers will settle the argument.
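The gross-margin-vs-FCF gap can be expressed in one line. A hedged sketch with hypothetical inputs, showing how the same gross margin can produce positive or deeply negative free cash flow depending on the reinvestment rate:

```python
# Sketch of the gross-margin-vs-FCF distinction above.
# All inputs are hypothetical; the reinvest_rate is the share of revenue
# consumed by training compute, data center buildout, and talent.
def free_cash_flow(revenue, gross_margin, reinvest_rate):
    return revenue * gross_margin - revenue * reinvest_rate

# Same $30B revenue, same 80% gross margin, two reinvestment postures:
moderate = free_cash_flow(30e9, 0.80, 0.60)  # steady-state spend
racing = free_cash_flow(30e9, 0.80, 1.10)    # funding the next 10x run

print(f"moderate reinvestment: FCF ${moderate/1e9:+.0f}B")
print(f"racing reinvestment:   FCF ${racing/1e9:+.0f}B")
```

This is why the 2026–2027 burn numbers are the deciding evidence: they reveal the reinvestment rate, which gross-revenue headlines do not.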
What this means for positioning
If inference margins are genuinely expanding at frontier labs, then:
- Model-layer short theses based on a “race to zero margin” are probably wrong for this cycle
- Consumer willingness-to-pay caps matter less than they did a year ago, because the same revenue can now fund 10x more compute
- Capital requirements become a moat — the players who can afford the next 10x training run capture disproportionate share
The trap is extrapolating from Anthropic’s moment. Inference cost collapse is a roughly one-time phenomenon that settles when hardware utilization and algorithmic efficiency approach their limits. The margin expansion window closes when costs stop falling faster than prices.
Related Notes
- AI Stack Value Accrual - Chip, Infra, Intelligence, App
- TAM of Intelligence is Infinite
- AI and Investing Thesis
- AI Infrastructure Investment Thesis - Mid-Post Training Layer