AI Memory Crowding — HBM Eats Consumer Device Budgets

Why HBM, not DRAM

The binding constraint on inference throughput is memory bandwidth, not compute FLOPs or capacity.

Metric	HBM	DDR DRAM
Bandwidth	~2.5 TB/s per stack	~64-128 GB/s
Wafer area per bit	3-4x more	1x (baseline)
Cost per bit	Much higher	Lower
Value per bit in AI	Orders of magnitude higher	N/A for AI accelerators

Switching to commodity DRAM would increase capacity per chip but leave compute cores idle waiting for data. Tokens per dollar would get worse, not better.
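
A back-of-envelope sketch of the bandwidth argument, assuming decode is purely bandwidth-bound and every step streams all model weights once. The 70B-parameter model at 1 byte per parameter is a hypothetical; the two bandwidth figures are the ones in the table above. It ignores KV cache traffic, compute limits, and aggregation across multiple stacks or channels.

```python
def decode_tokens_per_second(weight_bytes: float,
                             bandwidth_bytes_per_s: float,
                             batch_size: int = 1) -> float:
    """Upper bound on decode throughput if each step reads every weight once."""
    if weight_bytes <= 0 or bandwidth_bytes_per_s <= 0:
        raise ValueError("weight_bytes and bandwidth must be positive")
    steps_per_second = bandwidth_bytes_per_s / weight_bytes
    # One token per sequence per step; the weights are read once for the whole batch.
    return steps_per_second * batch_size


# Hypothetical model: 70B parameters at 1 byte each (an assumption, not from the note).
WEIGHT_BYTES = 70e9
HBM_BW = 2.5e12   # ~2.5 TB/s, per-stack HBM figure from the table
DDR_BW = 128e9    # ~128 GB/s, upper DDR figure from the table

hbm_tps = decode_tokens_per_second(WEIGHT_BYTES, HBM_BW)
ddr_tps = decode_tokens_per_second(WEIGHT_BYTES, DDR_BW)

print(f"HBM bound: {hbm_tps:.1f} tokens/s per sequence")
print(f"DDR bound: {ddr_tps:.1f} tokens/s per sequence")
print(f"Ratio:     {hbm_tps / ddr_tps:.1f}x")
```

Under these assumptions the throughput ceiling scales linearly with bandwidth, so the same compute die fed from DDR would spend most of each step waiting on memory.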

The crowding mechanism

DRAM vendors lost money in 2023 → delayed fab investment
Prices recovered in 2024 when reasoning models + KV cache scaling made long-context mainstream
New fabs take 2 years → meaningful capacity arrives late 2027-2028
In the interim: AI demand claims an increasing share of fixed memory supply
Consumer devices get squeezed — prices rise, volumes fall

Projected impact: smartphone volumes fall from 1.4B to 500-600M units. Xiaomi and Oppo are already cutting low-end volumes by half. Memory vendors prefer AI contracts (longer terms, higher margins, more value per bit).
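
The squeeze can be sketched as a fixed-supply allocation, assuming wafer capacity is fixed in the interim and each HBM bit consumes 3-4x the wafer area of a commodity DRAM bit (the ratio from the table above). The capacity of 100 units and the HBM demand levels are illustrative placeholders, not forecasts.

```python
def commodity_bits_remaining(total_capacity: float,
                             hbm_bits: float,
                             area_ratio: float = 3.5) -> float:
    """Commodity-DRAM bits left after HBM claims its share of fixed wafer capacity.

    total_capacity is measured in commodity-DRAM bit equivalents.
    area_ratio is wafer area per HBM bit relative to a commodity bit.
    """
    if hbm_bits < 0 or area_ratio <= 0:
        raise ValueError("hbm_bits must be >= 0 and area_ratio > 0")
    area_used_by_hbm = hbm_bits * area_ratio
    if area_used_by_hbm > total_capacity:
        raise ValueError("HBM demand exceeds total wafer capacity")
    return total_capacity - area_used_by_hbm


TOTAL = 100.0  # hypothetical fixed supply, arbitrary units
for hbm_demand in (5.0, 10.0, 15.0):
    left = commodity_bits_remaining(TOTAL, hbm_demand)
    print(f"HBM bits {hbm_demand:>4.0f} -> commodity bits left {left:.1f}")
```

The point of the sketch is the multiplier: every additional HBM bit removes roughly three to four commodity bits from what consumer devices can buy.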

Investment implications

Memory vendors (SK Hynix, Samsung, Micron) benefit from the shift to HBM — higher margins per bit, longer contract terms, more predictable demand. Consumer electronics companies face BOM inflation that compresses margins or forces price increases. The transition is structural, not cyclical — AI’s memory appetite grows faster than new supply comes online.

2026-05 Update — The Dual Supercycle and CXL 3.0

Sriram Krishnan (AI Economics Part 2) frames HBM demand as occurring in two distinct supercycles, each with a different root cause:

First supercycle: driven by training. Frontier models needed thousands of GPUs fed by HBM, running uninterrupted weeks-long jobs. This was the supercycle that the existing fab buildout (and the consumer crowding-out covered above) was responding to.
Second supercycle: driven by agentic inference. Long context windows and growing task/tool histories overflow HBM and force constant spillover into DRAM. Where human inference sessions fit in HBM and are discarded quickly, agentic sessions hold growing state for hours. This workload profile drives a second wave of HBM demand on top of the first.

This makes the HBM supply problem more durable than a single-wave training-cycle story would suggest. There is no end-of-training demand peak to wait out; agents are a structurally higher steady-state consumer of HBM than humans were.
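
A minimal sketch of why long-lived agentic sessions overflow HBM, using the standard KV cache size estimate (keys plus values, per layer, per KV head). The model shape (80 layers, 8 KV heads, head dimension 128, 2-byte elements) and the 80 GB of HBM left over for cache are hypothetical values chosen for illustration.

```python
def kv_bytes_per_token(n_layers: int, n_kv_heads: int,
                       head_dim: int, bytes_per_elem: int) -> int:
    """Bytes of KV cache added per token: keys and values for every layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def max_tokens_in_budget(budget_bytes: int, per_token_bytes: int) -> int:
    """Largest context (in tokens) whose KV cache fits in the given budget."""
    return budget_bytes // per_token_bytes


# Hypothetical model shape and HBM budget (assumptions, not from the note).
PER_TOKEN = kv_bytes_per_token(n_layers=80, n_kv_heads=8,
                               head_dim=128, bytes_per_elem=2)
HBM_FREE_BYTES = 80 * 10**9
LIMIT = max_tokens_in_budget(HBM_FREE_BYTES, PER_TOKEN)

print(f"KV cache per token: {PER_TOKEN / 1024:.0f} KiB")
print(f"Tokens that fit in the budget: {LIMIT}")
for tokens in (8_000, 128_000, 1_000_000):
    size_gb = tokens * PER_TOKEN / 1e9
    status = "fits" if tokens <= LIMIT else "spills"
    print(f"{tokens:>9} tokens -> {size_gb:7.1f} GB ({status})")
```

The cache grows linearly with retained context, so a session that keeps hours of tool history crosses the per-accelerator budget where a short chat session never would.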

The physical bonding constraint. HBM is bonded directly to the chip during packaging. You can only bond so much memory to a GPU before you run out of physical space: capacity per accelerator is bounded by package geometry, not just fab capacity. Off-package DRAM scales in capacity and is much cheaper per bit, but it cannot substitute for HBM because it sits behind a far lower-bandwidth link.

CXL 3.0 as the near-term fix. The most promising architectural workaround is Compute Express Link (CXL) 3.0, which lets the CPU and GPU share a coherent memory pool. CXL still runs over the PCIe physical layer, so what it removes is the software-managed copy path rather than the link itself. This would relax the HBM constraint by giving agentic workloads coherent access to a larger pooled memory rather than forcing spillover through slow PCIe copy paths. Commercial deployment at scale is 2-3 years out, too far away to ease the current crunch.
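
The cost of spillover can be illustrated with a two-tier bandwidth blend: if a fraction of the bytes read per step lives in a slower tier, total time is the sum of the time spent in each tier. This is a generic sketch, not a model of CXL itself. The fast-tier figure is the HBM number from the table above; the 64 GB/s slow-tier figure is an assumed placeholder for an off-package path.

```python
def effective_bandwidth(fast_bw: float, slow_bw: float,
                        slow_fraction: float) -> float:
    """Blended bytes/s when slow_fraction of the bytes come from the slow tier."""
    if fast_bw <= 0 or slow_bw <= 0:
        raise ValueError("bandwidths must be positive")
    if not 0.0 <= slow_fraction <= 1.0:
        raise ValueError("slow_fraction must be between 0 and 1")
    # Time per byte is the weighted sum of the per-tier times.
    seconds_per_byte = (1.0 - slow_fraction) / fast_bw + slow_fraction / slow_bw
    return 1.0 / seconds_per_byte


HBM_BW = 2.5e12      # fast tier, from the table
SPILL_BW = 64e9      # slow tier, assumed placeholder value

for fraction in (0.0, 0.01, 0.1, 0.5):
    bw = effective_bandwidth(HBM_BW, SPILL_BW, fraction)
    print(f"{fraction:>5.0%} spilled -> {bw / 1e9:8.1f} GB/s effective")
```

Because the blend is harmonic rather than arithmetic, even a small spilled fraction pulls effective bandwidth toward the slow tier, which is why raising the bandwidth of the pooled tier matters more than adding raw capacity behind a slow path.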

Related Notes

EUV Lithography as the Binding Constraint on AI Scaling — the other hardware bottleneck, at the logic layer
CUDA Programmability Moat - Why Flexibility Beats Optimization — why bandwidth matters more than raw FLOPs
Inference Cost Collapse and Frontier Model Margin Expansion — inference economics depend on memory bandwidth
AI and Investing Thesis — parent hub
AI Stack Value Accrual - Chip, Infra, Intelligence, App — HBM crowding is the memory constraint on the chip layer of the value accrual stack
Dylan Patel / SemiAnalysis — source
