# AI Memory Crowding — HBM Eats Consumer Device Budgets
## Why HBM, not DRAM
The binding constraint on inference throughput is memory bandwidth, not compute FLOPs or memory capacity: each decoded token requires streaming the full weight set (plus KV cache) through the memory system, so throughput is capped by bandwidth divided by bytes read per token.
| Metric | HBM | DDR DRAM |
|---|---|---|
| Bandwidth | ~2.5 TB/s per stack | ~64-128 GB/s |
| Wafer area per bit | 3-4x more | 1x (baseline) |
| Cost per bit | Much higher | Lower |
| Value per bit in AI | Orders of magnitude higher | N/A for AI accelerators |
Switching to commodity DRAM would increase capacity per chip but leave compute cores idle waiting for data. Tokens per dollar would get worse, not better.
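The bandwidth argument can be made concrete with back-of-envelope roofline arithmetic. The HBM figure comes from the table above; the model size (70 GB of int8 weights) and the DDR figure are illustrative assumptions, not from the note:

```python
# Back-of-envelope roofline for memory-bound token decoding.
# Assumption: decoding one token streams the full weight set once,
# so bandwidth / bytes-per-token is an upper bound on throughput.

def tokens_per_second(bandwidth_gb_s: float, bytes_per_token_gb: float) -> float:
    """Upper bound on decode throughput when memory bandwidth binds."""
    return bandwidth_gb_s / bytes_per_token_gb

WEIGHTS_GB = 70.0  # assumed: 70B params at 1 byte each (int8)

hbm = tokens_per_second(2500.0, WEIGHTS_GB)  # ~2.5 TB/s per HBM stack
ddr = tokens_per_second(128.0, WEIGHTS_GB)   # upper end of the DDR range

print(f"HBM: {hbm:.1f} tok/s")  # -> HBM: 35.7 tok/s
print(f"DDR: {ddr:.1f} tok/s")  # -> DDR: 1.8 tok/s
```

Under these assumptions the same compute die produces roughly 20x fewer tokens on DDR, which is why more capacity at lower bandwidth does not help.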
## The crowding mechanism
- DRAM vendors lost money in 2023 → delayed fab investment
- Prices recovered in 2024 when reasoning models + KV cache scaling made long-context mainstream
- New fabs take 2 years → meaningful capacity arrives late 2027-2028
- In the interim: AI demand claims an increasing share of fixed memory supply
- Consumer devices get squeezed — prices rise, volumes fall
Projected impact: global smartphone volumes fall from ~1.4B units to 500-600M. Xiaomi and Oppo are already cutting low-end volumes by half. Memory vendors prefer AI contracts: longer terms, higher margins, more value per bit.
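The squeeze on consumer devices follows mechanically from fixed supply and a growing AI share. A minimal sketch, with the supply index and share trajectory as illustrative assumptions rather than forecasts:

```python
# Crowding-out arithmetic: total DRAM bit supply held flat (new fabs
# arrive late 2027-2028), while AI's share of bits grows each year.
# All numbers below are illustrative assumptions, not forecasts.

total_supply = 100.0             # indexed bit supply, held flat
ai_share = [0.15, 0.30, 0.45]    # assumed AI share of bits, 2025-2027

for year, share in zip(range(2025, 2028), ai_share):
    consumer_bits = total_supply * (1 - share)
    print(f"{year}: consumer devices get {consumer_bits:.0f} of "
          f"{total_supply:.0f} indexed units")
```

With supply fixed, every point of share AI gains comes directly out of consumer bits, and the remainder is rationed by price, so device prices rise and volumes fall.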
## Investment implications
Memory vendors (SK Hynix, Samsung, Micron) benefit from the shift to HBM — higher margins per bit, longer contract terms, more predictable demand. Consumer electronics companies face BOM inflation that compresses margins or forces price increases. The transition is structural, not cyclical — AI’s memory appetite grows faster than new supply comes online.
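The BOM-inflation mechanism can be sketched with assumed numbers (memory at 15% of BOM, a 50% memory price rise, device price held fixed; none of these figures come from the note):

```python
# Sketch of BOM inflation compressing a device maker's gross margin
# when it cannot pass the memory price increase through to buyers.
# All inputs are illustrative assumptions.

price = 400.0           # device ASP, held fixed (no pass-through)
bom = 300.0             # bill of materials before the squeeze
memory_share = 0.15     # assumed: memory is 15% of BOM
memory_inflation = 0.5  # assumed: memory prices rise 50%

new_bom = bom + bom * memory_share * memory_inflation
margin_before = (price - bom) / price
margin_after = (price - new_bom) / price
print(f"gross margin: {margin_before:.0%} -> {margin_after:.0%}")
# -> gross margin: 25% -> 19%
```

Under these assumptions a 50% memory price rise erases roughly a quarter of the gross margin, which is the choice the note describes: absorb the compression or raise prices and lose volume.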
## Related Notes
- EUV Lithography as the Binding Constraint on AI Scaling — the other hardware bottleneck, at the logic layer
- CUDA Programmability Moat - Why Flexibility Beats Optimization — why bandwidth matters more than raw FLOPs
- Inference Cost Collapse and Frontier Model Margin Expansion — inference economics depend on memory bandwidth
- AI and Investing Thesis — parent hub
- AI Stack Value Accrual - Chip, Infra, Intelligence, App — HBM crowding is the memory constraint on the chip layer of the value accrual stack
- Dylan Patel / SemiAnalysis — source