🌰 seedling
The Two-System Brain — Why LLMs Are Data-Hungry and Biological Learners Are Not


The two systems

Learning Subsystem (cortex). General-purpose, uniform, repeating architecture. Learning happens through synaptic plasticity, not pre-wired circuits. Closer to Yann LeCun’s energy-based models than to next-token prediction — the brain does omnidirectional inference (any node can be predicted from any subset of the others), not fixed left-to-right prediction. Chain-of-thought in LLMs may roughly approximate the brain’s native probabilistic sampling.
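A minimal sketch of the contrast, using a toy energy-based model over three binary variables (the chain structure and energy function are invented for illustration, not LeCun’s actual architecture): conditioning on any observed subset is one and the same operation, whereas a fixed left-to-right factorization natively answers only one ordering of queries.

```python
import itertools
import math

# Toy energy-based model over three binary variables in a chain.
# (Hypothetical energy function, invented for illustration.)
# Low energy when neighboring variables agree.
def energy(x):
    x1, x2, x3 = x
    return -(x1 == x2) - (x2 == x3)

def infer(observed):
    """Infer the unobserved variables given ANY observed subset,
    by enumerating the Boltzmann distribution p(x) ~ exp(-E(x))."""
    states = [s for s in itertools.product([0, 1], repeat=3)
              if all(s[i] == v for i, v in observed.items())]
    weights = [math.exp(-energy(s)) for s in states]
    total = sum(weights)
    return {s: round(w / total, 3) for s, w in zip(states, weights)}

# "Omnidirectional": predict x1 from x3 alone, a query a fixed
# left-to-right factorization p(x1)p(x2|x1)p(x3|x1,x2) cannot answer
# without extra marginalization. Here it is the same call as any other.
print(infer({2: 1}))
```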

Steering Subsystem (subcortical). Hypothalamus, amygdala, brainstem. Innate and genetically wired. Provides reward functions, reflexes, and attention signals. Far more diverse cell types than the cortex — RNA sequencing reveals distinct cell types for individual innate reflexes. Evolution wired social-status instincts that attach to whatever entities the cortex learns are relevant. The amygdala trains predictors that generalize from primitive reflexes (a spider crawling on your skin) to abstract concepts (“there’s a spider on your back”).
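A toy sketch of that last idea, with entirely hypothetical features and data: the innate reflex supplies the training signal, and a learned linear predictor over richer “cortical” features inherits the response, so a cue that never touches the skin still fires it.

```python
# Hypothetical toy: features are [skin_contact, saw_spider]. In these
# invented training episodes, spiders on the skin are also seen, so the
# features co-occur; the reflex itself keys only on skin contact.
episodes = [[1, 1], [0, 0]]

def innate_reflex(x):
    return float(x[0])          # genetically wired: fires on touch only

w = [0.0, 0.0]                  # learned "amygdala" predictor weights
for _ in range(50):             # simple delta-rule (LMS) updates
    for x in episodes:
        error = innate_reflex(x) - sum(wi * xi for wi, xi in zip(w, x))
        w = [wi + 0.5 * error * xi for wi, xi in zip(w, x)]

# Abstract cue: "there's a spider on your back" -- seen/heard, no contact.
fear = sum(wi * xi for wi, xi in zip(w, [0, 1]))
print(round(fear, 2))           # ~0.5: the learned predictor generalizes
```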

Adam Marblestone: “Intelligence is not just omnidirectional inference, but a system that drives attention and learning of a general world model.”

The loss function gap

| Current ML | Biology |
| --- | --- |
| One simple loss (next-token cross-entropy) | Many distinct loss functions per brain region |
| One curriculum | Stage-specific activation (a developmental curriculum) |
| Data-hungry (trillions of tokens) | Efficient (better priors from loss-function diversity) |
| No value function during base training | Basal ganglia do model-free RL; the cortex’s world model gets a value head for free |

This gap is where the sample-efficiency difference lives. LLMs start from random weights and need trillions of tokens because they lack the steering subsystem’s priors. The brain’s ~3GB genome doesn’t store the world model (the cortex learns that). It stores: the learning subsystem architecture (compact — the cortex is a repeating structure), the steering subsystem wiring (elaborate — which explains the diverse subcortical cell types), and the reward functions (compact — each perhaps “a line of Python”, on the order of ~1,000 lines in total).
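A minimal sketch of the table’s right-hand column, assuming a generic training loop; the region names, stage windows, and constant-valued loss stubs are all hypothetical placeholders, not a claim about actual brain regions.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RegionLoss:
    name: str
    start: int                       # training step the loss switches on
    end: int                         # ...and off (developmental curriculum)
    fn: Callable[[dict], float]      # region-specific objective (stub here)

# Hypothetical stand-ins for "many distinct loss functions per brain region".
REGION_LOSSES = [
    RegionLoss("visual_prediction", 0,     10_000, lambda batch: 1.0),
    RegionLoss("motor_babbling",    0,      2_000, lambda batch: 0.3),
    RegionLoss("social_valuation",  5_000, 10_000, lambda batch: 0.7),
]

def total_loss(step: int, batch: dict) -> float:
    # Unlike a single next-token cross-entropy, the effective objective
    # changes shape over development as stage-specific losses activate.
    return sum(r.fn(batch) for r in REGION_LOSSES if r.start <= step < r.end)

print(total_loss(1_000, {}))   # 1.3: visual + motor losses active
print(total_loss(6_000, {}))   # 1.7: visual + social losses active
```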

Connection to RL scaling

This directly informs RL Scaling Follows Pre-Training - The Generalization Inflection Ahead. Amodei’s observation that RL now shows log-linear scaling maps to the brain’s architecture: pre-training builds the cortex-equivalent (general world model), RL adds steering-equivalent signals (reward functions that shape behavior). The GPT-1→GPT-2 generalization moment Amodei expects for RL parallels what happens when the steering subsystem’s diverse reward signals activate across broader task distributions.

Amodei’s response to the sample efficiency objection (models start from random weights, brains start with evolutionary priors) is exactly Marblestone’s framework stated differently.

The solvable investment thesis

Marblestone argues the brain’s algorithmic questions are answerable with hundreds of millions to low single-digit billions in neuroscience investment. Convergent Research’s bet: optical connectomics (photon-based, not electron microscopy) that preserves molecular labels. Target: molecularly annotated connectomes showing receptor types at each synapse.

Current cost: imaging a mouse brain via electron microscopy runs several billion dollars. Optical target: tens of millions. Human Genome Project parallel: sequencing costs fell from ~$1 per base pair to hundredths of a cent.

“We just happened to get GPUs before we got better brain scanners.”

Total to answer the brain’s algorithmic questions: a rounding error compared to the trillions being spent on GPU clusters. The insight that could make LLMs 1,000x more sample-efficient might cost 0.1% of what we’re spending on brute-force scaling.
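The arithmetic behind that last sentence, taking the note’s own round figures at face value:

```python
# Back-of-envelope check on the paragraph's ratio; both figures are the
# note's own rough estimates, not audited budgets.
gpu_cluster_spend = 2e12      # "trillions" on GPU clusters -> take $2T
neuroscience_spend = 2e9      # "low single-digit billions" -> take $2B
print(f"{neuroscience_spend / gpu_cluster_spend:.1%}")  # 0.1%
```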

