How LLMs Work

Attention

Why your inference costs scale quadratically

Attention is the mechanism that lets each token look at every other token to understand context. It's why 'bank' means different things in 'river bank' vs 'bank account' — and why doubling your context length quadruples your compute cost.
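The all-pairs lookup can be sketched as plain scaled dot-product self-attention in NumPy. This is a minimal illustration, not a production implementation; the (n, n) score matrix is where the quadratic cost comes from:

```python
import numpy as np

def attention(Q, K, V):
    # Every token scores against every other token -> an (n, n) matrix.
    # That n x n matrix is the quadratic term: doubling n quadruples it.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                        # (n, n)
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                   # (n, d) mixed output

n, d = 8, 4  # toy sizes: 8 tokens, 4-dim embeddings
rng = np.random.default_rng(0)
x = rng.standard_normal((n, d))
out = attention(x, x, x)  # self-attention: Q = K = V = token embeddings
```

At n = 8 the score matrix has 64 entries; at n = 16 it has 256. That 4x growth for a 2x longer context is the quadratic scaling the subtitle refers to.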

Why this matters

Attention is the bottleneck. It's why long-context models are expensive and why context window management is an engineering discipline, not an afterthought.