Attention is the mechanism that lets each token look at every other token to understand context. It's why 'bank' means different things in 'river bank' vs. 'bank account', and it's also why doubling your context length quadruples the attention compute: every token's query is compared against every other token's key, so the work grows with the square of the sequence length.
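To make that pairwise comparison concrete, here is a minimal sketch of scaled dot-product self-attention in plain NumPy. The shapes and names are illustrative assumptions for this example, not a specific model's or library's implementation. The (n, n) score matrix is where the quadratic cost comes from: double n and that matrix has four times as many entries.

```python
# Minimal sketch of scaled dot-product self-attention (illustrative only).
# Assumed setup: n tokens, each with a d-dimensional query/key/value vector.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: arrays of shape (n, d). Returns (n, d) context vectors."""
    d = Q.shape[-1]
    # Every token's query is compared against every token's key.
    # This (n, n) score matrix is the source of the quadratic cost.
    scores = Q @ K.T / np.sqrt(d)                      # shape (n, n)
    # Softmax over each row turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output token is a weighted mix of all tokens' value vectors.
    return weights @ V                                 # shape (n, d)

# Toy usage: 4 tokens, 8-dimensional embeddings, self-attention on X.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (4, 8)
```

At 4 tokens the score matrix has 16 entries; at 8 tokens it has 64. That scaling, not the per-token math, is what the rest of this section is about.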
Why this matters
Attention is the bottleneck. It's why long-context models are expensive and why context window management is an engineering discipline, not an afterthought.