The context window is the total number of tokens your model can process in one call: input and output combined. It's not just a limit; it's a budget. Every token you spend on instructions is a token you can't spend on reasoning.
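To make the arithmetic concrete, here is a minimal sketch using the tiktoken tokenizer. The 128k window, the 4k output reservation, and the `input_budget_left` helper are illustrative assumptions, not figures or APIs from any particular model:

```python
import tiktoken

# Assumed figures: a hypothetical 128k-token model, reserving 4k tokens
# for the reply. Adjust both to your actual model.
CONTEXT_WINDOW = 128_000
RESERVED_OUTPUT = 4_096

enc = tiktoken.get_encoding("cl100k_base")

def input_budget_left(prompt: str) -> int:
    """Tokens still available for additional input after this prompt."""
    spent = len(enc.encode(prompt))
    # Input and output draw from one shared pool, so the reply's
    # reservation comes out of the same budget as the instructions.
    return CONTEXT_WINDOW - RESERVED_OUTPUT - spent

print(input_budget_left("Summarize the attached report in three bullets."))
```

The key point the code makes explicit: the reply's token reservation is subtracted up front, because input and output are not separate allowances.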
Why this matters
Context engineering, the discipline of deciding what goes into the window and what stays out, is the new prompt engineering. Teams that manage this budget well get 3-5x better results from the same model.
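One simple form that discipline can take is deciding which conversation turns make the cut. The sketch below keeps only the newest messages that fit a token budget; the `pack_recent` name and its recency-only policy are illustrative assumptions, not a method named in this article:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def pack_recent(messages: list[str], budget: int) -> list[str]:
    """Keep the newest messages that fit within `budget` tokens."""
    kept: list[str] = []
    spent = 0
    for msg in reversed(messages):   # walk newest-first
        cost = len(enc.encode(msg))
        if spent + cost > budget:
            break                    # older messages stay out
        kept.append(msg)
        spent += cost
    kept.reverse()                   # restore chronological order
    return kept
```

Recency-only packing is the crudest possible policy, but it illustrates the shift: what enters the window becomes an explicit decision made against the budget, not an accident of prompt length.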