How LLMs Work

Temperature & Sampling

Why the same prompt gives different answers

Temperature controls randomness. At temperature 0, the model always picks the most likely next token; at 1, it samples from the model's full probability distribution. Top-k and top-p (nucleus sampling) further narrow which tokens are even considered before sampling. It's the difference between a calculator and a creative writer.
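A minimal sketch of how these three knobs combine, using plain Python on raw logits. The function name and structure are illustrative, not any particular library's API: real implementations work on tensors, but the order of operations (temperature scaling, then top-k, then top-p, then sampling) is the common pattern.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None):
    """Sample a token index from raw logits (illustrative sketch).

    temperature: divides logits before softmax; 0 means greedy (argmax).
    top_k: keep only the k most likely tokens.
    top_p: keep the smallest set of tokens whose cumulative mass >= top_p.
    """
    # Temperature 0: deterministic, always pick the most likely token.
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])

    # Scale logits by temperature, then softmax to probabilities.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token indices by probability, most likely first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: truncate the candidate list to the k most likely tokens.
    if top_k is not None:
        ranked = ranked[:top_k]

    # Top-p (nucleus): keep the smallest prefix whose mass reaches top_p.
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in ranked:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        ranked = kept

    # Renormalize over the surviving candidates and sample one.
    mass = sum(probs[i] for i in ranked)
    weights = [probs[i] / mass for i in ranked]
    return random.choices(ranked, weights=weights, k=1)[0]
```

Raising the temperature flattens the distribution (rare tokens get more chances); lowering it sharpens the distribution toward the top token, which is why 0 collapses to greedy decoding.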

Why this matters

The wrong temperature produces the wrong kind of results. Code generation needs low temperature (precision); creative tasks need higher temperature (variety). Most teams ship with the default and never tune it.