How LLMs Work
Temperature & Sampling
Why the same prompt gives different answers
Temperature controls randomness. At 0, the model always picks the most likely next token (greedy decoding). At 1, it samples from the model's predicted distribution unchanged; below 1 the distribution is sharpened toward the likeliest tokens, above 1 it is flattened. Top-k and top-p (nucleus sampling) further shape which tokens are even considered: top-k keeps only the k most likely tokens, top-p keeps the smallest set whose cumulative probability reaches p. It's the difference between a calculator and a creative writer.
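The whole pipeline fits in a few lines. Below is a minimal sketch of temperature scaling plus top-k and top-p filtering over raw logits; the function name and the three-token logit list are made up for illustration, not taken from any particular library.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Sample a token index from raw logits (illustrative sketch)."""
    rng = rng or random.Random(0)
    if temperature == 0:  # greedy decoding: always the most likely token
        return max(range(len(logits)), key=lambda i: logits[i])
    # Temperature divides the logits before softmax: <1 sharpens, >1 flattens.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank tokens by probability, most likely first.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    if top_k is not None:  # keep only the k most likely tokens
        order = order[:top_k]
    if top_p is not None:  # smallest prefix whose cumulative probability >= p
        kept, cum = [], 0.0
        for i in order:
            kept.append(i)
            cum += probs[i]
            if cum >= top_p:
                break
        order = kept
    # Renormalise over the surviving tokens and draw one.
    total = sum(probs[i] for i in order)
    r = rng.random() * total
    for i in order:
        r -= probs[i]
        if r <= 0:
            return i
    return order[-1]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
greedy = sample_next_token(logits, temperature=0)       # always index 0
sampled = sample_next_token(logits, temperature=1.0, top_p=0.9)
```

Note the order of operations: temperature reshapes the distribution first, then top-k/top-p prune the tail before the final draw.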
Why this matters
Wrong temperature = wrong kind of results. Code generation needs low temperature (precision and reproducibility). Creative tasks need higher temperature (variety). Most teams use the default and never tune it.
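To see why the setting matters so much, it helps to look at how drastically temperature reshapes a distribution. A minimal, self-contained sketch with made-up logits for three candidate tokens:

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by T before softmax: T<1 sharpens, T>1 flattens.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.2)  # code-like: near-deterministic
hot = softmax_with_temperature(logits, 1.5)   # creative: probability spreads out
```

At T=0.2 the top token soaks up nearly all the probability mass, so repeated runs give the same answer; at T=1.5 the runner-up tokens stay live, which is exactly the variety a creative task wants and a code task doesn't.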