A sampling parameter that limits token selection to the most probable options.
Top-p (also called Nucleus Sampling) is a text generation parameter that limits token selection to the smallest set of tokens whose cumulative probability exceeds the threshold p. Unlike temperature, which adjusts all probabilities, top-p dynamically adjusts the candidate pool based on the probability distribution.
With top-p = 0.9, the model samples only from the smallest set of tokens that together account for at least 90% of the probability mass, ignoring the long tail of unlikely tokens. This can produce more coherent output than high temperature settings while still allowing diversity.
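The filtering step can be sketched in pure Python. This is a minimal illustration of the idea, not any particular library's implementation; the token names and probabilities are hypothetical.

```python
def top_p_filter(probs, p=0.9):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p, then renormalize the survivors to sum to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:  # nucleus reached; drop the long tail
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

# Toy next-token distribution (made-up values for illustration)
probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "zzyzx": 0.05}
nucleus = top_p_filter(probs, p=0.9)
# "the" + "a" + "cat" = 0.95 >= 0.9, so "zzyzx" is excluded
print(nucleus)
```

Note that the nucleus is recomputed for every generated token: a peaked distribution may keep only one or two candidates, while a flat one keeps many, which is why top-p adapts to the model's confidence in a way a fixed top-k cutoff does not.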
In practice, top-p and temperature work together: many applications set one and leave the other at default values. A common approach is temperature 0.7 with top-p 1.0, or temperature 1.0 with top-p 0.9. Engineers should experiment to find optimal settings, and some APIs recommend adjusting only one parameter at a time for predictable behavior.