Definition
Top-k sampling is a decoding strategy that limits a language model's choice of next token to the k highest-probability candidates at each step. All other tokens have their probability set to zero, and the next token is sampled from the remaining renormalised set. By cutting off the long tail of unlikely tokens, top-k reduces the chance of the model wandering into incoherent text while keeping enough options to stay varied and natural.
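The procedure fits in a few lines of Python. This is a minimal illustration, not any runtime's actual implementation. It assumes the model's raw scores (logits) arrive as a plain list of floats, and the names `top_k_filter` and `sample_top_k` are hypothetical. Taking a softmax over only the k surviving logits gives the same result as zeroing the other probabilities and renormalising.

```python
import heapq
import math
import random


def top_k_filter(logits, k):
    """Return a renormalised probability list in which only the k highest-scoring
    tokens keep non-zero probability (hypothetical helper, for illustration)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    k = min(k, len(logits))
    # Indices of the k largest logits; equivalent to sorting descending and slicing.
    keep = heapq.nlargest(k, range(len(logits)), key=logits.__getitem__)
    # Softmax over the survivors only; subtracting the max avoids overflow in exp().
    m = max(logits[i] for i in keep)
    weights = {i: math.exp(logits[i] - m) for i in keep}
    total = sum(weights.values())
    # Every token outside the top k gets probability exactly 0.0.
    return [weights.get(i, 0.0) / total for i in range(len(logits))]


def sample_top_k(logits, k, rng=random):
    """Draw one token index from the top-k renormalised distribution."""
    probs = top_k_filter(logits, k)
    # random.choices picks an index in proportion to its weight; zero weights are never picked.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical scores for a five-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0, -3.0]
print([round(p, 3) for p in top_k_filter(logits, k=2)])  # prints [0.731, 0.269, 0.0, 0.0, 0.0]
print(sample_top_k(logits, k=2, rng=random.Random(0)))  # index 0 or 1
```

Using `heapq.nlargest` avoids sorting the whole vocabulary, which matters when it has tens of thousands of entries. The selected set is the same as a full descending sort.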
How it behaves
The single parameter k sets how many candidates survive the cut. A small k (say 5 or 10) makes output more focused and conservative. A larger k (40 or more) allows more diversity. At the extreme, k = 1 always picks the single most likely token, which is greedy decoding. Because k is a fixed count, top-k keeps exactly that many options regardless of how confident the model is. This is its main weakness: when the model is very sure, k may still admit poor candidates, and when it is unsure, k may cut off good ones.
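A small worked example makes the weakness concrete. The two distributions below are invented for illustration, and `top_k_cut` is a hypothetical helper. With k = 3, the confident distribution still admits tokens the model rates at only 4% and 2%. The flat distribution keeps just 30% of its probability mass and discards seven tokens that are exactly as plausible as the three it keeps.

```python
def top_k_cut(probs, k):
    """Return the k largest probabilities and the share of the original mass they hold."""
    survivors = sorted(probs, reverse=True)[:k]
    return survivors, sum(survivors)


# Hypothetical next-token distributions over a 10-token vocabulary.
confident = [0.90, 0.04, 0.02, 0.01, 0.01, 0.01, 0.005, 0.003, 0.001, 0.001]
uncertain = [0.1] * 10  # ten equally plausible tokens

for name, dist in (("confident", confident), ("uncertain", uncertain)):
    survivors, mass = top_k_cut(dist, k=3)
    print(name, survivors, round(mass, 2))
# confident [0.9, 0.04, 0.02] 0.96
# uncertain [0.1, 0.1, 0.1] 0.3
```

The same fixed k both admits unlikely tokens in the first case and truncates a broad, reasonable distribution in the second.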
Use in local inference
Top-k is one of the oldest and most widely supported sampling controls, and nearly every local runtime exposes it. In practice it is often combined with temperature and top-p so that their strengths complement each other. Top-k caps the absolute number of candidates, while top-p adapts to the shape of the distribution. Tuning these together is part of getting good, predictable behaviour from a model you run yourself.
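One way to chain the three controls is sketched below. The order used here (temperature, then top-k, then top-p over the renormalised survivors) is an assumption for illustration. Runtimes differ in the order they apply samplers, and some let the user configure it. The function names and the defaults `temperature=0.8`, `k=40`, `p=0.95` are hypothetical, not recommendations from any particular runtime.

```python
import heapq
import math
import random


def softmax(logits, temperature):
    """Temperature-scaled softmax; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, k, p):
    """Keep at most k tokens, then the smallest prefix of them whose
    renormalised mass reaches p. Returns {token_index: probability}."""
    # Top-k: indices of the k most likely tokens, in descending order.
    order = heapq.nlargest(min(k, len(probs)), range(len(probs)), key=probs.__getitem__)
    mass_k = sum(probs[i] for i in order)
    # Top-p over the top-k survivors (assumed: measured after renormalising).
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / mass_k
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


def sample(logits, temperature=0.8, k=40, p=0.95, rng=random):
    """Hypothetical pipeline: temperature -> top-k -> top-p -> random draw."""
    probs = softmax(logits, temperature)
    kept = filter_top_k_top_p(probs, k, p)
    tokens = list(kept)
    return rng.choices(tokens, weights=[kept[t] for t in tokens], k=1)[0]
```

Here k acts as a hard ceiling on the candidate count, and p trims further when a few tokens dominate. For example, with logits `[3.0, 2.5, 0.0, -1.0, -2.0]`, temperature 1.0, `k=4` and `p=0.9`, only the top two tokens remain, because together they already hold about 96% of the top-4 mass.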
See top-p (nucleus sampling) for the adaptive alternative and beam search for a deterministic, search-based approach.
