Definition
Greedy decoding is the simplest text-generation strategy for a language model: at every step it selects the single token with the highest probability and never reconsiders. Because no randomness or search is involved, the output is deterministic for a given prompt and model (apart from small floating-point differences that some hardware can introduce), which makes it reproducible and cheap to compute. For a sovereign operator running a local model on a Hashcenter rig, greedy decoding is the predictable baseline against which every other sampling choice is measured.
How it works
After the model produces a probability distribution over the vocabulary (the softmax of the logits), greedy decoding takes the argmax and appends that token, then repeats. It commits to the locally best choice at each position without any backtracking. This is fast and avoids the strange tangents that high-temperature sampling can produce, which is why it is common for code completion, classification, and extraction tasks where one correct answer is expected.
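The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's API: `greedy_decode`, `toy_next_logits`, and the four-token vocabulary are all hypothetical names invented for the example, and the "model" is a stand-in function that returns logits for the next token. Because softmax preserves ordering, taking the argmax of the logits directly would give the same result; the softmax is kept here only to mirror the description.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_decode(next_logits, prompt, eos_id, max_new_tokens=20):
    """Append the highest-probability token at each step, with no backtracking.

    next_logits: hypothetical callable mapping a token-id list to a list of logits.
    """
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = softmax(next_logits(tokens))
        # argmax over the vocabulary; ties resolve to the lowest token id.
        next_id = max(range(len(probs)), key=probs.__getitem__)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens

def toy_next_logits(tokens):
    # Hypothetical stand-in for a model: strongly prefers (last token + 1) mod 4.
    logits = [0.0, 0.0, 0.0, 0.0]
    logits[(tokens[-1] + 1) % 4] = 2.0
    return logits

result = greedy_decode(toy_next_logits, prompt=[0], eos_id=3)
print(result)
```

Running the same call twice yields the same token list, which is the reproducibility property that makes greedy decoding a convenient baseline.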
The trade-off
The weakness is myopia. A token that looks best in isolation can lead down a path whose overall sequence probability is low, so greedy output tends to be bland and prone to repetition loops. Methods that keep multiple candidate paths alive, or that inject controlled randomness, exist precisely to escape this local-optimum trap. Greedy decoding is equivalent to beam search with a beam width of one, and it is the limiting case of temperature sampling as the temperature approaches zero.
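A toy example makes the local-optimum trap concrete. The sketch below assumes a hypothetical two-step "model" written as a lookup table (`TOY_MODEL`) with made-up probabilities, and a simple `beam_search` helper written for this illustration. With a beam width of one the search behaves exactly like greedy decoding and commits to "A", the best first token; with a width of two it keeps "B" alive and finds a sequence with higher total probability.

```python
# Hypothetical toy model: maps a prefix (tuple of tokens) to
# next-token probabilities. All numbers are invented for illustration.
TOY_MODEL = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.55, "y": 0.45},
    ("B",): {"x": 0.9, "y": 0.1},
}

def beam_search(model, steps, beam_width):
    """Return the best (sequence, probability) found with the given beam width."""
    beams = [((), 1.0)]
    for _ in range(steps):
        candidates = []
        for prefix, prob in beams:
            for token, p in model[prefix].items():
                candidates.append((prefix + (token,), prob * p))
        # Keep only the beam_width most probable partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]

greedy_seq, greedy_prob = beam_search(TOY_MODEL, steps=2, beam_width=1)
wide_seq, wide_prob = beam_search(TOY_MODEL, steps=2, beam_width=2)

print(greedy_seq, round(greedy_prob, 2))  # greedy path, starts with "A"
print(wide_seq, round(wide_prob, 2))      # wider beam, starts with "B"
```

In this table the greedy path ("A", "x") has probability of about 0.33, while ("B", "x") reaches about 0.36 even though "B" was the less likely first token. Real models have far larger vocabularies and longer sequences, so the gap can go either way; the example only shows that a locally best choice is not guaranteed to be globally best.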
For deeper context, compare greedy decoding with the broader-search alternative at Beam Search and with the randomized approach at Temperature Sampling.
