
Softmax

Sovereign AI

Definition

Softmax is the function that turns a vector of arbitrary real-valued scores (logits) into a probability distribution: every output is positive and the outputs sum to one. It does this by exponentiating each score and dividing by the sum of all exponentials, softmax(z)_i = exp(z_i) / Σ_j exp(z_j), so the largest scores dominate while smaller ones still receive nonzero weight. Softmax appears in two critical places inside every transformer language model, which makes it one of the most frequently executed operations in self-hosted inference.
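A minimal Python sketch of the definition, using plain lists so each step is visible. The function name `softmax` is illustrative. This naive version exponentiates the raw scores directly, so very large logits can overflow; the stable form is covered under Practical considerations.

```python
import math

def softmax(logits):
    """Naive softmax: exponentiate each score, then divide by the sum of the exponentials."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(sum(probs))                    # 1.0, up to floating-point rounding
```

Every output is positive, the largest logit gets the largest share, and the smallest logit still gets a nonzero weight.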

Two roles in a transformer

First, attention uses softmax to convert query-key similarity scores into the weights that blend value vectors, deciding how much each token attends to the other tokens in its context. Second, the final output layer applies softmax over the vocabulary to produce the probability of each possible next token. The temperature parameter you adjust during local inference divides the logits by T before softmax: a temperature below 1 sharpens the distribution toward the top choice, while a temperature above 1 flattens it for more variety.
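The sketch below illustrates both roles with plain Python lists. It assumes scaled dot-product attention (scores divided by the square root of the key dimension, as in the original Transformer design), which this entry does not spell out, and it handles a single query with no masking or multiple heads. All function names are hypothetical. Many inference engines treat a temperature of 0 as greedy decoding; this sketch simply rejects non-positive values.

```python
import math

def softmax(scores):
    # Subtracting the max does not change the result but avoids overflow.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product score of one query against every key, turned into weights."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)

def attend(query, keys, values):
    """Blend the value vectors using the attention weights."""
    weights = attention_weights(query, keys)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def softmax_with_temperature(logits, temperature):
    """Divide every logit by the temperature, then apply softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return softmax([z / temperature for z in logits])

# Attention: the query [1, 0] matches the first key best, so it gets the largest weight.
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
print(attention_weights([1.0, 0.0], keys))
print(attend([1.0, 0.0], keys, values))

# Temperature: the top token's probability shrinks as the temperature rises.
logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Because temperature is applied by dividing the logits, it changes how peaked the distribution is without ever changing which token ranks first.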

Practical considerations

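Softmax is computed in a numerically stable form (subtracting the max logit first) to avoid overflow; this does not change the result, because softmax is unaffected by adding the same constant to every logit. It pairs naturally with cross-entropy loss during training: the gradient of the combined loss with respect to the logits reduces to the predicted probabilities minus the one-hot target. For operators, the softmax over the vocabulary is where sampling controls such as temperature, top-k, and top-p take effect; these are the knobs that shape how deterministic or creative your local model's output is.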
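A minimal sketch of these points in plain Python. It shows the naive form overflowing where the max-shifted form does not, the softmax-minus-one-hot gradient, and simple top-k and top-p filters feeding a sampler. The helper names and the default values `temperature=0.8`, `k=40`, `p=0.95` are hypothetical and chosen only for illustration. Real inference engines differ in which filters they apply and in what order.

```python
import math
import random

def naive_softmax(logits):
    # math.exp raises OverflowError once its argument exceeds roughly 709.
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def stable_softmax(logits):
    # Shifting every logit by the max leaves the result unchanged and keeps exp() <= 1.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_grad(logits, target):
    """Gradient of -log(softmax(logits)[target]) w.r.t. the logits: probs minus one-hot."""
    probs = stable_softmax(logits)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]

def _renormalize(probs, keep):
    # Zero out tokens not in `keep`, then rescale the survivors to sum to one.
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

def top_k_filter(probs, k):
    """Keep only the k most likely tokens, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize(probs, set(order[:k]))

def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose total probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _renormalize(probs, keep)

def sample_next_token(logits, temperature=0.8, k=40, p=0.95, rng=random):
    """One possible pipeline: temperature, softmax, top-k, top-p, then sample an index."""
    probs = stable_softmax([z / temperature for z in logits])
    probs = top_p_filter(top_k_filter(probs, k), p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [1000.0, 999.0, 990.0]
try:
    naive_softmax(logits)
except OverflowError:
    print("naive softmax overflowed")
print(stable_softmax(logits))
print(cross_entropy_grad([2.0, 1.0, 0.1], target=0))
print(sample_next_token([2.0, 1.0, 0.1, -1.0], rng=random.Random(0)))
```

The gradient entries sum to zero: the target logit is pushed up by (1 - p_target) and every other logit is pushed down by its own probability.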

Softmax also computes the router (gating) weights in sparse models, alongside the attention weights. See Mixture of Experts (MoE), and Rotary Position Embedding (RoPE) for the position signal that feeds attention.
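For the routing role, a minimal sketch of one common gating scheme, assuming a router that produces one logit per expert for each token. The function `route_token` and the logit values are hypothetical. Real MoE models vary in the details, for example whether the selected weights are renormalized or softmax is applied only over the selected experts (the two give the same weights when renormalization is used).

```python
import math

def softmax(xs):
    m = max(xs)  # max-shift for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, k=2):
    """Hypothetical top-k gate: softmax over experts, keep the k largest, renormalize."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in chosen)
    return {i: probs[i] / total for i in chosen}

# Four experts; the router logits for one token are made up for illustration.
gates = route_token([0.2, 2.0, -1.0, 1.5], k=2)
print(gates)  # experts 1 and 3 are selected, with weights that sum to 1
```

The token's output would then be the gate-weighted sum of the chosen experts' outputs, so only k experts run per token.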

