Temperature (Sampling)

Temperature is the sampling parameter that controls how random a language model's output is at inference time. After the model computes raw scores (logits) for every possible next token, those logits are divided by the temperature value before the softmax function turns them into probabilities. That one division reshapes the entire distribution the model samples from — a single number trading determinism against variety, applied fresh at every token.
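As a minimal sketch of that division, assuming three made-up logits and a hypothetical helper named softmax_with_temperature (neither comes from any particular runner), the scaling step can be written in plain Python:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide each logit by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        # Dividing by zero is undefined; runners usually special-case 0 as greedy decoding.
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits for three candidate tokens (invented values).
logits = [2.0, 1.0, 0.0]
probs = softmax_with_temperature(logits, 1.0)
```

The result is a probability distribution over the candidate tokens. A real model does the same thing over its whole vocabulary, once per generated token.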

How the math behaves

With temperature below 1, the distribution sharpens: already-likely tokens dominate even more, so output becomes focused and repeatable. Above 1, the distribution flattens: unlikely tokens gain probability mass, so output grows diverse and, past a point, incoherent, because the long tail of a 100,000-token vocabulary contains far more nonsense than inspiration. A temperature of exactly 1 leaves the model's distribution unchanged. A temperature of 0 cannot be used as a divisor, so it is treated as a special case, greedy decoding: always take the single most probable token. That makes responses deterministic for a fixed prompt in principle, although floating-point and batching effects on real hardware can still produce occasional differences. Typical practical values run from 0 to about 2. It is worth internalizing that temperature does not add knowledge or creativity to the model; it only changes how boldly the model samples from what it already believes.
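The sharpening and flattening described above can be checked numerically. The sketch below uses invented logits and hypothetical helper names (temperature_probs, sample_token), and it treats temperature 0 as an arg-max special case, which is an assumption about how a typical runner behaves rather than a description of any specific one:

```python
import math
import random

def temperature_probs(logits, temperature):
    """Softmax over logits divided by a positive temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng):
    """Return the index of the chosen token."""
    if temperature == 0:
        # Greedy decoding: skip the division and take the most probable token.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = temperature_probs(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.0]                      # invented scores for three tokens
sharp = temperature_probs(logits, 0.5)        # below 1: top token gains mass
neutral = temperature_probs(logits, 1.0)      # exactly 1: distribution unchanged
flat = temperature_probs(logits, 2.0)         # above 1: tail tokens gain mass
greedy_pick = sample_token(logits, 0, random.Random(0))
```

With these values the top token's share shrinks as temperature rises while the least likely token's share grows, and the greedy pick is always index 0 regardless of the random generator.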

Choosing a value

For factual answers, structured output, and code generation, low temperature (0 to 0.3) reduces drift and keeps results consistent: you generally want the model's best guess, not its tenth-best. For brainstorming, naming, and creative writing, higher values (0.7 to 1.2) produce useful variety. Temperature also interacts with the other samplers that usually run alongside it: top-k truncates the candidate list to the k most likely tokens, and top-p (nucleus sampling) keeps only the smallest set of most-likely tokens whose probabilities sum to at least p. These prune the tail so that a higher temperature can add variety among plausible tokens without opening the door to garbage. Most runners expose all three; changing one while forgetting the others is a classic source of mystifying output.
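To make the two truncation rules concrete, here is a small sketch over an invented five-token distribution. The function names top_k_filter and top_p_filter are hypothetical, and real runners differ in details such as the order in which samplers are applied and how ties are broken:

```python
def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize their probabilities."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break  # the nucleus is complete; everything after this is the pruned tail
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

# Invented distribution over five candidate tokens (index -> probability).
probs = [0.5, 0.3, 0.15, 0.04, 0.01]
top2 = top_k_filter(probs, 2)        # keeps tokens 0 and 1
nucleus = top_p_filter(probs, 0.9)   # keeps tokens 0, 1, and 2
```

Both functions return a mapping from token index to renormalized probability, so the surviving candidates can be sampled from directly. The two low-probability tail tokens are never eligible, whatever the temperature.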

What temperature cannot fix

Low temperature reduces randomness, not error. A model that is confidently wrong stays wrong at temperature 0 — deterministically so. That failure mode belongs to hallucination, and no sampler setting eliminates it. Conversely, a touch of temperature is sometimes the cure for degenerate repetition loops that greedy decoding can fall into. Treat temperature as a style knob, not a truth knob.

Why self-hosters get the better deal

When you run an open-weight model on your own hardware, you control the sampler completely: temperature, top-p, top-k, and the random seed. Fix the seed and the settings, keep the software and hardware stack unchanged, and inference becomes reproducible: the same prompt yields the same answer tomorrow, which matters for testing, auditing, and any workflow you need to trust. Vendor APIs may not expose every knob, may not honor seeds, and can change defaults silently. Reproducible behavior from infrastructure you control is the same property a sovereign operator demands from a node or a miner (verify, don't trust), applied to the model's dice.
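The effect of fixing the seed can be illustrated with a toy sampler. This sketch assumes invented per-step logits and a hypothetical sample_sequence function; it stands in for a real runner only in the sense that the random draws come from a seeded generator:

```python
import math
import random

def sample_sequence(logits_per_step, temperature, seed):
    """Sample one token index per step using a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, so no global state leaks in
    picks = []
    for logits in logits_per_step:
        scaled = [x / temperature for x in logits]
        peak = max(scaled)
        weights = [math.exp(s - peak) for s in scaled]
        picks.append(rng.choices(range(len(logits)), weights=weights, k=1)[0])
    return picks

# Invented logits for a four-step generation over three candidate tokens.
steps = [[2.0, 1.0, 0.0], [0.5, 0.5, 0.5], [0.0, 3.0, 1.0], [1.0, 0.0, 2.0]]
run_a = sample_sequence(steps, temperature=0.8, seed=42)
run_b = sample_sequence(steps, temperature=0.8, seed=42)
```

Both runs produce identical picks because the seed, the settings, and the inputs are identical. Changing any of the three, or the underlying numerical stack in a real deployment, can change the result.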
