Speculative Decoding

Definition

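Speculative decoding is an inference optimisation that makes large language models generate text faster without changing what they would have produced. A small, fast draft model proposes several candidate tokens ahead; the large target model then scores all of those proposals in a single parallel forward pass. It keeps the leading run of draft tokens that pass verification, discards everything after the first rejection, and supplies the next token itself, so every pass yields at least one token. With greedy decoding, a draft token passes when it matches the target's own choice; with sampling, a rejection-sampling rule accepts or rejects each token so that the result is still distributed as if the target had sampled alone. Because the target model has the final say on every token, the output distribution is preserved: it is a lossless speed-up, not an approximation.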
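The draft-then-verify loop can be sketched in a few lines for the greedy case. This is a toy illustration, not how any particular engine implements it: `toy_target` and `toy_draft` are hypothetical stand-ins for real models, tokens are plain integers, and the verification loop calls the target once per position where a real engine would score all positions in one batched forward pass.

```python
from typing import Callable, List

# A "model" here is any function mapping a token context to its greedy next token.
Model = Callable[[List[int]], int]


def speculative_step(target: Model, draft: Model, context: List[int], k: int) -> List[int]:
    """Return the tokens produced by one draft-and-verify round (between 1 and k + 1)."""
    # 1. The draft model proposes k tokens autoregressively.
    proposed: List[int] = []
    for _ in range(k):
        proposed.append(draft(context + proposed))

    # 2. The target checks each proposal. A real engine evaluates these
    #    positions in a single parallel forward pass; the loop is for clarity.
    accepted: List[int] = []
    for i in range(k):
        expected = target(context + proposed[:i])
        if expected != proposed[i]:
            # First mismatch: keep the target's token and drop the rest.
            accepted.append(expected)
            return accepted
        accepted.append(proposed[i])

    # 3. Every proposal matched, so the target contributes one extra token.
    accepted.append(target(context + proposed))
    return accepted


def speculative_generate(target: Model, draft: Model, prompt: List[int], n: int, k: int = 4) -> List[int]:
    out = list(prompt)
    while len(out) - len(prompt) < n:
        out.extend(speculative_step(target, draft, out, k))
    return out[: len(prompt) + n]


def greedy_generate(target: Model, prompt: List[int], n: int) -> List[int]:
    out = list(prompt)
    for _ in range(n):
        out.append(target(out))
    return out


# Hypothetical toy models: the draft agrees with the target except when the
# context length is a multiple of 3.
def toy_target(ctx: List[int]) -> int:
    return (sum(ctx) * 7 + 3) % 11


def toy_draft(ctx: List[int]) -> int:
    guess = toy_target(ctx)
    return guess if len(ctx) % 3 else (guess + 1) % 11


if __name__ == "__main__":
    prompt = [1, 2, 3]
    fast = speculative_generate(toy_target, toy_draft, prompt, 20)
    slow = greedy_generate(toy_target, prompt, 20)
    print(fast == slow)
```

The comparison at the end is the point of the sketch: however often the draft is wrong, the speculative path returns the same sequence as plain greedy decoding with the target alone. Only the number of target passes changes.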

Why it speeds things up

Autoregressive generation is normally bottlenecked by memory bandwidth: every step has to read the full set of model weights to produce a single token, which leaves the GPU's compute units underused. Verifying several draft tokens in one pass spreads that memory traffic over more tokens and cuts average inter-token latency; reported speedups are commonly in the 2-3x range. The gain depends heavily on how often the draft model guesses correctly: when the acceptance rate is low, the work spent drafting and verifying rejected tokens can erode the benefit or cancel it out.
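A rough way to see how acceptance rate drives the gain is a simplified model in which each draft token is accepted independently with the same probability. Under that assumption, the expected number of tokens per target pass is a truncated geometric sum. The names `alpha` (acceptance probability), `k` (draft length) and `draft_cost` (cost of one draft step relative to one target pass) are introduced here for illustration; real acceptance rates vary by position and prompt, so treat the numbers as estimates only.

```python
def expected_tokens_per_pass(alpha: float, k: int) -> float:
    """Expected tokens produced per target pass.

    Assumes each of the k draft tokens is accepted independently with
    probability alpha, and that the target always adds one token of its own.
    Equals 1 + alpha + alpha**2 + ... + alpha**k.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if alpha == 1.0:
        return float(k + 1)
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, k: int, draft_cost: float) -> float:
    """Estimated speedup over plain decoding under the same simplified model.

    One round costs one target pass plus k draft steps, each costing
    draft_cost target passes (an assumed constant ratio).
    """
    return expected_tokens_per_pass(alpha, k) / (1.0 + k * draft_cost)


if __name__ == "__main__":
    for alpha in (0.3, 0.6, 0.8, 0.9):
        print(alpha, round(estimated_speedup(alpha, k=4, draft_cost=0.05), 2))
```

With a draft that costs 5% of a target pass and proposes four tokens, this model gives an estimate above 1 once acceptance is reasonably high, and close to 1 or below it when most proposals are rejected. That matches the caveat above: a poorly matched draft model can make speculation not worth its overhead.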

Relevance to local inference

For someone running models on their own machine, speculative decoding is one of the few ways to get meaningfully faster responses without buying a bigger GPU. Several local inference engines support it, often pairing a tiny draft model with the main model you actually want answers from.

It pairs naturally with other serving optimisations such as continuous batching and flash attention. For background on the per-token generation it accelerates, see inference.
