KV Cache (Key-Value Cache)

Definition

The KV cache, short for key-value cache, is the working memory that makes autoregressive text generation practical. During inference, a transformer's self-attention layers compute key (K) and value (V) tensors for every token. Rather than recomputing those tensors for the entire sequence each time a new token is produced, the model stores them in the KV cache and reuses them, appending only the new token's key and value as it goes.
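The reuse described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not how any particular inference engine implements it: it models a single attention head in a single layer, the helper names (`matvec`, `attend`, `decode_with_cache`, `decode_without_cache`) are hypothetical, and the weights are random. It only shows that appending the new token's key and value gives the same attention output as recomputing every key and value at each step.

```python
import math
import random

def matvec(x, W):
    # Row vector x (length d_in) times matrix W (d_in x d_out).
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def attend(q, keys, values):
    # Scaled dot-product attention of one query against all given keys/values.
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]

def decode_with_cache(tokens, Wq, Wk, Wv):
    k_cache, v_cache, outputs = [], [], []
    for x in tokens:
        k_cache.append(matvec(x, Wk))  # project only the new token
        v_cache.append(matvec(x, Wv))
        outputs.append(attend(matvec(x, Wq), k_cache, v_cache))
    return outputs

def decode_without_cache(tokens, Wq, Wk, Wv):
    outputs = []
    for t in range(1, len(tokens) + 1):
        keys = [matvec(x, Wk) for x in tokens[:t]]    # recomputed every step
        values = [matvec(x, Wv) for x in tokens[:t]]
        outputs.append(attend(matvec(tokens[t - 1], Wq), keys, values))
    return outputs

def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
d = 4                       # toy embedding width (assumption)
tokens = rand_matrix(5, d)  # five toy token embeddings
Wq, Wk, Wv = rand_matrix(d, d), rand_matrix(d, d), rand_matrix(d, d)

cached = decode_with_cache(tokens, Wq, Wk, Wv)
uncached = decode_without_cache(tokens, Wq, Wk, Wv)
outputs_match = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
    for row_a, row_b in zip(cached, uncached)
    for a, b in zip(row_a, row_b)
)
```

Both functions produce the same outputs; the cached version simply avoids redoing the key and value projections for tokens it has already seen.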

Why It Matters for Speed

Without a KV cache, generating each new token would require recomputing keys, values, and attention for the whole sequence so far, an O(n²) cost per token for a sequence of n tokens that grows brutally as text gets longer. With the cache, each step projects only the new token and attends over the n stored entries, so per-token attention work drops to roughly O(n). The speedup is not a fixed factor: it widens as the sequence gets longer. This is the single most important optimization separating a usable local chatbot from one that crawls.
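To make the scaling concrete, the rough count below tallies query-key dot products for one attention head in one layer. It is a back-of-the-envelope sketch, not a benchmark: it assumes the uncached case reruns causal attention over every position at every step, it ignores the projection and feed-forward work, and the function names are hypothetical.

```python
def scores_at_step(t, use_cache):
    # Query-key dot products needed when the sequence holds t tokens.
    if use_cache:
        return t                 # one new query against t stored keys
    return t * (t + 1) // 2      # every position re-attends causally

def total_scores(n, use_cache):
    # Total dot products to generate n tokens one at a time.
    return sum(scores_at_step(t, use_cache) for t in range(1, n + 1))

n = 1000
with_cache = total_scores(n, use_cache=True)
without_cache = total_scores(n, use_cache=False)
ratio = without_cache / with_cache
```

Under these assumptions the ratio works out to (n + 2) / 3, so the gap between the two approaches keeps growing with sequence length.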

The Memory Cost

That speed comes from spending memory. The KV cache grows linearly with sequence length, batch size, the number of layers, and the width of the keys and values stored per layer (the number of key-value heads times the head dimension, which equals the model's hidden size in standard multi-head attention). A long conversation can therefore consume gigabytes of RAM or VRAM on its own, separate from the model weights. This is why long context windows are expensive to serve and why running big models locally often hits a memory wall before a compute wall. Techniques like cache quantization and paged attention exist specifically to tame it.
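A quick estimate shows how fast this adds up. The helper below is a sketch using the standard sizing arithmetic (two tensors, K and V, per layer per token); the function name and the example configuration are hypothetical and do not describe any specific model. It ignores runtime overhead such as allocator padding or paging metadata.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim,
                   bytes_per_value=2, batch_size=1):
    # The leading 2 accounts for storing both a key and a value tensor.
    # bytes_per_value=2 assumes 16-bit storage (fp16 or bf16).
    return (2 * batch_size * seq_len * n_layers
            * n_kv_heads * head_dim * bytes_per_value)

# Hypothetical configuration: 32 layers, 32 key-value heads, head dimension 128.
full_mha = kv_cache_bytes(seq_len=4096, n_layers=32, n_kv_heads=32, head_dim=128)

# Same shape, but with only 8 key-value heads shared across query heads.
fewer_kv_heads = kv_cache_bytes(seq_len=4096, n_layers=32, n_kv_heads=8, head_dim=128)

full_mha_gib = full_mha / 1024**3
fewer_kv_heads_mib = fewer_kv_heads / 1024**2
```

For this assumed configuration, a 4096-token context needs 2 GiB of cache at 16-bit precision, and cutting the key-value heads from 32 to 8 brings that down to 512 MiB. Halving `bytes_per_value`, as cache quantization does, halves the total in the same way.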

Understanding the KV cache explains why your local model slows down and eats memory as conversations lengthen. It pairs directly with our entries on Tokens per Second and Local LLM.
