
Grouped-Query Attention (GQA)

Sovereign AI

Definition

Grouped-query attention (GQA) is a memory-saving form of self-attention introduced in a 2023 Google paper by Ainslie and colleagues. It sits between two extremes: standard multi-head attention, where every query head has its own key and value heads, and multi-query attention, where all query heads share a single key/value head. GQA splits the query heads into groups, and each group shares one key/value head, giving a tunable trade-off between quality and efficiency.
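The grouping can be illustrated with a minimal sketch. The function names below are hypothetical, and the contiguous assignment (consecutive query heads share a key/value head) is an assumption that matches common implementations rather than something stated in this entry.

```python
def kv_head_for_query_head(query_head: int, n_query_heads: int, n_kv_heads: int) -> int:
    """Return the index of the key/value head that a given query head reads from."""
    if n_query_heads % n_kv_heads != 0:
        raise ValueError("n_query_heads must be a multiple of n_kv_heads")
    # Number of query heads that share one key/value head.
    group_size = n_query_heads // n_kv_heads
    # Assumption: groups are contiguous, so heads 0..group_size-1 form group 0, and so on.
    return query_head // group_size


def head_groups(n_query_heads: int, n_kv_heads: int) -> list[int]:
    """List the key/value head used by each query head, in order."""
    return [
        kv_head_for_query_head(h, n_query_heads, n_kv_heads)
        for h in range(n_query_heads)
    ]


print(head_groups(8, 8))  # multi-head attention: one key/value head per query head
print(head_groups(8, 2))  # grouped-query attention: two groups of four query heads
print(head_groups(8, 1))  # multi-query attention: every query head shares one key/value head
```

Setting the number of key/value heads equal to the number of query heads recovers multi-head attention, and setting it to 1 recovers multi-query attention, so both extremes are special cases of the same mapping.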

Why fewer key/value heads matter

During generation, a Transformer caches the key and value vectors of every past token, the so-called KV cache, so it does not have to recompute them at each step. At long context lengths this cache, rather than the model weights, can dominate memory use. Because cache size scales with the number of key/value heads, GQA shrinks it by the group ratio (query heads per key/value head): 32 query heads sharing 8 key/value heads gives a 4x smaller cache. That eases the memory-bandwidth bottleneck that limits generation speed. The authors also showed that existing multi-head checkpoints can be converted to GQA using a small fraction (about 5%) of the original pre-training compute.
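As a rough sketch of the arithmetic, the cache size can be estimated from the model shape. The function name and the model dimensions below are hypothetical, chosen only to make the numbers round, and the estimate assumes 16-bit cache entries with no quantization or paging overhead.

```python
def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch_size: int = 1,
    bytes_per_value: int = 2,  # assumption: fp16/bf16 cache entries
) -> int:
    """Estimate KV-cache size in bytes for one forward context."""
    # The leading 2 accounts for storing both a key and a value vector per token.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_value


GIB = 1024 ** 3

# Hypothetical model: 32 layers, 32 query heads of dimension 128, 8192-token context.
mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=8192)
gqa = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=8192)

print(f"multi-head (32 KV heads): {mha / GIB:.1f} GiB")
print(f"grouped-query (8 KV heads): {gqa / GIB:.1f} GiB")
print(f"reduction factor: {mha // gqa}x")
```

Under these assumptions the cache drops from 4 GiB to 1 GiB, which is the 32 / 8 = 4 group ratio. The number of query heads does not appear in the formula at all, since only keys and values are cached.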

Practical impact for sovereign operators

Many modern open-weight models, including Mistral 7B and recent Llama releases, ship with GQA, in large part because a smaller KV cache makes long contexts practical on consumer GPUs. When you check whether a model fits your hardware, the number of key/value heads can matter as much as the parameter count, especially at long context lengths.

See also positional encoding (RoPE) and Transformer.
