Linear Attention

Sovereign AI

Definition

Linear attention is a family of attention mechanisms that reduce the cost of self-attention from quadratic to linear in the sequence length. Standard softmax attention compares every token with every other token, so its time and memory grow with the square of the sequence length. Linear attention sidesteps this by replacing (or approximating) the softmax similarity with a kernel: each query and key is passed through a feature map φ, and the dot product φ(q)·φ(k) takes the place of the exponential comparison exp(q·k/√d). Because this similarity factors into a query term and a key term, the associative property of matrix multiplication lets the model compute (φ(Q)φ(K)ᵀ)V as φ(Q)(φ(K)ᵀV). It aggregates all keys and values into one summary, whose size depends on the feature dimension rather than on the sequence length, and reuses that summary for every query. The n × n comparison disappears, and the cost grows linearly with the number of tokens n.
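
The associativity argument can be checked numerically. Below is a minimal pure-Python sketch, assuming non-causal attention (every query sees every key) and the elu(x) + 1 feature map used by the original Linear Transformer. `attention_quadratic` scores all n × n query-key pairs explicitly; `attention_linear` first builds the d × d_v summary φ(K)ᵀV plus the normalizer Σφ(k), then reuses both for every query. The function names are hypothetical, chosen only for this illustration.

```python
import math
import random

def elu_plus_one(x):
    # elu(x) + 1 is positive for every real x, so all similarities stay positive.
    return x + 1.0 if x > 0 else math.exp(x)

def feature_map(vec):
    return [elu_plus_one(x) for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_quadratic(Q, K, V):
    """Score every query against every key: n * n similarity values in total."""
    fk = [feature_map(k) for k in K]
    out = []
    for q in Q:
        fq = feature_map(q)
        weights = [dot(fq, f) for f in fk]  # one score per key
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total
                    for j in range(len(V[0]))])
    return out

def attention_linear(Q, K, V):
    """Summarize keys and values once, then reuse the summary for each query."""
    d, d_v = len(K[0]), len(V[0])
    kv = [[0.0] * d_v for _ in range(d)]  # phi(K)^T V, size d x d_v
    z = [0.0] * d                         # sum of phi(k), size d
    for k, v in zip(K, V):
        fk = feature_map(k)
        for a in range(d):
            z[a] += fk[a]
            for j in range(d_v):
                kv[a][j] += fk[a] * v[j]
    out = []
    for q in Q:
        fq = feature_map(q)
        denom = dot(fq, z)
        out.append([sum(fq[a] * kv[a][j] for a in range(d)) / denom
                    for j in range(d_v)])
    return out

random.seed(0)
n, d, d_v = 6, 4, 3
Q = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
K = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
V = [[random.gauss(0, 1) for _ in range(d_v)] for _ in range(n)]

quad = attention_quadratic(Q, K, V)
lin = attention_linear(Q, K, V)
max_diff = max(abs(a - b) for r1, r2 in zip(quad, lin) for a, b in zip(r1, r2))
print(f"max difference: {max_diff:.2e}")
```

The two outputs agree up to floating-point rounding, but only the first one materializes a score for every query-key pair. In the second, the work per query depends on d and d_v, not on n.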

The kernel trick at the core

The defining choice in any linear-attention method is the feature map applied to queries and keys. The original Linear Transformer uses a simple elementwise map, elu(x) + 1, which keeps every similarity positive; later methods such as Performer use random features to approximate the softmax kernel more faithfully. In its causal, recurrent form, linear attention maintains a fixed-size state matrix that is updated once per token. This is why inference memory stays constant as the context grows, instead of depending on a key-value cache that grows with every token.
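
The recurrent form can be sketched as follows, assuming the elu(x) + 1 feature map and causal attention. The hypothetical `LinearAttentionState` class keeps S_t = S_{t-1} + φ(k_t)v_tᵀ and z_t = z_{t-1} + φ(k_t), and emits y_t = φ(q_t)ᵀS_t / φ(q_t)ᵀz_t. The check compares it with an explicit computation in which token t scores tokens 0 through t one pair at a time.

```python
import math
import random

def feature_map(vec):
    # elu(x) + 1, the elementwise map of the original Linear Transformer.
    return [x + 1.0 if x > 0 else math.exp(x) for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class LinearAttentionState:
    """Fixed-size recurrent state: a d x d_v matrix S and a length-d vector z."""

    def __init__(self, d, d_v):
        self.S = [[0.0] * d_v for _ in range(d)]
        self.z = [0.0] * d

    def step(self, q, k, v):
        """Fold in one token (k, v), then read out the output for query q."""
        fk, fq = feature_map(k), feature_map(q)
        for a in range(len(fk)):
            self.z[a] += fk[a]
            for j in range(len(v)):
                self.S[a][j] += fk[a] * v[j]
        denom = dot(fq, self.z)
        return [sum(fq[a] * self.S[a][j] for a in range(len(fq))) / denom
                for j in range(len(v))]

def causal_quadratic(Q, K, V):
    """Reference: token t attends to tokens 0..t, scoring each pair explicitly."""
    out = []
    for t, q in enumerate(Q):
        fq = feature_map(q)
        weights = [dot(fq, feature_map(k)) for k in K[: t + 1]]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V[: t + 1])) / total
                    for j in range(len(V[0]))])
    return out

random.seed(1)
n, d, d_v = 8, 4, 2
Q = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
K = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
V = [[random.gauss(0, 1) for _ in range(d_v)] for _ in range(n)]

state = LinearAttentionState(d, d_v)
recurrent = [state.step(q, k, v) for q, k, v in zip(Q, K, V)]
reference = causal_quadratic(Q, K, V)

max_diff = max(abs(a - b) for r1, r2 in zip(recurrent, reference)
               for a, b in zip(r1, r2))
state_floats = len(state.S) * len(state.S[0]) + len(state.z)
print(f"max difference: {max_diff:.2e}; state holds {state_floats} floats")
```

The state holds d·d_v + d numbers no matter how many tokens have been processed, whereas a softmax key-value cache stores one key and one value per token.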

Trade-offs to know

Linear attention buys efficiency by giving up the exact softmax structure. Because everything seen so far is compressed into a fixed-size state, it can lose accuracy on tasks that need sharp, precise recall of specific earlier tokens, and it can make training less stable. This is why many modern systems pair it with full-attention layers or add gating, rather than using it alone.
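
One simple form of gating is a decay applied to the state before each update: S_t = g·S_{t-1} + k_t v_tᵀ. The sketch below makes several simplifying assumptions: a fixed scalar gate g (gated linear attention and related methods learn data-dependent gates), no feature map, and an unnormalized readout. The helper names `gated_scan` and `read` are hypothetical. It shows how a gate below 1 makes an early token fade from the fixed-size state.

```python
def gated_scan(keys, values, gate):
    """Apply S_t = gate * S_{t-1} + k_t v_t^T token by token; return the final S.

    gate = 1.0 recovers plain (ungated) linear attention; gate < 1.0 makes
    older tokens fade geometrically. A fixed scalar gate is a simplification.
    """
    d, d_v = len(keys[0]), len(values[0])
    S = [[0.0] * d_v for _ in range(d)]
    for k, v in zip(keys, values):
        for a in range(d):
            for j in range(d_v):
                S[a][j] = gate * S[a][j] + k[a] * v[j]
    return S

def read(S, q):
    """Unnormalized readout y = S^T q (no feature map, no denominator)."""
    return [sum(q[a] * S[a][j] for a in range(len(q))) for j in range(len(S[0]))]

# Token 0 stores value 1.0 under key e0; four later tokens write under key e1.
keys = [[1.0, 0.0]] + [[0.0, 1.0]] * 4
values = [[1.0]] + [[0.0]] * 4

for gate in (1.0, 0.5):
    recalled = read(gated_scan(keys, values, gate), [1.0, 0.0])[0]
    print(f"gate={gate}: token 0 recalled with weight {recalled}")
```

With gate 1.0 the first token is recalled with weight 1.0 however many tokens follow; with gate 0.5 it is scaled by 0.5⁴ = 0.0625 after four more tokens. Learned gates let the model choose, per token, how much of the old state to keep.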

Linear attention is the conceptual foundation for several architectures covered elsewhere in this glossary, including RWKV, gated linear attention, and the broader class of sub-quadratic attention methods that let capable models run on hardware you own.
