
Flash Attention

Sovereign AI

Definition

Flash Attention is a fast, memory-efficient algorithm for computing the attention operation at the heart of every transformer model. It produces exactly the same result as standard attention (it is not an approximation), but it is engineered to be IO-aware: it minimises the slow reads and writes between the GPU's large high-bandwidth memory (HBM) and its small but fast on-chip SRAM. It splits the query, key, and value matrices into blocks small enough to fit in SRAM (tiling) and uses an online-softmax trick to combine the partial results block by block. As a result it never materialises the full attention matrix in global memory, which avoids the quadratic memory blow-up that makes long sequences expensive.
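The online-softmax bookkeeping can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the actual Flash Attention kernel: it assumes standard scaled dot-product attention, softmax(QK^T / sqrt(d)) V, runs on the CPU with nested lists rather than in GPU SRAM, and tiles only the keys and values while handling one query row at a time (real kernels also tile the queries). The names `naive_attention`, `tiled_attention`, and the `block` parameter are hypothetical. What it does show is that a running maximum, a running normaliser, and a rescaled output accumulator let each block of keys be processed once and then discarded, while the result still matches standard attention up to floating-point rounding.

```python
import math
from typing import List

Matrix = List[List[float]]


def naive_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Reference version: computes every score for a query row at once, then softmax, then weights @ V."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)  # subtract the row max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total
                    for c in range(len(v[0]))])
    return out


def tiled_attention(q: Matrix, k: Matrix, v: Matrix, block: int = 2) -> Matrix:
    """Hypothetical tiled version: visits K/V in blocks of `block` rows and keeps
    only three running quantities per query row, never a full row of scores."""
    scale = 1.0 / math.sqrt(len(q[0]))
    dv = len(v[0])
    out = []
    for qi in q:
        m = -math.inf     # running max of all scores seen so far
        l = 0.0           # running sum of exp(score - m)
        acc = [0.0] * dv  # running sum of exp(score - m) * v_j (unnormalised output)
        for start in range(0, len(k), block):
            k_blk = k[start:start + block]
            v_blk = v[start:start + block]
            scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k_blk]
            m_new = max(m, max(scores))
            # Rescale the old statistics to the new max; exp(-inf) == 0.0 on the first block.
            correction = math.exp(m - m_new)
            l *= correction
            acc = [a * correction for a in acc]
            for s, vj in zip(scores, v_blk):
                p = math.exp(s - m_new)
                l += p
                for c in range(dv):
                    acc[c] += p * vj[c]
            m = m_new
        out.append([a / l for a in acc])  # normalise once, at the end
    return out


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    k = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0], [0.5, 0.5], [-1.0, 0.0]]
    v = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0], [0.5, 0.5]]
    ref = naive_attention(q, k, v)
    fast = tiled_attention(q, k, v, block=2)
    # Largest absolute difference between the two results (rounding-level only).
    print(max(abs(a - b) for r, f in zip(ref, fast) for a, b in zip(r, f)))
```

Rescaling by exp(m_old - m_new) whenever a larger score appears keeps every exponent non-positive, which is the same overflow protection an ordinary softmax gets from subtracting the row maximum, just applied incrementally.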

Why it matters

Standard attention is slow and memory-hungry because it computes and stores an N × N score matrix, so both its time and memory cost grow with the square of the sequence length N. Flash Attention still performs the same quadratic amount of arithmetic, but because it cuts traffic to and from HBM it runs substantially faster (up to ~7.6x on the attention computation for GPT-2 in the original paper), and its extra memory scales linearly with sequence length rather than quadratically. That makes longer context windows practical on the same hardware. For a self-hoster, this translates directly into being able to feed more text to a local model without running out of VRAM.
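A back-of-the-envelope calculation shows why the quadratic term dominates. The sketch below assumes fp16 scores (2 bytes each) and counts a single attention head for a single sequence. For the Flash Attention side it assumes the kernel keeps two fp32 statistics per query row (a running max and a running sum). The function names are hypothetical, and the figures ignore model weights, the KV cache, and other activations.

```python
def score_matrix_bytes(seq_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes for one head's N x N attention-score matrix (fp16 by default)."""
    return seq_len * seq_len * bytes_per_elem


def row_stats_bytes(seq_len: int, stats_per_row: int = 2, bytes_per_elem: int = 4) -> int:
    """Bytes for per-query-row softmax statistics.

    Assumption: two fp32 values per row (running max and running sum)."""
    return seq_len * stats_per_row * bytes_per_elem


if __name__ == "__main__":
    for n in (2_048, 8_192, 32_768):
        full = score_matrix_bytes(n)
        stats = row_stats_bytes(n)
        print(f"N={n:>6}: score matrix {full / 2**20:>8.1f} MiB, "
              f"row stats {stats / 2**10:>6.1f} KiB")
```

At 32,768 tokens one head's score matrix alone is 2 GiB, whereas the per-row statistics come to 256 KiB. Doubling N quadruples the first figure but only doubles the second.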

Where you encounter it

Most modern inference engines and training frameworks enable Flash Attention (or its successors) automatically when supported hardware is present. You rarely configure it by hand, but its presence is a major reason a given GPU can handle a larger context window than older software allowed.

It is one of several low-level tricks that make local inference viable; see also continuous batching for throughput on the serving side.
