High Bandwidth Memory (HBM)

Hardware

High Bandwidth Memory (HBM) is a memory architecture that stacks multiple DRAM dies vertically and places them directly beside the processor on a silicon interposer. By using an extremely wide interface, HBM delivers far more bandwidth than traditional GDDR memory, which is why it equips the data-center GPUs used to train and serve the largest AI models. For sovereign AI, HBM is the dividing line between consumer cards and the accelerators that can hold a frontier model in memory and feed it fast enough to be useful.

How the stack works

HBM stacks DRAM dies one atop another and connects them with Through-Silicon Vias (TSVs), vertical interconnects that pass straight through the silicon to link the layers. The stack sits next to the compute die on a silicon interposer in a 2.5D package, where the interposer routes a very wide bus, on the order of 1024 bits per stack, into the GPU's memory controller. Current HBM3e stacks deliver roughly 1.2 TB/s of bandwidth each, and a GPU carries several stacks. Stacking is what makes that width physically achievable. Fanning a thousand-plus wires out across a conventional circuit board is impractical, but routing them a few millimetres across an interposer is not, so the geometry of the package is itself part of the performance story.
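The bandwidth figure above follows from simple arithmetic: bus width times per-pin data rate. The sketch below reproduces it. The 1024-bit width comes from the text; the 9.6 Gb/s per-pin rate and the count of eight stacks are illustrative assumptions, not the specification of any particular part.

```python
def stack_bandwidth_gbs(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth of one stack in GB/s.

    Each of the bus_width_bits pins carries pin_rate_gbps gigabits per second;
    dividing by 8 converts bits to bytes.
    """
    return bus_width_bits * pin_rate_gbps / 8


BUS_WIDTH_BITS = 1024   # per stack, as stated in the text
PIN_RATE_GBPS = 9.6     # assumption: illustrative HBM3e-class per-pin rate
STACKS_PER_GPU = 8      # assumption: the text only says "multiple stacks"

per_stack = stack_bandwidth_gbs(BUS_WIDTH_BITS, PIN_RATE_GBPS)
per_gpu = per_stack * STACKS_PER_GPU

print(f"per stack: {per_stack:.1f} GB/s")
print(f"per GPU:   {per_gpu / 1000:.2f} TB/s")
```

With these assumed inputs a single stack lands near 1.2 TB/s. These are theoretical peaks, and real sustained bandwidth is lower.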

Why bandwidth is the bottleneck

Large language model inference is often memory-bandwidth bound rather than compute bound: the GPU must stream billions of weights from memory for every token it generates, so the rate at which weights can be read sets the token rate. HBM's bandwidth is what lets high-end accelerators produce tokens quickly. Consumer GPUs rely on GDDR instead, which is cheaper but slower and more limited in capacity, and that constraint determines which models a home builder can realistically run and how fast. This is also why quantization (shrinking each weight to fewer bits) speeds up inference: with fewer bytes to stream per token, the same bus delivers more weights per second.
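A rough ceiling on token rate can be sketched from this reasoning: if every weight is read once per generated token, tokens per second cannot exceed bandwidth divided by the model's size in bytes. The model size and both bandwidth figures below are hypothetical round numbers chosen for illustration. The estimate ignores key-value cache reads, batching, and compute time, so it is an upper bound rather than a prediction.

```python
def max_tokens_per_second(n_params: float, bits_per_weight: float,
                          bandwidth_gbs: float) -> float:
    """Upper bound on decode speed when all weights are streamed once per token."""
    bytes_per_token = n_params * bits_per_weight / 8
    return bandwidth_gbs * 1e9 / bytes_per_token


N_PARAMS = 7e9            # assumption: a hypothetical 7-billion-parameter model
HBM_CLASS_GBS = 3000.0    # assumption: illustrative HBM-class accelerator
GDDR_CLASS_GBS = 1000.0   # assumption: illustrative high-end consumer card

for bits in (16, 8, 4):
    hbm = max_tokens_per_second(N_PARAMS, bits, HBM_CLASS_GBS)
    gddr = max_tokens_per_second(N_PARAMS, bits, GDDR_CLASS_GBS)
    print(f"{bits:>2}-bit weights: HBM-class <= {hbm:6.1f} tok/s, "
          f"GDDR-class <= {gddr:6.1f} tok/s")
```

Halving the bits per weight doubles the ceiling on the same hardware, which is the quantization effect described above.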

Capacity as well as speed

Bandwidth grabs the headlines, but capacity matters just as much. A model's weights, plus the growing key-value cache of an in-progress conversation, all have to fit in memory or spill to something slower. HBM stacks pack many gigabytes close to the die, letting a single accelerator hold a large model where a consumer card would have to shard it across several GPUs or offload to system RAM. For long-context work in particular, running out of memory is a harder wall than running out of speed.
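The capacity question can be sketched as a sum: weight bytes plus key-value cache bytes, compared against the memory available. The model shape (80 layers, 8 key-value heads, head dimension 128, 70 billion parameters) and both capacities are hypothetical values picked for illustration. The estimate leaves out activations and runtime overhead, so real headroom is smaller.

```python
GB = 1e9  # decimal gigabytes


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Memory needed to hold the model weights."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Key-value cache size: one key and one value vector per layer per token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


def fits(capacity_gb: float, total_bytes: float) -> bool:
    """True if the total footprint is within the given capacity."""
    return total_bytes <= capacity_gb * GB


# Assumption: a hypothetical 70-billion-parameter model with a 32,768-token context.
N_PARAMS = 70e9
kv = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128, context_tokens=32_768)

for bits in (16, 4):
    total = weight_bytes(N_PARAMS, bits) + kv
    for label, capacity_gb in (("24 GB card", 24), ("192 GB accelerator", 192)):
        verdict = "fits" if fits(capacity_gb, total) else "does not fit"
        print(f"{bits:>2}-bit, {total / GB:6.1f} GB total: {verdict} on {label}")
```

The cache term grows linearly with context length, which is why long conversations hit the capacity wall even when the weights alone would fit.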

What it means for home builders

HBM is largely why the most capable accelerators are expensive and scarce: the packaging is complex and the supply is constrained. A sovereign operator generally cannot buy HBM-class parts affordably, so the practical path is to fit smaller or quantized models into GDDR memory and lean on efficient kernels and caching to make the most of the bandwidth available. Knowing that bandwidth, not raw compute, is usually the limiter helps you spend a hardware budget where it actually counts.

HBM is not faster memory in the sense of lower latency for a single access; its advantage is enormous parallel throughput. The wide bus moves a huge volume of data per clock, which suits streaming billions of weights in order, but a random single-byte fetch is not dramatically quicker than on other memory. That distinction matters when reasoning about workloads: dense, sequential weight streaming is exactly what HBM excels at, which is why it pairs so naturally with the large matrix multiplies at the heart of AI. For a home builder priced out of HBM parts, the honest path is to accept that a consumer card's ceiling is its memory system, and to choose model sizes and quantization levels that live comfortably within the bandwidth and capacity you can actually buy.
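The latency-versus-throughput distinction can be illustrated with a first-order model in which a transfer costs a fixed access latency plus the time to stream its bytes. The 100 ns latency and both bandwidth figures are placeholder assumptions, and the latency is deliberately set equal for both memory types to isolate the effect of bandwidth. Real hardware also fetches whole bursts rather than single bytes.

```python
def transfer_time_ns(n_bytes: float, latency_ns: float, bandwidth_gbs: float) -> float:
    """Fixed access latency plus streaming time.

    bytes divided by GB/s yields nanoseconds, since 1 GB/s is 1 byte per ns.
    """
    return latency_ns + n_bytes / bandwidth_gbs


LATENCY_NS = 100.0        # assumption: same placeholder latency for both memories
HBM_CLASS_GBS = 3000.0    # assumption: illustrative HBM-class bandwidth
GDDR_CLASS_GBS = 1000.0   # assumption: illustrative GDDR-class bandwidth

for label, n_bytes in (("single byte", 1), ("1 GB of weights", 1e9)):
    hbm = transfer_time_ns(n_bytes, LATENCY_NS, HBM_CLASS_GBS)
    gddr = transfer_time_ns(n_bytes, LATENCY_NS, GDDR_CLASS_GBS)
    print(f"{label:>16}: GDDR-class takes {gddr / hbm:.3f}x as long as HBM-class")
```

Under these assumptions the tiny access takes essentially the same time on both, while the large sequential read approaches the full bandwidth ratio.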

HBM feeds the Tensor Cores that do the arithmetic, and in multi-GPU systems its data moves between devices over interconnects like NVLink. Every GPU kernel in a model reads from and writes to this memory.
