Definition
High Bandwidth Memory (HBM) is a memory architecture that stacks multiple DRAM dies vertically and places them directly beside the processor on a silicon interposer. By using an extremely wide interface, HBM delivers far more bandwidth than traditional GDDR memory, which is why it equips the data-center GPUs used to train and serve the largest AI models. For sovereign AI, HBM is the dividing line between consumer cards and the accelerators that can hold a frontier model in memory.
How the stack works
HBM stacks DRAM dies one atop another and connects them with Through-Silicon Vias (TSVs), vertical interconnects that pass straight through the silicon to link the layers. The stack sits next to the compute die on a silicon interposer in a 2.5D package, where the interposer routes a very wide bus, on the order of 1024 bits per stack, into the GPU's memory controller. Current HBM3e stacks exceed 1.2 TB/s of bandwidth each, with multiple stacks per GPU.
Why bandwidth is the bottleneck
Large language model inference is often memory-bandwidth bound rather than compute bound: the GPU must stream billions of weights from memory for every token it generates. HBM's bandwidth is what lets high-end accelerators produce tokens quickly. Consumer GPUs rely on GDDR instead, which is cheaper but slower and capacity-limited, shaping which models a home builder can realistically run.
HBM feeds the Tensor Cores that do the arithmetic, and across multiple GPUs its data moves over interconnects like NVLink.
In Simple Terms
High Bandwidth Memory (HBM) is a memory architecture that stacks multiple DRAM dies vertically and places them directly beside the processor on a silicon interposer.…
