Definition
Memory bandwidth is the rate at which a processor can transfer data between its compute cores and its memory, measured in gigabytes per second (GB/s). A useful mental model: if VRAM is a warehouse whose size is its capacity in gigabytes, memory bandwidth is the width of the road leading to it. A wide road keeps the compute cores fed; a narrow one leaves them idle.
The memory wall
Modern AI hardware is frequently limited not by raw arithmetic but by data movement. In training, typical model-FLOPs utilization sits around 30–50% because GPUs stall waiting on memory. In inference, the effect is even sharper: generating each token requires streaming the model's weights and cache out of memory, so throughput is governed by bandwidth rather than FLOPS. This is the "memory wall" — adding more compute does not help once you are bandwidth-bound.
Why it matters for self-hosting
When sizing hardware for a local LLM, memory bandwidth often predicts real-world token speed better than the headline TFLOPS number. It is the reason high-end accelerators pair their cores with HBM, and a key spec to weigh when building an on-premise AI rig you intend to actually use day to day.
In Simple Terms
Memory bandwidth is the rate at which a processor can transfer data between its compute cores and its memory, measured in gigabytes per second (GB/s).…
