Roofline Model

The roofline model is a simple, visual performance model that bounds the maximum throughput a workload can achieve on a given processor using just two hardware limits: peak compute (FLOPS) and peak memory bandwidth (GB/s). Introduced by Williams, Waterman, and Patterson in 2009, it compresses everything that matters about an accelerator's headline specs into one chart — and has become the standard way to reason about whether AI hardware is actually being used well or merely owned.

How to read the chart

The horizontal axis is operational intensity, the number of operations performed per byte of memory traffic. The vertical axis is attainable performance in FLOPS. Both axes are conventionally plotted on a log scale. The chart's ceiling has two segments forming the eponymous roofline: a sloped line on the left, where attainable performance equals intensity times memory bandwidth, so every additional operation per byte buys proportionally more speed, and a flat line on the right, where performance saturates at the chip's peak compute rate and no amount of data reuse helps further. Put together, attainable performance is the smaller of peak compute and intensity times bandwidth. The corner where the two segments meet is the ridge point, located at peak compute divided by peak bandwidth: the minimum intensity a workload needs to extract full compute from the machine. A high ridge point describes a chip whose math units outrun its memory, an accurate description of essentially every modern accelerator.
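The two segments and the ridge point described above can be sketched in a few lines of Python. This is a minimal illustration, not a profiling tool: the function names and the 100 TFLOPS / 1 TB/s figures are assumptions chosen for round numbers and do not describe any particular chip.

```python
def attainable_flops(intensity, peak_flops, peak_bandwidth):
    """Roofline ceiling: the smaller of peak compute and intensity * bandwidth.

    intensity      -- operations per byte of memory traffic
    peak_flops     -- peak compute rate, in operations per second
    peak_bandwidth -- peak memory bandwidth, in bytes per second
    """
    return min(peak_flops, intensity * peak_bandwidth)


def ridge_point(peak_flops, peak_bandwidth):
    """Intensity (operations per byte) at which the sloped and flat segments meet."""
    return peak_flops / peak_bandwidth


# Hypothetical accelerator, assumed for illustration only.
PEAK_FLOPS = 100e12      # 100 TFLOPS
PEAK_BANDWIDTH = 1e12    # 1 TB/s

print("ridge point:", ridge_point(PEAK_FLOPS, PEAK_BANDWIDTH), "ops/byte")

# Walk along the x-axis: the ceiling rises with intensity, then flattens.
for intensity in (1, 10, 100, 1000):
    ceiling = attainable_flops(intensity, PEAK_FLOPS, PEAK_BANDWIDTH)
    print(f"intensity {intensity:>4} ops/byte -> ceiling {ceiling / 1e12:.0f} TFLOPS")
```

On this assumed machine the ridge sits at 100 operations per byte: below it the ceiling scales with intensity, above it the ceiling stays at peak compute.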

What it tells you

Plot a workload's operational intensity and you immediately see its ceiling. Land left of the ridge and the workload is memory-bound: faster compute will not help, but better data reuse, quantization, or more memory bandwidth will. Land right of it and the workload is compute-bound: only faster math units or lower-precision arithmetic move the needle. The vertical gap between measured performance and the roof shows exactly how much headroom an optimization could recover — and which kind of optimization is worth attempting. That diagnostic power, from two specs and one measurement, is why the model has survived fifteen years of hardware churn. Refinements add extra ceilings below the roof — for cache levels, or for peak throughput without specialized units like tensor cores — but the two-line version answers most real questions.
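The diagnosis above (which side of the ridge, and how far below the roof) reduces to a small calculation. The sketch below assumes you already have a workload's operational intensity and a measured throughput; the `diagnose` helper and all the numbers in the example are hypothetical.

```python
def diagnose(intensity, measured_flops, peak_flops, peak_bandwidth):
    """Classify a workload against the two-line roofline and report headroom."""
    ridge = peak_flops / peak_bandwidth
    roof = min(peak_flops, intensity * peak_bandwidth)
    return {
        # Left of the ridge: bandwidth sets the ceiling. Right of it: compute does.
        "bound": "memory" if intensity < ridge else "compute",
        "roof_flops": roof,
        # Fraction of the ceiling actually achieved.
        "efficiency": measured_flops / roof,
        # Best-case speedup an optimization could recover at this intensity.
        "headroom": roof / measured_flops,
    }


# Hypothetical kernel: 4 ops/byte, measured at 2 TFLOPS,
# on an assumed 100 TFLOPS / 1 TB/s machine.
report = diagnose(intensity=4, measured_flops=2e12,
                  peak_flops=100e12, peak_bandwidth=1e12)
print(report)
```

Here the kernel is memory-bound with a 4 TFLOPS ceiling, so tuning at the same intensity can at best double its speed. Going further requires raising the intensity or the bandwidth, not the math rate.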

The roofline and the local AI box

For a sovereign builder sizing a machine for local inference, the roofline is the right lens because it exposes the trap in GPU marketing. LLM token generation has very low operational intensity: each token streams the full weight set through the chip for little math per byte, so single-user inference lives far left of the ridge point on any modern GPU. On the sloped part of the roof, the spec that predicts your tokens per second is bandwidth, not TFLOPS; two cards with identical compute but different memory systems will generate at visibly different speeds. The model also explains, in one picture, why batching and longer prompts use hardware better (each weight read from memory is reused across more tokens, which slides the workload rightward) and why the same GPU that crawls through single-stream decode can saturate during prefill. Draw the roof for a machine before buying it, place your workload on the x-axis honestly, and the purchasing decision usually makes itself. See compute-bound vs memory-bound for the underlying distinction.
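A back-of-envelope version of this argument can be written down directly. The sketch below rests on simplifying assumptions that are not from the text: every generated token reads each weight once, costs about two operations per parameter, and KV-cache and activation traffic are ignored. The function name, model size, and hardware figures are all hypothetical, and the result is an upper bound, not a prediction.

```python
def decode_tokens_per_second(params, bytes_per_param, peak_flops,
                             peak_bandwidth, batch=1):
    """Roofline upper bound on total tokens/s during LLM decode.

    Assumes weights are streamed once per decode step and shared by the
    whole batch, and roughly 2 operations per parameter per token.
    """
    weight_bytes = params * bytes_per_param
    ops_per_token = 2 * params

    # Memory roof: one pass over the weights yields `batch` tokens.
    memory_limit = batch * peak_bandwidth / weight_bytes
    # Compute roof: independent of batch size.
    compute_limit = peak_flops / ops_per_token
    return min(memory_limit, compute_limit)


# Hypothetical 8B-parameter model on an assumed 100 TFLOPS / 1 TB/s card.
PARAMS, PEAK_FLOPS, PEAK_BANDWIDTH = 8e9, 100e12, 1e12

for label, bytes_per_param, batch in [
    ("16-bit weights, batch 1", 2, 1),
    ("8-bit weights, batch 1", 1, 1),
    ("16-bit weights, batch 8", 2, 8),
    ("16-bit weights, batch 200", 2, 200),
]:
    tps = decode_tokens_per_second(PARAMS, bytes_per_param,
                                   PEAK_FLOPS, PEAK_BANDWIDTH, batch)
    print(f"{label}: up to {tps:.1f} tokens/s")
```

Under these assumptions, single-stream decode is capped by bandwidth at a small fraction of what the compute units could deliver, halving the bytes per weight doubles the cap, and only a large batch reaches the flat part of the roof. Halving `peak_bandwidth` while keeping `peak_flops` fixed halves the batch-1 result, which is the "identical compute, different memory systems" case.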

The model's honesty is its charm: it refuses to promise what hardware cannot deliver, and it makes hand-waving expensive. Teams that keep a roofline chart for their target machines catch mismatched purchases and futile optimizations early — spending a week accelerating the math in a memory-bound kernel is exactly the mistake the chart exists to prevent. The discipline translates to any measured system a miner already runs: identify which resource is the binding constraint before spending on the other one. A dashboard full of specs is decoration; a single chart that says "this is your ceiling, and this is why" is engineering.
