Definition
Model FLOPs Utilization (MFU) measures how efficiently an AI workload uses its hardware. It is the ratio of the useful floating-point operations the model actually requires per second to the theoretical peak FLOPS of the hardware. Popularized by Google's PaLM team, MFU has become the standard apples-to-apples efficiency metric across different accelerators and software stacks.
How it is computed
You take the number of FLOPs needed for one forward (and, in training, backward) pass of the model, multiply by the throughput in tokens or samples per second, and divide by the cluster's aggregate peak FLOPS. Because the numerator counts only the model's intrinsic, implementation-independent arithmetic, MFU rewards genuine efficiency rather than wasted or redundant computation, and lets you compare a run on one chip directly against a run on another.
What good looks like
Achieving 100% MFU is impossible in practice: memory bandwidth limits, inter-GPU communication, pipeline bubbles, and software overhead all eat into it. Well-tuned large-model training typically lands in the 35–55% range — for reference, Llama 3.1 training reported roughly 38–43% MFU on H100 clusters. A low MFU signals that the hardware is stalling, often because the workload has slid into a memory-bound regime rather than staying compute-bound.
MFU turns the abstract roofline into a single number you can track and improve. See FLOPS and the roofline model for the theory behind why the gap from 100% exists.
In Simple Terms
Model FLOPs Utilization (MFU) measures how efficiently an AI workload uses its hardware. It is the ratio of the useful floating-point operations the model actually…
