Definition
FLOPS (floating-point operations per second) is the standard measure of a processor's raw arithmetic throughput: how many floating-point calculations it can perform each second. Because training and running neural networks is overwhelmingly floating-point matrix math, FLOPS is the headline number vendors quote when comparing AI accelerators. It scales with the usual SI prefixes: one GFLOPS is 10⁹ operations per second, one TFLOPS is 10¹², and one PFLOPS is 10¹⁵.
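To make the prefixes concrete, here is a minimal Python sketch that formats a raw operations-per-second count and estimates a theoretical peak. The function names and the hardware figures in the usage note are illustrative assumptions, not taken from any vendor's datasheet; the peak formula (execution units × clock × operations per unit per cycle) is the usual back-of-the-envelope estimate, not a measured value.

```python
# SI prefixes used with FLOPS, largest first.
_PREFIXES = [
    (1e15, "PFLOPS"),
    (1e12, "TFLOPS"),
    (1e9, "GFLOPS"),
]


def format_flops(ops_per_second: float) -> str:
    """Render an operations-per-second count with the largest fitting prefix."""
    for scale, unit in _PREFIXES:
        if ops_per_second >= scale:
            return f"{ops_per_second / scale:.1f} {unit}"
    return f"{ops_per_second:.0f} FLOPS"


def theoretical_peak(units: int, clock_hz: float, flops_per_unit_per_cycle: int) -> float:
    """Estimate peak FLOPS as units x clock x operations per unit per cycle.

    A fused multiply-add is conventionally counted as two operations,
    so a unit that issues one FMA per cycle contributes 2 here.
    """
    return units * clock_hz * flops_per_unit_per_cycle
```

For a hypothetical chip with 10,000 execution units at 2 GHz, each issuing one fused multiply-add per cycle, `format_flops(theoretical_peak(10_000, 2e9, 2))` gives `"40.0 TFLOPS"`.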
Precision changes the number
A FLOPS figure is meaningless without its precision. The same chip delivers very different numbers at FP64 (double), FP32 (single), FP16/BF16 (16-bit), and FP8. On AI accelerators, each halving of bit width roughly doubles peak throughput, though the ratio varies by chip: many consumer GPUs run FP64 at a small fraction of their FP32 rate. Modern AI accelerators advertise their biggest numbers at low precision (FP16/FP8) because that is what neural networks use, and those gains come from dedicated matrix units such as NVIDIA's Tensor Cores. Always compare like precision for like.
Why FLOPS is only half the story for inference
Peak FLOPS describes the compute ceiling, but real workloads rarely hit it. Training typically achieves only 30–50% of peak, a ratio known as model FLOPs utilization (MFU), because the cores stall waiting on data. For inference, especially generating one token at a time, the binding constraint is usually memory bandwidth, not FLOPS: every generated token requires reading each weight from memory once while performing only a couple of operations on it, so the hardware spends its time streaming weights, not multiplying. This is why a card's real-world tokens per second often tracks bandwidth better than its TFLOPS rating.
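The gap between the two ceilings can be sketched with simple arithmetic. The Python example below assumes single-stream decoding (batch size 1), that every weight is read from memory once per token, and that each weight costs about two floating-point operations (one multiply, one add); it ignores the KV cache, activations, and any overlap of compute with memory traffic. The function name and the hardware figures in the usage note are hypothetical, chosen only to illustrate the ratio.

```python
def decode_ceilings(
    params: float,
    bytes_per_param: float,
    bandwidth_bytes_per_s: float,
    peak_flops: float,
) -> tuple[float, float]:
    """Return (bandwidth-bound, compute-bound) upper limits in tokens per second.

    Assumes batch size 1: each token reads all weights once and performs
    roughly 2 FLOPs per weight.
    """
    bytes_per_token = params * bytes_per_param   # weight traffic per token
    flops_per_token = 2 * params                 # multiply + add per weight
    bandwidth_bound = bandwidth_bytes_per_s / bytes_per_token
    compute_bound = peak_flops / flops_per_token
    return bandwidth_bound, compute_bound


# Hypothetical card: 1 TB/s memory bandwidth, 100 TFLOPS at FP16.
# Hypothetical model: 7 billion parameters stored in FP16 (2 bytes each).
bw_limit, compute_limit = decode_ceilings(7e9, 2, 1e12, 100e12)
print(f"bandwidth ceiling: {bw_limit:.0f} tokens/s")
print(f"compute ceiling:   {compute_limit:.0f} tokens/s")
```

Under these assumptions the bandwidth ceiling is about 71 tokens per second while the compute ceiling is about 7,143, so the card would run out of memory bandwidth roughly a hundred times sooner than it runs out of arithmetic.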
When sizing hardware for a local LLM, read FLOPS alongside memory bandwidth and VRAM — together they predict usable performance far better than any single number.
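As a first pass at the VRAM part of that check, the Python sketch below estimates the weight footprint of a model and compares it against a card's memory. The 20% headroom reserved for the KV cache, activations, and runtime overhead is an assumed placeholder, not a measured figure, and the function names are illustrative; real requirements depend on context length and the inference runtime.

```python
def weights_gb(params: float, bits_per_param: float) -> float:
    """Approximate size of the model weights in gigabytes (10^9 bytes)."""
    return params * bits_per_param / 8 / 1e9


def fits_in_vram(
    params: float,
    bits_per_param: float,
    vram_gb: float,
    headroom_fraction: float = 0.2,  # assumed reserve, adjust for your workload
) -> bool:
    """Check whether the weights fit after reserving a share of VRAM as headroom."""
    usable_gb = vram_gb * (1 - headroom_fraction)
    return weights_gb(params, bits_per_param) <= usable_gb
```

With these assumptions, a 7-billion-parameter model at 16 bits needs about 14 GB and fits a 24 GB card, while a 70-billion-parameter model quantized to 4 bits needs about 35 GB and does not.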
