BF16 (bfloat16)

Sovereign AI

BF16, or bfloat16 (brain floating-point), is a 16-bit numeric format developed by the Google Brain team for neural-network workloads. It allocates one sign bit, eight exponent bits, and seven mantissa bits. That eight-bit exponent is the entire point: it gives BF16 the same dynamic range as 32-bit FP32, just with far less precision per value. The design encodes a hard-won lesson from training large networks: neural nets are far more sensitive to the range of representable values (the exponent) than to their precision (the mantissa).

Range over precision

The older IEEE FP16 format spends its 16 bits differently — a 5-bit exponent and 10-bit mantissa. The extra mantissa sounds better, but the narrow exponent means small gradient values underflow to zero and large activations overflow, destabilizing training. Practitioners compensate with loss scaling, multiplying the loss so gradients shift into FP16's representable band and unscaling before the weight update — workable, but one more thing to tune and one more way for training to silently fail. BF16 sidesteps the whole problem: because it preserves FP32's exponent range, values that would vanish in FP16 survive untouched, and models can usually be dropped into BF16 with no loss scaling and no hyperparameter retuning. Converting between BF16 and FP32 is also trivial — truncate or pad the mantissa — which simplifies mixed-precision pipelines.

The precision trade

Seven mantissa bits is coarse: BF16 can only distinguish about two to three significant decimal digits. Deep networks tolerate this remarkably well because their statistics average over millions of parameters, but the tolerance has limits. The standard mixed-precision recipe therefore keeps a full-precision FP32 copy of the master weights for the optimizer update, accumulates matrix-multiply results in FP32, and uses BF16 for the bulk storage and arithmetic in between. You get most of the memory and speed win while the numerically delicate steps stay precise.

Why sovereign builders care

For anyone doing fine-tuning or training on owned hardware rather than rented cloud, BF16 roughly halves the memory and bandwidth cost of weights, activations, and gradients compared to FP32. Half the bytes means larger models fit in the same VRAM, bigger batches fit per step, and the same card simply goes further. Modern accelerators implement BF16 natively in their matrix-multiply units, so the speedup is real silicon throughput, not just storage savings. BF16 is also the format most open-weight model checkpoints ship in, making it the natural starting point before any further compression.

Where it sits in the precision ladder

BF16 occupies the safe middle of the reduced-precision toolkit: FP32 above it for numerically fragile steps, FP8 and integer formats below it for even more aggressive savings. For inference specifically, weights are often compressed further still — see quantization — but training and fine-tuning workflows keep coming back to BF16 because it is the format that just works: FP32's range, half the memory, and no scaling gymnastics.

BF16, or bfloat16 (brain floating-point), is a 16-bit numeric format developed by the Google Brain team for neural-network workloads. It allocates one sign bit, eight…

Explore the Full Glossary

Browse all Bitcoin mining terms from A to Z. Whether you are a beginner or expert, deepen your understanding of the mining ecosystem.

Mining Glossary

ASIC Miner Database

Compare 500+ miners with real-time profitability data, home mining scores, and detailed specs.

Compare Miners