Definition
Master weights are a full-precision FP32 copy of a model's parameters maintained throughout mixed-precision training. The forward and backward passes run on 16-bit (FP16 or BF16) weights, activations, and gradients for speed and memory savings, but the authoritative weights stay in FP32, and those are the ones the optimizer updates. This FP32 copy is one of the three pillars of stable mixed-precision training, alongside loss scaling and accumulating 16-bit products into FP32.
Why a separate copy is needed
During each optimizer step, the update applied to a weight is roughly the gradient times the learning rate, a product that is frequently far smaller than the weight itself. A 16-bit format cannot represent that sum: FP16 carries an 11-bit significand, so an update below about 2^-11 (1/2048) of the weight's magnitude can round away entirely when added, and BF16, with an 8-bit significand, loses updates below about 2^-8 (1/256). When that happens on every step, the parameter effectively stops learning. Keeping the running weights in FP32 preserves these small but meaningful updates across thousands of steps, letting mixed-precision training match the final accuracy of full FP32 training.
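The rounding effect can be reproduced without any ML library. The sketch below is illustrative only: `to_fp16` and `to_fp32` are hypothetical helpers that emulate each format by round-tripping a Python float through the standard `struct` module, and the weight of 1.0 and the step of 1e-4 are assumed values chosen to make the effect visible.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (FP32) value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Assumed magnitude of learning_rate * gradient, stored in FP16.
step = to_fp16(1e-4)

w16 = to_fp16(1.0)  # weight kept only in FP16
w32 = to_fp32(1.0)  # FP32 master weight

for _ in range(1000):
    # Just below 1.0, FP16 values are about 4.9e-4 apart, so subtracting
    # 1e-4 rounds straight back to 1.0 on every single step.
    w16 = to_fp16(w16 - step)
    # FP32 values near 1.0 are about 6e-8 apart, so the update survives.
    w32 = to_fp32(w32 - step)

print(w16)  # still exactly 1.0: all 1000 updates were lost
print(w32)  # close to 0.9: the updates accumulated
```

The FP16-only weight never moves, while the FP32 copy ends up near 0.9, which is the behavior the master copy exists to preserve.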
The training loop in practice
Each iteration casts the FP32 master weights down to a 16-bit copy for the forward and backward passes, computes gradients cheaply in 16-bit, then applies those gradients to the FP32 master copy. Only the FP32 version persists between steps; the 16-bit copy is regenerated from it every iteration. Relative to running everything in FP32, this pattern halves the storage and bandwidth of the compute-heavy passes while protecting numerical fidelity where it matters most.
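The loop described above can be sketched for a single scalar weight. This is a toy under stated assumptions, not a real framework's implementation: `train`, `to_fp16`, and `to_fp32` are hypothetical names, the model is `pred = w * x` with squared-error loss, the optimizer is plain SGD, and the precisions are emulated with the standard `struct` module rather than real 16-bit hardware. Loss scaling is omitted.

```python
import struct

def to_fp16(x: float) -> float:
    """Emulate FP16 by rounding to the nearest half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def to_fp32(x: float) -> float:
    """Emulate FP32 by rounding to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def train(steps: int = 2000, lr: float = 1e-3,
          x: float = 1.0, y: float = 2.0) -> float:
    """Fit pred = w * x to the target y; return the FP32 master weight."""
    master_w = to_fp32(1.0)  # the only copy that persists between steps
    for _ in range(steps):
        # 1. Cast the master weight down to a 16-bit working copy.
        w16 = to_fp16(master_w)
        # 2. Forward pass in 16-bit.
        pred = to_fp16(w16 * x)
        err = to_fp16(pred - y)
        # 3. Backward pass in 16-bit: d(0.5 * err**2)/dw = err * x.
        grad16 = to_fp16(err * x)
        # 4. Apply the 16-bit gradient to the FP32 master weight.
        master_w = to_fp32(master_w - lr * grad16)
    return master_w

final_w = train()
print(final_w)  # moves steadily from 1.0 toward the target of 2.0
```

Note that `w16` is never updated directly. It is thrown away and rebuilt from `master_w` each iteration, so rounding in the working copy never compounds across steps.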
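Master weights are a major reason a model can train in BF16, or in even narrower formats such as FP8, without losing accuracy. They also cost 4 bytes per parameter on top of the low-precision weights, a meaningful part of the optimizer state memory budget that self-hosted builders must plan around.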
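For a rough sense of that budget, the sketch below adds up per-parameter storage. The breakdown is an assumption rather than something stated above: it supposes 16-bit weights and gradients, an FP32 master copy, and an Adam-style optimizer holding two FP32 moment estimates. The function name `bytes_per_param` and the 7-billion-parameter model size are hypothetical, and activations and framework overhead are ignored.

```python
def bytes_per_param(weight_bytes: int = 2, grad_bytes: int = 2,
                    master_bytes: int = 4, moment_bytes: int = 4) -> int:
    """Per-parameter training state, assuming two Adam-style moments."""
    return weight_bytes + grad_bytes + master_bytes + 2 * moment_bytes

n_params = 7_000_000_000  # hypothetical model size

total_gb = n_params * bytes_per_param() / 1e9  # all training state
master_gb = n_params * 4 / 1e9                 # FP32 master weights alone

print(bytes_per_param())  # 16 bytes per parameter under these assumptions
print(total_gb)           # 112.0 (decimal GB)
print(master_gb)          # 28.0 (decimal GB)
```

Under these assumptions the master copy accounts for a quarter of the per-parameter training state, before activations are counted.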
