ZeRO (Zero Redundancy Optimizer)

Definition

The Zero Redundancy Optimizer (ZeRO) is a family of memory optimizations in Microsoft's DeepSpeed library. Ordinary data parallelism wastes memory by storing identical copies of the optimizer states, gradients, and parameters on every device. ZeRO removes that redundancy by partitioning these training states across the available devices, so that each of N devices holds roughly 1/N of them, while preserving the simplicity and compute efficiency of data parallelism.
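A toy, single-process simulation can make the partitioning idea concrete. This is an illustration under stated assumptions, not DeepSpeed's implementation: four simulated ranks each compute different local gradients, and the ZeRO-style step keeps each rank's momentum buffer and averaged gradients only for its own contiguous shard, simulating a reduce-scatter and an all-gather with plain loops. SGD with momentum stands in for Adam, and the names `shard_range`, `replicated_step`, and `zero_step` are hypothetical.

```python
"""Toy, single-process simulation of ZeRO-style partitioning (stages 1-2).

Illustrative assumptions, not DeepSpeed's implementation:
- "Devices" are simulated ranks inside one Python process.
- SGD with momentum stands in for Adam; its per-element momentum buffer
  plays the role of the partitioned optimizer state.
- Reduce-scatter and all-gather are simulated with plain loops.
"""
from typing import List, Tuple


def shard_range(n_params: int, world: int, rank: int) -> range:
    """Contiguous slice of the flat parameter vector owned by `rank`."""
    size = (n_params + world - 1) // world  # ceiling division
    start = rank * size
    return range(start, min(start + size, n_params))


def replicated_step(params: List[float], grads: List[List[float]],
                    momentum: List[float], lr: float, beta: float
                    ) -> Tuple[List[float], List[float]]:
    """Plain data parallelism: all-reduce gradients, then every rank
    updates a full copy of params and momentum (one copy shown)."""
    world = len(grads)
    avg = [sum(g[i] for g in grads) / world for i in range(len(params))]
    new_m = [beta * m + a for m, a in zip(momentum, avg)]
    new_p = [p - lr * m for p, m in zip(params, new_m)]
    return new_p, new_m


def zero_step(params: List[float], grads: List[List[float]],
              momentum_shards: List[List[float]], lr: float, beta: float
              ) -> Tuple[List[float], List[List[float]]]:
    """ZeRO-style step: each rank stores momentum only for its own shard
    and keeps only the averaged gradients for that shard."""
    world = len(grads)
    new_shards, new_momentum = [], []
    for rank in range(world):
        idx = shard_range(len(params), world, rank)
        # Reduce-scatter: this rank receives the averaged gradient of its slice only.
        g_shard = [sum(g[i] for g in grads) / world for i in idx]
        m_shard = [beta * m + g for m, g in zip(momentum_shards[rank], g_shard)]
        p_shard = [params[i] - lr * m for i, m in zip(idx, m_shard)]
        new_momentum.append(m_shard)
        new_shards.append(p_shard)
    # All-gather: concatenate updated shards so every rank has full params again.
    new_params = [x for shard in new_shards for x in shard]
    return new_params, new_momentum


if __name__ == "__main__":
    world, n, lr, beta = 4, 10, 0.1, 0.9
    params = [float(i) for i in range(n)]
    # Each simulated rank computed different local gradients on its micro-batch.
    grads = [[0.1 * (r + 1) * (i + 1) for i in range(n)] for r in range(world)]

    ref_params, ref_m = replicated_step(params, grads, [0.0] * n, lr, beta)
    shards = [[0.0] * len(shard_range(n, world, r)) for r in range(world)]
    zero_params, zero_m = zero_step(params, grads, shards, lr, beta)

    assert all(abs(a - b) < 1e-12 for a, b in zip(ref_params, zero_params))
    print("momentum elements per rank:", [len(s) for s in zero_m])  # [3, 3, 3, 1]
    print("replicated baseline per rank:", len(ref_m))               # 10
```

Each rank ends up storing 3, 3, 3, and 1 momentum elements instead of all 10, yet the updated parameters match the replicated baseline. With Adam, the momentum, the variance, and the fp32 master weights would all be sharded the same way.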

The three stages

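ZeRO is applied incrementally. Stage 1 partitions the optimizer states (such as Adam's momentum and variance, plus the fp32 master copy of the weights), which are the largest memory consumer in mixed-precision training, without increasing communication volume over standard data parallelism. Stage 2 additionally partitions the gradients, so each device keeps only the gradients matching its optimizer shard; communication volume still stays the same. Stage 3 goes furthest and partitions the model parameters themselves: each layer's parameters are gathered from the devices that own them just before they are needed in the forward and backward passes, then released again. Stage 3's per-device memory for model states therefore shrinks linearly with the number of devices, enabling models with hundreds of billions of parameters, at the cost of roughly 1.5 times the communication volume of standard data parallelism.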
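The stage-by-stage savings can be written out for model states under a common accounting for mixed-precision Adam. Assume Ψ parameters, N_d data-parallel devices, 2 bytes per parameter each for the 16-bit weights and gradients, and K = 12 bytes per parameter for the fp32 master weights, momentum, and variance. Activations and temporary buffers are not counted, so these are lower bounds on real per-device usage.

```latex
\begin{aligned}
M_{\text{DP}}      &= (2 + 2 + K)\,\Psi \\
M_{\text{stage 1}} &= (2 + 2)\,\Psi + \frac{K\,\Psi}{N_d} \\
M_{\text{stage 2}} &= 2\,\Psi + \frac{(2 + K)\,\Psi}{N_d} \\
M_{\text{stage 3}} &= \frac{(2 + 2 + K)\,\Psi}{N_d}
\end{aligned}
```

For example, with Ψ = 7.5 billion, K = 12, and N_d = 64, these evaluate to 120 GB, about 31.4 GB, about 16.6 GB, and about 1.9 GB per device (1 GB = 10^9 bytes). Only the stage 3 expression divides every term by N_d, which is why its reduction is linear in the device count.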

Picking a stage

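Higher stages save more memory, but not for free: stage 3 moves more data across the interconnect than stages 1 and 2, and offloading adds traffic over the link between GPU and host. A common practice is to start at the lowest stage that fits the model and escalate only when memory still overflows, balancing savings against communication overhead. For the most extreme cases, ZeRO can also offload partitioned states to CPU memory or NVMe (the ZeRO-Offload and ZeRO-Infinity extensions).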
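The "lowest stage that fits" rule can be sketched in Python. The helpers `model_state_bytes`, `pick_zero_stage`, and `make_ds_config` are hypothetical, and the memory model (16-bit weights and gradients, 12 bytes of Adam state per parameter, activations ignored) is an assumption. Only the keys inside the returned dictionary (`train_batch_size`, `bf16`, `zero_optimization`, `stage`, `offload_optimizer`, `offload_param`) follow DeepSpeed's JSON configuration format; the batch size and precision settings are placeholders.

```python
"""Hypothetical helpers: choose the lowest ZeRO stage that fits, then
build a DeepSpeed-style config dict.

Assumptions (illustrative, not part of DeepSpeed's API):
- Mixed-precision Adam: 2 bytes/param for 16-bit weights, 2 bytes/param
  for 16-bit gradients, K = 12 bytes/param of optimizer state
  (fp32 master weights, momentum, variance).
- Only model states are counted; activations, communication buffers and
  fragmentation are ignored, so leave generous headroom in the budget.
"""
from typing import Optional

K_OPTIMIZER_BYTES = 12  # bytes of Adam state per parameter (assumption)


def model_state_bytes(num_params: int, num_devices: int, stage: int) -> float:
    """Per-device bytes for weights, gradients and optimizer state."""
    psi, n, k = num_params, num_devices, K_OPTIMIZER_BYTES
    if stage == 0:  # plain data parallelism: everything replicated
        return (2 + 2 + k) * psi
    if stage == 1:  # optimizer states partitioned
        return (2 + 2) * psi + k * psi / n
    if stage == 2:  # optimizer states + gradients partitioned
        return 2 * psi + (2 + k) * psi / n
    if stage == 3:  # optimizer states + gradients + parameters partitioned
        return (2 + 2 + k) * psi / n
    raise ValueError(f"unknown ZeRO stage: {stage}")


def pick_zero_stage(num_params: int, num_devices: int,
                    budget_bytes: float) -> Optional[int]:
    """Lowest stage (0 = ZeRO disabled) whose model states fit, else None."""
    for stage in range(4):
        if model_state_bytes(num_params, num_devices, stage) <= budget_bytes:
            return stage
    return None


def make_ds_config(stage: Optional[int]) -> dict:
    """Build a minimal DeepSpeed-style config for the chosen stage."""
    if stage is None:
        # Nothing fits on the GPUs: fall back to stage 3 and offload the
        # optimizer states and parameters to CPU memory.
        zero = {
            "stage": 3,
            "offload_optimizer": {"device": "cpu"},
            "offload_param": {"device": "cpu"},
        }
    else:
        zero = {"stage": stage}
    return {
        "train_batch_size": 32,     # placeholder; set for your workload
        "bf16": {"enabled": True},  # assumes bf16-capable hardware
        "zero_optimization": zero,
    }


if __name__ == "__main__":
    psi, gpus = 7_500_000_000, 64
    for budget_gb in (32, 16, 1):
        stage = pick_zero_stage(psi, gpus, budget_gb * 1e9)
        print(budget_gb, "GB ->", stage, make_ds_config(stage)["zero_optimization"])
    # 32 GB -> stage 1, 16 GB -> stage 3, 1 GB -> None (stage 3 + CPU offload)
```

With 7.5 billion parameters on 64 devices, a 32 GB budget stops at stage 1 (about 31.4 GB of model states), while a 16 GB budget skips stage 2 (about 16.6 GB) and lands on stage 3. Because activations are excluded, a result that only barely fits is a reason to escalate anyway.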

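ZeRO is the conceptual ancestor of PyTorch's Fully Sharded Data Parallel (FSDP) and an extension of Data Parallelism. Its stages shrink the memory used by model states, not activations, so combining it with Gradient Checkpointing, which trades recomputation for activation memory, pushes memory limits further.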
