Fully Sharded Data Parallel (FSDP)

Definition

Fully Sharded Data Parallel (FSDP) is PyTorch's memory-efficient evolution of data parallelism. Where standard DistributedDataParallel keeps a full copy of the model parameters, gradients, and optimizer states on every device, FSDP shards all three across the data-parallel workers. Each device permanently holds only its own slice, so the per-device memory needed for model state shrinks roughly in proportion to the number of workers. That makes very large models trainable on hardware that could never hold a full replica.
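
To make the saving concrete, here is a rough back-of-the-envelope sketch. The function name and the 16 bytes per parameter are assumptions for illustration, not figures from this entry: they correspond to mixed-precision Adam training (2 bytes of fp16 weights, 2 bytes of fp16 gradients, 12 bytes of fp32 optimizer state). Activations and temporary buffers are ignored.

```python
def per_device_gib(num_params, world_size, sharded, bytes_per_param=16):
    """Approximate GiB of model state (params + grads + optimizer) on one device.

    bytes_per_param=16 is an assumed figure for mixed-precision Adam:
    2 (fp16 weights) + 2 (fp16 gradients) + 12 (fp32 master weights,
    momentum, and variance). Activations are not counted.
    """
    total_bytes = num_params * bytes_per_param
    if sharded:
        # Full sharding: each device keeps only its 1/world_size slice.
        total_bytes = total_bytes / world_size
    return total_bytes / 2**30


# Hypothetical 7-billion-parameter model on 8 devices.
replicated = per_device_gib(7_000_000_000, 8, sharded=False)
fully_sharded = per_device_gib(7_000_000_000, 8, sharded=True)
print(f"replicated: {replicated:.1f} GiB, fully sharded: {fully_sharded:.1f} GiB")
```

Under these assumptions the replicated model state is about 104 GiB per device, while the fully sharded version is about 13 GiB per device.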

Gather, compute, discard

FSDP works by gathering the full parameters for a layer only at the moment they are needed. Before a layer's forward or backward pass, an all-gather reconstructs its complete weights on each device; once the computation finishes, the non-local shards are freed again. After the backward pass, gradients are combined with a reduce-scatter, so each device keeps only the reduced gradient slice that matches its parameter shard. Layers are usually wrapped in a nested fashion, so that roughly one wrapped unit's full parameters live in memory at a time.
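
The gather, compute, discard cycle can be illustrated with a toy single-process simulation. This is a sketch only: it uses plain Python lists instead of tensors, the helper names `all_gather` and `reduce_scatter` are stand-ins written for this example (not calls into PyTorch), and gradients are summed rather than averaged to keep the arithmetic easy to follow.

```python
def all_gather(shards):
    """Every rank receives the concatenation of all ranks' shards."""
    return [x for shard in shards for x in shard]


def reduce_scatter(per_rank_grads, world_size):
    """Sum gradients elementwise across ranks; rank r keeps only slice r."""
    summed = [sum(vals) for vals in zip(*per_rank_grads)]
    size = len(summed) // world_size
    return [summed[r * size:(r + 1) * size] for r in range(world_size)]


world_size = 2
# A toy 4-weight layer y = w . x, split evenly: each rank stores 2 weights.
param_shards = [[1.0, 2.0], [3.0, 4.0]]
# Data parallelism: each rank processes a different input sample.
inputs = [[1.0, 0.0, 2.0, 0.0], [0.0, 3.0, 0.0, 1.0]]

outputs, per_rank_grads = [], []
for rank in range(world_size):
    full_w = all_gather(param_shards)                      # gather
    y = sum(w * x for w, x in zip(full_w, inputs[rank]))   # compute (forward)
    outputs.append(y)
    per_rank_grads.append(list(inputs[rank]))              # dy/dw = x for this layer
    del full_w                                             # discard the full copy

# Each rank ends up with only the gradient slice for its own parameter shard.
grad_shards = reduce_scatter(per_rank_grads, world_size)
print(outputs, grad_shards)
```

Note that the full weight vector exists only inside the loop body; outside it, each rank holds just two weights and two gradient values.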

The trade-off

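This sharding trades communication for memory. FSDP moves more data across the interconnect than plain replication, because parameters must be all-gathered before they are used, but in return it fits larger models and bigger batches. Sharding strategies are configurable: FULL_SHARD shards parameters, gradients, and optimizer states (comparable to ZeRO stage 3), while SHARD_GRAD_OP shards gradients and optimizer states but keeps the gathered parameters in memory between the forward and backward pass (comparable to ZeRO stage 2), saving one all-gather at the cost of more memory. Sharded parameters can also be offloaded to CPU for extreme cases.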
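
As a rough way to size the communication side of this trade-off, the sketch below counts how many times one model's worth of data crosses the interconnect per device in a training step. The counts are approximations taken from the usual ZeRO-style analysis under assumed ring collectives (they ignore the (N-1)/N factor, overlap with compute, and CPU offload traffic), and the function name is made up for this example.

```python
# Approximate number of "full model" passes sent per device per training step.
# These are assumed, idealized counts, not measurements.
COLLECTIVE_PASSES = {
    "DDP": 2,            # gradient all-reduce = reduce-scatter + all-gather
    "SHARD_GRAD_OP": 2,  # one parameter all-gather + one gradient reduce-scatter
    "FULL_SHARD": 3,     # all-gather in forward, again in backward, + reduce-scatter
}


def comm_gib_per_step(num_params, strategy, bytes_per_element=2):
    """Approximate GiB moved per device per step (2 bytes = fp16/bf16)."""
    return COLLECTIVE_PASSES[strategy] * num_params * bytes_per_element / 2**30


# Hypothetical 7-billion-parameter model.
ddp = comm_gib_per_step(7_000_000_000, "DDP")
grad_op = comm_gib_per_step(7_000_000_000, "SHARD_GRAD_OP")
full = comm_gib_per_step(7_000_000_000, "FULL_SHARD")
print(f"DDP {ddp:.1f} GiB, SHARD_GRAD_OP {grad_op:.1f} GiB, FULL_SHARD {full:.1f} GiB")
```

In this idealized model FULL_SHARD moves about 1.5 times the data of plain replication, while SHARD_GRAD_OP matches it; real-world overhead depends heavily on interconnect bandwidth and how well communication overlaps with compute.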

FSDP is closely related to the ZeRO (DeepSpeed) approach it draws from. Both build on Data Parallelism, and FSDP pairs naturally with Gradient Checkpointing for further memory savings.
