Gradient Checkpointing

Sovereign AI

Definition

Gradient checkpointing, also called activation checkpointing, is a technique that trades extra computation for reduced memory. During an ordinary training forward pass, the framework stores every intermediate activation so that it is available for the backward pass. For large models these activations often dominate memory use. Gradient checkpointing keeps only a strategic subset (the checkpoints) and frees the rest, then recomputes the discarded activations on the fly during backpropagation, starting from the nearest saved checkpoint.
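The mechanism can be illustrated without any deep-learning framework. The following is a minimal sketch, assuming a toy "network" made of scalar functions with hand-written derivatives; the names `LAYERS`, `grad_full`, and `grad_checkpointed` are hypothetical and chosen for illustration only. It compares ordinary backpropagation, which keeps every layer input, with a checkpointed version that keeps one input per segment and recomputes the rest.

```python
import math

# Toy "network": a chain of scalar layers, each given as (function, derivative).
# These are hypothetical stand-ins for real layers, chosen so values stay bounded.
_BASE = [
    (math.tanh, lambda x: 1.0 - math.tanh(x) ** 2),
    (math.sin, math.cos),
    (lambda x: 0.5 * x + 1.0, lambda x: 0.5),
    (lambda x: x * x, lambda x: 2.0 * x),
]
LAYERS = _BASE * 4  # 16 layers


def grad_full(x):
    """Ordinary backprop: store the input of every layer."""
    inputs = []
    for f, _ in LAYERS:
        inputs.append(x)
        x = f(x)
    grad = 1.0
    for (_, df), a in zip(reversed(LAYERS), reversed(inputs)):
        grad *= df(a)  # chain rule, last layer first
    return x, grad, len(inputs)


def grad_checkpointed(x, segment=4):
    """Store only the input of each segment; recompute the rest in backward."""
    checkpoints = {}
    for i, (f, _) in enumerate(LAYERS):
        if i % segment == 0:
            checkpoints[i] = x  # saved checkpoint
        x = f(x)                # other activations are not kept
    out, grad, peak = x, 1.0, 0
    for start in sorted(checkpoints, reverse=True):
        seg = LAYERS[start:start + segment]
        # Re-run the forward pass for this segment from its checkpoint.
        a = checkpoints[start]
        inputs = []
        for f, _ in seg:
            inputs.append(a)
            a = f(a)
        # Live values: remaining checkpoints plus the recomputed inputs
        # (the first recomputed input is the checkpoint itself).
        peak = max(peak, len(checkpoints) + len(inputs) - 1)
        for (_, df), a_in in zip(reversed(seg), reversed(inputs)):
            grad *= df(a_in)
        del checkpoints[start]  # this segment is finished
    return out, grad, peak


y_full, g_full, stored_full = grad_full(0.3)
y_ckpt, g_ckpt, stored_ckpt = grad_checkpointed(0.3, segment=4)
print(stored_full, stored_ckpt)
```

Both functions return the same output and gradient; only the number of values held at once differs. In a real framework the same idea is applied to tensors rather than scalars, and the framework handles the recomputation.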

The compute-for-memory bargain

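Because the freed activations must be regenerated, the forward computation for checkpointed regions effectively runs twice: once in the forward pass and once more during backpropagation. The savings depend on model depth and on where the checkpoints are placed. In the classic scheme, checkpointing roughly every √n layers of an n-layer network reduces stored activations from the order of n to the order of √n. In practice this can cut activation memory by roughly 5x at the cost of a typical training-throughput hit in the 25-35% range. For a sovereign builder running on a single GPU or a small rig, that bargain often makes the difference between a model that trains and one that simply will not fit.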
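A back-of-the-envelope model shows where figures of this size can come from. This is a rough sketch, not a measurement: it assumes equal-cost layers, uniform segments, one full extra forward pass of recomputation, and a backward pass that costs about twice the forward pass (a common rule of thumb, treated here as an assumption). The function name `checkpoint_cost` and its parameters are hypothetical.

```python
import math


def checkpoint_cost(num_layers, segment, backward_to_forward=2.0):
    """Toy estimate of memory and speed effects of uniform checkpointing.

    Assumptions (not measured): every layer costs the same, the backward
    pass costs `backward_to_forward` times the forward pass, and
    recomputation adds one full extra forward pass.
    """
    num_checkpoints = math.ceil(num_layers / segment)
    # Peak: all checkpoints plus the recomputed activations of one segment
    # (the segment's first input is itself a checkpoint).
    peak_activations = num_checkpoints + segment - 1
    baseline_time = 1.0 + backward_to_forward  # forward + backward
    checkpointed_time = baseline_time + 1.0    # plus one extra forward
    return {
        "peak_activations": peak_activations,
        "memory_reduction": num_layers / peak_activations,
        "step_time_increase": checkpointed_time / baseline_time - 1.0,
        "throughput_drop": 1.0 - baseline_time / checkpointed_time,
    }


result = checkpoint_cost(num_layers=100, segment=10)
print(result)
```

With 100 layers and segments of 10 (the √n choice), this toy model gives about a 5.3x reduction in stored activations, a training step that takes about 33% longer, and therefore about 25% lower throughput. Real models will differ, since layers are not equal in cost and activations are not equal in size.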

Selective recomputation

Modern frameworks offer selective activation checkpointing, which gives finer control over what gets recomputed. The outputs of expensive operations, such as large matrix multiplications, can be kept in memory while the outputs of cheap pointwise operations are discarded and recomputed, recovering most of the memory savings with less of the speed penalty. This tuning is increasingly automated in PyTorch and similar stacks.
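The selective policy can be sketched in the same framework-free style. This is an illustrative toy, assuming scalar stand-ins for layers tagged as `"matmul"` (expensive) or `"pointwise"` (cheap); the names `LAYERS`, `counted`, and `grad_selective` are hypothetical and do not correspond to any library API. Outputs of expensive layers are saved, outputs of cheap layers are dropped and recomputed, and call counters show that the expensive layers never run a second time.

```python
import math

calls = {"matmul": 0, "pointwise": 0}


def counted(kind, f):
    """Wrap a layer so each forward call is counted by kind."""
    def wrapper(x):
        calls[kind] += 1
        return f(x)
    return wrapper


# (kind, forward, derivative). Scalar stand-ins: "matmul" marks an
# expensive op, "pointwise" a cheap one. All values are hypothetical.
LAYERS = [
    ("matmul", counted("matmul", lambda x: 1.5 * x), lambda x: 1.5),
    ("pointwise", counted("pointwise", math.tanh),
     lambda x: 1.0 - math.tanh(x) ** 2),
    ("pointwise", counted("pointwise", lambda x: x + 0.1), lambda x: 1.0),
    ("matmul", counted("matmul", lambda x: -0.7 * x), lambda x: -0.7),
    ("pointwise", counted("pointwise", math.sin), math.cos),
]


def grad_selective(x):
    """Save outputs of expensive layers; recompute cheap ones in backward."""
    saved = {0: x}  # saved[i] is the input of layer i
    for i, (kind, f, _) in enumerate(LAYERS):
        x = f(x)
        if kind == "matmul":
            saved[i + 1] = x  # policy: keep expensive outputs
    grad = 1.0
    for i in reversed(range(len(LAYERS))):
        # Rebuild the input of layer i from the nearest saved value.
        j = max(k for k in saved if k <= i)
        a = saved[j]
        for _, f, _ in LAYERS[j:i]:
            a = f(a)  # only pointwise layers lie between j and i
        grad *= LAYERS[i][2](a)
    return x, grad, len(saved)


y, g, num_saved = grad_selective(0.3)
print(num_saved, calls)
```

Here three values are saved instead of five, the two expensive layers run once each, and only the cheap layers are re-run. The sketch recomputes some cheap layers more than once for simplicity; a real implementation would cache within a segment.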

Gradient checkpointing pairs naturally with sharded training. Combine it with Fully Sharded Data Parallel (FSDP) or ZeRO (DeepSpeed) to stack memory savings, and with Gradient Accumulation to fit larger effective batches.

