Grokking

Sovereign AI

Definition

Grokking is a training phenomenon in which a neural network first memorizes its training data, reaching near-perfect training accuracy while performing poorly on held-out examples, and then, after a long period of apparently stalled progress, abruptly begins to generalize, with test accuracy climbing sharply. First reported by Power and colleagues in 2022 on small models trained on algorithmic tasks such as modular arithmetic, grokking is notable because generalization arrives long after overfitting, sometimes after orders of magnitude more training steps.
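The delay described above can be measured from logged accuracy curves. The following is a minimal sketch, not a standard tool: the function names, the 0.99 threshold, and the sample curves are all illustrative assumptions, and the numbers are synthetic rather than taken from any experiment.

```python
from typing import Optional, Sequence


def first_step_at_or_above(
    steps: Sequence[int], accuracies: Sequence[float], threshold: float
) -> Optional[int]:
    """Return the first logged step whose accuracy meets the threshold, or None."""
    for step, acc in zip(steps, accuracies):
        if acc >= threshold:
            return step
    return None


def grokking_delay(
    steps: Sequence[int],
    train_acc: Sequence[float],
    test_acc: Sequence[float],
    threshold: float = 0.99,  # assumed cutoff for "fits" / "generalizes"
) -> Optional[float]:
    """Ratio of the step where test accuracy crosses the threshold to the
    step where training accuracy crosses it. Returns None if either curve
    never crosses, or if training crosses at step 0."""
    fit_step = first_step_at_or_above(steps, train_acc, threshold)
    gen_step = first_step_at_or_above(steps, test_acc, threshold)
    if not fit_step or gen_step is None:
        return None
    return gen_step / fit_step


# Synthetic, hand-written curves shaped like a grokking run (not real data).
steps = [100, 1_000, 10_000, 100_000]
train_acc = [0.60, 1.00, 1.00, 1.00]
test_acc = [0.05, 0.10, 0.12, 0.995]

delay = grokking_delay(steps, train_acc, test_acc)
print(delay)
```

A ratio well above 1 means generalization arrived long after the training set was fit; on these made-up curves the test set is solved 100 times later than the training set.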

What it reveals about learning

The standard intuition is that a model that has fit the training data perfectly has finished learning. Grokking contradicts this: while training accuracy sits flat at its ceiling, the network keeps reorganizing its internal representations, eventually discovering the underlying rule rather than a lookup table of memorized answers. Research links the transition to regularization pressure (such as weight decay) gradually pushing the model toward a simpler, more general solution once memorization no longer reduces loss.

Why it matters

Grokking is a clean laboratory example of the gap between memorization and true generalization, the same gap that separates a model that has genuinely learned a concept from one that has merely pattern-matched its training set. It cautions practitioners that stopping early on a validation plateau can end training just before the model would have generalized, and it gives theorists a reproducible setting to study how and when deep networks learn structure. For self-hosters, it is mostly a conceptual lesson: model quality is not always visible from training metrics alone.
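The early-stopping caution can be made concrete with a toy example. This is a sketch of a generic patience rule under stated assumptions: `early_stop_step`, the patience values, and the validation curve are hypothetical and hand-written, not drawn from any particular framework or run.

```python
from typing import Optional, Sequence


def early_stop_step(
    val_acc: Sequence[float], patience: int, min_delta: float = 0.0
) -> Optional[int]:
    """Return the evaluation index at which a patience-based rule would halt
    training, or None if it never triggers on this curve."""
    best = float("-inf")
    stale = 0  # consecutive evaluations without improvement
    for i, acc in enumerate(val_acc):
        if acc > best + min_delta:
            best = acc
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return i
    return None


# Synthetic validation curve: a long plateau, then a late jump at index 7.
val_acc = [0.05, 0.10, 0.10, 0.10, 0.10, 0.10, 0.10, 0.95, 0.99]

print(early_stop_step(val_acc, patience=3))   # halts on the plateau
print(early_stop_step(val_acc, patience=10))  # never halts on this curve
```

With a patience of 3 the rule stops at index 4, three evaluations before the jump, while a patience longer than the plateau lets training reach it. How long a plateau lasts in a real run is not known in advance, which is the practical difficulty.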

Grokking sits alongside emergent abilities as a case where capability appears suddenly, and both complicate naive readings of scaling laws.
