Definition
Scaling laws are empirical relationships showing that a language model's prediction error (loss) decreases smoothly and predictably as you increase three quantities: the number of model parameters, the size of the training dataset, and the amount of compute spent on training. Documented by Kaplan and colleagues at OpenAI in 2020, these relationships hold as power laws across many orders of magnitude. When the other two quantities are not the bottleneck, loss falls roughly as a fixed fractional power of each input, so doubling a quantity cuts the loss by a constant factor: a consistent, forecastable improvement.
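To make the power-law shape concrete, here is a minimal sketch assuming the single-variable form L(x) = (x_c / x)^alpha. The function names and the constants ALPHA and X_C are hypothetical placeholders chosen for illustration, not fitted values from any paper.

```python
def power_law_loss(x: float, x_c: float, alpha: float) -> float:
    """Loss under the assumed form L(x) = (x_c / x) ** alpha."""
    return (x_c / x) ** alpha


def doubling_ratio(alpha: float) -> float:
    """Factor by which loss is multiplied when x doubles; independent of x."""
    return 2.0 ** (-alpha)


# Hypothetical constants, for illustration only.
ALPHA = 0.08
X_C = 1e14

for n in (1e8, 1e9, 1e10):
    before = power_law_loss(n, X_C, ALPHA)
    after = power_law_loss(2 * n, X_C, ALPHA)
    # The ratio after / before is the same at every scale.
    print(f"x={n:.0e}  loss={before:.4f}  doubled={after:.4f}  ratio={after / before:.4f}")
```

The printed ratio is identical on every row, which is what "a consistent, forecastable improvement" means in practice: under this form, the relative gain from doubling depends only on the exponent.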
Why predictability changed the field
Before scaling laws, the payoff of building a bigger model was uncertain. The discovery that loss follows a clean curve meant labs could extrapolate: measure performance at small scale, fit the curve, and predict how a model 100x larger would perform before spending the money to train it. This turned model development from guesswork into capital planning and directly motivated the race to ever-larger models.
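The measure, fit, and extrapolate workflow described above can be sketched as a straight-line fit in log-log space. This is a simplified illustration, not any lab's actual procedure: the data points are synthetic (generated from a known power law with hypothetical constants), the function names are invented here, and real measurements would be noisy and may need an extra irreducible-loss term.

```python
import math


def fit_power_law(sizes, losses):
    """Least-squares fit of log(loss) = intercept - alpha * log(size).

    Returns (alpha, intercept).
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return -slope, intercept


def predict_loss(size, alpha, intercept):
    """Evaluate the fitted power law at a new size."""
    return math.exp(intercept - alpha * math.log(size))


# Synthetic "small-scale" measurements from a known curve (hypothetical constants).
TRUE_ALPHA = 0.08
TRUE_SCALE = 1e14
small_sizes = [1e6, 3e6, 1e7, 3e7, 1e8]
small_losses = [(TRUE_SCALE / s) ** TRUE_ALPHA for s in small_sizes]

alpha_hat, intercept_hat = fit_power_law(small_sizes, small_losses)

# Extrapolate to a model 100x larger than the largest one measured.
target_size = 100 * small_sizes[-1]
predicted = predict_loss(target_size, alpha_hat, intercept_hat)
print(f"fitted alpha={alpha_hat:.4f}, predicted loss at {target_size:.0e}: {predicted:.4f}")
```

Because the synthetic data lie exactly on a power law, the fit recovers the exponent and the extrapolation matches the true curve. With real data, the quality of the forecast depends on how well the power-law assumption holds beyond the measured range.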
The limits and the refinement
Scaling laws describe loss on the training objective, not necessarily usefulness on real tasks, and the curve does not promise that the improvement is free: every gain has to be paid for in parameters, data, and compute. Kaplan's original work emphasized growing model size; a 2022 follow-up from DeepMind (the Chinchilla study) corrected the recipe, showing that for compute-optimal training, data should scale in roughly equal proportion with parameters as the compute budget grows. The laws are also empirical observations, not guarantees: they can bend as data runs out or as architectures change.
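As a rough illustration of what scaling data in step with parameters implies, the sketch below splits a compute budget between model size and training tokens. It rests on two assumptions that are not stated in this entry: the common approximation that training costs about 6 FLOPs per parameter per token, and a fixed tokens-per-parameter ratio (20 is a frequently quoted rule of thumb, used here only as a default). The function name is hypothetical.

```python
import math


def compute_optimal_split(compute_flops, tokens_per_param=20.0, flops_per_param_token=6.0):
    """Split a FLOP budget into (parameters, tokens).

    Assumes compute ~= flops_per_param_token * params * tokens and
    tokens = tokens_per_param * params. Both are simplifying assumptions.
    """
    params = math.sqrt(compute_flops / (flops_per_param_token * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens


for budget in (1e21, 4e21, 16e21):
    n_params, n_tokens = compute_optimal_split(budget)
    print(f"C={budget:.0e} FLOPs -> params={n_params:.3e}, tokens={n_tokens:.3e}")
```

Under these assumptions, each 4x increase in compute doubles both the parameter count and the token count, rather than putting most of the extra budget into a larger model.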
The compute-optimal recipe that refined these curves is covered under Chinchilla-optimal training, and the sometimes-abrupt capability jumps that scaling can produce are discussed as emergent abilities.
