Epoch (Training)

Sovereign AI

In model training, an epoch is one complete pass of the entire training dataset through the learning algorithm. During a single epoch, the model sees every example in the dataset once and adjusts its parameters accordingly. Classical training runs usually involve multiple epochs, because a single pass is rarely enough for a model to learn the patterns in the data. As models and datasets have grown, however, the trend has been toward fewer passes over ever-larger corpora.

Epochs, batches, and iterations

An epoch is usually broken into smaller chunks called batches. The model processes one batch at a time, computes the error, and updates its weights after each batch; a full sweep through all the batches makes up one epoch. So if a dataset of 10,000 examples is split into batches of 100, one epoch consists of 10,000 / 100 = 100 iterations (weight updates). Batch size, learning rate, and epoch count interact: bigger batches give smoother but fewer updates per epoch, and learning-rate schedules are typically defined in terms of total training steps or epochs. The number of epochs is itself a hyperparameter, set before training begins and tuned like any other.
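The batch arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not any framework's API: the function names and the drop_last flag (whether to discard a final, smaller batch) are assumptions made for the example.

```python
import math


def iterations_per_epoch(num_examples: int, batch_size: int, drop_last: bool = False) -> int:
    """Weight updates in one epoch, assuming one update per batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if drop_last:
        # Discard the final partial batch, if there is one.
        return num_examples // batch_size
    # Keep the final partial batch as its own (smaller) update.
    return math.ceil(num_examples / batch_size)


def batches(examples, batch_size):
    """Yield consecutive slices of the dataset, one batch at a time."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


dataset = list(range(10_000))  # stand-in for 10,000 training examples
updates_in_one_epoch = sum(1 for _ in batches(dataset, 100))

print(iterations_per_epoch(10_000, 100))  # 100
print(updates_in_one_epoch)               # 100
```

When the dataset size is not a multiple of the batch size, the last batch is smaller; whether it is kept or dropped changes the iteration count by one.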

Choosing how many

The number of epochs is a balancing act. Too few and the model underfits, never fully learning the structure in the data; too many and it begins to memorize its training set — noise, quirks, and all — instead of generalizing, the failure mode known as overfitting. The standard discipline is to hold out a validation set, evaluate after each epoch, and stop once validation performance plateaus or degrades: early stopping. Saving a model checkpoint at the end of promising epochs lets you recover the best version even if later epochs make things worse. Data augmentation, dropout, and weight decay all raise the number of useful epochs by making memorization harder.
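The early-stopping discipline described above can be sketched as follows. This is a simplified illustration that walks a list of made-up per-epoch validation losses instead of training a real model; the function name, the patience and min_delta parameters, and the loss values are all assumptions for the example.

```python
def early_stopping(val_losses, patience=2, min_delta=0.0):
    """Return (best_epoch, best_loss, last_epoch_run) for a sequence of
    per-epoch validation losses.

    Training stops once the loss has failed to improve by more than
    min_delta for `patience` consecutive epochs. Epochs are numbered from 1.
    """
    best_loss = float("inf")
    best_epoch = None
    stale_epochs = 0
    last_epoch = 0

    for epoch, loss in enumerate(val_losses, start=1):
        last_epoch = epoch
        if loss < best_loss - min_delta:
            # Improvement: a real training loop would save a checkpoint here.
            best_loss = loss
            best_epoch = epoch
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break  # validation has plateaued or degraded

    return best_epoch, best_loss, last_epoch


# Hypothetical validation losses: improving, then degrading.
losses = [0.90, 0.70, 0.60, 0.58, 0.61, 0.65, 0.70]
result = early_stopping(losses, patience=2)
print(result)  # (4, 0.58, 6)
```

Here the run stops after epoch 6, and the checkpoint worth keeping is the one from epoch 4, the last epoch at which validation loss improved.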

Epochs in the LLM era

Large language model pretraining inverted the classic picture. Foundation models are typically trained for roughly one epoch, or less, over enormous text corpora, because repeating data at that scale yields diminishing returns and encourages memorization of the training set. Fine-tuning swings back the other way: adapting a pretrained model to a small, task-specific dataset commonly uses a handful of epochs, and going much beyond that is the fastest way to make a capable base model parrot your few hundred examples verbatim. If you are fine-tuning an open-weight model on your own documents, validation loss per epoch is the single most important signal to watch.

Why it matters for the self-hosted builder

One terminology trap: much modern tooling reports progress in steps (weight updates) rather than epochs, since streaming datasets and mixed corpora make "one full pass" fuzzy to define. Converting between the two is worth doing consciously. Steps per epoch equals dataset size divided by effective batch size, where effective batch size is the per-device batch size multiplied by the number of devices and by any gradient-accumulation steps. When a fine-tuning recipe says "train for 1,000 steps," check what fraction of your dataset that represents: on a small corpus it may amount to many epochs and quiet overfitting, while on a large one it may not even complete a single pass.
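The step-to-epoch conversion can be checked with a short calculation. The sketch below assumes effective batch size is per-device batch size times device count times gradient-accumulation steps; the function names and all the numbers are hypothetical, chosen only to show the two extremes.

```python
def effective_batch_size(per_device_batch: int, num_devices: int = 1,
                         grad_accum_steps: int = 1) -> int:
    """Examples consumed per weight update."""
    return per_device_batch * num_devices * grad_accum_steps


def epochs_for_steps(steps: int, dataset_size: int, eff_batch: int) -> float:
    """How many passes over the dataset a given number of steps represents."""
    return steps * eff_batch / dataset_size


# Hypothetical recipe: 1,000 steps, batch of 4 on one GPU, 8 accumulation steps.
eff = effective_batch_size(per_device_batch=4, num_devices=1, grad_accum_steps=8)

small_corpus = epochs_for_steps(1_000, dataset_size=500, eff_batch=eff)
large_corpus = epochs_for_steps(1_000, dataset_size=1_000_000, eff_batch=eff)

print(eff)           # 32
print(small_corpus)  # 64.0 epochs: heavy repetition of a 500-example set
print(large_corpus)  # 0.032 epochs: a small fraction of one pass
```

The same recipe repeats a 500-example dataset 64 times but covers only about 3% of a million-example corpus.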

For the sovereign-AI builder running a training job on local hardware, epochs are also the unit of planning. One epoch's wall-clock time tells you what the whole run will cost in hours and electricity — a familiar calculation for anyone who has sized a mining deployment. Checkpointing every epoch protects a multi-day run against power loss the way good firmware protects a miner against corrupt flash: you lose an epoch, not the job. And because each additional epoch is paid for in your own watts on your own GPU, the overfitting question becomes pleasantly concrete: is another pass through the data actually buying generalization, or just burning kilowatt-hours to memorize what you already had? For related concepts, see hyperparameter, overfitting, and model checkpoint.
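The wall-clock planning described above amounts to simple arithmetic, sketched here. The function name and every figure (seconds per epoch, average power draw, electricity price) are placeholder assumptions; substitute values measured on your own hardware.

```python
def estimate_run_cost(seconds_per_epoch: float, epochs: int,
                      avg_power_watts: float, price_per_kwh: float):
    """Return (hours, kilowatt_hours, cost) for a training run,
    assuming every epoch takes about as long as the one that was timed."""
    hours = seconds_per_epoch * epochs / 3600
    kilowatt_hours = hours * avg_power_watts / 1000
    cost = kilowatt_hours * price_per_kwh
    return hours, kilowatt_hours, cost


# Hypothetical job: 90 minutes per epoch, 3 epochs, 350 W average draw,
# electricity at 0.12 per kWh (in whatever currency you pay in).
hours, kwh, cost = estimate_run_cost(5400, 3, 350, 0.12)
print(f"{hours:.1f} h, {kwh:.3f} kWh, cost {cost:.2f}")
```

Timing the first epoch and feeding it into a calculation like this gives a rough budget for the whole run before you commit to it.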
