Pretraining

Sovereign AI

Definition

Pretraining is the first and most computationally expensive stage of building a large language model. The model is trained on a vast corpus of text to predict the next token in a sequence, and through that single objective it internalizes grammar, facts, coding patterns, and reasoning structure. The result is a base model, sometimes called a foundation model.
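The next-token objective is easier to see with a deliberately tiny stand-in for a real model. The sketch below assumes a bigram model built from counts over a hypothetical whitespace-tokenized corpus; real LLMs use subword tokenizers and neural networks instead. It scores a sequence by the average negative log-likelihood of each token that actually comes next, which is the kind of quantity pretraining drives down.

```python
import math
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each preceding token (toy stand-in for a neural LM)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_prob(counts, prev, nxt, vocab_size, alpha=1.0):
    """P(nxt | prev) with add-alpha smoothing so unseen pairs still get nonzero probability."""
    row = counts[prev]
    return (row[nxt] + alpha) / (sum(row.values()) + alpha * vocab_size)

def avg_nll(counts, tokens, vocab_size):
    """Average negative log-likelihood of each actual next token (the training loss)."""
    losses = [
        -math.log(next_token_prob(counts, prev, nxt, vocab_size))
        for prev, nxt in zip(tokens, tokens[1:])
    ]
    return sum(losses) / len(losses)

# Hypothetical toy corpus; a real pretraining corpus holds trillions of tokens.
corpus = "the cat sat on the mat the cat ate".split()
vocab_size = len(set(corpus))
model = train_bigram(corpus)

# Familiar text should score a lower loss than an unusual word order.
print(round(avg_nll(model, corpus, vocab_size), 3))
print(round(avg_nll(model, "mat ate on cat".split(), vocab_size), 3))
```

A lower number means the model assigns higher probability to the tokens that really follow, so text resembling the training data scores better than a scrambled sequence.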

Self-supervised by design

Pretraining is described as self-supervised because the labels come from the data itself: for any position in the text, the "correct answer" is simply the token that actually follows. No human annotation is required, which is what allows training on internet-scale corpora measured in trillions of tokens. To predict those tokens well, the model is forced to compress an enormous amount of linguistic and world knowledge into its weights.
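Because the labels come from the text itself, building training examples amounts to pairing each window of tokens with the same window shifted one position to the right. The sketch below assumes hypothetical integer token IDs in place of a real tokenizer's output, and the function name `make_training_pairs` is illustrative rather than taken from any library.

```python
def make_training_pairs(token_ids, context_len):
    """Slice a token stream into (input, target) windows where target is input shifted by one.

    No human labels are involved: every target token is simply the token that
    actually follows its position in the original text.
    """
    pairs = []
    # Each window needs context_len inputs plus one extra token to supply the final target.
    for start in range(0, len(token_ids) - context_len):
        window = token_ids[start : start + context_len + 1]
        pairs.append((window[:-1], window[1:]))
    return pairs

# Hypothetical token IDs standing in for the output of a real tokenizer.
tokens = [17, 4, 92, 4, 55, 8]
for inputs, targets in make_training_pairs(tokens, context_len=3):
    print(inputs, "->", targets)
```

Each window yields one prediction per position, so a single pass over raw text supplies many labeled examples at no annotation cost. This sketch slides the window by one token for clarity; large-scale pipelines often use non-overlapping chunks instead to avoid reprocessing the same text.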

Base models versus assistants

A freshly pretrained base model is a powerful text predictor but not yet a helpful, instruction-following assistant. It will happily continue text without regard for intent or safety. Turning it into a usable assistant requires later stages such as instruction fine-tuning and preference alignment. Understanding this split matters for operators: base models offer maximum flexibility and minimal imposed behavior, while aligned models trade some of that openness for usability and guardrails.

Pretraining produces the foundation that later fine-tuning and alignment methods like RLHF refine. It is also where a model's capacity for in-context learning emerges, and it largely determines the knowledge a local LLM you run yourself can draw on.
