Definition
Ground truth is the set of known-correct answers a supervised machine-learning model learns from and is measured against. Each training example is paired with a label — the factual target the model should produce — and the model adjusts its parameters to bring its predictions closer to those labels. In supervised learning the ground truth is, in effect, the teacher: it supplies the correct answers, and learning is the process of minimizing the gap between predictions and that truth.
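As a rough illustration of "minimizing the gap between predictions and that truth", here is a minimal sketch that fits a one-parameter model to a handful of labels by gradient descent on mean squared error. The toy data, the function names, and the choice of squared error as the gap measure are all assumptions made for the example, not something the definition above prescribes.

```python
# Hypothetical toy data: each feature x is paired with a ground-truth label y.
# Here the labels happen to follow y = 2 * x.
features = [1.0, 2.0, 3.0, 4.0]
labels = [2.0, 4.0, 6.0, 8.0]


def mean_squared_error(weight, xs, ys):
    """Average squared gap between predictions (weight * x) and the labels."""
    return sum((weight * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, learning_rate=0.01, steps=500):
    """Fit the model y_hat = weight * x by plain gradient descent."""
    weight = 0.0
    for _ in range(steps):
        # Gradient of the mean squared error with respect to the weight.
        grad = sum(2 * (weight * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        # Step against the gradient to shrink the gap to the labels.
        weight -= learning_rate * grad
    return weight


learned_weight = train(features, labels)
error_before = mean_squared_error(0.0, features, labels)
error_after = mean_squared_error(learned_weight, features, labels)
```

The model never sees a rule such as "multiply by two"; it only sees labels, so any error in those labels would be learned just as faithfully.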
Where labels come from
Most ground truth is produced by human annotation: people tag images, classify text, or mark the correct output for each record. This is slow, expensive, and repetitive, which is why labeled data is often the scarcest resource in an AI project. The same value is called a "label" in classification work or an "annotation" in linguistic and medical contexts. In a labeled dataset, the ground truth is the dependent variable the features are used to predict.
Garbage labels, garbage model
Label quality is decisive. A model trained on noisy, biased, or inconsistent labels reproduces those flaws in its predictions, and scoring it against the same flawed labels hides the problem. Human annotators often disagree with one another (inter-annotator disagreement), and unexamined bias can slip into labels unnoticed. Serious projects measure annotator agreement, adjudicate conflicts, and audit a sample for accuracy rather than trusting raw label dumps. Verifying your own ground truth is also a defense against inheriting hidden bias from someone else's dataset.
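Measuring agreement and adjudicating conflicts can be sketched in a few lines. The example below assumes two annotators labeling the same items and uses Cohen's kappa as the agreement statistic, with a strict majority vote as the adjudication rule. Both are common choices but are assumptions here, and the annotator data and function names are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which the two annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # Both annotators used a single identical label throughout.
    return (observed - expected) / (1 - expected)


def adjudicate(votes):
    """Return the strict-majority label, or None to flag the item for review."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


# Hypothetical annotations of the same six images by two people.
annotator_a = ["cat", "cat", "dog", "dog", "cat", "dog"]
annotator_b = ["cat", "dog", "dog", "dog", "cat", "cat"]
kappa = cohens_kappa(annotator_a, annotator_b)

resolved = adjudicate(["cat", "cat", "dog"])
unresolved = adjudicate(["cat", "dog"])
```

In this toy data the annotators agree on four of six items, yet kappa comes out at only one third because half of that agreement would be expected by chance. Items where `adjudicate` returns `None` are the ones a project would route to an expert rather than resolve automatically.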
Labeled data is the foundation a model trains on, and how you partition it determines whether evaluation is honest: see the train-test split. Curated labels often live alongside features in a feature store.
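A minimal sketch of partitioning labeled data follows, assuming each example is a (features, label) pair and that a simple shuffled holdout is acceptable. The function name, the 20% test fraction, and the fixed seed are illustrative assumptions; real projects often need stratified or grouped splits instead.

```python
import random


def split_labeled_data(examples, test_fraction=0.2, seed=0):
    """Shuffle labeled examples and hold out a fraction for evaluation."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(examples)  # Copy so the caller's list is left untouched.
    random.Random(seed).shuffle(shuffled)  # Seeded for a reproducible split.
    n_test = max(1, round(len(shuffled) * test_fraction))
    # Held-out examples are never shown to the model during training.
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset: ten (feature, label) pairs.
dataset = [(i, i % 2) for i in range(10)]
train_set, test_set = split_labeled_data(dataset)
```

Evaluating only on `test_set` is what keeps the measurement honest: the model is scored against ground truth it has not already memorized.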
