

Train-Test Split

Sovereign AI

Definition

A train-test split is the practice of dividing a dataset into separate subsets so that a model is evaluated on data it never learned from. It is the primary way to detect overfitting, the failure mode in which a model memorizes its training examples and then performs poorly on anything new. By holding back a portion of data, you get an honest estimate of how the model will behave in the real world rather than a flattering score on examples it has already seen.
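To make the overfitting failure concrete, here is a minimal sketch in plain Python. The synthetic data, the 1-nearest-neighbor "memorizer", and the 80/20 split are illustrative assumptions. The labels are random, so there is no real pattern to learn. The model still scores perfectly on its own training examples, and only the held-out test set reveals that it learned nothing useful.

```python
import random

def nearest_neighbor_predict(train, x):
    """Return the label of the training point closest to x (1-nearest-neighbor)."""
    best = min(train, key=lambda pair: abs(pair[0] - x))
    return best[1]

def accuracy(train, data):
    """Fraction of (x, label) pairs in data that the memorizing model labels correctly."""
    correct = sum(1 for x, y in data if nearest_neighbor_predict(train, x) == y)
    return correct / len(data)

rng = random.Random(0)
# Synthetic data with random labels: x carries no information about the label.
dataset = [(rng.random(), rng.randint(0, 1)) for _ in range(1000)]
rng.shuffle(dataset)

cut = int(0.8 * len(dataset))
train, test = dataset[:cut], dataset[cut:]

train_acc = accuracy(train, train)  # each point's nearest neighbor is itself
test_acc = accuracy(train, test)    # no better than a coin flip on unseen data
print(f"train accuracy: {train_acc:.2f}")  # train accuracy: 1.00
print(f"test accuracy:  {test_acc:.2f}")
```

Scoring the model only on `train` would report a perfect result. The gap between the two numbers is exactly what the held-out split exists to expose.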

Three subsets, three jobs

Most workflows use three partitions. The training set, the largest share, is what the model learns its parameters from. The validation set guides development decisions, such as tuning hyperparameters like the learning rate and choosing between model variants, without contaminating the final evaluation. The test set is a strict holdout used exactly once, at the end, for an unbiased read on generalization. Common train/validation/test ratios are 70/15/15 or 80/10/10. A model that scores well on training data but poorly on validation and test data is overfitting.
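The three-way partition can be sketched as a shuffled, seeded cut. The function name `three_way_split`, its parameters, and the fixed seed are assumptions for illustration, not a standard API. The example uses the 80/10/10 ratio mentioned above.

```python
import random

def three_way_split(items, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle a copy of items and cut it into train, validation and test lists.

    Whatever remains after the train and validation shares becomes the test set.
    """
    if not (0 < train_frac < 1 and 0 <= val_frac < 1 and train_frac + val_frac < 1):
        raise ValueError("fractions must leave room for a test set")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = three_way_split(range(1000), train_frac=0.8, val_frac=0.1)
print(len(train), len(val), len(test))  # 800 100 100
```

In this scheme, hyperparameters are tuned against `val` as often as needed, while `test` is scored once, after every development decision has been made.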

How splitting goes wrong

The cardinal sin is leakage: letting information from the test set bleed into training, which produces misleadingly high scores that collapse in production. Time-ordered data must be split chronologically, never randomly, or the model effectively peeks at the future. Imbalanced classes need stratified splitting so that each subset preserves the overall class proportions. For anyone training or fine-tuning models on their own hardware, a clean split is what separates a measurable result from wishful thinking.
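A minimal sketch of the two splitting rules above, in plain Python. The function names, the `(timestamp, value)` record shape, the `label_of` callback, and the 80/20 ratio are assumptions for illustration. The chronological split sorts by time and holds out the newest records. The stratified split cuts each class separately so a rare class keeps its share in both subsets. A final set-intersection check catches the simplest form of leakage: the same example landing on both sides.

```python
import random
from collections import defaultdict

def chronological_split(records, test_frac=0.2):
    """Split (timestamp, value) records so every test record is newer than every training record."""
    ordered = sorted(records, key=lambda r: r[0])  # oldest first; never shuffle time series
    cut = len(ordered) - round(len(ordered) * test_frac)
    return ordered[:cut], ordered[cut:]

def stratified_split(examples, label_of, test_frac=0.2, seed=0):
    """Split so each class appears in train and test in (roughly) its overall proportion."""
    by_class = defaultdict(list)
    for ex in examples:
        by_class[label_of(ex)].append(ex)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):  # fixed class order keeps the split reproducible
        group = by_class[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_frac)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

# Time-ordered data: 100 daily readings, deliberately given out of order.
days = [(day, day * 1.5) for day in range(100)]
random.Random(1).shuffle(days)
past, future = chronological_split(days)
print(max(t for t, _ in past), min(t for t, _ in future))  # 79 80

# Imbalanced labels: 50 positives, 950 negatives.
examples = [(i, 1 if i < 50 else 0) for i in range(1000)]
train, test = stratified_split(examples, label_of=lambda ex: ex[1])
print(sum(y for _, y in test), len(test))  # 10 200

# Leakage check: no example may appear on both sides of the split.
assert not set(train) & set(test)
```

Established libraries cover the same ground; for instance, scikit-learn provides `train_test_split` with a `stratify` argument and `TimeSeriesSplit` for ordered data.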

The data being split is labeled data (the ground truth), which usually arrives through a data pipeline or ETL process before partitioning.
