MMLU (Massive Multitask Language Understanding)

Sovereign AI

Definition

MMLU (Massive Multitask Language Understanding) is a benchmark introduced by Dan Hendrycks and colleagues in 2020 to measure how much world knowledge a large language model has absorbed and how well it can apply that knowledge across many domains. It consists of 15,908 four-option multiple-choice questions spanning 57 subjects, from elementary mathematics, US history, and computer science to professional-level law, medicine, and moral philosophy. Difficulty ranges from elementary level up to graduate and professional-exam level, so a high score requires both broad recall and the ability to reason with what is recalled.
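To make the question format concrete, here is a minimal sketch of how a single four-option item might be represented and rendered as a text prompt. The `Question` class, the `render` helper, and the sample question are all hypothetical and written for illustration; they are not taken from the benchmark or from any particular evaluation harness.

```python
from dataclasses import dataclass

LETTERS = "ABCD"  # MMLU items always have exactly four options


@dataclass(frozen=True)
class Question:
    """Hypothetical container for one multiple-choice item."""
    subject: str
    text: str
    choices: tuple[str, str, str, str]
    answer: int  # index of the correct choice, 0..3


def render(q: Question, include_answer: bool = False) -> str:
    """Lay out the question, its lettered options, and an 'Answer:' slot."""
    lines = [q.text]
    lines += [f"{letter}. {choice}" for letter, choice in zip(LETTERS, q.choices)]
    lines.append("Answer:" + (f" {LETTERS[q.answer]}" if include_answer else ""))
    return "\n".join(lines)


# Made-up sample item, not an actual benchmark question.
example = Question(
    subject="elementary_mathematics",
    text="What is 7 times 8?",
    choices=("54", "56", "58", "64"),
    answer=1,
)

prompt = render(example)
print(prompt)
```

Calling `render(example, include_answer=True)` fills in the answer letter, which is one way solved examples could be produced for a few-shot prompt.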

How scoring works

A model is shown a question and four answer choices and must select the correct option. The headline number is simple accuracy, usually reported as a single percentage averaged across all subjects. Because random guessing among four options yields 25%, scores near that floor indicate little real understanding, while the strongest contemporary models exceed 85%. Results are commonly reported in a few-shot setting, most often five-shot, where the prompt includes a handful of example questions with their correct answers before the question the model must answer.
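The accuracy calculation can be sketched in a few lines. Published results do not all average the same way: some weight every question equally (micro average), others average the per-subject accuracies (macro average), so this sketch computes both. The `score` function, the result tuples, and the subject names are illustrative assumptions, not part of any official tooling.

```python
from collections import defaultdict

CHANCE = 1 / 4  # expected accuracy from uniform random guessing on four options


def score(results):
    """Compute accuracy from (subject, predicted_letter, correct_letter) tuples."""
    per_subject = defaultdict(lambda: [0, 0])  # subject -> [correct, total]
    for subject, predicted, correct in results:
        per_subject[subject][0] += predicted == correct
        per_subject[subject][1] += 1

    subject_acc = {s: c / n for s, (c, n) in per_subject.items()}
    total_correct = sum(c for c, _ in per_subject.values())
    total = sum(n for _, n in per_subject.values())
    return {
        "micro": total_correct / total,  # every question counts equally
        "macro": sum(subject_acc.values()) / len(subject_acc),  # every subject counts equally
        "per_subject": subject_acc,
    }


# Made-up predictions for illustration only.
results = [
    ("us_history", "A", "A"),
    ("us_history", "B", "C"),
    ("us_history", "D", "D"),
    ("us_history", "C", "C"),
    ("computer_science", "B", "A"),
    ("computer_science", "C", "C"),
]

report = score(results)
print(f"micro: {report['micro']:.3f}, macro: {report['macro']:.3f}, chance: {CHANCE:.2f}")
```

The two averages diverge whenever subjects have different numbers of questions, which is worth checking before comparing figures from different model cards.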

Why it matters for evaluation

MMLU became a default yardstick for general-purpose model capability because its breadth resists the narrow over-fitting that single-topic tests invite. Its weaknesses are well documented: some original questions contain errors or ambiguous answers, and as the benchmark aged, training-data contamination (test questions leaking into the text that models are trained on) became a concern. These issues prompted harder successors such as MMLU-Pro. For anyone reading a model card, an MMLU figure is a coarse but useful signal of broad knowledge, best interpreted alongside reasoning- and code-specific tests.

MMLU is one entry in a wider family of evaluations covered in our glossary, including the reasoning-focused GPQA benchmark and the math-focused GSM8K benchmark. Understanding these helps a sovereign operator judge which open model is fit for a self-hosted stack.
