MMLU (Massive Multitask Language Understanding)

MMLU (Massive Multitask Language Understanding) is a benchmark introduced by Dan Hendrycks and colleagues in 2020 to measure how much world knowledge a large language model has absorbed and how well it can reason across many domains. It poses 15,908 four-option multiple-choice questions spanning 57 subjects, ranging from elementary mathematics, US history, and computer science to professional-level law, medicine, and moral philosophy. Question difficulty ranges from elementary level through high school and college to professional-exam level, so a high score requires both broad recall and the ability to apply that knowledge.

How scoring works

A model is shown a question and four answer choices and must select the correct option. The headline number is simple accuracy, usually reported as a single percentage averaged across all subjects. Because random guessing yields 25%, scores near that floor indicate little real understanding, while top contemporary models exceed 85%. Results are commonly reported in a few-shot setting, typically five-shot, where the model first sees a handful of example questions with their correct answers before answering.
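
The scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's official harness: the function names (format_question, build_prompt, score), the prompt layout, and the record format are all assumptions made for the example. It also shows why "averaged across all subjects" can mean two different things: a micro-average weights every question equally, while a macro-average weights every subject equally, and the two differ when subjects have different numbers of questions.

```python
from collections import defaultdict

CHOICE_LABELS = "ABCD"  # MMLU questions have exactly four options


def format_question(question, choices, answer=None):
    """Render one item as text; the answer letter is filled in only for few-shot examples."""
    lines = [question]
    lines += [f"{label}. {text}" for label, text in zip(CHOICE_LABELS, choices)]
    lines.append("Answer:" + (f" {answer}" if answer is not None else ""))
    return "\n".join(lines)


def build_prompt(examples, question, choices):
    """Join answered example questions with the unanswered target question.

    examples: list of (question, choices, answer_letter) tuples (hypothetical format).
    """
    shots = [format_question(q, c, a) for q, c, a in examples]
    return "\n\n".join(shots + [format_question(question, choices)])


def score(records):
    """Compute accuracy from (subject, predicted_letter, correct_letter) records.

    Returns per-subject accuracy, the micro-average (over questions),
    and the macro-average (over subjects).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for subject, predicted, gold in records:
        total[subject] += 1
        correct[subject] += int(predicted == gold)
    per_subject = {s: correct[s] / total[s] for s in total}
    micro = sum(correct.values()) / sum(total.values())
    macro = sum(per_subject.values()) / len(per_subject)
    return per_subject, micro, macro


if __name__ == "__main__":
    # Toy records, invented for illustration only.
    toy = [
        ("law", "A", "A"), ("law", "B", "C"), ("law", "C", "C"), ("law", "D", "D"),
        ("math", "A", "B"), ("math", "B", "B"),
    ]
    per_subject, micro, macro = score(toy)
    print(per_subject, round(micro, 3), round(macro, 3))
```

When comparing published MMLU figures, it is worth checking which averaging convention and how many few-shot examples a report used, since both can shift the headline number.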

Why it matters for evaluation

MMLU became a default yardstick for general-purpose model capability because its breadth resists the narrow over-fitting that single-topic tests invite. Its weaknesses are well documented: some original questions contain errors or ambiguous answers, and as the benchmark aged, training-data contamination became a concern, prompting harder successors such as MMLU-Pro. For anyone reading a model card, an MMLU figure is a coarse but useful signal of broad knowledge, best interpreted alongside reasoning- and code-specific tests.

MMLU is one entry in a wider family of evaluations covered in our glossary, including the reasoning-focused GPQA benchmark and the math-focused GSM8K benchmark. Understanding these helps a sovereign operator judge which open model is fit for a self-hosted stack.
