

GSM8K (Grade School Math Benchmark)


Definition

GSM8K (Grade School Math 8K) is a benchmark released by OpenAI in 2021 to measure multi-step mathematical reasoning. It contains about 8,800 high-quality, linguistically diverse grade-school word problems written by human problem writers, split into 7,473 training and 1,319 test problems. Each problem is solvable by a capable middle-school student but requires between two and eight sequential steps of basic arithmetic (addition, subtraction, multiplication, and division), so success depends on planning a chain of operations rather than recalling a single fact.
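To make the step structure concrete, here is a minimal Python sketch that checks a reference solution written in GSM8K's format, where each arithmetic step carries a calculator annotation of the form `<<expression=result>>` and the last line is `#### <answer>`. The problem is hypothetical (written in the GSM8K style, not taken from the dataset), and the helper names `safe_eval` and `check_steps` are illustrative.

```python
import ast
import operator
import re

# Hypothetical problem in the GSM8K style (not taken from the dataset).
record = {
    "question": (
        "A baker makes 24 muffins in the morning and half as many in the "
        "afternoon. She sells 30 muffins. How many muffins are left?"
    ),
    "answer": (
        "Afternoon muffins: 24 / 2 = <<24/2=12>>12\n"
        "Total muffins: 24 + 12 = <<24+12=36>>36\n"
        "Muffins left: 36 - 30 = <<36-30=6>>6\n"
        "#### 6"
    ),
}

# The four elementary operations GSM8K problems rely on.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def safe_eval(expr: str) -> float:
    """Evaluate +, -, *, / over numeric literals without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))

def check_steps(solution: str) -> list[tuple[str, float, bool]]:
    """Return (expression, claimed result, is_correct) for each <<expr=result>> step."""
    steps = []
    for expr, claimed in re.findall(r"<<([^=<>]+)=([^<>]+)>>", solution):
        value = safe_eval(expr)
        steps.append((expr, float(claimed), abs(value - float(claimed)) < 1e-9))
    return steps

steps = check_steps(record["answer"])
final = record["answer"].split("####")[-1].strip()
print(len(steps), "steps, all correct:", all(ok for _, _, ok in steps))
print("final answer:", final)
```

Running it prints `3 steps, all correct: True` and `final answer: 6`. Walking the parsed expression with `ast` instead of calling `eval()` keeps the checker limited to elementary arithmetic.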

What it measures

A model reads a short word problem and must produce the correct final numerical answer, typically after working through intermediate steps. GSM8K was a headline benchmark in the original chain-of-thought prompting work, which showed that prompting a model to write out its reasoning before answering sharply improves accuracy on multi-step problems. Scores are reported as the percentage of test problems whose final answer matches the reference answer, and the dataset's careful construction keeps the estimated share of problems with breaking errors below 2%.
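Scoring can be sketched in a few lines of Python. This minimal version assumes that the reference answer is the text after the final `####` marker and that a model's answer is the last number in its chain-of-thought output; that extraction rule is a common heuristic, not an official GSM8K scorer, and the sample references and completions are hypothetical.

```python
import re
from typing import Optional

# Matches integers and decimals, allowing thousands separators such as 1,200.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")

def gold_answer(reference: str) -> str:
    """GSM8K reference solutions end with '#### <answer>'."""
    return reference.split("####")[-1].strip().replace(",", "")

def predicted_answer(completion: str) -> Optional[str]:
    """Assumption: the last number in the model's output is its final answer."""
    matches = NUMBER.findall(completion)
    return matches[-1].replace(",", "") if matches else None

def is_correct(completion: str, reference: str) -> bool:
    """Compare numerically so that 6, 6.0 and 6.00 count as the same answer."""
    pred = predicted_answer(completion)
    if pred is None:
        return False
    return abs(float(pred) - float(gold_answer(reference))) < 1e-6

def accuracy(completions: list[str], references: list[str]) -> float:
    """Percentage of problems whose extracted answer matches the reference."""
    correct = sum(is_correct(c, r) for c, r in zip(completions, references))
    return 100.0 * correct / len(references)

# Hypothetical reference solutions and model outputs.
references = [
    "Muffins left: 36 - 30 = <<36-30=6>>6\n#### 6",
    "Total: 3 * 400 = <<3*400=1200>>1200\n#### 1200",
]
completions = [
    "She bakes 36 muffins and sells 30, so 6 are left. The answer is 6.",
    "Three boxes of 400 make 1,100 in total.",
]
print(accuracy(completions, references))  # 50.0
```

Taking the last number works because a chain-of-thought completion states intermediate values first and the final answer at the end; a model that appends extra numbers after its answer would be scored wrong, which is why many evaluation setups also ask the model to end with a fixed answer phrase.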

Strengths and caveats

GSM8K isolates reasoning from advanced mathematical knowledge by keeping the arithmetic elementary, so a low score points to a reasoning failure rather than missing math facts. Its limitations are saturation, as frontier models approach the score ceiling, and contamination risk now that the problems circulate widely; GSM1k, a contamination-controlled set of freshly written problems in the same style, was built partly to detect this. Harder benchmarks such as MATH cover competition-level problems.
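A rough first check for contamination is to measure how many of a test problem's word n-grams also appear in a candidate training corpus. The sketch below is one common heuristic, not the GSM1k methodology (GSM1k instead compares model scores on freshly written problems); the n-gram length of 8 and the example documents are assumptions chosen for illustration.

```python
import re

def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lowercased word n-grams of a text (empty if the text has fewer than n words)."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_ratio(test_item: str, corpus_docs: list[str], n: int = 8) -> float:
    """Fraction of the test item's n-grams that appear anywhere in the corpus."""
    item_grams = ngrams(test_item, n)
    if not item_grams:  # item too short to form a single n-gram
        return 0.0
    corpus_grams = set().union(*(ngrams(doc, n) for doc in corpus_docs))
    return len(item_grams & corpus_grams) / len(item_grams)

# Hypothetical test problem and corpus documents.
question = (
    "A baker makes 24 muffins in the morning and half as many in the "
    "afternoon. She sells 30 muffins. How many muffins are left?"
)
leaked_doc = "Forum post: " + question + " Answer: 6"
clean_doc = "A farmer plants 12 rows of corn with 9 stalks in each row."

print(overlap_ratio(question, [leaked_doc]))  # 1.0
print(overlap_ratio(question, [clean_doc]))   # 0.0
```

A high ratio suggests the problem text appeared verbatim in the corpus; a low ratio does not rule out paraphrased leakage, which is one reason freshly written problem sets such as GSM1k are useful.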

GSM8K's reasoning focus complements the broad-knowledge MMLU benchmark and the science-reasoning GPQA benchmark, helping a self-hoster gauge whether an open model can reason, not just recall.

