

Reward Model

Sovereign AI

Definition

A reward model is a neural network that takes a prompt and a candidate response and outputs a single scalar score predicting how much a human would approve of that response. It is the component that turns subjective human preferences into a numeric signal an optimizer can chase, and it sits at the heart of Reinforcement Learning from Human Feedback (RLHF).
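
To make the input/output contract concrete, here is a minimal sketch of the interface: one (prompt, response) pair in, one scalar out. It is a toy stand-in, not a real reward model. The hashed bag-of-words `featurize` function, the `DIM` size and the linear `ToyRewardModel` head are all hypothetical; a production reward model would run a transformer over the concatenated text and put a scalar head on its final hidden state.

```python
import hashlib

DIM = 64  # hypothetical feature size for this toy example


def featurize(prompt: str, response: str) -> list[float]:
    """Hash prompt and response tokens into a fixed-size bag-of-words vector.

    Stand-in for a transformer encoder: it only illustrates the shape of the
    input, not how a real reward model represents text.
    """
    vec = [0.0] * DIM
    for part, tag in ((prompt, "p"), (response, "r")):
        for token in part.lower().split():
            # md5 is deterministic across runs, unlike Python's built-in hash()
            digest = hashlib.md5(f"{tag}:{token}".encode()).digest()
            vec[int.from_bytes(digest[:4], "little") % DIM] += 1.0
    return vec


class ToyRewardModel:
    """Hypothetical linear 'reward head' over hashed features."""

    def __init__(self, weights: list[float], bias: float = 0.0):
        assert len(weights) == DIM
        self.weights = weights
        self.bias = bias

    def score(self, prompt: str, response: str) -> float:
        # Exactly one scalar per (prompt, response) pair.
        feats = featurize(prompt, response)
        return sum(w * x for w, x in zip(self.weights, feats)) + self.bias
```

With every weight set to 0.01, `ToyRewardModel([0.01] * DIM).score("What is 2+2?", "4")` is just 0.01 times the token count, about 0.04. Training is what turns those weights into something that tracks human preference.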

How it is trained and used

The reward model is usually trained on pairwise comparisons, where annotators see two responses to the same prompt and mark which is better. Using a Bradley-Terry style objective, which models the probability that one response is preferred over the other as the logistic (sigmoid) function of the difference between their scores, the model learns to assign higher scores to preferred responses. Once trained, it stands in for human judgment during the reinforcement phase: a policy-optimization algorithm such as PPO updates the language model to maximize the reward model's scores, teaching it to produce outputs humans would rate highly without needing a person in the loop at every step.
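
The Bradley-Terry objective can be sketched in a few lines, assuming each response has already been turned into a feature vector and the reward is a simple linear function `r(x) = w·x` of it. The feature vectors, the linear model and the `train` helper are hypothetical simplifications; a real setup would backpropagate the same loss through a full neural network. The key piece is `bt_loss`, which is `-log sigmoid(r_chosen - r_rejected)`: it is small when the preferred response already scores higher and grows when the ordering is wrong.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bt_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry negative log-likelihood that `chosen` beats `rejected`.

    P(chosen > rejected) = sigmoid(r_chosen - r_rejected), so the loss is
    -log sigmoid(d) = softplus(-d), computed here without overflow.
    """
    d = r_chosen - r_rejected
    return max(-d, 0.0) + math.log1p(math.exp(-abs(d)))


def dot(w: list[float], x: list[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train(pairs, dim, lr=0.1, epochs=200, seed=0):
    """Fit a linear reward r(x) = w·x on (x_chosen, x_rejected) pairs with SGD."""
    rng = random.Random(seed)
    order = list(pairs)  # shuffle a copy, not the caller's list
    w = [0.0] * dim
    for _ in range(epochs):
        rng.shuffle(order)
        for x_chosen, x_rejected in order:
            d = dot(w, x_chosen) - dot(w, x_rejected)
            # dL/dd = -sigmoid(-d) and dd/dw = x_chosen - x_rejected,
            # so a gradient-descent step adds lr * sigmoid(-d) * (x_chosen - x_rejected).
            step = lr * sigmoid(-d)
            for i in range(dim):
                w[i] += step * (x_chosen[i] - x_rejected[i])
    return w
```

Note that only score differences enter the loss, so adding the same constant to every reward changes nothing; reward-model scores are meaningful relative to each other, not on an absolute scale. On made-up pairs such as `[([1.0, 0.2], [0.1, 0.3]), ([0.8, 0.5], [0.2, 0.5])]`, where the first feature is always higher for the preferred response, training drives `w[0]` positive and ranks every chosen vector above its rejected partner.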

Strengths and pitfalls

A reward model lets human preference scale to millions of training updates, which is what makes RLHF practical. But it is only a proxy for real human values, and pushing a policy hard against an imperfect proxy is exactly the setup that produces reward hacking and sycophancy. This limitation is part of why newer methods such as Direct Preference Optimization aim to align models without a standalone reward model at all. For self-hosting builders, understanding the reward model clarifies both how today's assistants got their behavior and where that behavior can quietly go wrong.
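
For contrast, here is a minimal sketch of the per-pair loss Direct Preference Optimization is usually written with. It keeps the same Bradley-Terry form but never calls a separate reward model: the "reward" of a response is implicit, measured as `beta` times how much the policy's log-probability for it has moved away from a frozen reference model's. The function names, the `beta=0.1` default and the log-probability values in the usage note are illustrative assumptions; the inputs would be whole-sequence log-probabilities computed by the policy and reference models.

```python
import math


def softplus(x: float) -> float:
    """log(1 + exp(x)), computed without overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair, from sequence log-probabilities."""
    # Implicit rewards: how far the policy has shifted from the reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # Same Bradley-Terry form as reward-model training: -log sigmoid(margin).
    return softplus(-margin)
```

When the policy still matches the reference, for example `dpo_loss(-12.0, -15.0, -12.0, -15.0)`, the margin is zero and the loss is log 2. Raising the chosen response's log-probability and lowering the rejected one's, as in `dpo_loss(-10.0, -16.0, -12.0, -15.0)`, pushes the loss below log 2, so minimizing it updates the language model directly from preference pairs.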

See reward hacking for how reward models get exploited and Direct Preference Optimization for an approach that removes them.
