Definition
A verifier model is a model whose job is to judge the output of another model rather than to produce the answer itself. In a generator-verifier setup, a generator proposes one or more candidate answers, often by sampling several reasoning paths, and the verifier scores them so the best can be selected or weak ones rejected. Verifiers are central to making LLM reasoning more reliable, because generating an answer and checking an answer are different skills, and a good checker can rescue a generator that is right only some of the time.
Outcome verifiers versus process verifiers
Verifiers come in two main forms. An outcome verifier, or outcome reward model (ORM), judges only the final answer: it asks whether the end result is correct. A process verifier, or process reward model (PRM), scores each intermediate reasoning step, which gives much finer-grained feedback but requires costly step-level supervision data to train. OpenAI's "Let's Verify Step by Step" work found that process supervision can significantly outperform outcome supervision on hard math problems, because it catches a flawed step before it poisons the conclusion.
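The difference between the two forms can be illustrated with a toy sketch. Everything below is hypothetical and for illustration only: real outcome and process verifiers are learned models, whereas `toy_step_scorer` is a hand-written arithmetic checker standing in for one, and aggregating step scores with `min` is just one possible choice (a product of step scores is another).

```python
import operator
import re
from typing import Callable, List

# Hypothetical stand-in for a learned step scorer: it only understands
# steps of the form "a op b = c" with integer operands.
_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
_STEP = re.compile(r"^\s*(-?\d+)\s*([+\-*])\s*(-?\d+)\s*=\s*(-?\d+)\s*$")


def toy_step_scorer(step: str) -> float:
    """Return 1.0 if the arithmetic in a single step is correct, else 0.0."""
    match = _STEP.match(step)
    if match is None:
        return 0.0
    a, op, b, c = match.groups()
    return 1.0 if _OPS[op](int(a), int(b)) == int(c) else 0.0


def outcome_score(steps: List[str], expected: str) -> float:
    """Outcome-style check: look only at the final answer of the last step."""
    if not steps:
        return 0.0
    final_answer = steps[-1].split("=")[-1].strip()
    return 1.0 if final_answer == expected.strip() else 0.0


def process_score(steps: List[str], step_scorer: Callable[[str], float]) -> float:
    """Process-style check: score every step; the weakest step sets the score."""
    if not steps:
        return 0.0
    return min(step_scorer(step) for step in steps)


# A solution that reaches the right answer through two wrong steps.
lucky = ["2 + 3 = 6", "6 * 4 = 20"]
# A solution whose steps are all sound.
sound = ["2 + 3 = 5", "5 * 4 = 20"]

print(outcome_score(lucky, "20"), process_score(lucky, toy_step_scorer))
print(outcome_score(sound, "20"), process_score(sound, toy_step_scorer))
```

The `lucky` solution passes the outcome check because its last line happens to end in the expected answer, while the process check rejects it at the first faulty step. That is the kind of case where step-level scoring adds information an answer-only check cannot provide.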
Where verifiers fit
Verifiers are the scoring engine behind best-of-N sampling and search-based reasoning, and they supply the evaluation signal that self-correcting agents depend on. For a self-hosted stack, a dedicated verifier, even a smaller one, is often the most cost-effective way to lift accuracy: let a cheap generator produce several candidates and let the verifier pick the winner.
Verifier scores determine which of the candidates produced with extra test-time compute is returned, and they give agents built on Reflexion the reliable evaluator they need to generate useful reflections.
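The best-of-N pattern described in this section can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: `best_of_n`, `make_toy_generator`, and `toy_verifier` are hypothetical names, the generator simply replays canned answers instead of sampling from a model, and the verifier is a hard-coded check standing in for a trained scoring model.

```python
from itertools import cycle
from typing import Callable, List, Optional


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],
    verify: Callable[[str, str], float],
    n: int = 4,
    threshold: Optional[float] = None,
) -> Optional[str]:
    """Sample n candidates, score each with the verifier, return the best.

    If a threshold is given and even the best candidate scores below it,
    return None so the caller can reject the answer or retry.
    """
    candidates = [generate(prompt) for _ in range(n)]
    scores = [verify(prompt, candidate) for candidate in candidates]
    best_index = max(range(n), key=lambda i: scores[i])
    if threshold is not None and scores[best_index] < threshold:
        return None
    return candidates[best_index]


def make_toy_generator(answers: List[str]) -> Callable[[str], str]:
    """Hypothetical generator: replays canned answers in order, cycling."""
    stream = cycle(answers)
    return lambda prompt: next(stream)


def toy_verifier(prompt: str, candidate: str) -> float:
    """Hypothetical verifier: a real one would be a trained model."""
    return 1.0 if candidate == "42" else 0.0


# The generator is right only one time in three; the verifier picks that one.
picked = best_of_n(
    "What is 6 * 7?", make_toy_generator(["41", "42", "40"]), toy_verifier, n=3
)
print(picked)

# With a threshold, a batch containing no good candidate is rejected.
rejected = best_of_n(
    "What is 6 * 7?", make_toy_generator(["41", "40"]), toy_verifier,
    n=2, threshold=0.5,
)
print(rejected)
```

In a self-hosted stack, `generate` would wrap a call to the cheap generator model and `verify` a call to the verifier. The optional `threshold` covers the "weak ones rejected" case, where no candidate is good enough to return.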
