Definition
Rejection Sampling Fine-Tuning is an alignment technique where the model generates several candidate responses for each prompt, a reward model scores them, and the model is then fine-tuned only on the top-scoring outputs. Lower-scoring candidates are "rejected" and discarded — hence the name. It uses the same training loss as ordinary supervised fine-tuning, just on a self-curated set of high-quality examples.
How it works
For a given prompt, the model samples K outputs. A trained reward model scores and ranks them, and the single best (or the top few) becomes the new fine-tuning target. Training on these selected examples nudges the model toward its own best behavior, and the sample-rank-train cycle can be repeated over several rounds. Meta's Llama 2 relied on rejection sampling fine-tuning alone for its first four RLHF iterations before adding PPO on top, and the approach is formalized in RAFT (Reward rAnked Fine-Tuning), also called iterative best-of-N fine-tuning.
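The selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name rejection_sample, the generate and reward callables, and the k and top_n parameters are all hypothetical stand-ins for a real sampling call and a real reward model.

```python
import random
from typing import Callable, List, Tuple


def rejection_sample(
    prompts: List[str],
    generate: Callable[[str], str],        # hypothetical: samples one response from the model
    reward: Callable[[str, str], float],   # hypothetical: reward-model score for (prompt, response)
    k: int = 4,                            # candidates sampled per prompt
    top_n: int = 1,                        # candidates kept per prompt
) -> List[Tuple[str, str]]:
    """Return the (prompt, response) pairs kept for supervised fine-tuning."""
    kept: List[Tuple[str, str]] = []
    for prompt in prompts:
        # Sample K candidate responses for this prompt.
        candidates = [generate(prompt) for _ in range(k)]
        # Rank candidates by reward, highest first.
        ranked = sorted(candidates, key=lambda r: reward(prompt, r), reverse=True)
        # Keep the top-scoring candidates; the rest are "rejected".
        kept.extend((prompt, r) for r in ranked[:top_n])
    return kept


# Toy stand-ins so the sketch runs without a model (assumptions, not real APIs).
_rng = random.Random(0)


def toy_generate(prompt: str) -> str:
    return prompt + " " + "x" * _rng.randint(1, 10)


def toy_reward(prompt: str, response: str) -> float:
    return float(len(response))  # pretend longer answers are better


sft_data = rejection_sample(["Explain RAFT"], toy_generate, toy_reward, k=4)
```

The resulting sft_data list is then fed to an ordinary supervised fine-tuning run. Repeating the loop with the updated model gives the iterative variant.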
Why it is attractive
Because it reuses the standard supervised fine-tuning loss, rejection sampling fine-tuning is simpler and typically more stable than full reinforcement learning, while still distilling the gains of a reward model into the policy. In effect, it turns Best-of-N sampling from an inference-time trick into a permanent improvement baked into the weights.
For a sovereign AI workflow, this method lets you raise a model's quality on owned hardware using only a reward signal and ordinary fine-tuning — no reinforcement-learning machinery required. The selected best responses can also seed a preference dataset for later direct preference tuning.
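One way the scored candidates might seed a preference dataset is sketched below. It rests on assumptions that are not in the text above: each prompt's best response is paired with its lowest-scoring one, near-ties are skipped through a hypothetical min_margin parameter, and the record keys "prompt", "chosen", and "rejected" follow a common convention that your preference-tuning tool may or may not share.

```python
from typing import Dict, List, Tuple


def build_preference_pairs(
    scored: Dict[str, List[Tuple[str, float]]],  # prompt -> [(response, reward score), ...]
    min_margin: float = 0.0,                     # hypothetical: skip pairs whose score gap is this small
) -> List[Dict[str, str]]:
    """Pair each prompt's best-scored response with its worst-scored one."""
    pairs: List[Dict[str, str]] = []
    for prompt, candidates in scored.items():
        if len(candidates) < 2:
            continue  # a pair needs at least two candidates
        best = max(candidates, key=lambda c: c[1])
        worst = min(candidates, key=lambda c: c[1])
        if best[1] - worst[1] <= min_margin:
            continue  # scores too close to express a clear preference
        pairs.append({"prompt": prompt, "chosen": best[0], "rejected": worst[0]})
    return pairs


example_scores = {
    "p": [("good answer", 0.9), ("weak answer", 0.1)],
    "q": [("tie A", 0.5), ("tie B", 0.5)],  # dropped: no score gap
}
preference_data = build_preference_pairs(example_scores)
```

Reusing the scores already computed during rejection sampling means the preference set costs no extra generation or reward-model calls.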
