

Rejection Sampling Fine-Tuning

Sovereign AI

Definition

Rejection Sampling Fine-Tuning is an alignment technique in which the model generates several candidate responses for each prompt, a reward model scores them, and the model is then fine-tuned only on the top-scoring outputs. The lower-scoring candidates are "rejected" and discarded, hence the name. Training uses the same loss as ordinary supervised fine-tuning (next-token cross-entropy), applied to a set of high-quality examples curated from the model's own outputs.
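A compact way to write one round, keeping only the single best candidate per prompt. The notation is introduced here for illustration and is not taken from a specific paper: π_θ is the policy being trained, θ_t its weights at round t, r the reward model, D the prompt set, and K the number of samples per prompt.

```latex
\begin{aligned}
y_{x,k} &\sim \pi_{\theta_t}(\cdot \mid x), \qquad k = 1, \dots, K \\
y^{*}_{x} &= \arg\max_{k \in \{1, \dots, K\}} \; r(x, y_{x,k}) \\
\mathcal{L}(\theta) &= -\sum_{x \in \mathcal{D}} \log \pi_{\theta}\!\left(y^{*}_{x} \mid x\right)
\end{aligned}
```

Because log π_θ(y | x) factors into a sum of per-token log-probabilities, the last line is exactly the token-level cross-entropy used in supervised fine-tuning; only the choice of targets differs. Keeping the top few candidates instead of one simply adds more (x, y) terms to the sum.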

How it works

For a given prompt, the model samples K outputs. A trained reward model ranks them, and the single best output (or the top few) becomes the new training target. Fine-tuning on these selected examples nudges the model toward its own best behavior, and the loop can be repeated, with the improved model generating the next round of samples. Meta's Llama 2-Chat relied solely on rejection sampling fine-tuning for its first four RLHF iterations before PPO was applied on top, and the approach is formalized in RAFT (Reward rAnked Fine-Tuning), also called iterative best-of-N fine-tuning.
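The selection step of one round can be sketched as follows. This is a minimal illustration, not Llama 2's or RAFT's actual code: `generate` and `reward` are hypothetical stand-ins for a call to the policy model and a call to a trained reward model, and `k` / `keep_top` are illustrative parameter names. The returned pairs would then be fed to an ordinary supervised fine-tuning run (not shown).

```python
import random
from typing import Callable, List, Tuple

# Hypothetical interfaces: a real pipeline would call the policy model and
# a trained reward model here.
Generator = Callable[[str, random.Random], str]
RewardFn = Callable[[str, str], float]


def rejection_sample(
    prompts: List[str],
    generate: Generator,
    reward: RewardFn,
    k: int = 4,
    keep_top: int = 1,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Build an SFT dataset of (prompt, response) pairs for one round.

    For each prompt: draw k candidates, score each with the reward model,
    keep the keep_top highest-scoring ones, and reject the rest.
    """
    rng = random.Random(seed)
    dataset: List[Tuple[str, str]] = []
    for prompt in prompts:
        candidates = [generate(prompt, rng) for _ in range(k)]
        # Highest reward first; ties keep their sampling order (stable sort).
        ranked = sorted(candidates, key=lambda y: reward(prompt, y), reverse=True)
        dataset.extend((prompt, y) for y in ranked[:keep_top])
    return dataset


if __name__ == "__main__":
    # Toy stand-ins for illustration only: random-length responses,
    # and a "reward model" that simply prefers longer responses.
    def toy_generate(prompt: str, rng: random.Random) -> str:
        return prompt + " " + "!" * rng.randint(1, 10)

    def toy_reward(prompt: str, response: str) -> float:
        return float(len(response))

    for pair in rejection_sample(["hi", "hello"], toy_generate, toy_reward, k=4):
        print(pair)
```

For iterative best-of-N fine-tuning, the same function would be called again after each fine-tuning run, with `generate` now backed by the updated model.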

Why it is attractive

Because it reuses the standard supervised fine-tuning loss, rejection sampling fine-tuning is simpler and more stable than full reinforcement learning, while still distilling the gains of a reward model into the policy. In effect, it turns Best-of-N sampling from an inference-time trick, which costs N generations per query, into a permanent improvement baked into the weights.
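The "Best-of-N baked into the weights" idea can be made concrete with a toy. The sketch below is an assumption-laden simplification: the "policy" is just a categorical distribution over three canned responses for a single fixed prompt, the rewards are made up, and "fine-tuning" is an exact maximum-likelihood refit (for a categorical distribution, the cross-entropy minimizer is the empirical frequency of the kept responses). A real model instead takes a limited number of gradient steps from its current weights.

```python
import random
from collections import Counter
from typing import Dict, Optional


def expected_reward(policy: Dict[str, float], reward: Dict[str, float]) -> float:
    """Expected reward of a single sample drawn from the policy."""
    return sum(p * reward[y] for y, p in policy.items())


def rft_round(
    policy: Dict[str, float],
    reward: Dict[str, float],
    n_draws: int = 2000,
    k: int = 4,
    rng: Optional[random.Random] = None,
) -> Dict[str, float]:
    """One toy rejection-sampling round on a categorical 'policy'.

    Repeatedly sample k responses, keep the highest-reward one, then refit
    the policy by maximum likelihood on the kept responses, i.e. set each
    probability to its empirical frequency.
    """
    if rng is None:
        rng = random.Random(0)
    responses = list(policy)
    weights = [policy[y] for y in responses]
    kept = []
    for _ in range(n_draws):
        candidates = rng.choices(responses, weights=weights, k=k)
        kept.append(max(candidates, key=lambda y: reward[y]))
    counts = Counter(kept)
    return {y: counts[y] / len(kept) for y in responses}


if __name__ == "__main__":
    policy = {"bad": 0.5, "okay": 0.3, "good": 0.2}
    reward = {"bad": 0.0, "okay": 0.5, "good": 1.0}
    before = expected_reward(policy, reward)  # 0.35
    after = expected_reward(rft_round(policy, reward, k=4), reward)
    print(f"before: {before:.3f}  after one best-of-4 round: {after:.3f}")
```

After one round, a single sample from the refit policy should score near 0.764, which is the exact expected reward of best-of-4 sampling from the original policy (1 × (1 − 0.8⁴) + 0.5 × (0.8⁴ − 0.5⁴)). The inference-time gain is now available at the cost of one generation instead of four.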

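For a sovereign AI workflow, this method lets you raise a model's quality on owned hardware using only a reward signal and ordinary fine-tuning, with no reinforcement-learning machinery required. The scored candidates can also seed a preference dataset for later direct preference tuning: pairing a high-scoring response with a low-scoring one for the same prompt yields the chosen/rejected examples that methods such as Direct Preference Optimization (DPO) train on.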
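A minimal sketch of reusing the scored candidates as preference pairs, pairing the best- and worst-scoring response for each prompt. The input layout and the `min_margin` threshold are hypothetical choices for illustration. The "prompt" / "chosen" / "rejected" field names follow a common convention for DPO datasets (Hugging Face TRL uses it, for example), but check what your own trainer expects.

```python
from typing import Dict, List, Tuple


def preference_pairs(
    scored: Dict[str, List[Tuple[str, float]]],
    min_margin: float = 0.0,
) -> List[Dict[str, str]]:
    """Turn reward-scored candidates into chosen/rejected pairs.

    scored maps each prompt to a list of (response, reward) tuples.
    Prompts with fewer than two candidates, or whose best and worst rewards
    differ by no more than min_margin, are skipped as uninformative.
    """
    pairs: List[Dict[str, str]] = []
    for prompt, candidates in scored.items():
        if len(candidates) < 2:
            continue
        ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
        (best, best_score), (worst, worst_score) = ranked[0], ranked[-1]
        if best_score - worst_score > min_margin:
            pairs.append({"prompt": prompt, "chosen": best, "rejected": worst})
    return pairs


if __name__ == "__main__":
    scored = {
        "Explain hashing.": [("vague answer", 0.2), ("clear answer", 0.9), ("ok answer", 0.5)],
        "Say hi.": [("hi", 0.3), ("hello", 0.3)],  # tie: skipped
    }
    for pair in preference_pairs(scored):
        print(pair)
```

Requiring a margin keeps near-ties out of the dataset, since a pair whose two responses the reward model barely distinguishes gives the preference loss little signal.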

