Passer au contenu

Bitcoin accepté au paiement  |  Expédié depuis Laval, QC, Canada  |  Soutien expert depuis 2016

Best-of-N Sampling

Sovereign AI

Definition

Best-of-N (BoN) sampling is an inference-time alignment method: instead of returning the first response a model produces, you sample N candidates and use a reward model to pick the highest-scoring one. It is a simple, training-free way to raise output quality by spending more compute at generation time.

The quality-versus-divergence tradeoff

Raising N generally improves the selected response, but it also pushes the resulting distribution away from the base model. A commonly cited expression places the KL divergence between the best-of-N policy and the reference policy at roughly log(n) − (n−1)/n — an upper bound that grows with N. Push N too high and the method can exploit flaws in the reward model, a failure called reward over-optimization or reward hacking, where outputs score well on the proxy but are actually worse.

Inference-time, not training-time

The key trait of best-of-N is that it changes nothing about the weights — it is pure inference-time selection. That makes it a useful baseline and a knob you can turn on any deployed model, trading more sampling for better answers. When the chosen best responses are instead fed back into training, the method becomes rejection sampling fine-tuning.

For self-hosters, best-of-N is the cheapest alignment lever available: it needs only a reward model and extra GPU cycles, no retraining. It depends on the same reward signal used to build a preference dataset.

In Simple Terms

Best-of-N (BoN) sampling is an inference-time alignment method: instead of returning the first response a model produces, you sample N candidates and use a reward…

Explore the Full Glossary

Browse all Bitcoin mining terms from A to Z. Whether you are a beginner or expert, deepen your understanding of the mining ecosystem.

Glossaire du minage

ASIC Miner Database

Compare 500+ miners with real-time profitability data, home mining scores, and detailed specs.

Comparer les mineurs