Rollout (Reinforcement Learning)

Definition

In reinforcement learning, a rollout is a complete trajectory that the model generates by acting out its current policy from a starting state until the episode terminates. For a language model, a rollout is one full sampled response to a prompt (every token of reasoning and the final answer), together with whatever reward that response earns. Rollouts are the raw experience an RL optimizer consumes; their quality and diversity largely determine how well training works.
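As a rough illustration of the definition, the sketch below samples tokens from a policy until it stops, then attaches a reward. The names `Rollout`, `toy_policy`, `toy_reward`, and `generate_rollout` are hypothetical stand-ins invented for this example; a real setup would call a language model and a task-specific reward function instead.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Rollout:
    """One complete trajectory: the prompt, the sampled tokens, and the reward."""
    prompt: str
    tokens: list = field(default_factory=list)
    reward: float = 0.0


def toy_policy(context, rng):
    # Hypothetical stand-in for a language model: picks the next token at random.
    return rng.choice(["a", "b", "<eos>"])


def toy_reward(tokens):
    # Hypothetical reward: the fraction of sampled tokens equal to "a".
    return tokens.count("a") / len(tokens) if tokens else 0.0


def generate_rollout(prompt, policy, reward_fn, rng, max_tokens=16):
    """Act out the policy from the prompt until <eos> or the token limit."""
    rollout = Rollout(prompt=prompt)
    context = [prompt]
    for _ in range(max_tokens):
        token = policy(context, rng)
        if token == "<eos>":
            break  # the trajectory terminates here
        rollout.tokens.append(token)
        context.append(token)
    # The reward is assigned once, to the finished response.
    rollout.reward = reward_fn(rollout.tokens)
    return rollout


if __name__ == "__main__":
    rng = random.Random(0)
    print(generate_rollout("What is 2 + 2?", toy_policy, toy_reward, rng))
```

The point of the sketch is only the shape of the data: one prompt, one sampled response, one scalar reward.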

Why rollouts dominate the training loop

Each step of reinforcement-learning fine-tuning generates fresh rollouts from the current policy, scores them, and uses them to update the model. Because generating long responses with a large model is expensive, the rollout phase is often the biggest consumer of compute in the whole training run, frequently costing more than the gradient updates themselves. Algorithms like GRPO (Group Relative Policy Optimization) deliberately sample a group of rollouts per prompt so that each rollout's reward can be compared against the others in its group.
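The group comparison can be sketched in a few lines. The version below assumes the commonly described GRPO normalization, in which each reward is centered on the group mean and divided by the group standard deviation; the function name and the `eps` parameter are choices made for this example, and real implementations differ in details such as which standard deviation estimator they use.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each rollout relative to the other rollouts for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    # eps avoids division by zero when every rollout earned the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four rollouts for one prompt: two correct (1.0) and two incorrect (0.0).
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Rollouts that beat their group's average get a positive value and those that fall below it get a negative one, so no separate learned value model is needed to provide a baseline.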

Rollout strategy as a design lever

How rollouts are generated, filtered, and reused is an active research area. Strategies range from simple sampling to tree-based search that branches at uncertain tokens to surface higher-quality trajectories, and they include filtering that discards low-information rollouts before they reach the optimizer. For practitioners training a model on constrained hardware, smart rollout management (sampling fewer but more informative trajectories) is one of the highest-leverage ways to make local reinforcement learning affordable.
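One simple form of filtering can be sketched as follows. It assumes that "low-information" means a prompt whose rollouts all earned the same reward, since such a group gives a group-relative method nothing to compare; that reading, the function name, and the `min_spread` threshold are assumptions of this example rather than a fixed definition.

```python
def filter_informative_groups(groups, min_spread=1e-6):
    """Keep only prompts whose rollouts earned meaningfully different rewards.

    groups maps each prompt to the list of rewards its rollouts received.
    """
    kept = {}
    for prompt, rewards in groups.items():
        # Identical rewards carry no relative signal, so the group is dropped.
        if rewards and max(rewards) - min(rewards) > min_spread:
            kept[prompt] = rewards
    return kept


if __name__ == "__main__":
    groups = {
        "too easy": [1.0, 1.0, 1.0, 1.0],   # always solved: dropped
        "too hard": [0.0, 0.0, 0.0, 0.0],   # never solved: dropped
        "useful": [1.0, 0.0, 1.0, 0.0],     # mixed outcomes: kept
    }
    print(list(filter_informative_groups(groups)))
```

On constrained hardware, a filter like this keeps optimizer time from being spent on groups that would contribute no update.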

Each rollout's reward is first converted into an advantage estimate, and that advantage then weights the rollout's contribution to the policy-gradient update of the model.
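The relationship between advantages and the update can be illustrated with a plain REINFORCE-style surrogate loss. This is a simplified sketch with a hypothetical function name: it works on plain Python floats, whereas real training computes the same quantity in an automatic-differentiation framework, and methods such as GRPO add further terms (for example clipping and a KL penalty) that are omitted here.

```python
def policy_gradient_loss(logprobs, advantages):
    """Advantage-weighted negative log-likelihood, averaged over rollouts.

    logprobs:   one list per rollout, holding the log-probability the current
                policy assigned to each sampled token.
    advantages: one scalar per rollout.
    """
    total = 0.0
    for token_logprobs, advantage in zip(logprobs, advantages):
        # Minimizing this term raises the likelihood of rollouts with a
        # positive advantage and lowers it for those with a negative one.
        total += -advantage * sum(token_logprobs)
    return total / len(advantages)


if __name__ == "__main__":
    logprobs = [[-0.2, -0.1], [-1.5, -0.7]]
    advantages = [1.0, -1.0]
    print(policy_gradient_loss(logprobs, advantages))
```

A rollout with zero advantage drops out of the sum entirely.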

