Repetition Penalty

Repetition penalty is a decoding control that discourages a language model from repeating tokens it has already produced or seen in the prompt. It directly attacks one of the most common failure modes of local inference: the degenerate loop, where a model gets stuck emitting the same phrase, sentence, or word stem over and over until it exhausts the output budget. For anyone self-hosting a model, it is one of the first knobs worth checking when output quality collapses into echoes — and one of the easiest to over-tighten into a different failure entirely.

How it works

Before each new token is sampled, the sampler looks back at which tokens have already appeared and scales down their logits, the raw scores the model assigns to every vocabulary entry before they become probabilities. The classic formulation (introduced with the CTRL model in 2019) divides a positive logit by a penalty factor; common implementations multiply a negative logit by the same factor instead, so the token becomes less likely either way. Values above 1.0 therefore suppress repeats, 1.0 leaves the scores unchanged, and values below 1.0 would actually encourage repeats. Because it acts multiplicatively on raw scores, it tends to be a stronger, blunter deterrent than the related additive controls: a frequency penalty subtracts an amount proportional to how many times a token has occurred, and a presence penalty subtracts a flat amount once a token has appeared at all. Most local runtimes expose some or all of the three, plus variants like no-repeat n-gram blocking, and they stack, which is why cargo-culted sampler settings copied between models so often misbehave.
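The three penalties described above can be sketched in a few lines. This is a minimal illustration, not any particular runtime's implementation: the function name apply_penalties, its parameter names, and the order in which the penalties are combined are all assumptions made for the example, and real runtimes differ on these details.

```python
from collections import Counter


def apply_penalties(logits, seen_tokens, repetition_penalty=1.0,
                    frequency_penalty=0.0, presence_penalty=0.0):
    """Return a penalized copy of `logits` (a list indexed by token id).

    Hypothetical helper for illustration; the combination order is an assumption.
    """
    counts = Counter(seen_tokens)
    out = list(logits)
    for tok, n in counts.items():
        score = out[tok]
        # Multiplicative repetition penalty: divide positive scores and
        # multiply negative ones, so the score moves down when penalty > 1.
        if score > 0:
            score /= repetition_penalty
        else:
            score *= repetition_penalty
        # Additive frequency penalty: grows with the number of occurrences.
        score -= frequency_penalty * n
        # Additive presence penalty: flat, applied once the token has appeared.
        score -= presence_penalty
        out[tok] = score
    return out


logits = [2.0, -1.0, 0.5, 3.0]
history = [0, 1, 0]  # token 0 seen twice, token 1 seen once
multiplicative = apply_penalties(logits, history, repetition_penalty=1.25)
additive = apply_penalties(logits, history, frequency_penalty=0.5,
                           presence_penalty=0.25)
```

Tokens 2 and 3 never appeared, so their scores are untouched in both results. Note how the multiplicative penalty ignores how often a token occurred, while the frequency penalty hits token 0 harder than token 1.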

Why models loop in the first place

Repetition is a probability feedback loop: once a phrase has appeared, its tokens look contextually likely, which makes the model emit them again, which makes them look likelier still. The problem is worst at low temperature or with greedy decoding, where nothing injects the randomness that would break the cycle, and it is notoriously more visible in smaller and heavily quantized models — precisely the models sovereign operators run on constrained hardware. That makes the penalty less an exotic option than a standard part of the local-inference toolkit.

Tuning in practice

A common working range is roughly 1.05 to 1.3. Too low and the model still loops; too high and it starts dodging tokens it legitimately needs, producing broken grammar, drifting word choice, or bizarre paraphrases of ordinary terms. Code and structured output are especially sensitive, since braces, keywords, and field names must recur; many operators disable or minimize the penalty for code generation and rely on other controls instead. Two practical notes apply. First, the penalty typically covers a sliding window of recent context rather than the whole conversation, so window size matters as much as strength. Second, it is a symptom treatment, not a cure: persistent looping often signals a deeper issue such as a wrong prompt template or over-aggressive quantization.
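The sliding-window behavior can be sketched as follows. The function penalize_recent and its window parameter are hypothetical names chosen for this example, and treating a window of 0 as "penalty disabled" is an assumption; check how a given runtime defines its own window setting.

```python
def penalize_recent(logits, history, penalty=1.1, window=64):
    """Apply a multiplicative repetition penalty only to tokens that
    occur in the last `window` entries of `history` (hypothetical helper)."""
    # A window of 0 is treated here as "disabled" (an assumption); the
    # guard also avoids history[-0:], which would select the whole list.
    recent = set(history[-window:]) if window > 0 else set()
    out = list(logits)
    for tok in recent:
        # Divide positive scores, multiply negative ones.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


history = [0, 1, 2, 3]
narrow = penalize_recent([1.0, 1.0, 1.0, 1.0], history, penalty=2.0, window=2)
wide = penalize_recent([1.0, 1.0, 1.0, 1.0], history, penalty=2.0, window=4)
off = penalize_recent([1.0, 1.0, 1.0, 1.0], history, penalty=2.0, window=0)
```

With the narrow window only tokens 2 and 3 are penalized; widening the window reaches back to tokens 0 and 1 as well. The same strength therefore touches more of the vocabulary, which is why the two settings have to be tuned together.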

The control keeps evolving because the classic penalty is crude: it punishes a token for having appeared at all, regardless of whether repeating it would be natural. Newer samplers in the local ecosystem target the actual failure, verbatim sequences recurring, by escalating pressure only when the model starts retracing a long recent n-gram, leaving ordinary reuse of vocabulary untouched. Whatever the mechanism, the tuning workflow is the same: change one sampler parameter at a time, test against prompts that previously looped, and keep a written record of what each model actually needs, because sampler settings are model-specific folklore that does not transfer cleanly between families.
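The idea of penalizing only retraced sequences can be illustrated with a toy sketch. It is not the algorithm of any specific sampler: the function sequence_penalties, the parameters allowed_length, multiplier, and base, and the exponential growth rule are all assumptions chosen to show the principle that the penalty escalates with the length of the repeated run.

```python
def sequence_penalties(history, allowed_length=2, multiplier=0.8, base=1.75):
    """Map candidate token -> amount to subtract from its logit.

    A candidate is penalized only if emitting it would extend a verbatim
    repeat of at least `allowed_length` tokens (hypothetical sketch).
    """
    penalties = {}
    n = len(history)
    for i in range(n - 1):
        # Count how many tokens ending at position i match the tokens
        # ending at the current position (the end of history).
        match = 0
        while match <= i and history[i - match] == history[n - 1 - match]:
            match += 1
        if match >= allowed_length:
            # The token that followed the earlier occurrence would
            # continue the repeat, so it is the one to discourage.
            candidate = history[i + 1]
            amount = multiplier * base ** (match - allowed_length)
            penalties[candidate] = max(penalties.get(candidate, 0.0), amount)
    return penalties


# The context ends with "1 2 3", which earlier was followed by token 4.
looping = sequence_penalties([1, 2, 3, 4, 1, 2, 3])
# Reusing a single token is not a sequence repeat and is left alone.
ordinary = sequence_penalties([1, 9, 1])
```

A caller would subtract each returned amount from the matching logit before sampling. In the first example only token 4 is penalized, because it alone would continue the three-token repeat; in the second, nothing is penalized.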

Repetition penalty belongs to the same family of sampling controls as temperature sampling and top-p (nucleus) sampling, and it is frequently paired with a logit bias when finer per-token control is needed.
