LoRA Alpha

LoRA alpha (often written lora_alpha) is the scaling hyperparameter that controls how strongly a LoRA adapter's low-rank update is applied to the frozen base weights. In the standard formulation the update is multiplied by the ratio of alpha to rank, written as α/r, before being added back into the model. Alpha therefore governs the strength of the fine-tuning signal, while the rank governs its capacity. Of all the knobs in a parameter-efficient fine-tune, this pairing is the one most often misconfigured, precisely because each half looks harmless in isolation.
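Written out, the standard formulation looks like the following. The symbol names are the usual LoRA notation rather than anything defined in this entry: W_0 is the frozen weight matrix, and B and A are the two trainable low-rank factors.

```latex
W' = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k}
```

With α = r the multiplier is exactly 1, so the product BA is added to the base weights as-is.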

The alpha-over-rank ratio

Scaling by α/r is meant to keep the magnitude of the update roughly consistent as you change the rank, so that increasing rank does not also inadvertently amplify the adapter's effect. Setting alpha equal to the rank applies the update at unscaled (1x) strength; a larger alpha relative to rank pushes the adapter harder, while a smaller one softens it. This is why alpha cannot be reasoned about in isolation from the LoRA rank it is divided by: quoting "alpha 32" tells you nothing until you know whether the rank is 8, 64, or 256. When comparing recipes, always reduce them to the effective ratio first.
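As a rough illustration of reducing recipes to their effective ratio, here is a minimal sketch. The helper name `effective_scale` is made up for this example and does not come from any library.

```python
def effective_scale(alpha: float, rank: int) -> float:
    """Return the alpha/rank multiplier applied to the low-rank update."""
    if rank <= 0:
        raise ValueError("rank must be a positive integer")
    return alpha / rank


# The same "alpha 32" means very different things at different ranks.
for rank in (8, 64, 256):
    print(f"alpha=32, r={rank}: {effective_scale(32, rank):g}x")
```

Comparing two configs by this single number, rather than by alpha alone, is usually enough to tell whether one recipe is pushing the adapter harder than the other.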

Common conventions

Two patterns dominate in practice. One fixes alpha at a constant such as 16 or 32 regardless of rank. The other ties alpha to rank, commonly α = 2r (for example r=8 with alpha=16, a pairing that appears in the original Microsoft LoRA configurations), giving a constant 2x effective scale. Some quantized-training recipes deliberately use a smaller alpha relative to rank to keep updates gentle on top of already-compressed weights. There is also a subtler wrinkle: because the α/r rule divides by the full rank, the scale shrinks quickly as rank grows, and very high ranks can end up under-scaled. A rank-stabilized variant (rsLoRA) scales by α/√r instead, to keep learning strength consistent as rank grows. You do not need that refinement at modest ranks, but it explains why naive rank sweeps with fixed alpha can look misleadingly flat.
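To see how the two scaling rules diverge in a fixed-alpha rank sweep, the sketch below prints both multipliers side by side. The function names are illustrative only, and alpha=16 is an arbitrary choice for the demonstration.

```python
import math


def standard_scale(alpha: float, rank: int) -> float:
    """Classic LoRA multiplier: alpha divided by the rank."""
    return alpha / rank


def rslora_scale(alpha: float, rank: int) -> float:
    """Rank-stabilized multiplier: alpha divided by the square root of the rank."""
    return alpha / math.sqrt(rank)


# Sweep the rank with alpha held at 16 (an arbitrary value for illustration).
for rank in (8, 64, 256):
    print(
        f"r={rank:>3}  alpha/r={standard_scale(16, rank):.4f}  "
        f"alpha/sqrt(r)={rslora_scale(16, rank):.4f}"
    )
```

The α/r column falls by a factor of 32 across this sweep, while the α/√r column falls by only about 5.7, which is the gap the rank-stabilized rule is meant to close.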

How alpha interacts with learning rate

Alpha and the learning rate both feed into the size of the effective update, so they are partially interchangeable: doubling alpha and halving the learning rate lands in a similar place, though not an identical one, since the exact trade-off depends on the optimizer. The practical discipline is to move one at a time. If your fine-tune is not biting, raise alpha (or the ratio) before reaching for exotic schedulers; if the model starts parroting your dataset or degrading on general tasks, the adapter is being applied too hard, so soften the ratio or stop earlier. A practical tip worth repeating: if you raise the rank but leave alpha fixed, the α/r scale shrinks and the update is applied more weakly, so raise alpha in proportion whenever you want to hold the ratio constant.
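Keeping the ratio fixed while changing the rank is simple arithmetic. The sketch below shows one way to do it; `alpha_for_rank` is a hypothetical helper, not part of any training library.

```python
def alpha_for_rank(old_alpha: float, old_rank: int, new_rank: int) -> float:
    """Rescale alpha so that alpha/rank is unchanged at the new rank."""
    if old_rank <= 0 or new_rank <= 0:
        raise ValueError("ranks must be positive integers")
    return old_alpha * new_rank / old_rank


# Moving from r=8, alpha=16 (a 2x ratio) up to r=32 while keeping the 2x ratio.
new_alpha = alpha_for_rank(old_alpha=16, old_rank=8, new_rank=32)
print(f"r=32 needs alpha={new_alpha:g} to stay at {new_alpha / 32:g}x")
```

Treating the ratio as the quantity you tune, and deriving alpha from it, means a rank change does not also change the adapter's strength.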

Why it matters for local fine-tuning

For the self-hoster fine-tuning on a single GPU, alpha is pure leverage: it costs no memory and no compute, yet it decides whether hours of training produce a confident specialist or a mushy compromise. A sane starting point for most jobs is a ratio between 1x and 2x (alpha equal to rank, or double it), adjusted from evaluation results rather than folklore. Keep notes per run, the way you would log tuning profiles on a hashboard, because the right ratio depends on the dataset but tends to stay stable once you have found it. Alpha is part of the same hyperparameter set inherited by LoRA variants like DoRA, and together with rank it defines the two-axis budget, strength and capacity, of every adapter you train.

One last habit separates tidy fine-tuners from frustrated ones: record alpha, rank, and learning rate together in every experiment log, and change only one per run. Because the three combine into the same effective update, an undocumented change to any of them silently invalidates comparisons between runs, and folklore numbers copied from other people's configs carry their datasets' assumptions with them. The parameters are free to set and free to log; the GPU-hours spent rediscovering an interaction you already stumbled through last month are not.
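One lightweight way to enforce that habit is to store the three values as a single record and check what changed between runs. The sketch below uses only the standard library. The names `RunConfig` and `changed_fields`, and the example values, are invented for illustration.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunConfig:
    """The three hyperparameters that should always be logged together."""
    rank: int
    alpha: float
    learning_rate: float

    @property
    def ratio(self) -> float:
        # Effective alpha/rank multiplier for this run.
        return self.alpha / self.rank


def changed_fields(previous: RunConfig, current: RunConfig) -> list[str]:
    """List the hyperparameters that differ between two runs."""
    before, after = asdict(previous), asdict(current)
    return [name for name in before if before[name] != after[name]]


# Hypothetical pair of runs: only alpha differs.
baseline = RunConfig(rank=16, alpha=32, learning_rate=2e-4)
candidate = RunConfig(rank=16, alpha=16, learning_rate=2e-4)

diff = changed_fields(baseline, candidate)
if len(diff) > 1:
    print(f"warning: {len(diff)} hyperparameters changed at once: {diff}")
print(f"changed: {diff}, ratio {baseline.ratio:g}x -> {candidate.ratio:g}x")
```

Writing `asdict(candidate)` into each run's log alongside the evaluation numbers keeps later comparisons honest.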
