Hyperparameter

Sovereign AI

Hyperparameter is the term for a configuration value you set before training begins, which governs how a model learns rather than what it learns. Unlike the model's weights, the millions or billions of numbers adjusted automatically during training, hyperparameters are chosen by the person running the training and stay fixed throughout a given run (even a learning rate that changes over time follows a schedule decided in advance). They are the knobs on the machine: turn them well and training converges to a capable model; turn them badly and the same data and hardware produce garbage.

The common knobs

The most consequential hyperparameter is usually the learning rate, which controls how large a step the optimizer takes each update. Too high and training diverges or oscillates; too low and it crawls or stalls in a poor solution. Batch size determines how many examples are processed per update and interacts directly with how much VRAM a run consumes — often the binding constraint on self-hosted hardware. The number of training epochs sets how many passes are made over the dataset, and regularization strength (weight decay, dropout rate) controls how hard the model is discouraged from memorizing. Architectural choices such as layer count, hidden dimension, and attention-head count are hyperparameters too, though for fine-tuning an existing open-weight model those are already fixed and your knobs shrink to the training-loop values plus adapter settings such as LoRA rank and alpha.
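To make the learning-rate trade-off concrete, here is a toy sketch: plain gradient descent on f(x) = x^2, whose gradient is 2x, so every step multiplies x by (1 - 2 * lr). It is an illustration rather than a training loop, and the function name `gradient_descent` and the sample learning rates are hypothetical choices for this example.

```python
def gradient_descent(lr: float, steps: int, x0: float = 1.0) -> float:
    """Minimize f(x) = x**2 with plain gradient descent and return the final x."""
    x = x0
    for _ in range(steps):
        grad = 2.0 * x      # derivative of x**2
        x = x - lr * grad   # one optimizer step; lr sets the step size
    return x


if __name__ == "__main__":
    # Hypothetical learning rates chosen to show each failure mode.
    for lr in (0.001, 0.1, 0.9, 1.1):
        print(f"lr={lr:<6} final x after 50 steps: {gradient_descent(lr, 50):.3e}")
```

With these values, 0.001 barely moves x in 50 steps (crawling), 0.1 converges smoothly, 0.9 still converges but flips sign every step (oscillation), and 1.1 grows without bound (divergence). Real loss surfaces are not quadratic, but the same step-size failure modes show up in real loss curves.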

Tuning and trade-offs

Because good hyperparameter values are rarely obvious in advance, practitioners search for them — sometimes by hand, sometimes with automated methods such as grid search, random search, or Bayesian optimization. The choices interact: a learning rate that works at one batch size may diverge at another, which is why recipes are shared as complete sets rather than individual values. Poorly chosen hyperparameters are a frequent cause of both underfitting and overfitting, so disciplined tuning, evaluated against a held-out validation set the model never trains on, is one of the highest-leverage activities in self-hosted model work. A practical habit: change one knob at a time, log everything, and keep the run cheap and short until the configuration proves itself.
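Random search, one of the methods named above, can be sketched in a few lines. The example below is a toy under stated assumptions: a synthetic one-parameter regression problem stands in for a real dataset, and `make_data`, `train`, `mse`, and `random_search` are hypothetical helpers. What it does show faithfully is the discipline: every candidate configuration is scored only on a held-out validation split that training never sees.

```python
import random


def make_data(n, rng):
    # Hypothetical stand-in dataset: y = 3x plus a little Gaussian noise.
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 3.0 * x + rng.gauss(0.0, 0.1)))
    return data


def train(data, lr, weight_decay, epochs=20):
    # Fit y ~ w * x with per-example SGD; weight_decay adds an L2 penalty on w.
    w = 0.0
    for _ in range(epochs):
        for x, y in data:
            grad = 2.0 * (w * x - y) * x + 2.0 * weight_decay * w
            w -= lr * grad
    return w


def mse(data, w):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def random_search(trials=20, seed=0):
    rng = random.Random(seed)
    data = make_data(200, rng)
    train_set, val_set = data[:160], data[160:]  # val_set is never trained on
    best = None
    for _ in range(trials):
        # Sample each knob on a log scale, a common choice for rates and penalties.
        config = {
            "lr": 10 ** rng.uniform(-4.0, -0.5),
            "weight_decay": 10 ** rng.uniform(-5.0, -1.0),
        }
        w = train(train_set, **config)
        val_loss = mse(val_set, w)  # score on held-out data only
        if best is None or val_loss < best[0]:
            best = (val_loss, config, w)
    return best


if __name__ == "__main__":
    val_loss, config, w = random_search()
    print(f"best config {config} -> w={w:.3f}, validation MSE={val_loss:.4f}")
```

Random search varies every knob at once, which suits an automated sweep; the change-one-knob-at-a-time habit applies to manual runs, where you need to attribute each change in the loss curve to a single cause.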

Why this matters for sovereign AI

When you fine-tune a model on your own hardware instead of renting a managed service, nobody picks these values for you — and nobody hides them from you either. Understanding hyperparameters is the difference between cargo-culting a config file from a forum and being able to diagnose why your loss curve exploded at step 400. For home-lab rigs with a single consumer GPU, the practical loop is: pick a known-good recipe for your model family, shrink batch size until it fits in memory, use gradient accumulation to compensate, and only then start experimenting. The same discipline a miner applies to tuning frequency and voltage on a hashboard applies here: measure, change one thing, measure again.
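Gradient accumulation compensates for a small batch because, for a loss averaged over examples, the mean of equal-sized micro-batch gradients equals the full-batch gradient. The toy sketch below checks that identity for a one-parameter model; the helper names are hypothetical. In PyTorch-style frameworks, gradients from successive backward passes add up, so each micro-batch loss is typically divided by the number of accumulation steps before a single optimizer step, and only one micro-batch's activations need to sit in VRAM at a time.

```python
from typing import List, Tuple

Example = Tuple[float, float]


def grad_mse(w: float, batch: List[Example]) -> float:
    # Gradient of mean squared error for the toy model y_hat = w * x over one batch.
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulation_steps(target_batch: int, micro_batch: int) -> int:
    # Micro-batches to accumulate so that micro_batch * steps == target_batch.
    if target_batch % micro_batch:
        raise ValueError("target batch must be a multiple of the micro-batch size")
    return target_batch // micro_batch


def accumulated_grad(w: float, batch: List[Example], micro_batch: int) -> float:
    # Walk the batch in small pieces, scaling each micro-batch gradient by
    # 1/steps before summing, as an accumulation loop would.
    steps = accumulation_steps(len(batch), micro_batch)
    total = 0.0
    for i in range(steps):
        chunk = batch[i * micro_batch:(i + 1) * micro_batch]
        total += grad_mse(w, chunk) / steps
    return total


if __name__ == "__main__":
    batch = [(float(i), 2.0 * i + 1.0) for i in range(8)]
    print(grad_mse(0.5, batch), accumulated_grad(0.5, batch, micro_batch=2))
```

The equivalence is exact only when micro-batches are equal-sized and the loss treats examples independently; layers such as batch normalization compute statistics per micro-batch, so accumulation only approximates a true large batch there.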

Training knobs versus inference knobs

One distinction saves a lot of confusion: the settings you adjust when running a model (temperature, top-p, repetition penalty, maximum output length) are inference-time sampling parameters, not hyperparameters in the strict training sense, even though the word is borrowed for them constantly. They shape how the model picks its next token from an already-fixed set of weights, they cost nothing to change, and you can flip them per request. True hyperparameters are decided before a training run, are expensive to revisit (each candidate value means another run), and permanently shape the resulting weights. The practical consequence for a self-hoster: if a local model behaves badly, exhaust the free knobs first. A temperature or prompt change takes seconds; a fine-tuning run with revised hyperparameters takes hours of GPU time and a validation plan. Reach for the expensive knobs only when the cheap ones have demonstrably run out.
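To see why sampling settings are cheap, consider what two of them do to a single step of generation. The sketch below applies temperature and top-p (nucleus) filtering to a made-up logit vector; it leaves the weights untouched, which is the whole point. The helper names are hypothetical, and real inference servers implement these knobs internally with details (order of operations, tie handling, interaction with other filters) that vary between implementations.

```python
import math
import random
from typing import Dict, List, Optional


def next_token_distribution(logits: List[float], temperature: float = 1.0,
                            top_p: float = 1.0) -> Dict[int, float]:
    """Return {token_index: probability} after temperature scaling and top-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    # Temperature divides the logits: below 1 sharpens the distribution, above 1 flattens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-p: keep the most likely tokens until their combined mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalize the surviving tokens so their probabilities sum to 1.
    return {i: probs[i] / mass for i in kept}


def sample_token(logits: List[float], temperature: float = 1.0, top_p: float = 1.0,
                 rng: Optional[random.Random] = None) -> int:
    rng = rng or random.Random()
    dist = next_token_distribution(logits, temperature, top_p)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]  # made-up scores for a three-token vocabulary
    print(next_token_distribution(logits, temperature=1.0, top_p=1.0))
    print(next_token_distribution(logits, temperature=0.5, top_p=1.0))
    print(next_token_distribution(logits, temperature=1.0, top_p=0.5))
```

Changing `temperature` or `top_p` here is a function argument, not a training run, which is exactly why these knobs should be exhausted before any fine-tuning.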

For related concepts, see our entries on the training epoch, regularization, and quantization, which trades precision for memory after training is done.
