Quantization (LLM)

Definition

Quantization is the process of reducing the numerical precision used to store a large language model's weights, shrinking the model so it fits in less memory and typically runs faster. A model trained at 16-bit precision can be quantized to 8-bit, 4-bit, or lower: 8-bit roughly halves the memory needed for the weights, and 4-bit cuts it to roughly a quarter. This makes quantization one of the most important techniques for running capable local models on consumer hardware instead of renting cloud GPUs.
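The arithmetic behind those savings can be sketched in a few lines of Python. This is a rough estimate under stated assumptions: the function name `weight_memory_gib` and the 7-billion-parameter example are illustrative, not taken from any particular model or tool, and the figure covers the weights only. Real quantized files also store scale factors, so their effective bits per weight are somewhat higher than the nominal number, and inference needs extra memory for the context cache and activations.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB.

    n_params: number of weights in the model (hypothetical example values below).
    bits_per_weight: nominal precision, e.g. 16, 8 or 4.
    """
    total_bytes = n_params * bits_per_weight / 8  # 8 bits per byte
    return total_bytes / 2**30                    # bytes -> GiB


if __name__ == "__main__":
    n_params = 7e9  # assumed model size, for illustration only
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit: {weight_memory_gib(n_params, bits):.2f} GiB")
```

Comparing the result against a card's VRAM, with some headroom left for the context cache, gives a first approximation of whether a given quant level will fit.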

The precision-versus-quality trade-off

Lower precision means each weight is stored with fewer bits, which reduces VRAM use and usually increases generation speed, because less data has to be read from memory for each token. The cost is some loss of output quality, which tends to grow as the bit width shrinks. Modern quantization formats and methods, such as GGUF quant levels (for example Q4_K_M or Q5_K_M), GPTQ, and AWQ, are designed to minimize that loss. In practice, 4-bit and 5-bit quants of a larger model often outperform an unquantized smaller model that uses the same memory, so on a fixed hardware budget quantization frequently improves real-world results.
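To make the trade-off concrete, here is a minimal sketch of the simplest form of quantization: symmetric "absmax" rounding of one block of weights to signed integers plus a single scale factor. It is a teaching example only and is not the algorithm used by GGUF K-quants, GPTQ, or AWQ, which are considerably more sophisticated. The function names and the randomly generated block of weights are assumptions made for illustration.

```python
import random


def quantize_block(weights, bits):
    """Symmetric absmax quantization of one block to signed integer codes."""
    qmax = 2 ** (bits - 1) - 1                   # e.g. 7 for 4-bit, 127 for 8-bit
    absmax = max(abs(w) for w in weights)
    scale = absmax / qmax if absmax > 0 else 1.0
    # Round each weight to the nearest representable step and clamp to range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_block(codes, scale):
    """Recover approximate weights from integer codes and the block scale."""
    return [c * scale for c in codes]


def mean_abs_error(weights, bits):
    """Average absolute difference between original and restored weights."""
    codes, scale = quantize_block(weights, bits)
    restored = dequantize_block(codes, scale)
    return sum(abs(a - b) for a, b in zip(weights, restored)) / len(weights)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical block of 32 small weights, not taken from a real model.
    block = [random.gauss(0.0, 0.02) for _ in range(32)]
    for bits in (8, 5, 4, 3):
        print(f"{bits}-bit mean abs error: {mean_abs_error(block, bits):.6f}")
```

Each step down in bit width makes the rounding grid coarser, so the restored weights drift further from the originals. Production formats reduce this error with techniques such as per-block scales and calibration data, which is why their quality loss at 4 or 5 bits is smaller than this naive scheme would suggest.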

Why it matters for sovereign AI

Quantization is what makes running an LLM locally practical: a quantized model can run entirely on a self-owned GPU with no data leaving your premises. For Bitcoiners extending self-custody to their AI tooling, it is the difference between depending on a cloud API and owning the full stack.

Check which models fit your card on D-Central's GPU and LLM compatibility guide.
