NF4 (4-bit NormalFloat)

NF4, or 4-bit NormalFloat, is a quantization data type introduced alongside the QLoRA fine-tuning method. It is built on the observation that neural network weights, once normalized, tend to follow a roughly normal (bell-curve) distribution. NF4 places its 16 representable 4-bit values so that each value is equally likely to be used given that distribution, making it information-theoretically optimal for normally distributed data. In plain terms: with only sixteen values to spend, NF4 spends them where the weights actually cluster — densely packed near zero, sparser in the rare tails — instead of wasting precision on extremes that almost never occur, as a uniform integer grid does.
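The "equally likely" idea can be sketched with nothing but the Python standard library. The snippet below is a simplified illustration, not the exact NF4 table: it cuts a standard normal distribution into 16 equal-probability bins, takes one representative value per bin, and rescales the result to the range [-1, 1]. The function name `normal_quantile_levels` is hypothetical, and the published NF4 construction differs in detail (for example, it includes an exact zero, which this symmetric sketch does not).

```python
from statistics import NormalDist


def normal_quantile_levels(bits: int = 4) -> list[float]:
    """Equal-probability levels for a standard normal, rescaled to [-1, 1].

    Simplified illustration of the quantile idea, not the exact NF4 table.
    """
    n = 2 ** bits
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Take the midpoint probability of each of the n equal-mass bins,
    # then map it back to a value with the inverse CDF.
    raw = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    largest = max(abs(v) for v in raw)
    return [v / largest for v in raw]


levels = normal_quantile_levels()
gaps = [b - a for a, b in zip(levels, levels[1:])]

print([round(v, 3) for v in levels])
print("narrowest gap:", round(min(gaps), 3), "| widest gap:", round(max(gaps), 3))
```

The narrowest gap sits in the middle of the grid and the widest at the two ends, which is the "dense near zero, sparse in the tails" layout described above.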

How NF4 is applied

NF4 works with blockwise quantization: weights are split into small blocks, each block is normalized by its own scaling constant (typically the block's largest absolute value), and the normalized values are mapped onto the NF4 grid. Because every block carries its own scale, the format adapts to local variation across the weight tensor, and a single outlier can only distort its own small neighborhood rather than an entire layer. QLoRA pairs this with double quantization, which quantizes the per-block scaling constants themselves, to claw back part of the metadata overhead, a detail that matters when you are counting megabytes on a consumer GPU. The format is most commonly reached through the bitsandbytes library, which exposes NF4 alongside a standard 4-bit float (FP4) alternative; NF4 is generally the better-performing choice for LLM weights.
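The blockwise scheme can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the bitsandbytes implementation: the block size of 64 is assumed, the helper names are hypothetical, and the `NF4_LEVELS` values are rounded from the commonly published NF4 table and should be treated as illustrative.

```python
import random

# Rounded NF4 code values (assumption: illustrative, 4-decimal rounding
# of the commonly published table).
NF4_LEVELS = [-1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
              0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0]


def quantize_blockwise(weights, block_size=64):
    """Return (codes, scales): one 4-bit index per weight, one absmax scale per block."""
    codes, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        # Absmax scale; fall back to 1.0 so an all-zero block does not divide by zero.
        scale = max(abs(w) for w in block) or 1.0
        scales.append(scale)
        for w in block:
            x = w / scale  # normalized into [-1, 1]
            codes.append(min(range(16), key=lambda i: abs(NF4_LEVELS[i] - x)))
    return codes, scales


def dequantize_blockwise(codes, scales, block_size=64):
    """Look each code up in the grid and multiply by its block's scale."""
    return [NF4_LEVELS[c] * scales[i // block_size] for i, c in enumerate(codes)]


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(256)]
weights[10] = 0.5  # a single outlier, placed in the first block

codes, scales = quantize_blockwise(weights)
recon = dequantize_blockwise(codes, scales)

for b in range(len(scales)):
    errs = [abs(w - r) for w, r in zip(weights[b * 64:(b + 1) * 64], recon[b * 64:(b + 1) * 64])]
    print(f"block {b}: scale={scales[b]:.4f}  max abs error={max(errs):.4f}")
```

Only the first block's scale is inflated by the outlier; the other blocks keep scales that match their own, much smaller, weights.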

Why it matters for local AI

NF4 is the data type that made 4-bit fine-tuning practical. In QLoRA, the base model is frozen and stored in NF4 while small trainable adapter matrices (see LoRA) learn on top in higher precision; during the forward and backward passes the NF4 weights are dequantized on the fly to a compute type such as BF16, so training-quality math flows through storage-efficient weights. The arithmetic transforms what fine-tuning demands: a model whose full-precision training footprint requires datacenter hardware becomes tunable on a single consumer GPU, because the frozen base sits in 4-bit and only the slim adapters need gradients and optimizer state. That is the difference between customization being a cloud service you rent and a capability you own.
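A back-of-the-envelope estimate makes the arithmetic concrete. The sketch below rests on several labeled assumptions rather than measurements: 16-bit weights and gradients plus two 32-bit Adam states per trainable parameter, a quantization block size of 64 with second-level blocks of 256 for double quantization (the values reported for QLoRA), a hypothetical adapter size of 0.5% of the base parameter count, and no accounting for activations or framework overhead. The function names are made up for illustration.

```python
def full_finetune_gb(n_params, bytes_weights=2, bytes_grads=2, bytes_optimizer=8):
    """Rough footprint of 16-bit full fine-tuning with Adam (activations ignored)."""
    return n_params * (bytes_weights + bytes_grads + bytes_optimizer) / 1e9


def qlora_gb(n_params, adapter_fraction=0.005, block_size=64,
             double_quant=True, second_level_block=256):
    """Rough footprint of a frozen 4-bit base plus trainable adapters (activations ignored)."""
    if double_quant:
        # 8-bit first-level constants, plus 32-bit constants for the second level.
        const_bits = 8 / block_size + 32 / (block_size * second_level_block)
    else:
        # One 32-bit scaling constant per block.
        const_bits = 32 / block_size
    base_bytes = n_params * (4 + const_bits) / 8
    # Only the adapters carry weights, gradients, and optimizer state.
    adapter_bytes = n_params * adapter_fraction * (2 + 2 + 8)
    return (base_bytes + adapter_bytes) / 1e9


n = 65e9
print(f"full 16-bit fine-tuning : {full_finetune_gb(n):7.1f} GB")
print(f"QLoRA, no double quant  : {qlora_gb(n, double_quant=False):7.1f} GB")
print(f"QLoRA, double quant     : {qlora_gb(n):7.1f} GB")
```

Under these assumptions the per-parameter overhead of the scaling constants drops from 0.5 bits to roughly 0.127 bits once double quantization is applied, and nearly all of the remaining memory is the frozen 4-bit base.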

The full QLoRA recipe

NF4 is one of three tricks that together define QLoRA's memory story. The base model's weights sit frozen in NF4; double quantization compresses the blockwise scaling constants themselves, shaving further megabytes; and paged optimizers spill optimizer state to system memory when GPU memory spikes, preventing the out-of-memory crashes that otherwise kill long fine-tuning runs. The combination is what let the original authors fine-tune a 65-billion-parameter model on a single 48 GB GPU, hardware a determined individual can own, while reporting quality on par with 16-bit fine-tuning of the same models. One expectation worth setting: NF4 is a storage and training format, not a speed format. On-the-fly dequantization costs cycles, so NF4 inference is typically slower than purpose-built INT4 serving formats; the common workflow trains adapters over NF4, then merges and re-quantizes into a deployment format for daily use. NF4 for storage during learning, deployment formats for serving: two different jobs, two different tools.

Where it fits among 4-bit formats

NF4 solves a different problem than the deployment-oriented formats. GPTQ and AWQ calibrate a finished model for efficient GPU serving, and GGUF's block schemes target llama.cpp-style local inference; all three are destinations for a model that is done changing. NF4 is the format a model wears while it is still learning: load in NF4, train adapters, then either keep serving through bitsandbytes or merge and re-quantize into a deployment format. For the surrounding compression landscape, see LLM quantization.
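The "load in NF4, train adapters" step might look like the following when done through the Hugging Face transformers and peft libraries on top of bitsandbytes. This is a sketch, not a tested recipe: `MODEL_ID` is a hypothetical placeholder, the LoRA hyperparameters are arbitrary, `device_map="auto"` assumes the accelerate package is installed, and the heavy imports are kept inside the functions so the settings can be read without those libraries present.

```python
MODEL_ID = "your-org/your-model"  # hypothetical placeholder, not a real checkpoint

# Keyword arguments for transformers.BitsAndBytesConfig.
NF4_SETTINGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",          # NF4 rather than FP4
    "bnb_4bit_use_double_quant": True,     # also quantize the scaling constants
    "bnb_4bit_compute_dtype": "bfloat16",  # dtype the weights are dequantized to
}


def load_nf4_model(model_id: str = MODEL_ID):
    """Load a causal LM with its base weights stored in NF4."""
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(**NF4_SETTINGS)
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",
    )


def attach_adapters(model, rank: int = 16, alpha: int = 32):
    """Wrap the frozen NF4 model with trainable LoRA adapters."""
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    model = prepare_model_for_kbit_training(model)
    # target_modules is left unset here; peft supplies defaults only for
    # architectures it recognizes, so other models need it set explicitly.
    lora_config = LoraConfig(r=rank, lora_alpha=alpha, task_type="CAUSAL_LM")
    return get_peft_model(model, lora_config)
```

A typical call sequence would be `model = attach_adapters(load_nf4_model())`, followed by an ordinary training loop over the adapter parameters.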

See 4-bit memory fit in the GPU–LLM fit dataset.
