Definition
NF4, or 4-bit NormalFloat, is a quantization data type introduced alongside the QLoRA fine-tuning method. It is designed around the observation that neural network weights, once normalized, tend to follow a roughly normal (bell-curve) distribution centered on zero. NF4 places its 16 representable values at quantiles of that distribution, so each value is expected to be used about equally often; the QLoRA authors describe this as information-theoretically optimal for normally distributed weights. In plain terms, it spends its limited bits where the weights actually cluster rather than wasting precision on rarely used extreme values.
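The quantile idea can be illustrated with a short sketch. This is a simplification, not the actual NF4 codebook: it takes 16 evenly spaced quantiles of a standard normal and rescales them to [-1, 1], whereas the real NF4 table is built asymmetrically so that it contains an exact zero. The function name `quantile_grid` is hypothetical.

```python
from statistics import NormalDist


def quantile_grid(levels=16):
    """Return `levels` points at evenly spaced quantiles of N(0, 1), rescaled to [-1, 1].

    Simplified illustration only; the real NF4 codebook uses different values.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Midpoint probabilities avoid p = 0 and p = 1, where the quantile is infinite.
    probs = [(i + 0.5) / levels for i in range(levels)]
    points = [std_normal.inv_cdf(p) for p in probs]
    scale = max(abs(q) for q in points)
    return [q / scale for q in points]


grid = quantile_grid()
inner_gap = grid[8] - grid[7]    # spacing between the two points nearest zero
outer_gap = grid[15] - grid[14]  # spacing between the two largest points
print(f"gap near zero: {inner_gap:.3f}, gap at the edge: {outer_gap:.3f}")
```

The points sit closer together near zero, where most normalized weights fall, and farther apart toward the extremes.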
How NF4 is applied
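NF4 is used with blockwise quantization: weights are split into small blocks (64 weights per block in the QLoRA paper), each block is divided by its own scaling constant (its largest absolute value, or absmax) so that its values fall in [-1, 1], and each normalized value is stored as the index of the nearest point on the NF4 grid. Because every block carries its own scale, a single outlier only affects the precision of its own block, and NF4 adapts to local variations across the weight tensor. The format is most commonly accessed through the bitsandbytes library, which exposes both NF4 and a standard 4-bit float (FP4) option.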
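A minimal pure-Python sketch of the blockwise scheme described above follows. It is not the bitsandbytes implementation, which runs in optimized GPU kernels and packs two 4-bit codes per byte. The `NF4_LEVELS` table is rounded to four decimals from the values published with QLoRA and should be checked against the bitsandbytes source before being relied on; the function names are hypothetical.

```python
# Assumption: NF4 code values rounded to four decimals (verify against bitsandbytes).
NF4_LEVELS = [
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
]


def quantize_block(block):
    """Normalize one block by its absmax and map each value to the nearest level index."""
    absmax = max(abs(w) for w in block) or 1.0  # avoid dividing by zero for an all-zero block
    codes = [
        min(range(len(NF4_LEVELS)), key=lambda i: abs(NF4_LEVELS[i] - w / absmax))
        for w in block
    ]
    return codes, absmax


def dequantize_block(codes, absmax):
    """Look up each 4-bit code and rescale by the block's absmax."""
    return [NF4_LEVELS[c] * absmax for c in codes]


def quantize(weights, block_size=64):
    """Split a flat list of weights into blocks and quantize each one independently."""
    return [
        quantize_block(weights[i:i + block_size])
        for i in range(0, len(weights), block_size)
    ]


# Tiny example with block_size=2 so that the per-block scales are easy to follow.
blocks = quantize([0.5, -2.0, 0.0, 1.0], block_size=2)
restored = [w for codes, absmax in blocks for w in dequantize_block(codes, absmax)]
print(blocks, restored)
```

In the example, 0.5 shares a block with the larger value -2.0, so it is rounded to a nearby grid point, while the second block keeps its values exactly because they land on grid points after scaling.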
Why it matters for local AI
NF4 is the data type that made 4-bit fine-tuning practical. By storing a frozen base model in NF4 and training only small adapter weights on top, QLoRA lets enthusiasts fine-tune large models on a single consumer GPU. NF4 is a storage format only: whenever a layer runs, in training or at inference, its weights are dequantized on the fly to a higher-precision compute type such as BF16. This keeps powerful, customizable models within reach of self-hosted, off-cloud setups.
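To give a rough sense of the savings, the sketch below estimates the storage needed for the weights alone. It assumes 4 bits per weight plus one 32-bit scaling constant per block of 64 weights, and it ignores adapters, optimizer state, activations, and the KV cache. The 7-billion-parameter figure is only an illustrative input, and the function names are hypothetical.

```python
def nf4_weight_bytes(n_params, block_size=64, scale_bits=32):
    """Approximate bytes for NF4 weights: 4 bits each plus one scale per block."""
    n_blocks = -(-n_params // block_size)  # ceiling division
    return n_params * 4 / 8 + n_blocks * scale_bits / 8


def bf16_weight_bytes(n_params):
    """Bytes for the same weights stored in BF16 (16 bits each)."""
    return n_params * 2


n_params = 7_000_000_000  # illustrative model size, not a specific model
nf4_gb = nf4_weight_bytes(n_params) / 1e9
bf16_gb = bf16_weight_bytes(n_params) / 1e9
print(f"NF4: {nf4_gb:.2f} GB, BF16: {bf16_gb:.2f} GB")
```

Under these assumptions the per-block scales add 0.5 bits per weight on top of the 4-bit codes, so the effective cost is about 4.5 bits per weight.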
NF4 is delivered through bitsandbytes; for the surrounding fine-tuning context see LLM quantization.
See 4-bit memory fit in the GPU–LLM fit dataset.
