Definition
An importance matrix, often shortened to imatrix, is a calibration artifact used by llama.cpp to improve the quality of quantized models. Before quantizing, the model is run over a calibration dataset, and the imatrix tool records activation statistics that indicate which weights have the largest influence on the model's outputs. During the subsequent quantization step, that information steers rounding so that high-importance weights are represented more faithfully, rather than treating every weight as equally significant.
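As a rough illustration of what "recording statistics" can mean, the sketch below averages squared activations per input channel of a single layer. It is a simplified model written for this page, not llama.cpp's code: the function name `accumulate_importance` and the NumPy array layout (rows are tokens, columns are input channels) are assumptions made for the example.

```python
import numpy as np

def accumulate_importance(activation_batches):
    """Return the mean squared activation per input channel.

    activation_batches: iterable of 2-D arrays shaped (tokens, channels),
    standing in for the inputs one layer sees during calibration.
    """
    total = None
    count = 0
    for x in activation_batches:
        x = np.asarray(x, dtype=np.float64)
        sq = (x ** 2).sum(axis=0)          # sum of squares for each channel
        total = sq if total is None else total + sq
        count += x.shape[0]                # number of tokens seen so far
    return total / count

# Two small hypothetical calibration batches for a 2-channel layer.
batches = [np.array([[1.0, 2.0], [3.0, 4.0]]), np.array([[1.0, 0.0]])]
importance = accumulate_importance(batches)
```

Channels that consistently carry large activations end up with larger values, which is the sense in which the weights attached to them are treated as more important.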
How it is generated and used
Generating and using an imatrix is a two-stage process. First, calibration text is fed through the model in segments (512 tokens by default), and the tool hooks into the computation graph to accumulate activation statistics for each weight tensor. The resulting file is then passed to the quantizer, which weights the quantization error by importance, so that errors on high-importance weights count for more when scales and quantized values are chosen. In practice, using an imatrix is reported to reduce perplexity by roughly 10 to 30 percent compared to naive quantization at the same bit-width, although the gain varies with the model and quantization type, and it is strongly recommended for aggressive quantization levels below Q5_K_M.
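The second stage can be pictured with a toy quantizer that picks a block's scale by minimizing an importance-weighted squared error. This is a minimal sketch under stated assumptions, not the K-quant algorithm used by llama.cpp: the function `quantize_block`, the symmetric integer grid, and the simple candidate-scale search are all hypothetical choices made for illustration.

```python
import numpy as np

def quantize_block(w, importance=None, bits=4, n_candidates=64):
    """Quantize one block of weights to a symmetric integer grid.

    Returns (dequantized_weights, scale). When `importance` is given,
    the scale is chosen to minimize sum(importance * error**2);
    otherwise every weight counts equally.
    """
    w = np.asarray(w, dtype=np.float64)
    imp = np.ones_like(w) if importance is None else np.asarray(importance, dtype=np.float64)
    qmax = 2 ** (bits - 1) - 1             # e.g. 7 for 4 bits
    amax = np.abs(w).max()
    if amax == 0:
        return np.zeros_like(w), 0.0
    best = None
    # Try a range of scales, from aggressive clipping up to the full range.
    for f in np.linspace(0.5, 1.0, n_candidates):
        scale = f * amax / qmax
        q = np.clip(np.round(w / scale), -qmax - 1, qmax)
        err = float(np.sum(imp * (w - q * scale) ** 2))
        if best is None or err < best[0]:
            best = (err, q * scale, scale)
    return best[1], best[2]

# Hypothetical block of 32 weights with made-up importance values.
rng = np.random.default_rng(0)
w = rng.normal(size=32)
imp = rng.uniform(0.1, 10.0, size=32)
deq_plain, _ = quantize_block(w)
deq_imp, _ = quantize_block(w, imp)
```

Both calls use the same number of bits. The only difference is which errors the scale search tries hardest to avoid, so the importance-weighted result is never worse than the plain one when measured by the weighted error.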
Choosing calibration data
The calibration text matters. Community guidance favors meaningful, varied text over pseudo-random data, and ideally text that resembles the domain the model will be used in. A representative calibration set helps the imatrix protect the weights that the model actually relies on during real use, without overfitting to a narrow sample.
The imatrix is most often paired with K-quants inside the GGUF ecosystem to squeeze the most quality out of small files.
