GGUF Quantization Quality — Bits-per-Weight & Which Quant to Use

Which GGUF quant should you run? Every GGUF LLM quantization type by exact bits-per-weight, quant family and quality tier, so you can pick one that both fits your VRAM and stays accurate. Free CSV/JSON + REST under CC BY 4.0.

Quick answer

GGUF quantization shrinks a local LLM by storing each weight in fewer bits, trading output quality for a smaller file and lower VRAM use. This reference lists 19 GGUF quant types by exact bits-per-weight (bpw), family (floating-point formats, K-quants, importance-matrix I-quants, and legacy round-to-nearest types) and quality tier, so you can pick one that both fits your VRAM and stays accurate. Rule of thumb: Q4_K_M (~4.8 bpw) is the recommended default; use Q5_K_M or Q6_K when you have VRAM to spare; Q8_0 is near-lossless; and below ~3 bpw, prefer the importance-matrix I-quants and reserve them for large models.

Q4_K_M is the sweet spot for most local models. At around 3 bpw and below, the importance-matrix I-quants (IQ3 / IQ2) keep more quality per bit than the K-quants but should be reserved for large models; Q6_K and Q8_0 are near-lossless when VRAM allows. Pair this with D-Central's GPU/model VRAM data to confirm that a quant both fits and stays accurate.
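The rule of thumb above can be turned into rough arithmetic: weight size is approximately parameter count times bits-per-weight divided by 8. The sketch below is illustrative only. The `weight_gb` and `best_fit` helpers are hypothetical names, the bpw values are copied from the table on this page (with ~4.8 for Q4_K_M taken from the text), and the 1.5 GB `overhead_gb` default for KV cache and runtime buffers is an assumption, not a measured figure.

```python
# Bits-per-weight copied from the table on this page.
# Q4_K_M uses the approximate effective figure (~4.8) quoted in the text.
BPW = {
    "Q8_0": 8.5,
    "Q6_K": 6.5625,
    "Q5_K": 5.5,
    "Q4_K_M": 4.8,
    "IQ4_XS": 4.25,
    "IQ3_S": 3.44,
    "IQ3_XXS": 3.06,
    "IQ2_S": 2.5,
    "IQ2_XXS": 2.06,
}


def weight_gb(params_billion: float, bpw: float) -> float:
    """Approximate weight size in GB (10^9 bytes): params * bits / 8."""
    return params_billion * bpw / 8


def best_fit(params_billion: float, vram_gb: float, overhead_gb: float = 1.5):
    """Return the highest-bpw quant whose weights plus overhead fit, else None.

    overhead_gb is an assumed allowance for KV cache and runtime buffers;
    real usage depends on context length and backend.
    """
    by_bpw_desc = sorted(BPW.items(), key=lambda item: item[1], reverse=True)
    for name, bpw in by_bpw_desc:
        if weight_gb(params_billion, bpw) + overhead_gb <= vram_gb:
            return name
    return None


print(round(weight_gb(7, 4.8), 1))  # 4.2 (GB of weights for a 7B model at Q4_K_M)
print(best_fit(7, 8))               # Q6_K
print(best_fit(70, 24))             # IQ2_S
```

This only checks whether a quant fits; it does not encode the quality guidance above, so a sub-3-bpw result on a small model should still be treated as a poor choice.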


Quant type	Family	Bits/weight	Quality	Notes
F32	Floating point	32	Reference	Full single precision; the unquantized training baseline. Rarely used for local inference (huge files).
F16	Floating point	16	Reference	Half precision; the near-lossless inference baseline that quants are measured against.
BF16	Floating point	16	Reference	Brain-float 16; same size as F16 with a wider exponent range; a common training/inference format.
Q8_0	Legacy	8.5	Near-lossless	8-bit round-to-nearest; virtually indistinguishable from F16. Large files; a safe maximum-quality quant.
Q6_K	K-quant	6.5625	Excellent	6-bit K-quant; near-indistinguishable from F16 in most evaluations. The top choice when VRAM allows.
Q5_K	K-quant	5.5	Very good	Base 5-bit K-quant. Q5_K_M (a Q5_K/Q6_K tensor mix) is a high-quality pick for modest extra size.
Q4_K	K-quant	4.5	Good	Base 4-bit K-quant. Q4_K_M (a Q4_K/Q6_K mix, ~4.8 bpw effective) is the recommended default for most local models.
Q4_0	Legacy	4.5	Medium	Legacy round-to-nearest 4-bit; superseded by Q4_K_M, which is higher quality at similar size.
IQ4_NL	I-quant	4.5	Good	Importance-matrix 4-bit non-linear; maps 4-bit codes through a non-linear lookup table on 32-weight blocks, so it is slightly larger than IQ4_XS.
IQ4_XS	I-quant	4.25	Good	Importance-matrix 4-bit; excellent quality-per-bit, often matching Q4_K_S at a smaller size. Slower on some CPUs.
IQ3_S	I-quant	3.44	Medium	Importance-matrix 3-bit; better quality-per-bit than Q3_K at a comparable size.
Q3_K	K-quant	3.4375	Medium-low	Base 3-bit K-quant (Q3_K_S/M/L are tensor mixes around this). Visible quality loss on smaller models.
IQ3_XXS	I-quant	3.06	Low-medium	Importance-matrix 3-bit, very small; best reserved for larger models.
Q2_K	K-quant	2.625	Low	Smallest K-quant; noticeable quality loss. Use only for tight VRAM on large (30B+) models.
IQ2_S	I-quant	2.5	Low	Importance-matrix 2-bit; preserves more quality per bit than Q2_K. Large models only.
IQ2_XS	I-quant	2.31	Very low	Importance-matrix 2-bit, extra small; viable only on very large models.
IQ2_XXS	I-quant	2.06	Very low	Importance-matrix 2-bit, smallest practical 2-bit; large models only, with real quality loss.
IQ1_M	I-quant	1.75	Lowest	Importance-matrix 1-bit; extreme compression, only usable on the largest (70B+) models.
IQ1_S	I-quant	1.56	Lowest	Importance-matrix 1-bit, smallest; experimental, heavy quality loss, 70B+ only.
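The fractional bits-per-weight figures in the table come from block layouts: each block stores its quantized weights plus scale metadata, and bpw is total block bits divided by weights per block. The sketch below reproduces several table values this way. The byte counts are my reading of llama.cpp's block definitions and are not stated on this page, so treat them as assumptions to check against the source; `BLOCKS` and `bits_per_weight` are hypothetical names.

```python
# (weights per block, bytes per block). Byte breakdowns are assumptions
# based on llama.cpp's block structs, not figures from this page.
BLOCKS = {
    "Q8_0": (32, 2 + 32),               # fp16 scale + 32 int8 values
    "Q4_0": (32, 2 + 16),               # fp16 scale + 32 packed 4-bit values
    "Q6_K": (256, 128 + 64 + 16 + 2),   # low 4 bits, high 2 bits, sub-block scales, fp16 scale
    "Q5_K": (256, 4 + 12 + 32 + 128),   # fp16 scale and min, packed scales, high bits, low 4 bits
    "Q4_K": (256, 4 + 12 + 128),        # fp16 scale and min, packed scales, 4-bit values
    "Q3_K": (256, 32 + 64 + 12 + 2),    # high-bit mask, low 2 bits, packed scales, fp16 scale
    "Q2_K": (256, 16 + 64 + 2 + 2),     # sub-block scales/mins, 2-bit values, fp16 scale and min
}


def bits_per_weight(weights: int, nbytes: int) -> float:
    """Total bits in one block divided by the weights it encodes."""
    return nbytes * 8 / weights


for name, (weights, nbytes) in BLOCKS.items():
    print(f"{name}: {bits_per_weight(weights, nbytes)} bpw")
```

For example, Q8_0 spends 34 bytes on 32 weights, which gives 272 / 32 = 8.5 bpw rather than a flat 8. Mixes such as Q4_K_M land above their base type because some tensors are stored in a higher-bit type.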

This page covers the quant types inside a GGUF file. For the layer above it — GGUF versus MLX, EXL3, GPTQ, AWQ, FP8 and bitsandbytes, and which one your hardware actually wants — see the quantization format comparison. Source: the GGUF quantization-type descriptions in the HuggingFace Hub docs and llama.cpp’s quant-descriptions. Related glossary: quantization, GGUF, perplexity. Tools: GPU/model VRAM fit, VRAM calculator, local-LLM model database, inference-cost calculator.


Last reviewed June 19, 2026.