Definition
Product quantization (PQ) is a vector-compression technique for approximate nearest-neighbour search, introduced by Jegou, Douze, and Schmid in 2011. Instead of storing a full high-dimensional vector, PQ splits it into several lower-dimensional subvectors and quantizes each one separately against its own small codebook learned by k-means. The vector is then stored as a short list of centroid IDs, one per subvector, often just one byte each.
How the compression works
Suppose you have a 1,024-dimensional vector. Split it into 8 subvectors of 128 dimensions, learn a 256-entry codebook for each subspace, and replace every subvector with the index of its nearest codebook centroid. The original 4,096-byte vector now fits in 8 bytes. The full space of representable vectors is the Cartesian product of the per-subspace codebooks, hence the name: 256^8 distinct codes from just 8 small tables.
Asymmetric distance computation
At query time, PQ estimates the distance between the raw float query and a compressed code without decompressing it. The system precomputes a lookup table of distances from each query subvector to every centroid, then sums table entries, a handful of additions per candidate. This asymmetric distance keeps the query at full precision while documents stay compressed, preserving more accuracy than quantizing both sides.
PQ is frequently paired with a coarse partitioning stage to avoid scanning every code. See our entry on the inverted file index (IVF) for that combination, and vector quantization for the underlying signal-processing idea PQ generalizes. Both are practical building blocks for a retrieval index you host yourself.
In Simple Terms
Product quantization (PQ) is a vector-compression technique for approximate nearest-neighbour search, introduced by Jegou, Douze, and Schmid in 2011. Instead of storing a full high-dimensional…
