FP16 / INT8 Precision

FP16 and INT8 are two of the lower-precision number formats used to store and compute the weights and activations inside a neural network. The traditional baseline for training is 32-bit floating point (FP32); shrinking to FP16 (16-bit half-precision float) or INT8 (8-bit integer) cuts the memory footprint to a half or a quarter, respectively, and accelerates the matrix math that dominates inference. For anyone running models on their own hardware, precision is the first lever to understand: it decides whether a model fits in VRAM at all, and how fast it runs once it does.

What the formats actually are

FP16 keeps the floating-point structure (a sign bit, 5 exponent bits, 10 mantissa bits), so it is usually a near-drop-in swap for FP32 at inference, halving memory with little accuracy loss. The 5-bit exponent does narrow the dynamic range considerably: the largest finite FP16 value is 65,504, so very large values overflow and very small ones underflow. Its cousin BF16 (bfloat16) re-slices the same 16 bits into 8 exponent bits and 7 mantissa bits, keeping FP32's full exponent range at the cost of mantissa precision, which is why training favors it. INT8 abandons floating point entirely: values are mapped onto just 256 discrete integer levels via a scale factor, making tensors four times smaller than FP32 and often much faster on hardware with dedicated integer pipelines. The mapping must be calibrated, though, because a layer's real value distribution has to be squeezed through those 256 levels without clipping the range or wasting resolution on empty space.
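
To make the two formats concrete, here is a minimal Python sketch using only the standard library. The function names (fp16_fields, quantize_int8, dequantize) and the sample values are illustrative assumptions, not part of any particular runtime. The quantizer uses one simple scheme, symmetric max-abs scaling onto the range -127 to 127; real pipelines may use asymmetric ranges, zero points, or calibrated clipping thresholds instead.

```python
import struct

def fp16_fields(x):
    """Encode x as IEEE 754 half precision and split it into its bit fields."""
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    sign = bits >> 15                # 1 sign bit
    exponent = (bits >> 10) & 0x1F   # 5 exponent bits, stored with a bias of 15
    mantissa = bits & 0x3FF          # 10 mantissa bits
    return sign, exponent, mantissa

def quantize_int8(values, clip=None):
    """Symmetric quantization: map floats onto integers in [-127, 127].

    `clip` is a hypothetical calibration knob: the largest magnitude the
    integer grid should represent. By default it is the observed maximum.
    """
    peak = clip if clip is not None else max(abs(v) for v in values)
    scale = peak / 127 if peak > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Map the stored integers back to approximate float values."""
    return [q * scale for q in quantized]

print(fp16_fields(1.0))        # (0, 15, 0): exponent 15 is the bias, so 2**0
print(fp16_fields(65504.0))    # largest finite FP16 value

values = [-1.0, -0.4, 0.0, 0.25, 1.0]   # illustrative sample data
q, scale = quantize_int8(values)
print(q)                        # integers that would be stored
print(dequantize(q, scale))     # what the model sees after the round trip

# A clip threshold below the true peak buys resolution for small values
# but saturates the large ones: 1.0 comes back as roughly 0.5.
q_clip, scale_clip = quantize_int8(values, clip=0.5)
print(dequantize(q_clip, scale_clip))
```

The clip parameter illustrates the calibration tension described above: a threshold that is too tight clips the range, while one stretched to cover rare extremes leaves most of the integer levels unused.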

The accuracy trade-off

Dropping bits is never free, and the cost is unevenly distributed. Weights tolerate quantization well; activations and outlier-heavy layers tolerate it worse, which is why practical INT8 pipelines use per-channel scales, calibration datasets, and sometimes quantization-aware training to recover lost accuracy. For large language models specifically, the ecosystem has largely moved past plain INT8 toward grouped low-bit weight formats, such as the 4-bit-class quantizations popularized by GGUF files and runtimes like llama.cpp, where weights are stored at very low precision but computation happens at higher precision. The general rule holds across all of it: for a given memory budget, a larger model at lower precision usually beats a smaller model at higher precision, which is the single most useful heuristic a self-hoster can carry; see quantization for the full treatment.
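
The case for per-channel scales can be sketched in a few lines of Python. Everything here is an illustrative assumption: the helper names, the tiny two-channel weight matrix, and the choice of symmetric max-abs scaling. The point is only that a single outlier sets the scale for the whole tensor, which can flatten a channel of small values to zero.

```python
def max_abs_scale(values):
    """Scale that maps the largest magnitude in `values` onto 127."""
    peak = max(abs(v) for v in values)
    return peak / 127 if peak > 0 else 1.0

def quantize_dequantize(values, scale):
    """Round onto the signed INT8 grid and map straight back to floats."""
    return [max(-127, min(127, round(v / scale))) * scale for v in values]

def mean_abs_error(original, restored):
    return sum(abs(a - b) for a, b in zip(original, restored)) / len(original)

# Hypothetical weights: one channel of small values, one with an outlier.
weights = [
    [0.010, -0.020, 0.015, -0.005],
    [0.900, -1.200, 0.300, 8.000],
]

# Per-tensor: a single scale, dominated by the 8.0 outlier.
tensor_scale = max_abs_scale([v for row in weights for v in row])
per_tensor_err = [
    mean_abs_error(row, quantize_dequantize(row, tensor_scale))
    for row in weights
]

# Per-channel: each row gets a scale fitted to its own range.
per_channel_err = [
    mean_abs_error(row, quantize_dequantize(row, max_abs_scale(row)))
    for row in weights
]

for i, (t, c) in enumerate(zip(per_tensor_err, per_channel_err)):
    print(f"channel {i}: per-tensor error {t:.6f}, per-channel error {c:.6f}")
```

With the shared scale, every value in the first channel rounds to zero; with its own scale, the same channel is reconstructed almost exactly. The outlier channel sees no difference, because its own maximum already sets the shared scale.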

Reading hardware spec sheets

Precision is also why accelerator marketing numbers need decoding. A chip's rated throughput — its TOPS or TFLOPS figure — is quoted at a specific precision, and the same silicon typically doubles its rate each precision step down: an accelerator quotes one number at FP16 and roughly twice that at INT8, and newer GPUs extend the ladder further with FP8. Comparing one vendor's INT8 TOPS against another's FP16 TFLOPS is comparing apples to half-apples. See TOPS for how to read those ratings honestly.
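
As a rough sketch of that decoding step, the snippet below converts headline ratings to a common FP16-equivalent figure. It assumes the rule of thumb stated above (INT8 rated at about twice the FP16 rate), which will not hold exactly for every chip, and both accelerators and their ratings are hypothetical numbers chosen for illustration.

```python
# Assumed rule of thumb: the INT8 rating is roughly 2x the FP16 rating.
RELATIVE_RATE = {"FP16": 1.0, "INT8": 2.0}

def fp16_equivalent(rating, precision):
    """Convert a headline TOPS/TFLOPS figure to an approximate FP16 rate."""
    return rating / RELATIVE_RATE[precision]

# Hypothetical spec-sheet entries, not real products.
chip_a = fp16_equivalent(200.0, "INT8")   # quoted as 200 INT8 TOPS
chip_b = fp16_equivalent(120.0, "FP16")   # quoted as 120 FP16 TFLOPS

print(f"chip A: ~{chip_a:.0f} FP16-equivalent")
print(f"chip B: ~{chip_b:.0f} FP16-equivalent")
```

Under this assumption the chip with the smaller headline number is the faster one at FP16, which is exactly the trap in comparing figures quoted at different precisions.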

The sovereign payoff

Lower precision is the great equalizer of local AI. The difference between FP32 and a well-chosen quantized format is the difference between "needs a datacenter GPU" and "runs on the consumer card you already own": smaller weights, lower power draw, higher tokens per second, and close to the same practical capability for most workloads. It is the enabling technology behind the entire local LLM movement: precision engineering, applied so that capable models run on hardware you control rather than behind someone else's API. The arithmetic makes the point concretely: a 7-billion-parameter model needs roughly 28 GB of memory in FP32, about 14 GB in FP16, and around 7 GB in INT8, all before activations and context are counted. That is why the precision column, not the parameter count alone, is the first thing to read on any model card. Learn the ladder once, and hardware shopping becomes arithmetic instead of guesswork.
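
The arithmetic is simple enough to script. This is a minimal sketch: the table and function name are illustrative, it counts weights only, and the 4-bit row is an idealized 0.5 bytes per parameter (grouped formats also store scale factors, so real files are somewhat larger).

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {
    "FP32": 4.0,
    "FP16": 2.0,
    "BF16": 2.0,
    "INT8": 1.0,
    "4-bit": 0.5,   # idealized; ignores per-group scale overhead
}

def weight_memory_gb(num_params, precision):
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

params = 7e9  # the 7-billion-parameter example from the text
for precision in BYTES_PER_PARAM:
    print(f"{precision:>5}: {weight_memory_gb(params, precision):5.1f} GB")
```

Activations and context are not included, so the real requirement is higher than these figures; how much higher depends on the runtime and the context length.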
