GGUF

Sovereign AI

Definition

GGUF (GGML Universal Format) is a binary file format for storing large language models so they can be run efficiently on consumer hardware. A single GGUF file bundles the model's weights, tokenizer, architecture metadata, and quantization details into one self-describing package, so no separate config files are needed. It is the native format of the llama.cpp runtime and the de facto standard across the broader ecosystem of local-inference tools.
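To make "self-describing" concrete, here is a minimal sketch of reading the fixed-size header at the start of a GGUF file. It assumes the little-endian layout used by GGUF versions 2 and 3 (4-byte magic, 32-bit version, 64-bit tensor count, 64-bit metadata key-value count). The function name read_gguf_header is hypothetical, not part of any library.

```python
import struct

GGUF_MAGIC = b"GGUF"   # first four bytes of every GGUF file
HEADER_SIZE = 24       # 4 (magic) + 4 (version) + 8 (tensors) + 8 (metadata pairs)


def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed GGUF header from the first 24 bytes of a file.

    Assumption: little-endian layout of GGUF v2/v3. Version 1 used
    32-bit counts and is not handled by this sketch.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError("need at least 24 bytes to read a GGUF header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file (magic was {data[:4]!r})")
    # "<IQQ" = little-endian uint32, uint64, uint64, starting after the magic
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }
```

In practice you would open a model file in binary mode and pass `f.read(24)` to the function. The metadata key-value pairs (architecture, tokenizer, quantization details) follow directly after this header.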

Quantization Built In

GGUF is designed around quantization: model weights are compressed from 16- or 32-bit floats down to roughly 2 to 8 bits per weight, cutting memory use and usually speeding up inference. Common quantization levels appear in filenames as tags like Q4_K_M, Q5_K_M, Q6_K, and Q8_0; lower numbers mean smaller files and more accuracy loss, while Q8_0 stays close to the original. This is what lets a multi-billion-parameter model fit on a laptop, mini-PC, or even a phone.
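The memory savings come down to simple arithmetic: parameter count times bits per weight. The sketch below estimates the size of the weights alone, ignoring metadata, the context cache, and runtime overhead. The bits-per-weight table is an assumption for illustration: F16 is 16 by definition, Q8_0 works out to 8.5 (32 eight-bit weights plus one 16-bit scale per block), and the Q4_K_M figure is only a rough average that varies by model.

```python
# Approximate bits per weight. Q4_K_M is a rough, model-dependent
# average and is included here only as an illustrative assumption.
APPROX_BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,    # 34 bytes per block of 32 weights
    "Q4_K_M": 4.8,  # rough estimate, not an exact figure
}


def estimate_weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Estimate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Hypothetical 7-billion-parameter model at each quantization level
    for tag, bits in APPROX_BITS_PER_WEIGHT.items():
        print(f"{tag}: about {estimate_weights_gb(7e9, bits):.1f} GB")
```

For a 7-billion-parameter model this gives 14 GB at F16 and a little over 7 GB at Q8_0, which shows why the 4- and 5-bit levels are popular on machines with 8 to 16 GB of memory.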

Why Sovereign Users Care

Because a GGUF file is portable and runs without a cloud API, it is the practical backbone of self-hosted and air-gapped AI. You download one file, point a local runtime at it, and own the entire inference stack: no rate limits, no external dependency, and, with an offline runtime, no telemetry. GGUF replaced the older GGML format, adding extensibility and richer metadata.

GGUF files power self-hosting and air-gapped AI workflows. The compression behind them is explained in the quantization entry.

See model fit in the GPU–LLM fit dataset.
