VRAM

VRAM (Video Random Access Memory) is the high-speed memory built onto a graphics card. For gaming it holds textures and frame buffers, but for running AI models locally it holds the model's weights and working state. VRAM capacity is the single most important spec determining which local LLMs you can run: if a model's weights do not fit in VRAM, they spill into much slower system RAM, and generation speed collapses from conversational to painful.

How much VRAM a model needs

The rough arithmetic is parameter count times bytes per weight, plus overhead. A model stored at 16-bit precision needs two bytes per parameter, so a 7-billion-parameter model wants roughly 14 GB before you even start generating. This is why quantization is the key that unlocks consumer hardware: at 4-bit precision each weight takes about half a byte, so the same 7B model's weights shrink to about 3.5 GB, and the whole thing fits in roughly 4–5 GB once quantization metadata and runtime overhead are counted. Quality loss at that level is modest for most uses. Models in the GGUF format used by llama.cpp and Ollama are commonly published pre-quantized at several sizes precisely so you can match the file to your card.
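The arithmetic above can be sketched in a few lines of Python. This is a rough estimator, not a measurement: the function names and the 20% overhead figure are assumptions chosen for illustration, and real usage depends on the runtime and the exact quantization scheme.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (10^9 bytes): parameters x bytes per weight."""
    bytes_per_weight = bits_per_weight / 8
    # Billions of parameters times bytes per parameter gives gigabytes directly.
    return params_billions * bytes_per_weight


def estimate_vram_gb(params_billions: float, bits_per_weight: float,
                     overhead_fraction: float = 0.2) -> float:
    """Weights plus a flat overhead allowance (the 20% default is an assumption)."""
    return weight_memory_gb(params_billions, bits_per_weight) * (1 + overhead_fraction)


if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"7B at {bits}-bit: "
              f"{weight_memory_gb(7, bits):.1f} GB weights, "
              f"about {estimate_vram_gb(7, bits):.1f} GB with overhead")
```

With these assumptions a 7B model comes out at 14 GB of weights at 16-bit and 3.5 GB at 4-bit, which lines up with the 4–5 GB figure once overhead is added.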

The part people forget: the KV cache

Weights are not the whole story. During generation the model keeps a key-value cache of everything in the current context, and that cache grows with every token in the context window. A long document, a big system prompt, or an agent carrying lots of history can add gigabytes on top of the weights. If you size your setup so the weights barely fit, the first long conversation will push you over. Budget headroom for context, or cap the context length in your runtime's settings.
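To see how quickly the cache adds up, here is a minimal sketch of the usual KV cache size estimate: one key and one value vector per layer, per attention head, per token. The model shape in the example (32 layers, 32 key-value heads, head dimension 128, 16-bit cache) is an illustrative assumption, not the spec of any particular model, and runtimes that quantize or page the cache will differ.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Approximate KV cache size in bytes for a single sequence."""
    # The leading 2 covers one key tensor and one value tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * context_tokens


if __name__ == "__main__":
    GIB = 2 ** 30
    # Hypothetical model shape, chosen only to make the arithmetic concrete.
    for tokens in (2048, 4096, 16384):
        size = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128,
                              context_tokens=tokens)
        print(f"{tokens:>6} tokens of context: {size / GIB:.2f} GiB of KV cache")
```

Under these assumptions each token costs half a mebibyte, so a 4,096-token context adds 2 GiB on top of the weights. Models that share key-value heads across query heads (a smaller `n_kv_heads`) shrink the cache proportionally.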

Bandwidth matters too

Capacity decides whether a model runs; memory bandwidth largely decides how fast. Token generation is memory-bound — each new token requires streaming the active weights through the GPU — so a card with faster memory generates more tokens per second at the same model size. This is why workstation and datacenter cards with high-bandwidth memory outrun consumer cards with the same nominal capacity, and why Apple Silicon machines, whose unified memory serves as both system RAM and VRAM, are a legitimate path for large models despite modest GPU compute.
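The memory-bound argument gives a simple back-of-the-envelope ceiling: if every generated token requires reading the active weights once, tokens per second cannot exceed bandwidth divided by model size. The sketch below encodes that bound. The bandwidth figures in the example are hypothetical round numbers, not the specs of real cards, and actual throughput lands below this ceiling.

```python
def max_tokens_per_second(bandwidth_gb_per_s: float, model_size_gb: float) -> float:
    """Upper bound on generation speed when each token streams all weights once."""
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_per_s / model_size_gb


if __name__ == "__main__":
    model_gb = 4.0  # e.g. a small quantized model (illustrative size)
    # Hypothetical bandwidth tiers, for comparison only.
    for label, bandwidth in (("system RAM class", 50.0),
                             ("consumer GPU class", 400.0),
                             ("high-bandwidth memory class", 2000.0)):
        ceiling = max_tokens_per_second(bandwidth, model_gb)
        print(f"{label}: at most about {ceiling:.0f} tokens/s")
```

The same model is roughly eight times faster at 400 GB/s than at 50 GB/s in this estimate, which is the gap you feel when weights spill out of VRAM into system RAM.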

Planning a sovereign AI box

Because VRAM is the binding constraint, hardware planning for self-hosted inference starts with a VRAM budget and works backward to model size and quantization level. In practical tiers: an 8 GB card runs small quantized models comfortably; 12–16 GB opens the mid-size class that handles serious daily work; 24 GB runs the larger open models at usable quality; beyond that you are into multiple GPUs or unified-memory machines. Runtimes can also split a model between VRAM and system RAM, offloading as many layers as fit to the GPU — slower than all-VRAM, but it lets a modest card punch above its weight.
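Working backward from a VRAM budget to a layer split can be sketched as follows. This is a planning aid under loud assumptions: it treats all layers as equal in size, ignores embedding and output tensors, and reserves a flat amount for the KV cache and runtime buffers. The function name and the 1.5 GB reserve are hypothetical; runtimes such as llama.cpp expose a GPU-layer count setting, and the real number is best found by trial.

```python
def gpu_layers_that_fit(vram_gb: float, model_size_gb: float, n_layers: int,
                        reserve_gb: float = 1.5) -> int:
    """Estimate how many transformer layers fit in VRAM, assuming equal-size layers."""
    per_layer_gb = model_size_gb / n_layers
    # Keep headroom for the KV cache and buffers (the reserve is an assumption).
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable_gb // per_layer_gb))


if __name__ == "__main__":
    # Hypothetical 16 GB model file with 32 layers, tried against common card sizes.
    for card_gb in (8, 12, 16, 24):
        layers = gpu_layers_that_fit(card_gb, model_size_gb=16.0, n_layers=32)
        print(f"{card_gb} GB card: offload about {layers} of 32 layers to the GPU")
```

In this example an 8 GB card holds 13 of 32 layers and a 24 GB card holds all of them, which mirrors the tiers above: partial offload works, but only the larger card runs the model entirely from VRAM.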

The sovereignty angle is straightforward: owning sufficient VRAM is what keeps inference — and your data — entirely on premises rather than on a rented cloud GPU with someone else's logging policy. Miners already understand this trade; it is the same reason you run your own node. Match models to your card with D-Central's GPU and LLM compatibility guide.

Two buying notes save the most regret. First, VRAM is not upgradeable: it is soldered to the card, so unlike system RAM you cannot add more later. Buy for the models you want next year, not the ones you are testing today. Second, total VRAM across multiple cards does not behave like one big pool. Splitting a model across GPUs works in most runtimes but adds overhead and complexity, so one card with more memory generally beats two cards that sum to the same number. The used workstation-card market exists largely because of this arithmetic, and for many home labs an older high-VRAM card is the best value per gigabyte available.
