Definition
VRAM (Video Random Access Memory) is the high-speed memory built into a graphics card. For gaming it holds textures and frame buffers, but for running AI models locally it holds the model's weights and working data. VRAM capacity is the single most important spec determining which local LLMs you can run: if a model's weights do not fit in VRAM, it must spill into much slower system RAM, and performance collapses.
How much VRAM you need
A rough rule of thumb is that a model needs VRAM roughly equal to its parameter count multiplied by the bytes per weight, plus overhead for context. A 7-billion-parameter model at 4-bit quantization needs on the order of 4-5 GB; larger models and longer context windows need proportionally more. This is why an 8 GB card can run small quantized models comfortably, a 24 GB card handles mid-sized models, and the largest open models require either a high-VRAM workstation card or multiple GPUs.
Why it matters for sovereign AI
Because VRAM is the binding constraint, choosing hardware for self-hosted AI starts with a VRAM budget, then works backward to model size and quantization. Owning sufficient VRAM is what lets you keep inference — and your data — entirely on premises rather than on a rented cloud GPU.
Match models to your card's VRAM with D-Central's GPU and LLM compatibility guide.
In Simple Terms
VRAM (Video Random Access Memory) is the high-speed memory built into a graphics card. For gaming it holds textures and frame buffers, but for running…
