GPU vs Local LLM Compatibility: Which Models Can Your GPU Run?

Cross-reference 30 GPUs against 33 open local LLMs to see exactly which models fit in VRAM at Q4, Q8 and FP16.

Quick answer

This dataset cross-references 30 GPUs against 33 open local LLMs (990 fit records). Each record states whether the model fits in VRAM at Q4, Q8 and FP16, how much headroom remains, and which quantization is recommended. Use it to answer "which local LLM can my GPU run?" before you buy hardware or pull a model.

Fit is computed from the weights first: a "tight" fit means the weights fit but leave little room for context and KV-cache, so choose a GPU with more VRAM if you need long context. The dataset is free to download (CSV/JSON) and to query via API under CC BY 4.0.
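To put a number on "little room for context", the sketch below estimates KV-cache size using the standard transformer accounting (keys plus values, per layer, per KV head, per token). It is not taken from the dataset. The function name `kv_cache_bytes` is hypothetical, the FP16 cache (2 bytes per value) is an assumption, and the layer and head counts are assumed from the published Llama 3.1 8B configuration.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Approximate KV-cache size in bytes for a single sequence.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_value=2 assumes an FP16 cache (an assumption, not a dataset fact).
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


GIB = 1024 ** 3

# Assumed shape for Llama 3.1 8B: 32 layers, 8 KV heads (grouped-query
# attention), head dimension 128.
llama_8b = dict(n_layers=32, n_kv_heads=8, head_dim=128)

for context in (8_192, 32_768, 131_072):
    size_gib = kv_cache_bytes(context_tokens=context, **llama_8b) / GIB
    print(f"{context:>7} tokens: {size_gib:.1f} GiB of KV-cache")
```

Under these assumptions the cache costs 1 GiB at 8,192 tokens and 16 GiB at 131,072 tokens, on top of the weights. Real runtimes may quantize the cache or add their own overhead, so treat the result as a rough guide.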

| GPU | VRAM | Models fit (Q4) | Models fit (Q8) | Models fit (FP16) | Largest runnable | Recommended home model (Q8) |
|---|---|---|---|---|---|---|
| Apple M4 Max (40-core GPU, up to 128 GB) | 128 GB | 32 / 33 | 31 | 26 | Command R+ 104B | Command R+ 104B |
| Apple M3 Max (40-core GPU, up to 128 GB) | 128 GB | 32 / 33 | 31 | 26 | Command R+ 104B | Command R+ 104B |
| NVIDIA H100 SXM 80 GB | 80 GB | 32 / 33 | 30 | 25 | Command R+ 104B | Mixtral 8x7B Instruct v0.1 |
| NVIDIA A100 SXM4 80 GB | 80 GB | 32 / 33 | 30 | 25 | Command R+ 104B | Mixtral 8x7B Instruct v0.1 |
| Apple M4 Pro (20-core GPU, up to 64 GB) | 64 GB | 32 / 33 | 26 | 21 | Command R+ 104B | Mixtral 8x7B Instruct v0.1 |
| NVIDIA L40S 48 GB | 48 GB | 31 / 33 | 26 | 18 | Qwen2.5 72B Instruct | Command R 35B (08-2024) |
| NVIDIA RTX A6000 (Ampere) 48 GB | 48 GB | 31 / 33 | 26 | 18 | Qwen2.5 72B Instruct | Command R 35B (08-2024) |
| NVIDIA GeForce RTX 5090 | 32 GB | 27 / 33 | 20 | 18 | Mixtral 8x7B Instruct v0.1 | Gemma 3 27B Instruct |
| NVIDIA L4 24 GB | 24 GB | 26 / 33 | 18 | 11 | Command R 35B (08-2024) | Phi-4 14B Instruct |
| NVIDIA GeForce RTX 4090 | 24 GB | 26 / 33 | 18 | 11 | Command R 35B (08-2024) | Phi-4 14B Instruct |
| NVIDIA GeForce RTX 3090 | 24 GB | 26 / 33 | 18 | 11 | Command R 35B (08-2024) | Phi-4 14B Instruct |
| AMD Radeon RX 7900 XTX | 24 GB | 26 / 33 | 18 | 11 | Command R 35B (08-2024) | Phi-4 14B Instruct |
| AMD Radeon RX 7900 XT | 20 GB | 26 / 33 | 18 | 11 | Command R 35B (08-2024) | Phi-4 14B Instruct |
| NVIDIA GeForce RTX 5080 | 16 GB | 19 / 33 | 18 | 9 | Phi-4 14B Instruct | OLMo 2 13B Instruct |
| NVIDIA GeForce RTX 4080 Super | 16 GB | 19 / 33 | 18 | 9 | Phi-4 14B Instruct | OLMo 2 13B Instruct |
| NVIDIA GeForce RTX 4080 | 16 GB | 19 / 33 | 18 | 9 | Phi-4 14B Instruct | OLMo 2 13B Instruct |
| NVIDIA GeForce RTX 4070 Ti Super | 16 GB | 19 / 33 | 18 | 9 | Phi-4 14B Instruct | OLMo 2 13B Instruct |
| AMD Radeon RX 7800 XT | 16 GB | 19 / 33 | 18 | 9 | Phi-4 14B Instruct | OLMo 2 13B Instruct |
| Intel Arc A770 16 GB | 16 GB | 19 / 33 | 18 | 9 | Phi-4 14B Instruct | OLMo 2 13B Instruct |
| NVIDIA GeForce RTX 4060 Ti 16 GB | 16 GB | 19 / 33 | 18 | 9 | Phi-4 14B Instruct | OLMo 2 13B Instruct |
| AMD Radeon RX 6900 XT | 16 GB | 19 / 33 | 18 | 9 | Phi-4 14B Instruct | OLMo 2 13B Instruct |
| NVIDIA GeForce RTX 4070 Super | 12 GB | 19 / 33 | 11 | 5 | Phi-4 14B Instruct | Llama 3.1 8B Instruct |
| AMD Radeon RX 7700 XT | 12 GB | 19 / 33 | 11 | 5 | Phi-4 14B Instruct | Llama 3.1 8B Instruct |
| NVIDIA GeForce RTX 4070 | 12 GB | 19 / 33 | 11 | 5 | Phi-4 14B Instruct | Llama 3.1 8B Instruct |
| NVIDIA GeForce RTX 3080 12 GB | 12 GB | 19 / 33 | 11 | 5 | Phi-4 14B Instruct | Llama 3.1 8B Instruct |
| Intel Arc B580 12 GB | 12 GB | 19 / 33 | 11 | 5 | Phi-4 14B Instruct | Llama 3.1 8B Instruct |
| NVIDIA GeForce RTX 4060 Ti 8 GB | 8 GB | 13 / 33 | 9 | 4 | Gemma 3 12B Instruct | Gemma 3 4B Instruct |
| NVIDIA GeForce RTX 4060 | 8 GB | 13 / 33 | 9 | 4 | Gemma 3 12B Instruct | Gemma 3 4B Instruct |
| AMD Radeon RX 7600 | 8 GB | 13 / 33 | 9 | 4 | Gemma 3 12B Instruct | Gemma 3 4B Instruct |
| NVIDIA GeForce RTX 3070 | 8 GB | 13 / 33 | 9 | 4 | Gemma 3 12B Instruct | Gemma 3 4B Instruct |

Method: A model "fits" when the GPU's VRAM is at least the model's weight footprint at that quantization. Headroom, the ratio of VRAM to weight footprint, is graded as tight (below 1.15×: the weights fit but little room remains for context), good (1.15× to below 1.6×) or ample (1.6× or more). KV-cache and long context consume VRAM beyond the weights, so treat a tight fit as a floor, not a guarantee. Source data: the AI-GPU database and the local-LLM model database.
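The grading rule above can be sketched in a few lines of Python. The thresholds (1.15× and 1.6×) come from the method description. The function names and the footprint estimate (parameters × bits per weight ÷ 8) are assumptions for illustration: the dataset's own footprint formula is not stated, and real quantized files usually carry some overhead beyond this figure.

```python
from typing import Optional

TIGHT_BELOW = 1.15  # VRAM / weights ratio below this is "tight"
GOOD_BELOW = 1.6    # ratio below this (and >= 1.15) is "good"; otherwise "ample"


def estimate_weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Hypothetical footprint estimate in GB: parameters x bits / 8.

    This is an assumed approximation, not the dataset's formula.
    """
    return params_billion * bits_per_weight / 8


def grade_fit(vram_gb: float, weights_gb: float) -> Optional[str]:
    """Return 'tight', 'good' or 'ample', or None when the weights do not fit."""
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    ratio = vram_gb / weights_gb
    if ratio < 1.0:
        return None  # VRAM is smaller than the weight footprint
    if ratio < TIGHT_BELOW:
        return "tight"
    if ratio < GOOD_BELOW:
        return "good"
    return "ample"


# Example: an 8B-parameter model at FP16 (16 bits per weight) on several cards.
weights = estimate_weights_gb(8, 16)
for vram in (12, 16, 24, 48):
    print(f"{vram} GB VRAM vs {weights:.0f} GB weights: {grade_fit(vram, weights)}")
```

Because the footprint here is an estimate, the grades it produces may differ from the published records near the thresholds. The dataset remains the reference.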