GPU vs Local LLM Compatibility: Which Models Can Your GPU Run?
Cross-reference 30 GPUs against 33 open local LLMs to see exactly which models fit in VRAM at Q4, Q8 and FP16.
Quick answer
This dataset cross-references 30 GPUs against 33 open local LLMs (990 fit records) and computes whether each model fits in VRAM at Q4, Q8 and FP16, along with the headroom and a recommended quantization. Use it to answer "which local LLM can my GPU run?" before you buy hardware or pull a model.
Fit is weights-first: a "tight" fit means the weights fit but leave little room for context and the KV cache, so choose a GPU with more VRAM if you need long context. The data is free to download (CSV/JSON) and to query via the API under CC BY 4.0.
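As a rough illustration of what "weights-first" means, the sketch below estimates a weight footprint from parameter count alone. The nominal bits-per-weight values (4, 8 and 16) and the name `weight_footprint_gb` are assumptions made for this example. The dataset's own footprints may differ, because real quantized files usually carry extra metadata and some higher-precision layers.

```python
# Assumed nominal bits per weight for each quantization level.
# Real quantized model files are usually somewhat larger than this.
BITS_PER_WEIGHT = {"Q4": 4, "Q8": 8, "FP16": 16}


def weight_footprint_gb(params_billion: float, quant: str) -> float:
    """Approximate weights-only size in GB (1 GB = 1e9 bytes).

    params_billion * 1e9 parameters * (bits / 8) bytes, divided by 1e9,
    simplifies to params_billion * bits / 8.
    """
    bits = BITS_PER_WEIGHT[quant]
    return params_billion * bits / 8


if __name__ == "__main__":
    # An 8B-parameter model at each quantization level.
    for quant in ("Q4", "Q8", "FP16"):
        print(quant, weight_footprint_gb(8, quant), "GB")
```

Under these assumptions an 8B model needs about 4 GB at Q4, 8 GB at Q8 and 16 GB at FP16, before any KV cache or runtime overhead is counted.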
| GPU | VRAM | Models fit at Q4 | Models fit at Q8 | Models fit at FP16 | Largest runnable | Recommended home model (Q8) |
|---|---|---|---|---|---|---|
| Apple M4 Max (40-core GPU, up to 128 GB) | 128 GB | 32 / 33 | 31 | 26 | Command R+ 104B | Command R+ 104B |
| Apple M3 Max (40-core GPU, up to 128 GB) | 128 GB | 32 / 33 | 31 | 26 | Command R+ 104B | Command R+ 104B |
| NVIDIA H100 SXM 80 GB | 80 GB | 32 / 33 | 30 | 25 | Command R+ 104B | Mixtral 8x7B Instruct v0.1 |
| NVIDIA A100 SXM4 80 GB | 80 GB | 32 / 33 | 30 | 25 | Command R+ 104B | Mixtral 8x7B Instruct v0.1 |
| Apple M4 Pro (20-core GPU, up to 64 GB) | 64 GB | 32 / 33 | 26 | 21 | Command R+ 104B | Mixtral 8x7B Instruct v0.1 |
| NVIDIA L40S 48 GB | 48 GB | 31 / 33 | 26 | 18 | Qwen2.5 72B Instruct | Command R 35B (08-2024) |
| NVIDIA RTX A6000 (Ampere) 48 GB | 48 GB | 31 / 33 | 26 | 18 | Qwen2.5 72B Instruct | Command R 35B (08-2024) |
| NVIDIA GeForce RTX 5090 | 32 GB | 27 / 33 | 20 | 18 | Mixtral 8x7B Instruct v0.1 | Gemma 3 27B Instruct |
| NVIDIA L4 24 GB | 24 GB | 26 / 33 | 18 | 11 | Command R 35B (08-2024) | Phi-4 14B Instruct |
| NVIDIA GeForce RTX 4090 | 24 GB | 26 / 33 | 18 | 11 | Command R 35B (08-2024) | Phi-4 14B Instruct |
| NVIDIA GeForce RTX 3090 | 24 GB | 26 / 33 | 18 | 11 | Command R 35B (08-2024) | Phi-4 14B Instruct |
| AMD Radeon RX 7900 XTX | 24 GB | 26 / 33 | 18 | 11 | Command R 35B (08-2024) | Phi-4 14B Instruct |
| AMD Radeon RX 7900 XT | 20 GB | 26 / 33 | 18 | 11 | Command R 35B (08-2024) | Phi-4 14B Instruct |
| NVIDIA GeForce RTX 5080 | 16 GB | 19 / 33 | 18 | 9 | Phi-4 14B Instruct | OLMo 2 13B Instruct |
| NVIDIA GeForce RTX 4080 Super | 16 GB | 19 / 33 | 18 | 9 | Phi-4 14B Instruct | OLMo 2 13B Instruct |
| NVIDIA GeForce RTX 4080 | 16 GB | 19 / 33 | 18 | 9 | Phi-4 14B Instruct | OLMo 2 13B Instruct |
| NVIDIA GeForce RTX 4070 Ti Super | 16 GB | 19 / 33 | 18 | 9 | Phi-4 14B Instruct | OLMo 2 13B Instruct |
| AMD Radeon RX 7800 XT | 16 GB | 19 / 33 | 18 | 9 | Phi-4 14B Instruct | OLMo 2 13B Instruct |
| Intel Arc A770 16 GB | 16 GB | 19 / 33 | 18 | 9 | Phi-4 14B Instruct | OLMo 2 13B Instruct |
| NVIDIA GeForce RTX 4060 Ti 16 GB | 16 GB | 19 / 33 | 18 | 9 | Phi-4 14B Instruct | OLMo 2 13B Instruct |
| AMD Radeon RX 6900 XT | 16 GB | 19 / 33 | 18 | 9 | Phi-4 14B Instruct | OLMo 2 13B Instruct |
| NVIDIA GeForce RTX 4070 Super | 12 GB | 19 / 33 | 11 | 5 | Phi-4 14B Instruct | Llama 3.1 8B Instruct |
| AMD Radeon RX 7700 XT | 12 GB | 19 / 33 | 11 | 5 | Phi-4 14B Instruct | Llama 3.1 8B Instruct |
| NVIDIA GeForce RTX 4070 | 12 GB | 19 / 33 | 11 | 5 | Phi-4 14B Instruct | Llama 3.1 8B Instruct |
| NVIDIA GeForce RTX 3080 12 GB | 12 GB | 19 / 33 | 11 | 5 | Phi-4 14B Instruct | Llama 3.1 8B Instruct |
| Intel Arc B580 12 GB | 12 GB | 19 / 33 | 11 | 5 | Phi-4 14B Instruct | Llama 3.1 8B Instruct |
| NVIDIA GeForce RTX 4060 Ti 8 GB | 8 GB | 13 / 33 | 9 | 4 | Gemma 3 12B Instruct | Gemma 3 4B Instruct |
| NVIDIA GeForce RTX 4060 | 8 GB | 13 / 33 | 9 | 4 | Gemma 3 12B Instruct | Gemma 3 4B Instruct |
| AMD Radeon RX 7600 | 8 GB | 13 / 33 | 9 | 4 | Gemma 3 12B Instruct | Gemma 3 4B Instruct |
| NVIDIA GeForce RTX 3070 | 8 GB | 13 / 33 | 9 | 4 | Gemma 3 12B Instruct | Gemma 3 4B Instruct |
Method: A model "fits" when the GPU's VRAM is at least the model's weight footprint at that quantization. Fit quality grades the headroom as the ratio of VRAM to weight footprint: tight (below 1.15×, which is essentially weights only with little room for context), good (1.15× up to but not including 1.6×), ample (1.6× or more). The KV cache and long contexts consume additional VRAM beyond the weights, so treat a tight fit as a floor rather than a comfortable fit. Source data: the AI-GPU database and the local-LLM model database.
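The grading rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the dataset's actual code: the function name `grade_fit` and the `"no_fit"` label are hypothetical, and only the 1.15× and 1.6× thresholds come from the method description.

```python
def grade_fit(vram_gb: float, weights_gb: float) -> str:
    """Grade how well a model's weights fit in VRAM.

    Returns "no_fit" (hypothetical label) when VRAM is smaller than the
    weight footprint, otherwise "tight", "good" or "ample" based on the
    ratio of VRAM to weight footprint.
    """
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    ratio = vram_gb / weights_gb
    if ratio < 1.0:
        return "no_fit"   # weights alone exceed VRAM
    if ratio < 1.15:
        return "tight"    # weights fit, little room for context
    if ratio < 1.6:
        return "good"
    return "ample"        # ratio of 1.6 or more


if __name__ == "__main__":
    # A 24 GB card against a few illustrative weight footprints.
    for weights in (8, 16, 22, 30):
        print(f"{weights} GB weights on 24 GB VRAM: {grade_fit(24, weights)}")
```

For a 24 GB card this grades an 8 GB footprint as ample, 16 GB as good, 22 GB as tight, and 30 GB as not fitting. A recommended quantization could then be chosen as the highest precision whose grade is acceptable, though the dataset's exact selection rule is not stated here.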
Related products, repair, and setup paths
- self-hosted AI for Bitcoiners hub
- plebs guide to self-hosted AI
- install Ollama in 10 minutes
- LM Studio vs Ollama vs llama.cpp
- connect local AI to Home Assistant and Obsidian
- self-hosted AI troubleshooting
- repurpose mining hardware into an AI hashcenter
- local AI model leaderboards
Last reviewed June 18, 2026.
