AI GPU Database: VRAM, TFLOPS, TDP for Local LLM Inference

D-Central’s AI GPU Database is a free, citable reference of 30 GPU and accelerator records — the numbers that actually matter for running a large language model locally: VRAM GB, memory bandwidth, FP16 TFLOPS, INT8 TOPS, TDP, and a plain-English inference tier. Data is sourced directly from NVIDIA, AMD, Apple, and Intel official product pages and datasheets, cross-referenced with TechPowerUp GPU Database. Published under CC BY 4.0; verify at source before making purchasing decisions.

Key insight for local AI: For single-stream LLM token generation, memory bandwidth is the dominant bottleneck — not compute TFLOPS. A card with more GB/s will generate tokens faster even if its FP16 number looks lower. VRAM size determines which models fit without CPU offload. FP16/INT8 TFLOPS matter primarily for batch inference and prompt-processing speed.
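
As a rough illustration of why bandwidth dominates, the sketch below computes an upper bound on single-stream token rate by assuming every generated token requires one full read of the model weights. The function name and the 4 GB model size (roughly a 7B model at 4-bit) are assumptions for illustration; the bandwidth values come from the specification table on this page. Real throughput will be lower because of KV-cache reads, kernel overhead, and imperfect bandwidth utilisation.

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode speed: one full pass over the weights per token."""
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


# Assumed model footprint: about 4 GB (a 7B model at ~4-bit quantisation).
MODEL_GB = 4.0

# Bandwidth figures (GB/s) taken from the specification table.
for name, bandwidth in [
    ("RTX 4090", 1008),
    ("Apple M4 Max", 546),
    ("RTX 4060 Ti 16 GB", 288),
]:
    bound = max_tokens_per_second(bandwidth, MODEL_GB)
    print(f"{name}: at most {bound:.1f} tokens/s")
```

The ratio between two cards' bounds equals the ratio of their bandwidths, which is why the FP16 column says little about single-stream generation speed.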

30 records · v1.0 · June 2026 · CC BY 4.0

Download CSV
Download JSON
REST API

Read first: what these numbers mean

FP16 TFLOPS methodology differs by card class. For NVIDIA GeForce consumer cards (Ada/Ampere/Blackwell), the FP16 figure is shader FP16, taken as 2× the FP32 shader throughput; rows where NVIDIA does not publish the result directly are flagged as estimated. For NVIDIA professional/data-center cards (L40S, L4, A100, H100, RTX A6000), the figure is tensor-core FP16 taken or derived from official datasheets; the per-row notes say whether it is a dense, sparse, or derived value. For AMD RDNA 3, it is the AI Accelerator FP16 Matrix figure from AMD’s official product page. For Apple Silicon, it is GPU shader throughput (FP32 ≈ FP16 rate; the Neural Engine INT8 figure is in the separate INT8 column). For Intel Arc, it is the published FP16 figure (2× FP32). Where a figure is estimated rather than directly confirmed, the row is flagged in the VERIFICATION doc.
INT8 figures are often unpublished for consumer cards. NVIDIA does not publish INT8 TOPS for GeForce consumer cards. Where shown, they are either confirmed from official datasheets (data-center/pro cards) or derived from architecture ratios (flagged as estimated). AMD RDNA 3 AI Accelerators deliver the same throughput for FP16 and INT8 matrix operations per AMD’s official specifications.
Apple Silicon unified memory is not directly comparable to VRAM. The VRAM figure for Apple entries is the maximum unified-memory configuration. This memory is shared between the CPU, GPU, and Neural Engine — there is no transfer overhead. Bandwidth figures are from Apple’s official tech specs.
This dataset aggregates public specifications from manufacturers. It is not a D-Central measurement. Verify at source before purchasing decisions. Specifications change at product refreshes; check last_verified dates.
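
The derivation rules described above reduce to simple ratios. A minimal sketch follows; the helper names are illustrative only and are not part of the dataset or its API, and the input values are taken from the per-row notes on this page.

```python
def shader_fp16_from_fp32(fp32_tflops: float) -> float:
    """GeForce rows in this dataset: FP16 is taken as 2x the FP32 shader figure."""
    return 2.0 * fp32_tflops


def fp16_from_int8(int8_tops: float) -> float:
    """L4 row: FP16 is derived as half of the confirmed INT8 figure."""
    return int8_tops / 2.0


def dense_from_sparse(sparse_figure: float) -> float:
    """Datasheet sparse figures are about 2x dense, so dense is about half."""
    return sparse_figure / 2.0


# RTX 4090: 82.6 FP32 TFLOPS per its row note.
print(f"RTX 4090 shader FP16: {shader_fp16_from_fp32(82.6):.1f} TFLOPS")
# L4: 242 INT8 TOPS per its row note.
print(f"L4 derived FP16: {fp16_from_int8(242):.0f} TFLOPS")
```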

Full methodology: Mining Data Trust & Methodology. Credit: specifications from NVIDIA Corporation, AMD (Advanced Micro Devices), Apple Inc., Intel Corporation official pages and datasheets; cross-referenced with TechPowerUp GPU Database and WareDB.

Inference tier guide

Tier 1 Data Center — enterprise / cloud only; not home-deployable
Tier 2 Prosumer — serious home AI server; runs 70B+ models quantised
Tier 3 Capable — 13B–34B models comfortably; strong home use
Tier 4 Mid-Range — 7B–13B models; reasonable home use
Tier 5 Entry — 7B limit; VRAM or bandwidth-constrained
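
As an illustration of how the tiers combine with VRAM when narrowing choices, the sketch below filters a few rows copied from the specification table. The `GpuRecord` class, its field names, and the `shortlist` helper are hypothetical; they are not the schema of the CSV/JSON downloads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuRecord:
    name: str
    vram_gb: int
    bandwidth_gb_s: int
    tier: int  # 1 = Data Center ... 5 = Entry


# A handful of rows copied from the specification table on this page.
SAMPLE = [
    GpuRecord("RTX 4090", 24, 1008, 2),
    GpuRecord("RTX 3090", 24, 936, 3),
    GpuRecord("RX 7900 XTX", 24, 960, 3),
    GpuRecord("RTX 4060 Ti 16 GB", 16, 288, 4),
    GpuRecord("RTX 4060", 8, 272, 5),
]


def shortlist(records, min_vram_gb, tiers):
    """Keep records with enough VRAM in the given tiers, fastest memory first."""
    keep = [r for r in records if r.vram_gb >= min_vram_gb and r.tier in tiers]
    return sorted(keep, key=lambda r: r.bandwidth_gb_s, reverse=True)


home_24gb = shortlist(SAMPLE, min_vram_gb=24, tiers={2, 3})
print([r.name for r in home_24gb])
```

Sorting by bandwidth rather than FP16 follows the page's own guidance that bandwidth governs single-stream token generation.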

GPU and accelerator specifications

GPU / Accelerator	Manufacturer	VRAM GB	VRAM Type	Bandwidth GB/s	FP16 TFLOPS	INT8 TOPS	TDP W	Tier
NVIDIA H100 SXM 80 GB · Hopper (GH100) · HBM3	NVIDIA	80	HBM3	3,350	1,979	3,958	700	1
Frontier inference GPU. FP16 and INT8 are the tensor-core figures from the official NVIDIA datasheet, which quotes them with sparsity; dense throughput is about half (roughly 989 TFLOPS FP16). Cloud/enterprise only. Credit: NVIDIA Corporation, nvidia.com/en-us/data-center/h100/
NVIDIA A100 SXM4 80 GB · Ampere DC (GA100) · HBM2e	NVIDIA	80	HBM2e	2,000	312	624	400	1
FP16 tensor-core dense, NVIDIA A100 datasheet. Available used through secondary market. Credit: NVIDIA Corporation — nvidia.com/en-us/data-center/a100/
NVIDIA L40S 48 GB · Ada Lovelace (AD102) · GDDR6	NVIDIA	48	GDDR6	864	~183 ⓘ	~366 ⓘ	350	1
48 GB fits a 70B model at 4-bit quantisation (about 35 GB of weights); 70B at FP16 needs about 140 GB. FP16/INT8 = dense tensor estimates; NVIDIA datasheets list the sparse figure (~2× dense). Credit: NVIDIA Corporation, nvidia.com/en-us/data-center/l40s/
NVIDIA L4 24 GB · Ada Lovelace (AD104) · GDDR6	NVIDIA	24	GDDR6	300	~121 ⓘ	242	72	2
Efficiency standout: 72 W TDP, low-profile PCIe, 24 GB. INT8 = 242 TOPS confirmed from NVIDIA. FP16 derived as INT8/2. Credit: NVIDIA Corporation — nvidia.com/en-us/data-center/l4/
NVIDIA RTX A6000 (Ampere) 48 GB · Ampere (GA102) · GDDR6	NVIDIA	48	GDDR6	768	155	~310 ⓘ	300	2
FP16 = 154.83 TFLOPS tensor-core dense, NVIDIA official datasheet. 48 GB + NVLink (96 GB dual-card). Available used. Credit: NVIDIA Corporation — nvidia.com/…/rtx-a6000/
NVIDIA GeForce RTX 5090 · Blackwell (GB202) · GDDR7	NVIDIA	32	GDDR7	1,792	~210 ⓘ	—	575	2
Highest-bandwidth consumer GPU (1,792 GB/s, GDDR7). Launched Jan 2025. FP16 = shader estimate. NVIDIA does not publish GeForce tensor TFLOPS. Credit: NVIDIA Corporation — nvidia.com/…/rtx-5090/
NVIDIA GeForce RTX 4090 · Ada Lovelace (AD102) · GDDR6X	NVIDIA	24	GDDR6X	1,008	165.2	330.3	450	2
FP16 165.2 and INT8 330.3 TOPS = published by NVIDIA (shader FP16 = 2×82.6 FP32). Gold-standard home inference. Credit: NVIDIA Corporation — nvidia.com/…/rtx-4090/
Apple M4 Max (40-core GPU, up to 128 GB) · Apple Silicon M4 · Unified	Apple	128 ‡	Unified	546	~18.4 ⓘ	38 ‡	— ‡	2
‡ VRAM = unified memory (no GDDR bus). INT8 TOPS = 16-core Neural Engine (separate from GPU). TDP not published per chip. FP16 = GPU shader third-party estimate. 128 GB fits a 70B model at 8-bit quantisation (about 70 GB of weights); 70B at FP16 (about 140 GB) does not fit. Credit: Apple Inc., apple.com newsroom
NVIDIA GeForce RTX 3090 · Ampere (GA102) · GDDR6X	NVIDIA	24	GDDR6X	936	~71 ⓘ	~142 ⓘ	350	3
Best-value used card for 24 GB. FP16 confidence: moderate (published figures diverge between 71 and 142; we use 71.16 = 2×35.58 FP32, the lower, conservative value). Credit: NVIDIA Corporation.
NVIDIA GeForce RTX 5080 · Blackwell (GB203) · GDDR7	NVIDIA	16	GDDR7	960	112.6	—	360	3
FP16 112.6 TFLOPS = consistently cited from multiple sources; Blackwell launched Jan 2025. Credit: NVIDIA Corporation — nvidia.com/…/rtx-5080/
NVIDIA GeForce RTX 4080 Super · Ada Lovelace (AD103) · GDDR6X	NVIDIA	16	GDDR6X	736	~104 ⓘ	—	320	3
Credit: NVIDIA Corporation — nvidia.com/…/rtx-4080-family/
NVIDIA GeForce RTX 4080 · Ada Lovelace (AD103) · GDDR6X	NVIDIA	16	GDDR6X	717	~97.5 ⓘ	—	320	3
FP32 49 TFLOPS confirmed by NVIDIA (“49 Shader-TFLOPs”). FP16 = 2×. Credit: NVIDIA Corporation.
AMD Radeon RX 7900 XTX · RDNA 3 (Navi 31) · GDDR6	AMD	24	GDDR6	960	123	123	355	3
FP16 Matrix (AI Accelerator) = 123 TFLOPS & INT8 Matrix = 123 TOPS per AMD official product page (RDNA 3 delivers same throughput for FP16 and INT8 matrix). Inference via ROCm (Linux) or Vulkan/DirectML. Credit: AMD — amd.com RX 7900 XTX
Apple M3 Max (40-core GPU, up to 128 GB) · Apple Silicon M3 · Unified	Apple	128 ‡	Unified	400	~16.4 ⓘ	18 ‡	— ‡	3
‡ Unified memory / Neural Engine — same caveats as M4 Max. 128 GB enables 70B Q4 locally. Credit: Apple Inc. — apple.com M3 newsroom
NVIDIA GeForce RTX 4070 Ti Super · Ada Lovelace (AD103) · GDDR6X	NVIDIA	16	GDDR6X	672	~88 ⓘ	—	285	3
Credit: NVIDIA Corporation — nvidia.com/…/rtx-4070-ti-super/
AMD Radeon RX 7900 XT · RDNA 3 (Navi 31) · GDDR6	AMD	20	GDDR6	800	103	103	315	3
FP16 Matrix = 103 TFLOPS, INT8 Matrix = 103 TOPS per WareDB (sourced from AMD official). 20 GB sweet spot. Credit: AMD — amd.com RX 7900 XT
Apple M4 Pro (20-core GPU, up to 64 GB) · Apple Silicon M4 · Unified	Apple	64 ‡	Unified	273	~9.2 ⓘ	38 ‡	— ‡	3
‡ Same Apple caveats. FP16 = LOW confidence estimate (half of M4 Max). 64 GB unified memory. Credit: Apple Inc. — Apple tech specs MBP M4 Pro
NVIDIA GeForce RTX 4070 Super · Ada Lovelace (AD104) · GDDR6X	NVIDIA	12	GDDR6X	504	~71 ⓘ	—	220	4
Credit: NVIDIA Corporation — nvidia.com/…/rtx-4070-super/
AMD Radeon RX 7800 XT · RDNA 3 (Navi 32) · GDDR6	AMD	16	GDDR6	624	74.6	74.6	263	4
FP16 Matrix = 74.6 & INT8 Matrix = 74.6 TOPS per AMD official product page. Bandwidth = 256-bit bus × 19.5 Gbps = 624 GB/s. 16 GB mid-range. Credit: AMD, amd.com RX 7800 XT
Intel Arc A770 16 GB · Xe-HPG Alchemist (ACM-G10) · GDDR6	Intel	16	GDDR6	560	39.4	—	225	4
FP16 39.4 TFLOPS = published by Intel official product page. 16 GB GDDR6, XMX AI acceleration. INT8 not published for consumer Arc. Credit: Intel Corporation — intel.com Arc A770 specs
AMD Radeon RX 7700 XT · RDNA 3 (Navi 32) · GDDR6	AMD	12	GDDR6	432	70.3	70.3	245	4
FP16 Matrix = 70.3 TFLOPS per AMD official spec page (108 AI Accelerators). Credit: AMD — amd.com RX 7700 XT
NVIDIA GeForce RTX 4070 · Ada Lovelace (AD104) · GDDR6X	NVIDIA	12	GDDR6X	504	~58.5 ⓘ	—	200	4
Credit: NVIDIA Corporation — nvidia.com/…/rtx-4070/
NVIDIA GeForce RTX 4060 Ti 16 GB · Ada Lovelace (AD106) · GDDR6	NVIDIA	16	GDDR6	288	~44 ⓘ	—	165	4
16 GB is the main advantage; the 288 GB/s bandwidth is the bottleneck (about 3.5× lower than the RTX 4090’s 1,008 GB/s, so token generation is correspondingly slower). Credit: NVIDIA Corporation.
NVIDIA GeForce RTX 3080 12 GB · Ampere (GA102) · GDDR6X	NVIDIA	12	GDDR6X	912	~61 ⓘ	—	350	4
912 GB/s bandwidth makes this a fast token generator despite “only” 12 GB. Good used value. Credit: NVIDIA Corporation.
AMD Radeon RX 6900 XT · RDNA 2 (Navi 21) · GDDR6	AMD	16	GDDR6	512	~46 ⓘ	—	300	4
RDNA 2 — no dedicated AI Accelerators. FP16 = shader estimate. 16 GB at good used prices. ROCm support for older gen. Credit: AMD — amd.com RX 6900 XT
Intel Arc B580 12 GB · Xe2-HPG Battlemage (BMG-G21) · GDDR6	Intel	12	GDDR6	456	27.3	—	190	4
FP16 27.34 TFLOPS = Intel official product page. $249 launch (Dec 2024). 160 XMX engines for matrix acceleration. INT8 not published. Credit: Intel Corporation — intel.com Arc B580 specs
NVIDIA GeForce RTX 4060 Ti 8 GB · Ada Lovelace (AD106) · GDDR6	NVIDIA	8	GDDR6	288	~44 ⓘ	—	160	5
Same compute as 16 GB variant. 8 GB VRAM is the binding AI constraint. Prefer 16 GB for AI work. Credit: NVIDIA Corporation.
NVIDIA GeForce RTX 4060 · Ada Lovelace (AD107) · GDDR6	NVIDIA	8	GDDR6	272	~30 ⓘ	—	115	5
115 W TDP standout. 8 GB limits to 7B Q4. Good entry point. Credit: NVIDIA Corporation — nvidia.com/…/rtx-4060/
AMD Radeon RX 7600 · RDNA 3 (Navi 33) · GDDR6	AMD	8	GDDR6	288	~42.8 ⓘ	~42.8 ⓘ	165	5
8 GB RDNA 3 entry. 64 AI Accelerators. FP16/INT8 sourced from RDNA 3 architecture data via gpupoet (not directly from amd.com product page — moderate confidence). Credit: AMD — amd.com RX 7600
NVIDIA GeForce RTX 3070 · Ampere (GA104) · GDDR6	NVIDIA	8	GDDR6	448	~40.6 ⓘ	—	220	5
Widely available used at low prices. 8 GB limits to 7B Q4. Better bandwidth than RTX 4060 8 GB despite same VRAM class. Credit: NVIDIA Corporation.

ⓘ = estimated value (not directly confirmed from official spec page). ‡ = Apple Silicon caveat applies (unified memory / Neural Engine / TDP not published per chip). See per-row notes and the Methodology page. For machine-readable access, use the REST API.
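
Anyone copying values from the table as displayed will need to handle these markers. The sketch below is one way to do it, assuming cells look like the ones shown on this page (`~183 ⓘ`, `128 ‡`, `3,350`, `—`); `SpecValue` and `parse_cell` are hypothetical names, and the CSV/JSON downloads may encode estimates differently.

```python
from typing import NamedTuple, Optional


class SpecValue(NamedTuple):
    value: Optional[float]  # None when the cell is "—" (not published)
    estimated: bool         # True when the cell carries "~" or "ⓘ"
    apple_caveat: bool      # True when the cell carries "‡"


def parse_cell(cell: str) -> SpecValue:
    """Split a displayed table cell into its number and its flags."""
    text = cell.strip()
    estimated = "~" in text or "ⓘ" in text
    apple_caveat = "‡" in text
    # Drop the markers and thousands separators before converting.
    for mark in ("~", "ⓘ", "‡", ","):
        text = text.replace(mark, "")
    text = text.strip()
    value = None if text in ("", "—") else float(text)
    return SpecValue(value, estimated, apple_caveat)


for cell in ["~183 ⓘ", "3,350", "128 ‡", "— ‡"]:
    print(repr(cell), "->", parse_cell(cell))
```

Keeping the flags alongside the number makes it easy to exclude estimated figures from any comparison that needs confirmed values only.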

Frequently asked questions

Which GPU is best for running local LLMs at home?: For most home users, the NVIDIA GeForce RTX 4090 (24 GB, 1,008 GB/s) is the best single-card choice: 24 GB holds 34B-class models at 4-bit quantisation (about 17 GB of weights), and the bandwidth generates tokens quickly. If budget is the constraint, the RTX 3090 offers the same 24 GB and nearly the same bandwidth (936 GB/s) at a much lower used price. Apple Silicon with 128 GB unified memory (M4 Max, M3 Max) is the only option in this table outside the data-center tier with enough memory for a 70B model at 8-bit quantisation (about 70 GB). A 70B model at full FP16 needs about 140 GB and does not fit on any single device listed here.
Why does memory bandwidth matter more than compute TFLOPS for local AI?: Modern LLM inference in single-stream mode is memory-bandwidth-bound, not compute-bound. The GPU must stream the entire model weight from VRAM for every generated token. A card with more GB/s will generate tokens faster even if its FP16 number looks lower. Compute TFLOPS matter more for long-context prompt processing (prefill) and batch inference, where the arithmetic intensity rises.
Can I use an AMD GPU for local AI?: Yes. AMD RDNA 3 GPUs work with ROCm on Linux for frameworks like PyTorch and llama.cpp (ROCm/HIP backend). On Windows, llama.cpp’s Vulkan backend is available, and DirectML is reachable through runtimes such as ONNX Runtime. The RX 7900 XTX (24 GB) and RX 7900 XT (20 GB) are competitive with NVIDIA options for inference throughput; on each card the AI Accelerators deliver the same matrix throughput for FP16 and INT8. ROCm tooling is improving but is not yet as seamless as NVIDIA’s CUDA.
What about Intel Arc GPUs?: Intel Arc GPUs (A770 16 GB, B580 12 GB) are a viable budget option, especially for the 16 GB A770. They use Intel’s XMX matrix engines for FP16/INT8 acceleration and are supported via IPEX-LLM, OpenVINO, and llama.cpp’s SYCL backend. CUDA is not available. Ecosystem maturity is lower than NVIDIA or AMD ROCm.
What does “INT8 TOPS” mean and why is it missing for many cards?: INT8 TOPS (tera-operations per second at 8-bit integer precision) measures how fast a GPU can run quantised model inference. NVIDIA does not publish INT8 TOPS for GeForce consumer cards (it appears only on pro/data-center datasheets). Where shown for consumer NVIDIA cards, the figure is estimated. AMD does publish AI Accelerator INT8 TOPS on its product pages for RDNA 3 — notably, RDNA 3 delivers the same throughput for FP16 and INT8 matrix operations.
What is unified memory (Apple) and how does it compare to GPU VRAM?: Apple Silicon uses a unified memory architecture where the same physical memory pool is shared between CPU, GPU, and Neural Engine with no bus-transfer overhead. “VRAM GB” for Apple entries in this table reflects the maximum unified-memory configuration, not a discrete VRAM pool. This means an M4 Max with 128 GB can give most of that pool to model weights if needed, far more than any consumer discrete GPU offers. The trade-off is lower memory bandwidth (546 GB/s for M4 Max vs. 1,008 GB/s for RTX 4090), so token generation on the same model is slower, roughly in proportion to the bandwidth gap.
What is the smallest GPU that can run a 13B model?: A 13B-parameter model in FP16 requires approximately 26 GB for the weights alone (13B × 2 bytes). At Q4 quantisation (~0.5 bytes/parameter), the weights take about 6.5 GB. So a card with 8 GB VRAM (RTX 4060, RX 7600) can run a 13B model at Q4, with little room left for context, but not at FP16. A 16 GB card (RTX 4080, RX 7800 XT) handles 13B at 8-bit quantisation (about 13 GB). Even a 24 GB card falls short of the 26 GB needed for 13B at FP16; in this table that takes 32 GB or more (RTX 5090, or the 48 GB RTX A6000 and L40S).
How often is this dataset updated?: The dataset was last verified in June 2026 against official manufacturer spec pages. GPU specifications are stable once a card is released, but new models are released regularly. We aim to add new entries and correct any discrepancies as they are identified. Check last_verified in the CSV/JSON and verify key specifications at the manufacturer’s source before purchasing.
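
The sizing arithmetic in the 13B answer generalises to other model sizes. A rough sketch follows: the bytes-per-parameter values for FP16 and Q4 follow the figures quoted above, while the 8-bit entry and the 1 GB allowance for context and runtime buffers are assumptions. Real usage varies with the runtime, quantisation format, and context length.

```python
# Approximate storage per parameter, in bytes.
BYTES_PER_PARAM = {"fp16": 2.0, "q8": 1.0, "q4": 0.5}


def weights_gb(params_billion: float, precision: str) -> float:
    """Weight footprint in GB: billions of parameters x bytes per parameter."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits(params_billion: float, precision: str, vram_gb: float,
         overhead_gb: float = 1.0) -> bool:
    """True if weights plus an assumed overhead allowance fit in VRAM."""
    return weights_gb(params_billion, precision) + overhead_gb <= vram_gb


for precision in ("fp16", "q8", "q4"):
    need = weights_gb(13, precision)
    cards = [gb for gb in (8, 16, 24, 32) if fits(13, precision, gb)]
    print(f"13B at {precision}: {need:.1f} GB of weights, fits in {cards} GB cards")
```

The same function shows why a 70B model at FP16 (about 140 GB) exceeds every single device in the table, including the 128 GB Apple configurations.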

Related datasets and tools

Bench Measurements — D-Central’s first-hand ASIC performance measurements from our Montreal bench
Open Data Catalog — all D-Central open datasets with API endpoints
Running Local LLMs in Canada — guide to setting up private AI inference on your own hardware
AI & Bitcoin Mining Convergence — D-Central’s AI vertical
Distributed Compute — sovereign AI infrastructure for Canadian businesses
Bitcoin Mining Field Manual — technical guides for ASIC deployment and hashcenter operations

Cite this dataset

This dataset is published under the Creative Commons Attribution 4.0 International (CC BY 4.0) licence. You are free to share, adapt, and use this data for any purpose, including commercially, as long as you give appropriate credit.

APA
D-Central Technologies. (2026). AI & Local-Inference GPU Database (v1.0) [Dataset]. https://d-central.tech/data/ai-gpu-database/. CC BY 4.0.

Chicago
D-Central Technologies. “AI & Local-Inference GPU Database.” Version 1.0. Dataset. 2026. https://d-central.tech/data/ai-gpu-database/. CC BY 4.0.

BibTeX
@misc{dcentral2026gpudb,
  author = {{D-Central Technologies}},
  title = {AI \& Local-Inference {GPU} Database},
  year = {2026},
  version = {1.0},
  howpublished = {\url{https://d-central.tech/data/ai-gpu-database/}},
  note = {CC BY 4.0}
}

Machine-readable downloads: CSV · JSON · REST API

Specifications sourced from NVIDIA Corporation, AMD (Advanced Micro Devices), Apple Inc., and Intel Corporation official product pages and datasheets; cross-referenced with TechPowerUp GPU Database and WareDB. As of June 2026 — verify at source before purchasing decisions. Not financial or purchasing advice.


Last reviewed July 24, 2026.