RTX 5090
NVIDIA · dual-slot · Released January 2025
Blackwell flagship: 32 GB GDDR7, 1792 GB/s bandwidth, and the first consumer card that fits 70B-class models entirely in VRAM (at ~3-bit quants).
Hardware spec sheet
| Spec | Value |
|---|---|
| Vendor | NVIDIA |
| Category | GPU |
| VRAM / memory | 32 GB |
| Memory bandwidth | 1792 GB/s |
| FP16 TFLOPS | 120 |
| INT8 TOPS | 900 |
| TDP | 575 W |
| Architecture | Blackwell |
| Form factor | dual-slot |
| Release date | January 2025 |
| MSRP (USD) | $1,999 |
| 120V note | 575 W is aggressive for a single 120V/15A branch; 120V/20A or 240V strongly recommended. |
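A quick back-of-envelope on that note, treating the rig as a continuous load under the 80% rule. The CPU and rest-of-system draws and the PSU efficiency below are illustrative assumptions, not measurements; shared loads on the branch and transient GPU spikes eat whatever headroom the averages show:

```python
# Branch-circuit math for a 5090 rig under the 80% continuous-load rule.
# GPU_W is the spec-sheet TDP; CPU_W, REST_W, and PSU_EFFICIENCY are
# illustrative assumptions for a typical inference box.

GPU_W = 575           # RTX 5090 TDP
CPU_W = 150           # assumed CPU package power under inference load
REST_W = 75           # assumed fans, drives, motherboard
PSU_EFFICIENCY = 0.90  # assumed AC-to-DC conversion efficiency

dc_load = GPU_W + CPU_W + REST_W
wall_draw = dc_load / PSU_EFFICIENCY  # AC watts pulled at the outlet

for volts, amps in [(120, 15), (120, 20), (240, 15)]:
    continuous_limit = volts * amps * 0.80  # 80% rule for continuous loads
    headroom = continuous_limit - wall_draw
    print(f"{volts}V/{amps}A: limit {continuous_limit:.0f} W, "
          f"draw {wall_draw:.0f} W, headroom {headroom:+.0f} W")
```

Average draw fits a 15A branch on paper; the caveat is everything else sharing the circuit plus transient spikes well above TDP.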
The RTX 5090 arrived in January 2025 on NVIDIA’s Blackwell architecture, successor to Ada Lovelace. The VRAM jump from 24 GB to 32 GB is the headline for inference plebs: 70B-class models finally fit entirely in VRAM on a single card, albeit at ~3-bit quants, with usable context. GDDR7 on a 512-bit bus delivers ~1.8 TB/s, roughly 1.8× the 4090’s 1008 GB/s, so tok/s scales nearly proportionally on memory-bound decode.
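To see why bandwidth is the spec that matters, here is a rough roofline sketch of single-stream decode: each generated token streams the full weight set from VRAM once, so bandwidth divided by weight size gives a throughput ceiling. The quantized file sizes below are rough assumptions, and real throughput lands under the ceiling:

```python
# Roofline-style upper bound for single-stream decode:
#   tok/s <= memory_bandwidth / bytes_of_weights
# Quantized sizes are ballpark figures, not measurements.

BANDWIDTH_5090_GBPS = 1792
BANDWIDTH_4090_GBPS = 1008

models = {
    "8B @ Q8 (~8.5 GB)": 8.5,
    "32B @ Q4 (~19 GB)": 19.0,
    "70B @ Q3 (~30 GB)": 30.0,
}

for name, size_gb in models.items():
    ceil_5090 = BANDWIDTH_5090_GBPS / size_gb
    ceil_4090 = BANDWIDTH_4090_GBPS / size_gb
    print(f"{name}: <= {ceil_5090:.0f} tok/s (4090: <= {ceil_4090:.0f})")
```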
Who it’s for: professionals and well-funded enthusiasts who need single-card 70B performance without moving to H100/H200 workstation territory.
Models it runs comfortably: Llama 3 70B fully in VRAM at ~3-bit quants (IQ3-class) with 8K context, Qwen 2.5 72B at similar quants and context, and 32B-class models at Q6–Q8 with long context to spare. Q4 and Q8 70B builds still exceed 32 GB, as does Mixtral 8x22B even at Q3; those need partial CPU offload or a second card (see the fit check below). It is also the first consumer card where FP8 training of ~7B LoRAs is genuinely practical.
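A minimal fit check, assuming Llama-3-70B-style geometry (80 layers, 8 KV heads, head dim 128), rough llama.cpp-style bits-per-weight averages, and an assumed 1.5 GB of runtime overhead for scratch buffers:

```python
# Does weights + KV cache + overhead fit in 32 GB?
# Geometry is Llama-3-70B-shaped; bpw values and the 1.5 GB overhead
# are rough assumptions.

VRAM_GB = 32
OVERHEAD_GB = 1.5

def weights_gb(params_b: float, bits_per_weight: float) -> float:
    return params_b * bits_per_weight / 8  # B params * bits -> GB

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context: int, bytes_per_elem: int = 2) -> float:
    # K and V per layer per token, stored at fp16 (2 bytes/element)
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * context / 1e9

for bpw in (8.5, 4.8, 3.1):  # ~Q8, ~Q4_K_M, ~IQ3
    total = (weights_gb(70, bpw)
             + kv_cache_gb(80, 8, 128, context=8192)
             + OVERHEAD_GB)
    verdict = "fits" if total <= VRAM_GB else "does NOT fit"
    print(f"70B @ ~{bpw} bpw, 8K ctx: {total:.1f} GB -> {verdict}")
```

Q8 comes out near 79 GB and Q4 near 46 GB, which is why ~3-bit is where single-card 70B becomes real.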
Hashcenter notes: a dual-slot (!) Founders Edition despite the 575 W TDP, thanks to NVIDIA’s move to a vapor-chamber, dual flow-through cooler. The branch-circuit caveat from the spec table applies here too; budget the whole rig’s wall draw, not just the GPU (worked numbers above). The 12V-2×6 connector evolved from 12VHPWR with better seating detection, but still use a quality cable. Blackwell’s compute lineage traces all the way back to the Tesla architecture that kicked off NVIDIA’s GPGPU push in 2006.
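If you would rather watch real draw than trust the TDP line, a minimal NVML poll works. This sketch assumes the nvidia-ml-py package (`pip install nvidia-ml-py`) and that the 5090 is device index 0 on your machine:

```python
# Poll actual GPU power and temperature once a second via NVML.
import time
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)  # assumes 5090 is index 0

try:
    while True:
        watts = pynvml.nvmlDeviceGetPowerUsage(gpu) / 1000  # NVML reports mW
        temp = pynvml.nvmlDeviceGetTemperature(
            gpu, pynvml.NVML_TEMPERATURE_GPU)
        print(f"{watts:6.1f} W  {temp:3d} C")
        time.sleep(1)
except KeyboardInterrupt:
    pynvml.nvmlShutdown()
```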
Further reading: This card is a core component of a pleb-grade AI Hashcenter. Pair it with the sovereignty argument in the Sovereign AI for Bitcoiners Manifesto, or look at how the same 120V envelope powers a Bitcoin space heater in our mining catalog. Running both workloads on one rig? See Heating Your Home With Inference.
Get it running
1. Install Ollama → Ten-minute local LLM runtime. One binary, zero cloud. (Smoke test after this list.)
2. Give it a UI → Open-WebUI turns Ollama into a self-hosted ChatGPT.
3. Which runner? → LM Studio vs Ollama vs llama.cpp: pick the right runtime for your rig.
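Once step 1 is done and a model is pulled, Ollama answers on its local REST API at port 11434. A minimal smoke test, assuming the `requests` package and a `llama3` tag you have actually pulled (swap in whatever model you use):

```python
# One-shot generation against a local Ollama instance.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",  # assumption: replace with a tag you've pulled
        "prompt": "In one sentence: why run LLMs locally?",
        "stream": False,    # return one JSON object instead of a stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```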
