RTX 5090
NVIDIA · dual-slot · Released January 2025
Blackwell flagship: 32 GB GDDR7, 1792 GB/s bandwidth, and the first consumer card that fits 70B-class models entirely in VRAM (at ~3-bit quants).
Hardware spec sheet
| Spec | Value |
|---|---|
| Vendor | NVIDIA |
| Category | GPU |
| VRAM / memory | 32 GB |
| Memory bandwidth | 1792 GB/s |
| FP16 TFLOPS | 120 |
| INT8 TOPS | 900 |
| TDP | 575 W |
| Architecture | Blackwell |
| Form factor | dual-slot |
| Release date | January 2025 |
| MSRP (USD) | $1,999 |
| 120V note | 575 W is aggressive for a single 120V/15A branch; 120V/20A or 240V strongly recommended. |
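A quick back-of-envelope on that note, treating the rig as a continuous load under the 80% rule. The CPU and rest-of-system draws and the PSU efficiency below are illustrative assumptions, not measurements; shared loads on the branch and transient GPU spikes eat whatever headroom the averages show:

```python
# Branch-circuit math for a 5090 rig under the 80% continuous-load rule.
# GPU_W is the spec-sheet TDP; CPU_W, REST_W, and PSU_EFFICIENCY are
# illustrative assumptions for a typical inference box.

GPU_W = 575           # RTX 5090 TDP
CPU_W = 150           # assumed CPU package power under inference load
REST_W = 75           # assumed fans, drives, motherboard
PSU_EFFICIENCY = 0.90  # assumed AC-to-DC conversion efficiency

dc_load = GPU_W + CPU_W + REST_W
wall_draw = dc_load / PSU_EFFICIENCY  # AC watts pulled at the outlet

for volts, amps in [(120, 15), (120, 20), (240, 15)]:
    continuous_limit = volts * amps * 0.80  # 80% rule for continuous loads
    headroom = continuous_limit - wall_draw
    print(f"{volts}V/{amps}A: limit {continuous_limit:.0f} W, "
          f"draw {wall_draw:.0f} W, headroom {headroom:+.0f} W")
```

Average draw fits a 15A branch on paper; the caveat is everything else sharing the circuit plus transient spikes well above TDP.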
The RTX 5090 arrived in January 2025 on NVIDIA’s Blackwell architecture, successor to Ada Lovelace. The VRAM jump from 24 GB to 32 GB is the headline for inference plebs: 70B-class models finally fit entirely in VRAM on a single card, albeit at ~3-bit quants, with usable context. GDDR7 on a 512-bit bus delivers ~1.8 TB/s, roughly 1.8× the 4090’s 1008 GB/s, so tok/s scales nearly proportionally on memory-bound decode.
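To see why bandwidth is the spec that matters, here is a rough roofline sketch of single-stream decode: each generated token streams the full weight set from VRAM once, so bandwidth divided by weight size gives a throughput ceiling. The quantized file sizes below are rough assumptions, and real throughput lands under the ceiling:

```python
# Roofline-style upper bound for single-stream decode:
#   tok/s <= memory_bandwidth / bytes_of_weights
# Quantized sizes are ballpark figures, not measurements.

BANDWIDTH_5090_GBPS = 1792
BANDWIDTH_4090_GBPS = 1008

models = {
    "8B @ Q8 (~8.5 GB)": 8.5,
    "32B @ Q4 (~19 GB)": 19.0,
    "70B @ Q3 (~30 GB)": 30.0,
}

for name, size_gb in models.items():
    ceil_5090 = BANDWIDTH_5090_GBPS / size_gb
    ceil_4090 = BANDWIDTH_4090_GBPS / size_gb
    print(f"{name}: <= {ceil_5090:.0f} tok/s (4090: <= {ceil_4090:.0f})")
```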
Who it’s for: professionals and well-funded enthusiasts who need single-card 70B performance without moving to H100/H200 workstation territory.
Models it runs comfortably: Llama 3 70B fully in VRAM at ~3-bit quants (IQ3-class) with 8K context, Qwen 2.5 72B at similar quants and context, and 32B-class models at Q6–Q8 with long context to spare. Q4 and Q8 70B builds still exceed 32 GB, as does Mixtral 8x22B even at Q3; those need partial CPU offload or a second card (see the fit check below). It is also the first consumer card where FP8 training of ~7B LoRAs is genuinely practical.
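A minimal fit check, assuming Llama-3-70B-style geometry (80 layers, 8 KV heads, head dim 128), rough llama.cpp-style bits-per-weight averages, and an assumed 1.5 GB of runtime overhead for scratch buffers:

```python
# Does weights + KV cache + overhead fit in 32 GB?
# Geometry is Llama-3-70B-shaped; bpw values and the 1.5 GB overhead
# are rough assumptions.

VRAM_GB = 32
OVERHEAD_GB = 1.5

def weights_gb(params_b: float, bits_per_weight: float) -> float:
    return params_b * bits_per_weight / 8  # B params * bits -> GB

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context: int, bytes_per_elem: int = 2) -> float:
    # K and V per layer per token, stored at fp16 (2 bytes/element)
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * context / 1e9

for bpw in (8.5, 4.8, 3.1):  # ~Q8, ~Q4_K_M, ~IQ3
    total = (weights_gb(70, bpw)
             + kv_cache_gb(80, 8, 128, context=8192)
             + OVERHEAD_GB)
    verdict = "fits" if total <= VRAM_GB else "does NOT fit"
    print(f"70B @ ~{bpw} bpw, 8K ctx: {total:.1f} GB -> {verdict}")
```

Q8 comes out near 79 GB and Q4 near 46 GB, which is why ~3-bit is where single-card 70B becomes real.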
Hashcenter notes: a dual-slot (!) Founders Edition despite the 575 W TDP, thanks to NVIDIA’s move to a vapor-chamber, dual flow-through cooler. The branch-circuit caveat from the spec table applies here too; budget the whole rig’s wall draw, not just the GPU (worked numbers above). The 12V-2×6 connector evolved from 12VHPWR with better seating detection, but still use a quality cable. Blackwell’s compute lineage traces all the way back to the Tesla architecture that kicked off NVIDIA’s GPGPU push in 2006.
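If you would rather watch real draw than trust the TDP line, a minimal NVML poll works. This sketch assumes the nvidia-ml-py package (`pip install nvidia-ml-py`) and that the 5090 is device index 0 on your machine:

```python
# Poll actual GPU power and temperature once a second via NVML.
import time
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)  # assumes 5090 is index 0

try:
    while True:
        watts = pynvml.nvmlDeviceGetPowerUsage(gpu) / 1000  # NVML reports mW
        temp = pynvml.nvmlDeviceGetTemperature(
            gpu, pynvml.NVML_TEMPERATURE_GPU)
        print(f"{watts:6.1f} W  {temp:3d} C")
        time.sleep(1)
except KeyboardInterrupt:
    pynvml.nvmlShutdown()
```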
Further reading: This card is a core component of a pleb-grade AI Hashcenter. Pair it with the sovereignty argument in the Sovereign AI for Bitcoiners Manifesto, or look at how the same 120V envelope powers a Bitcoin space heater in our mining catalog. Running both workloads on one rig? See Heating Your Home With Inference.
Get it running
1. Install Ollama → Ten-minute local LLM runtime. One binary, zero cloud. (Smoke test after this list.)
2. Give it a UI → Open-WebUI turns Ollama into a self-hosted ChatGPT.
3. Which runner? → LM Studio vs Ollama vs llama.cpp: pick the right runtime for your rig.
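Once step 1 is done and a model is pulled, Ollama answers on its local REST API at port 11434. A minimal smoke test, assuming the `requests` package and a `llama3` tag you have actually pulled (swap in whatever model you use):

```python
# One-shot generation against a local Ollama instance.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",  # assumption: replace with a tag you've pulled
        "prompt": "In one sentence: why run LLMs locally?",
        "stream": False,    # return one JSON object instead of a stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```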
