
RTX 5090

NVIDIA · dual-slot · Released January 2025

Blackwell flagship: 32 GB GDDR7 and 1792 GB/s of bandwidth; the first consumer card that runs 70B-class models entirely in VRAM (at ~3-bit quants) with usable context.

Hardware spec sheet

Vendor: NVIDIA
Category: GPU
VRAM / memory: 32 GB
Memory bandwidth: 1792 GB/s
FP16 TFLOPS: 120
INT8 TOPS: 900
TDP: 575 W
Architecture: Blackwell
Form factor: dual-slot
Release date: January 2025
Street price (USD): 1999 (MSRP)
120V note: 575 W is aggressive for a single 120V/15A branch; 120V/20A or 240V strongly recommended.

The RTX 5090 arrived in January 2025 on NVIDIA's Blackwell architecture, the successor to Ada Lovelace. The VRAM jump from 24 GB to 32 GB is the headline for inference plebs: 70B-class models finally fit on a single card at ~3-bit quants with usable context (at Q8, a 70B's weights alone run ~70 GB, still multi-GPU territory). GDDR7 on a 512-bit bus delivers ~1.8 TB/s, nearly 2× the 4090's 1008 GB/s, so tok/s scales proportionally on memory-bound workloads.
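A back-of-envelope check on that scaling: in single-batch decoding, each generated token has to stream the full weight set from VRAM, so peak tok/s is roughly bandwidth divided by model bytes. The sketch below is a ceiling, not a benchmark, and the 4090's 1008 GB/s is the only figure not taken from this page.

```python
# Decode-speed ceiling for a memory-bound model: every output token
# reads all quantized weights from VRAM once (single-batch decoding).

def toks_ceiling(weights_gb: float, bandwidth_gbs: float) -> float:
    return bandwidth_gbs / weights_gb

model_gb = 8 * 8 / 8  # Llama 3 8B at Q8: ~8 GB of weights
for name, bw_gbs in (("RTX 5090", 1792), ("RTX 4090", 1008)):
    print(f"{name}: ~{toks_ceiling(model_gb, bw_gbs):.0f} tok/s ceiling")
# 5090 ~224 vs 4090 ~126: the ~1.78x bandwidth ratio carries straight through
```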

Who it’s for: professionals and well-funded enthusiasts who need single-card 70B performance without moving to H100/H200 workstation territory.

Models it runs comfortably: Llama 3 70B at ~3-bit quants (IQ3-class) with 8K context, Qwen 2.5 72B at the same bit-widths, and 30B-class models at Q6 with 16K context; Mixtral 8x22B still needs CPU offload even at Q3, since its weights alone top 50 GB. Also the first consumer card where FP8 training of ~7B LoRAs is genuinely practical. The sketch below checks the fit arithmetic.
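The fit arithmetic, as a minimal sketch: weight bytes are params × bits ÷ 8, plus a KV cache. The shape constants assume a Llama-3-70B-like model (80 layers, 8 KV heads of dim 128, FP16 cache), and quant formats carry a few percent of scale overhead not counted here.

```python
def weights_gb(params_b: float, bits_per_weight: float) -> float:
    # params_b is in billions, so this comes out directly in gigabytes
    return params_b * bits_per_weight / 8

def kv_cache_gb(ctx: int, layers: int = 80, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_val: int = 2) -> float:
    return 2 * ctx * layers * kv_heads * head_dim * bytes_per_val / 1e9  # x2 for K and V

for bits in (8.0, 4.5, 3.1):  # Q8, ~Q4_K_M, ~IQ3-class effective bits
    total = weights_gb(70, bits) + kv_cache_gb(8192)
    verdict = "fits" if total <= 32 else "does not fit"
    print(f"70B @ {bits} bpw + 8K context: {total:.1f} GB ({verdict} in 32 GB)")
# 8.0 bpw: ~72.7 GB, 4.5 bpw: ~42.1 GB, 3.1 bpw: ~29.8 GB -- only ~3-bit fits
```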

Hashcenter notes: a dual-slot (!) Founders Edition despite the 575 W TDP; NVIDIA moved to a vapor-chamber, flow-through cooler. As the spec sheet notes, 575 W is aggressive for a single 120V/15A branch, and 120V/20A or 240V is strongly recommended; the sketch below runs the numbers. The 12V-2×6 connector is an evolution of 12VHPWR with better seating detection, but still use a quality cable. Blackwell's lineage traces back to the Tesla architecture that started NVIDIA's compute push in 2006.
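Running those branch-circuit numbers, as a hedged sketch: the 80% continuous-load derating is the standard NEC rule, but the 1.4× transient factor and the 300 W rest-of-rig allowance below are assumptions, not measured specs.

```python
def branch_budget_w(volts: float, breaker_a: float) -> float:
    return volts * breaker_a * 0.80  # NEC 80% continuous-load limit

gpu_peak_w = 575 * 1.4   # assumed transient overshoot above the 575 W TDP
rest_of_rig_w = 300      # rough CPU/board/fans/PSU-loss allowance
draw_w = gpu_peak_w + rest_of_rig_w

for volts, amps in ((120, 15), (120, 20), (240, 15)):
    headroom = branch_budget_w(volts, amps) - draw_w
    print(f"{volts}V/{amps}A: {headroom:+.0f} W headroom at ~{draw_w:.0f} W peak")
# 120V/15A leaves ~+335 W: workable for the rig alone, but a monitor, miner,
# or space heater on the same circuit erases it -- hence 20 A or 240 V.
```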

Further reading: This card is a core component of a pleb-grade AI Hashcenter. Pair it with the sovereignty argument in the Sovereign AI for Bitcoiners Manifesto, or look at how the same 120V envelope powers a Bitcoin space heater in our mining catalog. Running both workloads on one rig? See Heating Your Home With Inference.

Get it running

  1. Install Ollama →

    Ten-minute local LLM runtime. One binary, zero cloud. (A smoke-test sketch follows this list.)

  2. Give it a UI →

    Open WebUI turns Ollama into a self-hosted ChatGPT.

  3. Which runner? →

    LM Studio vs Ollama vs llama.cpp — pick the right runtime for your rig.
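Once step 1 is done, a quick smoke test against Ollama's local REST API (default port 11434) confirms the whole pipeline; this assumes you've already pulled a model, e.g. with `ollama pull llama3`.

```python
import json
import urllib.request

# Non-streaming one-shot generation against the local Ollama daemon.
payload = json.dumps({
    "model": "llama3",  # any model you've pulled locally
    "prompt": "In one sentence: why run LLMs locally?",
    "stream": False,
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```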
