RTX 4090
NVIDIA · triple-slot · Released October 2022
Ada Lovelace's consumer flagship: 24 GB, 1 TB/s bandwidth, 82.6 FP16 TFLOPS. Fastest single-card pleb option for inference.
Hardware spec sheet
| Spec | Value |
|---|---|
| Vendor | NVIDIA |
| Category | GPU |
| VRAM / memory | 24 GB |
| Memory bandwidth | 1008 GB/s |
| FP16 TFLOPS | 82.6 |
| INT8 TOPS | 660 |
| TDP | 450 W |
| Architecture | Ada Lovelace |
| Form factor | triple-slot |
| Release date | October 2022 |
| Street price (USD) | 1,600–1,900 (new/used) |
| 120V note | 450W on 120V/15A is the practical ceiling for a single card with a 1000W PSU; two 4090s need 240V. |
The RTX 4090 launched October 2022 on NVIDIA’s Ada Lovelace architecture — the direct successor to Ampere (RTX 3090) with a lithography jump to TSMC 4N. Same 24 GB VRAM as the 3090, but meaningfully faster: 1008 GB/s bandwidth, 82.6 FP16 TFLOPS, and roughly 2× tensor throughput for INT8/FP8. Ada Lovelace introduced 4th-gen tensor cores and FP8 support, both of which matter for quantized inference workloads.
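That bandwidth figure is the one to watch: single-stream token generation is roughly memory-bound, so you can sanity-check expected tok/s from nothing more than the model's quantized size. A back-of-envelope sketch in Python; the 70% bandwidth-efficiency factor and the bits-per-weight figures are illustrative assumptions, not benchmarks:

```python
# Back-of-envelope decode speed: single-stream generation is roughly
# memory-bandwidth bound, so tok/s ~ usable bandwidth / model bytes.
# The 70% efficiency factor is an assumption, not a measured figure.

BANDWIDTH_GBPS = 1008          # RTX 4090 spec
EFFICIENCY = 0.70              # assumed fraction of peak bandwidth usable

def est_tok_per_s(params_b: float, bits_per_weight: float) -> float:
    model_gb = params_b * bits_per_weight / 8  # weights read once per token
    return BANDWIDTH_GBPS * EFFICIENCY / model_gb

for name, params, bits in [("13B Q4", 13, 4.5), ("34B Q4", 34, 4.5), ("70B ~Q2", 70, 2.5)]:
    print(f"{name}: ~{est_tok_per_s(params, bits):.0f} tok/s")
```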
Who it’s for: prosumers who want the fastest single-card inference without jumping to workstation cards. Also the card of choice for serious Stable Diffusion / ComfyUI users.
Models it runs comfortably: same parameter envelope as the 3090 (up to ~40B at Q4), but roughly 1.7–2× faster tok/s. Llama 3 70B fits fully in VRAM with 4K context only at aggressive ~2-bit quants; at Q4 it needs partial CPU offload.
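To check the fit yourself: a rough VRAM budget is quantized weights plus KV cache plus runtime overhead, all under 24 GB. A sketch using Llama 3 70B's published geometry (80 layers, 8 KV heads under GQA, head dim 128); the 1 GB runtime-overhead figure is an assumption:

```python
# Will it fit? Quantized weights + KV cache + overhead must stay under
# 24 GB. Llama 3 70B geometry (80 layers, 8 KV heads via GQA, head dim
# 128) is from the public model card; overhead_gb is an assumption.

VRAM_GB = 24

def fit_check(params_b, bits_per_weight, n_layers, n_kv_heads, head_dim,
              ctx_tokens, kv_bytes=2, overhead_gb=1.0):
    weights_gb = params_b * bits_per_weight / 8
    # K and V caches: 2 tensors per layer, one fp16 entry per token
    kv_gb = 2 * n_layers * n_kv_heads * head_dim * ctx_tokens * kv_bytes / 1e9
    total = weights_gb + kv_gb + overhead_gb
    print(f"{total:.1f} GB needed -> {'fits' if total <= VRAM_GB else 'does not fit'}")

fit_check(70, 4.5, 80, 8, 128, 4096)   # 70B at Q4: ~41.7 GB, needs offload
fit_check(70, 2.3, 80, 8, 128, 4096)   # 70B at ~2.3 bpw: ~22.5 GB, fits
```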
Hashcenter notes: triple-slot, 450 W TDP, 16-pin 12VHPWR connector (check cable quality and seating; early cables had connector issues that NVIDIA and partners have since addressed). 450 W on 120V/15A is the practical ceiling for a single card with a 1000 W PSU; a second 4090 really needs 240V. Credit Ada Lovelace, the 2022 flagship architecture that made local 70B-class models feel responsive on a home rig.
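The 120V arithmetic behind that note, as a sketch: the standard 80% continuous-load derating leaves 1440 W at the wall on a 15 A breaker. The system-power and PSU-efficiency figures below are illustrative assumptions:

```python
# Why one 4090 is the practical ceiling on a 120V/15A circuit: the 80%
# continuous-load derating leaves 1440 W at the wall. System power and
# PSU efficiency below are assumptions for illustration.

BREAKER_A, VOLTS, DERATE = 15, 120, 0.80
wall_budget_w = BREAKER_A * VOLTS * DERATE          # 1440 W continuous

def wall_draw(gpu_w, system_w=250, psu_efficiency=0.90):
    return (gpu_w + system_w) / psu_efficiency      # DC load -> AC draw

one_card = wall_draw(450)        # ~778 W: comfortable on 120V/15A
two_cards = wall_draw(900)       # ~1278 W steady, before transient spikes
print(wall_budget_w, round(one_card), round(two_cards))
# Two 4090s sit too close to the 1440 W ceiling once spikes are counted,
# hence the recommendation to move a dual-card rig to 240V.
```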
This card is a core component of a pleb-grade AI Hashcenter. Pair it with the sovereignty argument in the Sovereign AI for Bitcoiners Manifesto, or look at how the same 120V envelope powers a Bitcoin space heater in our mining catalog. Running both workloads on one rig? See Heating Your Home With Inference.
Get it running
1. Install Ollama → Ten-minute local LLM runtime. One binary, zero cloud (see the Python sketch after this list).
2. Give it a UI → Open-WebUI turns Ollama into a self-hosted ChatGPT.
3. Which runner? → LM Studio vs Ollama vs llama.cpp — pick the right runtime for your rig.
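Once step 01 is done, Ollama exposes a REST API on localhost:11434. A minimal sketch of calling it from Python, assuming the llama3 model has already been pulled:

```python
# Minimal call against a local Ollama server (default port 11434).
# Assumes `ollama pull llama3` has already been run; no cloud involved.
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "llama3",
        "prompt": "In one sentence: why run LLMs locally?",
        "stream": False,          # return one JSON object, not a stream
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```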
Further reading: Heating your home with inference for turning this card into a winter-heat source, and the Sovereign AI for Bitcoiners Manifesto for the bigger picture on owner-operated AI.
