RTX 4090
NVIDIA · triple-slot · Released October 2022
Ada Lovelace's consumer flagship: 24 GB, 1 TB/s bandwidth, 82.6 FP16 TFLOPS. Fastest single-card pleb option for inference.
Hardware spec sheet
| Spec | Value |
|---|---|
| Vendor | NVIDIA |
| Category | GPU |
| VRAM / memory | 24 GB |
| Memory bandwidth | 1008 GB/s |
| FP16 TFLOPS | 82.6 |
| INT8 TOPS | 660 |
| TDP | 450 W |
| Architecture | Ada Lovelace |
| Form factor | triple-slot |
| Release date | October 2022 |
| Street price (USD) | 1,600–1,900 (new/used) |
| 120V note | 450W on 120V/15A is the practical ceiling for a single card with a 1000W PSU; two 4090s need 240V. |
The RTX 4090 launched October 2022 on NVIDIA’s Ada Lovelace architecture — the direct successor to Ampere (RTX 3090) with a lithography jump to TSMC 4N. Same 24 GB VRAM as the 3090, but meaningfully faster: 1008 GB/s bandwidth, 82.6 FP16 TFLOPS, and roughly 2× tensor throughput for INT8/FP8. Ada Lovelace introduced 4th-gen tensor cores and FP8 support, both of which matter for quantized inference workloads.
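That bandwidth figure is the one to watch: single-stream token generation is roughly memory-bound, so you can sanity-check expected tok/s from nothing more than the model's quantized size. A back-of-envelope sketch in Python; the 70% bandwidth-efficiency factor and the bits-per-weight figures are illustrative assumptions, not benchmarks:

```python
# Back-of-envelope decode speed: single-stream generation is roughly
# memory-bandwidth bound, so tok/s ~ usable bandwidth / model bytes.
# The 70% efficiency factor is an assumption, not a measured figure.

BANDWIDTH_GBPS = 1008          # RTX 4090 spec
EFFICIENCY = 0.70              # assumed fraction of peak bandwidth usable

def est_tok_per_s(params_b: float, bits_per_weight: float) -> float:
    model_gb = params_b * bits_per_weight / 8  # weights read once per token
    return BANDWIDTH_GBPS * EFFICIENCY / model_gb

for name, params, bits in [("13B Q4", 13, 4.5), ("34B Q4", 34, 4.5), ("70B ~Q2", 70, 2.5)]:
    print(f"{name}: ~{est_tok_per_s(params, bits):.0f} tok/s")
```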
Who it’s for: prosumers who want the fastest single-card inference without jumping to workstation cards. Also the card of choice for serious Stable Diffusion / ComfyUI users.
Models it runs comfortably: same parameter envelope as the 3090 (up to ~40B at Q4), but roughly 1.7–2× faster tok/s. Llama 3 70B fits fully in VRAM with 4K context only at aggressive ~2-bit quants; at Q4 it needs partial CPU offload.
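To check the fit yourself: a rough VRAM budget is quantized weights plus KV cache plus runtime overhead, all under 24 GB. A sketch using Llama 3 70B's published geometry (80 layers, 8 KV heads under GQA, head dim 128); the 1 GB runtime-overhead figure is an assumption:

```python
# Will it fit? Quantized weights + KV cache + overhead must stay under
# 24 GB. Llama 3 70B geometry (80 layers, 8 KV heads via GQA, head dim
# 128) is from the public model card; overhead_gb is an assumption.

VRAM_GB = 24

def fit_check(params_b, bits_per_weight, n_layers, n_kv_heads, head_dim,
              ctx_tokens, kv_bytes=2, overhead_gb=1.0):
    weights_gb = params_b * bits_per_weight / 8
    # K and V caches: 2 tensors per layer, one fp16 entry per token
    kv_gb = 2 * n_layers * n_kv_heads * head_dim * ctx_tokens * kv_bytes / 1e9
    total = weights_gb + kv_gb + overhead_gb
    print(f"{total:.1f} GB needed -> {'fits' if total <= VRAM_GB else 'does not fit'}")

fit_check(70, 4.5, 80, 8, 128, 4096)   # 70B at Q4: ~41.7 GB, needs offload
fit_check(70, 2.3, 80, 8, 128, 4096)   # 70B at ~2.3 bpw: ~22.5 GB, fits
```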
Hashcenter notes: triple-slot, 450 W TDP, 16-pin 12VHPWR connector (check cable quality and seating; early cables had connector issues that NVIDIA and partners have since addressed). 450 W on 120V/15A is the practical ceiling for a single card with a 1000 W PSU; a second 4090 really needs 240V. Credit Ada Lovelace, the 2022 flagship architecture that made local 70B-class models feel responsive on a home rig.
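The 120V arithmetic behind that note, as a sketch: the standard 80% continuous-load derating leaves 1440 W at the wall on a 15 A breaker. The system-power and PSU-efficiency figures below are illustrative assumptions:

```python
# Why one 4090 is the practical ceiling on a 120V/15A circuit: the 80%
# continuous-load derating leaves 1440 W at the wall. System power and
# PSU efficiency below are assumptions for illustration.

BREAKER_A, VOLTS, DERATE = 15, 120, 0.80
wall_budget_w = BREAKER_A * VOLTS * DERATE          # 1440 W continuous

def wall_draw(gpu_w, system_w=250, psu_efficiency=0.90):
    return (gpu_w + system_w) / psu_efficiency      # DC load -> AC draw

one_card = wall_draw(450)        # ~778 W: comfortable on 120V/15A
two_cards = wall_draw(900)       # ~1278 W steady, before transient spikes
print(wall_budget_w, round(one_card), round(two_cards))
# Two 4090s sit too close to the 1440 W ceiling once spikes are counted,
# hence the recommendation to move a dual-card rig to 240V.
```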
This card is a core component of a pleb-grade AI Hashcenter. Pair it with the sovereignty argument in the Sovereign AI for Bitcoiners Manifesto, or look at how the same 120V envelope powers a Bitcoin space heater in our mining catalog. Running both workloads on one rig? See Heating Your Home With Inference.
Get it running
1. Install Ollama → Ten-minute local LLM runtime. One binary, zero cloud (see the Python sketch after this list).
2. Give it a UI → Open-WebUI turns Ollama into a self-hosted ChatGPT.
3. Which runner? → LM Studio vs Ollama vs llama.cpp — pick the right runtime for your rig.
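Once step 01 is done, Ollama exposes a REST API on localhost:11434. A minimal sketch of calling it from Python, assuming the llama3 model has already been pulled:

```python
# Minimal call against a local Ollama server (default port 11434).
# Assumes `ollama pull llama3` has already been run; no cloud involved.
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "llama3",
        "prompt": "In one sentence: why run LLMs locally?",
        "stream": False,          # return one JSON object, not a stream
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```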
Further reading: Heating your home with inference for turning this card into a winter-heat source, and the Sovereign AI for Bitcoiners Manifesto for the bigger picture on owner-operated AI.
