RTX A4000
NVIDIA · single-slot blower · Released April 2021
Single-slot Ampere workstation card with 16 GB of ECC GDDR6 and a blower cooler. The quiet-rack pleb's favourite for dense multi-GPU builds.
Hardware spec sheet
| Spec | Value |
|---|---|
| Vendor | NVIDIA |
| Category | GPU |
| VRAM / memory | 16 GB |
| Memory bandwidth | 448 GB/s |
| FP16 TFLOPS | 19.2 |
| INT8 TOPS | 155 |
| TDP | 140 W |
| Architecture | Ampere |
| Form factor | single-slot blower |
| Release date | April 2021 |
| Street price (USD) | 600-900 (used) |
| 120V note | 140 W each; four cards (~560 W) plus host sit well inside a 120V/15A circuit, and a single 850 W PSU covers the whole rig. |
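The circuit math in the note above can be sketched as a quick back-of-envelope check. This is a hypothetical calculation: the 200 W host overhead and the 80% continuous-load derating on the breaker are assumptions, not measured numbers.

```python
# Rough 120V circuit budget for a multi-A4000 rig.
CARD_TDP_W = 140              # spec-sheet TDP per card
SYSTEM_OVERHEAD_W = 200       # assumption: Threadripper-class host
CIRCUIT_W = 120 * 15          # 1800 W nameplate on a 15 A breaker
CONTINUOUS_LIMIT_W = CIRCUIT_W * 0.8  # 1440 W continuous (80% rule)

def rig_draw(num_cards: int) -> int:
    """Estimated wall draw for num_cards A4000s plus the host."""
    return num_cards * CARD_TDP_W + SYSTEM_OVERHEAD_W

for n in (1, 2, 4, 7):
    draw = rig_draw(n)
    verdict = "fits" if draw <= CONTINUOUS_LIMIT_W else "over budget"
    print(f"{n} cards: ~{draw} W ({verdict})")
```

Even a seven-card build (~1180 W estimated) stays under the 1440 W continuous limit, which is why the 120V note can afford to be relaxed.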
The RTX A4000 is NVIDIA’s Ampere-generation workstation card optimised for density: single-slot blower cooler, 140 W TDP, 16 GB of ECC GDDR6. Launched April 2021, it shares Ampere’s compute lineage with the 3090 — same tensor-core generation, same FP16/INT8 throughput per CUDA core — but in a form factor that lets you cram 4–7 cards into a single workstation chassis.
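For single-stream LLM inference, the practical ceiling is usually set by the 448 GB/s memory bandwidth rather than by tensor-core throughput: each generated token streams the full weight set out of VRAM. A rough, bandwidth-bound-only estimate (a simplification that ignores KV-cache traffic and kernel overhead; real throughput lands below this):

```python
# Batch-1 decode is memory-bandwidth bound: every generated token must
# read the full weight set from VRAM, so tokens/s <= bandwidth / model bytes.
BANDWIDTH_GBS = 448.0  # A4000 spec-sheet figure

def decode_tps_ceiling(params_b: float, bytes_per_param: float) -> float:
    """Upper bound on tokens/s for batch-1 decode (assumes weight reads
    dominate memory traffic)."""
    model_gb = params_b * bytes_per_param
    return BANDWIDTH_GBS / model_gb

# Llama 3 8B at FP16 (2 bytes/param): 448 / 16 = 28 tok/s ceiling
print(round(decode_tps_ceiling(8, 2.0), 1))
```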
Who it’s for: Hashcenter builders who need multi-GPU density without datacenter cards. Four A4000s in a Threadripper workstation = 64 GB VRAM in a quiet, thermally sane package. Also popular in rack deployments where blower airflow matters.
Models it runs comfortably: single card handles Llama 3 8B at FP16, 14B at Q8, 32B at Q4 (tight). Two cards split a 70B at Q4.
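Those fit claims follow from a simple weight-size estimate. The bytes-per-parameter figures below are approximations (real GGUF Q4 variants run closer to 0.55 bytes/param), and KV cache plus runtime buffers add more on top, which is exactly why 32B at Q4 is "tight" on 16 GB.

```python
# Quick weight-size estimate: params x bytes/param.
# Assumed quantization widths; actual GGUF files vary by variant.
BYTES_PER_PARAM = {"FP16": 2.0, "Q8": 1.0, "Q4": 0.5}

def weight_gb(params_b: float, quant: str) -> float:
    """Approximate model weight footprint in GB."""
    return params_b * BYTES_PER_PARAM[quant]

for params, quant in [(8, "FP16"), (14, "Q8"), (32, "Q4")]:
    print(f"{params}B @ {quant}: ~{weight_gb(params, quant):.0f} GB of weights")
```

All three cases land at or under the card's 16 GB, with no headroom to spare in the FP16 and 32B cases.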
Hashcenter notes: ECC memory is a genuine advantage for long-running inference workloads where bit-flips are a silent failure mode. Blower cooler is noticeably quieter than consumer 3090 blowers because the 140 W TDP keeps fan RPM lower. Used prices $600–900 as of 2026. 140 W each means four cards on a single 120V/15A circuit is very comfortable. Credit to NVIDIA’s workstation team for a genuinely pleb-friendly dense-compute card.
Further reading: This card is a core component of a pleb-grade AI Hashcenter. Pair it with the sovereignty argument in the Sovereign AI for Bitcoiners Manifesto, or look at how the same 120V envelope powers a Bitcoin space heater in our mining catalog. Running both workloads on one rig? See Heating Your Home With Inference.
Get it running
1. Install Ollama → Ten-minute local LLM runtime. One binary, zero cloud.
2. Give it a UI → Open-WebUI turns Ollama into a self-hosted ChatGPT.
3. Which runner? → LM Studio vs Ollama vs llama.cpp — pick the right runtime for your rig.
Further reading: Heating Your Home With Inference for turning this card into a winter-heat source, and the Sovereign AI for Bitcoiners Manifesto for the bigger picture on owner-operated AI.
