RTX A4000
NVIDIA · single-slot blower · Released April 2021
Single-slot Ampere workstation card with 16 GB of ECC GDDR6 and a blower cooler. The quiet-rack pleb's favourite for dense multi-GPU builds.
Hardware spec sheet
| Spec | Value |
|---|---|
| Vendor | NVIDIA |
| Category | GPU |
| VRAM / memory | 16 GB |
| Memory bandwidth | 448 GB/s |
| FP16 TFLOPS | 19.2 |
| INT8 TOPS | 155 |
| TDP | 140 W |
| Architecture | Ampere |
| Form factor | single-slot blower |
| Release date | April 2021 |
| Street price (USD) | 600-900 (used) |
| 120V note | 140 W each; four cards (~560 W) plus host sit well inside a 120V/15A circuit, and a single 850 W PSU covers the whole rig. |
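The circuit math in the note above can be sketched as a quick back-of-envelope check. This is a hypothetical calculation: the 200 W host overhead and the 80% continuous-load derating on the breaker are assumptions, not measured numbers.

```python
# Rough 120V circuit budget for a multi-A4000 rig.
CARD_TDP_W = 140              # spec-sheet TDP per card
SYSTEM_OVERHEAD_W = 200       # assumption: Threadripper-class host
CIRCUIT_W = 120 * 15          # 1800 W nameplate on a 15 A breaker
CONTINUOUS_LIMIT_W = CIRCUIT_W * 0.8  # 1440 W continuous (80% rule)

def rig_draw(num_cards: int) -> int:
    """Estimated wall draw for num_cards A4000s plus the host."""
    return num_cards * CARD_TDP_W + SYSTEM_OVERHEAD_W

for n in (1, 2, 4, 7):
    draw = rig_draw(n)
    verdict = "fits" if draw <= CONTINUOUS_LIMIT_W else "over budget"
    print(f"{n} cards: ~{draw} W ({verdict})")
```

Even a seven-card build (~1180 W estimated) stays under the 1440 W continuous limit, which is why the 120V note can afford to be relaxed.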
The RTX A4000 is NVIDIA’s Ampere-generation workstation card optimised for density: single-slot blower cooler, 140 W TDP, 16 GB of ECC GDDR6. Launched April 2021, it shares Ampere’s compute lineage with the 3090 — same tensor-core generation, same FP16/INT8 throughput per CUDA core — but in a form factor that lets you cram 4–7 cards into a single workstation chassis.
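For single-stream LLM inference, the practical ceiling is usually set by the 448 GB/s memory bandwidth rather than by tensor-core throughput: each generated token streams the full weight set out of VRAM. A rough, bandwidth-bound-only estimate (a simplification that ignores KV-cache traffic and kernel overhead; real throughput lands below this):

```python
# Batch-1 decode is memory-bandwidth bound: every generated token must
# read the full weight set from VRAM, so tokens/s <= bandwidth / model bytes.
BANDWIDTH_GBS = 448.0  # A4000 spec-sheet figure

def decode_tps_ceiling(params_b: float, bytes_per_param: float) -> float:
    """Upper bound on tokens/s for batch-1 decode (assumes weight reads
    dominate memory traffic)."""
    model_gb = params_b * bytes_per_param
    return BANDWIDTH_GBS / model_gb

# Llama 3 8B at FP16 (2 bytes/param): 448 / 16 = 28 tok/s ceiling
print(round(decode_tps_ceiling(8, 2.0), 1))
```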
Who it’s for: Hashcenter builders who need multi-GPU density without datacenter cards. Four A4000s in a Threadripper workstation = 64 GB VRAM in a quiet, thermally sane package. Also popular in rack deployments where blower airflow matters.
Models it runs comfortably: single card handles Llama 3 8B at FP16, 14B at Q8, 32B at Q4 (tight). Two cards split a 70B at Q4.
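Those fit claims follow from a simple weight-size estimate. The bytes-per-parameter figures below are approximations (real GGUF Q4 variants run closer to 0.55 bytes/param), and KV cache plus runtime buffers add more on top, which is exactly why 32B at Q4 is "tight" on 16 GB.

```python
# Quick weight-size estimate: params x bytes/param.
# Assumed quantization widths; actual GGUF files vary by variant.
BYTES_PER_PARAM = {"FP16": 2.0, "Q8": 1.0, "Q4": 0.5}

def weight_gb(params_b: float, quant: str) -> float:
    """Approximate model weight footprint in GB."""
    return params_b * BYTES_PER_PARAM[quant]

for params, quant in [(8, "FP16"), (14, "Q8"), (32, "Q4")]:
    print(f"{params}B @ {quant}: ~{weight_gb(params, quant):.0f} GB of weights")
```

All three cases land at or under the card's 16 GB, with no headroom to spare in the FP16 and 32B cases.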
Hashcenter notes: ECC memory is a genuine advantage for long-running inference workloads where bit-flips are a silent failure mode. Blower cooler is noticeably quieter than consumer 3090 blowers because the 140 W TDP keeps fan RPM lower. Used prices $600–900 as of 2026. 140 W each means four cards on a single 120V/15A circuit is very comfortable. Credit to NVIDIA’s workstation team for a genuinely pleb-friendly dense-compute card.
Further reading: This card is a core component of a pleb-grade AI Hashcenter. Pair it with the sovereignty argument in the Sovereign AI for Bitcoiners Manifesto, or look at how the same 120V envelope powers a Bitcoin space heater in our mining catalog. Running both workloads on one rig? See Heating Your Home With Inference.
Get it running
1. Install Ollama → Ten-minute local LLM runtime. One binary, zero cloud.
2. Give it a UI → Open-WebUI turns Ollama into a self-hosted ChatGPT.
3. Which runner? → LM Studio vs Ollama vs llama.cpp — pick the right runtime for your rig.
Further reading: Heating Your Home With Inference for turning this card into a winter-heat source, and the Sovereign AI for Bitcoiners Manifesto for the bigger picture on owner-operated AI.
