Tesla P40
NVIDIA · blower (passive) · Released September 2016
The budget pleb pick: 24 GB of Pascal-era VRAM for $150–250 used. Slow by 2026 standards but unbeatable $/GB.
Hardware spec sheet
| Spec | Value |
|---|---|
| Vendor | NVIDIA |
| Category | GPU |
| VRAM / memory | 24 GB |
| Memory bandwidth | 347 GB/s |
| FP32 TFLOPS | 12 |
| INT8 TOPS | 47 |
| TDP | 250 W |
| Architecture | Pascal |
| Form factor | blower (passive) |
| Release date | September 2016 |
| Street price (USD) | 150–250 (used) |
| 120V note | 250 W is easy on 120V/15A; four P40s in one rig still fit on a single 120V/20A circuit with an 80+ PSU. |
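A quick back-of-envelope check of the table's 120V claim, as a minimal sketch in Python; the platform-overhead and PSU-efficiency numbers are assumptions for illustration, not measurements.

```python
# Power budget for four P40s on one 120V/20A circuit.
# GPU TDP comes from the spec sheet; CPU/platform draw and PSU
# efficiency are assumed round numbers, not measurements.

GPU_TDP_W = 250          # P40 board power (spec sheet)
NUM_GPUS = 4
PLATFORM_W = 300         # assumed CPU, motherboard, fans, drives
PSU_EFFICIENCY = 0.90    # assumed efficiency of an 80+ class PSU at load

dc_load_w = GPU_TDP_W * NUM_GPUS + PLATFORM_W    # 1300 W at the components
wall_draw_w = dc_load_w / PSU_EFFICIENCY         # ~1444 W at the outlet

circuit_w = 120 * 20                             # 2400 W nameplate
continuous_limit_w = circuit_w * 0.8             # 1920 W (80% continuous-load rule)

print(f"Wall draw ~{wall_draw_w:.0f} W of a {continuous_limit_w:.0f} W continuous budget")
assert wall_draw_w < continuous_limit_w
```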
The Tesla P40 is the 2016 Pascal-architecture datacenter card that became a used-market legend for budget LLM plebs. 24 GB of GDDR5 on a 384-bit bus yields 347 GB/s — roughly a third of a 3090 — but at $150–250 on the used market it is the best $/GB of VRAM you can buy. Pascal descends directly from Maxwell (GTX 980 era) and predates Volta's first-gen tensor cores, so inference runs on plain CUDA cores with no tensor acceleration; FP16 throughput is crippled on this die, so runtimes lean on FP32 (or INT8 via DP4A) compute paths.
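If you want to confirm what the driver reports, here is a minimal sketch assuming a working PyTorch-with-CUDA install:

```python
# Minimal sketch: confirm the card and its compute capability from Python.
# Assumes PyTorch with CUDA support is already installed.
import torch

assert torch.cuda.is_available(), "no CUDA device visible"

name = torch.cuda.get_device_name(0)                 # e.g. "Tesla P40"
major, minor = torch.cuda.get_device_capability(0)

print(f"{name}: compute capability {major}.{minor}")
# Pascal (GP102) reports 6.1, below the 7.0 of Volta, so no tensor cores
# and only a fraction of FP32 throughput for FP16 math on this part.
```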
Who it’s for: budget-first plebs building their first LLM rig, or anyone stacking VRAM (two P40s = 48 GB for $400). Acceptable for background/batch workloads where tok/s is not critical.
Models it runs comfortably: Llama 3 8B at Q8, Llama 3 70B distillations at Q4 (slow but functional), Mixtral 8x7B at Q4. Expect 5–10 tok/s on 70B-class — fine for chat, painful for agents.
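Those numbers follow from memory bandwidth: single-batch decode is memory-bound, so each generated token streams roughly the whole set of quantized weights. A rough sketch, where the model sizes and the efficiency factor are illustrative assumptions rather than benchmarks:

```python
# Rough decode-speed ceiling from memory bandwidth (batch size 1).
# Assumes decode is memory-bound: each generated token streams the full
# quantized weights once. Sizes and the 0.5 efficiency factor are
# illustrative assumptions, not benchmark results.

BANDWIDTH_GBS = 347            # P40 spec-sheet memory bandwidth
EFFICIENCY = 0.5               # assumed fraction of peak actually achieved

models = {
    "Llama 3 8B @ Q8": 8.5,    # approx. quantized weight size, GB
    "Mixtral 8x7B @ Q4": 7.5,  # MoE: only the active experts are read per token
    "70B-class @ Q4": 40.0,    # split across two P40s; layer-split runs the
                               # cards one after the other, so same formula
}

for name, size_gb in models.items():
    ceiling = BANDWIDTH_GBS / size_gb   # tok/s if bandwidth were the only cost
    print(f"{name}: ~{ceiling:.0f} tok/s ceiling, ~{ceiling * EFFICIENCY:.0f} realistic")
```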
Hashcenter notes: passively cooled, built for front-to-back server airflow — it needs a fan shroud in a tower case. 250 W TDP fed through an 8-pin EPS (not PCIe) connector — requires an adapter. No display output, headless only. The dual-slot passive form factor still packs neatly into 4-GPU rack builds. Credit NVIDIA for building the card and the secondary market for making it affordable for plebs.
Further reading: This card is a core component of a pleb-grade AI Hashcenter. Pair it with the sovereignty argument in the Sovereign AI for Bitcoiners Manifesto, or look at how the same 120V envelope powers a Bitcoin space heater in our mining catalog. Running both workloads on one rig? See Heating Your Home With Inference.
Models that run on this hardware
Get it running
1. Install Ollama → Ten-minute local LLM runtime. One binary, zero cloud (sanity-check sketch below).
2. Give it a UI → Open-WebUI turns Ollama into a self-hosted ChatGPT.
3. Which runner? → LM Studio vs Ollama vs llama.cpp — pick the right runtime for your rig.
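Once Ollama is installed and a model is pulled, here is a minimal sanity-check sketch against its default local HTTP API (port 11434). The `llama3:8b` tag is just an example of something that fits in a P40's 24 GB; swap in whatever you actually pulled.

```python
# Minimal sketch: ask a locally served model one question through Ollama's
# HTTP API (default endpoint http://localhost:11434). Assumes a model such
# as llama3:8b has already been pulled with `ollama pull`.
import json
import urllib.request

payload = {
    "model": "llama3:8b",   # example tag; use whatever model you pulled
    "prompt": "In one sentence, why is memory bandwidth the bottleneck for LLM inference?",
    "stream": False,        # return a single JSON object instead of a stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.loads(resp.read())

print(body["response"])
```

With streaming left at its default, Ollama returns newline-delimited JSON chunks instead; the single-object form above keeps the check simple.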
Further reading: Heating Your Home With Inference for turning this card into a winter-heat source, and the Sovereign AI for Bitcoiners Manifesto for the bigger picture on owner-operated AI.
