Tesla P40
NVIDIA · blower (passive) · Released September 2016
The budget pleb pick: 24 GB of Pascal-era VRAM for $150–250 used. Slow by 2026 standards but unbeatable $/GB.
Hardware spec sheet
| Spec | Value |
|---|---|
| Vendor | NVIDIA |
| Category | GPU |
| VRAM / memory | 24 GB |
| Memory bandwidth | 347 GB/s |
| FP32 TFLOPS | 12 |
| INT8 TOPS | 47 |
| TDP | 250 W |
| Architecture | Pascal |
| Form factor | blower (passive) |
| Release date | September 2016 |
| Street price (USD) | 150–250 (used) |
| 120V note | 250 W is easy on 120V/15A; four P40s in one rig still fit on a single 120V/20A circuit with an 80+ PSU. |
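A quick back-of-envelope check of the table's 120V claim, as a minimal sketch in Python; the platform-overhead and PSU-efficiency numbers are assumptions for illustration, not measurements.

```python
# Power budget for four P40s on one 120V/20A circuit.
# GPU TDP comes from the spec sheet; CPU/platform draw and PSU
# efficiency are assumed round numbers, not measurements.

GPU_TDP_W = 250          # P40 board power (spec sheet)
NUM_GPUS = 4
PLATFORM_W = 300         # assumed CPU, motherboard, fans, drives
PSU_EFFICIENCY = 0.90    # assumed efficiency of an 80+ class PSU at load

dc_load_w = GPU_TDP_W * NUM_GPUS + PLATFORM_W    # 1300 W at the components
wall_draw_w = dc_load_w / PSU_EFFICIENCY         # ~1444 W at the outlet

circuit_w = 120 * 20                             # 2400 W nameplate
continuous_limit_w = circuit_w * 0.8             # 1920 W (80% continuous-load rule)

print(f"Wall draw ~{wall_draw_w:.0f} W of a {continuous_limit_w:.0f} W continuous budget")
assert wall_draw_w < continuous_limit_w
```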
The Tesla P40 is the 2016 Pascal-architecture datacenter card that became a used-market legend for budget LLM plebs. 24 GB of GDDR5 on a 384-bit bus yields 347 GB/s — roughly a third of a 3090 — but at $150–250 on the used market it is the best $/GB of VRAM you can buy. Pascal descends directly from Maxwell (GTX 980 era) and predates Volta's first-gen tensor cores, so inference runs on plain CUDA cores with no tensor acceleration; FP16 throughput is crippled on this die, so runtimes lean on FP32 (or INT8 via DP4A) compute paths.
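If you want to confirm what the driver reports, here is a minimal sketch assuming a working PyTorch-with-CUDA install:

```python
# Minimal sketch: confirm the card and its compute capability from Python.
# Assumes PyTorch with CUDA support is already installed.
import torch

assert torch.cuda.is_available(), "no CUDA device visible"

name = torch.cuda.get_device_name(0)                 # e.g. "Tesla P40"
major, minor = torch.cuda.get_device_capability(0)

print(f"{name}: compute capability {major}.{minor}")
# Pascal (GP102) reports 6.1, below the 7.0 of Volta, so no tensor cores
# and only a fraction of FP32 throughput for FP16 math on this part.
```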
Who it’s for: budget-first plebs building their first LLM rig, or anyone stacking VRAM (two P40s = 48 GB for $400). Acceptable for background/batch workloads where tok/s is not critical.
Models it runs comfortably: Llama 3 8B at Q8, Llama 3 70B distillations at Q4 (slow but functional), Mixtral 8x7B at Q4. Expect 5–10 tok/s on 70B-class — fine for chat, painful for agents.
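Those numbers follow from memory bandwidth: single-batch decode is memory-bound, so each generated token streams roughly the whole set of quantized weights. A rough sketch, where the model sizes and the efficiency factor are illustrative assumptions rather than benchmarks:

```python
# Rough decode-speed ceiling from memory bandwidth (batch size 1).
# Assumes decode is memory-bound: each generated token streams the full
# quantized weights once. Sizes and the 0.5 efficiency factor are
# illustrative assumptions, not benchmark results.

BANDWIDTH_GBS = 347            # P40 spec-sheet memory bandwidth
EFFICIENCY = 0.5               # assumed fraction of peak actually achieved

models = {
    "Llama 3 8B @ Q8": 8.5,    # approx. quantized weight size, GB
    "Mixtral 8x7B @ Q4": 7.5,  # MoE: only the active experts are read per token
    "70B-class @ Q4": 40.0,    # split across two P40s; layer-split runs the
                               # cards one after the other, so same formula
}

for name, size_gb in models.items():
    ceiling = BANDWIDTH_GBS / size_gb   # tok/s if bandwidth were the only cost
    print(f"{name}: ~{ceiling:.0f} tok/s ceiling, ~{ceiling * EFFICIENCY:.0f} realistic")
```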
Hashcenter notes: passively cooled, built for front-to-back server airflow — it needs a fan shroud in a tower case. 250 W TDP fed through an 8-pin EPS (not PCIe) connector — requires an adapter. No display output, headless only. The dual-slot passive form factor still packs neatly into 4-GPU rack builds. Credit NVIDIA for building the card and the secondary market for making it affordable for plebs.
Further reading: This card is a core component of a pleb-grade AI Hashcenter. Pair it with the sovereignty argument in the Sovereign AI for Bitcoiners Manifesto, or look at how the same 120V envelope powers a Bitcoin space heater in our mining catalog. Running both workloads on one rig? See Heating Your Home With Inference.
Models that run on this hardware
Get it running
1. Install Ollama → Ten-minute local LLM runtime. One binary, zero cloud (sanity-check sketch below).
2. Give it a UI → Open-WebUI turns Ollama into a self-hosted ChatGPT.
3. Which runner? → LM Studio vs Ollama vs llama.cpp — pick the right runtime for your rig.
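Once Ollama is installed and a model is pulled, here is a minimal sanity-check sketch against its default local HTTP API (port 11434). The `llama3:8b` tag is just an example of something that fits in a P40's 24 GB; swap in whatever you actually pulled.

```python
# Minimal sketch: ask a locally served model one question through Ollama's
# HTTP API (default endpoint http://localhost:11434). Assumes a model such
# as llama3:8b has already been pulled with `ollama pull`.
import json
import urllib.request

payload = {
    "model": "llama3:8b",   # example tag; use whatever model you pulled
    "prompt": "In one sentence, why is memory bandwidth the bottleneck for LLM inference?",
    "stream": False,        # return a single JSON object instead of a stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.loads(resp.read())

print(body["response"])
```

With streaming left at its default, Ollama returns newline-delimited JSON chunks instead; the single-object form above keeps the check simple.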
Further reading: Heating Your Home With Inference for turning this card into a winter-heat source, and the Sovereign AI for Bitcoiners Manifesto for the bigger picture on owner-operated AI.
