Bitcoin accepted at checkout  |  Ships from Laval, QC, Canada  |  Expert support since 2016

Superseded

Tesla P40

NVIDIA · blower (passive) · Released September 2016

The budget pleb pick: 24 GB of Pascal-era VRAM for $150–250 used. Slow by 2026 standards but unbeatable $/GB.

Hardware spec sheet

Vendor: NVIDIA
Category: GPU
VRAM / memory: 24 GB
Memory bandwidth: 347 GB/s
FP32 TFLOPS: 12
INT8 TOPS: 47
TDP: 250 W
Architecture: Pascal
Form factor: blower (passive)
Release date: September 2016
Street price (USD): $150–250 (used)
120V note: 250 W is easy on a 120V/15A circuit; four P40s in one rig still fit on a single 120V/20A circuit with an 80 Plus PSU.
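The 120V circuit math above can be sanity-checked in a few lines. The platform draw and PSU efficiency figures below are illustrative assumptions, not measurements of a real rig:

```python
# Rough power-budget check for a multi-P40 rig on a North American
# 120V circuit. Platform draw and PSU efficiency are assumed values.

def circuit_headroom_w(volts: float, amps: float, continuous_derate: float = 0.8) -> float:
    """Usable continuous watts on a breaker (standard 80% continuous-load rule)."""
    return volts * amps * continuous_derate

def rig_draw_w(gpu_count: int, gpu_tdp_w: float = 250.0,
               platform_w: float = 150.0, psu_efficiency: float = 0.9) -> float:
    """Wall draw: GPUs at full TDP plus CPU/board/fans, pulled through the PSU."""
    return (gpu_count * gpu_tdp_w + platform_w) / psu_efficiency

four_p40_rig = rig_draw_w(4)                 # roughly 1.3 kW at the wall
budget_20a = circuit_headroom_w(120, 20)     # 1920 W continuous
print(four_p40_rig, budget_20a, four_p40_rig <= budget_20a)
```

Even with full-TDP GPUs and a modest PSU, a four-card rig stays well under the continuous budget of a 120V/20A circuit.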

The Tesla P40 is the 2016 Pascal-architecture datacenter card that became a used-market legend for budget LLM plebs. 24 GB of GDDR5 on a 384-bit bus yields 347 GB/s, roughly a third of a 3090's bandwidth, but at $150–250 on the used market it is the best $/GB of VRAM you can buy. Pascal descends directly from Maxwell (GTX 980 era) and predates Volta's first-gen tensor cores, so there is no tensor acceleration: inference runs in FP32 (the P40's FP16 rate is deliberately crippled) or INT8 via DP4A.
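Memory bandwidth is the number that matters here, because single-stream decode is bandwidth-bound: every generated token streams roughly the whole set of active weights once. A back-of-envelope estimate, with an assumed efficiency factor since real throughput always trails the theoretical ceiling:

```python
# Bandwidth-bound decode estimate: tok/s ~ bandwidth / model size.
# The 0.6 efficiency factor is an assumption covering KV cache traffic
# and kernel overhead; real numbers vary by runtime and quant format.

def est_toks_per_s(bandwidth_gbs: float, model_gb: float, efficiency: float = 0.6) -> float:
    return bandwidth_gbs * efficiency / model_gb

P40_BW = 347.0  # GB/s, from the spec sheet above

print(est_toks_per_s(P40_BW, 8.0))   # ~8 GB model (8B-class at Q8)
print(est_toks_per_s(P40_BW, 40.0))  # ~40 GB model (70B-class at Q4)
```

The 70B-class estimate lands in the mid-single digits of tok/s, which matches the real-world figures quoted below.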

Who it’s for: budget-first plebs building their first LLM rig, or anyone stacking VRAM (two P40s = 48 GB for $400). Acceptable for background/batch workloads where tok/s is not critical.

Models it runs comfortably: Llama 3 8B at Q8, Llama 3 70B distillations at Q4 (slow but functional), Mixtral 8x7B at Q4. Expect 5–10 tok/s on 70B-class — fine for chat, painful for agents.
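A quick way to judge whether a given model and quant fits: quantized size is roughly parameter count times bits per weight, plus some headroom for KV cache and buffers. The 20% overhead factor below is a rough assumption:

```python
# Does a quantized model fit in VRAM? Size ~ params * bits/8, plus an
# assumed 20% overhead for KV cache, buffers, and runtime scratch space.

def model_gb(params_b: float, bits: float, overhead: float = 1.2) -> float:
    """Approximate GB needed: billions of params at the given quant width."""
    return params_b * bits / 8 * overhead

def fits(params_b: float, bits: float, vram_gb: float = 24.0) -> bool:
    return model_gb(params_b, bits) <= vram_gb

print(fits(8, 8))                  # 8B at Q8: comfortable on one P40
print(fits(70, 4))                 # 70B at Q4: too big for one card
print(fits(70, 4, vram_gb=48.0))   # 70B at Q4 across two P40s
```

This is why the two-card 48 GB configuration keeps coming up: it is the cheapest way into the 70B-at-Q4 weight class.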

Hashcenter notes: passive blower, designed for front-to-back server airflow, so it needs a fan shroud in a tower case. 250 W TDP fed through an 8-pin EPS connector (not PCIe), so budget for an adapter. No display output, headless only. The dual-slot passive form factor still packs neatly into 4-GPU rack builds. Credit NVIDIA for building the card and the secondary market for making it affordable for plebs.

Further reading: This card is a core component of a pleb-grade AI Hashcenter. Pair it with the sovereignty argument in the Sovereign AI for Bitcoiners Manifesto, or look at how the same 120V envelope powers a Bitcoin space heater in our mining catalog. Running both workloads on one rig? See Heating Your Home With Inference.

Get it running

  1. Install Ollama →

     Ten-minute local LLM runtime. One binary, zero cloud.

  2. Give it a UI →

     Open-WebUI turns Ollama into a self-hosted ChatGPT.

  3. Which runner? →

     LM Studio vs Ollama vs llama.cpp: pick the right runtime for your rig.
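Once step 1 is done, Ollama serves a local REST API on port 11434, which is what Open-WebUI (step 2) talks to. A minimal sketch of hitting it directly from the standard library; the model tag `llama3:8b` is an assumption, substitute whatever you have pulled:

```python
# Minimal client for Ollama's local /api/generate endpoint (non-streaming).
# Assumes Ollama is running on its default port and the model is pulled.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> bytes:
    """JSON payload for a non-streaming generate call."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def ask(model: str, prompt: str) -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # Model tag is a placeholder assumption; run `ollama list` to see yours.
    print(ask("llama3:8b", "Why does VRAM matter more than TFLOPS for local LLMs?"))
```

No API keys, no cloud: the request never leaves the machine, which is the whole point of an owner-operated rig.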
