Self-Hosted AI Inference Cost Calculator: Tokens per kWh & $ per Million Tokens
What does it really cost to run a local LLM? Estimate throughput, electricity cost per million tokens, and tokens per kWh for any GPU and model.
Quick answer
Running Llama 3.1 8B Instruct at 4-bit on an NVIDIA GeForce RTX 4090 is estimated at about 201.6 tokens/sec (memory-bandwidth bound), with the card assumed to draw its full 450 W. That works out to about 1,612,800 tokens per kWh, so the electricity to generate one million tokens costs roughly $0.0445 CAD at Quebec's residential rate ($0.0718/kWh).
Electricity is a tiny fraction of local-inference cost — the real question is hardware amortization. Use the calculator below with your own GPU, model and power price.
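To put the amortization point in numbers, here is a minimal sketch. The purchase price, service life and utilization below are hypothetical placeholders, not figures from this page or its databases; only the 201.6 tokens/sec and $0.0445 values come from the quick answer above. Substitute your own inputs.

```python
def hardware_cost_per_million_tokens(price: float, lifetime_years: float,
                                     utilization: float, tok_s: float) -> float:
    """Spread the purchase price over every token generated during the
    card's service life, then scale to one million tokens."""
    active_seconds = lifetime_years * 365 * 24 * 3600 * utilization
    lifetime_tokens = tok_s * active_seconds
    return price / lifetime_tokens * 1_000_000

# HYPOTHETICAL inputs, for illustration only.
price_cad = 2500.0        # assumed purchase price
lifetime_years = 3.0      # assumed service life
utilization = 0.10        # assumed fraction of time spent generating

tok_s = 201.6                   # throughput estimate from the quick answer
electricity_per_million = 0.0445  # CAD, from the quick answer

hw_per_million = hardware_cost_per_million_tokens(
    price_cad, lifetime_years, utilization, tok_s)

print(f"hardware:    ${hw_per_million:.4f} CAD per million tokens")
print(f"electricity: ${electricity_per_million:.4f} CAD per million tokens")
```

Under these assumed inputs the amortized hardware cost comes out well above the electricity cost. The gap narrows as utilization rises, because the same purchase price is spread over more tokens.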
Method: Throughput is the memory-bandwidth-bound estimate, tokens/sec ≈ GPU memory bandwidth (GB/s) ÷ model weight size (GB), on the reasoning that generating each token requires reading the full set of weights from GPU memory once. This is a useful upper-bound approximation for single-stream decoding, not a benchmark; real speed varies with framework, batching and context length. Cost covers electricity only, with the GPU assumed to run at full TDP, and excludes hardware amortization, cooling and host power. Sources: D-Central AI-GPU database, local-LLM model database and Canadian electricity rates. See also the GPU–LLM fit dataset.
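The method above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the calculator's actual code. The 1,008 GB/s memory bandwidth and 5.0 GB weight size are assumed inputs, chosen because they reproduce the ~201.6 tokens/sec figure in the quick answer. The function names are hypothetical.

```python
def tokens_per_second(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Memory-bandwidth-bound upper estimate for single-stream decoding."""
    return bandwidth_gb_s / weights_gb

def tokens_per_kwh(tok_s: float, power_w: float) -> float:
    """Tokens generated while consuming one kWh at a constant power draw."""
    tokens_per_hour = tok_s * 3600.0
    kwh_per_hour = power_w / 1000.0
    return tokens_per_hour / kwh_per_hour

def electricity_cost_per_million(tok_per_kwh: float, rate_per_kwh: float) -> float:
    """Electricity cost of one million tokens, in the currency of the rate."""
    return 1_000_000 / tok_per_kwh * rate_per_kwh

# Assumed inputs (see the note above): RTX 4090, Llama 3.1 8B at 4-bit.
bandwidth_gb_s = 1008.0   # assumed GPU memory bandwidth
weights_gb = 5.0          # assumed size of the quantized weights
power_w = 450.0           # GPU at full TDP, per the method
rate_cad_per_kwh = 0.0718 # Quebec residential rate from the quick answer

tok_s = tokens_per_second(bandwidth_gb_s, weights_gb)
per_kwh = tokens_per_kwh(tok_s, power_w)
cost = electricity_cost_per_million(per_kwh, rate_cad_per_kwh)

print(f"{tok_s:.1f} tokens/sec")
print(f"{per_kwh:,.0f} tokens per kWh")
print(f"${cost:.4f} CAD per million tokens")
```

Because throughput is an upper bound and power is taken at full TDP, treat the result as an estimate of electricity cost under those assumptions, not a measurement.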
Last reviewed June 18, 2026.
