Self-Hosted AI Inference Cost Calculator: Tokens per kWh & $ per Million Tokens

What does it really cost to run a local LLM? Estimate throughput, electricity cost per million tokens and tokens per kWh for any GPU and model.

Quick answer

Running Llama 3.1 8B Instruct at 4-bit on an NVIDIA GeForce RTX 4090 draws about 450 W and decodes at an estimated ~201.6 tokens/sec (memory-bandwidth bound). At Quebec's residential rate ($0.0707/kWh), the electricity to generate one million tokens therefore costs roughly $0.0438 CAD, or about 1,612,800 tokens per kWh.
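The quick-answer figures can be reproduced step by step. Two inputs are not stated on this page and are assumptions: the RTX 4090's memory bandwidth of about 1,008 GB/s, and a weight size of about 5.0 GB for the 4-bit model. The 5.0 GB figure is back-solved from 201.6 tokens/sec, so treat it as inferred rather than measured.

```latex
\begin{aligned}
\text{tokens/s} &\approx \frac{1008\ \text{GB/s}}{5.0\ \text{GB}} = 201.6 \\
\text{tokens/kWh} &= \frac{201.6\ \text{tokens/s} \times 3600\ \text{s/h}}{0.45\ \text{kW}}
  = \frac{725{,}760\ \text{tokens/h}}{0.45\ \text{kWh/h}} = 1{,}612{,}800 \\
\text{cost per } 10^6\ \text{tokens} &= \frac{10^6}{1{,}612{,}800}\ \text{kWh} \times 0.0707\ \tfrac{\text{CAD}}{\text{kWh}}
  \approx 0.620\ \text{kWh} \times 0.0707\ \tfrac{\text{CAD}}{\text{kWh}} \approx 0.0438\ \text{CAD}
\end{aligned}
```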

Electricity is a tiny fraction of local-inference cost; the real question is hardware amortization. Use the calculator below with your own GPU, model and power price.
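To see why amortization matters more than power, here is a rough Python sketch that spreads a GPU's purchase price over the tokens it generates during its service life and compares the result with the quick-answer electricity cost. The purchase price (2,500 CAD), three-year life and eight hours of generation per day are hypothetical placeholders, not figures from this page, and the function name is illustrative only.

```python
# Hypothetical inputs: NOT from the source, chosen only to illustrate the comparison.
HARDWARE_PRICE_CAD = 2500.0  # hypothetical GPU purchase price
LIFETIME_YEARS = 3           # hypothetical service life
HOURS_PER_DAY = 8            # hypothetical hours of active generation per day

# Figures taken from the quick answer (RTX 4090, Llama 3.1 8B, 4-bit).
TOKENS_PER_SECOND = 201.6
ELECTRICITY_CAD_PER_MILLION = 0.0438


def amortized_hardware_cost_per_million(price: float, years: float,
                                        hours_per_day: float, tok_s: float) -> float:
    """Spread the purchase price evenly over every token generated in the GPU's life.

    Ignores resale value, idle time outside the stated hours and throughput changes.
    """
    lifetime_tokens = tok_s * 3600 * hours_per_day * 365 * years
    return price / (lifetime_tokens / 1_000_000)


hardware = amortized_hardware_cost_per_million(
    HARDWARE_PRICE_CAD, LIFETIME_YEARS, HOURS_PER_DAY, TOKENS_PER_SECOND
)
ratio = hardware / ELECTRICITY_CAD_PER_MILLION
print(f"hardware: ${hardware:.4f} CAD per 1M tokens")
print(f"electricity: ${ELECTRICITY_CAD_PER_MILLION:.4f} CAD per 1M tokens")
print(f"hardware is {ratio:.1f}x the electricity cost")
```

With these placeholder inputs, hardware works out to roughly $0.39 CAD per million tokens, around nine times the electricity cost. Lighter utilization widens the gap, because the same purchase price is spread over fewer tokens.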

Calculator inputs: GPU, model, quantization, and electricity price ($/kWh, CAD).

Method: Throughput uses the memory-bandwidth-bound estimate tokens/sec ≈ GPU memory bandwidth (GB/s) ÷ model weight size (GB). This is a useful upper-bound approximation for single-stream decoding, not a benchmark; real speed varies with framework, batching and context length. Tokens per kWh = tokens/sec × 3,600 ÷ GPU power (kW), and cost per million tokens = 1,000,000 ÷ tokens per kWh × price per kWh. Cost covers electricity only (GPU at full TDP) and excludes hardware amortization, cooling and host-system power.

Sources: D-Central AI-GPU database, local-LLM model database & Canadian electricity rates. See also the GPU–LLM fit dataset.
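The method can be sketched in a few lines of Python. This is an illustrative reimplementation of the stated formulas, not the calculator's actual code; the function names are hypothetical, and the example inputs (about 1,008 GB/s for the RTX 4090 and about 5.0 GB of 4-bit weights) are assumptions, with the weight size back-solved from the quick answer's 201.6 tokens/sec.

```python
def tokens_per_second(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Memory-bandwidth-bound upper estimate for single-stream decoding.

    Each generated token requires reading roughly all model weights once,
    so throughput is capped at bandwidth divided by weight size.
    """
    return bandwidth_gb_s / weights_gb


def tokens_per_kwh(tok_s: float, power_w: float) -> float:
    """Tokens produced in one hour divided by kWh consumed in that hour."""
    return tok_s * 3600 / (power_w / 1000)


def cost_per_million_tokens(tok_per_kwh: float, price_per_kwh: float) -> float:
    """Electricity-only cost of one million tokens (GPU assumed at full TDP)."""
    return 1_000_000 / tok_per_kwh * price_per_kwh


# Assumed example inputs: RTX 4090 (~1008 GB/s, 450 W), ~5.0 GB of 4-bit weights.
tok_s = tokens_per_second(1008, 5.0)
tpk = tokens_per_kwh(tok_s, 450)
cost = cost_per_million_tokens(tpk, 0.0707)  # Quebec residential rate, CAD/kWh
print(f"{tok_s:.1f} tok/s, {tpk:,.0f} tokens/kWh, ${cost:.4f} CAD per 1M tokens")
```

Keeping the three steps as separate functions mirrors how the calculator's inputs combine: bandwidth and weight size set throughput, power converts throughput to tokens per kWh, and the electricity price converts that to dollars.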


Last reviewed June 18, 2026.