Cloud vs Self-Hosted AI: 3-Year TCO Calculator (Canada)
Cloud AI APIs offer instant access with no capital outlay. Self-hosted open-weight models offer sovereignty, privacy, and dramatically lower per-token costs at scale. The right answer depends on your workload volume, your hardware budget, your province’s electricity rate, and how much of the clock your inference stack is actually running.
This calculator computes a complete 3-year total cost of ownership for both paths — cloud API billing (token-based or GPU-hour) versus self-hosted hardware — and shows you the exact crossover month. All formulas are shown in the methodology section below. Electricity rates are pre-filled from D-Central’s Canadian electricity rates dataset (June 2026; verify at utility tariff schedule before financial decisions).
We stand on the shoulders of open-source inference projects — llama.cpp, Ollama, vLLM, Unsloth — that made local inference economically viable. None of D-Central’s hardware recommendations require our proprietary software; these frameworks run on any compatible GPU.
3-year AI TCO calculator
Step 1 — How are you billed for cloud AI?
☁ Cloud inputs
e.g. 50M tokens/mo ≈ heavy team usage
Output is typically 10–30 % of input volume
Default: Claude Sonnet 4.6 — $3.00/MTok (June 2026; verify at anthropic.com/pricing)
Default: Claude Sonnet 4.6 — $15.00/MTok output
USD → CAD conversion
CAD (update to current rate)
Cloud APIs are USD-denominated. All outputs below shown in CAD at this rate.
🏠 Self-host inputs
One-time hardware purchase. Contact us for a quote →
Total system draw (GPU + CPU + memory + cooling). Check your hardware spec sheet.
% of clock time the inference stack is under meaningful load. Low utilization widens payback period.
Covers replacement parts, labour, and software updates. Industry rule-of-thumb: 10–15 %/yr for server hardware.
3-year cost summary
3-yr cloud cost
—
— /mo
3-yr self-host cost
—
— /mo opex
3-yr net savings
—
—
Crossover point
—
Cloud vs self-host — 3-year cost breakdown
—
—
Show self-host cost breakdown
| Hardware capex (one-time) | — |
| 3-yr electricity cost | — |
| 3-yr maintenance | — |
| Total self-host (3 yr) | — |
⚠ Estimate only — verify with actual quotes, tariff schedules, and your cloud provider invoices before financial decisions. Does not include colocation rack fees, internet transit, staff time, or insurance. Cloud costs assume flat monthly volume; does not account for burst pricing, reserved-instance discounts, or committed-use deals that may reduce cloud cost by 20–50 %.
Cloud AI pricing reference (as of June 2026)
The prices below are from public pricing pages as of June 2026. API prices change frequently — always verify at the provider’s official pricing page before financial modelling. All prices are in USD per million tokens.
| Model | Input ($/MTok) | Output ($/MTok) | Source (verify) |
|---|---|---|---|
| Claude Opus 4.8 (Anthropic) | $5.00 | $25.00 | anthropic.com/pricing |
| Claude Sonnet 4.6 (Anthropic) | $3.00 | $15.00 | anthropic.com/pricing |
| Claude Haiku 4.5 (Anthropic) | $1.00 | $5.00 | anthropic.com/pricing |
| GPT-4o (OpenAI) | $2.50 | $10.00 | openai.com/api/pricing |
| GPT-4o mini (OpenAI) | $0.15 | $0.60 | openai.com/api/pricing |
| Gemini 3.1 Pro (Google) | $2.00 | $12.00 | ai.google.dev/pricing |
| Gemini 3.5 Flash (Google) | $1.50 | $9.00 | ai.google.dev/pricing |
| Gemini 2.5 Flash-Lite (Google) | $0.10 | varies | ai.google.dev/pricing |
Sources: Anthropic — anthropic.com/pricing (June 2026). OpenAI — openai.com/api/pricing (June 2026). Google — ai.google.dev/pricing (June 2026). Prices change; verify before committing to a budget.
Cloud GPU-hour reference (as of May–June 2026)
| Provider & GPU | Hourly rate (USD) | Notes |
|---|---|---|
| RunPod — H100 PCIe | ~$1.99/hr | On-demand, spot may be lower; verify at runpod.io |
| RunPod — H100 SXM | ~$2.69/hr | Higher bandwidth NVLink interconnect |
| Lambda Labs — H100 PCIe | ~$3.29/hr | Verify at lambdalabs.com |
| AWS P5 — H100 (8× per instance) | ~$12.29/hr (per GPU equiv.) | ~5× premium over specialized providers; includes AWS support, compliance, SLA |
Source: Synthesized from BuildMVPFast GPU comparison and SynpixCloud GPU pricing 2026, as of May–June 2026. H100 rental rates have fallen significantly since 2024; verify current rates before committing.
Methodology and formulas
The calculator uses the following formulas, modelled over a 36-month (3-year) horizon. All calculations are performed client-side in your browser; no data is sent to our servers.
Cloud cost
Token-based billing:
monthly_cloud (USD) = (input_MTok × input_price_$/MTok)
+ (output_MTok × output_price_$/MTok)
monthly_cloud (CAD) = monthly_cloud_USD × USD_CAD_rate
3yr_cloud (CAD) = monthly_cloud_CAD × 36
GPU-hour billing:
monthly_cloud (USD) = gpu_hours_per_month × $/hour 3yr_cloud (CAD) = monthly_cloud_CAD × 36
Self-host cost
monthly_electricity (CAD) = watts × (utilization% ÷ 100)
× 24 h/day × 30.44 days/mo
÷ 1000 {convert Wh → kWh}
× $/kWh
monthly_maintenance (CAD) = capex × (annual_rate% ÷ 100) ÷ 12
monthly_opex (CAD) = monthly_electricity + monthly_maintenance
3yr_self_host (CAD) = capex + (monthly_opex × 36)
Crossover / payback period
monthly_net_saving = monthly_cloud_CAD − monthly_opex_CAD IF monthly_net_saving ≤ 0 → cloud is cheaper; no crossover within model ELSE: crossover_months = capex ÷ monthly_net_saving
Assumptions and limitations
- Electricity rate: Rates shown are approximate residential/commercial tariff-schedule values from primary utility sources as of June 2026. Industrial large-power tariffs may differ significantly (lower for >1 MW in Quebec; higher for cryptographic workloads under proposed Hydro-Québec tariffs). See the full electricity rates dataset.
- Currency: Cloud API prices are USD-denominated. Self-host costs (hardware, electricity) are typically CAD. The FX input defaults to 1.37 — update to your transaction-date rate. CAD/USD has ranged from 1.30 to 1.45 in 2025–2026.
- Not modelled: Cloud reserved-instance or committed-use discounts (can reduce cloud cost 20–50 %); colocation rack rent and cooling for self-hosted; enterprise support contracts; network egress charges; insurance; staff hours for hardware administration.
- Maintenance rate: The 10 % default follows the industry rule-of-thumb for server hardware. Consumer GPU workstations may run lower (5–8 %); dense GPU cluster deployments may run higher (12–18 %).
- Hardware lifespan: This model assumes hardware depreciates over 3 years. GPU hardware commonly runs 5+ years; extending the horizon would improve the self-host case further.
- Utilization: The single biggest lever in the model. A GPU running at 20 % utilization is very expensive per useful token. Self-hosting typically makes economic sense above 40–50 % sustained utilization.
Hardware tiers for self-hosted AI in Canada
The local AI hardware guide maps today’s major open-weight models (Gemma 4, Qwen3, Llama 4 Scout, DeepSeek V4 Pro) to verified hardware tiers with VRAM requirements and inference framework recommendations. For enterprise-scale deployments or multi-GPU cluster design, D-Central’s AI sovereignty consulting service covers architecture, procurement, and Hashcenter buildout in Quebec and across Canada.
Key self-hosting hardware brackets to model in the calculator:
- Personal / developer workstation (RTX 4090, 24 GB VRAM): typical system draw ~350–400 W; runs Qwen3-27B at Q4 and smaller models; best for individual developers or small teams with moderate volume.
- Workstation 48 / Apple 128 GB unified: total system draw ~400–600 W; handles Llama 4 Scout INT4 and Qwen3-35B; suitable for teams of 5–20 at sustained workloads.
- Single H100 SXM node: ~700 W GPU draw, ~1.2–1.5 kW total system; production inference server; viable for teams with >100 concurrent users or high-throughput batch pipelines.
- Multi-GPU Hashcenter node (4–8× H100): 4–10 kW total system; enterprise scale; requires colocation or purpose-built facility. Contact D-Central for design.
For capex inputs in the calculator, request a quote through the consulting page — we do not list hardware prices publicly as configurations are built to order.
Frequently asked questions
At what monthly cloud spend does self-hosting typically break even?
For a single 24 GB VRAM workstation (≈ CA$12,000–18,000 capex, ~400 W draw, 60 % utilization, Quebec electricity), the self-host opex runs roughly CA$80–150/month. If your cloud API bill exceeds approximately CA$800–1,200/month, self-hosting breaks even within 12–24 months and saves 40–65 % over three years. Higher cloud spend compresses the payback period further. Use the calculator above with your actual numbers — the threshold varies significantly with hardware choice, province, and utilization rate.
Does self-hosting actually save money on output-heavy workloads like agentic AI?
Yes — and this is where the savings are most dramatic. Agentic workloads with large context windows and multi-step reasoning generate high output token counts. At CA$15–25/MTok for output on frontier models, a team running 10 million output tokens per month spends CA$150,000–250,000/year on API alone. A single H100 node (≈ CA$40,000–60,000 capex) running vLLM with an open-weight model at 60 % utilization costs roughly CA$2,500–4,000/year in electricity in Quebec. The break-even on output-intensive workloads is often under 6 months.
Can I trust the quality of open-weight models for production workloads?
That depends on the workload. For code generation, summarization, RAG pipelines, and structured extraction, open-weight models like Llama 4 Scout, Qwen3-30B, and Mistral-class models perform comparably to frontier APIs on many benchmarks. For highly complex multi-step reasoning, frontier APIs (Claude Opus 4.8, GPT-5 class) still hold an edge. A practical approach: route routine high-volume tasks to self-hosted open-weight models and reserve API calls for complex edge cases. This hybrid architecture often achieves 70–90 % cost reduction while maintaining quality on the tasks that matter most.
What about Quebec’s proposed data-centre electricity tariffs — does that change the math?
Potentially, yes. Hydro-Québec has filed proposals for a dedicated data-centre rate of approximately 13 ¢/kWh for facilities drawing more than 5 MW — roughly doubling the current large-power industrial rate for that use case. The proposals are pending regulatory approval as of June 2026 and would not affect facilities below 5 MW or general industrial users. For most self-hosted AI deployments (single-room or small cluster below 5 MW), the current residential/SMB rate applies. For enterprise Hashcenter-scale deployments above 5 MW, model both the current rate and the proposed rate to stress-test your TCO. See the full electricity rates dataset for details and the Hydro-Québec source citation.
Should I include staff time in the self-host cost?
Yes — and this calculator does not model it. Labour is the most common omission in DIY TCO analysis. A self-hosted inference stack requires initial setup (typically 1–3 person-weeks for a new deployment) and ongoing administration (updates, monitoring, troubleshooting — often 2–5 hours/week for a single-node setup, more for clusters). At a fully-loaded Canadian software engineering cost of CA$120–180/hour, even 3 hours/week adds CA$18,000–28,000/year to the true self-host TCO. That said, at scale this cost is amortized across much higher output volume, and many teams already have DevOps capacity. Account for your actual situation. D-Central’s AI sovereignty consulting service includes deployment and ongoing support options.
What open-source inference frameworks should I use?
Ollama is the easiest starting point — it handles model download, quantization selection, and a local OpenAI-compatible API endpoint with minimal configuration. llama.cpp underpins Ollama and is the gold standard for CPU and mixed CPU/GPU inference. vLLM is the production choice for high-throughput multi-user serving on NVIDIA GPUs — PagedAttention makes it significantly more efficient than naive inference at concurrent load. Unsloth is the best option for fine-tuning on consumer hardware. D-Central does not make proprietary inference software a requirement — our hardware ships ready to run any of these frameworks.
Related D-Central resources
- Local AI hardware guide — VRAM requirements and hardware tiers for every major open-weight model, from Gemma to DeepSeek V4 Pro
- AI sovereignty consulting — architecture, procurement, and Hashcenter buildout for Canadian organizations
- Canadian electricity rates by province — primary utility tariff data with operational notes for mining and AI compute workloads
- Running local LLMs in Canada — practical guide to local AI inference for Canadian teams
- Energy for compute — Bitcoin mining and AI compute energy economics
- Firmware cost-of-ownership calculator — equivalent TCO analysis for Bitcoin mining firmware
Related products, repair, and setup paths
- self-hosted AI for Bitcoiners hub
- plebs guide to self-hosted AI
- install Ollama in 10 minutes
- LM Studio vs Ollama vs llama.cpp
- connect local AI to Home Assistant and Obsidian
- self-hosted AI troubleshooting
- repurpose mining hardware into an AI hashcenter
- local AI model leaderboards
Last reviewed June 15, 2026.
