Open-Weight AI Models for Canada: License × VRAM × Jurisdiction Comparison (2026)

For Canadians, the only AI model that is fully CLOUD Act-free is one you self-host — regardless of what the API provider’s terms of service say about Canadian data residency. Open-weight models (also called open-source weights) can be downloaded and run entirely on your own hardware, keeping inference data under Canadian jurisdiction. This comparison covers six families — Meta Llama 4, DeepSeek R1/V3, Google Gemma 4, Alibaba Qwen3, Mistral, and OpenAI gpt-oss — across the three dimensions that matter most to Canadian operators: license terms, minimum VRAM at Q4 quantization, and which legal jurisdiction governs the model developer. Specs are as of mid-2026 and will shift as quantization tooling evolves; treat VRAM figures as planning ranges, not guarantees.

Why jurisdiction matters — the CLOUD Act problem for Canadian AI

The US Clarifying Lawful Overseas Use of Data (CLOUD) Act (2018) compels US-headquartered companies to produce data stored anywhere in the world in response to lawful US government orders. This means that when a Canadian business sends prompts to OpenAI, Anthropic, Google Vertex, or AWS Bedrock, those messages and responses transit infrastructure controlled by a US entity subject to CLOUD Act jurisdiction — even if the API endpoint resolves to a Canadian region.

Data residency (where the hard drive lives) and data sovereignty (who has legal authority over the operator) are not the same thing. A server in Toronto operated by a US parent company satisfies the residency test but not the sovereignty test. Canadian law firm guidance, Quebec Law 25 compliance frameworks, and federal PIPEDA obligations all treat the corporate structure of the data processor — not just the physical location — as a jurisdictional factor. For a fuller treatment of this distinction, see our guide to the CLOUD Act and Canadian AI operations.

Self-hosting an open-weight model resolves this. When your inference runs on hardware you control in Canada, no API call leaves your perimeter. The model weights are a static file; they do not phone home. This is the core sovereignty argument for running local LLMs — not primarily cost or performance, but legal control over where your data goes.

Master comparison table: license × VRAM × jurisdiction (mid-2026)

VRAM figures are at Q4_K_M quantization in Ollama or llama.cpp unless noted. “Min VRAM” reflects the lightest production-grade variant in that family. MoE = Mixture-of-Experts architecture (only a fraction of parameters activate per token). All figures are approximate and date-sensitive; verify against the project’s Hugging Face model card before provisioning hardware.

Model family	Developer	License	Min VRAM (Q4)	Key sizes	Developer HQ	API: CLOUD Act risk?	Self-hosted: data stays local?
Llama 4 Scout	Meta	Llama 4 Community License¹	~20 GB	17B active / 109B total (MoE)	United States	Yes (Meta AI API)	Yes
Llama 4 Maverick	Meta	Llama 4 Community License¹	~320 GB+	17B active / 400B total (MoE)	United States	Yes (Meta AI API)	Yes (enterprise multi-GPU only)
DeepSeek R1 Distill-7B	DeepSeek AI	MIT	~5.5 GB	7B (distilled from R1)	China	N/A — avoid hosted API²	Yes
DeepSeek R1 Distill-14B	DeepSeek AI	MIT	~8.5 GB	14B (distilled from R1)	China	N/A — avoid hosted API²	Yes
DeepSeek R1 Distill-32B	DeepSeek AI	MIT	~18 GB	32B (distilled from R1)	China	N/A — avoid hosted API²	Yes
Gemma 4 12B	Google DeepMind	Apache 2.0	~6.7 GB	12B (multimodal)	United States	Yes (Google Vertex / Gemini API)	Yes
Gemma 4 27B	Google DeepMind	Apache 2.0	~14.4 GB	26B MoE (multimodal)	United States	Yes (Google Vertex / Gemini API)	Yes
Qwen3-4B	Alibaba Cloud	Apache 2.0	~3.5 GB	4B dense	China	N/A — avoid hosted API²	Yes
Qwen3-14B	Alibaba Cloud	Apache 2.0	~8–10 GB	14B dense	China	N/A — avoid hosted API²	Yes
Qwen3-32B	Alibaba Cloud	Apache 2.0	~18–20 GB	32B dense	China	N/A — avoid hosted API²	Yes
Mistral 7B	Mistral AI	Apache 2.0	~4 GB	7B	France (EU)	GDPR applies; not CLOUD Act	Yes
Mistral NeMo 12B	Mistral AI	Apache 2.0	~7.1 GB	12B	France (EU)	GDPR applies; not CLOUD Act	Yes
Mistral Small 24B	Mistral AI	Apache 2.0	~13.4 GB	24B	France (EU)	GDPR applies; not CLOUD Act	Yes
gpt-oss-20b	OpenAI	Apache 2.0	~12–16 GB	20B MoE (released Aug 2025)	United States	N/A — weights only, no OpenAI API	Yes
gpt-oss-120b	OpenAI	Apache 2.0	~80–96 GB	120B MoE (released Aug 2025)	United States	N/A — weights only, no OpenAI API	Yes

¹ Llama 4 Community License notes: Commercial use is permitted for entities with fewer than 700 million monthly active users worldwide. The license explicitly excludes entities domiciled in the European Union — this restriction does not apply to Canadian businesses, but is worth monitoring for companies with EU operations. You may not use Llama 4 outputs to train a competing foundation model. Verify the current license text at llama.meta.com before production deployment; Meta has updated its license terms between model generations.

² Chinese-developer API caution: DeepSeek and Qwen are developed by Chinese-headquartered companies (Hangzhou DeepSeek Artificial Intelligence and Alibaba Group respectively). Accessing their hosted APIs routes inference through servers subject to Chinese data-access laws, which are structurally different from Canadian or EU frameworks. Self-hosting the open weights on your own hardware resolves the data-residency question entirely — your prompts and responses never leave your infrastructure. However, some Canadian organizations in regulated sectors (finance, health, government) maintain supply-chain policies that extend to the origin of model weights themselves, irrespective of self-hosting. Assess this against your own compliance posture; this is not legal advice.

Model family deep dives

Meta Llama 4 (Scout and Maverick)

Llama 4 is Meta’s fourth generation open-weight family, released in early 2026. Both Scout (109B total MoE) and Maverick (400B total MoE) use Mixture-of-Experts architecture, meaning only 17 billion parameters activate per inference pass — making Scout substantially lighter than its total parameter count suggests.

Scout for self-hosting: At Q4 quantization, Scout requires approximately 20–24 GB VRAM, fitting a single RTX 4090 (24 GB) or RTX 5090 for production workloads. At FP8/FP16 precision, a single H100 80 GB is the baseline. Scout handles long context and multimodal input (image + text).

Maverick for self-hosting: Maverick’s 400 billion total parameters make single-node self-hosting impractical for most organizations. Full-precision inference requires 4–8 H100 80 GB GPUs, translating to enterprise-scale capital expenditure. Maverick is relevant for larger hashcenters deploying GPU compute for multi-tenant inference services; for a single organization’s internal AI stack, Scout is the appropriate choice.

License flag: The Llama 4 Community License permits broad commercial use below the 700M MAU threshold. The EU exclusion does not affect Canadian companies operating only in Canada, but Canadian businesses with EU subsidiaries or EU-resident users should seek legal review before deployment. Verify the licence at llama.meta.com. The licence is stricter than MIT or Apache 2.0 — it is a custom document, not a standard OSI-recognised open-source licence.

DeepSeek R1 and V3 (MIT licence)

DeepSeek AI (Hangzhou, China) released both the R1 reasoning model and the V3 general model under the MIT licence — the most permissive standard open-source licence. MIT allows unrestricted commercial use, modification, redistribution, and fine-tuning with no royalty, attribution requirement, or competitive-use clause. This is a genuinely open licence by any standard OSI definition.

The full R1 and V3 models are 671 billion parameter MoE architectures that require data-centre-scale infrastructure to run at full precision. For practical self-hosting, DeepSeek released a series of smaller distilled models — R1-Distill-7B through R1-Distill-70B — that extract reasoning capability into dense models derived from the Qwen3 and Llama base architectures.

Distilled model VRAM planning (Q4_K_M, mid-2026 figures):

R1-Distill-7B: ~5.5 GB — runs on an RTX 3060 12 GB or MacBook with 8 GB unified memory
R1-Distill-14B: ~8.5 GB — fits a 10 GB GPU; comfortable on RTX 3080/4070
R1-Distill-32B: ~18 GB — single RTX 4090 (24 GB) with headroom for KV cache
R1-Distill-70B: dual 24 GB GPUs or Apple Silicon with 64 GB+ unified memory

The jurisdictional split: Self-hosted DeepSeek weights do not call home. Your inference data stays on your hardware. The concern for regulated Canadian workloads is not data residency — self-hosting solves that — but the origin of the weights themselves: the intellectual property, training data decisions, and potential backdoor risk of models produced under Chinese law. For low-risk internal workflows (code completion, document summarisation), this distinction is largely academic. For workloads involving privileged legal advice, health records, or national security-adjacent applications, consult your legal team and your organisation’s AI procurement policy.

The local LLM guide for Canada covers hardware selection and setup frameworks for self-hosted DeepSeek distillations.

Google Gemma 4 (Apache 2.0)

Gemma 4 is Google DeepMind’s open-weight family released on April 2, 2026, under the Apache 2.0 licence — a clean departure from Gemma 3’s custom Gemma Terms of Use, which included distillation restrictions and was not OSI-recognised as open source. The upgrade to Apache 2.0 in Gemma 4 removes those limitations: you can fine-tune Gemma 4 on private data, distil from it, and ship commercial products built on it without attribution obligations or competitive-use clauses.

Gemma 4 comes in four sizes as of mid-2026: E2B and E4B (edge-optimised for mobile/embedded), a 26B MoE (efficient), and a 31B dense model. The 12B multimodal variant handles both text and images. VRAM at Q4 (approximate):

Gemma 4 12B: ~6.7 GB — RTX 3060 12 GB or better
Gemma 4 26B MoE: ~14.4 GB — RTX 3090/4080/4090
Gemma 4 31B dense: ~17.5 GB — RTX 4090 24 GB or dual 16 GB GPUs

Developer jurisdiction is the United States (Google LLC). If you access Gemma via Google Vertex AI or the Gemini API, the CLOUD Act applies. Self-hosted Gemma 4 — weights downloaded from Hugging Face, inference running on your hardware — fully removes that vector.

Alibaba Qwen3 (Apache 2.0)

Qwen3 is Alibaba Cloud’s open-weight family released on April 28, 2025, under the Apache 2.0 licence. The family spans eight dense model sizes (0.6B, 1.7B, 4B, 8B, 14B, and 32B) and two MoE models (30B-A3B and 235B-A22B, where the second number is activated parameters per pass). Qwen3 introduced a dual-mode thinking/non-thinking architecture: models can toggle between extended chain-of-thought reasoning (higher quality, higher latency) and direct answer mode (faster, suitable for agentic loops).

As of mid-2026, Qwen3 and its successors (Qwen3.6 series) are among the most competitive open-weight models on reasoning benchmarks in the 14B–32B range. VRAM at Q4_K_M (approximate):

Qwen3-4B: ~3.5 GB — runs on entry-level GPUs, even CPU-only with a fast machine
Qwen3-8B: ~5–6 GB — GTX 1080 Ti / RTX 3060
Qwen3-14B: ~8–10 GB — RTX 3080/4070
Qwen3-32B: ~18–20 GB — single RTX 4090 or 5090
Qwen3-235B-A22B MoE: ~140 GB+ FP16; INT4 estimated ~60–80 GB — multi-GPU required

The same jurisdictional note applies as for DeepSeek: Alibaba is a Chinese-headquartered company. Self-hosted Qwen3 weights produce no API traffic to Alibaba’s infrastructure. For organisations that extend supply-chain scrutiny to weight origins, alternatives with EU or US developer pedigree (Mistral, Gemma 4, gpt-oss) may be preferred. For most Canadian SMB workloads, self-hosted Qwen3 at the 7B–32B range is a practical, cost-effective choice with an explicitly permissive licence.

Mistral AI (Apache 2.0 — French/EU jurisdiction)

Mistral AI is headquartered in Paris, France, making it the only major open-weight model developer in this comparison outside US and Chinese jurisdiction. EU-based companies are subject to GDPR rather than the CLOUD Act. This makes Mistral models particularly relevant for Canadian businesses that need to demonstrate CLOUD Act immunity on both the weights (developer origin) and the inference path (self-hosted).

Mistral releases several models under Apache 2.0: the original Mistral 7B, Mistral NeMo 12B (in partnership with NVIDIA), and Mistral Small 24B. VRAM at Q4_K_M (approximate):

Mistral 7B: ~4–4.3 GB — runs comfortably on 6–8 GB VRAM; accessible on consumer hardware
Mistral NeMo 12B: ~7.1 GB — RTX 3060 12 GB or better; strong multilingual capability
Mistral Small 24B: ~13.4 GB — RTX 3090/4080 tier; competitive general-purpose quality

Mistral also offers larger proprietary models (Mistral Large) that are not open-weight and are accessed via API only — those carry the same third-party data-routing concerns as any other hosted API. Only the Apache 2.0 models listed above are self-hostable. If you access api.mistral.ai, your data transits Mistral’s EU infrastructure; it is not CLOUD Act-exposed, but it does leave your perimeter.

For Canadian operations with strong French-language requirements, Mistral NeMo 12B has notably strong French/English bilingual performance — a practical advantage given Quebec language obligations.

OpenAI gpt-oss (Apache 2.0)

On August 5, 2025, OpenAI released its first open-weight models since GPT-2 in 2019: gpt-oss-20b and gpt-oss-120b, both under the Apache 2.0 licence. These are Mixture-of-Experts architectures — the 20B model activates a fraction of its total parameters per inference pass. They are not available through the OpenAI API or ChatGPT; they exist exclusively as downloadable weights.

gpt-oss-20b: Approximately 12–16 GB VRAM at MXFP4/Q4 quantization, making it accessible on a single RTX 4080/4090. This is the accessible tier for organisations building on-premises AI workflows without enterprise GPU budgets.

gpt-oss-120b: Requires 80–96 GB VRAM for a clean single-node run — an H100 80 GB at minimum, or 2× A100 40 GB. At benchmark time (August 2025), gpt-oss-120b matched or exceeded o4-mini on general problem-solving, coding, and health QA tasks. It is a genuine frontier-class self-hostable option, though the hardware cost places it in hashcenter or enterprise compute territory.

Developer jurisdiction is the United States (OpenAI, LLC). Because these models have no API, there is no API data-routing concern — the only data that ever touches OpenAI’s infrastructure is the weight download itself. Self-hosted inference is fully contained.

CLOUD Act immunity map

The following is a plain-language summary of which deployment patterns remove CLOUD Act exposure for Canadian operators. This is a general informational framework, not legal advice; consult qualified Canadian counsel for regulated workloads.

Deployment pattern	Data residency	CLOUD Act exposure	Recommended for sensitive CA data?
OpenAI / Anthropic / Google API (any region)	Varies by region setting	Yes — US parent company	No (verify with legal counsel)
Azure OpenAI (Canadian region)	Canada	Yes — Microsoft is a US company	No without specific contractual protections
AWS Bedrock (ca-central-1)	Canada	Yes — Amazon is a US company	No without specific contractual protections
DeepSeek/Qwen hosted API (chat.deepseek.com, Alibaba Cloud)	China	Chinese law applies	No
Mistral API (api.mistral.ai)	EU	GDPR applies; not CLOUD Act	Case-by-case; better than US APIs for CLOUD Act but still third-party
Any open-weight model, self-hosted on Canadian hardware	Canada	None — no US operator in data path	Yes (subject to your hardware vendor’s jurisdiction)

For a deeper look at how cloud AI provider agreements handle Canadian data, see our cloud AI provider comparison and the dedicated CLOUD Act guide for Canadian organizations.

How to choose the right model for your use case

Constraint-first: what VRAM do you have?

Before choosing a model on capability grounds, size for your actual hardware. Our VRAM calculator for local LLMs covers common GPU configurations. Quick reference:

6–8 GB VRAM (RTX 3060, GTX 1080 Ti, M2 MacBook Air): Mistral 7B, DeepSeek R1-Distill-7B, Qwen3-4B, Gemma 4 E4B
12–16 GB VRAM (RTX 3060 12 GB, RTX 4070/4080, M2 Pro MacBook): Mistral NeMo 12B, Gemma 4 12B, DeepSeek R1-Distill-14B, gpt-oss-20b, Qwen3-8B
24 GB VRAM (RTX 3090, RTX 4090, RTX 5090, A5000): Mistral Small 24B, Gemma 4 26B MoE, Llama 4 Scout, DeepSeek R1-Distill-32B, Qwen3-32B
40–80 GB VRAM (A100, H100): gpt-oss-120b, Qwen3-235B-A22B (INT4), Llama 4 Maverick (multi-GPU)

Licence priority: what can you legally do with the model?

MIT (most permissive): DeepSeek R1/V3 and distillations. No conditions whatsoever — use commercially, redistribute, fine-tune, distil.
Apache 2.0 (broadly permissive): Qwen3, Mistral (7B, NeMo, Small), Gemma 4, gpt-oss. Same freedom as MIT plus an explicit patent grant. No competitive-use clause; no attribution requirement in output.
Llama 4 Community License (custom): Permissive for qualifying users, but not OSI-recognised open source. Read the full licence before shipping; the 700M MAU cap, the EU exclusion, and the “no competing foundation model” clause all have operational implications.
Gemma 3 (not Gemma 4) — Gemma Terms of Use: Includes distillation restrictions. Gemma 4 (Apache 2.0) supersedes Gemma 3 for new deployments.

Jurisdiction priority: “cleanest” from Canadian sovereignty standpoint

Best: Self-hosted Mistral (Apache 2.0, French developer, EU GDPR framework) — no CLOUD Act, no Chinese-law weight provenance
Good: Self-hosted gpt-oss (Apache 2.0, US developer, but no API involved) — US weight provenance but no data ever goes to a US operator in self-hosted mode
Good: Self-hosted Gemma 4 (Apache 2.0, US developer) — same position as gpt-oss
Good: Self-hosted Llama 4 Scout (US developer, custom licence) — read the Llama licence; no API data routing
Workload-dependent: Self-hosted DeepSeek / Qwen (MIT / Apache 2.0, Chinese developer) — data stays local, but weight provenance from China is a supply-chain factor for some regulated sectors

For a complete rundown of self-hosting options in the Canadian context — hardware, inference frameworks, energy costs, and setup guides — see running local LLMs in Canada. For Canadian sovereign-stack architecture (combining local inference with decentralised compute and mesh networking), see the sovereign stack guide.

Honest verdict

No single model wins across all dimensions. The honest answer for most Canadian businesses:

For a balanced choice at 12–24 GB VRAM with clean jurisdiction: Mistral NeMo 12B or Mistral Small 24B (Apache 2.0, EU developer, strong French support).
For maximum reasoning capability per dollar of GPU at 24 GB VRAM: Qwen3-32B or DeepSeek R1-Distill-32B (Apache 2.0 / MIT) — accept the Chinese developer weight-provenance trade-off, consult your compliance posture.
For lowest hardware floor: Qwen3-4B or Mistral 7B run on 6–8 GB VRAM; both are Apache 2.0 / MIT; both are genuinely useful for summarisation, classification, and light code tasks.
For GPT-quality reasoning on-premises with permissive licence: gpt-oss-20b at 12–16 GB VRAM (Apache 2.0, US developer); gpt-oss-120b for hashcenter-scale deployments.
For multimodal (image + text) at 14 GB VRAM: Gemma 4 26B MoE (Apache 2.0, US developer) is a clean choice.

All specs in this comparison are date-stamped mid-2026. Quantization tooling evolves rapidly; by the time you read this, newer quants (IQ3, MXFP4, compressed architectures) may change the VRAM floor substantially. Always verify against the current model card on Hugging Face before provisioning hardware.

Frequently asked questions

What does “open weight” mean — is it the same as open source?

Open-weight means the model’s trained parameters (weights) are publicly available for download. Open source, strictly defined by the Open Source Initiative, additionally requires access to training data, training code, and the right to modify and redistribute all components. Most of the models in this comparison are “open weight” rather than fully open source: you get the weights and can run inference freely, but the training data and precise training methodology are typically not disclosed. MIT and Apache 2.0 licences cover the weights themselves; they do not compel disclosure of training data.

Does storing data in a Canadian AWS or Azure region make it CLOUD Act-safe?

No. The CLOUD Act grants US authorities the power to compel US-headquartered companies to produce data regardless of where that data is physically stored. Both Amazon and Microsoft are US-headquartered. A Canadian AWS region (ca-central-1) satisfies data residency requirements under some Canadian frameworks, but does not remove CLOUD Act exposure. Only removing the US company from the data path — specifically by self-hosting on hardware with no US operator involvement — resolves CLOUD Act jurisdiction. This is a general informational statement; consult qualified Canadian counsel for regulated workloads.

Is self-hosting DeepSeek safe if it’s a Chinese model?

Self-hosting means your inference data never leaves your hardware — it does not transit DeepSeek’s servers or any Chinese infrastructure. The data residency question is fully resolved by self-hosting. What remains is a supply-chain consideration: the model weights were developed by a Chinese company under Chinese law. For most commercial workloads (internal tooling, code completion, document processing), this distinction is of minimal operational significance. For workloads involving state-sensitive, legally-privileged, or national-security-adjacent data, your organisation’s AI procurement and security policies should govern the decision, and you should obtain qualified legal advice specific to your sector.

Which open-weight model has the most permissive licence for fine-tuning?

MIT-licenced models (DeepSeek R1/V3 and distillations) impose no conditions at all — you can fine-tune, distil, and redistribute without any requirement. Apache 2.0 models (Qwen3, Mistral, Gemma 4, gpt-oss) are functionally equivalent for almost all use cases and additionally include an explicit patent grant. Llama 4’s Community Licence is more restrictive: it prohibits using Llama 4 outputs to train a competing foundation model, and requires agreement to terms that are specific to Meta’s policies rather than a standard open-source framework.

What is the minimum hardware to run a useful local LLM in Canada?

A useful entry point is an 8 GB VRAM GPU (a used RTX 3060 or RTX 3070) running Mistral 7B or DeepSeek R1-Distill-7B at Q4_K_M. These fit in a standard desktop workstation and cost a fraction of enterprise GPU hardware. For Apple Silicon users, an M2/M3 MacBook with 16 GB unified memory can run 7B–13B models adequately. Our VRAM calculator helps size hardware for specific model/quantization combinations. Energy cost matters too: sustained GPU inference draws 150–350W per card; factor that into your operating cost projection.

Does Llama 4 work in Canada, or is the EU exclusion a concern?

The Llama 4 Community Licence’s EU exclusion applies to entities domiciled in the European Union — not Canadian entities. Canadian businesses can use Llama 4 Scout and Maverick commercially without triggering the EU clause, provided they meet the other licence conditions (primarily the sub-700M MAU threshold). If your Canadian company has an EU subsidiary or EU-domiciled employees who would be the model operator, that sub-entity’s eligibility should be reviewed against the current licence text at llama.meta.com. Licence terms can change between model generations.

How do Qwen3 and DeepSeek distillations compare for French-language tasks?

As of mid-2026, Mistral NeMo 12B and Mistral Small 24B are generally considered stronger for French/English bilingual tasks due to Mistral’s European training data emphasis. Qwen3 and DeepSeek distillations perform well on French but were trained on primarily English and Chinese corpora with multilingual top-up. For Quebec-specific content — Quebecois idiom, legal French, regulatory French — Mistral models are the safer baseline choice among the options in this comparison. Benchmark all candidates on your specific French-language task before committing to a production stack.

What is gpt-oss, and why did OpenAI release open-weight models?

OpenAI released gpt-oss-20b and gpt-oss-120b on August 5, 2025 — their first publicly available model weights since GPT-2 in 2019. Both models use Mixture-of-Experts architecture and are licensed under Apache 2.0. They are not available through the OpenAI API or ChatGPT; they are weights-only releases intended for self-hosting. OpenAI framed the release as part of its commitment to broad AI access while maintaining safety evaluation protocols. At benchmark time, gpt-oss-120b matched or exceeded o4-mini on coding and general problem-solving tasks. As US-headquartered company weights licensed under Apache 2.0, gpt-oss sits in the same sovereignty tier as Gemma 4 for Canadian self-hosters: no CLOUD Act exposure in self-hosted mode, US weight provenance noted.

Related products, repair, and setup paths

Last reviewed June 15, 2026.