NVIDIA DGX Spark in Canada: Specs, Sovereignty & Law 25 — D-Central Technologies
NVIDIA introduced the DGX Spark in March 2025 as part of a new category: the personal AI supercomputer. It is not a gaming GPU or a repurposed workstation. It is an engineered system built around the same Blackwell architecture that powers NVIDIA’s data-centre accelerators, compressed into a desktop enclosure that draws under 240 W and fits on a credenza. For Canadian buyers who need serious local AI capability without a full hashcenter build, it occupies a unique position in the market.
D-Central Technologies stands on the shoulders of NVIDIA’s engineering and the open-source ecosystem — llama.cpp, Ollama, vLLM — that made local inference practical. Our role is integration, Canadian procurement, and sovereignty advisory; the hardware excellence belongs to NVIDIA.
NVIDIA DGX Spark official specifications (GB10 Grace Blackwell)
All specifications below are sourced directly from NVIDIA’s official DGX Spark product page and NVIDIA hardware documentation. Verify independently before purchasing.
| Component | Specification | Notes |
|---|---|---|
| SoC | NVIDIA GB10 Grace Blackwell Superchip | CPU and GPU in one package, connected via NVLink-C2C |
| GPU architecture | Blackwell (sm_121) — 6,144 CUDA cores, 48 SMs, 5th-gen Tensor Cores | NVFP4 precision support; RT Cores included |
| AI performance | Up to 1 PFLOP at FP4 (~1,000 AI TOPS) | FP4 = NVFP4; real-world throughput varies by workload and quantization |
| CPU | 20-core Arm: 10× Cortex-X925 + 10× Cortex-A725 | Grace-branded Arm CPU; note that these are Cortex-class cores, not the Neoverse V2 cores used in NVIDIA’s Grace Hopper HPC systems |
| Unified memory | 128 GB LPDDR5X, shared CPU+GPU pool | 273 GB/s bandwidth; no VRAM vs. RAM split — the full 128 GB is available to both compute units simultaneously |
| Interconnect | C2C NVLink (CPU ↔ GPU) | Eliminates PCIe bottleneck; all memory addressed uniformly |
| Storage | 4 TB NVMe M.2 SSD (PCIe 5.0) | Local model storage; no external NAS required for most workloads |
| Networking | 10 GbE, ConnectX‑7 NIC, WiFi 7, Bluetooth 5.3 | Two DGX Sparks can be linked over the ConnectX‑7 ports for 256 GB of combined memory across the two units |
| Power | 240 W rated TDP | Real-world peak measured closer to ~200 W under combined CPU+GPU load in independent testing (ServeTheHome, 2025; NVIDIA Dev Forum). A standard North American 120 V / 15 A outlet is more than sufficient: 240 W is about 2 A at 120 V. |
| Software stack | DGX OS, NVIDIA NIM microservices, NeMo framework, CUDA, cuDNN, TensorRT-LLM | Pre-installed; runs llama.cpp, Ollama, vLLM, and other open-source inference engines |
| Form factor | Desktop / mini-tower | Passive-cooled exterior; fan-cooled internals. No rack unit required. |
| US MSRP (reference only) | From approximately US$3,999 (standard) — prices vary by configuration and retailer | As listed on NVIDIA.com and Canadian retailers (Canada Computers, Memory Express, Amazon.ca) as of mid-2025; subject to change. Contact D-Central for Canadian procurement and integration pricing. |
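The 273 GB/s memory bandwidth in the table gives a rough ceiling on single-stream generation speed. The sketch below assumes the common rule of thumb that token-by-token decoding is memory-bandwidth-bound and reads every active weight once per generated token; that rule is an assumption added here, not an NVIDIA figure, and the function name is illustrative. Real throughput will be lower.

```python
def decode_tokens_per_second_ceiling(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Upper bound on single-stream decode speed for a memory-bound model.

    bandwidth_gb_s:     memory bandwidth in GB/s (273 for the DGX Spark, per the spec table)
    gb_read_per_token:  weight bytes touched per generated token, in GB
                        (roughly the full weight footprint for a dense model,
                        only the active experts for a MoE model)
    """
    if gb_read_per_token <= 0:
        raise ValueError("gb_read_per_token must be positive")
    return bandwidth_gb_s / gb_read_per_token


SPARK_BANDWIDTH_GB_S = 273.0  # from the spec table above

# Example: a dense model whose quantized weights occupy about 40 GB.
ceiling = decode_tokens_per_second_ceiling(SPARK_BANDWIDTH_GB_S, 40.0)
print(f"ceiling: {ceiling:.1f} tokens/s")
```

Batching several users together and MoE models with small active-parameter counts both change the picture, which is why this is only a ceiling for one request on a dense model.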
What open-weight AI models run on 128 GB unified memory
The DGX Spark’s 128 GB unified pool is its defining characteristic. Unlike discrete GPUs where only the VRAM (often 24–80 GB) is available for model weights, every byte of the DGX Spark’s 128 GB is accessible to the GPU compute cores. Use our local LLM VRAM calculator to estimate specific model fits; the table below gives representative examples.
VRAM figures are approximate weight footprints at the stated quantization level. Add 10–30% for KV cache and runtime overhead at normal context lengths. Sources: NVIDIA product page, community benchmarks cited below.
| Model | Quant | ~Memory footprint | Fits DGX Spark? | Notes |
|---|---|---|---|---|
| Qwen3‑35B | Q8 | ~37 GB | ✓ Comfortable | Ample headroom for long-context RAG |
| Llama 4 Scout (109B MoE) | INT4 | ~55 GB | ✓ Comfortable | Remaining ~73 GB covers KV cache at multi-user batch sizes |
| Mistral Large 2 (123B) / Qwen3‑72B | Q4 | ~65–75 GB (123B) / ~40–45 GB (72B) | ✓ Comfortable | Strong legal / reasoning quality at modest compute; the 123B model leaves less headroom for KV cache |
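| Llama 4 Maverick (400B MoE) | INT4 | ~200 GB | ✗ Exceeds one unit at 4-bit | MoE active-parameter counts are small but all expert weights load; 400B parameters at 4 bits is roughly 200 GB, so only much more aggressive quants (around 2-bit) land in the ~100–120 GB range; verify exact GGUF size before committing |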
| Fine-tune up to 70B | LoRA / QLoRA BF16 | Per NVIDIA | ✓ Supported | NVIDIA states fine-tuning up to 70B parameters is supported via NeMo and HuggingFace PEFT |
| DeepSeek V4 Pro / R2 | INT4 | >400 GB | ✗ Requires cluster | Not a single-node model. Requires 6–8×H100 or DGX Station multi-node. See AI sovereignty consulting for cluster design. |
All VRAM estimates are approximate and community-sourced unless specifically cited to NVIDIA. Actual consumption scales with context window length and batch size. Use the VRAM calculator for your specific model and quantization target.
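The footprint estimates above follow from simple arithmetic: parameter count times bits per weight, plus the 10–30% allowance for KV cache and runtime overhead. The sketch below illustrates that calculation only; the function names and the "fits / tight / does not fit" thresholds are assumptions made for this example, and real quantized files usually carry slightly more than their nominal bits per weight.

```python
def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Nominal weight size in (decimal) GB: 1e9 params * bits / 8 bytes."""
    return params_billions * bits_per_weight / 8.0


def footprint_range_gb(params_billions: float, bits_per_weight: float,
                       overhead=(0.10, 0.30)) -> tuple:
    """Weights plus the 10-30% KV-cache/runtime allowance quoted in the text."""
    base = weight_footprint_gb(params_billions, bits_per_weight)
    low, high = overhead
    return base * (1.0 + low), base * (1.0 + high)


def fit_verdict(params_billions: float, bits_per_weight: float,
                capacity_gb: float = 128.0) -> str:
    """Illustrative verdict; capacity_gb ignores memory used by the OS itself."""
    low, high = footprint_range_gb(params_billions, bits_per_weight)
    if high <= capacity_gb:
        return "fits"
    if low <= capacity_gb:
        return "tight"
    return "does not fit"


for name, params_b, bits in [("109B @ 4-bit", 109, 4), ("72B @ 4-bit", 72, 4),
                             ("400B @ 4-bit", 400, 4)]:
    low, high = footprint_range_gb(params_b, bits)
    print(f"{name}: {low:.0f}-{high:.0f} GB -> {fit_verdict(params_b, bits)}")
```

Because the operating system and other processes share the same 128 GB pool, the usable capacity is somewhat lower than the default used here; pass a smaller `capacity_gb` for a more conservative answer.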
Two DGX Sparks can be connected over their ConnectX-7 network ports to provide 256 GB of combined memory across the two units, per NVIDIA documentation, enabling inference on larger models or higher-throughput multi-user serving without a full rack deployment. NVLink-C2C is the CPU-to-GPU link inside each unit, so a linked pair behaves as a small two-node cluster rather than a single address space, and the inference framework splits the model across the two systems.
Why DGX Spark matters for Canadian AI sovereignty
Canadian organizations face a structural problem with US-hosted AI services: any provider incorporated in the United States or operating US infrastructure is subject to the CLOUD Act (18 U.S.C. § 2713), under which US authorities with valid legal process can compel disclosure of customer data regardless of where that data is physically stored. This is not a theoretical risk; it is the legal default for every major US AI cloud service.
The DGX Spark eliminates this exposure category entirely: your data never leaves your premises, never transits US infrastructure, and is never subject to a foreign government’s legal demand. See our full explainer at /cloud-act-canada-ai/.
- Data residency: All model weights, all inference input, all output remain on hardware you physically control.
- No usage telemetry sent to NVIDIA cloud by default when running open-weight models via llama.cpp, Ollama, or vLLM (verify your inference stack configuration independently).
- Air-gap capable: The DGX Spark runs fully offline after initial setup. Models are downloaded once and stored locally.
- BYOM (Bring Your Own Model): Open-weight models from Meta, Mistral, Alibaba Cloud, Google DeepMind, and others run locally without any subscription or call-home requirement.
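As a minimal sketch of what "data stays on hardware you control" looks like in practice, the snippet below sends a prompt to an Ollama server on the same machine and refuses any endpoint that is not a loopback address. It assumes Ollama is running on its default port 11434 with its `/api/generate` endpoint; the model tag is a placeholder for whatever model you have pulled, and the loopback guard is an illustrative safeguard, not a substitute for auditing your stack.

```python
import json
import urllib.request
from urllib.parse import urlparse

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"  # Ollama's default local endpoint
LOOPBACK_HOSTS = {"127.0.0.1", "localhost", "::1"}


def is_loopback(url: str) -> bool:
    """True only if the URL points at this machine."""
    return urlparse(url).hostname in LOOPBACK_HOSTS


def build_request(prompt: str, model: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming generate request, rejecting non-local endpoints."""
    if not is_loopback(url):
        raise ValueError(f"refusing to send prompt to non-local endpoint: {url}")
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}, method="POST"
    )


def generate(prompt: str, model: str) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=120) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # "llama3.1:8b" is a placeholder; use any model tag already pulled locally.
    print(generate("Summarize Quebec Law 25 in one sentence.", "llama3.1:8b"))
```

Keeping the endpoint check in the client means a misconfigured URL fails loudly instead of silently sending a query off-premises.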
For a full local AI hardware comparison including the DGX Spark alongside GPU workstations and Apple Silicon, see /local-ai-hardware-guide/.
Quebec Law 25 and the DGX Spark: the compliance layer
Legal note: The following is general informational commentary based on publicly available regulatory guidance. It is not legal advice. Consult a qualified Quebec privacy lawyer for advice specific to your organization’s circumstances. The Commission d’accès à l’information du Québec (CAI) is the authoritative source for Law 25 interpretation.
Quebec’s Act Respecting the Protection of Personal Information in the Private Sector, as amended by Law 25 (in force in phases from 2022 to 2024), imposes obligations that directly intersect with AI infrastructure choices:
- Section 17 (cross-border transfers): Organizations transferring personal information outside Quebec must conduct a privacy impact assessment (PIA) and ensure the information will receive adequate protection in the receiving jurisdiction. Every inference call sent to a US AI API that contains personal information is a cross-border transfer of the query content. On-premise inference eliminates the transfer.
- Section 12.1 (automated decision-making): Where personal information is used to render a decision based exclusively on automated processing, organizations must inform the individual and give them the opportunity to have the decision reviewed by a person. On-premise deployment gives you full control over what is logged, disclosed, and reviewed.
- Section 13 (consent to disclosure): Personal information must not be communicated to a third party without the consent of the person concerned, unless the Act provides otherwise. A US provider subject to the CLOUD Act cannot guarantee that disclosure to US authorities will not occur. On-premise hardware removes this variable.
According to Augure’s 2026 Law 25 AI compliance guide and Borden Ladner Gervais’ April 2026 CLOUD Act analysis, on-premise AI infrastructure substantially reduces (though does not automatically eliminate) cross-border transfer obligations under Law 25. The CAI issued C$2.3 million in fines in Q1 2026 alone under section 91’s penalty framework — active enforcement is underway.
For organizations in healthcare, legal services, financial services, or any sector processing sensitive Quebec resident data, the DGX Spark’s on-premise architecture materially simplifies the Law 25 compliance analysis. Full context: /quebec-law-25-ai-on-premise-llm/.
Honest fit analysis: is the DGX Spark right for you?
Not every buyer needs a DGX Spark. Here is an honest breakdown.
Strong fit
- Regulated-sector professionals (healthcare, legal, financial) who process sensitive Quebec or Canadian client data and cannot tolerate CLOUD Act exposure from US AI providers.
- AI researchers and developers who need to fine-tune 7B–70B models locally and want a single-device environment without managing multi-GPU clusters.
- Small teams (3–20 users) needing a shared, always-on inference server at 70B-class quality without per-seat cloud subscription costs that compound over 2–3 years.
- Organizations already evaluating on-premise LLM deployment who want NVIDIA’s full enterprise software stack (NIM, NeMo, TensorRT-LLM) without a full DGX Station or H100 rack.
- Sovereign AI use cases where air-gap operation, audit trails, and zero data-exfiltration are requirements, not preferences.
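The subscription comparison in the small-team bullet can be sketched as a break-even calculation. The hardware figure below is the approximate US$3,999 reference price quoted earlier; the seat count, per-seat price, and running cost are hypothetical placeholders, and integration, support, and currency conversion are deliberately left out.

```python
import math


def break_even_months(hardware_cost: float, seats: int, per_seat_monthly: float,
                      monthly_running_cost: float = 0.0) -> float:
    """Months until avoided subscription fees cover the hardware purchase.

    Returns math.inf when running costs meet or exceed the avoided fees.
    """
    monthly_saving = seats * per_seat_monthly - monthly_running_cost
    if monthly_saving <= 0:
        return math.inf
    return hardware_cost / monthly_saving


# Hypothetical inputs: 10 users at US$30 per seat per month, no running costs counted.
months = break_even_months(hardware_cost=3999.0, seats=10, per_seat_monthly=30.0)
print(f"break-even after about {months:.1f} months")
```

Adding electricity and support to `monthly_running_cost` lengthens the payback period, so treat the result as a starting point for your own numbers rather than a forecast.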
Weaker fit — consider alternatives
- Frontier MoE models at full precision (DeepSeek V4 Pro, Llama 4 Maverick at FP8): these require more than 256 GB of memory or a multi-node GPU cluster. See sovereign AI consulting for cluster design.
- GPU-accelerated video rendering or traditional scientific HPC workloads: the GB10 GPU is AI-optimized. It is not a replacement for an RTX 4090 for gaming or video production.
- Buyers who only need 7B–14B models and are comfortable with a less complex GPU workstation: smaller, lower-cost options exist. See local AI hardware guide for the full tier comparison.
- High-volume API serving (thousands of concurrent requests): multi-GPU or rack-scale is more appropriate at that scale.
How D-Central helps Canadian DGX Spark buyers
D-Central Technologies sources and integrates local AI hardware for Canadian organizations. For the DGX Spark specifically, we provide:
- Canadian procurement: We source DGX Spark units through authorized NVIDIA channels for Canadian delivery. No need to manage US import logistics, duties, or warranty grey-areas.
- Sovereignty stack setup: We configure your DGX Spark with a sovereign inference stack (Ollama, vLLM, Open WebUI, or custom NIM deployment) so your team is operational from day one — not day thirty.
- Law 25 compliance posture review: We can refer you to our network of Quebec privacy counsel and help you document your on-premise AI deployment for PIA purposes.
- Two-unit configuration: For teams that need 256 GB of combined memory, we design and configure dual-Spark setups linked over the units’ ConnectX-7 ports, for scale-up without rack infrastructure.
- Ongoing support: Canadian-time-zone support for inference stack issues, model updates, and hardware diagnostics.
Commerce is quote-only — we do not publish CAD pricing here. Contact us at D-Central AI Sovereignty Consulting or reach us directly at support@d-central.tech / +1 855‑753‑9997 to discuss your requirements.
D-Central Technologies
1325 Rue Bergar, Laval QC H7L 4Z7
support@d-central.tech — +1 855‑753‑9997
Frequently asked questions
What is the NVIDIA DGX Spark?
The DGX Spark is a desktop personal AI supercomputer built around NVIDIA’s GB10 Grace Blackwell Superchip. It integrates a 20-core Arm CPU and a Blackwell GPU in a single package with 128 GB of unified LPDDR5X memory at 273 GB/s bandwidth, delivering up to 1 PFLOP of AI performance at FP4 precision. It is designed for local AI development, fine-tuning, and inference, with no cloud subscription required.
How much memory does the DGX Spark have?
The DGX Spark has 128 GB of unified memory shared between the CPU and GPU. This is not split into separate “system RAM” and “VRAM” pools: the full 128 GB is addressable by both compute units simultaneously, with 273 GB/s of memory bandwidth, and the CPU and GPU are linked by NVLink-C2C. This architecture allows running models that would not fit in the VRAM of a typical discrete GPU.
What AI models can the DGX Spark run?
According to NVIDIA’s official product page, the DGX Spark can fine-tune models up to 70 billion parameters and run inference on models up to approximately 200 billion parameters. In practice, models like Llama 4 Scout (INT4, ~55 GB), Qwen3-72B (Q4, ~40 GB), and Mistral Large 2 fit comfortably. Full-parameter frontier MoE models like DeepSeek V4 Pro exceed the 128 GB capacity and require a multi-node cluster. Use our VRAM calculator for specific model planning.
Can I buy the NVIDIA DGX Spark in Canada?
Yes. The DGX Spark is available in Canada through several channels including Canada Computers, Memory Express, and Amazon.ca (as of mid-2025). D-Central Technologies also procures and integrates DGX Spark units for Canadian organizations on a quote basis, including sovereignty stack configuration and Canadian support. Contact us for a quote.
How does the DGX Spark help with Quebec Law 25 compliance?
Quebec Law 25 (section 17) requires a Privacy Impact Assessment and adequate protection for cross-border transfers of personal information. When inference runs on a US-hosted AI API, every query containing personal data is a cross-border transfer. Running inference locally on a DGX Spark eliminates this transfer: your data never leaves your premises and is never subject to CLOUD Act disclosure requests. This significantly simplifies (but does not automatically satisfy all of) the Law 25 compliance analysis. This commentary is general information only and not legal advice; consult a qualified Quebec privacy lawyer for your specific situation.
What is the CLOUD Act and why does it matter for Canadian AI users?
The CLOUD Act (18 U.S.C. § 2713) allows US federal law enforcement to compel American companies to produce customer data stored anywhere in the world, including in Canadian data centres. Every major US AI provider (OpenAI, Google, Anthropic, Microsoft Azure) is subject to this law. Data held on on-premise hardware such as a DGX Spark that your organization owns and operates is not in the possession of a US service provider, so an order served on such a provider cannot reach it. See our full explainer at /cloud-act-canada-ai/.
Can two DGX Sparks be connected for more memory?
Yes. NVIDIA documents the ability to link two DGX Spark units over their ConnectX-7 network ports, giving 256 GB of combined memory across the pair. This enables inference on larger models or higher-throughput serving for small teams, without requiring rack infrastructure or a full DGX Station deployment. The pair operates as a two-node cluster rather than a single address space. D-Central configures dual-Spark setups as part of our integration service.
What is the power requirement for a DGX Spark?
The DGX Spark’s rated TDP is 240 W. Independent testing (ServeTheHome, 2025; NVIDIA Developer Forum community reports) found real-world peak power draw closer to ~200 W under combined CPU+GPU AI workloads, with idle draw around 22–25 W after a firmware update. A standard 120 V / 15 A North American household circuit is more than sufficient for a single unit, since 240 W is about 2 A at 120 V.
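The power figures above convert to current draw with one division. The sketch below works through that arithmetic; the 15 A breaker and the 80% continuous-load factor are assumptions for illustration (a common electrical rule of thumb), so check your own panel and local code.

```python
def current_amps(watts: float, volts: float = 120.0) -> float:
    """Current drawn at a given power and supply voltage (I = P / V)."""
    return watts / volts


def share_of_circuit(watts: float, breaker_amps: float = 15.0, volts: float = 120.0,
                     continuous_factor: float = 0.8) -> float:
    """Fraction of a circuit's continuous capacity used by one device.

    continuous_factor=0.8 is an assumed derating for continuous loads.
    """
    usable_amps = breaker_amps * continuous_factor
    return current_amps(watts, volts) / usable_amps


rated = current_amps(240.0)      # rated figure from the text
measured = current_amps(200.0)   # approximate measured peak from the text
print(f"rated: {rated:.2f} A, measured peak: {measured:.2f} A")
print(f"share of a 15 A circuit at rated power: {share_of_circuit(240.0):.0%}")
```

At these levels a single unit uses a small fraction of an ordinary household circuit, so other equipment on the same circuit is the more likely constraint.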
Is the DGX Spark the same as the DGX Station?
No. The DGX Station is a larger, rack-adjacent tower that uses full-size discrete Blackwell GPUs with much larger discrete VRAM and is positioned for enterprise teams running frontier-scale models. The DGX Spark uses the GB10 SoC with unified memory and is positioned as a personal or small-team AI supercomputer. They are distinct products at different price and capability tiers. If your requirements exceed 128 GB or you need multi-100B parameter inference at commercial throughput, the DGX Station or a GPU cluster is the appropriate next step — contact D-Central AI consulting for sizing guidance.
Does NVIDIA credit open-source projects for the DGX Spark ecosystem?
The DGX Spark ships with NVIDIA’s own software stack (DGX OS, NIM, NeMo, TensorRT-LLM), but the broader ecosystem that makes local LLM inference practical depends heavily on open-source work: llama.cpp (Georgi Gerganov et al.), Ollama, vLLM (UC Berkeley Sky Computing Lab), and the HuggingFace ecosystem. D-Central stands on the shoulders of all of these projects. We credit them because the practical value of hardware like the DGX Spark is inseparable from the open-source inference stack that runs on it.
Related resources
- Local AI hardware guide — Full tier comparison from 8 GB GPU boxes to GPU hashcenter clusters, with model-to-hardware mapping
- Local LLM VRAM calculator — Estimate memory requirements for any open-weight model at any quantization level
- AI sovereignty consulting — D-Central’s sovereign AI infrastructure design and integration service
- CLOUD Act and Canadian AI — Full explainer on CLOUD Act exposure for Canadian organizations using US AI services
- Quebec Law 25 and on-premise LLMs — Regulatory context for Quebec organizations deploying AI locally
- Local LLMs in Canada — Overview of the Canadian on-premise AI landscape
- Distributed compute — Multi-node and hashcenter AI compute options beyond the desktop
Specification source: NVIDIA DGX Spark official product page and NVIDIA DGX Spark hardware documentation. Power data: ServeTheHome review, 2025; NVIDIA Developer Forum community reports. Law 25 regulatory commentary: Borden Ladner Gervais, April 2026; Augure Law 25 guide, 2026. VRAM figures are approximate community-sourced estimates; verify for your specific model and quantization. This page was published in June 2026; specifications and pricing are subject to change by NVIDIA. This page does not constitute legal advice.
Related products, repair, and setup paths
- self-hosted AI for Bitcoiners hub
- plebs guide to self-hosted AI
- install Ollama in 10 minutes
- LM Studio vs Ollama vs llama.cpp
- connect local AI to Home Assistant and Obsidian
- self-hosted AI troubleshooting
- repurpose mining hardware into an AI hashcenter
- local AI model leaderboards
Last reviewed June 18, 2026.
