
Bitcoin accepted at checkout  |  Ships from Laval, QC, Canada  |  Expert support since 2016

D-Central Technologies — decentralized computing powering your digital sovereignty.

A local LLM is a large language model that runs on a computer you own — in your home or office, here in Canada — with no prompt and no document ever leaving the machine. For a growing number of Canadians and Canadian businesses, that is the whole point: the difference between renting an AI capability a foreign vendor can re-price, restrict, or switch off, and owning one that runs offline, runs next year, and answers only to you. This page is the practical starting point — what you can run, what hardware it takes, and how D-Central helps you stand it up.

Why Canadians are bringing AI in-house

Three forces are pushing Canadian individuals, SMBs, and regulated organizations toward local AI at once:

The bigger picture — owning your money, intelligence, connectivity, and power as layers no single actor can switch off — is our Sovereign AI in Canada pillar.

What you can actually run locally

More than most people expect. Open-weight models — families like Llama, Qwen, Gemma, Mistral, DeepSeek, and gpt-oss — run well on hardware you can buy in Canada and handle the bulk of real work: drafting and editing, summarizing long documents, classification and extraction, coding assistance, and retrieval-augmented generation (RAG) over your own files. Credit where it is due: that capability exists because the open-model community ships strong weights anyone can use, and tools like Ollama, llama.cpp, and LM Studio made running them genuinely easy.
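To make the RAG idea concrete, here is a minimal sketch of the retrieval step in plain Python. It is an illustration under stated assumptions, not how any particular tool does it: real setups usually rank with an embedding model and a vector store, whereas this toy version uses simple word-count cosine similarity. The sample chunks and the function names (`retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks whose wording is most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(question: str, chunks: list[str], k: int = 1) -> str:
    """Assemble the prompt a local model would receive: retrieved context, then the question."""
    context = "\n".join(retrieve(question, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical document chunks, invented purely for illustration.
CHUNKS = [
    "Invoices are due within 30 days of the invoice date.",
    "Support tickets are answered within one business day.",
    "Returns are accepted within 14 days of delivery.",
]
```

Calling `build_prompt("When are invoices due?", CHUNKS)` produces a prompt containing the invoice chunk, which you would then pass to whichever local model you run. Nothing in this step needs a network connection, so the documents stay on the machine.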

We will be honest about the gap: for the very hardest reasoning, the longest contexts, and top-end multimodal work, the best hosted models still lead, by roughly a few months — and that gap has been closing. The useful question is not “is it as smart as the frontier?” but “is it smart enough for the job?” For most day-to-day business work, the answer is yes.

How much hardware do you need?

Think of it as a VRAM ladder — bigger models need more video memory (figures are for 4-bit quantization, a sensible default):

Model class     Approx. VRAM    Good fit for
7–8B            ~8 GB           A capable everyday assistant on a modest modern GPU
13–14B          ~12 GB          Stronger general work
30–32B          ~24 GB          A used RTX 3090 (24 GB) is the classic dollars-per-VRAM value pick
70B             ~48 GB          Two GPUs, or Apple Silicon with 128 GB unified memory
gpt-oss-120b    ~80 GB          Serious silicon for the largest open models

The takeaway: a genuinely useful private assistant is within reach of a single well-chosen GPU. Match the memory to your workload rather than buying the biggest card you can. The full build-out and model-matching walkthrough is in Replace Cloud AI With a Local LLM.
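As a rough sketch of how to read the ladder, the Python below encodes the approximate figures from the table and picks the largest model class that fits a given amount of VRAM. The function names are hypothetical. The second helper computes the memory of the weights alone (parameters times bits per weight, divided by 8); the ladder's figures are higher than that, presumably because they leave headroom for context and runtime overhead, so treat both as planning estimates rather than guarantees.

```python
from typing import Optional

# Approximate VRAM needs at 4-bit quantization, copied from the ladder above.
VRAM_LADDER_GB = [
    ("7-8B", 8),
    ("13-14B", 12),
    ("30-32B", 24),
    ("70B", 48),
    ("gpt-oss-120b", 80),
]


def largest_class_that_fits(available_vram_gb: float) -> Optional[str]:
    """Return the largest model class whose approximate VRAM need fits, or None."""
    best = None
    for model_class, needed_gb in VRAM_LADDER_GB:  # ordered smallest to largest
        if needed_gb <= available_vram_gb:
            best = model_class
    return best


def weight_memory_gb(params_billions: float, bits_per_weight: int = 4) -> float:
    """Memory for the weights alone, in GB: parameters x bits per weight / 8."""
    return params_billions * bits_per_weight / 8
```

For example, `largest_class_that_fits(24)` returns `"30-32B"`, matching the RTX 3090 row, while `weight_memory_gb(70)` gives 35.0 GB for the weights of a 70B model before any working memory is counted.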

How to get started

Starting is easy; doing it well is the harder part. Install Ollama or LM Studio, pull a small 7–8B model, and you will be chatting in minutes — full credit to those projects for lowering the bar. The hard 80% is everything around the model: power draw, heat, quiet 24/7 operation, RAG over your own documents, and tuning for real workloads. That is where most home and office setups quietly struggle, and it is exactly the part we help with.
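Once a model is pulled, Ollama also serves a local HTTP API, so you can script against it as well as chat. The sketch below assumes a default Ollama install listening on `localhost:11434` and uses only the Python standard library. The model tag `llama3.1:8b` is just an example (use whatever you pulled), and the helper names `build_request` and `ask` are hypothetical.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if your install listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Example usage (requires a running Ollama server with the model pulled):
# print(ask("llama3.1:8b", "Summarize why someone might run an LLM locally."))
```

Because the request goes to `localhost`, the prompt and the reply never leave the machine. The generous timeout is a guess to allow for the first call, when the model may still be loading into memory.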

Let D-Central build it — Canada-wide, hand-built in Laval

If you would rather not assemble and tune it yourself, we do the build, the model selection, the power-and-heat planning, and the hardening, so what you receive is already running. Quote-only and build-to-order: priced to the job, not to a shelf.

We will also tell you honestly when the cloud is the better answer for your use case — that is the whole point of asking people who run both.

Frequently asked questions

Can a local LLM really replace ChatGPT or Claude in Canada?

For most real work — writing, editing, summarizing, document Q&A, and a capable coding assistant — yes. For the hardest reasoning, the longest contexts, and absolute top-end quality, a frontier hosted model still wins, and we will not pretend otherwise. A local model is not smarter; it is yours — it cannot be revoked, and for the bulk of day-to-day tasks that trade is worth it.

What hardware do I need to run a local LLM?

It scales with model size at 4-bit quantization: about 8 GB of VRAM for a 7–8B model, ~12 GB for 13–14B, ~24 GB for 30–32B (a used RTX 3090 is the value pick), and ~48 GB across two GPUs, or Apple Silicon with 128 GB unified memory, for a 70B. A capable private assistant is within reach of a single well-chosen GPU.

Does running an LLM locally help with Quebec’s Law 25?

A fully local deployment, where personal information never leaves your premises, is the cleanest posture available because it removes the cross-border transfer and third-party-access problems by design. It is a strong posture, not an automatic guarantee — compliance depends on your whole set of practices, so confirm specifics with qualified counsel. This is orientation, not legal advice.

Which open models are best to run locally?

The open-weight families people reach for most are Llama, Qwen, Gemma, Mistral, DeepSeek, and gpt-oss. The right pick depends on your job and your hardware budget — we match a model to your use case during a Sovereignty Briefing rather than chasing a leaderboard.

Can D-Central set this up for me, or sell me a box that is already running?

Both. We build the machine, select and tune the model, plan the power and heat, and harden the setup so that what you receive already works. Alternatively, we guide you through doing it yourself. Everything is quote-only and build-to-order, hand-built in Laval, Quebec, and shipped Canada-wide.