Open Dataset · CC BY 4.0 · D-Central Open Data

Local LLM Model Database

A machine-readable index of 33 open-weight large language models you can run locally — without routing data through US cloud providers. Every record covers parameter count, context window, license, approximate VRAM at Q4 / Q8 / FP16, and the Ollama pull tag. Built for engineers, researchers, and Canadian sovereignty-minded operators who need to evaluate models against their hardware before downloading 40 GB.

Records: 33 models
Families: Llama · Qwen · Gemma · Mistral · Phi · DeepSeek · OLMo · Command · SmolLM
As of: June 2026 — verify at source before deployment
License: CC BY 4.0

Why run a large language model locally?

When you query a hosted AI API, your prompts, documents, and completions travel to the provider's servers and are processed there. Most of the major providers are US technology companies, which fall under the CLOUD Act: US authorities can compel them to disclose stored data regardless of where the server sits, including Canadian data held on US-owned infrastructure. For organisations handling sensitive client data, intellectual property, or regulated information, that is a material risk.

Self-hosting an open-weight model on hardware you control removes that exposure: your prompts and responses never leave your own machines. Every model in this database is open-weight and self-hostable, and every row carries cloud_act_free_if_selfhosted = true.

This dataset is a companion to D-Central’s Local LLM Canada guide and the AI Sovereignty Consulting service. It is one layer of a broader open data initiative — the same philosophy that drives open Bitcoin mining infrastructure drives open AI infrastructure: decentralisation is not a feature, it is the architecture.

Credit: These models exist because of the researchers and organisations who open their work. D-Central indexes them; we do not create them. This index stands on the shoulders of Meta AI, Alibaba / Qwen Team, Google DeepMind, Mistral AI, Microsoft, DeepSeek AI, Allen AI, Cohere, and Hugging Face, all of whom released open weights that make self-hosted AI possible.

How to read this table

Params (B): Total parameter count in billions. For MoE models, the Active (B) column shows how many parameters are used per token, which is what determines inference speed.
Context (K): Native context window in thousands of tokens. Some models can extend this with RoPE scaling or YaRN — the native spec is listed; check the model card for extended-context options.
VRAM Q4 / Q8 / FP16: Approximate GPU memory in GB at each precision level. Q4 (4-bit) lets you run larger models on consumer GPUs; Q8 (8-bit) is near-lossless; FP16 is the unquantised 16-bit baseline. Figures are cross-checked against the sizes reported in the Ollama library where available, and otherwise calculated with the standard formula (params × bits / 8 × ~1.15 overhead). Actual usage grows with context length; these are base-load estimates with an empty context.
Commercial: Whether the license explicitly permits commercial use without a separate agreement. Entries marked ✗ (CC-BY-NC) require a separate commercial license from the developer.
Ollama tag: Run ollama pull <tag> to download. The default tag for most models in the Ollama library is a Q4_K_M build; add a tag suffix to pull a different quantisation.
CLOUD Act free: Always ✓ for self-hosted deployments — your data stays on your hardware.
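
The sizing formula in the VRAM entry above can be sketched in a few lines of Python. This is a rough estimate only, assuming the ~1.15 overhead factor quoted above; the function name estimate_vram_gb is illustrative and is not part of the dataset or its API.

```python
def estimate_vram_gb(params_b: float, bits: int, overhead: float = 1.15) -> float:
    """Approximate weight memory in GB: params x bits / 8 x overhead.

    params_b is the total parameter count in billions; overhead is the
    ~1.15 factor quoted above (an approximation, not a measured value).
    """
    if params_b <= 0 or bits <= 0:
        raise ValueError("params_b and bits must be positive")
    return params_b * bits / 8 * overhead


# An 8B-parameter model at the three levels used in this table.
for label, bits in (("Q4", 4), ("Q8", 8), ("FP16", 16)):
    print(f"{label}: ~{estimate_vram_gb(8, bits):.1f} GB")
```

Published figures in the table may differ slightly from this calculation, because they are cross-checked against Ollama library sizes where available.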

The open-weight LLM landscape changes quickly. New models, new versions, and license updates appear frequently. Treat this as a starting point and verify the canonical model card before production deployment.

Open-weight LLM model index

Model	Family	Params (B)	Active (B)	Context (K)	License	Commercial	VRAM Q4 (GB)	VRAM Q8 (GB)	VRAM FP16 (GB)	Ollama Tag	Modality	CLOUD Act Free

VRAM figures are approximate estimates of base model weights only, with no active KV cache. They are verified against Ollama library file sizes where noted.
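
The same index can be filtered offline by family, Q4 VRAM, and commercial use. The sketch below works on a small inline sample shaped like the CSV export. The column names are assumptions (only cloud_act_free_if_selfhosted is named on this page), so check the real header row before reusing it; the sample values are taken from the tables on this page.

```python
import csv
import io

# Hypothetical sample in the shape of the CSV export. Column names are
# assumptions; the values come from the tables on this page.
SAMPLE_CSV = """model,family,vram_q4_gb,commercial,cloud_act_free_if_selfhosted
Mistral Nemo 12B,Mistral,7.1,true,true
Qwen2.5 32B,Qwen,19,true,true
Llama 3.3 70B,Llama,43,true,true
Command R+ 104B,Command,59,false,true
"""


def filter_models(rows, family=None, max_vram_q4_gb=None, commercial_only=False):
    """Return rows matching the optional family, VRAM and commercial filters."""
    selected = []
    for row in rows:
        if family is not None and row["family"].lower() != family.lower():
            continue
        if max_vram_q4_gb is not None and float(row["vram_q4_gb"]) > max_vram_q4_gb:
            continue
        if commercial_only and row["commercial"].lower() != "true":
            continue
        selected.append(row)
    return selected


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
fits_24gb = filter_models(rows, max_vram_q4_gb=22, commercial_only=True)
print([r["model"] for r in fits_24gb])  # ['Mistral Nemo 12B', 'Qwen2.5 32B']
```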

VRAM quick reference — which GPU can run what

GPU / Hardware	VRAM	Practical Q4 ceiling	Example models that fit
RTX 3060 / 4060	8 GB – 12 GB	~12 GB Q4	Phi-3.5 mini (2.2 GB), SmolLM2 1.7B (1.8 GB), Mistral Nemo 12B (7.1 GB), Gemma 3 12B (7.5 GB)
RTX 3090 / 4090	24 GB	~22 GB Q4	All of the above + Qwen2.5 32B (19 GB), Qwen3 30B-A3B (19 GB), Gemma 4 31B (20 GB)
RTX 6000 Ada / RTX A6000	48 GB	~45 GB Q4	All 32B models + Llama 3.1/3.3 70B (43 GB), DeepSeek-R1-Distill-Llama-70B (43 GB)
2× RTX 3090 / 4090 (tensor split)	48 GB combined	~45 GB Q4	Same as the single 48 GB cards above; llama.cpp and Ollama support splitting a model across multiple GPUs
H100 80 GB / A100 80 GB	80 GB	~75 GB Q4	All 70B models at Q8; Qwen2.5 72B (44 GB Q4, 75 GB Q8)
Apple M2 Ultra (192 GB unified)	192 GB	~180 GB Q4	Command R+ 104B (59 GB Q4); Llama 3.1 405B (245 GB Q4) does not fit and needs a 256 GB+ Mac or multi-H100

VRAM figures are approximate. Context length and batch size add KV-cache overhead beyond the base model weight size. Ollama automatically attempts multi-GPU tensor split when needed. For sovereign AI infrastructure planning, see AI Sovereignty Consulting.
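
To get a feel for the KV-cache overhead mentioned above, the usual transformer estimate is 2 (keys and values) × layers × KV heads × head dimension × context tokens × bytes per value. The sketch below applies it to Llama 3.1 8B. The architecture numbers (32 layers, 8 KV heads, head dimension 128) come from that model's published configuration, not from this dataset, so treat them as assumptions to verify; the function name is illustrative.

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_value: int = 2,
                 batch_size: int = 1) -> float:
    """Estimate KV-cache size in GiB for a standard attention stack.

    Two tensors (keys and values) are cached per layer, each holding
    n_kv_heads * head_dim values per token.
    """
    total_bytes = (2 * n_layers * n_kv_heads * head_dim
                   * context_tokens * bytes_per_value * batch_size)
    return total_bytes / 1024**3


# Assumed architecture values for Llama 3.1 8B (check the model's config):
# 32 layers, 8 KV heads (grouped-query attention), head dimension 128.
for ctx in (8_192, 32_768, 131_072):
    print(f"{ctx:>7} tokens: {kv_cache_gib(32, 8, 128, ctx):.1f} GiB")
```

Under these assumptions an FP16 cache costs about 1 GiB per 8K tokens, so a full 128K context adds roughly 16 GiB on top of the weights. Runtimes that quantise the cache will use less.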

License summary for open-weight models

Not all open-weight models are open for commercial use. This table maps the license types found in this dataset to practical deployment rights. This is informational, not legal advice — read the full license and consult a lawyer for regulated deployments.

License	Commercial use	Attribution required	MAU / threshold restriction	Models in this dataset
Apache 2.0	✓ Fully permitted	License notice only	None	Qwen2.5 (≤32B), Qwen3 (all), Gemma 4, Mistral 7B / Nemo, Mixtral 8x7B, OLMo 2, SmolLM2
MIT	✓ Fully permitted	License notice only	None	Phi-3.5 mini, Phi-4, DeepSeek-R1-Distill-Qwen-7B/14B/32B
Llama Community License (3.1 / 3.2 / 3.3)	✓ Permitted (conditions apply)	“Built with Llama” in product	>700M MAU requires Meta permission	Llama 3.1, Llama 3.2, Llama 3.3, DeepSeek-R1-Distill-Llama-70B
Tongyi Qianwen License	✓ Permitted (conditions apply)	“Built with Qwen” in product	>100M MAU requires Alibaba permission	Qwen2.5 72B
Google Gemma Terms	✓ Permitted (conditions apply)	Specified usage policy	Prohibited uses listed; no MAU cap	Gemma 3 (4B, 12B, 27B)
CC-BY-NC-4.0	✗ Non-commercial only	Attribution required	Separate commercial license from Cohere	Command R 35B, Command R+ 104B
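
For automated pre-screening, the table above can be encoded as a small lookup. This is a sketch, not legal logic: the thresholds are transcribed from the table, the class and function names are hypothetical, and the check assumes you intend a commercial deployment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LicenseTerms:
    commercial: bool
    mau_threshold: Optional[int]  # monthly active users; None means no cap listed


# Transcribed from the table above; informational only.
LICENSES = {
    "Apache 2.0": LicenseTerms(True, None),
    "MIT": LicenseTerms(True, None),
    "Llama Community License": LicenseTerms(True, 700_000_000),
    "Tongyi Qianwen License": LicenseTerms(True, 100_000_000),
    "Google Gemma Terms": LicenseTerms(True, None),
    "CC-BY-NC-4.0": LicenseTerms(False, None),
}


def needs_vendor_agreement(license_name: str, monthly_active_users: int) -> bool:
    """True if the table implies a separate agreement or permission is needed."""
    terms = LICENSES[license_name]
    if not terms.commercial:
        return True
    return terms.mau_threshold is not None and monthly_active_users > terms.mau_threshold


print(needs_vendor_agreement("Llama Community License", 5_000_000))  # False
print(needs_vendor_agreement("CC-BY-NC-4.0", 10))                    # True
```

The lookup ignores attribution duties and prohibited-use policies, which still apply below the MAU thresholds.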

Frequently asked questions

What is an open-weight large language model?

An open-weight LLM (large language model) is an AI model whose trained weights (the numerical parameters that encode its knowledge) are publicly released and downloadable. Closed models such as GPT-4 or Claude are accessible only through APIs operated by their developers; open-weight models can be run on your own hardware without any API call, subscription, or data transmission to a third-party server. “Open-weight” is distinct from “open source”: the weights are free to download, but the training code and training data may or may not be published. Of the families in this dataset, OLMo 2 (Allen AI) is the most fully open, releasing weights, training data, training code, and intermediate checkpoints under Apache 2.0.

Why does the CLOUD Act matter for AI in Canada?

The US CLOUD (Clarifying Lawful Overseas Use of Data) Act of 2018 allows US authorities to compel US-headquartered technology companies to hand over data stored on servers anywhere in the world, including Canada. When Canadian organisations use AI APIs hosted by US companies (OpenAI, Anthropic, Google, Microsoft Azure), their prompts and generated outputs may be subject to this obligation. Running an open-weight model on your own Canadian-controlled hardware means no US company is in custody of your data, eliminating this exposure. This is particularly relevant for law firms, healthcare organisations, financial institutions, and government bodies handling sensitive information. Note: this is informational, not legal advice — consult a privacy lawyer for your specific regulatory context.

What does Q4 / Q8 / FP16 quantisation mean?

Quantisation reduces the precision (bit-width) used to store each model parameter, trading a small amount of quality for dramatically lower memory requirements. FP16 (16-bit floating point) is the unquantised baseline: the most accurate, but it needs the most VRAM (approximately 2 GB per billion parameters). Q8 (8-bit integer quantisation) halves that to roughly 1 GB per billion parameters with minimal quality loss. Q4_K_M (4-bit quantisation with mixed-precision handling) roughly halves it again, to 0.5–0.6 GB per billion parameters, with a small but measurable quality reduction on demanding tasks. For most chat, writing, and coding tasks, Q4_K_M output is hard to tell apart from FP16 in practice. Most default tags in the Ollama library are Q4_K_M builds, so that is what ollama pull <model> downloads unless you add a quantisation tag.
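
The trade-off between bit-width and accuracy can be seen with a toy round trip. The sketch below uses plain symmetric round-to-nearest quantisation on one block of made-up weights. It is a deliberately simplified illustration and is not the Q8 or Q4_K_M scheme that llama.cpp and Ollama actually use.

```python
def quantise(values, bits):
    """Symmetric round-to-nearest quantisation of one block of floats.

    Returns (scale, codes); each code is an integer in
    [-(2**(bits-1) - 1), 2**(bits-1) - 1].
    """
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0
    codes = [round(v / scale) for v in values]
    return scale, codes


def dequantise(scale, codes):
    """Map integer codes back to approximate floats."""
    return [scale * c for c in codes]


# Made-up weights for illustration only.
weights = [0.12, -0.53, 0.91, -0.07, 0.33, -0.88, 0.02, 0.45]
for bits in (8, 4):
    scale, codes = quantise(weights, bits)
    restored = dequantise(scale, codes)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"{bits}-bit: max abs error {worst:.4f}")
```

Halving the bit-width halves the storage per weight but widens the rounding step, which is why the 4-bit error printed here is larger than the 8-bit one.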

What is Ollama and how do I use it?

Ollama is an open-source tool (MIT license) that makes running open-weight LLMs on macOS, Linux, and Windows as simple as a single command. It handles model downloading, quantisation selection, and GPU/CPU offloading, and it exposes a local REST API that includes OpenAI-compatible endpoints. To run a model, install Ollama from ollama.com, then run ollama run llama3.1:8b (for example) in a terminal. The Ollama tag column in this dataset gives you the exact string to pass to ollama pull or ollama run. Ollama is not the only way to run these models: llama.cpp, LM Studio, vLLM, and text-generation-webui are popular alternatives.
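
Once a model is pulled, the local REST API can be called from any language. The following is a minimal Python sketch against Ollama's native /api/generate endpoint on its default port (11434), using only the standard library. It assumes the Ollama server is already running and the model has been pulled; the helper names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming request for Ollama's native generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to a locally running Ollama server and return its reply."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, generate("llama3.1:8b", "Explain quantisation in one sentence.") returns the completion text. Nothing leaves the machine, since the request goes to localhost.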

Which model should I start with for local deployment?

The answer depends on your hardware and use case. For a 6–8 GB GPU: Phi-3.5 mini 3.8B (2.2 GB Q4, MIT, 128K context) is a strong starting point; Mistral Nemo 12B (7.1 GB Q4, Apache 2.0, 128K context) fits only on an 8 GB card, and then only with a short context. For a 24 GB GPU: Qwen2.5 32B or Qwen3 30B-A3B (MoE, with the inference speed of a 3B model). For a 48 GB card or a multi-GPU setup: Llama 3.3 70B, or DeepSeek-R1-Distill-Llama-70B for reasoning-heavy tasks. For maximum commercial licence freedom: Apache 2.0 models (Qwen3, Gemma 4, OLMo 2, Mistral AI models) carry the fewest restrictions. For a fully self-hosted Bitcoin mining and AI sovereignty setup, D-Central’s AI Sovereignty Consulting can help design the right stack.
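
The hardware-first selection described above can be sketched as a simple budget check. The Q4 sizes are the approximate figures quoted on this page; the 2 GB default headroom for context and runtime overhead is an assumption, and the function name is hypothetical.

```python
# Q4 sizes in GB as quoted on this page (approximate, weights only).
Q4_SIZES_GB = {
    "SmolLM2 1.7B": 1.8,
    "Phi-3.5 mini": 2.2,
    "Mistral Nemo 12B": 7.1,
    "Gemma 3 12B": 7.5,
    "Qwen2.5 32B": 19,
    "Qwen3 30B-A3B": 19,
    "Llama 3.3 70B": 43,
    "Command R+ 104B": 59,
}


def candidates(vram_gb: float, headroom_gb: float = 2.0):
    """List models whose Q4 weights fit with headroom_gb spare, largest first."""
    budget = vram_gb - headroom_gb
    fitting = [(size, name) for name, size in Q4_SIZES_GB.items() if size <= budget]
    return [name for size, name in sorted(fitting, reverse=True)]


for vram in (8, 24, 48):
    print(f"{vram} GB: {candidates(vram)}")
```

With 2 GB of headroom, an 8 GB card is left with the 2 GB class models; Mistral Nemo 12B fits only if you accept less headroom, for example candidates(8, headroom_gb=0.5).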

What is a Mixture of Experts (MoE) model and how does it affect VRAM?

A Mixture of Experts (MoE) model partitions its parameters into “experts”: sub-networks specialised for different inputs. For each token, a router activates only a small subset of experts (e.g. 2 out of 8). The model can therefore have many more total parameters than a standard dense model while using about the same compute per token as a much smaller one. The catch: all expert weights must be loaded into VRAM simultaneously, even though most are idle at any given moment. So Mixtral 8x7B (46.7B total params, 12.9B active) runs at roughly the speed of a 13B model but requires ~26 GB VRAM at Q4. Similarly, Qwen3 30B-A3B requires ~19 GB VRAM but runs about as fast as a 3B model. MoE models are excellent for latency-sensitive deployments where you have sufficient VRAM.
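
The routing step can be illustrated with a toy top-k router. This is a sketch of one common variant (softmax over the selected experts' scores), with made-up gate scores for a single token; it is not the implementation of any specific model in this dataset.

```python
import math


def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts and renormalise their weights.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only (subtract the max for stability).
    peak = max(gate_logits[i] for i in top)
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router scores for one token across 8 experts.
logits = [0.1, 2.3, -1.0, 0.4, 1.9, -0.5, 0.0, 0.7]
chosen = top_k_route(logits, k=2)
print(chosen)  # experts 1 and 4 are selected; the other six stay idle

# Memory follows total parameters, compute follows active parameters
# (Mixtral 8x7B figures from the paragraph above).
total_b, active_b = 46.7, 12.9
print(f"active share per token: {active_b / total_b:.0%}")
```

Only the chosen experts run for this token, but a different token may pick any of the eight, which is why all of them have to stay resident in VRAM.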

Are models like Command R commercially usable?

No. Command R (35B) and Command R+ (104B) from Cohere are released under CC-BY-NC-4.0, which prohibits commercial use without a separate licence from Cohere. Both are marked with a ✗ in the “Commercial” column of this dataset. If you need a commercially usable model in a similar size class, consider Qwen2.5 32B (Apache 2.0), Qwen3 32B (Apache 2.0), or Llama 3.1 70B (Llama Community License, commercial use permitted with conditions). Always read the full licence text before production deployment. This dataset is informational, not legal advice.

How often is this dataset updated?

The dataset is manually curated and verified against Hugging Face model cards and the Ollama library. The current version covers models as of June 2026. The open-weight LLM space moves rapidly: new model families, updated versions, and licence changes appear frequently. We update this database when major new releases occur, and the REST API at /wp-json/dc/v1/local-llm-models always reflects the current version. The dataset is published under CC BY 4.0: cite D-Central when you reuse it, and contribute corrections if you spot an error or omission. Last verified: June 2026.

Cite this dataset

This dataset is published under Creative Commons Attribution 4.0 International (CC BY 4.0). You are free to share, adapt, and use it commercially — with attribution.

Plain text:

D-Central Technologies. (2026). Local LLM Model Database [Dataset]. D-Central Open Data. https://d-central.tech/data/local-llm-model-database/ — CC BY 4.0. As of June 2026; verify at source.

BibTeX:

@dataset{dcentral2026llmdb,
  title     = {Local LLM Model Database},
  author    = {{D-Central Technologies}},
  year      = {2026},
  month     = {June},
  url       = {https://d-central.tech/data/local-llm-model-database/},
  license   = {CC BY 4.0},
  note      = {Open-weight LLM index: parameters, context window, VRAM,
               license, Ollama tag. As of June 2026; verify at source.}
}

Machine-readable:

REST API: /wp-json/dc/v1/local-llm-models
CSV: /data/local-llm-model-database.csv
JSON: /data/local-llm-model-database.json
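
A minimal sketch for pulling the machine-readable files with the Python standard library follows. The host is an assumption taken from the citation URL above, and the structure of the JSON payload is not documented on this page, so the code returns whatever the endpoint serves without assuming field names.

```python
import json
import urllib.request
from urllib.parse import urljoin

BASE_URL = "https://d-central.tech"  # assumed host, taken from the citation URL

PATHS = {
    "api": "/wp-json/dc/v1/local-llm-models",
    "csv": "/data/local-llm-model-database.csv",
    "json": "/data/local-llm-model-database.json",
}


def dataset_url(kind: str, base_url: str = BASE_URL) -> str:
    """Return the absolute URL for one of the three published endpoints."""
    return urljoin(base_url, PATHS[kind])


def fetch_json(kind: str = "api", timeout: float = 30.0):
    """Download and decode a JSON payload; its structure is not assumed here."""
    if kind not in ("api", "json"):
        raise ValueError("fetch_json only handles the 'api' and 'json' endpoints")
    with urllib.request.urlopen(dataset_url(kind), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


print(dataset_url("api"))
```

Calling fetch_json() performs the network request; inspect the returned object before relying on any particular keys.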

Attribution note: The models indexed here are released by their respective developers (Meta, Alibaba/Qwen Team, Google DeepMind, Mistral AI, Microsoft, DeepSeek AI, Allen AI, Cohere, Hugging Face). D-Central provides this index under CC BY 4.0; the models themselves remain under their own licences listed in each record.

Last reviewed June 15, 2026.