Definition
MLC-LLM (Machine Learning Compilation for LLMs) is an open-source universal deployment engine for large language models. Its goal is to let a model run natively across an unusually wide range of hardware, including NVIDIA, AMD, and Intel GPUs, Apple Silicon, iPhones, Android phones, and even web browsers. For a sovereign user who wants the same model on a workstation and a phone without depending on a vendor's cloud, MLC-LLM offers one toolchain that targets all of them.
Compilation with Apache TVM
MLC-LLM's distinguishing approach is machine learning compilation. Using Apache TVM as its backend, it compiles a model down to device-specific native code and GPU shaders rather than relying on a single hand-written runtime. It can generate Metal shaders for Apple devices, Vulkan for Linux and Windows, and WebGPU shading language for browsers, producing a portable binary library tuned to each platform's constraints.
On-device and in-browser inference
This compilation strategy is what enables MLC's in-browser sibling project, WebLLM, to run an LLM entirely client-side over WebGPU with no server, while retaining a large share of native performance. MLCEngine exposes an OpenAI-compatible API across REST, Python, JavaScript, iOS, and Android, all backed by the same compiler. The trade-off is an explicit compilation step for each target, in exchange for genuine cross-platform reach.
MLC-LLM emphasizes portability over any single vendor; contrast it with NVIDIA-only TensorRT-LLM and the lightweight cross-platform runtime llama.cpp.
In Simple Terms
MLC-LLM (Machine Learning Compilation for LLMs) is an open-source universal deployment engine for large language models. Its goal is to let a model run natively…
