Definition
ONNX (Open Neural Network Exchange) is a community-driven open standard for representing machine-learning models in a portable way. It defines a common computation graph and a set of built-in operators so that a model trained in one framework can be run in another, breaking the vendor and toolchain lock-in that otherwise ties a model to the software it was created in. For a sovereign stack, that portability means your model is not hostage to a single runtime's continued existence.
How the format works
An ONNX model is an acyclic graph of nodes, where each node is a call to a standardized operator with defined data types. Because the operators are specified once and shared, implementations remain portable across frameworks rather than each tool reimplementing them. This makes ONNX a kind of interchange format, the way a common file standard lets different programs open the same document.
ONNX Runtime and inference
The companion ONNX Runtime executes these models efficiently across CPUs, GPUs, and specialized accelerators, which is useful when you want to run a model on whatever hardware you already own. While much of the local LLM world standardizes on GGUF, ONNX is widely used for smaller models, embeddings, vision, and audio tasks, and for deploying models in constrained or embedded environments.
For LLM-specific local formats and runtimes, compare GGUF and llama.cpp; models in either form are commonly found on the Hugging Face Hub.
In Simple Terms
ONNX (Open Neural Network Exchange) is a community-driven open standard for representing machine-learning models in a portable way. It defines a common computation graph and…
