Definition
Model distillation, also called knowledge distillation, is a compression technique in which a small student model is trained to imitate the behavior of a larger, more capable teacher model. The goal is to retain most of the teacher's performance while shrinking the parameter count, memory footprint, and inference cost, making the model practical to run on modest hardware.
How it works
Instead of training the student only on hard, one-hot labels, distillation trains it to match the teacher's full output distribution: the soft probabilities the teacher assigns across all possible answers. These soft targets carry richer information about how the teacher generalizes, such as which wrong answers it considers nearly right, so the student often learns faster and more reliably than it would from labels alone. Researchers distinguish response-based distillation, which matches final outputs, from feature-based variants, which match internal representations, and relation-based variants, which match relationships between layers or between samples.
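As a rough illustration of response-based distillation, the sketch below blends a soft-target term (KL divergence between teacher and student distributions, both softened by a temperature) with an ordinary cross-entropy term on the hard label. It is a minimal example under stated assumptions, not a reference implementation: the function names, the `temperature` and `alpha` parameters, and the temperature-squared scaling of the soft term are illustrative choices following a commonly used formulation, and the logits are made up.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log-probabilities from logits, softened by a temperature > 0."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target loss and a hard-label loss.

    alpha weights the soft (teacher-matching) term; 1 - alpha weights
    the cross-entropy against the one-hot label. Both names are
    hypothetical parameters chosen for this sketch.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)

    # KL(teacher || student) over the temperature-softened distributions.
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))

    # Standard cross-entropy on the hard label at temperature 1.
    ce = -log_softmax(student_logits)[hard_label]

    # Scaling the soft term by temperature**2 keeps its gradient
    # magnitude comparable to the hard term as the temperature changes.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


# Made-up logits for a three-class problem; class 0 is the true label.
teacher = [2.0, 1.0, 0.1]
good_student = [2.0, 1.0, 0.1]   # agrees with the teacher
poor_student = [0.1, 1.0, 2.0]   # ranks the classes in reverse

print(distillation_loss(good_student, teacher, hard_label=0))
print(distillation_loss(poor_student, teacher, hard_label=0))
```

The student that reproduces the teacher's distribution incurs no soft-target penalty, so its loss is lower than that of the student that ranks the classes in reverse. In practice the same loss would be computed over batches with a tensor library and minimized by gradient descent.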
Why it matters for sovereign AI
Distillation is a key reason capable models can run locally rather than only in a data center. A distilled model can fit on a single GPU, or in some cases on a well-equipped workstation, which keeps inference, prompts, and data under the operator's own control. That self-hosting capability is central to digital sovereignty: smaller, distilled open-weight models let individuals own the full stack rather than renting it. One caveat is that distilling from a proprietary model may conflict with that provider's terms of use, so the provenance of teacher outputs matters.
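Whether a model fits on a single GPU is largely a matter of arithmetic on weight memory. The sketch below estimates that footprint as parameter count times bytes per parameter. The parameter counts and the 2-byte (16-bit) precision are hypothetical values chosen for illustration, and the estimate deliberately ignores activations, caches, and runtime overhead, so real requirements are higher.

```python
def weight_memory_gb(num_parameters, bytes_per_parameter=2):
    """Approximate memory for model weights alone, in decimal gigabytes.

    Ignores activations, caches, and framework overhead, so this is a
    lower bound on what inference actually needs.
    """
    return num_parameters * bytes_per_parameter / 1e9


# Hypothetical sizes: a large teacher and a much smaller student,
# both stored at 16-bit precision (2 bytes per parameter).
teacher_gb = weight_memory_gb(70e9)
student_gb = weight_memory_gb(7e9)

print(f"teacher weights: about {teacher_gb:.0f} GB")
print(f"student weights: about {student_gb:.0f} GB")
```

Under these assumptions the student's weights need a tenth of the teacher's memory, which is the kind of reduction that moves a model from multi-GPU servers to a single device.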
Distillation complements other efficiency and alignment topics; see Direct Preference Optimization for how a distilled model can then be tuned to preferences.
