Definition
Parameter-Efficient Fine-Tuning (PEFT) is an umbrella term for techniques that specialise a large language model to a new task or domain by training only a tiny fraction of its parameters — often well under 1% — while keeping the original pretrained weights frozen. This sidesteps the enormous memory and compute cost of full fine-tuning, where every weight is updated, and makes adaptation feasible on modest hardware that a self-hoster might actually own.
The main PEFT approaches
Several families fall under PEFT. LoRA injects small low-rank matrices into the model's weight updates. Adapters insert lightweight bottleneck modules between transformer layers. Prefix- and prompt-tuning instead learn a small set of continuous vectors prepended to the input rather than touching the network's weights at all. Studies report that adapter and LoRA methods can match full fine-tuning accuracy while reducing trainable parameters by over 95%.
Why it matters for sovereignty
PEFT is what makes running and customising your own model practical without a data-centre. Because only the small added weights are trained, the resulting artefact is a few megabytes that can be shared, swapped, and version-controlled independently of the multi-gigabyte base model — a good fit for local, self-custodied AI.
PEFT methods combine naturally with model compression: QLoRA, for instance, applies LoRA on top of a quantized base model. The result is then a smaller alternative to traditional full fine-tuning.
In Simple Terms
Parameter-Efficient Fine-Tuning (PEFT) is an umbrella term for techniques that specialise a large language model to a new task or domain by training only a…
