Definition
Prefix tuning is a parameter-efficient fine-tuning method that steers a frozen language model by prepending a sequence of trainable continuous vectors, a learned prefix, to the model's hidden states. Introduced by Li and Liang in 2021 for natural-language generation tasks, it lets you adapt a large model to a new task without touching any of its original weights.
Prefixes in every layer
The defining feature of prefix tuning is that the trainable vectors are inserted into every layer of the transformer, not just the input. At each layer, the prefix supplies extra attention key and value vectors that are prepended to those computed from the real tokens, so every token can attend to them. This gives the method fine-grained influence over the model's internal computation. During training the prefix is not optimized directly, because Li and Liang found that updating it directly made optimization unstable. Instead, it is produced from a smaller matrix by a small feed-forward (MLP) reparameterization network. That network is discarded after training, and only the resulting prefix vectors are kept.
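The following is a minimal, pure-Python sketch of both ideas. It rests on several assumptions: toy sizes, random numbers in place of a trained model, a single attention head, and a two-layer tanh MLP as the reparameterization network (its width and activation are assumptions). All names, such as `build_prefix` and `attend_with_prefix`, are hypothetical. The MLP maps each row of a smaller matrix `P_small` to one key and one value vector per layer. After "training", the prefix is computed once and the MLP weights are deleted. From then on, each layer's attention sees the prefix keys and values ahead of the tokens' own.

```python
import math
import random

# Toy sizes chosen for illustration (assumptions, not values from the paper).
D_MODEL = 4      # hidden size of the frozen model
PREFIX_LEN = 2   # number of prefix positions
N_LAYERS = 3     # number of transformer layers
D_SMALL = 3      # width of the smaller matrix P' that feeds the MLP
D_HIDDEN = 8     # hidden width of the (hypothetical) reparameterization MLP


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attention(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(len(values[0]))]


rng = random.Random(0)
P_small = rand_matrix(PREFIX_LEN, D_SMALL, rng)           # trained
W1 = rand_matrix(D_HIDDEN, D_SMALL, rng)                   # trained, then discarded
W2 = rand_matrix(N_LAYERS * 2 * D_MODEL, D_HIDDEN, rng)    # trained, then discarded


def build_prefix(p_small, w1, w2):
    """Map each row of P' through the MLP and split it into per-layer keys and values."""
    rows = [matvec(w2, [math.tanh(h) for h in matvec(w1, p)]) for p in p_small]
    prefix = []
    for layer in range(N_LAYERS):
        start = layer * 2 * D_MODEL
        keys = [row[start:start + D_MODEL] for row in rows]
        values = [row[start + D_MODEL:start + 2 * D_MODEL] for row in rows]
        prefix.append((keys, values))
    return prefix


# After training: materialize the prefix once, then drop the MLP weights.
PREFIX = build_prefix(P_small, W1, W2)
del W1, W2


def attend_with_prefix(layer, query, token_keys, token_values):
    """Prepend this layer's learned keys/values to those the frozen model computed."""
    prefix_keys, prefix_values = PREFIX[layer]
    return attention(query, prefix_keys + token_keys, prefix_values + token_values)


# Stand-ins for one token's query and the keys/values produced by frozen weights.
query = [0.1, -0.2, 0.3, 0.0]
token_keys = [[0.5, 0.1, -0.3, 0.2], [0.0, 0.4, 0.1, -0.1]]
token_values = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
out = attend_with_prefix(0, query, token_keys, token_values)
print(len(out))  # 4
```

Because the prefix only adds keys and values, the number of outputs is unchanged: each real token still produces one output vector per layer, now partly shaped by the learned prefix.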
Efficiency and use
Li and Liang reported that prefix tuning performed comparably to full fine-tuning on their generation tasks while training only about 0.1% of the parameters (roughly 1000x fewer), and that it outperformed fine-tuning in low-data settings. Because each task needs only its own small prefix, a single frozen base model can serve many tasks by swapping prefixes. This is attractive for self-hosted deployments, where storing a full copy of the model for every task would be wasteful.
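The following back-of-the-envelope sketch shows where the savings come from. It assumes a standard implementation that stores one key vector and one value vector per prefix position per layer. The model size (350M parameters, 24 layers, hidden size 1024) and the prefix length of 10 are illustrative assumptions, not figures from Li and Liang. The count also excludes the reparameterization network, because that network is discarded after training.

```python
def prefix_params(prefix_len: int, n_layers: int, d_model: int) -> int:
    """Stored prefix size: one key and one value vector per position per layer.

    Assumes the reparameterization MLP has already been discarded.
    """
    return prefix_len * n_layers * 2 * d_model


def storage_for_tasks(n_tasks: int, base_params: int, prefix_size: int) -> dict:
    """Compare one shared frozen base plus per-task prefixes against per-task full copies."""
    return {
        "shared_base_with_prefixes": base_params + n_tasks * prefix_size,
        "full_copy_per_task": n_tasks * base_params,
    }


# Hypothetical configuration, loosely GPT-2-Medium-sized (assumed figures).
BASE_PARAMS = 350_000_000
size = prefix_params(prefix_len=10, n_layers=24, d_model=1024)
print(size)                                      # 491520
print(f"{size / BASE_PARAMS:.2%} of the base model")  # 0.14% of the base model
print(storage_for_tasks(5, BASE_PARAMS, size))
```

With these numbers the stored prefix is about 0.14% of the base model. That is the same order of magnitude as the reported 0.1%. The exact ratio depends on the prefix length and the model's shape.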
Prefix tuning is one of several soft prompt methods. It contrasts with prompt tuning, which adds learnable vectors only at the input embedding layer rather than throughout the network.
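The difference has a concrete consequence for parameter counts. For the same number of learned positions, prompt tuning stores one embedding-sized vector per position. Prefix tuning, by contrast, stores a key and a value in every layer, which is 2 × n_layers times as many parameters. The sketch below assumes a hypothetical 12-layer model with hidden size 768 and 20 learned positions.

```python
def prompt_tuning_params(prompt_len: int, d_model: int) -> int:
    """Prompt tuning: one learned embedding per prompt position, at the input only."""
    return prompt_len * d_model


def prefix_tuning_params(prefix_len: int, n_layers: int, d_model: int) -> int:
    """Prefix tuning: a learned key and value per prefix position in every layer."""
    return prefix_len * n_layers * 2 * d_model


# Hypothetical model: 12 layers, hidden size 768, 20 learned positions.
pt = prompt_tuning_params(20, 768)
px = prefix_tuning_params(20, 12, 768)
print(pt, px, px // pt)  # 15360 368640 24
```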
