Definition
An adapter layer is a small, trainable module inserted inside a frozen transformer so the model can be adapted to a new task by training a tiny fraction of its parameters. Introduced for NLP by Houlsby and colleagues in 2019, adapters were among the first parameter-efficient fine-tuning techniques and remain a reference point for newer methods.
The bottleneck architecture
Each adapter is a two-layer feed-forward network with a bottleneck. It first projects the layer's hidden representation down to a much smaller dimension, applies a non-linearity, then projects it back up to the original size, with a residual (skip) connection around the block. Because only this narrow down-and-up projection is trained, the parameter count is tiny relative to the full network. In the original work, adapters were placed twice per transformer block, once after the multi-head attention sub-layer and once after the feed-forward sub-layer.
Why it matters
Houlsby's adapters reached near full fine-tuning quality while training under about 4% of the model's parameters, and the frozen base weights can be shared across many tasks, each with its own small adapter. The main drawback compared with merge-friendly methods is that adapter modules stay in the forward pass, adding a little inference latency, which later approaches such as LoRA were designed to avoid.
Adapters are the conceptual ancestor of much of the modern fine-tuning toolkit. For related approaches that adapt a frozen model without inserting new modules, see prefix tuning and prompt tuning in our glossary.
In Simple Terms
An adapter layer is a small, trainable module inserted inside a frozen transformer so the model can be adapted to a new task by training…
