

Activation Function

Sovereign AI

Definition

An activation function is the nonlinear step applied to a neuron's weighted sum before it passes to the next layer. Without it, stacking layers would collapse into a single linear transformation no matter how deep the network, so activation functions are what give neural networks their expressive power. In transformers, the activation lives in the feed-forward block that follows attention, and the choice of function measurably affects model quality.
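The collapse described above can be checked numerically. The following is a minimal sketch in plain Python; the 2x2 weights `W1` and `W2`, the input `x`, and the helper names are made-up illustrations, not taken from any real model.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    """Apply ReLU elementwise to a matrix."""
    return [[max(0.0, v) for v in row] for row in m]


# Hypothetical weights for two stacked layers and one column-vector input.
W1 = [[1.0, -2.0], [0.5, 1.0]]
W2 = [[2.0, 1.0], [-1.0, 3.0]]
x = [[1.0], [2.0]]

# Two linear layers applied in sequence...
stacked = matmul(W2, matmul(W1, x))
# ...give the same result as one layer whose weight is the product W2 @ W1.
merged = matmul(matmul(W2, W1), x)
linear_collapses = stacked == merged

# With a nonlinearity between the layers, no single merged matrix reproduces it.
with_relu = matmul(W2, relu(matmul(W1, x)))
relu_differs = with_relu != merged
```

Here `linear_collapses` and `relu_differs` both come out true: the purely linear stack is equivalent to a single matrix, while the ReLU version is not.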

From ReLU to GELU and SwiGLU

Earlier deep networks, including the original Transformer, used ReLU, which zeroes negative inputs and passes positive ones through unchanged. Later transformers such as BERT and GPT moved to GELU (Gaussian Error Linear Unit), introduced in 2016, which multiplies each input by the standard Gaussian cumulative probability at that value, GELU(x) = x · Φ(x), producing a smoother curve that improved language and vision results. The current frontier favors gated variants like SwiGLU, which project the feed-forward input through two separate linear maps, a value path and a gate path, apply the Swish (SiLU) function to the gate, and multiply the two elementwise; reported results show SwiGLU lowering loss relative to GELU. Meta's Llama recipe of pre-normalization with RMSNorm, SwiGLU, and RoPE became the de facto template for open models.
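The three functions can be written out in a few lines. This is a sketch using only the Python standard library, not any framework's implementation. The names `W_gate`, `W_up`, and `W_down` are assumptions chosen for illustration, biases are omitted, and the weights in the usage lines are arbitrary.

```python
import math


def relu(x):
    """ReLU: zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def gelu(x):
    """Exact GELU: x times the standard Gaussian CDF at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def silu(x):
    """SiLU (Swish with beta = 1): x times the logistic sigmoid of x."""
    return x / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def swiglu_ffn(x, W_gate, W_up, W_down):
    """Gated feed-forward block: W_down @ (silu(W_gate @ x) * (W_up @ x))."""
    gate = [silu(g) for g in matvec(W_gate, x)]  # gate path
    up = matvec(W_up, x)                         # value path
    hidden = [g * u for g, u in zip(gate, up)]   # elementwise product
    return matvec(W_down, hidden)


# Arbitrary illustrative weights: model width 2, hidden width 3.
W_gate = [[0.5, -1.0], [1.0, 0.0], [0.0, 2.0]]
W_up = [[1.0, 1.0], [-1.0, 0.5], [0.5, 0.5]]
W_down = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
y = swiglu_ffn([1.0, -1.0], W_gate, W_up, W_down)
```

The output `y` has the same width as the input, so the gated block fits into the same position in a transformer layer as a ReLU or GELU feed-forward block.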

Why the detail matters

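Gated activations like SwiGLU use three weight matrices in the feed-forward block (a gate projection, a value projection, and an output projection) instead of two, which changes the parameter budget and the tensor names and shapes you will see when inspecting or quantizing a checkpoint. Knowing which activation a model uses helps when converting weights for local inference, because the converter has to map each of those matrices correctly.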
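A rough count shows how the third matrix shifts the parameter budget. The sketch below ignores biases and uses an illustrative model width of 4096. The 4x hidden width and the 2/3 scaling for the gated block are common conventions assumed here, not values taken from any particular checkpoint, and `ffn_params` is a hypothetical helper.

```python
def ffn_params(d_model, d_hidden, gated):
    """Weight count of a feed-forward block, ignoring biases.

    An ungated block has two d_model x d_hidden matrices (up and down);
    a gated block adds a third one for the gate path.
    """
    n_matrices = 3 if gated else 2
    return n_matrices * d_model * d_hidden


d_model = 4096  # illustrative width, an assumption

# Ungated block with the conventional 4x hidden width.
classic = ffn_params(d_model, 4 * d_model, gated=False)

# Same hidden width with a gate: 50% more parameters.
gated_same_width = ffn_params(d_model, 4 * d_model, gated=True)

# Shrinking the hidden width to about 2/3 restores the original budget.
d_hidden_scaled = (2 * 4 * d_model) // 3
gated_scaled = ffn_params(d_model, d_hidden_scaled, gated=True)
```

With these numbers the scaled hidden width is 10922, and `gated_scaled` lands just under `classic`. Real implementations often round the hidden width to a hardware-friendly multiple, so the shapes in an actual checkpoint may differ slightly from this arithmetic.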

The feed-forward block that houses the activation is also the part that sparse models replace with a set of routed experts; see Mixture of Experts (MoE). For the normalization that precedes the block, see Layer Normalization.

