Tensor Parallelism

Tensor parallelism, also known as horizontal or intra-layer parallelism, divides the computation inside a single layer across multiple accelerators. Instead of giving each device a copy of the whole model, the large weight matrices of a layer are sliced column-wise or row-wise, each device computes a partial result from its slice, and the pieces are combined. This lets a layer whose weights do not fit in one accelerator's memory run across several.

How the work is split

In a transformer, the attention and feed-forward matrix multiplications are the natural targets. A weight matrix is partitioned so that each device multiplies its slice against the activations, and a collective operation then combines the outputs before the next layer: typically an all-gather when the weights are split column-wise (the output slices are concatenated) or an all-reduce when they are split row-wise (the partial products are summed). Because this synchronization happens inside every layer, tensor parallelism tends to carry the highest communication overhead of the common strategies and is most effective within a single high-bandwidth node where devices are tightly interconnected.
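The two ways of slicing a weight matrix can be illustrated with a toy, single-process simulation. This is a minimal sketch rather than how any particular framework implements it: each "device" is just an entry in a Python list, the collectives are ordinary list operations, and the function names (split_columns, column_parallel, row_parallel) are made up for this example.

```python
def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_columns(m, parts):
    """Slice a matrix into `parts` equal column blocks (assumes even division)."""
    step = len(m[0]) // parts
    return [[row[i * step:(i + 1) * step] for row in m] for i in range(parts)]


def split_rows(m, parts):
    """Slice a matrix into `parts` equal row blocks (assumes even division)."""
    step = len(m) // parts
    return [m[i * step:(i + 1) * step] for i in range(parts)]


def column_parallel(x, w, parts):
    """Each simulated device holds one column block of w and the full input x."""
    partials = [matmul(x, shard) for shard in split_columns(w, parts)]
    # Stand-in for all-gather: concatenate the output column blocks row by row.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


def row_parallel(x, w, parts):
    """Each simulated device holds one row block of w and the matching columns of x."""
    x_shards = split_columns(x, parts)
    w_shards = split_rows(w, parts)
    partials = [matmul(xs, ws) for xs, ws in zip(x_shards, w_shards)]
    # Stand-in for all-reduce: sum the full-size partial outputs elementwise.
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*partials)]


if __name__ == "__main__":
    x = [[1, 2, 3, 4], [5, 6, 7, 8]]                                  # activations, 2 x 4
    w = [[1, 0, 2, 1], [0, 1, 1, 3], [2, 2, 0, 1], [1, 3, 1, 0]]      # weights, 4 x 4
    reference = matmul(x, w)
    print(column_parallel(x, w, parts=2) == reference)
    print(row_parallel(x, w, parts=2) == reference)
```

Both variants reproduce the unsplit product. The difference is in what each device produces: a column split yields disjoint slices of the output that must be concatenated, while a row split yields full-size partial sums that must be added together.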

Where it fits

Tensor parallelism rarely stands alone at scale. It is typically combined with pipeline parallelism (splitting layers across devices) and data parallelism (replicating the whole stack) into what practitioners call 3D parallelism, the layout used to train the largest models. The rule of thumb is to keep tensor parallelism inside a node, where bandwidth is highest, and to run pipeline and data parallelism, which communicate less often, across nodes.
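The rule of thumb can be made concrete with a small rank-layout sketch. It assumes one particular ordering, in which the tensor-parallel index varies fastest so that consecutive ranks share a tensor-parallel group; real frameworks may order the axes differently. The function names and the example sizes (4-way tensor, 2-way pipeline, 2-way data parallelism on 8-GPU nodes) are hypothetical.

```python
def rank_to_coords(rank, tp, pp, dp):
    """Map a global rank to (dp_idx, pp_idx, tp_idx), with tp_idx varying fastest.

    The axis ordering is an assumption made for this sketch.
    """
    if not 0 <= rank < tp * pp * dp:
        raise ValueError("rank out of range")
    tp_idx = rank % tp
    pp_idx = (rank // tp) % pp
    dp_idx = rank // (tp * pp)
    return dp_idx, pp_idx, tp_idx


def tensor_parallel_groups(tp, pp, dp):
    """List the ranks in each tensor-parallel group: consecutive blocks of size tp."""
    world = tp * pp * dp
    return [list(range(start, start + tp)) for start in range(0, world, tp)]


def groups_stay_in_node(groups, gpus_per_node):
    """True if every group's first and last rank sit on the same node."""
    return all(g[0] // gpus_per_node == g[-1] // gpus_per_node for g in groups)


if __name__ == "__main__":
    tp, pp, dp, gpus_per_node = 4, 2, 2, 8   # hypothetical 16-GPU, 2-node job
    groups = tensor_parallel_groups(tp, pp, dp)
    print(groups)
    print(groups_stay_in_node(groups, gpus_per_node))
    print(rank_to_coords(13, tp, pp, dp))
```

With this layout, a tensor-parallel degree that divides the number of devices per node keeps every tensor-parallel group on one node, so only the pipeline and data-parallel traffic crosses the slower inter-node links.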

Tensor parallelism is one of the three core axes of distributed training. Compare it with Pipeline Parallelism, which splits the model by layer, and Data Parallelism, which replicates the model and splits the data.
