Jamba

Definition

Jamba is a hybrid large language model that interleaves three building blocks: Mamba state-space layers, transformer attention layers, and mixture-of-experts (MoE) feed-forward layers. It was developed by AI21 Labs and released with open weights, and AI21 presented it as the first production-grade model to combine a state-space backbone with conventional attention. The aim is to pair the long-context efficiency of state-space models with the precise recall of attention.

How the blocks combine

A Jamba model is built from repeating blocks. Most layers are Mamba layers, which carry information forward in a fixed-size recurrent state and therefore handle very long contexts cheaply. One in every eight layers (a 1:7 attention-to-Mamba ratio) is a transformer attention layer, which restores the precise all-pairs recall that pure state-space models can lack. In every other layer, the dense feed-forward network is replaced by a mixture-of-experts layer. This lets the model grow its total parameter count while activating only a few experts per token (two of sixteen in the original Jamba), so active compute stays manageable. The Jamba-1.5 release shipped a Mini variant with 12B active parameters (52B total) and a Large variant with 94B active parameters (398B total), both with an effective context window of 256K tokens.
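The interleaving can be sketched as a small layer-schedule generator. This is a minimal illustration, not AI21's code. The `LayerSpec` class, the `jamba_block_layout` and `build_model_layout` functions, and the exact position of the attention and MoE layers inside a block (`attn_offset`, `moe_offset`) are hypothetical. The schedule assumes one attention layer per eight and MoE on every other layer, following the original Jamba configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerSpec:
    mixer: str  # sequence mixer: "attention" or "mamba"
    ffn: str    # feed-forward type: "moe" or "mlp" (dense)


def jamba_block_layout(layers_per_block=8, attn_every=8, attn_offset=4,
                       moe_every=2, moe_offset=1):
    """Return the layer pattern of one hybrid block (hypothetical helper).

    attn_every=8 gives one attention layer per eight (1:7 with Mamba);
    moe_every=2 puts an MoE feed-forward on every other layer.
    The offsets are illustrative placement choices.
    """
    layout = []
    for i in range(layers_per_block):
        mixer = "attention" if i % attn_every == attn_offset else "mamba"
        ffn = "moe" if i % moe_every == moe_offset else "mlp"
        layout.append(LayerSpec(mixer, ffn))
    return layout


def build_model_layout(num_blocks, **block_kwargs):
    """Stack identical blocks into a full model schedule."""
    return [spec for _ in range(num_blocks)
            for spec in jamba_block_layout(**block_kwargs)]


# Usage: four blocks of eight layers (the block count is illustrative).
layout = build_model_layout(num_blocks=4)
n_attn = sum(spec.mixer == "attention" for spec in layout)
n_moe = sum(spec.ffn == "moe" for spec in layout)
print(f"{len(layout)} layers, {n_attn} attention, {n_moe} MoE")
```

With four blocks this yields 32 layers, of which 4 use attention and 16 use an MoE feed-forward. Every remaining layer is a Mamba layer with either a dense or an MoE feed-forward.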

Why the hybrid matters

The motivation is practical. The cost of attention grows with the square of the sequence length, and its key-value cache grows with every token. State-space layers, by contrast, run in linear time with a fixed-size state. Because Jamba makes attention the minority of layers and state-space layers the majority, its key-value cache is far smaller than that of a comparable pure transformer, so long contexts fit in much less memory. A custom quantization scheme, ExpertsInt8, lets the Large model serve 256K-token contexts on a single machine with eight 80 GB GPUs.
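A back-of-the-envelope calculation shows why making attention the minority helps. The sketch below assumes a hypothetical configuration: 32 layers, 8 key-value heads of dimension 128, 16-bit values, and a Mamba inner width of 8192 with state size 16. These numbers are illustrative, not Jamba's published dimensions, and the helper names are hypothetical.

```python
GIB = 2**30
MIB = 2**20


def kv_cache_bytes(seq_len, n_attn_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Key-value cache: one key and one value vector per token, per attention layer."""
    return 2 * seq_len * n_attn_layers * n_kv_heads * head_dim * bytes_per_value


def ssm_state_bytes(n_mamba_layers, d_inner, d_state, bytes_per_value=2):
    """Recurrent state of the Mamba layers; it does not depend on sequence length.

    The small convolution state each Mamba layer also keeps is ignored here.
    """
    return n_mamba_layers * d_inner * d_state * bytes_per_value


SEQ_LEN = 256 * 1024           # 262,144 tokens
N_LAYERS = 32                  # hypothetical depth
KV_HEADS, HEAD_DIM = 8, 128    # hypothetical attention shape
D_INNER, D_STATE = 8192, 16    # hypothetical Mamba shape

# All-attention stack: every layer keeps a KV cache.
pure_kv = kv_cache_bytes(SEQ_LEN, N_LAYERS, KV_HEADS, HEAD_DIM)

# Hybrid stack: one attention layer in eight, the rest are Mamba layers.
hybrid_attn_layers = N_LAYERS // 8
hybrid_kv = kv_cache_bytes(SEQ_LEN, hybrid_attn_layers, KV_HEADS, HEAD_DIM)
hybrid_ssm = ssm_state_bytes(N_LAYERS - hybrid_attn_layers, D_INNER, D_STATE)

print(f"pure transformer KV cache: {pure_kv / GIB:.1f} GiB")
print(f"hybrid KV cache:           {hybrid_kv / GIB:.1f} GiB")
print(f"hybrid Mamba state:        {hybrid_ssm / MIB:.1f} MiB")
```

Under these assumptions, a 262,144-token context needs 32 GiB of key-value cache in the all-attention stack but 4 GiB when only 4 of the 32 layers use attention. The 28 Mamba layers hold about 7 MiB of recurrent state regardless of context length. The hybrid's cache still grows linearly with the sequence; it simply grows eight times more slowly.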

Jamba is a leading example of the hybrid approach. For the underlying ideas, see selective state space, hybrid attention, and the state space duality that ties these families together.
