State Space Model (Mamba)


A State Space Model (SSM) is a sequence-modelling approach borrowed from control theory, in which a hidden state evolves over time according to linear dynamics and produces outputs at each step. In deep learning, SSMs have become a competitive alternative to the transformer for long sequences because they can process them in roughly linear time and memory, rather than the quadratic cost of attention. Mamba is the architecture that brought this approach to the forefront for large-scale language modelling, and it is worth understanding for anyone who cares about running capable models on hardware they own.
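In symbols, the linear dynamics described above are conventionally written as follows. The names used here (state h, input x, output y, matrices A, B, C, step size Δ) are standard textbook notation assumed for illustration, not something this entry defines, and the discretisation shown is the common zero-order-hold rule rather than the only possible choice.

```latex
\begin{aligned}
\text{continuous time:}\quad & h'(t) = A\,h(t) + B\,x(t), \qquad y(t) = C\,h(t) \\
\text{discrete time:}\quad & h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t, \qquad y_t = C\,h_t \\
\text{zero-order hold:}\quad & \bar{A} = \exp(\Delta A), \qquad
\bar{B} = (\Delta A)^{-1}\bigl(\exp(\Delta A) - I\bigr)\,\Delta B
\end{aligned}
```

The discrete form is the one a sequence model actually runs: one state update and one readout per token.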

The core idea: a running summary instead of total recall

A transformer's attention mechanism lets every token look back at every previous token, which is powerful but expensive: cost grows with the square of sequence length, and inference must keep a key-value cache that grows with every token generated. An SSM instead maintains a fixed-size hidden state — a compressed running summary of everything seen so far — and updates it as each token arrives. Processing a sequence therefore costs the same per token whether it is the tenth token or the hundred-thousandth, and generation needs only the state, not a growing cache. The trade-off is that a fixed-size state must be selective about what it keeps, which is exactly the problem Mamba set out to solve.
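The "running summary" can be sketched in a few lines of plain Python. This is a toy, assuming a scalar input per token and a diagonal transition (one decay factor per state dimension); the function names and the parameters `a`, `b`, `c` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def ssm_step(state: Sequence[float], x: float,
             a: Sequence[float], b: Sequence[float],
             c: Sequence[float]) -> Tuple[List[float], float]:
    """One token of a diagonal linear SSM.

    Update:  h_t[i] = a[i] * h_{t-1}[i] + b[i] * x_t
    Readout: y_t    = sum_i c[i] * h_t[i]
    """
    new_state = [a_i * h_i + b_i * x for a_i, h_i, b_i in zip(a, state, b)]
    y = sum(c_i * h_i for c_i, h_i in zip(c, new_state))
    return new_state, y


def run_ssm(xs: Sequence[float], a: Sequence[float], b: Sequence[float],
            c: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Process a whole sequence, carrying only the fixed-size state forward."""
    state = [0.0] * len(a)  # size is set by the model, not by the sequence
    ys = []
    for x in xs:
        state, y = ssm_step(state, x, a, b, c)
        ys.append(y)
    return ys, state


if __name__ == "__main__":
    ys, state = run_ssm([1.0, 0.0, 0.0], a=[0.5, 0.9], b=[1.0, 1.0], c=[1.0, 0.0])
    print(ys)          # outputs for the three tokens
    print(len(state))  # state size is 2 regardless of sequence length
```

Each call to `ssm_step` does the same amount of work and touches the same amount of memory, which is the per-token cost property described above.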

Selective state spaces

Classic SSMs use fixed, time-invariant parameters, which limits their ability to reason about content: they filter sequences the same way regardless of what the tokens actually say. Mamba introduces a selective SSM in which the parameters governing how the state is updated and read out (the input and output projections and the discretisation step size, usually written B, C and Δ) become functions of the input itself. This input-dependent parameterisation lets the model decide, token by token, what to remember, suppress, or forget, recovering much of the content-based reasoning that made attention so powerful, but without attention's quadratic blow-up.
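A minimal sketch of selectivity, under loud simplifications: the input is a scalar, and only the step size is made input-dependent, whereas Mamba also computes its input and output projections from the token. The weights `w_delta` and `bias_delta` are hypothetical stand-ins for learned parameters, and the update uses the simple discretisation "decay = exp(step * a), write = step * b".

```python
import math
from typing import List, Sequence, Tuple


def softplus(z: float) -> float:
    """Smooth, strictly positive function; keeps the step size above zero."""
    return math.log1p(math.exp(z))


def selective_step(state: Sequence[float], x: float,
                   a: Sequence[float], b: Sequence[float], c: Sequence[float],
                   w_delta: float, bias_delta: float
                   ) -> Tuple[List[float], float, float]:
    """One token of a toy selective SSM.

    The step size depends on the token itself:
      small step -> decay close to 1, little written: the state is kept
      large step -> decay close to 0, much written: the state is overwritten
    `a` holds negative decay rates, one per state dimension.
    """
    delta = softplus(w_delta * x + bias_delta)  # input-dependent step size
    new_state = [
        math.exp(delta * a_i) * h_i + delta * b_i * x
        for a_i, h_i, b_i in zip(a, state, b)
    ]
    y = sum(c_i * h_i for c_i, h_i in zip(c, new_state))
    return new_state, y, delta


if __name__ == "__main__":
    params = dict(a=[-1.0], b=[1.0], c=[1.0], w_delta=4.0, bias_delta=-4.0)
    kept, _, d_small = selective_step([1.0], 0.0, **params)
    reset, _, d_large = selective_step([1.0], 2.0, **params)
    print(d_small, kept)   # tiny step: old state almost unchanged
    print(d_large, reset)  # large step: old state mostly replaced by the new input
```

With fixed parameters every token would get the same decay; here a "boring" token leaves the summary nearly intact while a "salient" one rewrites it.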

Hardware-aware design

Earlier SSMs could be trained efficiently as one long convolution because their parameters were the same at every step. Input-dependent parameters break that shortcut, so Mamba instead relies on a hardware-aware parallel associative scan tuned for modern GPUs, keeping training fast and inference efficient. The payoff is strong throughput and the ability to handle very long contexts on a fixed memory budget.
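Why a scan applies at all can be shown in plain Python. Each step h_t = a_t * h_{t-1} + b_t is an affine map, and composing two affine maps gives another affine map, so the sequence of steps can be combined with an associative operator. The sketch below uses a simple doubling (Hillis-Steele style) scan as a stand-in; it is not Mamba's fused GPU kernel, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Affine = Tuple[float, float]  # (a, b) represents the map h -> a * h + b


def combine(earlier: Affine, later: Affine) -> Affine:
    """Compose two steps: apply `earlier` first, then `later`.

    later(earlier(h)) = a2 * (a1 * h + b1) + b2 = (a1 * a2) * h + (a2 * b1 + b2)
    The operator is associative but not commutative, so order is preserved.
    """
    a1, b1 = earlier
    a2, b2 = later
    return (a1 * a2, a2 * b1 + b2)


def sequential_states(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Reference: the plain left-to-right recurrence starting from h = 0."""
    h, out = 0.0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


def scan_states(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Inclusive scan by doubling: about log2(n) passes over the sequence.

    Within one pass every position is computed from the previous pass only,
    so all positions could be updated in parallel on suitable hardware.
    """
    elems: List[Affine] = list(zip(a, b))
    offset = 1
    while offset < len(elems):
        elems = [
            combine(elems[i - offset], elems[i]) if i >= offset else elems[i]
            for i in range(len(elems))
        ]
        offset *= 2
    # With h starting at 0, the state after step t is the b-part of the prefix.
    return [b_t for _, b_t in elems]


if __name__ == "__main__":
    a = [0.5, 0.9, 0.1, 1.0, 0.3]
    b = [1.0, 2.0, -1.0, 0.5, 4.0]
    print(sequential_states(a, b))
    print(scan_states(a, b))  # same values, up to floating-point rounding
```

The doubling scan does more total work than the sequential loop; production kernels use work-efficient variants and keep intermediate values in fast on-chip memory, which is where the "hardware-aware" part comes in.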

Why self-hosters should watch this space

For local inference, memory is usually the binding constraint: a transformer's KV cache can consume gigabytes of VRAM on long conversations, directly limiting the context window you can afford on consumer hardware. An architecture whose inference state stays flat as context grows changes that equation. That is why SSMs and transformer-SSM hybrid models are attractive for sovereign AI setups, for the same reason quantization matters: capability per gigabyte on hardware you control. As with any architecture, the practical question is not benchmark elegance but what runs well, today, on your machine. Transformers still dominate the open-model ecosystem, with SSM-based and hybrid models arriving as credible challengers rather than incumbents.
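The memory argument is easy to check with back-of-the-envelope arithmetic. The sketch below uses the usual KV cache size estimate (keys plus values, per layer, per KV head, per token) and a simplified SSM state estimate that ignores small extras such as convolution buffers. Every model dimension in it is a made-up illustrative value, not the configuration of any particular model.

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Transformer KV cache: keys and values for every layer and every token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


def ssm_state_bytes(n_layers: int, d_inner: int, d_state: int,
                    bytes_per_value: int = 2) -> int:
    """Simplified SSM recurrent state: one (d_inner x d_state) block per layer.

    Note there is no seq_len argument: the state does not grow with context.
    """
    return n_layers * d_inner * d_state * bytes_per_value


if __name__ == "__main__":
    # Hypothetical 16-bit models, dimensions chosen only to make the sums round.
    transformer = dict(n_layers=32, n_kv_heads=8, head_dim=128)
    ssm = dict(n_layers=32, d_inner=8192, d_state=16)

    for tokens in (4_096, 32_768, 131_072):
        kv_gib = kv_cache_bytes(tokens, **transformer) / 2**30
        ssm_mib = ssm_state_bytes(**ssm) / 2**20
        print(f"{tokens:>7} tokens: KV cache {kv_gib:.1f} GiB, SSM state {ssm_mib:.0f} MiB")
```

Under these assumed dimensions the cache costs 128 KiB per token, so it reaches 4 GiB at 32,768 tokens, while the SSM state stays at 8 MiB throughout. Real numbers depend on the model, its precision, and the runtime.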

For the practically minded self-hoster, the checklist before adopting any SSM or hybrid model is the same as for any architecture. Confirm that your runtime supports it, since new architectures land in research code long before they are implemented and optimised in local-inference stacks such as llama.cpp and Ollama. Check that quantized weights exist for it. Benchmark on your own workload rather than trusting paper numbers. Architectural efficiency only becomes real efficiency once the whole toolchain (kernels, quantization, serving) has caught up, and that lag has historically run months to years.
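For the "benchmark on your own workload" step, a small timing harness is often enough to compare candidates. The sketch below is runtime-agnostic: `generate` is a hypothetical callable you would write as a thin wrapper around whatever inference stack you use, taking a prompt and a token budget and returning the number of tokens it actually produced. Nothing here is the API of llama.cpp, Ollama, or any other tool.

```python
import time
from statistics import median
from typing import Callable


def tokens_per_second(generate: Callable[[str, int], int], prompt: str,
                      max_new_tokens: int, runs: int = 3) -> float:
    """Median generation throughput over several runs.

    `generate(prompt, max_new_tokens)` is a user-supplied wrapper (assumed,
    not a real library call) that returns the count of tokens generated.
    """
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        produced = generate(prompt, max_new_tokens)
        elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
        rates.append(produced / elapsed)
    return median(rates)


if __name__ == "__main__":
    def fake_generate(prompt: str, max_new_tokens: int) -> int:
        """Stand-in for a real model call so the harness can be run as-is."""
        time.sleep(0.01)
        return max_new_tokens

    print(tokens_per_second(fake_generate, "a prompt from your own workload", 10))
```

Running it with prompts of the lengths you actually use, on the quantization you actually plan to deploy, says more than a headline figure measured on different hardware.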

SSMs sit alongside transformers and recurrent networks in the modern architecture landscape. For the lineage they grew out of, see the Recurrent Neural Network (RNN), and for the broader class of large pretrained systems, the foundation model.

