

Continuous Batching

Sovereign AI

Definition

Continuous batching is a scheduling technique for serving large language models that substantially increases how many requests a GPU can handle at once. Traditional (static) batching waits for every request in a batch to finish before starting the next batch, which wastes GPU time: requests generate outputs of different lengths, so the slots of short requests sit empty until the longest one completes. Continuous batching instead makes its scheduling decisions at each generation step, an approach also called iteration-level scheduling. It admits new requests into the active batch as soon as a slot frees up and retires completed ones immediately, so the GPU rarely sits idle.
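The scheduling loop described above can be sketched as a small simulation. This is a toy model, not the scheduler of any particular inference server: the function name `continuous_batching_steps`, the fixed slot count `max_batch`, and the idea that each request is just a count of tokens to generate are all assumptions made for illustration. It ignores prompt processing and GPU memory limits.

```python
from collections import deque


def continuous_batching_steps(lengths, max_batch):
    """Simulate iteration-level scheduling (toy model, hypothetical names).

    lengths:   tokens each request must generate, in arrival order.
    max_batch: number of slots in the active batch (assumed fixed).
    Returns (total_steps, finish_step), where finish_step maps each
    request index to the decode step at which it completed.
    """
    if any(n <= 0 for n in lengths):
        raise ValueError("every request must generate at least one token")

    waiting = deque(enumerate(lengths))
    active = {}       # request index -> tokens still to generate
    finish_step = {}
    step = 0

    while waiting or active:
        # Scheduling decision made before every step:
        # fill any free slot from the waiting queue.
        while waiting and len(active) < max_batch:
            rid, n = waiting.popleft()
            active[rid] = n

        # One decode step: every active request emits one token.
        step += 1
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:
                # Retire immediately; the slot is free for the next step.
                del active[rid]
                finish_step[rid] = step

    return step, finish_step


total, finished = continuous_batching_steps([2, 8, 3, 1], max_batch=2)
print(total, finished)
```

With two slots and requests of 2, 8, 3 and 1 tokens, the short requests cycle through one slot while the 8-token request occupies the other, so the whole workload finishes in 8 steps.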

Why throughput jumps

Autoregressive decoding produces one token per request per step, so a modern GPU serving a single request at a time is heavily underutilised. By packing in-flight requests together at every step, continuous batching keeps GPU occupancy high during decoding. Serving engines built around it report large throughput gains, on the order of 10-20x over naive batching in published benchmarks, while also lowering tail latency under bursty load.
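To see where the extra occupancy comes from, the sketch below compares the two policies on the same workload. It is a simplified model under stated assumptions: every decode step costs the same regardless of how many slots are filled, all requests are already queued, and prompt processing and memory limits are ignored. The function names are hypothetical, and the model is not meant to reproduce the benchmark figures quoted above.

```python
import heapq


def static_steps(lengths, max_batch):
    """Static batching: each batch runs until its longest request ends."""
    total = 0
    for i in range(0, len(lengths), max_batch):
        total += max(lengths[i:i + max_batch])
    return total


def continuous_steps(lengths, max_batch):
    """Continuous batching: a request starts as soon as any slot frees up."""
    slots = [0] * max_batch   # step at which each slot becomes free
    heapq.heapify(slots)
    last = 0
    for n in lengths:
        start = heapq.heappop(slots)      # earliest-free slot
        finish = start + n
        heapq.heappush(slots, finish)
        last = max(last, finish)
    return last


def utilisation(lengths, max_batch, steps):
    """Fraction of slot-steps that actually produced a token."""
    return sum(lengths) / (steps * max_batch)


lengths, max_batch = [2, 8, 3, 1], 2
s = static_steps(lengths, max_batch)
c = continuous_steps(lengths, max_batch)
print("static:    ", s, "steps, utilisation", round(utilisation(lengths, max_batch, s), 3))
print("continuous:", c, "steps, utilisation", round(utilisation(lengths, max_batch, c), 3))
```

In this toy workload, static batching needs 11 steps and fills about 64% of the available slot-steps, while continuous batching needs 8 steps and fills about 88%. The gap widens as output lengths become more varied.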

Relevance to self-hosting

If you run a local model that serves more than one user or application (a household, a small team, or several agents), continuous batching is what lets a single GPU keep up without buying more hardware. It is a core feature of popular open-source inference servers and usually works automatically once enabled. This is the practical, modern form of what is loosely called 'batch inference.'

It composes with other serving optimisations such as flash attention and speculative decoding; all of them aim at faster, more efficient local inference.

