

Throughput vs Latency (LLM Inference)

Sovereign AI

Definition

Throughput versus latency is the central trade-off when serving large language models (LLMs). Latency measures how fast a single request is served: time to first token (TTFT) is the delay before the first token of the response appears, and inter-token latency is the gap between each token after that. Throughput measures total work across the whole system, usually as tokens per second summed over every concurrent request. The two pull in opposite directions: choices that maximize aggregate throughput often make an individual request feel slower, and vice versa.
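To make the three metrics concrete, here is a minimal sketch that computes them from recorded timestamps. The RequestTrace type and the function names are hypothetical helpers invented for this illustration, not part of any serving framework, and the sketch assumes every request produced at least one token.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTrace:
    """Hypothetical record of one request; all times are in seconds."""
    sent_at: float            # when the request was submitted
    token_times: List[float]  # arrival time of each generated token


def ttft(trace: RequestTrace) -> float:
    """Time to first token: delay between submission and the first token."""
    return trace.token_times[0] - trace.sent_at


def mean_inter_token_latency(trace: RequestTrace) -> float:
    """Average gap between consecutive tokens after the first one."""
    gaps = [b - a for a, b in zip(trace.token_times, trace.token_times[1:])]
    return sum(gaps) / len(gaps) if gaps else 0.0


def system_throughput(traces: List[RequestTrace]) -> float:
    """Tokens per second summed over every request in the measured window."""
    total_tokens = sum(len(t.token_times) for t in traces)
    start = min(t.sent_at for t in traces)
    end = max(t.token_times[-1] for t in traces)
    return total_tokens / (end - start)


# Two requests served concurrently, with made-up timestamps.
a = RequestTrace(sent_at=0.0, token_times=[0.5, 0.75, 1.0, 1.25])
b = RequestTrace(sent_at=0.0, token_times=[0.5, 0.75, 1.0, 1.25])

print("TTFT (s):", ttft(a))
print("Inter-token latency (s):", mean_inter_token_latency(a))
print("Throughput (tokens/s):", system_throughput([a, b]))
```

Note that the two latency figures are per request, while throughput only makes sense over the whole set of traces.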

Why they conflict

Packing more requests into a batch keeps the GPU busy and raises throughput, but every request in the batch shares compute and memory bandwidth, so its own latency rises. The decode phase is memory-bound: each step spends most of its time reading model weights from memory rather than computing, so a single request rarely saturates a big GPU on its own, and batching is how you reclaim that idle capacity. Continuous batching narrows the gap by swapping finished requests out and new ones in at each decode step instead of waiting for the whole batch to finish, lifting throughput with a smaller latency penalty than naive static batching.
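A rough way to see the conflict is a toy roofline-style model of one decode step. This is a sketch under loudly stated assumptions, not a benchmark: the step is taken to cost whichever is larger of memory time and compute time, and every default number (weight size, per-request KV cache size, bandwidth, peak FLOPs) is a hypothetical placeholder rather than a measurement of any real GPU or model.

```python
def decode_step_seconds(
    batch_size: int,
    weight_bytes: float = 14e9,           # assumed: 7B parameters at 2 bytes each
    kv_bytes_per_request: float = 0.5e9,  # assumed KV cache read per request per step
    flops_per_token: float = 14e9,        # assumed: about 2 FLOPs per parameter
    mem_bandwidth: float = 1e12,          # assumed bytes/second
    peak_flops: float = 100e12,           # assumed FLOPs/second
) -> float:
    """Toy cost of one decode step that emits one token per request in the batch."""
    # Weights are read once per step and shared by the whole batch;
    # each request adds its own KV cache traffic on top.
    memory_time = (weight_bytes + batch_size * kv_bytes_per_request) / mem_bandwidth
    compute_time = batch_size * flops_per_token / peak_flops
    return max(memory_time, compute_time)


def tokens_per_second(batch_size: int) -> float:
    """Aggregate throughput: one token per request per step."""
    return batch_size / decode_step_seconds(batch_size)


for batch in (1, 8, 32, 128):
    step_ms = decode_step_seconds(batch) * 1000
    print(f"batch={batch:4d}  step={step_ms:7.2f} ms  throughput={tokens_per_second(batch):9.1f} tok/s")
```

In this model the step time, which each request experiences as its inter-token latency, grows with batch size, while aggregate throughput grows much faster because the weight reads are amortized over more requests.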

Choosing a target

Interactive chat is latency-sensitive: a low TTFT and snappy token stream matter more than raw volume. Offline or bulk jobs, such as summarizing an archive or classifying a corpus, are throughput-sensitive: total tokens per hour is what counts, and per-request delay matters little. Knowing which side you are on tells you how to size hardware and configure batch size.
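One way to turn that choice into a batch-size setting is to pick the largest batch whose step time still fits a per-token latency budget. The sketch below assumes you can measure or estimate step time per batch size; the function name, the toy linear cost model, and the budgets are all hypothetical values chosen for illustration.

```python
from typing import Callable, Iterable, Optional


def largest_batch_within_budget(
    step_seconds: Callable[[int], float],
    budget_seconds: float,
    candidates: Iterable[int],
) -> Optional[int]:
    """Return the largest candidate batch size whose step time fits the budget."""
    fitting = [b for b in candidates if step_seconds(b) <= budget_seconds]
    return max(fitting) if fitting else None


def toy_step_seconds(batch_size: int) -> float:
    # Hypothetical cost model: 16 ms fixed cost plus 0.5 ms per concurrent request.
    return 0.016 + 0.0005 * batch_size


sizes = [1, 2, 4, 8, 16, 32, 64, 128]

# Interactive chat: tight per-token budget (assumed 25 ms).
chat_batch = largest_batch_within_budget(toy_step_seconds, 0.025, sizes)

# Bulk pipeline: loose per-token budget (assumed 100 ms), so throughput wins.
bulk_batch = largest_batch_within_budget(toy_step_seconds, 0.100, sizes)

print("chat batch size:", chat_batch)
print("bulk batch size:", bulk_batch)
```

A tight budget pushes the answer toward small batches and a responsive stream; a loose one lets the batch grow until hardware, not latency, is the limit.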

For sovereign Bitcoiners self-hosting inference, this trade-off decides whether you tune your rig for a responsive personal assistant or a high-volume batch pipeline. See time to first token (TTFT) and batch size (inference).

Balance the two in the inference cost calculator.

