Request Scheduling (LLM Serving)

Definition

Request scheduling is the logic inside a large language model inference server that decides, at each step, which queued requests enter the running batch, which are preempted, and in what order their prompts are processed. Because GPU memory and compute are finite, the scheduler is what turns a pile of incoming requests into a stream of tokens that respects each user's latency expectations while keeping the hardware busy.

The metrics it balances

Two latencies dominate. Time to first token (TTFT) measures how long a user waits after sending a request before any output appears, and is driven mostly by queueing delay and the prefill of the prompt. Time per output token (TPOT), sometimes called time between tokens, measures the pace of streaming once generation starts. A scheduler tries to honour service-level objectives on both while pushing system throughput as high as possible.
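As a rough illustration of how these two latencies can be measured, the sketch below computes them from per-request timestamps. The `RequestTrace` record and its field names are hypothetical and not tied to any particular serving framework; TPOT is taken here as the mean gap between consecutive output tokens, which is one common convention.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTrace:
    """Hypothetical per-request log: arrival time and one timestamp per output token (seconds)."""
    arrival: float
    token_times: List[float]


def ttft(trace: RequestTrace) -> float:
    """Time to first token: queueing delay plus prefill, seen from the user's side."""
    return trace.token_times[0] - trace.arrival


def tpot(trace: RequestTrace) -> float:
    """Mean time per output token after the first one (0.0 if only one token was produced)."""
    n = len(trace.token_times)
    if n < 2:
        return 0.0
    return (trace.token_times[-1] - trace.token_times[0]) / (n - 1)


# Example: request arrives at t=0, first token at 0.5 s, then one token every 0.1 s.
trace = RequestTrace(arrival=0.0, token_times=[0.5, 0.6, 0.7, 0.8])
first_token_latency = ttft(trace)
streaming_pace = tpot(trace)
```

Comparing these values against the TTFT and TPOT targets for each request is one simple way to check whether a scheduling policy is meeting its service-level objectives.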

Common policies

Naive first-come-first-served scheduling suffers head-of-line blocking, where one long prompt stalls everyone behind it. Practical servers borrow ideas from operating systems: shortest-job-first ordering of prefills to cut average TTFT, least-slack-time-first to protect deadlines, and chunked prefill that breaks a huge prompt into pieces so it interleaves with ongoing decodes rather than monopolizing the GPU. Multi-priority schemes let latency-sensitive requests jump ahead of best-effort background jobs.
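To make the head-of-line blocking effect concrete, here is a toy comparison of first-come-first-served and shortest-job-first ordering. It is a simplified sketch under strong assumptions: all requests arrive at the same instant, prefills run strictly one after another, and each prefill duration is known in advance. Real servers batch work and only estimate durations, so the function names and numbers are illustrative only.

```python
from typing import List


def mean_ttft(prefill_times: List[float]) -> float:
    """Mean TTFT when prefills run serially in the given order, all arriving at t=0."""
    if not prefill_times:
        return 0.0
    clock = 0.0
    total = 0.0
    for duration in prefill_times:
        clock += duration   # this request's first token appears when its prefill ends
        total += clock
    return total / len(prefill_times)


def fcfs_order(prefill_times: List[float]) -> List[float]:
    """First-come-first-served: keep arrival order."""
    return list(prefill_times)


def sjf_order(prefill_times: List[float]) -> List[float]:
    """Shortest-job-first: run the cheapest prefills first."""
    return sorted(prefill_times)


# One long prompt (8 s of prefill) arrives just ahead of two short ones (1 s each).
queue = [8.0, 1.0, 1.0]
fcfs_mean = mean_ttft(fcfs_order(queue))  # first tokens at 8, 9, 10 s
sjf_mean = mean_ttft(sjf_order(queue))    # first tokens at 1, 2, 10 s
```

In this toy queue the long request finishes at the same time under both orderings, while the two short requests no longer wait behind it, which is why the average drops. Pure shortest-job-first can starve long prompts under sustained load, which is one reason deadline-aware and priority-based policies are used alongside it.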

For a sovereign operator hosting models on their own hardware, the scheduler is the lever that converts raw GPU capacity into a predictable, fair service. It works hand in hand with in-flight batching and is constrained by the limits set by the token budget and by rate limiting.
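The way chunked prefill interacts with a per-step token budget can be sketched as follows. This is a minimal illustration, not the algorithm of any specific server: the `Prefill` record, the `plan_step` function, and the rule that each running decode costs one token of budget per step are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prefill:
    """Hypothetical waiting request: id plus prompt tokens not yet processed."""
    request_id: str
    remaining: int


def plan_step(num_decodes: int, prefills: List[Prefill], token_budget: int) -> List[Tuple[str, int]]:
    """Plan the prefill chunks for one scheduler step.

    Running decodes are served first (assumed one token each); whatever budget
    is left goes to waiting prefills in queue order, splitting a prompt that
    does not fit. Updates each Prefill.remaining in place.
    """
    budget = max(token_budget - num_decodes, 0)
    plan: List[Tuple[str, int]] = []
    for p in prefills:
        if budget == 0:
            break
        chunk = min(p.remaining, budget)
        if chunk > 0:
            plan.append((p.request_id, chunk))
            p.remaining -= chunk
            budget -= chunk
    return plan


# 12 requests are mid-generation; a 1000-token prompt and a 100-token prompt are waiting.
waiting = [Prefill("a", 1000), Prefill("b", 100)]
step1 = plan_step(num_decodes=12, prefills=waiting, token_budget=512)
step2 = plan_step(num_decodes=12, prefills=waiting, token_budget=512)
step3 = plan_step(num_decodes=12, prefills=waiting, token_budget=512)
```

Under these assumptions the long prompt is spread over two steps of 500 tokens each, and the 12 decodes keep producing a token at every step instead of stalling until the whole prompt has been processed.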
