Prefill Phase

Sovereign AI

Definition

The prefill phase is the first of two stages in large language model (LLM) inference. When you send a prompt, the model processes all input tokens in a single parallel pass, computing the attention keys and values for the whole sequence and storing them in the key-value (KV) cache. Prefill ends when the first output token is produced. Because all prompt tokens are handled together, prefill is typically compute-bound rather than memory-bandwidth-bound: it keeps the GPU's matrix-multiply units busy. Its attention cost grows roughly with the square of the prompt length, while the rest of the work grows roughly linearly, so the quadratic term matters most for long prompts.
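
To make the scaling concrete, here is a rough back-of-the-envelope sketch in Python. It rests on common rule-of-thumb assumptions rather than measurements: about 2 FLOPs per parameter per token for the weight matrix multiplies, and about 4 × d_model FLOPs per token pair per layer for attention, with causal masking ignored. The function name and the model shape (7 billion parameters, 32 layers, hidden size 4096) are hypothetical and chosen only for illustration.

```python
def prefill_flops(prompt_tokens, n_params, n_layers, d_model):
    """Return (linear_flops, attention_flops) for one prefill pass.

    Rule-of-thumb assumptions, not measurements:
      - weight matmuls cost about 2 FLOPs per parameter per token
      - attention scores plus the weighted sum cost about
        4 * d_model FLOPs per token pair per layer (causal masking ignored)
    """
    linear = 2 * n_params * prompt_tokens
    attention = 4 * n_layers * d_model * prompt_tokens ** 2
    return linear, attention


# Hypothetical model shape, chosen only for illustration.
N_PARAMS, N_LAYERS, D_MODEL = 7_000_000_000, 32, 4096

for n in (1_000, 8_000, 32_000):
    linear, attention = prefill_flops(n, N_PARAMS, N_LAYERS, D_MODEL)
    share = attention / (linear + attention)
    print(f"{n:>6} tokens: attention is {share:.0%} of estimated prefill FLOPs")
```

Under these assumptions, doubling the prompt doubles the linear term and quadruples the attention term. The attention term stays a small share of the total for prompts of a few thousand tokens and only overtakes the linear term once the prompt reaches tens of thousands of tokens.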

Why prefill dominates first-response time

Prefill is the work that happens before any text appears, so it largely determines how long a user waits for the model to start replying. Anything that adds input tokens lengthens it: a long user prompt, a system prompt, retrieved documents, or a context window filled close to its limit. On a self-hosted sovereign AI rig, prefill is where a big GPU earns its keep: the parallel arithmetic maps well to the hardware, much as a hashboard's parallel ASIC cores map well to SHA-256 work.
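
As a rough illustration of how prompt length turns into waiting time, the Python sketch below divides an estimated prefill workload by the GPU's effective throughput. It is an assumption-laden estimate, not a benchmark: it uses the rule of thumb of about 2 FLOPs per parameter per prompt token, leaves out the attention term, and ignores queueing, tokenization, and network latency. The function name, the 100 TFLOP/s peak figure, and the 50% utilization are hypothetical.

```python
def estimate_prefill_seconds(prompt_tokens, n_params, peak_flops_per_s, utilization):
    """Rough prefill time: estimated FLOPs divided by effective throughput.

    Assumes about 2 FLOPs per parameter per prompt token and that the GPU
    sustains `utilization` (a fraction in (0, 1]) of its peak rate.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    flops = 2 * n_params * prompt_tokens
    return flops / (peak_flops_per_s * utilization)


# Hypothetical setup: 7B-parameter model, 100 TFLOP/s peak, 50% utilization.
for n in (500, 2_000, 8_000):
    seconds = estimate_prefill_seconds(n, 7e9, 100e12, 0.5)
    print(f"{n:>5} prompt tokens -> about {seconds:.2f} s of prefill")
```

The estimate grows in direct proportion to the number of prompt tokens, which is why trimming the system prompt or the retrieved context is often the cheapest way to shorten the wait for a first reply.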

Prefill vs decode

Prefill is fundamentally different from the token-by-token generation that follows. Once the cache is built, each new token needs only the cached keys and values plus one fresh forward pass for that single token. Decode therefore does little arithmetic per step relative to the data it reads, and is usually limited by memory bandwidth rather than compute. Optimizations like chunked prefill split a long prompt into pieces so the engine can interleave prefill work with ongoing generation, smoothing GPU utilization on a shared box.
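
The interleaving idea behind chunked prefill can be sketched as a toy scheduler in Python. This is a simplified illustration, not how any particular inference engine implements it: the function name, the one-decode-token-per-step policy, and the dictionary layout are all assumptions made for the example.

```python
def chunked_prefill_schedule(prompt_len, chunk_size, pending_decode_steps):
    """Plan engine steps that mix prefill chunks with ongoing decode work.

    Each step processes at most `chunk_size` prompt tokens from the new
    request and, while another request is still generating, one decode
    token for that request. Returns a list of per-step token counts.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    schedule = []
    start = 0
    while start < prompt_len or pending_decode_steps > 0:
        step = {"prefill_tokens": 0, "decode_tokens": 0}
        if start < prompt_len:
            end = min(start + chunk_size, prompt_len)
            step["prefill_tokens"] = end - start
            start = end
        if pending_decode_steps > 0:
            step["decode_tokens"] = 1
            pending_decode_steps -= 1
        schedule.append(step)
    return schedule


# A 10-token prompt arrives while another request still needs 2 tokens.
for i, step in enumerate(chunked_prefill_schedule(10, 4, 2)):
    print(i, step)
```

Without chunking, the 10-token prompt would occupy a whole step by itself and the request that is already generating would stall until it finished. With chunks of 4, that request keeps receiving a token on each step while the new prompt is absorbed piece by piece.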

Understanding prefill helps you size hardware and read benchmarks honestly when running models on your own terms. See the decode phase for the second stage, and time to first token (TTFT) for the metric prefill drives.

Model your throughput in the inference cost calculator.
