Prompt Caching

Sovereign AI

Definition

Prompt caching lets an inference engine store the computation for a frequently reused prompt prefix (system instructions, tool definitions, long documents, or examples) so that subsequent requests read it from the cache instead of reprocessing the same tokens. When a new request begins with exactly the same content as a cached prefix, up to a defined breakpoint, the model reuses the stored computation and only processes what follows, reducing both latency and cost on the cached portion.

How the economics work

On hosted APIs that support it (for example Anthropic's Claude API with the cache_control ephemeral marker), writing to the cache costs somewhat more than a normal input token (roughly 25% more for the default cache), but every later request that hits the cache within the time-to-live pays only a small fraction (around a tenth) of the standard input price. The default ephemeral cache lasts about five minutes and is refreshed each time it is hit, with a longer duration available at a higher write cost. In practice this can cut input token spend by 70-90% for workloads with a large static context and a small changing tail, provided requests arrive often enough to keep hitting the cache.
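The arithmetic behind that range can be sketched in a few lines of Python. This is a rough model, not a billing calculator: the function name input_cost, the $3 per million tokens base price, and the workload sizes are illustrative assumptions, and it assumes every request lands inside the time-to-live so the prefix is written to the cache only once.

```python
def input_cost(prefix_tokens, suffix_tokens, requests, base_price_per_mtok,
               write_multiplier=1.25, read_multiplier=0.10):
    """Return (uncached_cost, cached_cost) for the input side of a workload.

    Assumptions (illustrative only): one cache write on the first request,
    cache hits on all later requests, and a suffix billed at the base price.
    """
    per_token = base_price_per_mtok / 1_000_000

    # Without caching, every request pays full price for prefix and suffix.
    uncached = requests * (prefix_tokens + suffix_tokens) * per_token

    # With caching: one write, (requests - 1) reads, and the suffix each time.
    billable_tokens = (
        prefix_tokens * write_multiplier
        + (requests - 1) * prefix_tokens * read_multiplier
        + requests * suffix_tokens
    )
    cached = billable_tokens * per_token
    return uncached, cached


# Hypothetical workload: 20,000-token static prefix, 500-token changing tail,
# 50 requests, and an assumed base price of $3 per million input tokens.
uncached, cached = input_cost(20_000, 500, 50, 3.0)
savings = 1 - cached / uncached
print(f"uncached ${uncached:.3f}, cached ${cached:.3f}, saved {savings:.0%}")
```

With these assumed numbers the saving is a little under 86%. Shrinking the prefix, growing the tail, or letting the cache expire between requests all pull that figure down.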

Caching in self-hosted inference

The same idea exists locally. A local LLM server can reuse the KV cache for a shared prefix, so it does not recompute attention over tokens it has already processed in the current session. This is the on-device equivalent of prompt caching and a key reason long, stable context windows stay affordable.
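A toy model can make the prefix reuse concrete. The sketch below only counts tokens. A real server stores per-token key and value tensors and usually manages more than one cached sequence, so the names here (shared_prefix_len, ToyPrefixCache) are hypothetical and do not correspond to any particular server's API.

```python
def shared_prefix_len(cached, new):
    """Count how many leading tokens the two sequences have in common."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


class ToyPrefixCache:
    """Remembers the last token sequence and reports how much of the next
    request could be served from it (a stand-in for a real KV cache)."""

    def __init__(self):
        self.cached_tokens = []

    def process(self, tokens):
        reused = shared_prefix_len(self.cached_tokens, tokens)
        computed = len(tokens) - reused      # only the new tail needs work
        self.cached_tokens = list(tokens)    # cache now covers this request
        return reused, computed


system = ["You", "are", "a", "helpful", "assistant", "."]
cache = ToyPrefixCache()
first = cache.process(system + ["Hi"])           # cold cache: nothing reused
second = cache.process(system + ["Bye", "now"])  # system prefix is reused
print(first, second)
```

The first call has nothing to reuse. The second reuses the six system tokens and only has to process the two new ones, which is the saving a stable prefix buys.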

For sovereign operators running their own inference, structuring prompts as a stable cached prefix plus a small dynamic suffix is one of the highest-leverage cost optimisations, because it saves compute time on every request while keeping all data on your own hardware. It pairs naturally with a RAG pipeline where the retrieved context changes but the instructions do not: the instructions go first so they can be cached, and the retrieved context and question follow.
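One way to keep the prefix stable is to assemble every prompt from a fixed instruction block followed by the parts that change. The following is a minimal sketch under that assumption. STATIC_PREFIX, build_prompt, and the example text are hypothetical and not tied to any particular inference server.

```python
# Fixed instructions: identical in every request, so they can be cached.
STATIC_PREFIX = (
    "You are a support assistant for an internal knowledge base.\n"
    "Answer only from the provided context.\n"
)


def build_prompt(retrieved_chunks, question):
    """Place the unchanging instructions first and the per-request
    material (retrieved context, user question) after them."""
    context = "\n".join(f"- {chunk}" for chunk in retrieved_chunks)
    return f"{STATIC_PREFIX}\nContext:\n{context}\n\nQuestion: {question}\n"


p1 = build_prompt(["Fan speed is set in the web UI."], "How do I lower fan noise?")
p2 = build_prompt(["Firmware updates ship monthly."], "How often is firmware updated?")

# Both prompts begin with the same instruction block, so a prefix cache
# only has to process the context and question that differ.
print(p1.startswith(STATIC_PREFIX) and p2.startswith(STATIC_PREFIX))
```

Anything that varies per request, such as a timestamp or a request ID, belongs in the suffix. Placing it inside the instruction block changes the prefix and prevents a cache match.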
