Tokens

Sovereign AI

Tokens are the basic units of text a large language model actually processes. Models do not read characters or whole words; a tokenizer first chops text into tokens — usually sub-word fragments — and maps each to a number the model can compute on. As a rough rule of thumb, one token is about three-quarters of an English word, so 1,000 tokens is roughly 750 words, though the exact ratio depends on the language, the subject matter, and the specific tokenizer. Everything a model reads and everything it writes is, from the model's perspective, a sequence of these integers and nothing else.
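The rule of thumb above can be turned into a quick estimator. This is only a sketch: the 0.75 ratio comes from the paragraph above, the function names are made up for illustration, and a real count requires running the specific model's tokenizer.

```python
# Rule-of-thumb conversion between English words and tokens.
# Assumption: 1 token is about 0.75 English words; the true ratio varies
# with language, subject matter, and tokenizer.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Estimate a token count from a whitespace-separated word count."""
    word_count = len(text.split())
    return round(word_count / WORDS_PER_TOKEN)


def estimate_words(token_count: int) -> int:
    """Estimate how many English words fit in a given number of tokens."""
    return round(token_count * WORDS_PER_TOKEN)


sample = "Tokens are the basic units of text a model processes"
print(estimate_tokens(sample))  # 10 words -> about 13 tokens
print(estimate_words(1000))     # 750
```

Treat the result as a planning number only; code, non-English text, and jargon usually need more tokens per word than this suggests.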

Why everything is measured in tokens

Tokens are the currency of working with LLMs. The context window, the model's working memory, is sized in tokens and is shared between your prompt, any documents you paste in, and the model's own reply. Generation speed is reported in tokens per second, the figure that determines whether a local model feels conversational or glacial. Cloud APIs bill per token, for input and output alike. Long-context quality claims are likewise benchmarked at context lengths measured in tokens. For a self-hoster, tokens are the unit that connects all the hardware questions: how much text fits in memory, how fast responses stream, and how large a document the model can actually consider at once.
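A small budgeting sketch makes the shared window concrete. The class and field names here are hypothetical, and the numbers in the usage lines are illustrative rather than taken from any particular model.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Tracks how one context window is shared (all values in tokens)."""
    context_window: int          # total window size of the model
    prompt_tokens: int           # instructions and conversation so far
    document_tokens: int         # pasted or retrieved documents
    reserved_reply_tokens: int   # room kept free for the model's answer

    def used(self) -> int:
        return self.prompt_tokens + self.document_tokens + self.reserved_reply_tokens

    def remaining(self) -> int:
        return self.context_window - self.used()

    def fits(self) -> bool:
        return self.remaining() >= 0


def seconds_to_stream(reply_tokens: int, tokens_per_second: float) -> float:
    """Time to generate a reply at a given generation speed."""
    return reply_tokens / tokens_per_second


budget = TokenBudget(context_window=8192, prompt_tokens=500,
                     document_tokens=6000, reserved_reply_tokens=1000)
print(budget.fits(), budget.remaining())  # True 692
print(seconds_to_stream(500, 25.0))       # 20.0
```

The same arithmetic explains why a long pasted document can silently crowd out the reply: every part draws on one shared total.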

Why a model "sees" tokens, not letters

Because the model operates on tokens rather than characters, it can struggle with tasks that seem trivial, such as counting the letters in a word, reversing a string, or reasoning about spelling: the individual characters are hidden inside each token's single integer ID. The word "mining" may be one token; the model never directly "sees" its six letters. This is a direct consequence of tokenization rather than a flaw in the model's reasoning, and it explains a whole family of otherwise-puzzling LLM quirks, including why unusual names get mangled and why arithmetic on long numbers goes astray: digits are grouped into tokens in inconsistent chunks. Common words earn their own single token, while rare or technical words shatter into several fragments, so jargon-dense text consumes the token budget faster than plain prose.
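A toy tokenizer shows the effect. This sketch uses greedy longest-match against a tiny made-up vocabulary; production tokenizers (byte-pair encoding and similar) are built differently and have far larger vocabularies, so the exact splits below are illustrative only.

```python
import string

# Hypothetical sub-word pieces; a real vocabulary has tens of thousands.
TOY_PIECES = ["mining", "token", "iz", "ation", "hash", "er"]


def build_vocab(pieces):
    """Map each piece to an integer ID, with single letters as a fallback."""
    vocab = {}
    for letter in string.ascii_lowercase:
        vocab[letter] = len(vocab)
    for piece in pieces:
        vocab.setdefault(piece, len(vocab))
    return vocab


def tokenize(word, vocab):
    """Split a lowercase word by repeatedly taking the longest known piece."""
    tokens = []
    start = 0
    while start < len(word):
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                tokens.append(piece)
                start = end
                break
        else:
            raise ValueError(f"no vocabulary entry covers {word[start]!r}")
    return tokens


vocab = build_vocab(TOY_PIECES)
print(tokenize("mining", vocab))        # ['mining']
print(tokenize("tokenization", vocab))  # ['token', 'iz', 'ation']
print(tokenize("xylem", vocab))         # ['x', 'y', 'l', 'e', 'm']
print([vocab[t] for t in tokenize("mining", vocab)])  # [26]
```

The model receives only the final list of integers, so the common word arrives as the single ID 26 with no letters attached, while the rare word costs five tokens.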

Tokens and the self-hosting budget

Running models locally makes token awareness concrete. Every token in the context must be represented in memory during inference: the KV cache grows linearly with context length and competes for the same VRAM the model weights occupy, so a machine that comfortably runs a model at 4K context may choke at 32K. Prompt length affects response latency too, since the model must process every input token before producing its first output token. Practical habits follow: trim boilerplate from prompts, summarize long histories instead of resending them, and when using RAG, retrieve the few passages that matter rather than stuffing the window. A sovereign AI stack is ultimately provisioned in tokens: the window you can afford, at the speed you can tolerate, on the metal you own.
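The memory pressure can be estimated with the usual KV cache size formula for a standard transformer: two tensors (keys and values) per layer, per KV head, per token. The model dimensions below are hypothetical, chosen only to make the arithmetic visible, and the estimate ignores weights, activations, and runtime overhead.

```python
def kv_cache_bytes(context_tokens, n_layers, n_kv_heads, head_dim,
                   bytes_per_value=2):
    """Approximate KV cache size for a single sequence.

    The factor of 2 covers keys and values; bytes_per_value=2 assumes
    16-bit storage. Quantized caches would be smaller.
    """
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens


def to_gib(n_bytes):
    return n_bytes / 2**30


# Hypothetical model shape (an assumption, not a specific model).
MODEL_SHAPE = dict(n_layers=32, n_kv_heads=8, head_dim=128)

for context in (4096, 32768):
    size = kv_cache_bytes(context, **MODEL_SHAPE)
    print(context, to_gib(size))
# 4096 0.5
# 32768 4.0
```

Under these assumptions, going from 4K to 32K context raises the cache from half a gibibyte to four, on top of the weights, which is often the difference between fitting on a card and not.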

The pipeline in one line

Text goes in, the tokenizer converts it to token IDs, the model predicts the next token repeatedly, and the tokenizer converts the result back to text. Master that single loop — and its costs — and most LLM behaviour, pricing, and hardware sizing stops being mysterious.
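The loop can be sketched end to end with stand-ins for every part. Everything here is a toy assumption: the vocabulary is word-level, and the "model" is a fixed lookup table rather than a neural network, so only the shape of the loop carries over to a real system.

```python
# Toy word-level vocabulary (hypothetical); index in the list is the token ID.
VOCAB = ["<eos>", "text", "goes", "in", "tokens", "come", "out"]
TOKEN_TO_ID = {token: i for i, token in enumerate(VOCAB)}
EOS_ID = TOKEN_TO_ID["<eos>"]


def encode(text):
    """Tokenizer, forward direction: text -> token IDs."""
    return [TOKEN_TO_ID[word] for word in text.split()]


def decode(ids):
    """Tokenizer, reverse direction: token IDs -> text."""
    return " ".join(VOCAB[i] for i in ids)


# Stand-in for the model: maps the last token ID to the next one.
NEXT_TOKEN = {
    TOKEN_TO_ID["text"]: TOKEN_TO_ID["goes"],
    TOKEN_TO_ID["goes"]: TOKEN_TO_ID["in"],
    TOKEN_TO_ID["in"]: TOKEN_TO_ID["tokens"],
    TOKEN_TO_ID["tokens"]: TOKEN_TO_ID["come"],
    TOKEN_TO_ID["come"]: TOKEN_TO_ID["out"],
    TOKEN_TO_ID["out"]: EOS_ID,
}


def predict_next(ids):
    return NEXT_TOKEN.get(ids[-1], EOS_ID)


def generate(prompt, max_new_tokens=8):
    """Encode, predict one token at a time, stop at <eos>, then decode."""
    ids = encode(prompt)
    for _ in range(max_new_tokens):
        next_id = predict_next(ids)
        if next_id == EOS_ID:
            break
        ids.append(next_id)
    return decode(ids)


print(generate("text goes"))               # text goes in tokens come out
print(generate("text", max_new_tokens=2))  # text goes in
```

Each pass through the loop yields exactly one token, which is why generation speed is quoted in tokens per second and why long replies take proportionally longer.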
