Definition
A token is the basic unit of text a large language model actually processes. Models do not read characters or whole words; a tokenizer first chops text into tokens (usually sub-word fragments) and maps each one to an integer ID the model can compute on. As a rough rule of thumb, one token is about three-quarters of an English word, so 1,000 tokens is roughly 750 words, though the exact ratio depends on the language and the tokenizer.
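The rule of thumb above can be turned into a quick estimator. This is a minimal sketch that assumes the 0.75 words-per-token ratio holds for the text at hand; the function name `estimate_tokens` is hypothetical, and for exact counts you would run the model's own tokenizer.

```python
def estimate_tokens(text: str, words_per_token: float = 0.75) -> int:
    """Rough token estimate from a whitespace-separated word count.

    Assumes the English rule of thumb of about 0.75 words per token;
    other languages and tokenizers can differ noticeably.
    """
    word_count = len(text.split())
    return round(word_count / words_per_token)


# 750 words come out at roughly 1,000 tokens under this assumption.
sample = " ".join(["word"] * 750)
print(estimate_tokens(sample))
```

The estimate is only useful for ballpark planning, for example checking whether a document is likely to fit in a model's context window before sending it.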
Why everything is measured in tokens
Tokens are the currency of working with LLMs. The context window, the model's working memory, is sized in tokens and shared between your prompt and the model's reply. Generation speed is reported in tokens per second, and cloud APIs bill per token. Understanding tokens is therefore the first step to estimating how much hardware a self-hosted model needs and how much a workload will cost.
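The three measurements above reduce to simple arithmetic. The sketch below is illustrative only: the helper names and every number in the usage lines (an 8,192-token window, 25 tokens per second, a price of 1.50 per million tokens) are assumptions chosen for the example, not figures for any real model or provider.

```python
def fits_context(prompt_tokens: int, max_reply_tokens: int, context_window: int) -> bool:
    """The prompt and the reply share one window, so their sum must fit."""
    return prompt_tokens + max_reply_tokens <= context_window


def generation_seconds(reply_tokens: int, tokens_per_second: float) -> float:
    """Time to generate a reply at a steady tokens-per-second rate."""
    return reply_tokens / tokens_per_second


def api_cost(total_tokens: int, price_per_million_tokens: float) -> float:
    """Cost of a workload billed per token (a single flat price is assumed)."""
    return total_tokens / 1_000_000 * price_per_million_tokens


# Hypothetical figures, for illustration only.
print(fits_context(6000, 2000, 8192))    # prompt plus reply is 8,000 tokens
print(generation_seconds(500, 25))       # 500 tokens at 25 tokens per second
print(api_cost(2_000_000, 1.5))          # 2 million tokens at 1.50 per million
```

A flat per-token price keeps the sketch short; a real estimate would substitute whatever pricing and throughput actually apply to the deployment.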
Why a model "sees" tokens, not letters
Because the model operates on tokens rather than characters, it can struggle with tasks like counting the letters in a word or reversing a string — the individual characters are hidden inside the token. This is a direct consequence of tokenization, not a flaw in the model's reasoning, and it explains many otherwise-puzzling LLM quirks.
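A toy example makes the point concrete. The sketch below uses a made-up vocabulary and a greedy longest-match split, which is a simplification and not how production tokenizers are trained or applied; the IDs and the names `TOY_VOCAB` and `toy_tokenize` are hypothetical.

```python
# Hypothetical toy vocabulary; real tokenizers learn tens of thousands of entries.
TOY_VOCAB = {
    "straw": 101, "berry": 102,
    "s": 1, "t": 2, "r": 3, "a": 4, "w": 5, "b": 6, "e": 7, "y": 8,
}


def toy_tokenize(text: str, vocab: dict = TOY_VOCAB) -> list:
    """Split text into token IDs by greedily taking the longest known piece."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest remaining piece first, then shrink it.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return ids


ids = toy_tokenize("strawberry")
print(ids)  # two IDs, one for "straw" and one for "berry"
```

The model receives only the two IDs. Nothing in that input says how many times the letter "r" occurs, which is why letter-counting questions are harder for it than they look.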
Tokens are produced by the tokenizer and consumed during inference; the rate at which they are generated is your tokens per second.
