Skip to content

Bitcoin accepted at checkout  |  Ships from Laval, QC, Canada  |  Expert support since 2016

Special Tokens

Sovereign AI

Definition

Special tokens are reserved entries in a tokenizer's vocabulary that carry structural or control meaning instead of representing literal text. They tell a language model where a sequence begins and ends, where to stop generating, how to pad a batch, and — in chat models — who is speaking. Without them, a model has no reliable way to distinguish a system instruction from a user message, or to know when to halt.

The common set

The classic trio is BOS (beginning of sequence), EOS (end of sequence), and PAD (padding). BOS signals the start of generation; EOS marks completion and is the cue that stops decoding; PAD fills shorter sequences so every example in a batch has equal length, with an attention mask ensuring the model ignores the filler. Many tokenizers also reserve an UNK (unknown) token, though byte-level schemes rarely need it. Chat-tuned models add role and turn markers such as control tokens that delimit user and assistant turns.

Why correct handling matters

Special tokens are a frequent source of subtle bugs. Adding them twice — for example, applying a chat template that already inserts them and then re-tokenizing with automatic insertion enabled — produces duplicate markers that degrade output quality. Conversely, omitting an EOS during fine-tuning teaches a model never to stop. Anyone running models locally for sovereignty reasons should inspect exactly which special tokens their tokenizer injects.

Special tokens occupy reserved slots in the tokenizer vocabulary and are the building blocks that a chat template arranges into a structured conversation.

In Simple Terms

Special tokens are reserved entries in a tokenizer’s vocabulary that carry structural or control meaning instead of representing literal text. They tell a language model…

Explore the Full Glossary

Browse all Bitcoin mining terms from A to Z. Whether you are a beginner or expert, deepen your understanding of the mining ecosystem.

Mining Glossary

ASIC Miner Database

Compare 500+ miners with real-time profitability data, home mining scores, and detailed specs.

Compare Miners