Definition
Chunking is the step in a retrieval pipeline where long documents are split into smaller passages before they are embedded and stored for search. It is one of the most consequential design choices in a retrieval-augmented generation system: chunks that are too large dilute relevance and waste context budget, while chunks that are too small lose the surrounding meaning needed to answer a question. Reports suggest the gap between a good and a poor chunking strategy can swing retrieval accuracy by tens of percent.
Common strategies
Fixed-size chunking splits text every N tokens; it is simple and fast, but blind to meaning. Recursive chunking splits along natural boundaries (paragraphs first, then sentences) and is the usual baseline. Semantic chunking groups text by topic coherence so that each chunk represents a single idea. A common practical starting point is roughly 400–512 tokens per chunk with 10–20% overlap. The overlap works as a sliding window: each chunk repeats the tail of the previous one, so a fact that straddles a boundary is more likely to appear whole in at least one chunk.
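As a rough illustration of fixed-size chunking with overlap, the sketch below slides a window over a token list. The function name `chunk_tokens` and its defaults (512 tokens, 64 of them shared, about 12.5% overlap) are assumptions chosen for this example, and whitespace-separated words stand in for the tokens a real tokenizer would produce.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int = 512, overlap: int = 64) -> List[List[str]]:
    """Split tokens into fixed-size windows; consecutive windows share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


# Assumption: plain words stand in for real tokenizer tokens.
words = [f"w{i}" for i in range(10)]
for chunk in chunk_tokens(words, chunk_size=4, overlap=1):
    print(chunk)
```

With a window of 4 and an overlap of 1, the last token of each chunk reappears as the first token of the next, which is the boundary-carrying behaviour described above. A recursive or semantic splitter would replace the fixed `step` with boundaries chosen from the text itself.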
Why it matters
For a self-hosted knowledge tool — say, answering questions over a library of mining manuals — chunking determines whether the right passage is even retrievable. Good chunks become good embeddings, which produce precise hits at query time.
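Each chunk is converted to a vector by an embedding model and stored in a vector database. Chunk size is bounded in practice by the embedding model's maximum input length and by how many retrieved chunks must fit into the generating model's context window.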
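The chunk-to-vector-to-search flow can be sketched with a toy example. Everything here is a stand-in: `embed` uses plain word counts instead of a real embedding model, a Python list plays the role of the vector database, and the three manual passages are invented for illustration.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts instead of a learned vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(index: List[Tuple[str, Counter]], query: str, k: int = 1) -> List[str]:
    """Return the k stored chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# Hypothetical chunks from a mining manual, invented for this example.
chunks = [
    "ventilation fans must be inspected before every shift",
    "haul trucks require brake checks each week",
    "blasting zones are cleared thirty minutes before detonation",
]
index = [(c, embed(c)) for c in chunks]  # the in-memory "vector database"
print(search(index, "how often are brake checks required"))
```

The query retrieves the brake-check passage because it shares the most terms with it. If that passage had been merged into one very large chunk with unrelated material, its similarity score would be diluted, which is the retrievability concern this section describes.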
