Chunking

Sovereign AI

Definition

Chunking is the step in a retrieval pipeline where long documents are split into smaller passages before they are embedded and stored for search. It is one of the most consequential design choices in a retrieval-augmented generation system. Chunks that are too large dilute relevance and waste context budget, while chunks that are too small lose the surrounding meaning needed to answer a question. Reported comparisons suggest that changing the chunking strategy alone can move retrieval accuracy by tens of percent.

Common strategies

Fixed-size chunking splits text every N tokens: it is simple and fast, but blind to meaning. Recursive chunking splits along natural boundaries (paragraphs, then sentences) and is the usual baseline. Semantic chunking groups text by topic coherence so that each chunk represents a single idea. A common practical starting point is roughly 400–512 tokens per chunk with 10–20% overlap. The overlap works like a sliding window: consecutive chunks share some text, so a fact that straddles a boundary is less likely to be cut in half. For example, 512-token chunks with 64 tokens of overlap (12.5%) advance 448 tokens at a time.
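The fixed-size and recursive strategies can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method of any particular library: `fixed_size_chunks` and `recursive_chunks` are hypothetical names, whitespace-split words stand in for model tokens, and `recursive_chunks` measures length in characters and drops the separator at each chunk boundary for simplicity.

```python
from typing import List, Sequence


def fixed_size_chunks(tokens: List[str], size: int, overlap: int) -> List[List[str]]:
    """Sliding window: every chunk has `size` tokens (the last may be shorter),
    and each chunk repeats the final `overlap` tokens of the one before it."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):  # this window already reaches the end
            break
    return chunks


def recursive_chunks(text: str, max_len: int,
                     separators: Sequence[str] = ("\n\n", "\n", ". ", " ")) -> List[str]:
    """Split on the coarsest boundary first, packing pieces greedily up to
    `max_len` characters; only a piece that is still too long is re-split
    with the next, finer separator. Separators at chunk boundaries are dropped."""
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not separators:
        # No natural boundary left: fall back to a hard character cut.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= max_len:
            current = candidate  # still fits: keep growing this chunk
            continue
        if current:
            chunks.append(current)  # flush the chunk built so far
            current = ""
        if len(part) <= max_len:
            current = part  # start a new chunk with this piece
        else:
            # This piece is too long on its own: retry with a finer boundary.
            chunks.extend(recursive_chunks(part, max_len, finer))
    if current:
        chunks.append(current)
    return [c for c in chunks if c.strip()]
```

With real tokenizer output, a call such as `fixed_size_chunks(tokens, size=512, overlap=64)` would match the starting point described above. The recursive version only descends to a finer separator when a piece is still too long, which is what keeps whole paragraphs together whenever they fit.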

Why it matters

For a self-hosted knowledge tool — say, answering questions over a library of mining manuals — chunking determines whether the right passage is even retrievable. Good chunks become good embeddings, which produce precise hits at query time.

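Each chunk is converted to a vector by an embedding model and stored in a vector database. In practice, chunk size is bounded by the embedding model's maximum input length, and the number of chunks that can be handed to the language model at query time is bounded by that model's context window.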
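To make the embed, store and retrieve loop concrete, here is a toy in-memory version in Python. It is a sketch under loud assumptions: a bag-of-words term-count vector stands in for a real embedding model, a plain Python list stands in for a vector database, word counts only approximate tokenizer counts, and `embed`, `ToyVectorStore` and `fit_to_budget` are hypothetical names.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Stand-in embedding (assumption): a bag-of-words term-count vector.
    A real pipeline would call an embedding model here instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory list of (chunk, vector) rows standing in for a vector database."""

    def __init__(self) -> None:
        self.rows: List[Tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        self.rows.append((chunk, embed(chunk)))  # embed once, at indexing time

    def search(self, query: str, k: int = 3) -> List[str]:
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def fit_to_budget(chunks: List[str], budget_tokens: int) -> List[str]:
    """Keep ranked chunks, in order, until the prompt's token budget is spent.
    Whitespace word counts only approximate real tokenizer counts."""
    kept: List[str] = []
    used = 0
    for chunk in chunks:
        cost = len(chunk.split())
        if used + cost > budget_tokens:
            break
        kept.append(chunk)
        used += cost
    return kept
```

For example, after adding a few manual passages with `store.add(...)`, `fit_to_budget(store.search("where are firmware updates installed"), budget_tokens=50)` returns the best-matching passages that fit the prompt. Smaller chunks let more of them fit the budget but each carries less surrounding context, which is the trade-off described in the definition.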
