Definition
Okapi BM25 ("Best Matching 25") is a probabilistic ranking function that scores how relevant a document is to a keyword query. It grew out of the probabilistic retrieval framework developed by Stephen Robertson, Karen Spärck Jones, and colleagues, and takes its name from the Okapi information retrieval system in which it was first implemented. It remains the default sparse-retrieval baseline in modern search engines and the keyword half of most hybrid search stacks. Unlike dense vector methods, BM25 matches literal terms, making it strong on exact keywords, product codes, and rare technical jargon.
The three ingredients
BM25 combines three signals. Term frequency (TF) rewards documents that use a query term more often, but with diminishing returns: a saturation function, tuned by the parameter k1, means the tenth occurrence adds far less than the second. Inverse document frequency (IDF) rewards rare terms, so a distinctive word like a chip part number counts more than a common stopword. Document-length normalisation, tuned by the parameter b, penalises long documents so they do not win simply by containing more words. Together these produce a relevance score that is more robust than classic TF-IDF, whose raw term counts grow without a ceiling and can let long or repetitive documents dominate.
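The three signals can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the function name bm25_scores, the toy documents, and the defaults k1=1.2 and b=0.75 are assumptions chosen for the example, and the IDF shown is the always-positive variant with "+ 1" inside the logarithm (other variants exist).

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenised document in `docs` against the tokenised `query`.

    k1 and b are illustrative defaults, not values taken from the text.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalisation: longer-than-average documents are damped.
        norm = 1.0 - b + b * len(d) / avg_len
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # IDF: rare terms weigh more than common ones.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # TF saturation: each extra occurrence adds less than the last.
            score += idf * tf[term] * (k1 + 1.0) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores

# Hypothetical toy corpus, already lower-cased and split into tokens.
docs = [
    "the chip lm317 regulator".split(),
    "the the the voltage regulator regulator".split(),
    "the cat sat on the mat".split(),
]
scores = bm25_scores(["lm317", "regulator"], docs)
```

In this toy corpus the first document ranks highest because it contains the rare term "lm317"; the second matches only the more common "regulator" and is also longer than average, and the third matches nothing and scores zero.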
Where it fits
BM25 needs no model training, no GPU, and no embeddings; it runs on a plain inverted index, which makes it cheap and fully transparent for a self-hosted search box. Its blind spot is paraphrase: it cannot match "car" to "automobile" because it sees only tokens, not meaning.
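A toy inverted index makes both points concrete. The sketch below is only an illustration under simple assumptions (whitespace tokenisation, lower-casing, hypothetical helper names build_index and lookup); real engines add stemming, stopword handling, and compressed postings lists.

```python
from collections import defaultdict

def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def lookup(index, query):
    """Return sorted ids of documents containing at least one query token."""
    hits = set()
    for token in query.lower().split():
        hits |= index.get(token, set())
    return sorted(hits)

# Hypothetical two-document corpus.
docs = ["used car for sale", "automobile repair manual"]
index = build_index(docs)
car_hits = lookup(index, "car")          # only the document that literally says "car"
auto_hits = lookup(index, "automobile")  # only the document that literally says "automobile"
```

Because lookup is a plain dictionary access on tokens, the query "car" never reaches the document that says "automobile".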
That weakness is exactly why operators pair it with vector-based semantic search and merge the two ranked lists with Reciprocal Rank Fusion, giving a private RAG pipeline both literal precision and conceptual recall.
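Reciprocal Rank Fusion needs only the rank positions from each list, not the raw scores, which is why it can merge BM25 and vector results whose scores are on different scales. A minimal sketch follows; the function name, the document ids, and the two input rankings are hypothetical, and k=60 is a commonly used constant rather than a value given in the text.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list.

    Each document earns 1 / (k + rank) from every list it appears in,
    with ranks starting at 1; k=60 is an assumed, commonly used constant.
    """
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical outputs of a keyword search and a vector search.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
```

Documents that appear near the top of both lists (here doc_b and doc_a) rise above documents that only one retriever found.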
