Definition
Reranking is a precision step in retrieval pipelines that re-orders an initial set of candidate documents by how relevant each one is to the query. It sits between broad retrieval and answer generation, sharpening the results that a fast first-stage search returned. In retrieval-augmented generation, reranking helps keep loosely related context out of the generator's input.
Two-stage retrieval
Production retrieval typically runs in two stages. Stage one retrieves broadly: a vector or hybrid search pulls roughly 50 to 100 candidates, optimized for recall so that relevant documents are unlikely to be missed. Stage two ranks precisely: a reranker scores each candidate against the specific query and reorders them. Both stages exist because the embedding models used for first-stage search are trained for scalable similarity, not fine-grained ranking, while a reranker is too expensive to run over an entire corpus. A cheap broad sweep followed by an expensive precise pass over the shortlist is therefore the efficient design.
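The two stages can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not a production design: `first_stage_score` (cosine similarity over word counts) stands in for a vector or hybrid search, `toy_rerank_score` stands in for a real reranker model, and every function name, parameter, and document below is hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List

Scorer = Callable[[str, str], float]


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def first_stage_score(query: str, doc: str) -> float:
    """Cheap stand-in for vector search: cosine similarity over word counts."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    dot = sum(count * d[token] for token, count in q.items())
    norm = math.sqrt(sum(c * c for c in q.values())) * math.sqrt(sum(c * c for c in d.values()))
    return dot / norm if norm else 0.0


def toy_rerank_score(query: str, doc: str) -> float:
    """Toy stand-in for a reranker: share of adjacent query word pairs found in the doc."""
    q, d = tokenize(query), tokenize(doc)
    q_pairs = set(zip(q, q[1:]))
    d_pairs = set(zip(d, d[1:]))
    return len(q_pairs & d_pairs) / len(q_pairs) if q_pairs else 0.0


def retrieve_then_rerank(
    query: str,
    corpus: List[str],
    shortlist_size: int,
    top_k: int,
    rerank: Scorer = toy_rerank_score,
) -> List[str]:
    # Stage one: cheap, broad sweep over the whole corpus, tuned for recall.
    shortlist = sorted(
        corpus, key=lambda doc: first_stage_score(query, doc), reverse=True
    )[:shortlist_size]
    # Stage two: the more expensive scorer runs only on the shortlist.
    reranked = sorted(shortlist, key=lambda doc: rerank(query, doc), reverse=True)
    return reranked[:top_k]


# Hypothetical four-document corpus, for illustration only.
corpus = [
    "password policy password length password rotation and reset windows",
    "to reset password open account settings and choose security",
    "shipping times vary by region",
    "reset the router by holding the power button",
]

results = retrieve_then_rerank("reset password", corpus, shortlist_size=3, top_k=2)
print(results)
```

In this toy corpus the first stage ranks the keyword-heavy policy document highest, and the second stage moves the document that contains the exact phrase to the top. A real system would swap in an embedding index for stage one and a trained reranker for stage two.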
Cross-encoders
The classic reranker is a cross-encoder: it feeds the query and a candidate document into the model together, so every query token can attend to every document token, and outputs a single relevance score. This joint processing captures query-document interactions that separately computed embeddings miss, which is why cross-encoders typically rank more accurately than bi-encoder retrieval. The cost is one model run per candidate, so a cross-encoder is applied only to the shortlist.
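The per-candidate cost is what confines a cross-encoder to the shortlist. The arithmetic can be sketched roughly as follows, assuming one forward pass per (query, candidate) pair and passes run one after another. The latency and corpus size are made-up illustrative numbers, not measurements, and the function names are hypothetical.

```python
def cross_encoder_passes(num_candidates: int) -> int:
    """A cross-encoder scores each (query, candidate) pair jointly: one pass per candidate."""
    return num_candidates


def sequential_latency_ms(passes: int, ms_per_pass: float) -> float:
    """Total latency if passes run one after another (ignores batching)."""
    return passes * ms_per_pass


MS_PER_PASS = 5.0        # assumed latency per forward pass, illustrative only
CORPUS_SIZE = 1_000_000  # assumed corpus size, illustrative only
SHORTLIST_SIZE = 100     # upper end of the 50 to 100 candidate range

full_corpus_ms = sequential_latency_ms(cross_encoder_passes(CORPUS_SIZE), MS_PER_PASS)
shortlist_ms = sequential_latency_ms(cross_encoder_passes(SHORTLIST_SIZE), MS_PER_PASS)

print(f"whole corpus: {full_corpus_ms / 1000:.0f} s per query")
print(f"shortlist:    {shortlist_ms:.0f} ms per query")
```

Under these assumed numbers, scoring the whole corpus would take over an hour per query, while scoring the shortlist takes about half a second. Batching on an accelerator changes the absolute figures, but the cost still grows linearly with the number of candidates.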
D-Central documents reranking as the quality multiplier on top of semantic search. Running both stages locally keeps a private knowledge base self-hosted while still delivering precise retrieval.
Estimate the cost of a rerank pass with the inference cost calculator.
