Definition
Cosine similarity measures how alike two vectors are by computing the cosine of the angle between them. The result ranges from 1 (identical direction), through 0 (orthogonal, unrelated), to -1 (opposite direction). It is one of the most common scoring functions used to rank embeddings in a vector database, because it compares the direction of meaning encoded in a vector while ignoring its magnitude.
Relationship to the dot product
Cosine similarity equals the dot product of two vectors divided by the product of their magnitudes. When the vectors are normalised to unit length, the denominator becomes 1, so cosine similarity reduces to a plain dot product. This is why many vector stores normalise embeddings on insert and then use the cheaper dot product internally: the two are mathematically equivalent for unit vectors, and the dot product is faster to compute at scale.
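The equivalence described above can be checked numerically. The following is a minimal sketch in plain Python; the helper names (dot, norm, normalise, cosine_similarity) and the example vectors are illustrative assumptions, not the API of any particular vector store, and the code assumes non-zero vectors.

```python
import math

def dot(a, b):
    # Sum of element-wise products.
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    # Euclidean length (magnitude) of the vector.
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    # Dot product divided by the product of the magnitudes.
    # Assumes neither vector is all zeros (that would divide by zero).
    return dot(a, b) / (norm(a) * norm(b))

def normalise(a):
    # Scale the vector to unit length, as a store might do on insert.
    n = norm(a)
    return [x / n for x in a]

# Hypothetical example vectors.
a = [1.0, 2.0, 3.0]
b = [2.0, 0.5, 1.0]

cos_ab = cosine_similarity(a, b)
dot_unit = dot(normalise(a), normalise(b))

# The two values agree up to floating-point rounding.
print(math.isclose(cos_ab, dot_unit))
```

Normalising once at insert time moves the division out of the query path, which is the saving the paragraph above refers to.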
Why direction beats distance for text
For text, what matters is the overall content or topic, not how "long" the document's vector is. Two passages can express the same idea at different lengths; cosine focuses on the shared direction rather than absolute position, making it well suited to semantic search and document classification. Euclidean distance, by contrast, is sensitive to magnitude and is preferred for some clustering tasks.
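As a toy illustration of the difference, the sketch below scales one vector by a constant and compares the two metrics. The vectors and the factor of 10 are made up for the example, and treating a longer passage as a scaled copy of a shorter one is a simplifying assumption; real embedding models do not necessarily behave this way.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    # Straight-line distance between the two points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical vectors: same direction, very different magnitude.
short_doc = [1.0, 2.0, 3.0]
long_doc = [10.0 * x for x in short_doc]

cos = cosine_similarity(short_doc, long_doc)
dist = euclidean_distance(short_doc, long_doc)

# Cosine treats the pair as pointing the same way;
# Euclidean distance reports them as far apart.
print(f"cosine similarity:  {cos:.4f}")
print(f"euclidean distance: {dist:.4f}")
```

Here cosine similarity comes out at essentially 1 while the Euclidean distance is large, which is the magnitude sensitivity described above.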
A caveat worth knowing: cosine similarity can produce misleading results on embeddings from a model that was not trained with it in mind, which is why embedding models are typically trained with cosine similarity as the objective. When you build a private retrieval pipeline feeding a local LLM, match your similarity metric to how the embedding model was trained.
