Sentence Embedding

Sovereign AI

Definition

A sentence embedding is a single fixed-length vector that represents the meaning of an entire sentence or short passage, rather than one vector per word. It lets you compare two texts directly with a similarity metric such as cosine similarity, which is the foundation of semantic search, clustering, and deduplication. Sentence embeddings are the unit that gets stored and indexed in a vector database.
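To make the comparison step concrete, here is a minimal sketch of cosine similarity in plain Python. The function name `cosine_similarity` and the three-dimensional toy vectors are illustrative assumptions; real sentence embeddings come from a trained model and typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, made up for illustration only.
invoice_due = [0.20, 0.90, 0.10]
payment_deadline = [0.25, 0.85, 0.05]
cat_nap = [0.90, 0.10, 0.30]

# The two vectors pointing in nearly the same direction score higher.
print(cosine_similarity(invoice_due, payment_deadline))
print(cosine_similarity(invoice_due, cat_nap))
```

Because the score depends only on direction, not on vector length, two texts of different sizes can still be compared on equal footing.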

Why not just average word vectors?

An early approach was to average the word vectors in a sentence, but this discards word order and context, so it captures meaning poorly. Modern sentence embedding models, popularised by Sentence-BERT (SBERT), instead fine-tune a transformer on sentence pairs using a siamese (twin-tower) architecture, in which both sentences pass through the same shared-weight encoder, and apply mean pooling over the final-layer token representations. The original SBERT work found that mean pooling outperformed both max pooling and using the [CLS] token, and the resulting embeddings consistently beat averaged word embeddings on semantic similarity benchmarks.
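The three pooling options mentioned above can be sketched on toy data. This is an illustrative sketch, not SBERT's actual code: the function names, the attention-mask convention (1 for a real token, 0 for padding), and the two-dimensional token vectors are assumptions. It also assumes the first row plays the role of the [CLS] token. In a real model the token vectors would be the transformer's contextual outputs.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the vectors of real tokens, ignoring padding positions."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("one mask entry is required per token vector")
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("at least one unmasked token is required")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def max_pool(token_vectors, attention_mask):
    """Take the per-dimension maximum over real tokens."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("at least one unmasked token is required")
    return [max(values) for values in zip(*kept)]


def cls_pool(token_vectors):
    """Use the first token's vector (assumed here to be [CLS]) as the sentence vector."""
    return list(token_vectors[0])


# Toy token representations: three real tokens plus one padding row.
tokens = [[0.0, 2.0], [4.0, 6.0], [2.0, 1.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]

print(mean_pool(tokens, mask))  # [2.0, 3.0]
print(max_pool(tokens, mask))   # [4.0, 6.0]
print(cls_pool(tokens))         # [0.0, 2.0]
```

Whichever strategy is used, the output has the same length regardless of how many tokens the sentence contains, which is what makes it a fixed-length sentence embedding.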

How they fit a retrieval pipeline

You split documents with a chunking strategy, run each chunk through the embedding model to get a sentence (or passage) embedding, and store the vectors. At query time you embed the question with the same model and retrieve the stored vectors nearest to it. Keeping the model and the index local means your private documents, and the vectors that encode their meaning, never leave your machine.
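The chunk, embed, store, and query steps can be sketched end to end in plain Python. Everything here is a stand-in chosen for illustration: `toy_embed` is a hypothetical bag-of-words function, not a trained sentence embedding model (it is exactly the kind of order-insensitive representation a real model improves on), the fixed word-count `chunk` helper is one simple chunking strategy among many, and the index is an in-memory list scanned by brute force rather than a vector database.

```python
import math


def tokenize(text):
    """Lowercase and split on whitespace, trimming basic punctuation."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in words if w]


def chunk(text, max_words=40):
    """Split text into consecutive pieces of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_vocab(chunks):
    """Assign each distinct word a fixed position in the vector."""
    vocab = {}
    for piece in chunks:
        for word in tokenize(piece):
            vocab.setdefault(word, len(vocab))
    return vocab


def toy_embed(text, vocab):
    """Stand-in embedding: unit-length word counts over a fixed vocabulary."""
    vec = [0.0] * len(vocab)
    for word in tokenize(text):
        if word in vocab:
            vec[vocab[word]] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def build_index(documents):
    """Chunk every document, embed each chunk, and keep (chunk, vector) pairs."""
    chunks = [piece for doc in documents for piece in chunk(doc)]
    vocab = build_vocab(chunks)
    return vocab, [(piece, toy_embed(piece, vocab)) for piece in chunks]


def search(query, vocab, index, k=1):
    """Embed the query the same way and return the k most similar chunks."""
    q = toy_embed(query, vocab)
    # Vectors are unit length, so the dot product equals cosine similarity.
    scored = [(sum(a * b for a, b in zip(q, vec)), piece) for piece, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


documents = [
    "The invoice is due on Friday.",
    "Cats sleep most of the day.",
]
vocab, index = build_index(documents)
top = search("When is the invoice due?", vocab, index, k=1)
print(top[0][1])  # The invoice is due on Friday.
```

In a real deployment you would replace `toy_embed` with a call to a locally hosted embedding model and the list scan with a nearest-neighbour index; the shape of the pipeline stays the same.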

Sentence embeddings are the connective tissue of a self-hosted RAG stack: they turn raw text into the searchable embeddings a local LLM retrieves from.

See what fits your VRAM in the GPU–LLM fit dataset.
