Definition
An image embedding is a fixed-length vector of numbers that encodes the visual content of an image. A neural network, usually a vision encoder, compresses the picture's pixels into a high-dimensional vector where the geometry carries meaning: images that look or mean something similar land close together, while unrelated images sit far apart. This turns the fuzzy problem of "are these two pictures alike?" into a concrete math operation on vectors.
Measuring similarity
Once images are embedded, you compare them with a similarity or distance measure, most commonly cosine similarity or Euclidean (L2) distance. This comparison is the engine behind reverse image search, near-duplicate detection, content moderation, and visual recommendation. Multimodal models such as CLIP train an image encoder and a text encoder together so that both map into the same vector space, which means you can search a photo library with a typed phrase.
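As a rough illustration of the two measures named above, the sketch below computes cosine similarity and Euclidean distance using only the Python standard library. The four-dimensional vectors and their names (cat_a, cat_b, car) are made-up stand-ins; real embeddings typically have hundreds of dimensions and come from an encoder, and in practice you would likely use a numerical library rather than plain lists.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line (L2) distance between two vectors: 0.0 means identical."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical toy embeddings; real ones would come from a vision encoder.
cat_a = [0.9, 0.1, 0.0, 0.2]
cat_b = [0.8, 0.2, 0.1, 0.3]
car = [0.0, 0.9, 0.8, 0.1]

# Similar images: higher cosine similarity, smaller distance.
print(cosine_similarity(cat_a, cat_b), euclidean_distance(cat_a, cat_b))
# Unrelated images: lower cosine similarity, larger distance.
print(cosine_similarity(cat_a, car), euclidean_distance(cat_a, car))
```

Note that the two measures point in opposite directions: a higher cosine similarity means "more alike," while a lower Euclidean distance does. When vectors are normalized to unit length, both produce the same ranking.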
Self-hosted visual search
Image embeddings let you build a private visual search index entirely on your own hardware. You run an open-weight encoder locally, store the resulting vectors in a vector database, and query them without sending a single image to an outside service. That is an attractive property for anyone curating a sensitive or personal collection.
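The store-and-query step can be sketched as a tiny in-memory index. Everything here is hypothetical: TinyVectorIndex is an illustrative stand-in for a real vector database, the file names are invented, and the three-dimensional vectors substitute for the output of a locally run encoder. It does a brute-force scan, which is workable for small collections; larger ones usually rely on approximate nearest-neighbor search.

```python
import math

class TinyVectorIndex:
    """Brute-force in-memory index; a stand-in for a real vector database."""

    def __init__(self):
        self._ids = []
        self._vectors = []

    @staticmethod
    def _normalize(vec):
        # Scale to unit length so a dot product equals cosine similarity.
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0:
            raise ValueError("cannot index a zero vector")
        return [x / norm for x in vec]

    def add(self, image_id, embedding):
        """Store one embedding under an identifier such as a file name."""
        self._ids.append(image_id)
        self._vectors.append(self._normalize(embedding))

    def search(self, query, k=3):
        """Return the k stored items most similar to the query vector."""
        q = self._normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, v)), image_id)
            for image_id, v in zip(self._ids, self._vectors)
        ]
        scored.sort(reverse=True)  # highest similarity first
        return [(image_id, score) for score, image_id in scored[:k]]

# Assumed toy vectors; in a real setup these come from the local encoder.
index = TinyVectorIndex()
index.add("beach.jpg", [0.9, 0.1, 0.0])
index.add("harbor.jpg", [0.8, 0.3, 0.1])
index.add("invoice.png", [0.0, 0.2, 0.9])

results = index.search([1.0, 0.1, 0.0], k=2)
print([image_id for image_id, _ in results])
```

With a CLIP-style model, the query vector could come from the text encoder instead of an image, and the same search call would serve typed-phrase lookups.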
Image embeddings are the visual cousin of the text vectors covered in our embeddings entry, and they are typically produced by a vision encoder.
