Image Patch Embedding

Definition

Image patch embedding is the operation that converts a raw image into a sequence of vectors a transformer can read. Instead of feeding pixels directly, the image is divided into a grid of fixed-size patches (commonly 14x14 or 16x16 pixels); each patch is flattened and passed through a linear projection to produce an embedding vector. This patchification is what made the Vision Transformer possible, because it reframes an image as a short sequence of tokens analogous to words in a sentence.
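The split, flatten, and project steps can be sketched in a few lines of NumPy. This is a minimal illustration, not any particular model's implementation: the helper names `patchify` and `embed_patches`, the embedding width of 768, and the random weights are all assumptions chosen for the example. A trained model would learn `weight` and `bias`.

```python
import numpy as np

def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, c) -> (gh, gw, p, p, c) -> (gh * gw, p * p * c)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)

def embed_patches(image, weight, bias, patch_size):
    """Linear projection of every flattened patch to an embedding vector."""
    return patchify(image, patch_size) @ weight + bias

# Hypothetical sizes: a 224x224 RGB image, 16-pixel patches, 768-wide embeddings.
rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
patch_size, embed_dim = 16, 768
weight = rng.normal(scale=0.02, size=(patch_size * patch_size * 3, embed_dim))
bias = np.zeros(embed_dim)

tokens = embed_patches(image, weight, bias, patch_size)
print(tokens.shape)  # (196, 768)
```

Each row of `tokens` corresponds to one patch, read left to right and top to bottom across the grid.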

From patches to a sequence

After projection, a positional encoding is added to each patch embedding so the model knows where each patch sat in the original grid; attention on its own is insensitive to the order of its inputs, so without this step the spatial layout would be lost. The resulting sequence then flows through standard transformer blocks. A 224x224 image at a 16-pixel patch size forms a 14x14 grid and therefore yields 196 patches, so patch size directly controls sequence length, compute cost, and the granularity of detail the model can resolve.
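As a rough sketch of this step, the snippet below computes the patch count for the 224x224 example and adds one position vector per patch. The names `num_patches` and `add_positions` are hypothetical, the embedding width of 64 is arbitrary, and the randomly initialised position table stands in for whatever learned or fixed encoding a real model uses.

```python
import numpy as np

def num_patches(height, width, patch_size):
    """Number of non-overlapping patches in the grid."""
    return (height // patch_size) * (width // patch_size)

def add_positions(patch_embeddings, position_table):
    """Add one position vector to each patch embedding, row by row."""
    n = patch_embeddings.shape[0]
    return patch_embeddings + position_table[:n]

rng = np.random.default_rng(0)
n = num_patches(224, 224, 16)   # 14 x 14 grid
embed_dim = 64                  # illustrative width, not from the source

patch_embeddings = rng.normal(size=(n, embed_dim))
# Assumed stand-in for a position table: one row per grid position.
position_table = rng.normal(scale=0.02, size=(n, embed_dim))

sequence = add_positions(patch_embeddings, position_table)
print(n, sequence.shape)  # 196 (196, 64)
```

The addition leaves the sequence length and width unchanged; it only makes otherwise identical patches distinguishable by location.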

Role in multimodal systems

In a vision-language model, patch embeddings produced by the vision encoder are the raw material that a projector later maps into the language model's space. Smaller patches preserve fine detail but multiply the token count; larger patches are cheaper but blur small features. That trade-off is central to running multimodal models efficiently on owned hardware.
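The trade-off can be made concrete with a little arithmetic. The sketch below counts tokens for a 224x224 image at a few patch sizes and uses the number of token pairs as a rough proxy for self-attention cost, which grows quadratically with sequence length. The function names are hypothetical, and the proxy ignores extras such as class tokens or any token merging a real system might apply.

```python
def token_count(image_size, patch_size):
    """Patches per square image, assuming the size divides evenly."""
    side = image_size // patch_size
    return side * side

def attention_pairs(tokens):
    """Token pairs compared by full self-attention (a rough cost proxy)."""
    return tokens * tokens

for patch_size in (14, 16, 32):
    tokens = token_count(224, patch_size)
    print(patch_size, tokens, attention_pairs(tokens))
# 14 256 65536
# 16 196 38416
# 32 49 2401
```

Going from 16-pixel to 14-pixel patches adds about 30% more tokens but roughly 70% more attention pairs, which is why patch size matters so much for memory and latency budgets.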

Patch embeddings feed the vision encoder, whose outputs form the visual token sequence; multimodal alignment later maps that sequence into the same representation space as text.
