Definition
A Convolutional Neural Network (CNN) is a class of deep neural network that uses convolutional layers to filter inputs for spatially local patterns. CNNs are the workhorse architecture of computer vision, powering image classification, object detection, and segmentation, and they also appear in audio and signal-processing tasks.
Convolution and feature maps
The core operation is convolution: a small matrix of learnable weights, called a kernel or filter, slides across the input image and computes a dot product at each position. The result is a feature map highlighting where a particular pattern appears. A single convolutional layer applies many filters in parallel, producing a stack of feature maps. Pooling layers then downsample these maps to reduce dimensionality and add a degree of translation invariance.
Hierarchical feature extraction
Stacking convolutional and pooling layers builds a hierarchy of features. Early layers learn simple primitives such as edges and corners; deeper layers combine those into textures, object parts, and eventually whole objects. Because filters are reused across the entire image (weight sharing), CNNs use far fewer parameters than a fully connected network of comparable reach, which makes them efficient to train and to run for inference — a meaningful advantage when self-hosting vision models on modest hardware.
While transformer-based vision models have grown popular, CNNs remain dominant for many efficient, on-device vision workloads. For sequence-oriented architectures, contrast the CNN with the Recurrent Neural Network (RNN), and see how convolutional features feed into a multimodal AI model.
In Simple Terms
A Convolutional Neural Network (CNN) is a class of deep neural network that uses convolutional layers to filter inputs for spatially local patterns. CNNs are…
