Definition
ControlNet is a neural network architecture, introduced by Lvmin Zhang and collaborators in 2023, that adds precise spatial control to large pre-trained text-to-image diffusion models. A plain text prompt tells a diffusion model what to draw but not where to place it. ControlNet supplies that missing spatial guidance by conditioning generation on a structured input image, such as a Canny edge map, a depth map, a human-pose skeleton, or a segmentation mask.
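To make "structured input image" concrete, here is a minimal sketch of turning a grayscale image into a binary edge map. It uses a plain gradient-magnitude threshold, which is a deliberate simplification and not the Canny algorithm; the function name `edge_map` and the `threshold` parameter are hypothetical and chosen only for illustration.

```python
def edge_map(image, threshold=1.0):
    """Return a binary edge map for a grayscale image given as a list of rows.

    Simplified stand-in for an edge detector: a pixel is marked as an edge
    when the magnitude of its central-difference gradient reaches `threshold`.
    Border pixels are left at 0.
    """
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = image[r][c + 1] - image[r][c - 1]  # horizontal change
            gy = image[r + 1][c] - image[r - 1][c]  # vertical change
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[r][c] = 1
    return edges


# Toy 6x6 image: dark left half, bright right half.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
edges = edge_map(image)
```

The resulting map marks only the vertical boundary between the two halves. A map of this kind tells the generator where outlines should fall, while the text prompt still decides what is drawn.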
How it works
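ControlNet locks the weights of the pre-trained diffusion model and clones its encoder blocks into a trainable copy. The copy receives the conditioning image and feeds its outputs back into the frozen model through "zero convolution" layers: 1×1 convolutions whose weights and biases are both initialized to zero. Because those connections start at zero, the added branch contributes nothing at the first training step, so the combined model initially behaves exactly like the base model. The branch then learns the new conditioning gradually without corrupting what the base model already knows. This design lets builders attach control to a model like Stable Diffusion while preserving its original generation quality.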
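The zero-initialization idea can be illustrated with a toy sketch in plain Python, where a 1×1 convolution at a single pixel is just a linear map over channels. Everything here is an assumption made for illustration: the two-channel "blocks", their weights, and the names `frozen_block`, `trainable_copy`, and `controlled_block` are hypothetical, and the loss is a placeholder. It is not the reference implementation.

```python
def linear(weights, bias, x):
    """A 1x1 convolution at one pixel: a linear map over the channel vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def zero_conv(n_out, n_in):
    """Weights and bias of a zero convolution: everything starts at 0."""
    return [[0.0] * n_in for _ in range(n_out)], [0.0] * n_out


# Hypothetical pre-trained weights shared by the frozen block and its clone.
BASE_W = [[0.5, -1.0], [2.0, 0.25]]
BASE_B = [0.1, -0.2]


def frozen_block(x):
    """Stand-in for a frozen base-model block; its weights never change."""
    return linear(BASE_W, BASE_B, x)


def trainable_copy(x):
    """Stand-in for the trainable clone, initialized from the same weights."""
    return linear(BASE_W, BASE_B, x)


def controlled_block(x, cond, zin, zout):
    """y = F(x) + Z_out(F_copy(x + Z_in(cond)))."""
    injected = [a + b for a, b in zip(x, linear(*zin, cond))]
    branch = linear(*zout, trainable_copy(injected))
    return [a + b for a, b in zip(frozen_block(x), branch)]


x = [1.0, 2.0]        # features inside the diffusion model
cond = [0.3, -0.7]    # features derived from the conditioning image
zin, zout = zero_conv(2, 2), zero_conv(2, 2)

baseline = frozen_block(x)
y0 = controlled_block(x, cond, zin, zout)  # identical to baseline at init

# For the placeholder loss L = sum(y), the gradient with respect to the
# output zero convolution's weights is dL/dW[i][j] = h[j], where h is the
# clone's output. It is nonzero even though the weights themselves are zero.
h = trainable_copy([a + b for a, b in zip(x, linear(*zin, cond))])
grad_w = [list(h) for _ in range(2)]

# One gradient-descent step moves the weights away from zero.
lr = 0.1
w_out, b_out = zout
w_out = [[w - lr * g for w, g in zip(w_row, g_row)]
         for w_row, g_row in zip(w_out, grad_w)]
y1 = controlled_block(x, cond, zin, (w_out, b_out))  # now differs from baseline
```

The point of the sketch is that zero weights do not imply zero gradients: the control branch is silent at initialization, yet it can start learning on the first update.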
Why it is useful
ControlNet turns a generic image generator into a controllable tool: a sketch becomes a finished render, a pose reference dictates a character's stance, and a depth map keeps geometry consistent. Multiple ControlNets can be stacked for layered control, for example a pose skeleton combined with a depth map. Because each trained ControlNet is a separate set of weights, smaller than the base model and attached to it without modifying it, the community shares them freely.
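One plausible way to stack several ControlNets is to scale each adapter's residual by a per-adapter weight and sum the results before adding them to the frozen model's features. The sketch below assumes exactly that scheme; the text above does not specify how stacking is implemented, and the function name `combine_residuals` and the example values are hypothetical.

```python
def combine_residuals(residuals, scales):
    """Weighted element-wise sum of residual vectors, one per adapter."""
    if len(residuals) != len(scales):
        raise ValueError("need exactly one scale per residual")
    combined = [0.0] * len(residuals[0])
    for residual, scale in zip(residuals, scales):
        for i, value in enumerate(residual):
            combined[i] += scale * value
    return combined


# Toy residuals from two adapters; the depth adapter is given half weight.
pose_residual = [1.0, 0.0, -2.0]
depth_residual = [0.0, 4.0, 2.0]
combined = combine_residuals([pose_residual, depth_residual], [1.0, 0.5])
```

Under this assumption, lowering an adapter's scale weakens its influence without retraining anything, which is what makes layered control practical to tune.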
ControlNet builds directly on the diffusion model it conditions. For the broader category these tools belong to, see our entry on the multimodal model.
