Backpropagation

The algorithm that efficiently computes gradients of the loss function with respect to every weight in a neural network by propagating error signals backward from the output layer to the input layer.

Backpropagation is the engine that makes training deep neural networks computationally feasible. After a forward pass computes the model's prediction and the loss measures how wrong it is, backpropagation traces back through the network, computing how much each weight contributed to the error using the chain rule of calculus. These gradients then guide weight updates via gradient descent.
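This loop can be sketched in miniature. The following is a hypothetical one-weight example (the model `y = w*x`, the data, and the learning rate are all assumptions for illustration): a forward pass, a squared-error loss, a hand-derived gradient, and a gradient-descent update.

```python
import numpy as np

# Assumed toy setup: fit y = w*x with squared-error loss,
# training a single weight by gradient descent.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x  # target relationship the model should recover

w = 0.0       # initial weight
lr = 0.1      # learning rate
for _ in range(50):
    pred = w * x                        # forward pass: model prediction
    loss = np.mean((pred - y) ** 2)     # loss measures how wrong it is
    grad = np.mean(2 * (pred - y) * x)  # dLoss/dw via the chain rule
    w -= lr * grad                      # gradient-descent update

print(round(w, 3))  # w converges near 3.0
```

In a real network the only change is that `grad` is computed for millions of weights at once, which is exactly what backpropagation makes efficient.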

The algorithm works by applying the chain rule layer by layer, from output to input. Each layer computes how its inputs affected its outputs (the local gradient) and multiplies that by the gradient arriving from the layer nearer the output. This recursive multiplication decomposes the end-to-end gradient computation into simple per-layer operations, largely matrix products that parallelize well on GPUs.
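The per-layer chain rule can be written out by hand for a tiny assumed network (input → linear → ReLU → linear → squared loss; the shapes and random values here are illustrative, not from the source):

```python
import numpy as np

# Assumed tiny network: x -> Linear(W1) -> ReLU -> Linear(W2) -> scalar loss.
rng = np.random.default_rng(1)
x = rng.normal(size=(4,))
W1 = rng.normal(size=(3, 4)) * 0.5
W2 = rng.normal(size=(1, 3)) * 0.5
target = 1.0

# Forward pass, caching the intermediates each layer needs
# to form its local gradient later.
z1 = W1 @ x                  # linear layer 1
h = np.maximum(z1, 0.0)      # ReLU
out = (W2 @ h)[0]            # linear layer 2 (scalar output)
loss = (out - target) ** 2

# Backward pass: each step multiplies the upstream gradient
# by that layer's local gradient (the chain rule, recursively).
g_out = 2 * (out - target)   # dLoss/dout
gW2 = g_out * h[None, :]     # dLoss/dW2
g_h = g_out * W2[0]          # gradient flowing back into ReLU
g_z1 = g_h * (z1 > 0)        # ReLU's local gradient is a 0/1 mask
gW1 = np.outer(g_z1, x)      # dLoss/dW1
```

Each `g_*` line is one application of the chain rule; checking any entry of `gW1` against a finite-difference estimate of the loss confirms the decomposition.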

The practical challenges of backpropagation include vanishing gradients (gradients shrink to near-zero in deep networks, preventing early layers from learning) and exploding gradients (gradients grow uncontrollably, destabilizing training). Solutions like residual connections (skip connections), careful weight initialization, gradient clipping, and normalization layers (batch norm, layer norm) have made backpropagation reliable even in networks with hundreds of layers. These are now standard components of modern architectures.
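Of these remedies, gradient clipping is the simplest to sketch. Below is a minimal global-norm clipper (the function name and the threshold `max_norm=1.0` are assumptions for illustration): if the combined norm of all gradients exceeds the threshold, every gradient is rescaled so a single oversized step cannot destabilize training.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    # Compute the norm over all gradient arrays jointly,
    # then rescale everything by the same factor if it is too large.
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads]

big = [np.array([30.0, 40.0])]    # "exploding" gradient, norm 50
small = [np.array([0.3, 0.4])]    # well-behaved gradient, norm 0.5

print(clip_by_global_norm(big)[0])    # rescaled down to norm ~1
print(clip_by_global_norm(small)[0])  # passes through unchanged
```

Using one global scale factor (rather than clipping each array separately) preserves the relative direction of the update, which is why this variant is common in practice.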

Related Terms