Backpropagation

The algorithm that efficiently computes gradients of the loss function with respect to every weight in a neural network by propagating error signals backward from the output layer to the input layer.

Backpropagation is the engine that makes training deep neural networks computationally feasible. After a forward pass computes the model's prediction and the loss measures how wrong it is, backpropagation traces back through the network, computing how much each weight contributed to the error using the chain rule of calculus. These gradients then guide weight updates via gradient descent.
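This loop can be sketched in miniature. The following is a hypothetical one-weight example (the model `y = w*x`, the data, and the learning rate are all assumptions for illustration): a forward pass, a squared-error loss, a hand-derived gradient, and a gradient-descent update.

```python
import numpy as np

# Assumed toy setup: fit y = w*x with squared-error loss,
# training a single weight by gradient descent.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x  # target relationship the model should recover

w = 0.0       # initial weight
lr = 0.1      # learning rate
for _ in range(50):
    pred = w * x                        # forward pass: model prediction
    loss = np.mean((pred - y) ** 2)     # loss measures how wrong it is
    grad = np.mean(2 * (pred - y) * x)  # dLoss/dw via the chain rule
    w -= lr * grad                      # gradient-descent update

print(round(w, 3))  # w converges near 3.0
```

In a real network the only change is that `grad` is computed for millions of weights at once, which is exactly what backpropagation makes efficient.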

The algorithm works by applying the chain rule layer by layer, from output to input. Each layer computes how its inputs affected its outputs (the local gradient) and multiplies that by the gradient arriving from the layer nearer the output. This recursive multiplication decomposes the end-to-end gradient computation into simple per-layer operations, largely matrix products that parallelize well on GPUs.
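The per-layer chain rule can be written out by hand for a tiny assumed network (input → linear → ReLU → linear → squared loss; the shapes and random values here are illustrative, not from the source):

```python
import numpy as np

# Assumed tiny network: x -> Linear(W1) -> ReLU -> Linear(W2) -> scalar loss.
rng = np.random.default_rng(1)
x = rng.normal(size=(4,))
W1 = rng.normal(size=(3, 4)) * 0.5
W2 = rng.normal(size=(1, 3)) * 0.5
target = 1.0

# Forward pass, caching the intermediates each layer needs
# to form its local gradient later.
z1 = W1 @ x                  # linear layer 1
h = np.maximum(z1, 0.0)      # ReLU
out = (W2 @ h)[0]            # linear layer 2 (scalar output)
loss = (out - target) ** 2

# Backward pass: each step multiplies the upstream gradient
# by that layer's local gradient (the chain rule, recursively).
g_out = 2 * (out - target)   # dLoss/dout
gW2 = g_out * h[None, :]     # dLoss/dW2
g_h = g_out * W2[0]          # gradient flowing back into ReLU
g_z1 = g_h * (z1 > 0)        # ReLU's local gradient is a 0/1 mask
gW1 = np.outer(g_z1, x)      # dLoss/dW1
```

Each `g_*` line is one application of the chain rule; checking any entry of `gW1` against a finite-difference estimate of the loss confirms the decomposition.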

The practical challenges of backpropagation include vanishing gradients (gradients shrink to near-zero in deep networks, preventing early layers from learning) and exploding gradients (gradients grow uncontrollably, destabilizing training). Solutions like residual connections (skip connections), careful weight initialization, gradient clipping, and normalization layers (batch norm, layer norm) have made backpropagation reliable even in networks with hundreds of layers. These are now standard components of modern architectures.
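Of these remedies, gradient clipping is the simplest to sketch. Below is a minimal global-norm clipper (the function name and the threshold `max_norm=1.0` are assumptions for illustration): if the combined norm of all gradients exceeds the threshold, every gradient is rescaled so a single oversized step cannot destabilize training.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    # Compute the norm over all gradient arrays jointly,
    # then rescale everything by the same factor if it is too large.
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads]

big = [np.array([30.0, 40.0])]    # "exploding" gradient, norm 50
small = [np.array([0.3, 0.4])]    # well-behaved gradient, norm 0.5

print(clip_by_global_norm(big)[0])    # rescaled down to norm ~1
print(clip_by_global_norm(small)[0])  # passes through unchanged
```

Using one global scale factor (rather than clipping each array separately) preserves the relative direction of the update, which is why this variant is common in practice.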

Related Terms