Batch Normalization
A technique that normalizes layer inputs across the training batch to have zero mean and unit variance, stabilizing and accelerating neural network training by reducing internal covariate shift.
Batch normalization (BatchNorm) addresses a fundamental training challenge: as weights update during training, the distribution of inputs to each layer shifts, forcing subsequent layers to constantly adapt to a moving target. By normalizing each layer's inputs to a standard distribution, BatchNorm stabilizes training and allows higher learning rates without divergence.
The technique computes the mean and variance of each feature across the current mini-batch, normalizes the values, and then applies learned scale and shift parameters that allow the network to undo the normalization if it is not helpful. During inference, running statistics accumulated during training replace the batch statistics, making predictions deterministic.
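The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation — the function name, argument layout, and momentum value are choices made for this example (a `(batch, features)` input is assumed, as in a fully connected layer):

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, running_mean, running_var,
                      training=True, momentum=0.1, eps=1e-5):
    """Batch norm forward pass over a (batch, features) array."""
    if training:
        mean = x.mean(axis=0)   # per-feature mean across the mini-batch
        var = x.var(axis=0)     # per-feature variance across the mini-batch
        # Accumulate running statistics for use at inference time
        running_mean = (1 - momentum) * running_mean + momentum * mean
        running_var = (1 - momentum) * running_var + momentum * var
    else:
        # Inference: use the accumulated statistics, so output is deterministic
        mean, var = running_mean, running_var
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalize to zero mean, unit variance
    out = gamma * x_hat + beta               # learned scale and shift
    return out, running_mean, running_var
```

With `gamma = 1` and `beta = 0`, the training-mode output has zero mean and unit variance per feature; the learned `gamma` and `beta` let the network recover the original distribution if the normalization hurts.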
In practice, batch normalization enables faster convergence (often 2-5x fewer training steps), allows higher learning rates, reduces sensitivity to weight initialization, and acts as a mild regularizer. However, it has limitations: it depends on batch size (small batches give noisy statistics), it behaves differently during training and inference, and it is not ideal for sequence models. Layer normalization, which normalizes across features rather than the batch dimension, has become the standard in transformers and other sequence models.
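The difference between the two schemes comes down to the axis the statistics are computed over. A brief sketch, assuming a `(batch, features)` array:

```python
import numpy as np

x = np.random.default_rng(1).normal(size=(8, 16))  # (batch, features)
eps = 1e-5

# Batch norm: statistics per feature, computed across the batch (axis 0).
# Depends on the other examples in the mini-batch.
bn = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

# Layer norm: statistics per example, computed across the features (axis 1).
# Independent of batch size, so it works for batch size 1 and for sequences.
ln = (x - x.mean(axis=1, keepdims=True)) / np.sqrt(x.var(axis=1, keepdims=True) + eps)
```

Because layer norm's statistics never mix information across examples, it behaves identically in training and inference, which is one reason it suits autoregressive sequence models.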
Related Terms
RAG (Retrieval-Augmented Generation)
A technique that grounds LLM responses in external data by retrieving relevant documents at query time and injecting them into the prompt context.
Embeddings
Dense vector representations of text, images, or other data that capture semantic meaning in a high-dimensional space, enabling similarity search and clustering.
Vector Database
A specialized database optimized for storing, indexing, and querying high-dimensional vector embeddings with sub-millisecond similarity search.
LLM (Large Language Model)
A neural network trained on massive text corpora that can generate, understand, and transform natural language for tasks like summarization, classification, and conversation.
Fine-Tuning
The process of further training a pre-trained LLM on a domain-specific dataset to specialize its behavior, style, or knowledge for a particular task.
Prompt Engineering
The practice of designing and iterating on LLM input instructions to reliably produce desired outputs for a specific task.