Dropout
A regularization technique that randomly deactivates a fraction of neurons during each training step, forcing the network to learn redundant representations and preventing over-reliance on any single neuron.
Dropout is one of the simplest and most effective regularization methods. During each training forward pass, each neuron is temporarily set to zero with probability p (commonly 0.1–0.5). This forces the remaining neurons to compensate, learning distributed representations in which no single neuron is critical. In the classic formulation, all neurons are active at inference time and outputs are scaled by (1 − p) to keep expected magnitudes consistent; most modern frameworks instead use "inverted" dropout, scaling surviving activations by 1/(1 − p) during training so that inference needs no adjustment.
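A minimal NumPy sketch of the classic behavior described above (the function name and RNG handling are illustrative, not from any particular framework):

```python
import numpy as np

def dropout(x, p=0.3, training=True, rng=None):
    """Classic dropout: zero each unit with probability p during training;
    scale activations by (1 - p) at inference so expected magnitudes match."""
    if rng is None:
        rng = np.random.default_rng(0)
    if training:
        mask = rng.random(x.shape) >= p  # keep each unit with probability 1 - p
        return x * mask
    return x * (1.0 - p)

x = np.ones(8)
train_out = dropout(x, p=0.5)                  # roughly half the units zeroed
eval_out = dropout(x, p=0.5, training=False)   # every unit scaled to 0.5
```

Because each unit survives with probability 1 − p, the expected activation during training matches the deterministically scaled activation at inference.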
The intuition is that dropout trains an implicit ensemble of exponentially many sub-networks. Each training step uses a different random subset of neurons, and the final model approximates the average of all these sub-networks. This ensemble effect reduces overfitting because individual sub-networks may memorize different noise patterns, but their average captures the true signal.
In modern architectures, dropout is most commonly applied to attention weights and feed-forward layers in transformers, and to fully connected layers elsewhere. The dropout rate is a key hyperparameter: too low provides little regularization, while too high (above roughly 0.5) can prevent the network from learning at all. For production models, dropout is a cheap and reliable way to improve generalization, especially when training data is limited relative to model capacity.
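In PyTorch, for example, dropout is inserted as a layer inside the block it regularizes and is automatically disabled in evaluation mode; the layer sizes below are arbitrary:

```python
import torch
from torch import nn

# Illustrative transformer-style feed-forward block; dimensions are arbitrary.
ffn = nn.Sequential(
    nn.Linear(512, 2048),
    nn.GELU(),
    nn.Dropout(p=0.1),   # the dropout rate is the key hyperparameter
    nn.Linear(2048, 512),
)

ffn.train()  # dropout active: units are randomly zeroed each forward pass
ffn.eval()   # dropout disabled: the forward pass is deterministic
```

Switching between `train()` and `eval()` is what toggles dropout on and off; forgetting to call `eval()` before inference is a common source of noisy predictions.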
Related Terms
RAG (Retrieval-Augmented Generation)
A technique that grounds LLM responses in external data by retrieving relevant documents at query time and injecting them into the prompt context.
Embeddings
Dense vector representations of text, images, or other data that capture semantic meaning in a high-dimensional space, enabling similarity search and clustering.
Vector Database
A specialized database optimized for storing, indexing, and querying high-dimensional vector embeddings with sub-millisecond similarity search.
LLM (Large Language Model)
A neural network trained on massive text corpora that can generate, understand, and transform natural language for tasks like summarization, classification, and conversation.
Fine-Tuning
The process of further training a pre-trained LLM on a domain-specific dataset to specialize its behavior, style, or knowledge for a particular task.
Prompt Engineering
The practice of designing and iterating on LLM input instructions to reliably produce desired outputs for a specific task.