Bias-Variance Tradeoff
The fundamental tension in machine learning between bias (error from overly simplistic assumptions that cause the model to miss patterns) and variance (error from excessive sensitivity to training data fluctuations).
The bias-variance tradeoff is the central conceptual framework for understanding model errors. Bias measures how far off the model's average predictions are from the true values (systematic error). Variance measures how much the predictions change across different training sets (instability). Under squared-error loss, expected prediction error decomposes as squared bias plus variance plus irreducible noise.
Simple models (linear regression, shallow trees) have high bias but low variance: they consistently make the same mistakes regardless of training data because they cannot represent complex patterns. Complex models (deep networks, unpruned decision trees) have low bias but high variance: they can represent any pattern but may also fit noise, producing different predictions from different training sets.
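This contrast shows up cleanly with k-nearest neighbors, where k controls flexibility. The sketch below (a toy setup with a hypothetical quadratic target, not a recipe from any library) compares k=1, a flexible model that tracks individual noisy points, against k=20, a rigid model that averages the entire training set.

```python
import random

random.seed(1)

def f(x):
    return x * x  # hypothetical true function

def knn_predict(train, x, k):
    # Average the targets of the k training points nearest to x.
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def bias_variance_at(x0, k, trials=4000, n=20, noise_sd=1.0):
    preds = []
    for _ in range(trials):
        xs = [random.uniform(0, 2) for _ in range(n)]
        train = [(x, f(x) + random.gauss(0, noise_sd)) for x in xs]
        preds.append(knn_predict(train, x0, k))
    mean_pred = sum(preds) / trials
    bias_sq = (mean_pred - f(x0)) ** 2
    var = sum((p - mean_pred) ** 2 for p in preds) / trials
    return bias_sq, var

b1, v1 = bias_variance_at(1.5, k=1)    # flexible: low bias, high variance
b20, v20 = bias_variance_at(1.5, k=20) # rigid: high bias, lower variance
```

With k=1 each prediction inherits the noise of a single training point (high variance, low bias); with k=20 the prediction is a stable global average that systematically misses the curvature at x0 (low variance, high bias).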
The practical goal is finding the sweet spot where total error is minimized. Strategies to reduce bias include increasing model complexity, adding more features, and using more flexible architectures. Strategies to reduce variance include regularization, dropout, ensemble methods, and more training data. For production ML, this tradeoff guides model selection: start simple, measure performance, and increase complexity only when bias is clearly the bottleneck rather than variance.
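One of the variance-reduction strategies, regularization, can be seen directly in a toy setting. The sketch below is a minimal one-feature ridge regression with no intercept (the closed form simplifies to w = sum(x*y) / (sum(x*x) + lam)); the true slope of 2.0 and penalty of 5.0 are arbitrary illustrative choices, not recommended defaults.

```python
import random

random.seed(2)

def fit_ridge_slope(xs, ys, lam):
    # Closed-form ridge for a single-feature, no-intercept linear model.
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def slope_variance(lam, trials=3000, n=10, noise_sd=1.0):
    # Refit on many resampled training sets; return variance of the slope.
    slopes = []
    for _ in range(trials):
        xs = [random.uniform(-1, 1) for _ in range(n)]
        ys = [2.0 * x + random.gauss(0, noise_sd) for x in xs]  # true slope 2
        slopes.append(fit_ridge_slope(xs, ys, lam))
    m = sum(slopes) / trials
    return sum((s - m) ** 2 for s in slopes) / trials

v_ols = slope_variance(0.0)    # no penalty: ordinary least squares
v_ridge = slope_variance(5.0)  # heavy shrinkage toward zero
```

Shrinking the coefficient toward zero makes the fit far less sensitive to which training sample was drawn, at the cost of some added bias in the slope estimate; this is the tradeoff in miniature.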
Related Terms
RAG (Retrieval-Augmented Generation)
A technique that grounds LLM responses in external data by retrieving relevant documents at query time and injecting them into the prompt context.
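The retrieve-then-inject flow can be sketched in a few lines. The keyword-overlap retriever below is a hypothetical stand-in for the vector-similarity search a real RAG system would use, and the prompt template is an illustrative format, not a standard one.

```python
def retrieve(query, docs, k=2):
    # Naive retriever: score each document by word overlap with the query.
    q_words = set(query.lower().split())
    def score(doc):
        return len(q_words & set(doc.lower().split()))
    return sorted(docs, key=score, reverse=True)[:k]

def build_prompt(query, docs):
    # Inject the top-k retrieved documents into the prompt context.
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "the bias variance tradeoff explains model error",
    "bananas are a yellow fruit",
    "variance measures prediction instability",
]
prompt = build_prompt("what is variance", docs)
```

The LLM then answers from the injected context rather than from its parametric memory alone, which is what grounds the response in external data.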
Embeddings
Dense vector representations of text, images, or other data that capture semantic meaning in a high-dimensional space, enabling similarity search and clustering.
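The similarity search that embeddings enable usually reduces to comparing vectors by cosine similarity. The 3-dimensional vectors below are made-up toy values (real embeddings have hundreds or thousands of dimensions produced by a trained model).

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d "embeddings" with hypothetical values.
king = [0.9, 0.1, 0.4]
queen = [0.85, 0.15, 0.45]
banana = [0.1, 0.9, 0.2]

sim_related = cosine_similarity(king, queen)
sim_unrelated = cosine_similarity(king, banana)
```

Semantically related items end up close in the vector space, so their cosine similarity is higher than that of unrelated items.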
Vector Database
A specialized database optimized for storing, indexing, and querying high-dimensional vector embeddings, supporting fast (often approximate) nearest-neighbor similarity search at scale.
LLM (Large Language Model)
A neural network trained on massive text corpora that can generate, understand, and transform natural language for tasks like summarization, classification, and conversation.
Fine-Tuning
The process of further training a pre-trained LLM on a domain-specific dataset to specialize its behavior, style, or knowledge for a particular task.
Prompt Engineering
The practice of designing and iterating on LLM input instructions to reliably produce desired outputs for a specific task.