Gradient Boosting

An ensemble technique that builds models sequentially, with each new model specifically trained to correct the errors of the previous ones, producing a powerful combined predictor through iterative refinement.

Gradient boosting constructs an additive model in stages. The first model makes predictions, and each subsequent model is trained on the current ensemble's errors; more precisely, it fits the negative gradient of the loss function, which for squared-error loss is simply the residual (actual minus predicted). Each new model focuses on the examples the ensemble still gets wrong, gradually chipping away at the remaining error. The final prediction is the sum of all models' contributions, scaled by a learning rate (also called shrinkage) that controls each model's impact.
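The staged procedure above can be sketched from scratch. The following is a minimal illustration for squared-error regression using depth-1 regression trees ("stumps") as weak learners; all function names here are illustrative, not from any library:

```python
# Minimal gradient boosting for squared-error regression.
# Weak learner: a depth-1 regression tree (a "stump") on a single feature.

def fit_stump(x, residuals):
    """Find the threshold split on x that best fits the residuals."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda xi: lmean if xi <= t else rmean

def gradient_boost(x, y, n_rounds=100, learning_rate=0.1):
    base = sum(y) / len(y)  # stage 0: predict the mean
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared-error loss, the negative gradient is the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Each stump's contribution is shrunk by the learning rate.
        pred = [pi + learning_rate * stump(xi) for pi, xi in zip(pred, x)]

    def predict(xi):
        return base + sum(learning_rate * s(xi) for s in stumps)
    return predict

# Fit a noiseless step function; the ensemble should recover it closely.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [1, 1, 1, 1, 5, 5, 5, 5]
model = gradient_boost(x, y, n_rounds=200, learning_rate=0.1)
print(round(model(1), 2), round(model(6), 2))  # → 1.0 5.0
```

With a learning rate of 0.1, each round removes only 10% of the remaining residual, so the error shrinks geometrically over many rounds; this trade-off (smaller steps, more rounds) is exactly the shrinkage mechanism production libraries expose as `learning_rate` and `n_estimators`-style parameters.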

The dominant implementations are XGBoost, LightGBM, and CatBoost. XGBoost popularized regularized boosting and remains widely used. LightGBM introduced histogram-based splitting for faster training on large datasets. CatBoost handles categorical features natively and reduces overfitting through ordered boosting. All three consistently top leaderboards for tabular data problems.

For production ML on structured data, gradient boosting is the go-to algorithm. It achieves state-of-the-art performance on most tabular datasets, trains efficiently on CPUs (no GPU required), produces interpretable models through feature importance and SHAP values, and integrates easily with production pipelines. For growth applications like propensity modeling, customer scoring, and demand forecasting, gradient boosting typically outperforms both simpler methods and deep learning on structured data.
