
Thompson Sampling

A Bayesian bandit algorithm that selects actions by sampling from posterior probability distributions of each option's reward, naturally balancing exploration and exploitation as uncertainty decreases.

Thompson Sampling maintains a probability distribution (typically a Beta distribution for binary success/failure outcomes) over the expected reward of each variant. At each decision point, it draws one sample from each variant's distribution and selects the variant with the highest sampled value. Early on, when distributions are wide (high uncertainty), exploration happens naturally. As data accumulates and distributions narrow, the algorithm increasingly exploits the best variant.
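The loop described above can be sketched as a Beta-Bernoulli bandit. This is a minimal illustration, not a production implementation; the `ThompsonSampler` class name and the simulated conversion rates are invented for the example.

```python
import random

class ThompsonSampler:
    """Beta-Bernoulli Thompson Sampling for binary (e.g. click/no-click) rewards."""

    def __init__(self, n_variants):
        # Beta(1, 1) priors: a uniform belief over each variant's success rate.
        self.alpha = [1] * n_variants
        self.beta = [1] * n_variants

    def select(self):
        # Draw one sample from each variant's posterior; pick the argmax.
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return samples.index(max(samples))

    def update(self, variant, reward):
        # Conjugate update: a success increments alpha, a failure increments beta.
        if reward:
            self.alpha[variant] += 1
        else:
            self.beta[variant] += 1

# Simulated experiment: variant 1 has the highest true conversion rate.
random.seed(0)
true_rates = [0.05, 0.12, 0.08]
sampler = ThompsonSampler(len(true_rates))
picks = [0, 0, 0]
for _ in range(5000):
    v = sampler.select()
    sampler.update(v, random.random() < true_rates[v])
    picks[v] += 1
```

After a few thousand decisions, the pick counts concentrate on the variant with the highest true rate, with only occasional probes of the others.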

The elegance of Thompson Sampling lies in its principled handling of uncertainty. A variant with limited data has a wide distribution, so it occasionally samples high values and gets explored. A variant with strong evidence of being best has a narrow distribution concentrated at a high value and is selected most of the time. No manual tuning of exploration rates is needed.

For AI-powered personalization, Thompson Sampling is widely used to optimize content recommendations, email send times, notification strategies, and pricing. It adapts quickly to changing user preferences and handles the cold-start problem gracefully by maintaining appropriate uncertainty for new options. Its Bayesian foundation also provides natural confidence intervals for reporting.
