
Contextual Bandit

A machine learning framework for sequential decision-making that balances exploiting options known to perform well with exploring uncertain ones, using contextual features about the user and situation to choose the best action.

Contextual bandits extend the classic multi-armed bandit problem by incorporating contextual information about the user and situation when deciding which action to take. For each decision point, the system observes context features, selects an action from available options, and receives a reward signal. Over time, it learns which actions work best in which contexts while continuing to explore uncertain options.
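The observe-select-reward loop above can be sketched with the LinUCB (disjoint) algorithm, a common contextual bandit method. This is a minimal illustration, not a production implementation; the class name, feature layout, and `alpha` value are assumptions for the example.

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB: one ridge-regression reward model per action."""

    def __init__(self, n_actions, n_features, alpha=1.0):
        self.alpha = alpha  # exploration strength (uncertainty bonus weight)
        # Per-action sufficient statistics: A = X^T X + I, b = X^T r
        self.A = [np.eye(n_features) for _ in range(n_actions)]
        self.b = [np.zeros(n_features) for _ in range(n_actions)]

    def select(self, x):
        # Score each action by predicted reward plus an upper-confidence bonus
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b  # estimated reward weights
            ucb = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
            scores.append(ucb)
        return int(np.argmax(scores))

    def update(self, action, x, reward):
        # Fold the observed reward into the chosen action's model
        self.A[action] += np.outer(x, x)
        self.b[action] += reward * x
```

At each decision point the caller builds a context vector `x`, calls `select(x)` to pick an action, observes the reward, and calls `update(...)`. The uncertainty bonus shrinks as an action accumulates data in a given region of context space, so exploration fades naturally where the model is confident.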

For growth teams, contextual bandits are ideal for personalization problems that require continuous learning and adaptation: which homepage layout to show, which email subject line to use, which onboarding flow to present, or which product to feature. Unlike A/B tests that run for a fixed period, bandits continuously shift traffic toward better-performing variants while maintaining exploration, and AI-powered bandit systems can handle thousands of contextual features and hundreds of actions simultaneously.

Growth engineers should consider contextual bandits for decisions that are made frequently, have clear reward signals, and benefit from personalization. The key advantage over traditional A/B testing is efficiency: bandits minimize regret by cutting exposure to underperforming variants faster, and they naturally handle the personalization case where the best option varies by user context.
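The efficiency claim can be illustrated with a quick (non-contextual, for simplicity) simulation. The conversion rates, epsilon value, and round count below are illustrative assumptions: an epsilon-greedy bandit sends far less traffic to the weaker variant than a fixed 50/50 A/B split does.

```python
import random

def simulate(policy, rates=(0.10, 0.05), rounds=10_000, seed=42):
    """Run a two-variant experiment; return pulls of the weaker variant."""
    rng = random.Random(seed)
    successes = [0, 0]
    pulls = [0, 0]
    for t in range(rounds):
        arm = policy(rng, successes, pulls, t)
        pulls[arm] += 1
        if rng.random() < rates[arm]:  # Bernoulli conversion
            successes[arm] += 1
    return pulls[1]  # exposure to the underperforming variant

def ab_split(rng, successes, pulls, t):
    # Classic A/B test: fixed 50/50 allocation for the whole run
    return t % 2

def eps_greedy(rng, successes, pulls, t, eps=0.1):
    # Explore with probability eps (and until each arm has data) ...
    if rng.random() < eps or 0 in pulls:
        return rng.randrange(2)
    # ... otherwise exploit the arm with the higher empirical rate
    rates_hat = [successes[i] / pulls[i] for i in range(2)]
    return rates_hat.index(max(rates_hat))
```

Running `simulate(eps_greedy)` versus `simulate(ab_split)` shows the bandit diverting most traffic away from the weaker variant once the empirical estimates separate, which is exactly the regret reduction described above.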
