Adaptive Experiment
An experimental design that modifies its parameters during execution based on accumulating data, for example by adjusting traffic allocation between variants, dropping underperforming arms, or revising the sample size, while maintaining statistical validity through appropriate corrections.
Adaptive experiments depart from the fixed design of traditional A/B tests by allowing the experiment to evolve as data accumulates. The most common adaptations include response-adaptive randomization (shifting more traffic to better-performing variants), sample size re-estimation (adjusting the planned sample size based on interim effect size estimates), and arm dropping (eliminating clearly inferior variants to focus traffic on promising ones). For growth teams, adaptive experiments offer practical advantages: they reduce the exposure of users to inferior experiences, reach conclusions faster when effects are large, and efficiently handle multi-arm tests where some variants are clearly worse than others. The trade-off is increased statistical complexity and the need for careful correction to maintain valid inference.
The most widely used adaptive design in online experimentation is multi-armed bandit optimization, which gradually shifts traffic from underperforming variants to better ones using algorithms like Thompson Sampling, Upper Confidence Bound (UCB), or epsilon-greedy. Thompson Sampling maintains a posterior distribution for each variant's reward rate and assigns each user to the variant with the highest random draw from its posterior, naturally balancing exploration and exploitation. Group sequential designs allow early stopping for efficacy (the treatment is clearly better) or futility (the treatment is unlikely to achieve significance if continued) at pre-planned interim analyses. Sample size re-estimation, using methods like the Chen-DeMets-Lan approach, allows the planned sample size to be increased if the interim effect size is smaller than assumed in the original power calculation, without inflating the Type I error rate.
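The Thompson Sampling mechanic described above can be sketched in a few lines. This is a minimal illustration for Bernoulli rewards (e.g., conversion events), not a production implementation: each variant keeps a Beta posterior over its conversion rate, and every assignment picks the variant with the highest random draw from its posterior. The class name, arm count, and true rates below are all illustrative assumptions.

```python
import random

class ThompsonSampler:
    """Illustrative Thompson Sampling for Bernoulli (convert / no-convert) rewards."""

    def __init__(self, n_arms):
        # Beta(successes + 1, failures + 1) posterior per arm,
        # starting from a uniform Beta(1, 1) prior.
        self.successes = [0] * n_arms
        self.failures = [0] * n_arms

    def choose(self):
        # Draw one sample from each arm's posterior and play the arm
        # with the highest draw. Uncertain arms still get explored
        # because their wide posteriors occasionally produce large draws.
        draws = [random.betavariate(s + 1, f + 1)
                 for s, f in zip(self.successes, self.failures)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, converted):
        if converted:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1

# Simulated run with made-up conversion rates: arm 1 is truly better,
# so traffic should shift toward it as evidence accumulates.
random.seed(0)
true_rates = [0.05, 0.10]
sampler = ThompsonSampler(n_arms=2)
pulls = [0, 0]
for _ in range(5000):
    arm = sampler.choose()
    pulls[arm] += 1
    sampler.update(arm, random.random() < true_rates[arm])
print(pulls)  # most of the 5000 assignments end up on arm 1
```

Note the exploration-exploitation balance falls out of the posterior sampling itself; there is no explicit exploration parameter to tune, which is one reason Thompson Sampling is popular in online experimentation.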
Adaptive experiments should be used when the cost of assigning users to inferior variants is high (e.g., testing multiple landing page designs where each poor design represents lost conversions), when the experiment has many arms (making fixed allocation inefficient), or when there is genuine uncertainty about the expected effect size that makes fixed sample size planning difficult. Common pitfalls include using bandit algorithms when the goal is precise effect estimation rather than reward maximization (bandits optimize for reward but provide biased effect estimates), not implementing the statistical corrections necessary for valid inference under adaptive designs, making too many adaptations too quickly (before enough data accumulates for reliable decisions), and using adaptive designs as an excuse to avoid proper experiment planning.
Advanced adaptive designs include Bayesian adaptive trials that use predictive probability to guide adaptations, platform trials that allow new arms to be added while the experiment is running (useful for iterating on designs), and response-adaptive randomization with covariate adjustment that personalizes the allocation probability based on user characteristics. The integration of adaptive experiments with machine learning enables sophisticated personalization: rather than finding the single best variant, the system learns which variant is best for which type of user. Experimentation platforms like Statsig offer built-in support for adaptive experiments with appropriate statistical corrections. The key challenge remains balancing the practical benefits of adaptation against the statistical complexity it introduces, and teams should default to fixed designs unless there is a compelling reason for adaptation.
Related Terms
Epsilon-Greedy
A simple exploration-exploitation algorithm used in multi-armed bandit experiments that exploits the current best-performing variant with probability (1-epsilon) and explores by randomly selecting any variant with probability epsilon, where epsilon is typically a small value like 0.1.
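A minimal sketch of the epsilon-greedy rule described above, assuming per-arm success and trial counts are tracked elsewhere; the counts and rates below are illustrative:

```python
import random

def epsilon_greedy_choice(successes, trials, epsilon=0.1):
    """Explore a random arm with probability epsilon; otherwise exploit
    the arm with the best observed conversion rate so far."""
    n_arms = len(successes)
    if random.random() < epsilon:
        return random.randrange(n_arms)  # explore
    # Exploit: highest empirical rate wins. Untried arms get rate 0 here;
    # a common cold-start alternative is to try every arm once first.
    rates = [s / t if t > 0 else 0.0 for s, t in zip(successes, trials)]
    return max(range(n_arms), key=rates.__getitem__)

random.seed(1)
successes = [10, 30]   # arm 1 looks better: 30/200 vs 10/200
trials = [200, 200]
choices = [epsilon_greedy_choice(successes, trials) for _ in range(1000)]
print(choices.count(1))  # roughly 95% of picks go to arm 1
```

With epsilon = 0.1, the best-looking arm is chosen on every exploit step (90% of the time) plus half of the uniform-random explore steps, so about 95% of traffic flows to it.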
Contextual Bandit Experiment
An adaptive experiment that uses user context (features like demographics, behavior history, and session attributes) to personalize which treatment variant each user receives, learning a policy that maps user characteristics to optimal treatments in real time.
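One simple way to sketch the contextual idea is to run a separate Thompson Sampler per user segment, so each segment converges on its own best variant. This is a deliberately crude stand-in for a real contextual bandit (which would learn a policy over continuous features, e.g., LinUCB); the segment names and conversion rates are invented for illustration:

```python
import random
from collections import defaultdict

class SegmentedThompson:
    """Illustrative per-segment Thompson Sampling: each (segment, arm)
    pair keeps its own Beta posterior over the conversion rate."""

    def __init__(self, n_arms):
        self.n_arms = n_arms
        self.succ = defaultdict(int)  # keyed by (segment, arm)
        self.fail = defaultdict(int)

    def choose(self, segment):
        draws = [random.betavariate(self.succ[(segment, a)] + 1,
                                    self.fail[(segment, a)] + 1)
                 for a in range(self.n_arms)]
        return max(range(self.n_arms), key=draws.__getitem__)

    def update(self, segment, arm, converted):
        key = (segment, arm)
        if converted:
            self.succ[key] += 1
        else:
            self.fail[key] += 1

# Made-up preferences that differ by segment: new users convert better
# on arm 0, returning users on arm 1.
random.seed(2)
rates = {("new", 0): 0.12, ("new", 1): 0.06,
         ("returning", 0): 0.06, ("returning", 1): 0.12}
bandit = SegmentedThompson(n_arms=2)
picks = defaultdict(lambda: [0, 0])
for _ in range(8000):
    seg = random.choice(["new", "returning"])
    arm = bandit.choose(seg)
    picks[seg][arm] += 1
    bandit.update(seg, arm, random.random() < rates[(seg, arm)])
# Each segment converges on its own best arm rather than a single winner.
```

The point of the sketch is the shift in objective: instead of finding one global winner, the system learns a mapping from user context to the best variant for that context.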
Bayesian Optimization
A sequential decision-making framework that uses a probabilistic model of the objective function to efficiently search for the optimal configuration of parameters, balancing exploration of uncertain regions with exploitation of promising areas.
Multivariate Testing
An experimentation method that simultaneously tests multiple variables and their combinations to determine which combination of changes produces the best outcome, unlike A/B testing which typically varies a single element at a time.
Split Testing
The practice of randomly dividing users into two or more groups and exposing each group to a different version of a product experience to measure which version performs better on a target metric, commonly known as A/B testing.
Holdout Testing
An experimental design where a small percentage of users are permanently excluded from receiving a new feature or set of features, serving as a long-term control group to measure the cumulative impact of product changes over time.