Marketplace Experiment
An experiment conducted in a two-sided or multi-sided marketplace where treatment effects can propagate between buyer and seller sides, requiring specialized experimental designs that account for cross-side interference and equilibrium effects.
Marketplace experiments address the unique challenges of testing changes in platforms that connect two or more participant types: buyers and sellers, riders and drivers, hosts and guests, creators and consumers. The fundamental challenge is that these sides interact through the marketplace mechanism, creating interference that violates the independence assumptions of standard A/B testing. If an algorithm change improves matching for treatment buyers, it may simultaneously worsen matching for control buyers who compete for the same sellers. For growth teams at companies like Uber, Airbnb, DoorDash, Etsy, and eBay, this means experimentation requires specialized techniques that account for cross-side dynamics and equilibrium effects.
Marketplace experiments can use several design strategies depending on the nature of the treatment. For changes that affect one side independently (e.g., a UI change for buyers that does not affect what sellers see), standard user-level A/B testing on that side may be appropriate, provided the treatment does not significantly alter the competitive dynamics between treatment and control users. For changes that affect the matching or transaction mechanism (pricing algorithms, search ranking, recommendation engines), geo-randomization assigns entire markets (cities, regions) to variants, ensuring that all participants within a market experience the same treatment. Switchback designs alternate the entire marketplace between treatment and control over time periods. Two-sided randomization varies the experience for both sides simultaneously using a factorial design: buyers and sellers are randomized independently, producing four combinations (treated and control buyers crossed with treated and control sellers) that allow estimation of both direct and cross-side effects. The analysis must account for the unique variance structure of marketplace metrics, where outcomes depend on the interaction between both sides.
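The two-sided factorial design described above can be sketched in a few lines. This is a minimal illustration, not a production assignment service: the hashing salts, cell labels, and effect formulas are assumptions chosen for clarity.

```python
import hashlib

def assign(unit_id: str, salt: str) -> str:
    """Deterministically assign a unit to an arm by hashing its ID with a salt."""
    digest = hashlib.md5(f"{salt}:{unit_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 == 0 else "control"

def cell(buyer_id: str, seller_id: str) -> tuple[str, str]:
    """Buyers and sellers are randomized independently, so each transaction
    falls into one of four factorial cells (buyer arm, seller arm)."""
    return assign(buyer_id, "buyer-salt"), assign(seller_id, "seller-salt")

def factorial_effects(cell_means: dict[tuple[str, str], float]) -> dict[str, float]:
    """Estimate direct and cross-side effects from the four cell means of a
    2x2 two-sided design (e.g., mean conversion per transaction)."""
    tt = cell_means[("treatment", "treatment")]
    tc = cell_means[("treatment", "control")]
    ct = cell_means[("control", "treatment")]
    cc = cell_means[("control", "control")]
    return {
        "buyer_direct": ((tt + tc) - (ct + cc)) / 2,   # average effect of treating buyers
        "seller_direct": ((tt + ct) - (tc + cc)) / 2,  # average effect of treating sellers
        "interaction": (tt - tc) - (ct - cc),          # cross-side interaction
    }
```

Because both sides are hashed independently, the four cells arrive in roughly equal proportions, and the interaction term captures whether the buyer-side effect depends on the seller's arm, which is the signature of a cross-side effect.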
Marketplace experiments should be designed with careful consideration of the interference structure. The key question is: does the treatment for one user affect outcomes for other users, and if so, through what mechanism? For a seller-side ranking change, competition between sellers means treatment sellers' improved ranking comes at the expense of control sellers. For a buyer-side pricing change, the demand response affects the equilibrium supply available to all buyers. Common pitfalls include ignoring cross-side effects (which biases the treatment effect estimate), using markets that are too small for adequate within-market sample size, failing to account for market-level confounders in geo-experiments, and not considering general equilibrium effects where the treatment changes the market dynamics in ways that the partial equilibrium experiment does not capture.
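The bias from ignoring cross-side effects can be shown with a toy supply-constrained market. All numbers here (demand probabilities, supply cap, and the rule that higher-intent treated buyers book first) are invented purely for illustration.

```python
def market_bookings(n_treat, n_ctrl, p_treat, p_ctrl, supply):
    """Deterministic toy market: each side generates demand, total bookings are
    capped by supply, and treated (higher-intent) buyers book first, crowding
    out control buyers who compete for the same sellers."""
    demand_t = n_treat * p_treat
    demand_c = n_ctrl * p_ctrl
    booked_t = min(demand_t, supply)
    booked_c = min(demand_c, supply - booked_t)
    return booked_t, booked_c

# Within-market (user-level) randomization: 50/50 split in one market.
bt, bc = market_bookings(50, 50, 0.8, 0.5, supply=60)
naive_effect = bt / 50 - bc / 50   # treated rate minus cannibalized control rate

# Market-level (geo) randomization: one all-treatment and one all-control market.
all_t, _ = market_bookings(100, 0, 0.8, 0.5, supply=60)
_, all_c = market_bookings(0, 100, 0.8, 0.5, supply=60)
geo_effect = all_t / 100 - all_c / 100
```

In this toy setup the naive within-market estimate is roughly 0.4 while the market-level comparison yields roughly 0.1: the user-level design attributes the supply stolen from control buyers to the treatment, overstating the effect that would materialize at full rollout.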
Advanced marketplace experimentation includes budget-balanced designs that ensure the total demand or supply remains constant while varying how it is distributed, shadow pricing experiments that compute alternative prices but do not execute them to estimate the price sensitivity without affecting the market, and structural model-based experiments that combine experimental variation with a structural model of marketplace dynamics to predict the general equilibrium impact of treatment at full scale. Regression discontinuity designs can be applied at marketplace boundaries where treatment rules change discontinuously. Interference-aware estimators developed by researchers at Uber, Lyft, and DoorDash account for within-market competition effects using structural or reduced-form models of the interference mechanism. The rapidly growing literature on marketplace experimentation reflects the economic importance and methodological challenge of testing changes in interconnected market systems.
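The shadow-pricing idea above can be sketched as a logging pattern: the candidate price is computed and recorded but never charged, and an offline demand model is used later to estimate what it would have done. The function names and the price-uplift parameterization are illustrative assumptions.

```python
def shadow_price_quote(base_price: float, uplift: float, log: list) -> float:
    """Serve the production price, but also compute and record the counterfactual
    ('shadow') price the candidate algorithm would have charged, without
    executing it and therefore without perturbing the market."""
    shadow = round(base_price * (1 + uplift), 2)
    log.append({"served": base_price, "shadow": shadow})
    return base_price  # the buyer always sees the production price

def offline_revenue_gap(log: list, demand_model) -> float:
    """Estimate the revenue change the shadow prices would imply, using an
    offline demand model: a callable mapping price -> purchase probability."""
    served = sum(q["served"] * demand_model(q["served"]) for q in log)
    shadow = sum(q["shadow"] * demand_model(q["shadow"]) for q in log)
    return shadow - served
```

The estimate is only as good as the offline demand model, which is exactly why shadow pricing is typically paired with the structural or experimentally calibrated models mentioned above rather than used on its own.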
Related Terms
Switchback Testing
An experimental design that alternates between treatment and control conditions over time periods within the same unit (such as a geographic region or marketplace), used when user-level randomization is not feasible due to interference or operational constraints.
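A minimal sketch of a switchback schedule and its period-level estimator, assuming equal-length periods and ignoring carryover between adjacent periods (a real analysis would handle carryover and serial correlation):

```python
import random

def switchback_schedule(n_periods: int, seed: int = 0) -> list[str]:
    """Randomly assign whole time blocks to treatment or control. Each period
    covers the entire marketplace, so within-period interference is avoided."""
    rng = random.Random(seed)  # seeded for a reproducible schedule
    return [rng.choice(["treatment", "control"]) for _ in range(n_periods)]

def switchback_effect(outcomes: list[float], schedule: list[str]) -> float:
    """Difference in mean marketplace-level outcome between treatment periods
    and control periods."""
    t = [y for y, arm in zip(outcomes, schedule) if arm == "treatment"]
    c = [y for y, arm in zip(outcomes, schedule) if arm == "control"]
    return sum(t) / len(t) - sum(c) / len(c)
```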
Cluster Randomization
An experimental design that randomly assigns groups (clusters) of users rather than individual users to treatment conditions, used when individual randomization is not feasible or when interference between users within the same cluster would violate independence assumptions.
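The defining property of cluster randomization, that every user in a cluster shares one assignment, can be sketched with a hash of the cluster ID rather than the user ID. The experiment name and mapping structure below are illustrative assumptions.

```python
import hashlib

def cluster_arm(cluster_id: str, experiment: str) -> str:
    """Assign an entire cluster (e.g., a city) to one arm, so every user in it
    shares the same experience and within-cluster interference cannot leak
    across arms."""
    digest = hashlib.sha256(f"{experiment}:{cluster_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 == 0 else "control"

def user_arm(user_to_cluster: dict[str, str], user_id: str, experiment: str) -> str:
    """Users inherit their cluster's assignment instead of being hashed directly."""
    return cluster_arm(user_to_cluster[user_id], experiment)
```

Note that the effective sample size is the number of clusters, not the number of users, so variance estimates must be computed at the cluster level.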
Network Effect Experiment
An experiment designed to measure and optimize features that become more valuable as more users adopt them, addressing the unique challenges of testing network-dependent features where individual user value depends on the behavior and adoption of other users.
Multivariate Testing
An experimentation method that simultaneously tests multiple variables and their combinations to determine which combination of changes produces the best outcome, unlike A/B testing which typically varies a single element at a time.
Split Testing
The practice of randomly dividing users into two or more groups and exposing each group to a different version of a product experience to measure which version performs better on a target metric, commonly known as A/B testing.
Holdout Testing
An experimental design where a small percentage of users are permanently excluded from receiving a new feature or set of features, serving as a long-term control group to measure the cumulative impact of product changes over time.