Regression Discontinuity
A quasi-experimental design that exploits a sharp cutoff in a continuous assignment variable to estimate causal effects, comparing units just above and just below the threshold where treatment assignment changes discontinuously.
Regression discontinuity (RD) design leverages situations where treatment is assigned based on whether a continuous variable crosses a threshold. For example, users who score above a certain engagement threshold might receive a premium feature, or customers whose spending exceeds a cutoff might enter a loyalty tier. The key insight is that units just above and just below the cutoff are nearly identical in all respects except their treatment status, creating a local randomized experiment around the threshold. For growth teams, RD is valuable for evaluating policies and features that are triggered by user characteristics crossing thresholds, such as credit limits, loyalty tiers, algorithmic scores, or usage-based feature access.
The RD analysis plots the outcome variable against the running variable (the continuous assignment variable) and looks for a discontinuous jump at the cutoff. The treatment effect is estimated as the difference in the regression functions at the threshold: tau = lim(x->c+) E[Y|X=x] - lim(x->c-) E[Y|X=x], where c is the cutoff. In practice, this is estimated by fitting local polynomial regressions on each side of the cutoff using data within a bandwidth around the threshold. The choice of bandwidth involves a bias-variance tradeoff: narrower bandwidths reduce bias by using more comparable units but increase variance by using fewer observations. Optimal bandwidth selection methods, like those implemented in the rdrobust packages for R and Stata, balance this tradeoff by minimizing mean squared error. The regression model for a sharp RD is: Y_i = alpha + beta*D_i + f(X_i - c) + g(X_i - c)*D_i + epsilon_i, where D_i = 1(X_i >= c), f and g are flexible functions of the centered running variable, and beta estimates the treatment effect tau at the cutoff.
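The sharp RD model above can be sketched with a local linear fit on each side of the cutoff. This is a minimal illustration, not the robust bias-corrected estimator from rdrobust; the function name, the simulated data, and the fixed bandwidth are all hypothetical choices for the example.

```python
import numpy as np

def sharp_rd_estimate(x, y, cutoff, bandwidth):
    """Estimate the sharp RD treatment effect via local linear
    regression: separate intercepts and slopes on each side of the
    cutoff, using only observations within the bandwidth. The
    coefficient on the treatment indicator is the jump (tau)."""
    xc = x - cutoff                 # center the running variable
    keep = np.abs(xc) <= bandwidth  # restrict to the local window
    d = (xc >= 0).astype(float)     # D_i = 1(X_i >= c)
    # Columns: intercept (alpha), jump (beta), slope below (f),
    # slope change above (g), matching Y = alpha + beta*D + f + g*D
    X = np.column_stack([np.ones_like(xc), d, xc, xc * d])[keep]
    beta, *_ = np.linalg.lstsq(X, y[keep], rcond=None)
    return beta[1]

# Simulated example: a true jump of 2.0 at cutoff 0.5
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 5000)
y = 1.0 + 0.5 * x + 2.0 * (x >= 0.5) + rng.normal(0, 0.3, 5000)
tau = sharp_rd_estimate(x, y, cutoff=0.5, bandwidth=0.1)  # near 2.0
```

In practice the bandwidth should be chosen by a data-driven method and results reported across several bandwidths, as discussed below.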
RD should be used when treatment assignment is determined by a known, measurable cutoff in a continuous variable and when units cannot precisely manipulate their position relative to the cutoff. The main assumption is continuity: all other factors affecting the outcome vary smoothly through the cutoff, so any discontinuity in the outcome is attributable to the treatment. This assumption is violated if units can precisely manipulate the running variable to sort above or below the threshold (e.g., if users can see their score and game the system to exceed the cutoff). McCrary's density test checks for manipulation by testing whether the density of the running variable is continuous at the cutoff; a jump in density suggests sorting. Other common pitfalls include using a bandwidth that is too wide (introducing bias from non-comparable units), not checking sensitivity to bandwidth choice, and over-extrapolating the local treatment effect to the full population.
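A simplified version of the manipulation check can be sketched by comparing observation counts in symmetric windows around the cutoff. This is only in the spirit of the McCrary test (the real test uses local linear density estimation); the function name, window width, and simulated sorting are hypothetical.

```python
import math
import numpy as np

def density_jump_check(x, cutoff, window):
    """Crude sorting check: under a smooth density, counts just
    below and just above the cutoff should split roughly 50/50.
    Returns the counts and a two-sided p-value from a normal
    approximation to the binomial; a tiny p-value suggests
    manipulation of the running variable."""
    below = int(np.sum((x >= cutoff - window) & (x < cutoff)))
    above = int(np.sum((x >= cutoff) & (x < cutoff + window)))
    z = (above - below) / math.sqrt(above + below)  # normal approx.
    p_value = math.erfc(abs(z) / math.sqrt(2))      # two-sided
    return below, above, p_value

# Simulated sorting: users just below the cutoff nudge their score above
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 5000)
mask = (x >= 0.45) & (x < 0.5)
x[mask] = x[mask] + 0.05  # everyone in [0.45, 0.5) jumps the threshold
below, above, p = density_jump_check(x, cutoff=0.5, window=0.05)
# below is depleted, above is inflated, and p is effectively zero
```

A significant imbalance like this would argue against the continuity assumption and hence against a credible RD analysis at that cutoff.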
Advanced RD concepts include fuzzy RD, where crossing the threshold increases the probability of treatment but does not guarantee it (e.g., being above the cutoff makes users eligible for a feature but not all eligible users activate it). Fuzzy RD uses the cutoff as an instrumental variable for treatment, estimating a local average treatment effect for compliers. Geographic RD exploits spatial boundaries, such as the border between regions with different policies. Regression kink design (RKD) looks for changes in the slope rather than the level of the outcome-running variable relationship, useful when treatment intensity, rather than treatment status, changes at the kink point. For digital experimentation, RD can be combined with other methods: a change in an algorithmic scoring threshold creates an RD opportunity, while the algorithm itself can be evaluated with an A/B test, providing complementary evidence about the system's impact.
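The fuzzy RD idea can be sketched as a Wald ratio: the jump in the outcome at the cutoff divided by the jump in treatment take-up, each estimated with the same local linear machinery as the sharp case. This is a simplified stand-in for a full two-stage least squares implementation; the function names, the 20%-to-70% take-up rates, and the true effect of 1.5 are all made-up illustration values.

```python
import numpy as np

def local_linear_jump(x, v, cutoff, bandwidth):
    """Jump in E[v | x] at the cutoff from separate local linear
    fits on each side (same design matrix as sharp RD)."""
    xc = x - cutoff
    keep = np.abs(xc) <= bandwidth
    d = (xc >= 0).astype(float)
    X = np.column_stack([np.ones_like(xc), d, xc, xc * d])[keep]
    beta, *_ = np.linalg.lstsq(X, v[keep], rcond=None)
    return beta[1]

def fuzzy_rd_estimate(x, y, d, cutoff, bandwidth):
    """Fuzzy RD (Wald) estimate of the local average treatment
    effect for compliers: outcome jump / take-up jump."""
    return (local_linear_jump(x, y, cutoff, bandwidth)
            / local_linear_jump(x, d.astype(float), cutoff, bandwidth))

# Hypothetical example: eligibility raises take-up from 20% to 70%;
# the true effect on compliers is 1.5
rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 20000)
eligible = x >= 0.5
take_up = rng.random(20000) < np.where(eligible, 0.7, 0.2)
y = 0.3 * x + 1.5 * take_up + rng.normal(0, 0.3, 20000)
late = fuzzy_rd_estimate(x, y, take_up, cutoff=0.5, bandwidth=0.15)
```

Because take-up jumps by only about 0.5 at the cutoff, the fuzzy estimate is noisier than a sharp RD with the same sample, which is why weak take-up discontinuities deserve the same scrutiny as weak instruments.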
Related Terms
Difference-in-Differences
A quasi-experimental statistical method that estimates a treatment effect by comparing the change in outcomes over time between a group that receives a treatment and a group that does not, removing biases from time-invariant differences between groups and common time trends.
Propensity Score Matching
A statistical method that reduces selection bias in observational studies by matching treated and untreated units that have similar probabilities (propensity scores) of receiving the treatment, creating a pseudo-randomized comparison.
Pre-Post Analysis
A quasi-experimental method that compares metrics before and after a treatment is applied to the same group, using the pre-treatment period as a baseline to estimate the treatment effect when a randomized control group is not available.
Multivariate Testing
An experimentation method that simultaneously tests multiple variables and their combinations to determine which combination of changes produces the best outcome, unlike A/B testing which typically varies a single element at a time.
Split Testing
The practice of randomly dividing users into two or more groups and exposing each group to a different version of a product experience to measure which version performs better on a target metric, commonly known as A/B testing.
Holdout Testing
An experimental design where a small percentage of users are permanently excluded from receiving a new feature or set of features, serving as a long-term control group to measure the cumulative impact of product changes over time.