Difference-in-Differences
A quasi-experimental statistical method that estimates a treatment effect by comparing the change in outcomes over time between a group that receives a treatment and a group that does not, removing biases from time-invariant differences between groups and common time trends.
Difference-in-differences (DiD) is one of the most widely used causal inference methods when randomized experiments are not feasible. The core idea is elegant: compare two groups across two time periods. Even if the treatment group was already different from the control group before the treatment (selection bias), as long as both groups would have followed the same trend absent the treatment (the parallel trends assumption), the difference in their changes over time removes both the time-invariant selection bias and the common time trend, isolating the treatment effect. For growth and advertising teams, DiD is invaluable for evaluating geo-targeted campaigns, market-level rollouts, policy changes, and any intervention that is applied to identifiable groups rather than randomly assigned individuals.
The DiD estimator is calculated as: tau = (Y_treatment_post - Y_treatment_pre) - (Y_control_post - Y_control_pre). In regression form: Y_it = beta_0 + beta_1*Treat_i + beta_2*Post_t + beta_3*(Treat_i * Post_t) + epsilon_it, where beta_3 is the DiD estimate of the treatment effect. Treat_i indicates membership in the treatment group, Post_t indicates the post-treatment period, and their interaction captures the differential change. Standard errors should be clustered at the group level to account for within-group correlation over time. With multiple pre and post periods, the regression extends to a two-way fixed effects specification with time fixed effects and group fixed effects, where a single treatment indicator turns on for each group once it is treated, accommodating staggered adoption across groups. Tools for DiD analysis include the R packages did and fixest, the Python library differences, and general regression tools in any statistical software.
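The four-means estimator and the interaction regression above are algebraically equivalent in the two-group, two-period case. A minimal sketch in Python (NumPy only, on simulated data with made-up group and trend values) shows both recovering the same effect:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated 2x2 panel: 200 units per group, 2 periods, true effect tau = 2.0.
# Group difference and time trend values are illustrative.
n = 200
tau = 2.0
treat = np.repeat([0, 1], n)            # group membership, one entry per unit
base = 5.0 + 3.0 * treat                # time-invariant group difference (selection bias)
trend = 1.5                             # common time trend shared by both groups

y_pre = base + rng.normal(0, 0.1, 2 * n)
y_post = base + trend + tau * treat + rng.normal(0, 0.1, 2 * n)

# Four-means estimator: (treated post - treated pre) - (control post - control pre)
tau_means = (y_post[treat == 1].mean() - y_pre[treat == 1].mean()) \
          - (y_post[treat == 0].mean() - y_pre[treat == 0].mean())

# Equivalent interaction regression: Y = b0 + b1*Treat + b2*Post + b3*Treat*Post
y = np.concatenate([y_pre, y_post])
T = np.tile(treat, 2)
P = np.repeat([0, 1], 2 * n)
X = np.column_stack([np.ones_like(y), T, P, T * P])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

print(f"four-means estimate: {tau_means:.2f}")
print(f"regression beta_3:   {beta[3]:.2f}")   # both estimates near the true tau = 2.0
```

The regression form is usually preferred in practice because it extends naturally to covariates, multiple periods, and clustered standard errors.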
DiD should be used when you have a natural comparison group that did not receive the treatment and pre-treatment data for both groups. The critical assumption is parallel trends: absent the treatment, both groups would have followed the same trajectory. This assumption is untestable (since we cannot observe the counterfactual) but can be assessed by examining whether the groups followed parallel trends in the pre-treatment period. If pre-treatment trends diverge, the DiD estimate is likely biased. Common pitfalls include having too few treated or control units for reliable inference (as a rule of thumb, clustered standard errors become unreliable with fewer than roughly 20-30 clusters), violating parallel trends due to differential pre-existing trends, and ignoring anticipation effects, where the treatment group changes behavior before the official treatment date. When parallel trends are questionable, alternative methods include synthetic control (which constructs a data-driven comparison group) and changes-in-changes (which relaxes the parallel trends assumption to parallel quantile trends).
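A simple way to assess pre-treatment trends is to fit a linear trend to each group's outcome over the pre-period and compare slopes. A minimal sketch (the monthly group means below are hypothetical, not from any real campaign):

```python
import numpy as np

# Hypothetical monthly outcome means for two groups over six pre-treatment months.
months = np.arange(6)
control = np.array([10.0, 10.4, 10.9, 11.3, 11.8, 12.2])
treated = np.array([13.1, 13.5, 14.0, 14.4, 14.9, 15.3])

# Fit a linear trend to each group's pre-period. Similar slopes are consistent
# with (but can never prove) the parallel trends assumption: the level gap
# between groups is fine, since DiD differences it out.
slope_control = np.polyfit(months, control, 1)[0]
slope_treated = np.polyfit(months, treated, 1)[0]

print(f"control slope: {slope_control:.3f} per month")
print(f"treated slope: {slope_treated:.3f} per month")
print(f"slope gap:     {slope_treated - slope_control:+.3f}")  # near zero here
```

A more formal version of this check is an event study regression with lead indicators, where the pre-treatment coefficients should be statistically indistinguishable from zero.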
Advanced DiD methods have seen rapid development in recent years. Staggered DiD, where different units adopt treatment at different times, introduces complications because the standard two-way fixed effects estimator can produce biased estimates when treatment effects vary over time. Recent econometric research by Callaway and Sant'Anna, Sun and Abraham, and others provides corrected estimators for staggered adoption settings. Event study designs extend DiD to trace out the treatment effect dynamics over time, plotting the differential change for each period relative to treatment adoption. For advertising and marketing applications, DiD is commonly used in geo-experiments where campaigns are launched in some cities or regions but not others, with platforms like GeoLift (from Meta) providing end-to-end tooling for designing, analyzing, and interpreting geographic DiD experiments.
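The event study design described above can be sketched by interacting the treatment-group indicator with dummies for each period relative to adoption, omitting the period just before treatment as the reference. A minimal simulation (NumPy only, with made-up dynamic effects; a single adoption date, so the staggered-adoption pitfalls discussed above do not arise):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated panel: 100 units per group, periods -3..3 relative to adoption,
# treatment starts at period 0. Dynamic effects are made up for illustration.
n = 100
periods = np.arange(-3, 4)
true_fx = {0: 1.0, 1: 2.0, 2: 2.5, 3: 3.0}       # zero before adoption

rows = []
for g in (0, 1):                                  # 0 = control group, 1 = treated group
    for _ in range(n):
        alpha = rng.normal(2.0 * g, 0.5)          # persistent unit-level heterogeneity
        for t in periods:
            y = alpha + 0.3 * t + true_fx.get(int(t), 0.0) * g + rng.normal(0, 0.1)
            rows.append((g, int(t), y))
G = np.array([r[0] for r in rows], dtype=float)
T = np.array([r[1] for r in rows], dtype=float)
Y = np.array([r[2] for r in rows], dtype=float)

# Design matrix: intercept, group dummy, period dummies, and Treat x 1[t == k]
# interactions for every period k except -1 (the omitted reference period).
lead_lags = [int(t) for t in periods if t != -1]
cols = [np.ones_like(Y), G]
cols += [(T == t).astype(float) for t in periods[1:]]   # period fixed effects
cols += [G * (T == t) for t in lead_lags]               # dynamic treatment effects
X = np.column_stack(cols)
beta = np.linalg.lstsq(X, Y, rcond=None)[0]

effects = dict(zip(lead_lags, beta[-len(lead_lags):]))
for t in lead_lags:
    print(f"t = {t:+d}: effect = {effects[t]:+.2f}")    # leads near 0, lags near true_fx
```

Plotting these coefficients against relative time produces the familiar event study figure: flat leads support parallel trends, and the lags trace out how the effect evolves after adoption.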
Related Terms
Synthetic Control
A causal inference method that constructs a weighted combination of untreated units to create an artificial control group that closely matches the treated unit's pre-treatment characteristics and trajectory, enabling credible treatment effect estimation when only one or a few units are treated.
Pre-Post Analysis
A quasi-experimental method that compares metrics before and after a treatment is applied to the same group, using the pre-treatment period as a baseline to estimate the treatment effect when a randomized control group is not available.
Regression Discontinuity
A quasi-experimental design that exploits a sharp cutoff in a continuous assignment variable to estimate causal effects, comparing units just above and just below the threshold where treatment assignment changes discontinuously.
Multivariate Testing
An experimentation method that simultaneously tests multiple variables and their combinations to determine which combination of changes produces the best outcome, unlike A/B testing, which typically varies a single element at a time.
Split Testing
The practice of randomly dividing users into two or more groups and exposing each group to a different version of a product experience to measure which version performs better on a target metric, commonly known as A/B testing.
Holdout Testing
An experimental design where a small percentage of users are permanently excluded from receiving a new feature or set of features, serving as a long-term control group to measure the cumulative impact of product changes over time.