Experiment Review Board

A cross-functional governance body that reviews experiment designs before launch and results before ship decisions, ensuring statistical rigor, alignment with organizational metrics, and prevention of common methodological errors.

An experiment review board (ERB) provides quality assurance for an organization's experimentation program. As experimentation scales beyond a small team of specialists, the risk of methodological errors increases: incorrect power calculations, inappropriate metrics, flawed randomization, peeking without sequential testing, and metric gaming all become more common when many people run experiments with varying levels of statistical sophistication. The ERB applies expert oversight at two critical junctures: before launch, reviewing the experiment design for correctness, and before shipping, reviewing the analysis for validity. For growth teams, an ERB maintains the credibility of the experimentation program by preventing false positives from reaching production and by ensuring that experiment designs are set up for success.
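To make the "incorrect power calculations" failure mode concrete, here is a minimal sketch of the standard normal-approximation sample-size formula for a two-proportion test. The function name and default parameters are illustrative, not a standard API; a real ERB checklist would also consider variance reduction and sequential designs.

```python
from scipy.stats import norm

def required_sample_size(p_base, mde_rel, alpha=0.05, power=0.8):
    """Per-arm sample size for a two-sided, two-proportion z-test.

    p_base:  baseline conversion rate (e.g. 0.10)
    mde_rel: relative minimum detectable effect (e.g. 0.05 for +5%)
    """
    p_treat = p_base * (1 + mde_rel)      # treatment rate under the MDE
    z_alpha = norm.ppf(1 - alpha / 2)     # two-sided significance threshold
    z_power = norm.ppf(power)
    variance = p_base * (1 - p_base) + p_treat * (1 - p_treat)
    n = (z_alpha + z_power) ** 2 * variance / (p_base - p_treat) ** 2
    return int(n) + 1

# Example: 10% baseline conversion, detect a 5% relative lift
n_per_arm = required_sample_size(0.10, 0.05)  # tens of thousands per arm
```

Small relative effects on low baseline rates demand large samples, which is exactly why a pre-launch power review catches experiments that could never reach significance in their planned duration.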

An effective ERB reviews several aspects of each experiment design: the hypothesis clarity and falsifiability, the primary metric selection and its alignment with business goals, the power analysis and expected experiment duration, the randomization unit and its appropriateness for the treatment, potential interactions with concurrent experiments, guardrail metrics that protect against negative side effects, and the pre-registered analysis plan including how multiple comparisons will be handled. Post-experiment, the ERB reviews the sample ratio mismatch check, the primary analysis results and confidence intervals, any deviations from the pre-registered plan, segment analyses and their corrections for multiple testing, and the practical significance of the results relative to implementation costs. The ERB typically includes senior data scientists, product leaders, and engineering representatives.
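The sample ratio mismatch (SRM) check mentioned above is routinely implemented as a chi-square goodness-of-fit test of observed arm sizes against the configured traffic split. A minimal sketch, with an illustrative function name and a conventional (but organization-specific) alpha threshold:

```python
from scipy.stats import chisquare

def srm_check(observed_counts, expected_ratios, alpha=0.001):
    """Flag a sample ratio mismatch: chi-square goodness-of-fit test
    of observed arm sizes against the configured traffic split."""
    total = sum(observed_counts)
    expected = [total * r for r in expected_ratios]
    stat, p = chisquare(observed_counts, f_exp=expected)
    return float(p), bool(p < alpha)

# A 50/50 split that came back 50,000 vs 51,500 users is flagged;
# a low p-value here means the randomization itself is suspect.
p_value, mismatch = srm_check([50_000, 51_500], [0.5, 0.5])
```

An SRM indicates a broken assignment or logging pipeline, so most review boards treat it as a hard stop: the effect estimate is not trusted regardless of how significant it looks.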

ERBs should be established when the experimentation program reaches a scale where quality control becomes challenging, typically when multiple teams are running experiments independently. The ERB should streamline, not slow down, experimentation by providing quick turnaround reviews (24-48 hours), offering clear guidelines that pre-answer common questions, and using tiered review levels (lightweight review for standard experiments, deep review for high-stakes experiments). Common pitfalls include making the ERB a bottleneck that reduces experiment velocity, staffing it with people who lack statistical expertise, applying a one-size-fits-all review depth regardless of experiment risk, and not providing feedback that helps teams improve their future experiment designs. The ERB should also maintain an experiment knowledge base and publish guidelines that raise the overall level of experimentation literacy in the organization.
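The tiered review levels described above can be encoded as simple triage rules so that routing is consistent and automatic. The fields, thresholds, and tier names below are hypothetical examples, not a standard taxonomy; each organization would tune them to its own risk profile.

```python
from dataclasses import dataclass

@dataclass
class ExperimentPlan:
    traffic_pct: float        # share of users exposed to the experiment
    touches_revenue: bool     # changes checkout, pricing, or billing flows
    novel_method: bool        # non-standard design, metric, or analysis

def review_tier(plan: ExperimentPlan) -> str:
    """Illustrative triage rules mapping experiment risk to review depth."""
    if plan.touches_revenue or plan.novel_method:
        return "deep"         # full ERB session before launch
    if plan.traffic_pct > 0.5:
        return "standard"     # single senior reviewer sign-off
    return "lightweight"      # automated checks plus async review

tier = review_tier(ExperimentPlan(traffic_pct=0.1,
                                  touches_revenue=False,
                                  novel_method=False))
```

Codifying the triage keeps the ERB off the critical path for routine experiments while guaranteeing that high-stakes changes always receive the deep review.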

Advanced ERB practices include automated pre-launch checks that catch common errors before human review (insufficient power, missing guardrail metrics, overlap with concurrent experiments), standardized experiment scorecards that make review efficient and consistent, post-mortem analysis of experiments that produced surprising results (both positive and negative) to identify systematic biases in the program, and periodic calibration exercises where ERB members independently review the same experiment to ensure consistency. Some organizations have evolved beyond dedicated ERBs to embedded experimentation expertise within each team, supported by automated tooling that enforces standards programmatically. The meta-goal is to build an organizational culture of experimentation rigor that does not depend on a small group of experts.
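The automated pre-launch checks described above can be sketched as a gate that runs before any human review. The dictionary field names and the three checks shown (reachable sample size, declared guardrails, surface overlap with concurrent experiments) are illustrative assumptions about how such a tool might be shaped, not a reference to any particular platform.

```python
def prelaunch_checks(design, running_experiments):
    """Automated gate run before human ERB review.
    `design` is a dict with illustrative keys: expected_daily_users,
    max_duration_days, required_n, guardrail_metrics, surfaces."""
    issues = []
    # Power: can the experiment reach its required sample size in time?
    reachable = design["expected_daily_users"] * design["max_duration_days"]
    if reachable < design["required_n"]:
        issues.append("underpowered: cannot reach required sample size")
    # Guardrails: every experiment must declare protective metrics
    if not design["guardrail_metrics"]:
        issues.append("no guardrail metrics declared")
    # Overlap: flag concurrent experiments touching the same surface
    for other in running_experiments:
        if set(design["surfaces"]) & set(other["surfaces"]):
            issues.append(f"overlaps with {other['name']}")
    return issues

design = {"expected_daily_users": 1_000, "max_duration_days": 14,
          "required_n": 20_000, "guardrail_metrics": [],
          "surfaces": ["checkout"]}
running = [{"name": "exp-42", "surfaces": ["checkout", "cart"]}]
issues = prelaunch_checks(design, running)  # all three checks fail here
```

Checks like these catch mechanical errors cheaply, so the human reviewers can spend their limited time on the judgment calls: metric choice, hypothesis quality, and practical significance.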

Related Terms

Experiment Velocity

The rate at which an organization designs, launches, analyzes, and acts on experiments, typically measured as the number of experiments concluded per unit time, reflecting the speed of the organization's learning and iteration cycle.

Experiment Documentation

The systematic recording of experiment hypotheses, designs, configurations, results, and learnings in a structured, searchable format that preserves institutional knowledge and enables evidence-based decision-making across the organization.

Growth Experimentation Framework

A structured organizational process for systematically generating, prioritizing, running, and learning from experiments across the entire user lifecycle, designed to maximize the rate of validated learning and compound the impact of product improvements.

Multivariate Testing

An experimentation method that simultaneously tests multiple variables and their combinations to determine which combination of changes produces the best outcome, unlike A/B testing which typically varies a single element at a time.

Split Testing

The practice of randomly dividing users into two or more groups and exposing each group to a different version of a product experience to measure which version performs better on a target metric, commonly known as A/B testing.

Holdout Testing

An experimental design where a small percentage of users are permanently excluded from receiving a new feature or set of features, serving as a long-term control group to measure the cumulative impact of product changes over time.