Statistical methodology for frequentist experiments
This guide includes advanced concepts
This section includes an explanation of advanced statistical concepts. We provide them for informational purposes, but you do not need to understand these concepts to use Experimentation.
Overview
This guide explains the statistical methodology LaunchDarkly uses to calculate frequentist experiment variation means, and how these analytics formulas are useful for validating your results.
For a high-level overview of frequentist and Bayesian statistics, read Bayesian versus frequentist statistics.
Data mean formula
The formula for the data mean differs between conversion metrics and numeric metrics:
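- Conversion metrics, including custom conversion binary, custom conversion count, page viewed, and clicked or tapped metrics, use the total number of conversions divided by the total number of exposures: $\text{data mean} = \frac{\text{total conversions}}{\text{total exposures}}$
- Numeric metrics use the total value divided by the total number of exposures: $\text{data mean} = \frac{\text{total value}}{\text{total exposures}}$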
When you hover on the “Conversion rate” and “Mean” headings in an experiment results table, the formulas for the data mean described above display.
CUPED may affect the exact computation of these results. To learn more, read Covariate adjustment and CUPED methodology.
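As an illustration of the data mean described in this section, here is a minimal Python sketch. The function name `data_mean` and the example numbers are hypothetical and are not part of LaunchDarkly's product or API. The sketch also ignores any CUPED adjustment.

```python
def data_mean(total, exposures):
    """Return total / exposures.

    For conversion metrics, `total` is the total number of conversions.
    For numeric metrics, `total` is the summed metric value.
    """
    if exposures <= 0:
        raise ValueError("exposures must be positive")
    return total / exposures


# Hypothetical numbers, for illustration only.
conversion_rate = data_mean(total=120, exposures=2000)   # conversion metric
numeric_mean = data_mean(total=5310.0, exposures=2000)   # numeric metric

print(conversion_rate, numeric_mean)
```

Both metric types share the same denominator, the total number of exposures, so a single helper covers both cases.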
Fixed-horizon analysis
At LaunchDarkly, we use fixed-horizon analysis for our frequentist experiments. The summary statistics for this method are:
- Mean (or conversion rate): the average number of conversions across all units in the metric, or the percentage of units with at least one conversion
- Confidence interval: a range of values computed so that, if you were to repeat the experiment many times, the resulting intervals would contain the true metric value at the stated confidence level
- Relative difference from control: how much a metric in the treatment variation differs from the control variation, expressed as a proportion of the control’s estimated value
- p-value: the probability of observing a difference between a treatment variation and the control variation at least as large as the one measured, assuming there is no actual difference in performance between the two variations
LaunchDarkly uses the industry- and scientific-standard z-test for fixed-horizon analyses on metrics based on means. This is because data volumes typical of online experimentation imply that sample means in nearly all cases will be approximately normally distributed.
LaunchDarkly also computes confidence intervals in the usual way based on the same normal approximation used in the z-test.
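To make the normal approximation concrete, the following is a minimal sketch of a textbook two-sample z-test on the absolute difference in means, with the matching confidence interval. It is not LaunchDarkly's implementation. The function name `z_test`, the two-sided alternative, the unpooled variance, the default `alpha`, and the input numbers are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test(mean_c, var_c, n_c, mean_t, var_t, n_t, alpha=0.05):
    """Two-sided, two-sample z-test on the difference in means.

    var_c and var_t are per-unit sample variances; n_c and n_t are the
    numbers of exposures in the control and treatment variations.
    """
    diff = mean_t - mean_c
    # Standard error of the difference, assuming independent groups.
    se = sqrt(var_c / n_c + var_t / n_t)
    z = diff / se
    # Two-sided p-value under the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Confidence interval from the same normal approximation.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    return diff, z, p_value, ci


# Hypothetical summary statistics, for illustration only.
diff, z, p_value, ci = z_test(
    mean_c=0.10, var_c=0.09, n_c=10_000,
    mean_t=0.11, var_t=0.0979, n_t=10_000,
)
print(diff, z, p_value, ci)
```

The p-value and the confidence interval use the same standard error, so under these assumptions they always agree: the interval excludes zero exactly when the p-value is below `alpha`.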
Mathematical details for the calculations involved in this test are easily found on the internet, such as on Wikipedia, so we do not delve into specifics here.
Note that, by default, LaunchDarkly computes p-values and confidence intervals based on the relative difference between treatment and control.
The relative difference = $\frac{\bar{x}_T - \bar{x}_C}{\bar{x}_C}$, where $\bar{x}_C$ and $\bar{x}_T$ are the sample means of the control and treatment variations, respectively.
To compute these statistical quantities, LaunchDarkly uses the standard delta method approximation to calculate the variance of the relative difference. For details of the computation, read Applying the Delta Method in Metric Analytics: A Practical Guide with Novel Ideas.
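As a rough sketch of how the delta method applies here, the code below uses the standard first-order approximation for the variance of a ratio of two independent sample means. It is not LaunchDarkly's code. The function name `relative_difference_test`, the independence of the two variations, the two-sided test, and the input numbers are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def relative_difference_test(mean_c, var_c, n_c, mean_t, var_t, n_t, alpha=0.05):
    """Delta-method z-test on the relative difference (mean_t - mean_c) / mean_c."""
    if mean_c == 0:
        raise ValueError("control mean must be non-zero")
    rel = (mean_t - mean_c) / mean_c
    # Variances of the two sample means.
    var_mean_c = var_c / n_c
    var_mean_t = var_t / n_t
    # First-order delta method for mean_t / mean_c, assuming the groups are
    # independent (no covariance term). Subtracting 1 does not change the variance.
    var_rel = var_mean_t / mean_c**2 + (mean_t**2 / mean_c**4) * var_mean_c
    se = sqrt(var_rel)
    z = rel / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (rel - z_crit * se, rel + z_crit * se)
    return rel, se, p_value, ci


# Hypothetical summary statistics, for illustration only.
rel, se, p_value, ci = relative_difference_test(
    mean_c=0.10, var_c=0.09, n_c=10_000,
    mean_t=0.11, var_t=0.0979, n_t=10_000,
)
print(rel, se, p_value, ci)
```

The second term in `var_rel` accounts for uncertainty in the control mean, which appears in the denominator. Treating the control mean as a fixed constant would drop that term and understate the variance.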
Conclusion
This guide explained the statistical methods LaunchDarkly applies to frequentist experiments. To learn about Bayesian statistical methods in LaunchDarkly, read Statistical methodology for Bayesian experiments.