A/B testing: the importance of the central limit theorem

In this article, I will explain the practical benefits of the central limit theorem and its importance in A/B testing. The central limit theorem is a powerful tool in the analyst's toolkit.

Some of the theorem's key points:

The means of large random samples drawn from any population (with finite variance) tend to be normally distributed around the mean of that population, regardless of the shape of the population's distribution.

Even if the population is exponentially distributed, the means of repeated random samples tend toward a normal distribution.

Most of the sample means will be close enough to the population mean.

What exactly counts as "close enough" is determined by the standard error.

It is relatively unlikely that a sample mean will be more than two standard errors away from the population mean, and it is extremely unlikely that it will be three or more standard errors away.

The less likely it is that an outcome was purely random, the more confident we can be that some other factor is at work.

Let's imagine we launched an experiment where the target metric is the average check. The null hypothesis is that there is no difference in the average check between the control and experiment groups. The alternative hypothesis is that a difference exists.

As we know, a small sample size results in inaccurate estimates. According to the law of large numbers, the larger the sample, the closer the sample mean tends to be to the population mean. That means that to get a more accurate estimate of the population mean, we need a large enough sample.

This can be seen in the chart below, which shows that as the sample size increases, the sample mean moves closer to the population mean.

We can use the bootstrap to determine confidence intervals for our exponentially distributed average check data.

As we can see, the mean of the bootstrap sample means is approximately equal to the mean of the sample they were drawn from. The standard deviation has become smaller, because the sample means cluster much more tightly around the population mean than individual observations do.

In this case, the standard deviation of the means is the standard error, which is what the confidence intervals above were built from. Now we can use the confidence intervals to assess the statistics. This is one of the main practical benefits of the central limit theorem.

If the goal is to obtain a more accurate estimate of the mean, the variance needs to be minimized. The smaller the spread, the more accurate the estimate of the mean.
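The law of large numbers and the bootstrap steps described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the code behind the original charts. The population mean of 100, the sample size of 1,000, the 2,000 resamples, and all helper names are assumptions made for the example.

```python
import math
import random
import statistics


def draw_checks(n, mean_check, rng):
    """Draw n exponentially distributed check values with the given mean."""
    return [rng.expovariate(1.0 / mean_check) for _ in range(n)]


def standard_error(sample):
    """Estimated standard error of the mean: sample std dev / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def bootstrap_means(sample, n_resamples, rng):
    """Means of resamples drawn with replacement, each as large as the sample."""
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)]


def percentile_ci(values, level=0.95):
    """Percentile interval: cut (1 - level) / 2 from each tail of the sorted values."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    low = ordered[int(tail * len(ordered))]
    high = ordered[int((1.0 - tail) * len(ordered)) - 1]
    return low, high


rng = random.Random(42)  # fixed seed so the run is repeatable
TRUE_MEAN = 100.0        # hypothetical population mean check (assumption)

# Law of large numbers: the sample mean for increasing sample sizes.
for n in (10, 100, 1_000, 10_000):
    print(f"n={n:>6}  sample mean={statistics.fmean(draw_checks(n, TRUE_MEAN, rng)):.2f}")

# Bootstrap: resample one observed sample many times and look at the means.
sample = draw_checks(1_000, TRUE_MEAN, rng)
boot = bootstrap_means(sample, 2_000, rng)
boot_se = statistics.stdev(boot)  # std dev of the means = bootstrap standard error
ci_low, ci_high = percentile_ci(boot)

print(f"sample mean:            {statistics.fmean(sample):.2f}")
print(f"mean of bootstrap means: {statistics.fmean(boot):.2f}")
print(f"sample std dev:         {statistics.stdev(sample):.2f}")
print(f"bootstrap std error:    {boot_se:.2f}")
print(f"analytic std error:     {standard_error(sample):.2f}")
print(f"95% CI for the mean:    ({ci_low:.2f}, {ci_high:.2f})")
```

The percentile interval is only one of several ways to build a bootstrap confidence interval; it is used here because it is the simplest to read. In an actual A/B test the same resampling would be applied to the control and experiment groups, or to the difference between their means.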
