To answer these questions, let's introduce some probability theory terms, illustrated through ad views and clicks. Let our random event be a single view of our ad. This event has two outcomes: the user either clicks on the ad, with probability equal to the CTR, or doesn't click, with probability 100% − CTR.
Over a series of views of fixed length, a random number of clicks is registered, and this number is our random variable. Indeed, if we repeat the experiment with the same number of views, we'll end up with a different number of clicks. We can never name the exact number of clicks in advance, but we can estimate it by calculating the click expectation for the series: we simply multiply the click probability for a single view, namely our CTR, by the number of views. For example, with a 1% CTR the click expectation for 250 views is 2.5. Following common sense, we understand that half a user (a mermaid, perhaps?) will not click on our ad, so in the ads manager we will see something close to two or three clicks.
But if we keep repeating our series of 250 views and recording the number of clicks per series, the average will come close to the expectation. Moreover, the more such experiments we run, the closer the average gets to the expectation. This is a rather loose interpretation of the law of large numbers from probability theory, and it agrees with our intuition.
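This claim is easy to check with a quick simulation. A minimal sketch, assuming the 1% CTR and 250-view series from the example above; `run_series` is a helper name invented here, and the numbers are illustrative, not real campaign data:

```python
# Simulate series of 250 views with a 1% click probability and watch
# the average click count per series approach the expectation of 2.5.
import random

def run_series(views: int, ctr: float, rng: random.Random) -> int:
    """Count clicks in one simulated series of views."""
    return sum(rng.random() < ctr for _ in range(views))

rng = random.Random(42)   # fixed seed for reproducibility
expectation = 250 * 0.01  # 2.5 clicks

for n_series in (10, 1_000, 100_000):
    avg = sum(run_series(250, 0.01, rng) for _ in range(n_series)) / n_series
    print(f"{n_series:>7} series: average clicks = {avg:.3f} "
          f"(expectation {expectation})")
```

The more series we average over, the closer the printed value sits to 2.5 — exactly the law-of-large-numbers intuition above.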
Leading to the necessity of calculating a confidence interval

You may ask: "Well, it is clear that precision means proximity to the expectation (namely, a narrower range of values), but why do we need this at all?" Well, suppose that in the ads manager we get ten clicks for a thousand views, and the system proudly states that the CTR is 1%. Happy as can be, we go traveling, only to discover on our return that over the following thousands of views the CTR is nowhere near 1%. What a mess! And no, the audience and creative burnout have nothing to do with it in this example. The thing is that your first thousand views happened to be better than average, so the CTR shown in the manager was far from the real one; over a longer series of views it approached its true value.
The CTR itself is convenient to use because it doesn't depend on the number of views, but its precision directly depends on the precision of the number of clicks. By the true CTR I will mean the probability we introduced at the beginning of the article when defining our random event. It is this value that the CTR in the ads manager tends to as the series of views grows longer, because it is the true CTR that determines the click expectation.
Predicting the future effectiveness of your creative requires knowing the true CTR. But measuring it exactly is impossible (unless your budget is endless), so you are left peeping through the keyhole. Putting it in mathematical terms, the potential range of the true CTR can be roughly estimated from the data we bought with our test budget. And now it's time for the formulae to take the stage.
About the Poisson distribution

An unexpected disclaimer: don't treat this article as a formal guide to precise calculations. Its scientific apparatus serves only to estimate, literally, something roughly. The Poisson distribution helps define the probability of getting m clicks in n views:

P(m) = λ^m · e^(−λ) / m!,  where λ = p · n,

p is the true CTR, and λ, derived from the true CTR and the number of views n, is the click expectation for a sample of that size. Unfortunately, this works well only for λ of the order of 10, which pulls down the precision of all further calculations.
But don't be afraid of the formula: we only need its graph. The graph shows the probabilities with which a true CTR, over n views, is realized as each possible number of clicks. You can see the shape of the probability peak: the click expectation for n views, and values close to it, sit at the top. At the bottom of the graph (the foot of the peak) lie the less probable realizations of our series of views, where the number of clicks is far from the expectation. Thus, determining the range of probable click counts for a series of views means determining how wide the peak is. The measure of a random variable's spread around its expectation is called the variance, and its square root, the standard deviation (s on the graph), is exactly what determines the peak width.
Finally, before moving on to the calculations, it remains to work out whether a given realization of the series of views lands near the top of the peak or at its foot. In other words, what is the probability that the number of clicks in a given sample deviates from the expectation by no more than the standard deviation? It happens to be about 70%.
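The roughly-70% figure can be checked numerically. A sketch, assuming a Poisson model with λ = 9 for illustration (with inclusive integer borders the value comes out a little above 70%):

```python
# For a Poisson(λ) click count, what fraction of outcomes falls within
# one standard deviation (√λ) of the expectation?
import math

def poisson_pmf(m: int, lam: float) -> float:
    """Probability of exactly m clicks when the click expectation is lam."""
    return lam ** m * math.exp(-lam) / math.factorial(m)

lam = 9.0                # click expectation, e.g. 1% true CTR over 900 views
sigma = math.sqrt(lam)   # standard deviation = 3
lo = math.ceil(lam - sigma)    # 6
hi = math.floor(lam + sigma)   # 12
prob = sum(poisson_pmf(m, lam) for m in range(lo, hi + 1))
print(f"P({lo} <= clicks <= {hi}) = {prob:.2f}")
```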
Calculating a confidence interval

The interval in question,

(λ − s; λ + s),

is called a confidence interval with a 70% confidence level, and we will focus on it from here on. We are very lucky: the variance of the Poisson distribution is equal to its expectation, that is, to λ, and the standard deviation to √λ. Thus, the 70% confidence interval is

(λ − √λ; λ + √λ).

Let the true CTR of our ad be 1%, so the click expectation for 900 views is λ = 9 and √λ = 3. Thus, we have the interval (6; 12): in 70% of cases, after 900 views we will get between 6 and 12 clicks (with 9 still being the most probable number).
Dividing these values by the number of views, we can easily see that the ads manager will show a CTR ranging from 0.67% to 1.33%. This range may be called the confidence interval for the CTR in the ads manager.
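The interval arithmetic fits in a few lines. A sketch, where `ctr_interval_70` is a name invented here and the 1% CTR / 900 views are the example's numbers:

```python
# 70% interval: click expectation λ = CTR * n, borders λ ± √λ,
# then divide by the number of views to get the CTR range the
# ads manager may show.
import math

def ctr_interval_70(true_ctr: float, views: int) -> tuple[float, float]:
    lam = true_ctr * views    # click expectation
    half = math.sqrt(lam)     # standard deviation
    return (lam - half) / views, (lam + half) / views

lo, hi = ctr_interval_70(0.01, 900)
print(f"manager CTR range: {lo:.2%} .. {hi:.2%}")  # 0.67% .. 1.33%
```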
Thus, if you are only satisfied with a CTR above 1% and you were unlucky enough to get just 0.67% on your first 900 views, an ad that would have shown proper performance in the future may be undeservedly scrapped. In such a case, the test period should have been extended.
But wait, you are most likely to ask why we are assuming the true CTR and drawing conclusions from it, when in reality the task is exactly the opposite: estimating the potential true CTRs from the CTR in the ads manager.
And you are right.
Technically, such a calculation is more complicated, but eventually it ends up in the same place described above. So don't be afraid of the apparent difficulty. For now, let's make a brief summary of everything mentioned so far.
Intermediate conclusions

An error ratio is easier to make sense of than a confidence interval; the error ratio is the ratio of the deviation from the expectation to the expectation itself. In our case we have

√λ / λ = 1 / √λ = 1/3 ≈ 33%.

So the error ratio of the example above is no less than 33%, not counting the error contributed by the 30% of cases that we haven't included in the 70% confidence interval. In those 30% of cases we can get any number of clicks outside the interval.
So, what conclusions do we come to? For one, the accuracy of a test (namely, the smallness of the error ratio) depends not on the number of accumulated random events but on the number of successes: not on the number of views but on the number of clicks. That's why there is no universal borderline for deciding how long to keep a test running. For landing pages, besides the dependency on test duration, we would also need to mention the dependency on CTR and on conversion rate. Either way, you have to be ready to define the borders for each particular case on your own.
For a fixed true CTR, accuracy grows with the number of views. Recall that the click expectation is the fixed true CTR multiplied by the number of views, which shows that the error ratio 1/√λ drops as the click expectation grows. Moreover, you may notice a regularity in this drop: the longer we test, the more slowly accuracy grows, meaning that the test should be concluded on time. Otherwise, time and budget are being spent inefficiently.
But what does "on time" mean? What degree of accuracy is required? That is up to you to decide. Only one thing is certain: after 25–30 clicks, accuracy grows at a dramatically slow pace.
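The diminishing returns are easy to see by tabulating the error ratio 1/√clicks for a few click counts (the counts chosen are illustrative):

```python
# Error ratio = √λ / λ = 1/√λ: accuracy improves with clicks,
# but ever more slowly — hence the ~25–30 click practical cutoff.
import math

for clicks in (4, 9, 16, 25, 36, 100):
    print(f"{clicks:>3} clicks -> error ratio {1 / math.sqrt(clicks):.0%}")
```

Going from 4 to 25 clicks cuts the error from 50% to 20%; going from 25 to 100 only gets you from 20% to 10%.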
A reverse task

Let's go back to calculating confidence intervals and finally figure out how to roughly estimate the potential true CTR from the CTR in the ads manager.
We have determined that a true CTR, over n views, is realized with 70% probability as one of the values within its confidence interval. This means that each CTR in the manager belongs to a plethora of confidence intervals of various true CTRs. We just need to determine the minimum and maximum true CTRs that could (with the given confidence) have produced the number in our manager. To cut a long story short, we need to find the true CTRs for which the manager's value lies exactly on the border of their confidence intervals. Hence, looking into the ads manager and seeing a certain CTR there, we'll finally answer the question about the potential true CTR of our ad.
The answer will be a range of true CTRs, (CTR1; CTR2). Let's find CTR1 and CTR2 by translating the picture above into the language of click expectations, recalling the width of the confidence interval. Here λ is the number of clicks in the ads manager, while λ1 and λ2 are the expectations corresponding to CTR1 and CTR2:

λ1 + √λ1 = λ = λ2 − √λ2.

The left part of the equation is the right border of the left confidence interval; the right part is the left border of the right interval. Their meeting point corresponds exactly to our number of clicks. Here is the solution to the equation:

λ1,2 = λ + 1/2 ∓ √(λ + 1/4).

It may look nasty, but if the number of clicks in the manager is significantly bigger than 1, the small constants can be discarded and the solution comes out pretty elegant:

λ1,2 ≈ λ ∓ √λ.

Thus, the calculated interval of click expectations is (λ − √λ; λ + √λ). Doesn't it look familiar? It turns out we didn't have to solve any equations at all: at a slight cost in accuracy, we have converted the reverse task into the direct one.
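A small sketch comparing the exact and approximate solutions of the reverse task (the function names are this sketch's own):

```python
# Reverse task: find expectations λ1, λ2 whose 70% borders land on the
# observed click count λ, i.e. λ1 + √λ1 = λ = λ2 − √λ2.
# Exact: λ1,2 = λ + 1/2 ∓ √(λ + 1/4); for λ >> 1 this is ≈ λ ∓ √λ.
import math

def reverse_exact(lam: float) -> tuple[float, float]:
    d = math.sqrt(lam + 0.25)
    return lam + 0.5 - d, lam + 0.5 + d

def reverse_approx(lam: float) -> tuple[float, float]:
    return lam - math.sqrt(lam), lam + math.sqrt(lam)

for lam in (9, 25, 100):
    print(lam, reverse_exact(lam), reverse_approx(lam))
```

Already at λ = 25 the two pairs of borders are close, which is why treating the reverse task as the direct one is good enough here.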
Case in point

Well, let's nail down the approach with a concrete example.
We see 25 clicks per 1000 views in the manager. The system calculates the CTR for us: that is 2.5%. Now we take the square root of the number of clicks and get our standard deviation, which determines the width of the confidence interval. For us, it is 5.
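The whole worked example collects into one runnable sketch (note that the click borders are divided by the number of views, 1000, to get the true-CTR range):

```python
# 25 clicks per 1000 views: CTR, standard deviation, click interval,
# true-CTR range, and error ratio, step by step.
import math

clicks, views = 25, 1000
ctr = clicks / views                                 # 2.5%
sigma = math.sqrt(clicks)                            # 5
click_lo, click_hi = clicks - sigma, clicks + sigma  # (20; 30)
ctr_lo, ctr_hi = click_lo / views, click_hi / views  # (2%; 3%)
error_ratio = sigma / clicks                         # 20%
print(f"CTR {ctr:.1%}, clicks ({click_lo:.0f}; {click_hi:.0f}), "
      f"true CTR ({ctr_lo:.0%}; {ctr_hi:.0%}), error {error_ratio:.0%}")
```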
Then we calculate the confidence interval for the number of clicks: (25 − 5; 25 + 5) = (20; 30).

Then we calculate the corresponding interval of true CTRs: we just divide these borders by the number of views (and convert to a percentage): (2%; 3%). Profit.

By the way, evaluating the accuracy of the test requires the error ratio, calculated by dividing the standard deviation by the number of clicks: 5/25 = 20%.

Benchmarks table

For your convenience, below is a table illustrating everything mentioned above, as well as showing the number of views required to test an ad (collecting the 25 clicks after which accuracy grows slowly), depending on its CTR.

A/B testing

If the 70% confidence interval coupled with a 20% error ratio seems to give pretty inaccurate results, you are welcome to use the 95% confidence interval, whose width is not two but four standard deviations.
In this case we need to replace the square root of the number of clicks with twice that value.
So, the 20% relative degree of accuracy will be achieved at a hundred clicks.
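Such a benchmarks table can be generated in a couple of lines: views = clicks / CTR, with 25 clicks for the 70% interval and 100 for the 95% one. The CTR values listed here are illustrative:

```python
# Views needed to collect 25 clicks (70% interval, ~20% error)
# or 100 clicks (95% interval, ~20% error) at a given CTR.
for ctr in (0.005, 0.01, 0.02, 0.05):
    print(f"CTR {ctr:>5.1%}: {int(25 / ctr):>5} views for 25 clicks, "
          f"{int(100 / ctr):>6} views for 100 clicks")
```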
The reasoning above essentially sets a theoretical lower limit on test duration: with a shorter test the sample will not be representative. However, we are seldom interested in the accuracy of the test by itself, as we usually need to compare our results with a benchmark: for example, one spelled out in a media plan, one obtained from other (already tested) ads, or the results of the second variant in an A/B test. In the first case, the test may be finished early if its results are unsatisfactory even at the upper border of the confidence interval.
For example, if we have 9 clicks at a 0.9% CTR, so that the 70% range of true CTRs is (0.6%; 1.2%), and we need a CTR of no less than 2%, there is no use waiting for 25 clicks. The chance of achieving such a result is very small, as even the 95% range of true CTRs is only (0.3%; 1.5%).
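The early-finish rule can be written as a tiny helper. The function name `hopeless` and the 2σ half-width for the 95% border are this sketch's assumptions:

```python
# Stop the test early if even the upper border of the 95% true-CTR
# range falls short of the target CTR.
import math

def hopeless(clicks: int, views: int, target_ctr: float) -> bool:
    upper = (clicks + 2 * math.sqrt(clicks)) / views  # 95% upper border
    return upper < target_ctr

print(hopeless(9, 1000, 0.02))  # 9 clicks, 0.9% CTR, 2% target -> True
```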
In the case of A/B testing there are two probability curves, each with a width of its own. The accuracy of the test depends on how little they overlap (and the overlap changes over time). You may ask why. The thing is that at a specific moment in time, the result of each variant looks like a vertical line on the graph (or a point on the horizontal axis). If the overlap is big, the result on the left may actually belong to the curve on the right, and the result on the right to the curve on the left. If we finish the test early, we may choose a dead-end road even though at that moment it seems a winning one. The accuracy of the test in the picture on the left leaves much to be desired compared to the one on the right.
However, an email marketer or a UX/UI specialist can calculate the significance of an A/B test right in the interface of their professional tools or, at the very least, using online services. So people dealing with ads managers, and as a result with a large amount of A/B testing, may use the advice below.
Calculating the exact crossing point of the curves is a challenging task, so we can simply base decisions on whether the ranges of true CTRs overlap; in this case it's better to use the 95% intervals. If the intervals are far apart from the start, we may finish the test early. But if they overlap, we should carry on testing either until the overlap disappears or until the sample becomes representative. The required sample size depends on the extent of the overlap: if the 95% intervals overlap but the 70% intervals don't, 30 clicks per variant will be enough; but if the 70% intervals overlap as well, meaning the variants' performance is pretty close, you can't do with fewer than a hundred clicks.
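This advice can be sketched as a small decision helper. The names, interval widths (1σ and 2σ), and click thresholds below follow the article's rough approximations and are this sketch's assumptions, not a rigorous significance test:

```python
# Compare the 95% and 70% true-CTR intervals of two variants and
# suggest how long to keep the A/B test running.
import math

def interval(clicks: int, views: int, sigmas: float) -> tuple[float, float]:
    """True-CTR range: (clicks ± sigmas * √clicks) / views."""
    half = sigmas * math.sqrt(clicks)
    return (clicks - half) / views, (clicks + half) / views

def overlaps(a: tuple[float, float], b: tuple[float, float]) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]

def verdict(clicks_a: int, views_a: int, clicks_b: int, views_b: int) -> str:
    if not overlaps(interval(clicks_a, views_a, 2),
                    interval(clicks_b, views_b, 2)):
        return "95% intervals apart: may finish early"
    if not overlaps(interval(clicks_a, views_a, 1),
                    interval(clicks_b, views_b, 1)):
        return "only 70% intervals apart: ~30 clicks per variant suffice"
    return "70% intervals overlap: wait for ~100 clicks per variant"

print(verdict(30, 1000, 12, 1000))
```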
Closing statement

To conclude, let me remind you that everything mentioned above can be used at any stage of the funnel, applied not only to views (impressions) and clicks but also, say, to lead-to-purchase conversions. The only reservation concerns the Poisson distribution, as it works only for events with low probability. Thus, we won't be able to use the method for a high conversion rate, such as the conversion from filling in one line of a lead form to the next. It is still quite possible to analyze the funnel as a whole, namely the accuracy of the conversion from an ad click to the funnel's final conversion, despite the plethora of micro-conversions in between. We just need to declare an ad click our random event, and the final conversion (or the lack of it) the outcome. That ultimately helps us estimate (I can't say calculate) the budget needed for hypothesis testing in media planning, based on the same 25–30 successful outcomes of a random event.