One approach is to look for a natural experiment, in which treatment and control groups occur naturally, and apply a so-called “difference-in-differences” (DID) analysis to tease out the effect of manipulations.
A Primer on DID Analyses with Natural Experiments

DID analyses make use of longitudinal data that can be framed as a natural experiment.
To frame such an experiment, we need to be able to identify a control and treatment group.
Identifying the treatment group is easy: it is the group of subjects affected by the manipulation (e.g., website visitors that see a new marketing campaign ad).
Identifying a suitable control group is the hard part — after all, if the control group were easily available, then we could have done an A/B test in the first place.
The control group should satisfy two key conditions.

First, subjects in the control group must not be affected by the manipulation being tested.
For instance, if a new website feature is rolled out to users one geographic region at a time, then this time lag can be exploited — the regions that have not yet seen the new feature can be used as a control group.
If everyone in the user base is simultaneously exposed to the manipulation, we can attempt to construct a pseudo-control group based on the degree of their exposure to the manipulation.
Users that have been minimally exposed to a campaign ad could be considered as part of the pseudo-control group; this is fine if, for instance, we have good reason to expect the differences in characteristics and behavior between the pseudo-control group and true control group to be negligible, or if comparing the pseudo-control group with the treatment group is sufficient for the purposes of our analysis.
Second, apart from the effect attributable to the manipulation in the treatment group, the (unobservable) differences between the control and treatment groups should be largely indistinguishable over time.
What “largely indistinguishable” means in practice will have to be determined on a case-by-case basis, depending on the sensitivity of the DID analysis to the strictness of this condition.
If there are non-manipulation-related differences between the control and treatment groups in absolute terms, we can check whether the relative trends of these differences are largely indistinguishable.
This is sometimes called the parallel trends assumption, i.e., had the manipulation not taken place, the trend in the treatment group would not have deviated significantly from the trend in the control group.
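To get an initial feel for the parallel trends assumption, the pre-manipulation series of each group can be fitted with a linear trend and the slopes compared. The sketch below uses made-up pre-period data; the numbers and the comparison threshold are illustrative assumptions, not a formal statistical test:

```python
import numpy as np

# Hypothetical weekly averages of the outcome in the pre-manipulation
# period for each group (illustrative numbers only).
weeks = np.arange(8)
control_pre = 10.0 + 0.5 * weeks + np.array([0.1, -0.2, 0.0, 0.1, -0.1, 0.2, 0.0, -0.1])
treat_pre = 12.0 + 0.5 * weeks + np.array([-0.1, 0.1, 0.0, -0.2, 0.1, 0.0, 0.2, -0.1])

# Fit a linear trend to each group's pre-period series; under parallel
# trends, the slopes should be close even if the levels differ.
control_slope = np.polyfit(weeks, control_pre, 1)[0]
treat_slope = np.polyfit(weeks, treat_pre, 1)[0]

print(abs(control_slope - treat_slope) < 0.05)  # similar pre-trends
```

A slope comparison like this is only a sanity check; in practice, more formal event-study-style specifications are often used to probe pre-trends.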
If the DID analysis is modeled as a regression, then any measurable individual-level differences (e.g., demographics) can also be controlled for by adding corresponding control variables to the model.
Having stated the above, it is worth noting that we typically still won’t have a sure-shot way of knowing whether a person that we are putting in the control group has really not perceived the manipulation (e.g., the person may have seen the campaign ad during an “incognito” session that the e-commerce company could not track accurately, seen the ad on a friend’s screen, or heard about the ad via word of mouth).
We would therefore generally have to rely on proxies (e.g., ad displays and clicks) to approximate actual perception of the manipulation.
If we are able to identify a suitable control group, then the observations in the dataset can be split into four groups based on whether the observations belong to the treatment group (TREAT = 1, otherwise 0), and whether the observations occur after the manipulation has taken place (POST = 1, otherwise 0).
This produces four groups that we will denote as G(TREAT, POST):

- G(0,0) is the control group before the manipulation took place.
- G(0,1) is the control group after the manipulation took place.
- G(1,0) is the treatment group before the manipulation took place.
- G(1,1) is the treatment group after the manipulation took place.
Now we can compute two types of differences (see Fig. 1) to produce the DID estimator — indeed, this is where the “difference-in-differences” name comes from.

Fig. 1: Computing the DID estimator

First, within the treatment and control groups, we compute the difference in trends before and after the manipulation took place.
With the above notation, and using “-” as the difference operator, we can write this first difference as G(0,1)-G(0,0) for the control group, and G(1,1)-G(1,0) for the treatment group.
Next, we compare these differences between the treatment and control groups, computing (G(1,1)-G(1,0))-(G(0,1)-G(0,0)).
The second difference captures the effect of the manipulation on the treatment group, beyond what the treatment group would have looked like in a counterfactual state in which the manipulation did not take place; as Fig. 1 shows, this second difference can also be written as G(1,1)-(G(1,0)+(G(0,1)-G(0,0))).
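The two differences can be computed directly from the four group means. A minimal sketch with made-up numbers (not from any real dataset):

```python
# Hypothetical mean outcomes for the four G(TREAT, POST) cells
# (illustrative numbers only).
g = {
    (0, 0): 20.0,  # control, before manipulation
    (0, 1): 22.0,  # control, after manipulation
    (1, 0): 21.0,  # treatment, before manipulation
    (1, 1): 26.0,  # treatment, after manipulation
}

# First differences: before/after change within each group.
control_diff = g[(0, 1)] - g[(0, 0)]  # 2.0
treat_diff = g[(1, 1)] - g[(1, 0)]    # 5.0

# Second difference: the DID estimator.
did = treat_diff - control_diff  # 3.0

# Equivalent counterfactual form: G(1,1) - (G(1,0) + (G(0,1) - G(0,0))).
did_alt = g[(1, 1)] - (g[(1, 0)] + (g[(0, 1)] - g[(0, 0)]))
print(did, did == did_alt)  # 3.0 True
```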
Typically, DID analyses are modeled as regressions of the form y = b0 + b1*TREAT + b2*POST + b3*(TREAT x POST) + epsilon, where y is the outcome variable we are interested in (e.g., views of a piece of content that is being promoted), and the coefficient b3 of the interaction term captures the DID estimator.
If b3 is statistically significant and sufficiently large, then we may have reason to believe that the manipulation did actually have a meaningful effect.
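To illustrate the regression form, the sketch below simulates data with a known interaction effect and recovers b3 by ordinary least squares. Plain NumPy is used here instead of a dedicated statistics package (which would additionally report standard errors and p-values for the significance check), and all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate outcomes with a known manipulation effect of 3.0
# (purely illustrative; not the case study's actual numbers).
n = 4000
treat = rng.integers(0, 2, n)
post = rng.integers(0, 2, n)
y = 20 + 1.0 * treat + 2.0 * post + 3.0 * treat * post + rng.normal(0, 1, n)

# Design matrix: intercept, TREAT, POST, and the TREAT x POST interaction.
X = np.column_stack([np.ones(n), treat, post, treat * post])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

print(b3)  # the estimated DID effect, close to the true value of 3.0
```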
An E-Commerce Case Study: Boosting Views of Premium Editorial Content

In this section we will briefly go through a DID analysis that was carried out at a global, B2C e-commerce platform for entertainment content.
The context and actual numbers have been modified to preserve confidentiality.
The platform hosted a wide array of video entertainment content for users to consume.
As is typical on such platforms, a small portion of the content accounted for most of the viewership.
However, the editorial team noticed that some of the less-viewed content included premium pieces that were not only expensive to produce but also carried ads for the products and brands of other companies; the platform earned revenue based on the number of views these ads received.
Boosting views for the less-viewed premium content was thus strategically vital for the revenue growth of the e-commerce platform.
After some thought, the team devised an approach of cross-marketing content on the platform in a way that was non-intrusive and yet could conceivably nudge consumers to look at the less-viewed premium content.
Due to limited financial resources and technical limitations, no explicit A/B test could be conducted.
Nevertheless, the team wanted to know whether their cross-marketing approach did actually have the desired effect, and resorted to a DID analysis.
Since the cross-marketing approach was rolled out in one go to all users across all regions, deriving a sensible (pseudo-)control group was the key challenge.
The regional time-lag trick described earlier could not be exploited, since the cross-marketing was rolled out to all regions simultaneously.
Fortunately, the e-commerce platform had implemented the usual online tracking of clickstream data, consisting of timestamped records of content viewed per user.
A sample of users that had been active on the platform in the period of about two months before and after the cross-marketing launch was selected (i.e., everyone in the sample had to be active in both the pre- and post-manipulation phases).
The DID analysis would therefore cover the same sample of users over the given timeframe, which was deemed large enough by the editorial team to show meaningful effects.
Moreover, some of the less-viewed premium content had been available to all users on the platform before and after the launch of the cross-marketing approach; this content was flagged and could be used in combination with the sample of users to derive the control and treatment groups, and specify the outcome variable.
Specifically, the control group would be the subset of users in the sample that had never seen the cross-marketing ads, while the treatment group would consist of users that had seen the ads.
The outcome variable would be each user’s level of engagement with the less-viewed premium content promoted in the ads.
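A simplified sketch of how the TREAT and POST flags could be derived from such clickstream records; the record schema, event names, and dates below are hypothetical assumptions, not the platform's actual tracking format:

```python
from datetime import datetime

# Hypothetical clickstream records (user_id, event_type, timestamp);
# schema and dates are illustrative assumptions only.
LAUNCH = datetime(2021, 3, 1)
events = [
    (1, "ad_view",      datetime(2021, 3, 5)),
    (1, "content_view", datetime(2021, 3, 6)),
    (2, "content_view", datetime(2021, 2, 10)),
    (2, "content_view", datetime(2021, 3, 8)),
    (3, "ad_view",      datetime(2021, 3, 2)),
    (3, "content_view", datetime(2021, 2, 20)),
]

# TREAT = 1 for users with at least one logged ad exposure -- a proxy
# for actually having perceived the cross-marketing ads.
treated_users = {uid for uid, ev, _ in events if ev == "ad_view"}

# Label each observation with its TREAT and POST flags:
# POST = 1 for observations after the cross-marketing launch.
labeled = [
    (uid, ev, int(uid in treated_users), int(ts >= LAUNCH))
    for uid, ev, ts in events
]
print(labeled[0])  # (1, 'ad_view', 1, 1)
```

Note that this proxy-based labeling inherits the limitation discussed earlier: a user without a logged ad exposure may still have perceived the ad through untracked channels.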
The pre-launch trends in the treatment and control groups were similar, and remaining measurable differences could be controlled for with additional control variables in the regression analysis as needed.
By carefully framing the data as a natural experiment of a sample of active users in a given timeframe, the DID effect could be modeled using the regression approach discussed earlier.
The interaction term TREAT x POST, denoting the DID estimator, was found to be statistically significant, and the coefficient was reasonably large for some of the less-viewed premium content.
There was thus some reason to believe that the cross-marketing approach was having the desired effect.
The Wrap

In summary, carrying out a DID analysis can be an effective way of establishing the effect of manipulations when an A/B test is not an option.
In order to use DID in your particular situation, try to frame the data as a natural experiment and identify suitable control and treatment groups.
Identifying the control group can be especially tricky in practice, and there are a number of ways to spot one (e.g., via regional differences, time lags, and other differences derived from clickstream data).
In practice, a key limitation of DID may be the need to find suitable treatment and control groups.
Also, in its simplest form, the DID method does not account for unobservable variables that fluctuate over time; you should try to think about these unobservable variables and find proxies for them, where possible, so that they can be controlled for in the DID regression model.