Predicted Ghost Ads: An Accurate and Cost-Effective Method for Measuring Incrementality

Previously published on the dataxu.technology publication on Medium.

Incrementality testing measures the impact of advertising spend on the conversion rate of a test campaign (Johnson, Lewis, and Nubbemeyer 2017). To measure incrementality, advertisers compare the performance of their campaigns on targeted user populations against control groups where their ads are not served. In a typical setup, the campaign serves actual ads to users in the treatment group, while PSAs (Public Service Announcements) are substituted for users in the control group. This method has several drawbacks. This article describes those drawbacks and presents an alternative method for measuring incrementality.

Uncovering issues with PSA-based incrementality testing

Advertisers should keep two main issues with the traditional method in mind. First, to achieve statistical significance, the advertiser needs a sufficiently large control group. Because advertisers pay both for the ads served in their campaign and for the PSAs served to the control group, this is an added cost that many advertisers are unwilling to take on. Second, PSA-based incrementality testing becomes inaccurate when advertisers use performance-optimizing algorithms to deliver ads. These algorithms will steer the PSAs in the control group toward people who are more likely to interact with the PSA. That population is most likely not the same population that is interested in the ads from the test campaign, so the treatment and control groups are no longer comparable.

Predicted Ghost Ads: Why is it better?

There is another method advertisers should consider for incrementality testing: Predicted Ghost Ads (Johnson, Lewis, and Nubbemeyer 2017). Predicted Ghost Ads offers two major advantages to advertisers:

Entire budget is available for the advertiser's ads: The Predicted Ghost Ads method does not serve the campaign's ads or PSAs to the control group, so there is no cost to measure control responses. Once a would-be bid in the control group is recorded, an ad from a different campaign can be served instead, so the bidding opportunity is not lost.
Works on performance-optimized campaigns: In campaigns that optimize toward a certain audience, PSA-based tests can fail because the targeting algorithm becomes biased toward a different audience profile for the PSA. Predicted Ghost Ads avoids this issue because no substitute ad is served to the control group.

Predicted Ghost Ads: How does it work?

The Predicted Ghost Ads method does not serve the campaign's ads to the control group at all; it simply records all bids that would have been placed. A core step of this methodology is choosing a set of “would-be-impressions” from the recorded control group bids. This is achieved using a simulated auction (see step 4 in Figure 1). At dataxu, this simulated auction is implemented by training a recall-optimized machine-learning model on the bid and impression data from the treatment group. A recall-optimized model prioritizes identifying as many of the actual impressions as possible, even at the cost of some false positives, rather than maximizing overall accuracy across impressions and non-impressions. By “recalling” most impressions in this way, the test captures more conversions within the relevant population in both treatment and control, which in turn helps it reach statistical significance.
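The difference between recall and accuracy can be illustrated with a small sketch. The scores, labels, and thresholds below are made-up values, and lowering the decision threshold is only one assumed way of favoring recall; the article does not say how the dataxu model is tuned.

```python
def recall_and_accuracy(actual, scores, threshold):
    """Return (recall, accuracy) for predictions made at the given threshold.

    actual: 1 if the bid became an impression, 0 otherwise (hypothetical labels).
    scores: hypothetical model scores for "this bid becomes an impression".
    """
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(actual)
    return recall, accuracy


# Ten hypothetical bids: the first four became impressions.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.3, 0.6, 0.35, 0.3, 0.28, 0.1, 0.1]

# A stricter threshold misses half of the real impressions.
strict = recall_and_accuracy(actual, scores, threshold=0.5)    # (0.5, 0.7)

# A looser threshold recalls every real impression but adds false positives.
lenient = recall_and_accuracy(actual, scores, threshold=0.25)  # (1.0, 0.6)
```

In this toy data the looser threshold has lower accuracy but finds every actual impression, which is the trade-off a recall-optimized model accepts.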

Figure 1 shows a flow chart of the Predicted Ghost Ads process:
(1) As bid opportunities arrive from exchanges, the flights in the test campaign bid in the dataxu internal auction according to recommendations from the dataxu bidding engine. If the flight wins, the ID associated with the bid opportunity (at either cookie level or household level) is assigned to the treatment or control group by a hashing function. The hashing function ensures that an ID which reappears is always assigned to the same group.
(2) If the ID is in treatment, the ad is served, and both bids and actual impressions are recorded. If the ID is in control, the bid is recorded but the ad is not served.
(3) A machine-learning model is trained on the treatment bid and impression data to predict impressions with high recall.
(4) This model is run on the recorded bids from both treatment and control to obtain predicted impressions in both groups. Any small errors in the model are shared by both groups, so the comparison remains fair.
(5) Actual conversion events within the predicted treatment and control impressions are counted.
(6) The lift is calculated across these conversion events. It is then scaled by the probability that a predicted impression is an actual impression, which is calculated using the treatment data.
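The stable group assignment in step (1) can be sketched as follows. This is a minimal illustration, not the dataxu implementation: the function name, the salt, the choice of SHA-256, and the 20% control fraction are all assumptions made for the example.

```python
import hashlib


def assign_group(user_id: str, control_fraction: float = 0.2,
                 salt: str = "example-test-campaign") -> str:
    """Deterministically map an ID to 'treatment' or 'control'.

    The salt and control_fraction are hypothetical parameters. Hashing the
    salted ID gives a stable pseudo-random bucket in [0, 1), so the same ID
    always lands in the same group.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "control" if bucket < control_fraction else "treatment"


# The same ID is always assigned to the same group, however often it reappears.
first_seen = assign_group("cookie-12345")
seen_again = assign_group("cookie-12345")
```

Using a per-test salt (an assumption here) would keep assignments independent across different tests while remaining stable within one test.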

Figure 1: Predicted ghost ads as implemented at dataxu
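Step (6) can be illustrated with a short sketch. The article says only that the lift is "scaled by" the probability that a predicted impression is an actual impression; this example assumes the lift is the absolute difference in conversion rates and that the scaling is a division by that probability, as in a standard intent-to-treat adjustment. The function name and all counts are hypothetical.

```python
def scaled_lift(conv_treatment, predicted_treatment,
                conv_control, predicted_control,
                actual_among_predicted_treatment):
    """Difference in conversion rates over predicted impressions, rescaled.

    Assumption: the raw lift is divided by P(actual impression | predicted
    impression), estimated on the treatment group only, because only the
    treatment group has actual impressions.
    """
    if predicted_treatment == 0 or predicted_control == 0:
        raise ValueError("both groups need at least one predicted impression")
    if actual_among_predicted_treatment == 0:
        raise ValueError("no predicted impression was actually served")

    rate_t = conv_treatment / predicted_treatment
    rate_c = conv_control / predicted_control
    raw_lift = rate_t - rate_c

    # Share of predicted treatment impressions that were really served.
    p_actual = actual_among_predicted_treatment / predicted_treatment
    return raw_lift / p_actual


# Hypothetical counts: 120 vs. 100 conversions over 10,000 predicted
# impressions per group, with 8,000 of the predicted treatment impressions
# actually served.
lift = scaled_lift(120, 10_000, 100, 10_000, 8_000)
```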

Results

Table 1 shows preliminary results for a campaign in test. We use a two-proportion z-test to compare the conversion rates in treatment and control. A small p-value (typically p < 0.05) indicates strong evidence against the null hypothesis, which in this case is that the two conversion rates are equal. The results show Tactic 2 performing well in treatment, while no significant lift is observed for Tactic 1 (see Figure 2). This suggests that Tactic 1 should be tuned and retested, or that its budget could be diverted to the more successful Tactic 2.
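For readers who want to reproduce this kind of comparison, a two-proportion z-test can be sketched with the Python standard library. The function name and the counts are illustrative only; they are not the values from Table 1.

```python
import math


def two_proportion_z_test(conv_t, n_t, conv_c, n_c):
    """Two-sided z-test for equality of two conversion rates.

    Returns (z, p_value). Uses the pooled conversion rate for the standard
    error, which is the usual choice under the null hypothesis of equal rates.
    """
    rate_t = conv_t / n_t
    rate_c = conv_c / n_c
    pooled = (conv_t + conv_c) / (n_t + n_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    if se == 0:
        # No variation at all (e.g. zero conversions in both groups).
        return 0.0, 1.0
    z = (rate_t - rate_c) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 200 conversions out of 10,000 in treatment
# versus 150 out of 10,000 in control.
z, p_value = two_proportion_z_test(200, 10_000, 150, 10_000)
significant = p_value < 0.05
```

With these made-up counts the difference is significant at the 0.05 level; with identical rates in both groups the test returns z = 0 and a p-value of 1.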

Table 1: Predicted Ghost Ads preliminary results for a campaign in test

Figure 2: Difference in Conversion Rates in Treatment and Control for each tactic

Overall, using the Predicted Ghost Ads methodology instead of the traditional PSA-based incrementality test allows advertisers to measure the incrementality of their campaigns more accurately and at lower cost. This tool extends the dataxu A/B testing framework and enables more accurate, data-driven decision-making for advertisers using the platform.