As seems to be true of all things marketing these days, third-party data acquisition expenditures are rising. The data broker industry is forecasted to have a compound annual growth rate of 11.5% from 2017 to 2022. Much of this growth is driven by the need for basic demographic and firmographic data to expand mailing lists, size new markets and create more complete and accurate customer profiles. Newer uses of third-party data are for digital ad targeting and crafting personalized marketing efforts through the use of prescriptive analytics.
With an expanding list of use cases, third-party data seems like a necessary part of marketing operations. However, the cost of acquiring, ingesting, and analyzing third-party data can seem burdensome, especially in addition to pre-existing data management and analytics costs. Without a focused data acquisition strategy, companies may find themselves over-invested in third-party data.
Given the potential for high costs, some argue that third-party data is a waste of money, but this claim can be a disguised pitch for second-party data, which is essentially someone else’s first-party data. The truth, whether you are spending money to acquire second- or third-party data, is this: marketers need to prove their data expenditures deliver returns.
A Three-Step Approach to Explore Data Value
To identify which new data purchases generate returns, and which should be avoided or discontinued because they do not, marketers can harness a three-step approach:
- Compare the predictive power of models with and without the third-party data.
- Estimate the benefits of the new predictive capabilities, and weigh those against the costs.
- Confirm the prediction.
1. Compare Predictive Power
Advanced analytics make it possible for marketers to make increasingly accurate predictions about which new data feeds will generate returns. For example, say a retailer has historical first-party transaction data and customer contact details for past customers. Let’s also say they can connect this data to previous marketing activity, like direct mail campaigns and current digital website activity.
Although the retailer has a good foundation of first-party data, they have a limited ability to make predictions about people who have not yet purchased from the retailer. Adding third-party data, such as demographic, lifestyle, and psychographic information, should yield more predictive power. But how much more?
The process for predicting the value of third-party data relies on modeling comparisons. First, a baseline model is created that uses previously available first-party customer data to predict the customer’s propensity to purchase (or to take any other desired action). This first-party data likely includes prior purchases, if any, as well as any digital interactions that can be tracked by the company. Once the baseline model is created, performance tests are executed to measure accuracy. In this example, let’s assume that the baseline model delivers an accurate prediction 60% of the time.
After creating and measuring a baseline model, the next step is to create a second, comprehensive model that also includes the demographic, lifestyle, and psychographic variables from the third-party data set. The accuracy measurements from this model are then compared to the baseline model.
In most cases, the model’s predictive power will improve once third-party data is added. For example, let’s assume the new accuracy is 80% once the third-party data is added. This now gives the marketer an opportunity to weigh the cost of acquiring third-party data against the benefits of improving predictive accuracy by 20 percentage points.
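The comparison can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed method: the function names and the toy outcome lists are assumptions, constructed only to mirror the 60% and 80% figures in the example above. In practice both models would be scored on the same holdout set of customers.

```python
def accuracy(predicted, actual):
    """Share of predictions that match the observed outcome."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    if not actual:
        raise ValueError("need at least one observation")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def accuracy_lift_points(baseline_pred, enriched_pred, actual):
    """Improvement of the enriched model over the baseline, in percentage points."""
    return 100 * (accuracy(enriched_pred, actual) - accuracy(baseline_pred, actual))


# Hypothetical holdout outcomes: 1 = purchased, 0 = did not purchase.
actual = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
# Hypothetical baseline (first-party only) predictions: 6 of 10 correct.
baseline_pred = [1, 0, 0, 1, 1, 0, 0, 0, 1, 1]
# Hypothetical comprehensive-model predictions: 8 of 10 correct.
enriched_pred = [1, 0, 1, 1, 0, 1, 1, 0, 0, 0]

lift = accuracy_lift_points(baseline_pred, enriched_pred, actual)
print(f"Lift from third-party data: {lift:.0f} percentage points")
```

Accuracy is used here only because it is the measure in the example; other performance measures could be compared the same way.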
2. Estimate the Benefits
The estimation of benefits will largely depend on how the company is planning to utilize the predictions. If the predictions are used to find new prospects, then there may be a substantial increase in new revenue. Conversely, if the predictions are used to narrow the marketing targets to just high propensity targets, then there may be significant cost savings, depending on the cost of the tactics in play. These are two limited cases, but marketers may also be interested in predicting cross-sell / up-sell opportunities, predicting churn, or predicting which customers should receive specific messages or offers at specific times. Understanding these use cases is key to assigning value to the benefits.
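For the cost-savings case, the arithmetic might look like the sketch below. Every figure and function name is hypothetical, and the sketch assumes, for simplicity, that narrowing the audience loses no sales; a real estimate would also account for purchases missed among the people no longer contacted.

```python
def targeting_cost_savings(audience_size, cost_per_contact, share_still_contacted):
    """Savings from contacting only the high-propensity share of an audience."""
    if not 0 <= share_still_contacted <= 1:
        raise ValueError("share_still_contacted must be between 0 and 1")
    return audience_size * cost_per_contact * (1 - share_still_contacted)


def net_benefit(incremental_revenue, cost_savings, data_cost):
    """Benefits of the new predictions minus the cost of the data behind them."""
    return incremental_revenue + cost_savings - data_cost


# Hypothetical direct mail campaign: 100,000 names at $0.50 each,
# narrowed to the 40% of names the model scores as high propensity.
savings = targeting_cost_savings(100_000, 0.50, 0.40)

# Hypothetical data cost of $20,000, with no incremental revenue assumed.
result = net_benefit(incremental_revenue=0.0, cost_savings=savings, data_cost=20_000)
print(f"Savings: ${savings:,.0f}  Net benefit: ${result:,.0f}")
```

A positive net benefit suggests the data pays for itself under these assumptions; a negative one is a signal to renegotiate, trim the variable list, or walk away.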
To estimate the cost of the third-party data, the marketer needs to understand which variables they actually need to procure. When considering thousands of variables, this may seem like a daunting task. However, most modern-day predictive algorithms incorporate some type of variable selection. This process selects only the statistically significant variables for inclusion in the model, conveniently giving the marketer a guide as to which variables need to be purchased. If additional data acquisition costs need to be cut, the marketer can also assess variable importance and begin eliminating low-impact variables from the data.
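As a rough sketch of that last pruning step, suppose the modeling tool reports an importance score for each third-party variable. The variable names, the scores, and the 5% cutoff below are all assumptions chosen for illustration; the right cutoff depends on what each variable costs and how much accuracy is lost without it.

```python
def select_variables(importance, min_share=0.05):
    """Keep variables whose share of total importance is at least min_share.

    importance: dict mapping variable name to a non-negative importance score.
    Returns the kept names, most important first.
    """
    total = sum(importance.values())
    if total <= 0:
        raise ValueError("importance scores must sum to a positive number")
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, score in ranked if score / total >= min_share]


# Hypothetical importance scores reported by a model (they sum to 100).
importance = {
    "household_income": 40,
    "age_band": 30,
    "home_ownership": 20,
    "pet_owner": 7,
    "magazine_subscriber": 3,
}

to_purchase = select_variables(importance, min_share=0.05)
print(to_purchase)
```

After pruning, the model should be refit and its accuracy rechecked, since dropping variables can change how the remaining ones perform.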
Executing the modeling comparison and benefits estimate sequence helps marketers understand the value of third-party data and avoid unnecessary data expenditures. The challenge in completing the sequence is often a lack of in-house expertise or available time. If such gaps exist, they can be filled by using a consultant. Alternatively, it is increasingly common to ask third-party data providers to conduct such analyses to pre-justify a data purchase. This way marketers make the most of their time and avoid unnecessary data expenditures.
3. Confirm the Prediction
One clear benefit of predicting data value is it enables a subsequent reality check. Once a model using new third-party data is applied in a real-world setting, the marketer can compare predicted with actual results and verify whether or not the anticipated gains were realized.
This allows not only for accuracy testing, but also yields an opportunity to enhance the model. For example, a marketer might notice that the model doesn’t predict a specific type of customer very well. Upon further analysis, we might find that the model suffers from overfitting, in which case we’ve included too many variables in our model and it has learned patterns in the historical data that do not represent the real world. Conversely, we might find that we don’t have enough data to accurately predict a certain segment of customers, in which case we may look for additional data to supplement those predictions. Finally, we may simply be able to create a better model through use of different algorithms, variable transformations, etc.
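One simple way to spot a poorly predicted customer type is to break accuracy out by segment once actual results are in. The sketch below assumes each record carries a segment label, the model's prediction, and the observed outcome; the segment names, the records, and the 60% threshold are hypothetical.

```python
from collections import defaultdict


def accuracy_by_segment(records):
    """records: iterable of (segment, predicted, actual) tuples."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for segment, predicted, actual in records:
        totals[segment] += 1
        hits[segment] += int(predicted == actual)
    return {segment: hits[segment] / totals[segment] for segment in totals}


def weak_segments(records, threshold):
    """Segments whose accuracy falls below the threshold, in alphabetical order."""
    scores = accuracy_by_segment(records)
    return sorted(segment for segment, score in scores.items() if score < threshold)


# Hypothetical post-campaign records: (segment, predicted, actual).
records = [
    ("new_movers", 1, 1), ("new_movers", 0, 0),
    ("new_movers", 1, 1), ("new_movers", 1, 0),
    ("retirees", 1, 0), ("retirees", 0, 1),
    ("retirees", 1, 1), ("retirees", 0, 1),
]

print(accuracy_by_segment(records))
print(weak_segments(records, threshold=0.6))
```

A weak segment is a prompt for further analysis, not a diagnosis: it does not by itself say whether the cause is overfitting, too little data, or a poor choice of algorithm.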
While the above confirmation step verifies a prediction, it does not provide a definitive confirmation that a particular data feed played a critical role. In other words, a prediction confirmation does not always tell us whether a new data feed was marginally useful or indispensable. A test, on the other hand, can be more definitive.
The test process requires marketers to establish test and control groups. Once these are created, the test group is exposed to the new data, the new model, and the subsequent actions that are taken. The control group receives only the actions that would have been taken without third-party data and predictions. Below are several common examples:
If the new data adds value, then actual results for the test group should be better than for the control group. Be careful, though: marketers will only see positive results if the actions themselves are valuable. For example, if we correctly predict that a prospect is interested in product A, but we don’t have content to offer to the prospect, then we will not be able to capture the value of the third-party data.
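To judge whether the test group really did better, rather than just getting lucky, one common option is a two-proportion z-test on conversion rates. The article does not prescribe a specific statistical test, so treat this as one reasonable choice among several; the group sizes and conversion counts are invented for illustration.

```python
import math


def two_proportion_z_test(test_conversions, test_size, control_conversions, control_size):
    """Return (z, two_sided_p_value) for the difference in conversion rates."""
    p_test = test_conversions / test_size
    p_control = control_conversions / control_size
    # Pooled rate under the assumption that both groups convert equally often.
    pooled = (test_conversions + control_conversions) / (test_size + control_size)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / test_size + 1 / control_size))
    if std_error == 0:
        raise ValueError("cannot test when the pooled conversion rate is 0 or 1")
    z = (p_test - p_control) / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical results: the test group was targeted with the new data and model,
# the control group with the actions that would have been taken without them.
z, p_value = two_proportion_z_test(
    test_conversions=260, test_size=5000,
    control_conversions=200, control_size=5000,
)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be chance alone. It still says nothing about whether the lift is large enough to cover the cost of the data, which is a separate calculation.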
A variation on the test/control group approach is the time-series approach, where a performance measure (e.g., sales volume) is gathered for a period of time without access to the third-party data, and then compared to performance with the new data. While less robust than the test/control group method, the time-series method is useful, especially when creating test and control groups is not feasible.
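A before-and-after comparison can be as simple as the sketch below. The weekly sales figures are hypothetical, and the calculation makes no adjustment for seasonality, promotions, or other changes between the two periods, which is a large part of why this method is less robust than a test/control design.

```python
from statistics import mean


def percent_change(before, after):
    """Percent change in average performance from the before period to the after period."""
    baseline = mean(before)
    if baseline == 0:
        raise ValueError("average of the before period must be non-zero")
    return 100 * (mean(after) - baseline) / baseline


# Hypothetical weekly sales volumes without and with the third-party data.
sales_before = [100, 110, 90, 100]
sales_after = [115, 120, 110, 115]

change = percent_change(sales_before, sales_after)
print(f"Average weekly sales changed by {change:.1f}%")
```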
The promise of third-party data is that it adds value by helping marketers better serve customers. Growth of the third-party data industry is one indication this promise is being realized. Further evidence is found in the growth of digital advertising, where ad spends are increasingly targeted using third-party data. But the best evidence of all is created the hard way: by modeling, estimating, and confirming the value of third-party data in your company and in real-world marketing applications.
Don Sutherland is Senior Vice President of Data, Analytics, and Consulting at Harte Hanks.
David Mendelsberg is Director of Consulting Services at Harte Hanks.