Understanding a customer’s behavior is crucial for a firm allocating resources to customer segmentation and promotion targeting. Marketing scholars have therefore spent extensive time developing models to predict the prospective relationship between a firm and its customers. However, these models share one major disadvantage: they require multiple observations of the same customer. In other words, existing customer relationship management (CRM) models work only for repeat customers and miss first-time buyers. Firms are therefore constrained in designing an optimal plan to manage their relationships with newly acquired customers.
Authors Nicolas Padilla and Eva Ascarza present a novel approach to tackling this so-called cold start problem, in which a firm has data only on its customers’ first purchase and would like to leverage those limited data to forecast their future behavior. To compensate for the lack of depth in individual-level data (i.e., information on repeat purchases), the model exploits the breadth of data available by drawing on the multiple data points observed in that single transaction, including (1) transaction-specific characteristics (e.g., price, channel, discount, holiday season), (2) product-specific characteristics (e.g., product category, package size), and (3) shopping basket information. Their probabilistic machine learning framework combines a flexible demand specification with a state-of-the-art machine learning model and a Bayesian framework. It links the behaviors observed at acquisition to future purchase behavior by assuming that common latent traits underlie both.
This paper makes significant contributions to marketing literature and practice. First, the probabilistic machine learning framework it presents provides a pathway to overcoming the cold start problem in the CRM literature. Second, the approach enables managers to use their data to make insightful marketing decisions in a convenient and automatic way. Last, it suggests a method by which a firm can fully leverage first-party data to achieve high performance without worrying about the increasingly prevalent privacy regulations that limit a firm’s ability to use customers’ data collected by third parties.
Q & A with Authors Nicolas Padilla and Eva Ascarza
Q: What was the motivation behind this research, and why are you passionate about using a probabilistic machine learning framework in the study of customer relationship management?
A: The success of a good CRM relies on managing customers differently. For this purpose, the literature has focused on methods to identify differences in how customers transact and how they respond to marketing actions. However, most of these methods rely on observing customers multiple times, which limits their usability for firms that want to manage customers right after they are acquired. This problem is not exclusive to CRM. Indeed, it has been a focus of interest in computer science for recommendation systems, where it is known as the “cold start problem.”
So, the motivation for this paper was the realization that such a problem was very relevant to CRM as well, for the reasons mentioned before. The excitement about the probabilistic machine learning method came from combining the marketing problem with the nature of the data to which firms have access. Indeed, the cold start problem of CRM has unique aspects that set it apart from the literature on recommendation systems. First, there is an extensive literature in marketing on individual-level probabilistic models to compute customers’ value. Thus, a model that aims to tackle the cold start problem should ideally be easy to integrate into existing modeling approaches. Second, during the first transaction, which for most firms (especially retailers) is the moment of acquisition, firms are able to collect multiple data points on that single interaction. These two aspects distinguish the cold start problem of CRM from that of recommender systems, giving probabilistic machine learning methods the chance to shine. The “probabilistic” component allows the model to be integrated easily with existing methods (e.g., models for contractual and non-contractual settings), while the “machine learning” component allows the model to extract the relevant information from the first transaction, so that the modeler or researcher can better infer how these new customers will behave in the future.
Q: How do you propose to bring the cold start framework together with the repeat purchase framework to understand the evolution of the relationship with a customer? In your opinion, is it possible to bring them together or should they remain separate?
A: Our modeling framework easily combines these two approaches. Certainly, our cold start approach provides a way to make inferences on new customers using just their acquisition data. However, as repeat purchases from these customers become available, these inferences will start incorporating the incoming data, and the model allows the computation of the posterior distribution on how these inferences evolve as data on these purchases arrive.
Also, if the purpose of such a model is to understand the evolution of the customer parameters as their relationship with the firm evolves, and the extent to which acquisition parameters can explain that evolution, then we would recommend expanding the current demand specification to a dynamic model (e.g., a hidden Markov model) and allow the transition probabilities to be related to the customer characteristics observed at the moment of purchase.
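As a rough illustration of how inferences about a new customer can absorb repeat purchases as they arrive, the sketch below uses a Gamma-Poisson conjugate update for one customer's purchase rate. It is a deliberately simplified stand-in, not the model from the paper: the class name `GammaBelief`, the prior values, and the idea that the prior is set from acquisition data are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaBelief:
    """Gamma(shape, rate) belief about a customer's purchases per period."""

    shape: float
    rate: float

    @property
    def mean(self) -> float:
        # Expected purchases per period under the current belief.
        return self.shape / self.rate

    def update(self, purchases: int, periods: float) -> "GammaBelief":
        # Conjugate update for Poisson counts: add the observed purchases
        # to the shape and the observed exposure time to the rate.
        return GammaBelief(self.shape + purchases, self.rate + periods)


# Hypothetical prior, imagined here as informed by first-purchase data.
prior = GammaBelief(shape=2.0, rate=4.0)

# The customer then makes 3 repeat purchases over 2 periods.
posterior = prior.update(purchases=3, periods=2.0)

print(f"prior mean:     {prior.mean:.3f}")
print(f"posterior mean: {posterior.mean:.3f}")
```

With these made-up numbers the expected purchase rate moves from 0.5 to about 0.83 per period, showing how the acquisition-based inference gives way to the customer's own history as it accumulates.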
Q: Your research focused on customers who have been acquired. Could the probabilistic machine learning approach be used by firms as a strategy for customer acquisition using publicly available data?
A: The probabilistic machine learning approach could be used for many, many other applications. Specifically, it can be used to relate parameters that one estimates from any customer-level model to rich sources of customer data, whether publicly available or proprietary. For example, a company could combine data on its customers with publicly available data to determine which characteristics (of the publicly available data) better describe the customers with high customer lifetime value (CLV). Such information could be used to inform acquisition strategies.
However, it is very important to note that customers self-select into being acquired, and that this selection is a consequence of the firm’s product offering, the marketing mix variables, and more generally the conditions of the market at the time. Therefore, optimizing the acquisition strategy involves changing the conditions under which these customers were acquired in a way that is not necessarily observed in the available data. It would not be enough to relate publicly available data to CLV and then try to acquire those users; the firm also needs to investigate which actions are most effective at acquiring them.
To do so, firms could exogenously manipulate some of the variables in specific contexts, introducing variation that allows other probabilistic machine learning approaches to infer the relationship between marketing actions and the resulting acquisition outcomes. For example, firms could test multiple email communications or promotions on prospective customers and set proper control groups. Probabilistic machine learning could then be used to extract how certain communications or promotions may be more useful for acquiring certain customers than others.
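A minimal sketch of the simplest version of such a test follows: one promotion, one control group, and the difference in acquisition rates with a normal-approximation standard error. This is a basic experimental estimate rather than a probabilistic machine learning method, and the function name `acquisition_lift` and all counts are hypothetical.

```python
import math


def acquisition_lift(treated_n, treated_acquired, control_n, control_acquired):
    """Return (lift, standard_error) for the acquisition rate of treated vs. control."""
    p_treated = treated_acquired / treated_n
    p_control = control_acquired / control_n
    lift = p_treated - p_control
    # Standard error of a difference between two independent proportions.
    se = math.sqrt(
        p_treated * (1 - p_treated) / treated_n
        + p_control * (1 - p_control) / control_n
    )
    return lift, se


# Hypothetical campaign: 1,000 prospects receive the promotion, 1,000 do not.
lift, se = acquisition_lift(
    treated_n=1000, treated_acquired=60,
    control_n=1000, control_acquired=40,
)

print(f"lift in acquisition rate: {lift:.3f} (standard error {se:.4f})")
```

Because assignment to the promotion is randomized, the lift can be read as the effect of the promotion itself rather than of who chose to respond, which is the point of introducing exogenous variation.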
Q: Your article mentions that the intuitive idea to overcome the cold start is to model the purchasing decision as a function of observed acquisition behavior, but that model has been shown to have critical weaknesses. How did you decide to model the purchasing decision and acquisition behavior separately in your framework?
A: There are multiple reasons. The first relates to the number of acquisition variables and the correlation among them. Depending on the context, there are potentially multiple acquisition behaviors that may be observed. Naturally, many of these variables may contain important information for inferring the future behavior of these customers, but the same information may be conveyed by several variables that are strongly correlated with one another (e.g., price paid, total amount, and discounts). These correlations may affect the purchase model if not accounted for; in an extreme case, including these variables directly could cause multicollinearity.
The second reason is that modeling these acquisition behaviors as outcomes provides a natural way to account for how some of these behaviors may be driven by the firm’s marketing actions or the conditions of the market. Modeling these behaviors as outcomes makes it possible to extract the variation that is customer-specific and remove the systematic variation induced by these factors.
Third, missing observations are prevalent in these types of data sets. For example, different markets may record different types of information, or some variables may be observed through the online channel but not when purchases are made in the brick-and-mortar store. Modeling acquisition behavior as an outcome provides a natural way to handle missing observations.
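The first reason above, correlation among acquisition variables, is easy to check before deciding how to use them. The sketch below computes a Pearson correlation between two acquisition variables; the variable names and the five data points are invented for illustration and do not come from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical first-transaction data for five newly acquired customers.
price_paid = [20.0, 35.0, 50.0, 80.0, 120.0]
discount = [2.0, 4.0, 5.0, 9.0, 12.0]

r = pearson(price_paid, discount)
print(f"correlation between price paid and discount: {r:.2f}")
```

In this toy data the correlation is roughly 0.99, so the two variables carry nearly the same information. Entering both directly as predictors of future purchases would make their separate effects hard to pin down.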
Q: What are the key takeaways of this research study for different stakeholders (e.g., academics, marketing, organizations, government agencies)?
A: We believe this research has multiple takeaways. First, we show that firms can further leverage their existing databases by augmenting their cold start data using available techniques (e.g., by characterizing the nature of the products customers purchase using prod2vec techniques). Second, we show that these data are relevant for making inferences about recently acquired customers, and that their informativeness may be subtle and nonlinear, which requires models that can properly extract this information. That has implications for practitioners and academics. Firms may leave value on the table by not fully using the information extracted from all behaviors observed at acquisition when managing new customers. Scholars could investigate a wider range of acquisition characteristics and their relevance for inferring customers’ future behavior. For example, whether customers visit the store alone or with family when they are acquired may be relevant for projecting future consumption patterns. Finally, this research speaks to how firms can fully leverage first-party data, which are increasingly relevant following the privacy regulations that limit firms’ ability to use customers’ data collected from third-party sources.
Q: Do you envision the role of machine learning models in the marketing literature growing in the future? What are the benefits of using this type of approach?
A: Yes, we strongly believe that it will continue to grow. First, there is an increasing need for automation in decision making, particularly in marketing settings in which managers must make granular decisions over thousands of customers. Second, the field has moved toward customization and personalized communications. This research is an example of that: it shows how firms can make better decisions about managing their recently acquired customers by leveraging the fact that those customers behave and respond differently. Third, firms are storing more and better data, which leads to better models that can extract subtle signals from these high-dimensional data sets. The main benefit of these approaches is their ability to make better predictions without prespecifying functional forms that constrain the potentially nonlinear relationships present in the data.
Read the full article:
Padilla, Nicolas and Eva Ascarza (2021), “Overcoming the Cold Start Problem of Customer Relationship Management Using a Probabilistic Machine Learning Approach,” Journal of Marketing Research, 58 (5), 981–1006. doi:10.1177/00222437211032938