To buy or not to buy: conversion rate predictions

Maarten van Hooft
Data Scientist

February 14th, 2019

Labs | We believe in continuously improving our solutions with the latest advancements in Data Science & Machine Learning to enhance the added value for our clients. With the series 'Labs', we aim to keep you up to date on our latest innovations and to translate these techniques into the business impact they will have on your organization.


Are the recommendations displayed in my web shop truly improving the KPIs I have chosen to focus on? Every e-commerce manager should ask themselves this question. Nowadays, most recommendation engines are programmed to select recommendations based on their relevance to the customer, which undoubtedly adds real value. But what about conversion probabilities? Wouldn't it be even more valuable to predict the probability that a customer will buy a recommended product?

A web shop can only show a limited number of recommendations to each customer, so displaying the optimal subset of recommendations to every individual customer is crucial for improving the selected business KPIs. Our existing models already tell us which recommendations are most relevant to an individual customer; the aim now is to add the probability that the customer will buy the recommended product. With algorithms that predict conversion probabilities, selecting the optimal subset of recommendations becomes an interesting optimization problem with various ways to steer business KPIs. When profit margins are taken into account, for example, we can calculate the subset of recommendations that leads to the highest revenue and customer lifetime value per customer. When stock levels are taken into account, we can calculate the subset that helps achieve the desired stock levels, and so on.
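To make the idea of an optimal subset concrete, here is a minimal Python sketch. It assumes a hypothetical `Candidate` record holding a predicted conversion probability `p_buy` and a per-unit `margin`, and it assumes that each displayed slot converts independently of the others, so the expected margin of a set of recommendations is simply the sum of `p_buy * margin` over its members. Under that assumption, picking the top `k` candidates by expected margin is optimal. It is an illustration, not the selection logic behind our solutions.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Candidate:
    product_id: str
    p_buy: float   # predicted conversion probability (assumed to come from a model)
    margin: float  # hypothetical profit margin per unit sold


def select_recommendations(candidates, k):
    """Return the k candidates with the highest expected margin p_buy * margin.

    Assumes slots convert independently, so the expected total margin of a
    set is the sum of the individual expected margins.
    """
    return heapq.nlargest(k, candidates, key=lambda c: c.p_buy * c.margin)


# Hypothetical candidates for one customer.
candidates = [
    Candidate("A", p_buy=0.10, margin=5.0),   # expected margin 0.5
    Candidate("B", p_buy=0.02, margin=40.0),  # expected margin 0.8
    Candidate("C", p_buy=0.05, margin=8.0),   # expected margin 0.4
    Candidate("D", p_buy=0.30, margin=1.0),   # expected margin 0.3
]

print([c.product_id for c in select_recommendations(candidates, k=2)])  # ['B', 'A']
```

Ranking on `p_buy` alone would pick D and A; weighting by margin favors B, a less likely but far more profitable purchase. A stock-level objective could be plugged in by changing the `key` function.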

Conversion rate predictions are all about buying intentions, and predicting buying intentions for recommendations in an online environment is complex. The effort required to visit an online shop is negligible, so customers are more likely to visit one without any intention to buy. People often browse a web shop purely for entertainment or exploration.

In contrast, a sales representative in a brick-and-mortar store develops an intuitive feeling, based on visitors' in-store behavior, for which customers are likely to buy. This lets the sales representative allocate their attention as effectively as possible by recommending the most relevant products, leading to higher conversion rates. In a web shop, we aim to mimic this by inferring buying intentions from implicit feedback: website clicks and transactions.

Because buying intentions are relatively low, a large e-commerce web shop records millions of clicks and page views but comparatively few conversions. With so few conversions, it can be hard to find patterns in the data about when, why and on which products visitors convert, which makes deciding which recommendations to show a complex task. This is especially true for long-tail products and for durable products or services that are bought infrequently.

So, before we could add this feature to our solutions, we had to solve the problem of imbalanced data sets. In Data Science, an imbalanced data set is one in which the observations belonging to one class (purchase) make up only a very small fraction compared to those of the other class (no purchase). Simply put, the events of interest are rare. As a result, the model has only a limited number of positive examples from which to learn conversion probabilities: in a web shop, only a few data points on actual conversions are available to predict the chance that a visitor will buy a recommendation.
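A small illustration of why imbalance matters, using made-up numbers: assume 10,000 sessions of which 100 (1%) end in a purchase. A trivial model that always predicts "no purchase" scores excellently on accuracy yet never finds a single buyer, which is why rare-event problems are usually judged on metrics such as recall rather than accuracy.

```python
# Hypothetical data: 10,000 sessions, 100 of which end in a purchase (label 1).
labels = [1] * 100 + [0] * 9_900

positive_rate = sum(labels) / len(labels)

# A trivial "model" that always predicts "no purchase".
predictions = [0] * len(labels)

# Accuracy: share of sessions predicted correctly.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall: share of actual purchases the model identifies.
recall = sum(p == 1 and y == 1 for p, y in zip(predictions, labels)) / sum(labels)

print(f"purchase rate: {positive_rate:.1%}")  # purchase rate: 1.0%
print(f"accuracy: {accuracy:.1%}")            # accuracy: 99.0%
print(f"recall: {recall:.1%}")                # recall: 0.0%
```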

After extensive research, we solved the problem of imbalanced data sets and are now able to predict probabilities in contexts with rare events. Furthermore, we automated the process, which enables our Data Scientists to focus on more interesting, value-adding work instead of selecting algorithms and tweaking parameters.
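The specific technique is not disclosed here. As one common option, sketched under stated assumptions: randomly undersample the majority class (no purchase) so a model sees a more balanced training set, then correct the predicted probabilities back to the original base rate. Keeping each negative with probability β divides the odds of a purchase by β, so the correction is p = βp_s / (βp_s - p_s + 1). The function names below are hypothetical, and the sampled base rate stands in for the output of a real classifier.

```python
import random


def undersample(labels, keep_fraction, seed=0):
    """Return indices of all positives plus a random keep_fraction of the negatives."""
    rng = random.Random(seed)
    return [i for i, y in enumerate(labels) if y == 1 or rng.random() < keep_fraction]


def correct_probability(p_sampled, keep_fraction):
    """Map a probability estimated on undersampled data back to the original base rate.

    Keeping negatives with probability beta divides the odds by beta, so
    multiplying the odds back by beta gives p = beta*p_s / (beta*p_s - p_s + 1).
    """
    beta = keep_fraction
    return beta * p_sampled / (beta * p_sampled - p_sampled + 1)


labels = [1] * 100 + [0] * 9_900  # hypothetical: 1% purchase rate
beta = 0.1
idx = undersample(labels, beta)
sampled = [labels[i] for i in idx]

# Stand-in for a classifier's output on the undersampled data.
p_s = sum(sampled) / len(sampled)
print(round(p_s, 3), round(correct_probability(p_s, beta), 3))  # roughly 0.09 and 0.01
```

In practice `p_sampled` would come from a model trained on the undersampled rows. The correction keeps the outputs usable as real conversion probabilities, so they can be weighed against margins or stock levels as described earlier.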


These new techniques are cool, but how will organizations benefit from them? Retailers are not the only ones who can leverage conversion rate predictions to select recommendations strategically, based on the KPI they aim to improve. These techniques can add value in almost any context that involves rare events, which makes them valuable in a wide range of business cases. For example, they can enhance predictive models that detect fraudulent insurance claims (rare events) and thereby accelerate the automated claim handling process. Another example is AdWords: bidding on the most effective keywords is a tedious task that can feel somewhat arbitrary, and chances are high that your customers will ignore the resulting ads completely. Applying these algorithms enables the marketer to optimize keyword selection based on the chance that customers will actually click on the ad. Marketing expenditure can then be allocated to the best-performing keywords based on conversion rate predictions at an individual level.

Imagine all the cases in which rare events play an important role, and you will understand how this can affect many processes within an organization.