Data Scientist and Data Translator

July 13th, 2016

As the need to extract value from data rises, the demand for data scientists is exploding. Looking at the numerous data scientist job vacancies, it seems that every company wants one. But what exactly is a data scientist?

According to The Guardian, a data scientist is someone with technical (database) skills, analytical skills and presentation skills. As explained earlier in the blog ‘Data science is a team sport’, someone with all of these specializations hardly exists. That’s why we redefined the role of the former data scientist into three different, more specialized roles in the data science team:

▪ **Data engineer:** collects and prepares the data

▪ **Data translator:** translates a business opportunity into a statistical/mathematical problem

▪ **Data scientist:** uncovers the relations and information in the data

In this article we will dive deeper into the role of the data scientist: a statistician armed with strong mathematical and programming skills, able to extract added value from complex data structures and turn it into predictive insights that give their organization a competitive edge.

More commonly known in the field of data analytics is the business or data analyst. But what does a data scientist add to the traditional role of a data analyst? IBM puts this quite nicely by describing the data scientist as an evolution of the business or data analyst role. But where did this evolution come from?

A traditional data analyst often gains valuable insights from mostly structured internal databases such as CRM or sales data. However, the availability of data in both the internal and the external environment has grown exponentially. At the beginning of the 21st century the cost of storing a gigabyte of data dropped below $10. This exponential decrease in cost changed the way organizations looked at storing their data. Visionary organizations saw the potential of storing data other than just the operational data needed to run the organization’s daily processes. Moreover, computational power kept growing and computers became able to process these gigantic datasets much faster.

*Exponential decrease of data storage costs over the past 30 years*

In line with these trends new opportunities arose, and the need to combine and process information from various data sources increased. What if it were possible to enrich internal data with information from external sources and to extract value from unstructured data sources like pictures, sounds and texts? Moreover, the results would have even greater value when implemented automatically and immediately in business processes, instead of only being reported in documents and presentations.

To meet these changing needs, a broader skillset than just strong analytical skills was needed. Because of the high velocity and variety of the available data and the numerous possibilities to enrich it with additional information, it became much harder to incorporate into a mathematical model only the information that is truly relevant for a specific business problem. To tackle this big data problem, statistics and computer science combined forces to exploit the newly available computational power, and new intelligent algorithms were invented. Out of this combination of statistics, mathematics and programming skills, the modern data scientist was born.

A data scientist often has a background in econometrics, statistics, mathematics or computer science. Their approach is more pragmatic and empirical than classical statistics alone, keeping in mind computational efficiency and the possible incompleteness of data. A data scientist has a love for an absolute truth, added value and an optimal solution: not a hypothesis or a subjective opinion, but a 99% confidence interval. That is what one can give as input to decision makers, and what decisions can be based on. A data scientist enables you to make fact-based decisions and to sketch scenarios of what will happen to Y when variable X changes, each with a corresponding probability of occurrence.
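To make the idea of a 99% confidence interval concrete, here is a minimal sketch using only Python's standard library. The sample values and the function name `mean_confidence_interval` are made up for illustration, and the interval uses a normal approximation; for small samples a t-distribution would usually be the more careful choice.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.99):
    """Return (low, high) bounds for the sample mean (normal approximation)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    centre = mean(sample)
    standard_error = stdev(sample) / sqrt(n)
    # Two-sided interval: for 99% this is the 0.995 quantile of the normal.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return centre - margin, centre + margin


# Hypothetical data: weekly sales uplift (in %) measured after a price change.
uplift = [2.1, 3.4, 1.8, 2.9, 3.1, 2.5, 2.2, 3.0, 2.7, 2.4]

low, high = mean_confidence_interval(uplift, confidence=0.99)
print(f"mean uplift: {mean(uplift):.2f}%")
print(f"99% confidence interval: [{low:.2f}%, {high:.2f}%]")
```

A decision maker can read the result as a range for the average effect rather than a single number, which is closer to how such findings are typically reported.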

A good data scientist is able to extract the relevant information and causal relations from giant data sets. But how does one do it? First, the data is explored by looking at some descriptive statistics. Then, the data is put into a model, and the data scientist exposes the causal relations between the variables and explores why specific events occurred in the past. If causal relationships are found, the data scientist is able to predict what will happen when the decision variables change in different future scenarios and to determine the optimal strategy accordingly.

Let us take Amazon as an example of a company that uses data scientists’ skills to improve its processes and optimize its services throughout the entire organization. What value does the data scientist add for the company and its customers?

Thanks to a cutting-edge forecasting model of which products are going to be sold at which location, Amazon is able to set its order quantities and stock levels very accurately. It can supply its local distribution centers with products that haven’t even been sold yet, closely matching the demand in that particular region! This level of forecasting enables Amazon to optimize its delivery times, resulting in higher customer satisfaction and hence a major competitive edge.

But that’s not all. While you are ordering a product on Amazon’s website, you are shown other products that you might also like; these are called recommendations. These products are not chosen at random, but are selected especially for you. Amazon does this with its recommendation engine, which analyzes customers’ buying behavior and matches it with their characteristics and previous product choices. Thanks to a good recommendation, a customer is able to find more products that he or she likes, increasing Amazon’s cross-sell opportunities!

*Example of a recommendation at Amazon.com*
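As a rough illustration of the idea behind such recommendations, the sketch below counts which products were bought together in past orders and suggests the most frequent companions. This is not Amazon's actual engine; the order data and the `recommend` function are hypothetical, and a real system would also use customer characteristics and far more data.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical order history: each set is one customer's basket.
orders = [
    {"tent", "sleeping bag", "headlamp"},
    {"tent", "sleeping bag"},
    {"headlamp", "batteries"},
    {"tent", "headlamp"},
]

# For every product, count how often each other product shared its basket.
co_purchases = defaultdict(Counter)
for basket in orders:
    for product, companion in permutations(basket, 2):
        co_purchases[product][companion] += 1


def recommend(product, top_n=2):
    """Return up to top_n products most often bought together with `product`."""
    companions = co_purchases.get(product, Counter())
    return [item for item, _count in companions.most_common(top_n)]


print(recommend("tent"))
print(recommend("batteries"))
```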

The expectation is that the demand for data scientists will increase further in the coming years, as businesses become increasingly digitalized and data driven. The data science landscape is evolving rapidly and new techniques are invented on a regular basis. Therefore, it is extremely important to stay up to date and continue learning. We see university programs in data science developing quickly and expect data scientists to have an even more prominent role in many industries in the near future.