Distinguishing Data Analytics from Data Science. Implications for your organisation

People often struggle to distinguish Data Analytics from Data Science. These are two related but distinct disciplines, both of which are important to a business. This post distinguishes Data Analytics from Data Science and lists the implications of that distinction for your organisation.

Forget about data for a minute

Think about a traditional scientist and what they do. You wouldn’t define them by their use of a petri dish, or a microscope, or any other tool. They are defined by following the scientific method. They aim to understand the world by producing mathematical models of the world from observed data and collecting further data to validate those models. Think about a nuclear physicist. They use mathematics and computer simulation to model the behaviour of subatomic particles. If these models are good, they predict the behaviour of those subatomic particles well in all general cases. For a model to be good, scientists must be able to reproduce it and it must allow us to reason about the real world. Amazing science has been done for centuries with pen and paper and simple apparatus but always by following the scientific method.

Distinguishing Data Analytics from Data Science

Data Science is a science like any other. It is irrelevant what apparatus is used so stop defining Data Science by Machine Learning, Venn Diagrams or programming languages. A Data Scientist models a business (customers, products, processes, web sites, stores, machinery) by gathering suitable data and evaluating models. If the Data Science is successful, the models will generalise well and allow a business to make predictions and optimisations about their customers, products, processes etc. Data Scientists are effectively creating data generating processes (experiments and models) to test hypotheses.
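To make "generalise well" concrete, here is a minimal sketch of a hold-out check: fit a model on one part of the data and score it on data it has never seen. Everything in it is an assumption for illustration only. The straight-line model, the function names and the made-up spend and sales figures are hypothetical, and a real project would choose its own model and error measure.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mean_abs_error(model, xs, ys):
    """Average absolute gap between the model's predictions and the data."""
    intercept, slope = model
    errors = [abs(y - (intercept + slope * x)) for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


def holdout_evaluation(xs, ys, test_fraction=0.25, seed=0):
    """Fit on a random training split, then score on train and held-out data."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    rng.shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    train_x = [xs[i] for i in train_idx]
    train_y = [ys[i] for i in train_idx]
    test_x = [xs[i] for i in test_idx]
    test_y = [ys[i] for i in test_idx]

    model = fit_line(train_x, train_y)
    return model, mean_abs_error(model, train_x, train_y), mean_abs_error(model, test_x, test_y)


# Hypothetical data: weekly marketing spend (x) against sales (y), with noise.
rng = random.Random(42)
spend = [float(week) for week in range(1, 41)]
sales = [50.0 + 4.0 * x + rng.gauss(0, 5) for x in spend]

model, train_error, test_error = holdout_evaluation(spend, sales)
print("intercept, slope:", model)
print("training error:", train_error)
print("held-out error:", test_error)
```

If the held-out error is much worse than the training error, the model has memorised its training data rather than learned something about the business, and it should not be trusted for predictions.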

Data Analysts look at existing data to report patterns, summaries, populations, trends etc. Yes, Data Scientists look at existing data before creating an experiment. Yes, they should do Analytics on the outputs of their experiments to understand what is going on. But fundamentally, Analytics is tactical Business Intelligence. Data Science is, well, science!

Implications for your organisation

Implication 1: Mature Analytics becomes reporting, mature Data Science becomes algorithms

If Analytics starts to identify common requests, new KPIs of interest, common sub-populations of interest to the business etc., then these should be productionised in Reporting. There is no point having a team of Analysts repeatedly writing the same queries. Put them in a dashboard so the business can self-serve.

Data Science, by contrast, is creating those data consuming and data generating models. If models need to be available for decision making then those models should be productionised in algorithms.

Implication 2: Analytics has fewer dependencies than Data Science

Analytics can be run tactically off a data warehouse with few other dependencies beyond the right tools. Sure, without a reporting function to productionise Analytics, an organisation will never consolidate its tactical queries and will be forever in a state of panicked queries. But the Analytics will still get done and the business will get their information.

If Data Science models are not turned into engineered algorithms then an organisation is simply wasting its time and money on curiosities. Data Science benefits from the same warehouse of data as Analytics. But it also needs a mechanism to bring in other new data sources. It needs an engineering team to turn its models into algorithms. An engineering team needs a testing and support team to make sure things keep working. And any change in automation and decision making needs change management to make everybody comfortable with the work the algorithm will do.

Implication 3: Data Analytics to Data Science is a big career leap

The hype around Data Science has unhelpfully led to analytics professionals rebranding themselves. Awesome Analytics professionals are customer facing, know an organisation’s data inside out, can wrangle and manipulate data quickly and produce relevant and accurate business KPIs with little business steer. They tell a compelling story with visualisations. Their code will never be productionised. None of this helps them be awesome Data Scientists.

A significant part of a scientist’s work involves distilling a business objective into hypotheses. Experiments need to be designed to choose the right model and evaluate the robustness of that model. Experiments need to be assessed for significance, bias, confounding, blocking, correlations etc. And when a good model is found, a knowledge of software engineering for productionisation is required. What is the computational complexity? What data should be logged for that scientific reproducibility? What data needs to be filtered out to avoid model degradation and biased results? These are questions an Analytics professional shouldn’t need to ask.
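As one small illustration of assessing an experiment for significance, the sketch below runs a permutation test on the difference in means between a control group and a treatment group. It is a sketch under stated assumptions rather than a recommended procedure: the function name, the basket values and the number of permutations are all hypothetical, and it says nothing about bias, confounding or blocking, which need care in the experiment's design.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(control, treatment, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns the observed difference (treatment minus control) and an
    estimated p-value: the share of random relabellings that produce a
    difference at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(treatment) - mean(control)
    pooled = list(control) + list(treatment)
    n_control = len(control)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Shuffle the group labels by shuffling the pooled observations.
        rng.shuffle(pooled)
        diff = mean(pooled[n_control:]) - mean(pooled[:n_control])
        if abs(diff) >= abs(observed):
            at_least_as_extreme += 1

    # Add one to numerator and denominator so the estimate is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical basket values from an A/B test of a new checkout page.
control_baskets = [21.0, 19.5, 22.3, 20.1, 18.7, 23.4, 20.9, 19.8]
treatment_baskets = [22.8, 21.9, 24.1, 20.5, 23.3, 25.0, 22.2, 21.4]

difference, p_value = permutation_test(control_baskets, treatment_baskets)
print("observed difference:", difference)
print("estimated p-value:", p_value)
```

A small p-value suggests the difference is unlikely to be an accident of which customers landed in which group. It does not by itself show that the experiment was free of bias or confounding.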

Taking heart

Distinguishing Data Analytics from Data Science is not a competition. Both functions are clearly important for an organisation. You need the capability to mine your data for patterns and summaries that are not yet available in reporting. You need the capability to rigorously create models that can help automate decision making. Data Science may currently be more hyped than Analytics but perhaps a reckoning is coming. Models and even the model tuning process are becoming increasingly commoditised. Organisations will hopefully see sense and stop rewarding Data Scientists simply for knowing the APIs of a concoction of evolving programming libraries and will instead focus on the production of models that are understood and that they can have confidence in. That will only come from those Data Scientists who understand the scientific method and how to apply it.

One thought on “Distinguishing Data Analytics from Data Science. Implications for your organisation”

  1. Thanks Enda – certainly food for thought. I initially found myself reading and nodding away…then I reread a few times and found I disagreed with all the implications. Guess this isn’t so black/white for me.

    Tactical analytics queries ideally use the whole of the data, but typically end up making use of masks for missing data and assumptions. Agree that there are mature endpoints (Reporting and Algorithms to support decision making) but both require the pipeline for data as well as an engineering team to set up and support ongoing production. Both also require change management. And most of the algorithms are presented in a self-serve reporting format.

    I guess that analytics has fewer dependencies because it doesn’t take into account the mature solution (that’s left to the reporting mechanism?)

    Judgement is normally required to include/exclude the drivers of an algorithm (prior to predictive power/algorithmic dimension reduction techniques), and analytics outputs may serve as pointers on where to look.
    Practically all analytics I’ve come across involve translating a business hypothesis to an analytical model of the business, reporting the situation, making a model to forecast the future and creating tests against these models. The complexity of the forecasting model can range from a simple average uplift from last year, to ARIMA modelling, to RNN/LSTM or some other machine learning classifier.

    For me, the differences lie in time allowed to create a solution and how explainable the method is to the layperson.

    As you say, analytics is tactical and commonly provides an output of business recommendations. Analytics models tend to err towards fast-to-create and easily explainable at the cost of model accuracy. Data science units are allowed more time to test and develop minimal viable algorithms/products that use more accurate but also more complex models at the expense of explainability and time.

    Leaving aside the end-to-end knowledge of software engineering etc (that I believe sits within Data science and reporting), the model development aspect seems similar and I believe the transition from analytics to data science lies in learning the statistics behind the more complex techniques.

