People often struggle to distinguish Data Analytics from Data Science. The two disciplines are related but distinct, and both are important to a business. This post sets out the difference and lists the implications of that distinction for your organisation.
Forget about data for a minute
Think about a traditional scientist and what they do. You wouldn’t define them by their use of a petri dish, a microscope or any other tool. They are defined by following the scientific method: they aim to understand the world by building mathematical models from observed data and then collecting further data to validate those models. Think about a nuclear physicist. They use mathematics and computer simulation to model the behaviour of subatomic particles. If these models are good, they predict the behaviour of those particles well in general, not just in the cases already observed. For a model to be good, other scientists must be able to reproduce it and it must allow us to reason about the real world. Amazing science has been done for centuries with pen and paper and simple apparatus, but always by following the scientific method.
Distinguishing Data Analytics from Data Science
Data Science is a science like any other. It is irrelevant what apparatus is used, so stop defining Data Science by Machine Learning, Venn Diagrams or programming languages. A Data Scientist models a business (customers, products, processes, web sites, stores, machinery) by gathering suitable data and evaluating models. If the Data Science is successful, the models will generalise well and allow the business to make predictions about, and optimise for, its customers, products, processes etc. Data Scientists are effectively creating data-generating processes (experiments and models) to test hypotheses.
Data Analysts look at existing data to report patterns, summaries, populations, trends etc. Yes, Data Scientists look at existing data before creating an experiment. Yes, they should do Analytics on the outputs of their experiments to understand what is going on. But fundamentally, Analytics is tactical Business Intelligence. Data Science is, well, science!
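To make the contrast concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the synthetic visits and spend figures, the straight-line model and the function names do not come from any real business. The Analytics step summarises the data already in hand. The Science step fits a model on one part of the data and checks it against a held-out part to see whether it generalises.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(slope, intercept, xs, ys):
    """Average absolute gap between predicted and observed values."""
    errors = [abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)]
    return statistics.fmean(errors)


# Hypothetical data: monthly visits and spend per customer (synthetic).
rng = random.Random(1)
visits = [rng.uniform(0, 50) for _ in range(300)]
spend = [3.0 * v + 10.0 + rng.gauss(0, 2.0) for v in visits]

# Analytics: summarise the data that already exists.
average_spend = statistics.fmean(spend)

# Science: fit on one part of the data, validate on data the model has not seen.
train_x, test_x = visits[:200], visits[200:]
train_y, test_y = spend[:200], spend[200:]
slope, intercept = fit_line(train_x, train_y)
holdout_error = mean_abs_error(slope, intercept, test_x, test_y)

print(f"Average spend (report): {average_spend:.2f}")
print(f"Model: spend = {slope:.2f} * visits + {intercept:.2f}")
print(f"Hold-out mean absolute error: {holdout_error:.2f}")
```

The average is a fact about the past. The fitted line is a claim about customers the model has never seen, and the hold-out error is the evidence for or against that claim.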
Implications for your organisation
Implication 1: Analytics needs complete data, Science does not
I often get some strange looks when data engineers and architects promise Data Science ‘all the data’ and I say they don’t need it. Data Scientists need samples of data. Yes, those samples need to be of sufficient size to build good generalisable models. Yes, the data should not be biased or biases should be clear. But using ‘all the data’ is probably a bad thing when building models. Analytics, being a type of tactical reporting, needs complete data because it is generating business KPIs. A profit KPI plus or minus 10% isn’t very helpful. A statistical definition of a customer won’t help you scale your website – you need to know hits and logins.
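A rough Python sketch of the difference, using made-up order values (the data, the sample size of 1,000 and the function names are all assumptions for illustration). The KPI is an exact total over every record. The sample gives an estimate of the average with an approximate 95% margin of error, which is fine for modelling but not for a finance report.

```python
import math
import random
import statistics


def exact_total(order_values):
    """Analytics-style KPI: the exact total over every record."""
    return sum(order_values)


def sample_mean_with_margin(order_values, sample_size, seed=0):
    """Estimate the mean from a random sample.

    Returns the sample mean and an approximate 95% margin of error
    (1.96 standard errors, a normal approximation).
    """
    rng = random.Random(seed)
    sample = rng.sample(order_values, sample_size)
    mean = statistics.fmean(sample)
    margin = 1.96 * statistics.stdev(sample) / math.sqrt(sample_size)
    return mean, margin


# Hypothetical data: 100,000 synthetic order values.
data_rng = random.Random(42)
orders = [data_rng.uniform(5.0, 100.0) for _ in range(100_000)]

total = exact_total(orders)                          # needs every row
mean, margin = sample_mean_with_margin(orders, 1_000)  # needs 1% of the rows

print(f"Exact revenue KPI: {total:.2f}")
print(f"Estimated average order: {mean:.2f} +/- {margin:.2f}")
```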
Implication 2: Mature Analytics becomes reporting, mature Data Science becomes algorithms
If Analytics starts to identify common requests, new KPIs of interest, common sub-populations of interest to the business etc., then these should be productionised in Reporting. There is no point having a team of Analysts repeatedly writing the same queries. Put them in a dashboard so the business can self-serve.
Data Science, by contrast, creates data-consuming and data-generating models. If those models need to be available for decision making, they should be productionised as algorithms.
Implication 3: Data Science results can be used by Analytics but Analytics results should rarely feed into Data Science
A business will want to report on the decisions made by the models productionised as algorithms. It therefore makes sense that tactical queries from Analytics could be run against algorithm output data.
Analytics, however, does not typically produce results that feed into Data Science work. Analytics can help inform the Data Scientist about the domain. After all, Analysts know the typical tactical reports and the typical KPIs the business uses. However, when it comes to model variables, the Data Scientist needs to figure those out for themselves and evaluate them as part of their experimental process.
Implication 4: Analytics has fewer dependencies than Data Science
Analytics can be run tactically off a data warehouse with few other dependencies beyond the right tools. Sure, without a reporting function to productionise Analytics, an organisation will never consolidate its tactical queries and will be forever in a state of panicked queries. But the Analytics will still get done and the business will get their information.
If Data Science models are not turned into engineered algorithms then an organisation is simply wasting its time and money on curiosities. Data Science benefits from the same warehouse of data as Analytics. But it also needs a mechanism to bring in other new data sources. It needs an engineering team to turn its models into algorithms. An engineering team needs a testing and support team to make sure things keep working. And any change in automation and decision making needs change management to make everybody comfortable with the work the algorithm will do.
Implication 5: Data Analytics to Data Science is a big career leap
The hype around Data Science has unhelpfully led to analytics professionals rebranding themselves. Awesome Analytics professionals are customer facing, know an organisation’s data inside out, can wrangle and manipulate data quickly and produce relevant and accurate business KPIs with little business steer. They tell a compelling story with visualisations. Their code will never be productionised. None of this helps them be awesome Data Scientists.
A significant part of a scientist’s work involves distilling a business objective into hypotheses. Experiments need to be designed to choose the right model and evaluate the robustness of that model. Experiments need to be assessed for significance, bias, confounding, blocking, correlations etc. And when a good model is found, a knowledge of software engineering for productionisation is required. What is the computational complexity? What data should be logged for scientific reproducibility? What data needs to be filtered out to avoid model degradation and biased results? These are questions an Analytics professional shouldn’t need to ask.
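As one small example of assessing an experiment for significance, here is a sketch of a permutation test in Python. The control and treatment numbers are synthetic and the function name is hypothetical. It only addresses significance; bias, confounding and blocking need their own checks in the experimental design.

```python
import random
import statistics


def permutation_p_value(control, treatment, n_permutations=2_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Shuffles the group labels many times and counts how often a random
    relabelling gives a difference at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = statistics.fmean(treatment) - statistics.fmean(control)
    pooled = list(control) + list(treatment)
    n_control = len(control)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = (statistics.fmean(pooled[n_control:])
                - statistics.fmean(pooled[:n_control]))
        if abs(diff) >= abs(observed):
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical experiment: basket value with and without a new recommendation model.
data_rng = random.Random(7)
control = [data_rng.gauss(10.0, 2.0) for _ in range(200)]
treatment = [data_rng.gauss(12.0, 2.0) for _ in range(200)]

p_value = permutation_p_value(control, treatment)
print(f"Estimated p-value: {p_value:.4f}")
```

A small p-value says the difference is unlikely to be a labelling accident. It says nothing about whether the two groups were comparable in the first place, which is why the design questions come before the test.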
Distinguishing Data Analytics from Data Science is not a competition. Both functions are clearly important for an organisation. You need the capability to mine your data for patterns and summaries that are not yet available in reporting. You need the capability to rigorously create models that can help automate decision making. Data Science may currently be more hyped than Analytics but perhaps a reckoning is coming. Models and even the model tuning process are becoming increasingly commoditised. Organisations will hopefully see sense and stop rewarding Data Scientists simply for knowing the APIs of a concoction of evolving programming libraries, and will instead focus on the production of models that are understood and that they can have confidence in. That will only come from those Data Scientists who understand the scientific method and how to apply it.