Vague mixes of skill sets. A focus on activities and technology. Bizarre Venn diagrams. It seems there is huge confusion over what Data Science is. Is it Big Data? Isn’t it statistics? Is it something else entirely? This confusion causes untold problems. It leads to vendor and recruiter hype. It leads to inflated career expectations from those who work with data. It leads to rebranding of solid, established and much-needed fields like Analytics, Business Intelligence and Statistics.
Wouldn’t it be better if you could clearly state what you do as a Data Scientist? You probably agree your work life would be easier if your colleagues and customers could understand what you do.
- A biologist wouldn’t say they are a biologist because they work with petri dishes; they are a biologist because they run experiments to understand life. However, some Data Science definitions focus on the use of tools like Hadoop.
- A physicist wouldn’t say they are a physicist because they run simulations of their models; they are a physicist because they seek to understand matter. However, some Data Science definitions focus on activities like modelling, data cleaning and visualization.
- All these sciences use statistics to design their experiments and test their hypotheses. Yet some Data Science definitions focus on overlaps of statistics with computer science and unicorns.
A Definition of Data Science
The secret to defining data science is to focus on the science. Here is a simple definition of Data Science:
Data Science is the application of the scientific method to find opportunities and efficiencies in business data.
There are a few things to note about this definition:
- it’s technology agnostic. It’s not about Big Data, Hadoop or whatever the next technology breakthrough might be.
- it’s applied to finding opportunities and efficiencies in data. It’s not the study of data – that’s statistics.
- it’s not about activities that may be part of the lifecycle of working with data.
- most importantly, it uses the scientific method, “systematic observation, measurement, and experiment, and the formulation, testing, and modification of hypotheses”.
The application of the scientific method is central to data science and something I want to come back to in a more detailed post.