Wouldn’t it be better if you could clearly state what you do as a Data Scientist? You would probably agree that your work life would be easier if your colleagues and customers understood what you do.
- A biologist wouldn’t say they are a biologist because they work with petri dishes, rather than because they run experiments to understand life. Yet some Data Science definitions focus on the use of tools like Hadoop.
- A physicist wouldn’t say they are a physicist because they run simulations of their models, rather than because they seek to understand matter. Yet some Data Science definitions focus on activities like modelling, data cleaning and visualization.
- Both of these sciences use statistics to design their experiments and test their hypotheses. Yet some Data Science definitions focus on the overlap of statistics with computer science, and on unicorns.
A Definition of Data Science
The secret to defining data science is to focus on the science. Here is a simple definition of Data Science:
Data Science is the application of the scientific method to find opportunities and efficiencies in business data
There are a few things to note about this definition:
- it’s technology agnostic. It’s not about Big Data, Hadoop or whatever the next technology breakthrough might be.
- it’s applied to finding opportunities and efficiencies in data. It’s not the study of data – that’s statistics.
- it’s not defined by activities such as data cleaning, modelling or visualization, which may be part of the lifecycle of working with data.
- it’s applied to the data that describes a business’s processes, just like the data a natural scientist collects to understand a natural process.
- most importantly, it uses the scientific method, “systematic observation, measurement, and experiment, and the formulation, testing, and modification of hypotheses”.
The application of the scientific method is central to data science and something I want to come back to in a more detailed post.