Programming language version 3.2. SQL, NoSQL, NewSQL. Too often, it seems, the path to becoming a Data Scientist runs through skills in vogue rather than more permanent competencies. In a fast-paced field like Data Science, skills are more tangible. They can be directly tested. They can be dated to the latest technology or the latest language version.
Competencies are different. Competencies are a more general combination of skills, behaviours and knowledge.
You can have great PowerPoint skills and create beautiful slides but still be a terrible communicator. You can be skilled at Python syntax but still be a poor programmer. Communication and programming are competencies. It is competencies that matter most when you build a Data Science career that is robust to changing trends in skills such as languages and technology platforms.
Here are some key competencies and example skills for successful Data Science.
- Communication: data and data science are complex. You need to be able to really listen to and understand the problem a customer wants solved. You need to be able to communicate your solution at the right level so your customer can take action.
Typical skills include report writing, presentation, speaking and storytelling.
- Data Modelling: you will encounter data from a variety of systems. You will need to organise your project data for flexible, efficient use over the course of a project. This means being able to model data.
Typical skills include database design, SQL, normal forms, table design, indexing.
- Data Wrangling: you will need to reshape data so it can be visualized, profiled and made ready for algorithms.
Typical skills include data manipulation libraries like pandas, languages like Python or R and visualization libraries.
- Programming and Tuning Algorithms: ultimately, you will produce an algorithm that captures your data science insights. Algorithms need to be tuned to data, and their performance and robustness quantified. There will be scenarios where the algorithm works well and scenarios where it does not. There will be efficient structures, and structures that give the same results but far less efficiently.
Typical skills include a programming language, data structures, code testing, version control, complexity.
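The point about structures that give the same results at very different cost can be sketched in a few lines of Python. This is an illustration only: the function names and the sample data are hypothetical, and the timings will vary by machine.

```python
import timeit


def common_ids_list(left, right):
    """Return items of `left` that also appear in `right`.

    Each membership check scans the `right` list, so the cost grows
    roughly with len(left) * len(right).
    """
    return [x for x in left if x in right]


def common_ids_set(left, right):
    """Same result as common_ids_list, but lookups go through a set.

    Building the set is a single pass; each lookup is then close to
    constant time on average.
    """
    right_set = set(right)
    return [x for x in left if x in right_set]


def compare_timings(n=4000, repeats=5):
    """Time both versions on hypothetical data and return (slow, fast) in seconds."""
    left = list(range(0, n, 2))
    right = list(range(0, n, 3))
    slow = timeit.timeit(lambda: common_ids_list(left, right), number=repeats)
    fast = timeit.timeit(lambda: common_ids_set(left, right), number=repeats)
    return slow, fast
```

Calling `compare_timings()` returns the two timings side by side. Both functions return identical lists; only the data structure behind the lookup differs.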
- Data Pipelining: once data is fairly well understood and candidate algorithms have been identified, you will begin iterating through data and algorithms to get to your final insights. This iteration is most effective when data can be torn down and rebuilt repeatably and quickly. This is a data build.
Typical skills include SQL, pipeline management libraries, ETL design.
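A data build of this kind can be sketched with Python's built-in `sqlite3` module. The table names, columns and sample rows below are hypothetical, and a real project would more likely use a pipeline management library; the sketch only shows the tear-down-and-rebuild idea.

```python
import sqlite3

# Hypothetical raw source data: (customer, order amount).
RAW_ROWS = [("a", 10.0), ("a", 14.0), ("b", 3.0)]


def build(conn, raw_rows):
    """Tear down every project table and rebuild it from the raw rows."""
    cur = conn.cursor()
    # Tear down: drop derived tables first, then the raw table.
    cur.execute("DROP TABLE IF EXISTS customer_totals")
    cur.execute("DROP TABLE IF EXISTS raw_orders")
    # Rebuild: load the raw data, then derive the summary table from it.
    cur.execute("CREATE TABLE raw_orders (customer TEXT, amount REAL)")
    cur.executemany("INSERT INTO raw_orders VALUES (?, ?)", raw_rows)
    cur.execute(
        "CREATE TABLE customer_totals AS "
        "SELECT customer, SUM(amount) AS total "
        "FROM raw_orders GROUP BY customer"
    )
    conn.commit()


def totals(conn):
    """Return the derived table as a sorted list of (customer, total) tuples."""
    query = "SELECT customer, total FROM customer_totals ORDER BY customer"
    return conn.execute(query).fetchall()
```

Because `build` starts from a clean slate each time, running it twice against the same raw rows leaves the database in the same state, which is what makes the iteration repeatable.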
- Design of Experiments: Data Science is about applying the scientific method to understand data. Being able to design and execute an experiment is essential to being able to test an algorithm or model in the wild and demonstrate cause rather than correlation.
Typical skills include experiment layout, randomisation and blocking, statistical inference, and hypothesis testing.
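Randomisation and blocking can be illustrated with a small sketch using only the standard library. The function name, the units and the treatment labels are hypothetical. The idea is to group units into blocks that share a nuisance factor (here, a region) and then assign treatments at random within each block.

```python
import random


def randomised_block_assignment(units, block_of, treatments, seed=0):
    """Assign treatments at random within each block.

    units      -- iterable of hashable experimental units
    block_of   -- function mapping a unit to its block label
    treatments -- list of treatment labels
    seed       -- fixed seed so the assignment can be reproduced
    """
    rng = random.Random(seed)

    # Group the units by block.
    blocks = {}
    for unit in units:
        blocks.setdefault(block_of(unit), []).append(unit)

    # Shuffle each block, then deal treatments out in turn so that the
    # treatment counts within a block differ by at most one.
    assignment = {}
    for block_units in blocks.values():
        shuffled = block_units[:]
        rng.shuffle(shuffled)
        for i, unit in enumerate(shuffled):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment
```

For example, with four stores in each of two regions and the treatments `["control", "variant"]`, every region ends up with two stores per treatment, so a difference between regions cannot be mistaken for a treatment effect.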
- Consulting: consulting is about influencing without power. Significant numbers of data science projects fail because data scientists could not convince their customers to change their business and use their data science. It may seem bizarre, but like any change, data science has an impact on people, existing processes and existing technology.
Typical skills include meeting and workshop facilitation, stakeholder mapping and management, and influencing.
- Project Management: last but not least, if you cannot run a project (even as the sole data scientist on a project) then all of the above is irrelevant. Project management is a hugely diverse and complex field. However, there are some key skills that will help your data science succeed.
Typical skills include estimation, planning, budgeting, resourcing, and RAID management.