Data Science Patterns: Preparing Data for Agile Data Science
Data Science Patterns, like Software Engineering Patterns, are ‘common solutions to recurring problems’. They are a major area of potential improvement in the maturity of Data Science as a field. Only by recognizing and automating common patterns can Data Science begin to focus on value-adding activities instead of reinventing the data wrangling wheel.
A few things inspired me to put this webinar together.
- I build Data Science teams. Time and again, I find teams that are inconsistent in the data preparation approaches, structures and conventions they use. Patterns help resolve this problem. Without them, a completely ad hoc approach to data preparation leads to code maintenance challenges, difficulty in supporting junior team members and all-round team inefficiency.
- I read a recent paper, ‘Tidy Data’ by Hadley Wickham, in the Journal of Statistical Software (http://vita.had.co.nz/papers/tidy-data.pdf). This paper gives an excellent, clear description of what ‘tidy data’ is: the data format used by most Data Science algorithms and visualizations. While there isn’t anything new here if you have a computer science background, Wickham’s paper is an easy read and has some really clear worked examples.
- My book, Guerrilla Analytics (available in the USA and the UK), has an entire appendix on data manipulation patterns, and I wanted to share some of that thinking with the Data Science community.
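To make the ‘tidy data’ idea mentioned above a little more concrete, here is a minimal sketch in plain Python. It assumes the usual reading of Wickham’s definition (each variable is a column, each observation is a row). The data, the column names and the `to_tidy` helper are all made up for illustration and are not taken from the paper or the book.

```python
# Illustrative, made-up data in a "wide" layout:
# one row per person, one column per treatment.
wide_rows = [
    {"person": "A", "treatment_a": 5, "treatment_b": 2},
    {"person": "B", "treatment_a": 7, "treatment_b": 4},
]


def to_tidy(rows, id_column, var_name, value_name):
    """Reshape wide rows into tidy rows (hypothetical helper).

    Every non-id column becomes a (variable, value) pair, so each
    output row holds exactly one observation.
    """
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the id is repeated on every output row instead
            tidy.append(
                {id_column: row[id_column], var_name: column, value_name: value}
            )
    return tidy


tidy_rows = to_tidy(wide_rows, "person", "treatment", "result")
for tidy_row in tidy_rows:
    print(tidy_row)
```

In the tidy layout, adding a third treatment adds rows rather than columns, so downstream code that filters, groups or plots by `treatment` does not need to change.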
Do get in touch with your thoughts and comments.