I was invited to speak at Predictive Analytics World 2015 in London on October 28th 2015.
My talk covered how the 7 Guerrilla Analytics Principles are the foundation for doing Agile Data Science. With a Data Science Operating Model that follows these principles, your team always know where their data came from, who changed it and why and can explain any of the highly iterative explorations and analyses their customers require.
You can find the slides below and at Slideshare. As always, feedback and questions are welcome. Enjoy!
Are you a data scientist working on a project with constantly changing requirements, flawed changing data and other disruptions? Guerrilla Analytics can help.
The key to a high performing Guerrilla Analytics team is its ability to recognise common data preparation patterns and quickly implement them in flexible, defensive data sets.
After this webinar, you’ll be able to get your team off the ground fast and begin demonstrating value to your stakeholders.
You will learn about:
* Guerrilla Analytics: a brief introduction to what it is and why you need it for your agile data science ambitions
* Data Science Patterns: what they are and how they enable agile data science
* Case study: a walk through of some common patterns in use inreal projects
I recently gave a webinar on Data Science Patterns. The slides are here.
Data Science Patterns, as with Software Engineering Patterns, are ‘common solutions to recurring problems’. I was inspired to put this webinar together based on a few things.
- I build Data Science teams. Repeatedly, you find teams working inconsistently in terms of the data preparation approaches, structures and conventions they use. Patterns help resolve this problem. Without patterns, you end up with code maintenance challenges, difficulty in supporting junior team members and all round team inefficiency due to having a completely ad-hoc approach to data preparation.
- I read a recent paper ‘Tidy Data’ by Hadley Wickham in the Journal of Statistical Software http://vita.had.co.nz/papers/tidy-data.pdf. This paper gives an excellent clear description of what ‘tidy data’ is – the data format used by most Data Science algorithms and visualizations. While there isn’t anything new here if you have a computer science background, Wickham’s paper is an easy read and has some really clear worked examples.
- My book, Guerrilla Analytics (here for USA or here for UK), has an entire appendix on data manipulation patterns and I wanted to share some of that thinking with the Data Science community.
I hope you enjoy the webinar and find it useful. You can hear the recording here. Do get in touch with your thoughts and comments as I think Data Science patterns is a huge area of potential improvement in the maturity of Data Science as a field.