Irish Language Data Science lecture at Engineers Ireland

I may be the first person to coin a term for Data Science in Gaeilge!

I gave the following lecture to Engineers Ireland, the Irish professional body for engineers. The lecture is about “Data Science and the benefits for engineering” and is entirely in Irish.

It was an interesting exercise to brush up on my Gaelic and also to see the wealth of resources that now exist for using Irish with modern technical vocabulary. If you are curious or are trying to get your Gaelic up to scratch then please get in touch!

In terms of content, it covers what data and data science look like and how traditional engineering problems might benefit from the application of data science.

The full video is linked below.

And here are the slides: 2016-04 Engineering Ireland_04_Gaeilge.

Guerrilla Analytics: 7 Principles for Agile Analytics at PAW London 2015

I was invited to speak at Predictive Analytics World 2015 in London on October 28th 2015.

My talk covered how the 7 Guerrilla Analytics Principles are the foundation for doing Agile Data Science. With a Data Science Operating Model that follows these principles, your team always know where their data came from, who changed it and why, and can explain any of the highly iterative explorations and analyses their customers require.

You can find the slides below and at Slideshare. As always, feedback and questions are welcome. Enjoy!

 

Data Science Patterns: Preparing Data for Agile Data Science

I recently gave a webinar on Data Science Patterns. The slides are here.

Data Science Patterns, as with Software Engineering Patterns, are ‘common solutions to recurring problems’. I was inspired to put this webinar together based on a few things.

  • I build Data Science teams. Repeatedly, I find teams working inconsistently in the data preparation approaches, structures and conventions they use. Patterns help resolve this problem. Without patterns, you end up with code maintenance challenges, difficulty in supporting junior team members and all-round team inefficiency due to a completely ad-hoc approach to data preparation.
  • I recently read the paper ‘Tidy Data’ by Hadley Wickham in the Journal of Statistical Software (http://vita.had.co.nz/papers/tidy-data.pdf). This paper gives an excellent, clear description of what ‘tidy data’ is – the data format expected by most Data Science algorithms and visualizations. While there isn’t anything new here if you have a computer science background, Wickham’s paper is an easy read and has some really clear worked examples.
  • My book, Guerrilla Analytics (here for USA or here for UK), has an entire appendix on data manipulation patterns and I wanted to share some of that thinking with the Data Science community.
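To make the ‘tidy data’ idea concrete, here is a minimal sketch of one common preparation pattern: reshaping a ‘wide’ table into a tidy one, where each variable is a column and each observation is a row. The store and sales figures, and the to_tidy helper, are hypothetical illustrations of mine and are not taken from the paper or the book.

```python
# Hypothetical "wide" table: one row per store, one column per month.
# The month columns are really values of a single variable, "month".
wide_rows = [
    {"store": "Galway", "jan": 120, "feb": 135},
    {"store": "Dublin", "jan": 300, "feb": 280},
]


def to_tidy(rows, id_column, variable_name, value_name):
    """Reshape wide rows into tidy rows: one observation per row."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every tidy row
            tidy.append({
                id_column: row[id_column],
                variable_name: column,
                value_name: value,
            })
    return tidy


tidy_rows = to_tidy(wide_rows, "store", "month", "sales")
for row in tidy_rows:
    print(row)
```

Agreeing on a pattern like this once, rather than letting each analyst reshape data their own way, is the kind of consistency I have in mind.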

I hope you enjoy the webinar and find it useful. You can hear the recording here. Do get in touch with your thoughts and comments, as I think Data Science patterns are a huge area of potential improvement in the maturity of Data Science as a field.

Guerrilla Analytics: Tactics for Coping with Data Science Reality

Here are the slides from a talk I gave today to the Information Technology Department at the National University of Ireland, Galway. Thanks to Michael Madden for the opportunity to speak.

The talk was about how Guerrilla Analytics principles and practice tips help you do Data Science in circumstances that are very dynamic and constrained, and yet require traceability of what you do.

There were plenty of questions afterwards, which is always encouraging. I’ll try to address these questions in subsequent blog posts, so please do follow me @enda_ridge for all the latest posts.

Here are some of the questions from today.

  • what are the key skills to focus on if you want to work in data analytics / data science?
  • is programming ability a pre-requisite for doing data science? This question came up before at Newcastle University.
  • do the guerrilla analytics principles map to research projects?
  • do the guerrilla analytics principles map to ‘big data’ projects?

Since NUI Galway is a bi-lingual university, you can find my broken Gaelic version below!

As Gaeilge

Seo h-iad na sleamhnáin ó léacht a bhí agam inniú sa Roinn Teicneolaíocht Fáisnéise in Ollscoil na h-Éireann, Gaillimh. Buíochas le Michael Madden as an deis labhairt.

Bhain an léacht le cén chaoi is féidir leis na  prionsabail agus noda Guerrilla Analytics cabhair leat agus tú ag déanamh Data Science i ndálaí atá dinimic, srianta ach fós tá sé riachtanach go bhfuil inrianaitheacht ann.

Bhí mórán ceisteanna tar éis an léacht agus is maith an rud é. Freagróidh mé iad i mblag eile agus bígí cinnte mé a leanacht ag @enda_ridge don scéal is déanaí.

Seo h-iad roinnt de na ceisteanna.

  • céard iad na scilleanna is tábhachtaí agus tú ag iarraidh obair mar data scientist?
  • an gá duit bheith in ann ríomhchlárú le h-aghaidh obair mar data scientist?
  • an bhfuil baint ann idir na prionsabail agus tionscadail taighde?
  • an bhfuil baint ann idir na prionsabail agus ‘Big Data’?

Building Guerrilla Analytics Teams

I recently had the opportunity to present a webinar on ‘Building Guerrilla Analytics Teams’ as part of the BrightTalk ‘Business Intelligence and Analytics’ series. You can access the full recorded webinar and slides here and the slides are embedded below.

Some really interesting questions came up at the end of the session. I’ve listed them here and will pick them up in subsequent blog posts.

  • How do you build a business case to resource and set up a data science team?
  • What is the number one tip for someone putting together a completely new data science team?
  • What role is most important when setting up a data science team?
  • What are the typical challenges faced when setting up a Guerrilla Analytics team?

You can learn more about building a Guerrilla Analytics capability in my book, Guerrilla Analytics: A Practical Approach to Working with Data, which has chapters devoted to getting the right people in place, giving them the right technology and controlling everything with a minimal, lightweight process.

Introduction to Guerrilla Analytics at Newcastle University

I was recently invited to give a talk introducing Guerrilla Analytics and the principles described in the book. The talk covers some examples of how these principles are applied. It concludes by identifying some key research and development areas for doing this type of analytics in real-world projects.

 
This was a great opportunity to engage with a cross-disciplinary audience including computer scientists, computational biologists and engineers, and to have a sounding board for some of the key research and development areas I think need to be addressed to enable practical data science work.

A key take-away for me was the gap between the advanced data science being studied in academia and its implementation, which is held back by the lack of simple, practical methodologies.

Big Data Debate: The Controversial Questions at Google campus

I was recently invited to take part in the panel at the Big Data Debate (@bigdatadebate) at Google’s campus near Old Street in London [1].

It was a great opportunity to meet like-minded folks such as Christian Prokopp @prokopp Rangespan, Paul Bradshaw @paulbradshaw, Duncan Ross @duncan3ross Teradata, Daniel Hulme Satalia, Michael Cutler @cotdp TUMRA, Andy Piper @andypiper Pivotal and Will Scott Moncrieff from DueDil. Overall it was a lively debate with some interesting contributions from the panel and the packed house.

We spent perhaps half of the panel hour and most of the audience questions on data privacy. I guess this is revealing in itself if such concerns are at the forefront of the public’s mind as opposed to the opportunities presented by data analytics.

Christian did raise one controversial question with me. Paraphrasing, it was around the dangers that arise when we have the potential to mine vast quantities of data looking for patterns. My answer, as it has been since my PhD days, is that this is simply poor methodology *whatever* the volumes of data you are analysing. A data science methodology should allow us to answer questions (test hypotheses) about a problem (as described by data) while reducing bias as far as possible. Think about that. If you go trawling for an effect that you expect to exist in data you will eventually find it. Instead, your approach should be:

  • understand the problem (talk to the business, formulate a scientific theory)
  • turn the problem into hypotheses (our campaign increased sales, a fraudulent user has a log pattern that is different from that of their peers, etc.)
  • decide what effect is practically significant
  • then apply an appropriate statistical test with the correct sample size and power, and check the test’s assumptions
  • when you don’t find what you were looking for, you can’t keep changing your effect sizes and revisiting the data! That’s cherry picking or confirmation bias.
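The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the effect size, standard deviation and simulated sales figures are hypothetical, the sample size uses the standard normal approximation for a two-sided, two-sample comparison of means, and the large-sample z-test stands in for whatever test actually suits your data.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def sample_size_per_group(effect, sigma, alpha=0.05, power=0.8):
    """Per-group n for a two-sided, two-sample z-test (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sigma / effect) ** 2)


def two_sample_z_test(a, b):
    """Two-sided p-value for a difference in means (large-sample approximation)."""
    standard_error = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z_score = (mean(b) - mean(a)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z_score)))


# Decide these BEFORE looking at the data. All values are hypothetical.
EFFECT = 5.0   # smallest uplift in sales that is practically significant
SIGMA = 20.0   # assumed standard deviation of sales
ALPHA = 0.05   # significance level

n = sample_size_per_group(EFFECT, SIGMA, ALPHA, power=0.8)

# Collect n observations per group (simulated here) and run the test once.
rng = random.Random(42)
control = [rng.gauss(100.0, SIGMA) for _ in range(n)]
campaign = [rng.gauss(100.0 + EFFECT, SIGMA) for _ in range(n)]
p_value = two_sample_z_test(control, campaign)

print(f"n per group = {n}, p-value = {p_value:.4f}")
```

The point of fixing EFFECT, ALPHA and the power first is that none of them can be quietly adjusted after the result comes back.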

So I suppose the answer to Christian’s question is a ‘yes’ but it has nothing to do with ‘Big Data’. Big Data is dangerous because new tools and hype can lead to folks forgetting that garbage in results in garbage out. You have to understand the data and the rigorous analysis you are applying – just like any scientist.
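The danger of trawling is easy to demonstrate with a small simulation. In this sketch, which is purely illustrative, every comparison is between two samples of random noise drawn from the same distribution, so there is no real effect to find. The number of comparisons, the sample sizes and the z-test approximation are all assumptions of mine.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def two_sided_p_value(a, b):
    """Large-sample z-test p-value for a difference in means."""
    standard_error = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z_score = (mean(b) - mean(a)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z_score)))


rng = random.Random(0)
N_COMPARISONS = 200   # hypothetical number of patterns we go looking for
ALPHA = 0.05

false_positives = 0
for _ in range(N_COMPARISONS):
    # Both groups come from the same distribution: any "effect" is spurious.
    group_a = [rng.gauss(0.0, 1.0) for _ in range(100)]
    group_b = [rng.gauss(0.0, 1.0) for _ in range(100)]
    if two_sided_p_value(group_a, group_b) < ALPHA:
        false_positives += 1

print(f"{false_positives} of {N_COMPARISONS} comparisons look 'significant'")
```

At a 5% significance level you would expect roughly one in twenty of these comparisons to look significant by chance alone, whatever the volume of data.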

[1] I am employed by KPMG, one of the event sponsors