You might also be interested in:
In a Guerrilla Analytics environment, available tooling is often limited. There is often not enough budget, time, or IT flexibility to get all the tools you want. On many jobs, I find myself using Microsoft SQL Server as the project RDBMS. Out of the box, SQL Server does not yet have a fuzzy match capability. You need to install additional tools such as SSIS (SQL Server Integration Services) to avail of fuzzy matching. Even then, SSIS is a GUI-driven application, which contradicts a key Guerrilla Analytics principle. In a Guerrilla Analytics environment, you would much rather have fuzzy match capabilities available in SQL code. This is where the following Similarity library comes in handy.
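To make "fuzzy matching" a little more concrete: one common approach scores two strings by their Levenshtein edit distance (the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other) and keeps pairs whose normalised score clears a threshold. The following is a minimal, generic sketch in Python, not the Similarity library's API and not the algorithm SSIS uses; the function names `levenshtein`, `similarity`, and `fuzzy_match` and the 0.8 threshold are hypothetical choices for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    # Keep only the previous row of the dynamic-programming table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalise edit distance to a score in [0, 1], where 1.0 means identical."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def fuzzy_match(name: str, candidates: list[str], threshold: float = 0.8) -> list[tuple[str, float]]:
    """Return (candidate, score) pairs at or above a hypothetical threshold, best first."""
    key = name.lower()  # case-insensitive comparison
    scored = [(c, similarity(key, c.lower())) for c in candidates]
    matches = [pair for pair in scored if pair[1] >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(fuzzy_match("Jon Smith", ["John Smith", "Jane Smyth", "J. Smith"]))
```

With these inputs only "John Smith" clears the threshold (one inserted character out of ten, a score of 0.9); "Jane Smyth" and "J. Smith" need three and two edits respectively and fall below 0.8. Running the same kind of scoring inside the database, rather than through a GUI tool, is what keeps the matching scripted and repeatable.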
@edwardacurry and I did a talk at Enterprise Data World[...]
The danger of bias hasn't been given enough consideration in Data Science. Bias is anything that would cause us to skew our conclusions and not treat results and evidence objectively. Bias is sometimes unavoidable, sometimes accidental, and unfortunately sometimes deliberate. While bias is well recognised as a danger in mainstream science, I think Data Science has room for improvement in this area. In this post I categorise the types of bias encountered in typical Data Science work. I have gathered these from several recent blog posts and from a discussion in my PhD thesis. I also show how to reduce bias using some of the principles you can learn about in Guerrilla Analytics: A Practical Approach to Working with Data.