Reading List


This page is a categorised reading list of useful books for Data Scientists. I’ve read these and recommend them for the challenges a Data Science team will face when operating in dynamic Guerrilla Analytics project environments. If you would like to recommend an addition, please get in touch.


The Basics

Pro Git (Expert’s Voice in Software Development)

It’s a fact of Guerrilla Analytics life: a significant amount of project chaos will disappear if you have some form of version control.

You don’t need to become an enterprise-class DevOps practitioner. You do need to know about versioning, tagging, reverting and other common version control activities. This is the go-to Git reference. It covers everything you need to know about Git, written from a Git perspective.

Data Science at the Command Line: Facing the Future with Time-Tested Tools

To be a true Guerrilla Analyst, you need to be comfortable at the command line. It’s the only way to quickly peek at, summarise, clean and join up the wide variety of data files that you are likely to encounter. It’s also the best way to automate your work for efficiency and reproducibility.

This book will teach you all the tools and tricks you need to get around the most awkward and broken data files that come your way. You’ll learn about chunking files, patching them together, sorting, editing and modifying in ways you probably thought possible only in a ‘real’ analytics environment.

Data Smart: Using Data Science to Transform Information into Insight

A great introductory book written in a fun and entertaining style and based around analytics done in spreadsheets. Spreadsheets mean trouble for the Guerrilla Analyst but from a beginner’s perspective they are a familiar way to dip a toe in the water.

Sometimes a spreadsheet is the quickest way to get a feel for your data and this book might open your eyes to how much is possible in ubiquitous desktop software.

Bad Data Handbook

If you are going to work with data then you really need to understand the many ways it can be flawed. This book is a fun and comprehensive treatment of the flaws to expect and how to detect them in a huge variety of data types. I especially liked the chapter ‘Data Quality Demystified’ which was the foundation for the categorisation of data tests in Guerrilla Analytics: A Practical Approach to Working with Data. You may not have time to implement everything in this book but it never hurts to be aware of problems lurking in your data and what may be causing those strange and unexpected numbers in your report.
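
To give a flavour of what a data test can look like, here is a minimal sketch in Python using only the standard library. The sample data, the column names and the `run_data_tests` helper are all invented for illustration. The three checks (duplicate keys, missing values, non-numeric values) are just common examples, not the categorisation used in either book.

```python
import csv
import io
from collections import Counter

# Hypothetical extract, invented for illustration: one duplicate key,
# one missing amount and one non-numeric amount.
RAW = """customer_id,amount
C001,10.50
C002,
C003,abc
C001,7.25
"""


def run_data_tests(text):
    """Return a dict of simple data test results for the CSV text."""
    rows = list(csv.DictReader(io.StringIO(text)))
    id_counts = Counter(row["customer_id"] for row in rows)

    # Rows where the amount field is empty or whitespace only.
    missing = [r["customer_id"] for r in rows if not r["amount"].strip()]

    # Rows where the amount is present but cannot be parsed as a number.
    non_numeric = []
    for r in rows:
        value = r["amount"].strip()
        if value:
            try:
                float(value)
            except ValueError:
                non_numeric.append(r["customer_id"])

    return {
        "row_count": len(rows),
        "duplicate_ids": sorted(k for k, n in id_counts.items() if n > 1),
        "missing_amount": missing,
        "non_numeric_amount": non_numeric,
    }


results = run_data_tests(RAW)
print(results)
```

Even a handful of checks like these, run every time a data file arrives, will tell you early whether a strange number in a report comes from the data or from your code.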


Intermediate

Git Pocket Guide

Now that you have your Git reference book, you could probably use this shorter pocket guide for most of your day-to-day work.

Machine Learning for Hackers

This book is a really well written and structured introduction to the main machine learning techniques. Every technique is supported by real coded examples on real datasets.

Read this book to whet your appetite for all things machine learning.

Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython

Sometimes SQL just isn’t enough. SQL is great for heavy-lifting data preparation, but certain data transformations are difficult in plain old SQL and its ability to summarise data is limited. This book is all about pandas, a Python library for data manipulation, plotting and basic data analysis.
The book is a comprehensive guide to the pandas library and will get you through the most awkward data manipulations you are likely to encounter.
Intermediate knowledge of Python is required.
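
As a small taste of the kind of summarising that is awkward in plain SQL, the sketch below reshapes a long table into a wide one with a row total. It assumes pandas is installed. The `sales` data and its column names are made up for illustration and do not come from the book.

```python
import pandas as pd

# Hypothetical sales records, invented purely for illustration.
sales = pd.DataFrame(
    {
        "region": ["North", "North", "South", "South", "South"],
        "quarter": ["Q1", "Q2", "Q1", "Q1", "Q2"],
        "amount": [100.0, 150.0, 80.0, 20.0, 120.0],
    }
)

# Reshape from long to wide: one row per region, one column per quarter,
# summing the amounts that fall into the same cell.
summary = sales.pivot_table(
    index="region", columns="quarter", values="amount", aggfunc="sum"
)

# Add a total per region across the quarter columns.
summary["total"] = summary.sum(axis=1)

print(summary)
```

In standard SQL, a pivot like this usually means one hand-written expression per output column, whereas `pivot_table` discovers the quarters from the data.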

Guerrilla Analytics: A Practical Approach to Working with Data

Some shameless self-promotion by yours truly. Now that you have the basics and some of the intermediates down, you’re ready for some Guerrilla Analytics.

Learn how to organise your projects (data, code, deliverables, testing, processes and team) so you are robust to all the disruptions of high-pressure Data Science projects.

Programming Collective Intelligence: Building Smart Web 2.0 Applications

This book is a fun, well written and comprehensive introduction to a wide range of common machine learning algorithms. The author takes you through building up each algorithm step by step and sets the context for why the algorithm does what it does. Intermediate knowledge of Python and programming is required to get the most benefit from this book. For a bonus challenge, work through the exercises using your pandas knowledge from Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython!


Advanced

Natural Language Processing with Python

So-called unstructured data is where a significant amount of insight lies. But how do you get at it?
This book is a tour de force in natural language processing using the NLTK Python library. Not for the faint-hearted but very well written and comprehensive. Read this book and you have a powerful weapon in your Guerrilla arsenal.


Miscellaneous

Flawless Consulting: A Guide to Getting Your Expertise Used

Even the best data science will fail if its benefits cannot be communicated and understood. Regardless of whether your job title is consultant or not, we are all consultants to the extent that we wish to influence others and have our opinions and ideas accepted. Peter Block’s book gives an amazing guide to consulting including how to work with peers, difficult clients and others. The book emphasises the idea that success is based on authentic behaviours that establish trust and great working relationships. This is the consultant’s bible but everybody can learn something from it.