An executive’s guide to machine learning

Iron Man

McKinsey recently published an excellent guide to Machine Learning for Executives. In this post I categorise the key points that stood out from the perspective of establishing machine learning in an organisation. The key takeaway for me was that without leadership from the C-suite, machine learning will be limited to being a small part of existing operational processes.

What does it take to get started?

Strategy

  • C-level executives will make best use of machine learning if it is part of a strategic vision.
  • Not taking a strategic view of machine learning risks its being buried inside routine operations. While it may be a useful service, its long-term value will be limited to “cookie cutter” applications like retaining customers.
  • The C-suite should make a commitment to:
    • investigate all feasible alternatives
    • pursue the strategy wholeheartedly at the C-suite level
    • acquire expertise and knowledge in the C-suite to guide the strategy.

People

  • Companies need two types of people to leverage machine learning.
    • “Quants” are technical experts in machine learning
    • “Translators” bridge the disciplines of data, machine learning, and decision making.

Data

  • Avoid departments hoarding information and politicising access to it.
  • A frequent concern for the C-suite when it embarks on the prediction stage is the quality of the data. That concern often paralyzes executives. Adding new data sources may be of marginal benefit compared with what can be done with existing warehouses and databases.

Quick Wins

  • Start small—look for low-hanging fruit to demonstrate successes. This will boost grassroots support and ultimately determine whether an organization can apply machine learning effectively.
  • Be tough on yourself. Evaluate machine learning results in the light of clearly identified criteria for success.

What does the future hold?

  • People will have to direct and guide the machine learning algorithms as they attempt to achieve the objectives they are given.
  • No matter what fresh insights machine learning unearths, only human managers can decide the essential questions regarding the company’s business problems.
  • Just as with people, algorithms will need to be regularly evaluated and refined by experienced experts with domain expertise.

You can read more in the original article here. You can also read a more general guide to building data science capability here.

Data Scientists Need a Better Operating Model

A Data Science Operating Model

Motivation

In my job, I interview many potential Data Scientists and Data Analysts. I have also managed people with a wide range of experience from interns to seasoned PhDs with degrees in fields including Computer Science, Chemistry, Physics, Mathematics, Engineering and the Humanities. Just last week I had several conversations with prospective Data Scientists who are early in their careers and wondering what projects they should try to get on, what technologies they should learn and what additional courses they should study.

In many cases, the reason Data Scientists struggle on projects has nothing to do with the technical complexity of the problems or any lack of Data Science skills. They have all of that from their study and training, and they are motivated people who are passionate about their field.

In fact, what makes Data Science difficult for many is the complexity of operating in a Data Science project environment. Specifically, a Data Scientist has to operate in an environment that looks like the following.

  • Dynamics of data: data will change over the course of most projects. It will be refreshed, added to, replaced and repaired. Manual data sources are a common way of interfacing with people outside the Data Science team. Since much Data Science involves bringing together disparate data sources in novel ways, it is rare for all of this data to arrive at the same time and on schedule. So Data Scientists have to cope with trying to design and implement their work on top of a base of data that is always in flux.
  • Dynamics of requirements: Data Science is exploratory. You really don’t know what’s in the data until you have worked with it. Typically several algorithms and analyses have to be tried out. The insights from these activities often lead to the project taking a new direction and new analyses being framed for these new requirements.
  • Dynamics of people: it is rare to work in isolation. A Data Scientist will typically interact with IT, warehousing, developers, business SMEs, third party data providers and, of course, their team mates and their customer. This means that other people are providing inputs to their data, other people are writing code and creating data sets they depend on and other people are presenting results they may have contributed to. When other team members leave or take vacation, the Data Scientist may be expected to take over their work.
  • Constraints on time and resources: despite the dynamics above, the Data Scientist will be expected to add value and deliver successfully in limited time and with limited resources. You don’t always get the ideal technology stack or one that you are familiar with. You don’t always get all the skill sets you need on a project. And you don’t always get all the data for a perfect analysis.

If a Data Scientist does not have methods for coping with these dynamics and constraints then they will struggle to perform. Ultimately, they will rarely get to the more advanced analytics where they can really add value. Instead, the following problems tend to appear.

  • They become mired in forensics of their own work and their team’s work
  • Time is wasted investigating and explaining inconsistencies
  • Deliverables must be rewritten because the original cannot be reproduced or cannot be explained
  • A team descends into reactively producing analyses rather than leading the project from their data and their deliverables
  • Results are plain wrong because of the chaos that arises from project dynamics and constraints

A Data Science Operating Model with Guerrilla Analytics

Guerrilla Analytics and its 7 Principles provide a tried and tested operating model for Data Scientists. It has been used in many high pressure, dynamic and constrained project environments to deliver analyses that are reproducible, auditable and explainable.

This Guerrilla Analytics operating model breaks Data Science activities into the following components, highlighting the challenges faced in each component and offering guidelines on how to overcome these challenges.

  • Data Extraction: how data is extracted and transported by a team in a traceable manner
  • Data Receipt: how data should be received and logged by a team
  • Data Load: how to load multiple versions of data into an analytics environment without breaking data provenance
  • Coding: how data should be manipulated in ways that promote flexibility, testability, audit and agility. How to structure code and how to mix multiple tools and programming languages without being overwhelmed.
  • Work products and Reports: how to produce multiple versions of agile work products and project milestone reports so they can be tracked easily with a customer or fellow team members
  • Building consolidated analytics: how to identify and control consolidated understanding, business rules and data sets that emerge over the course of a project to promote efficiency and consistency and to avoid re-inventing the wheel
  • Testing: how to test analytics code and data sets in a fast paced environment
  • Workflows: simple workflows for peer review and quality control
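
As an illustration of the Data Receipt component above, here is a minimal sketch of a receipt log in Python. It is not taken from Guerrilla Analytics itself. The log layout, the receipt id scheme (D001, D002, ...) and the function names `file_sha256` and `log_receipt` are all assumptions made for this example. The idea is simply that every file a team receives gets an id, a timestamp, a source and a checksum, so later work can be traced back to exactly the data it was built on.

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path

# Columns of the receipt log (an assumed layout, for illustration only).
FIELDS = ["receipt_id", "received_at", "source", "file_name", "sha256"]


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large deliveries do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_receipt(log_path: Path, data_file: Path, source: str) -> dict:
    """Append one row describing a received file to a CSV log; return the row."""
    is_new_log = not log_path.exists()

    # Count the rows already logged so the next receipt id can be assigned.
    existing_rows = 0
    if not is_new_log:
        with log_path.open(newline="") as handle:
            existing_rows = sum(1 for _ in csv.DictReader(handle))

    row = {
        "receipt_id": f"D{existing_rows + 1:03d}",  # hypothetical id scheme
        "received_at": datetime.now(timezone.utc).isoformat(),
        "source": source,
        "file_name": data_file.name,
        "sha256": file_sha256(data_file),
    }

    with log_path.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new_log:
            writer.writeheader()
        writer.writerow(row)
    return row
```

Calling `log_receipt(Path("receipt_log.csv"), Path("customers.csv"), "CRM team")` each time a file arrives gives every delivery its own id. If two receipts share a checksum, the same file was delivered twice. If the checksum differs, the team has a new version of the data and can load it alongside the old one rather than overwriting it.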

Operating models may not produce beautiful visualizations or involve high end statistics and machine learning. However, they do allow Data Scientists to hit the ground running. They provide Data Scientists with the tools they need to survive real world project environments. This in turn improves the Data Scientist’s coordination with team members, their efficiency and their credibility, and ultimately increases their opportunities to add value.

We expect methodology from traditional laboratory scientists. Let’s expect the same from Data Scientists.