How to identify and scope a good machine learning problem
In today’s blog, we’re going to unpack the first core step to solving a business problem with machine learning. Contrary to popular belief, ML is not magic, which is to say it’s not well suited to every problem. Here are the top things to look for to ensure your foray into ML is successful.
Implementing new technologies isn’t cheap, so pick a problem worth solving. Given how enterprises measure success, strong ROI is a plus; so is organizational buy-in. You can (and should) scale down the problem, but not so much that it loses relevance. If people don’t care about the project, it won’t get the attention it needs, and the risk of failure will increase dramatically. If that happens, there’s a good chance leadership will become disillusioned with the new technology.
Gut check: Could I put together a business case for this that my CFO would say yes to? Will my leadership team care about the performance of this project? The answer should be a yes to both.
There should be a clear, measurable goal that can be quantified by a dominant metric. ML algorithms require a measurable outcome to optimize against. If you’re trying to predict whether someone will buy one of your products, it’s best to start with a dataset that directly links predictive data (like clickstream behavior) to outcomes data (like sales transactions). There’s always a risk of falling prey to proxies, as Jeff Bezos outlines in the “Resist Proxies” section of his 2016 letter to shareholders. While you’ll sometimes need to default to a proxy, this shouldn’t be your first instinct. Also, if two people can’t agree on what the outcome is, there’s no way a machine can encode it as a function!
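To make this concrete, here’s a minimal sketch of what a “dominant metric” looks like in practice: a single number computed directly from outcome data, not from a proxy. The records below are entirely made up for illustration.

```python
# Hypothetical leads that link predictive data (clicks) to outcome data
# (whether the lead converted). All values are invented for illustration.
leads = [
    {"clicks": 12, "converted": True},
    {"clicks": 3,  "converted": False},
    {"clicks": 8,  "converted": True},
    {"clicks": 1,  "converted": False},
]

# The dominant metric: conversion rate, computed straight from outcomes.
conversion_rate = sum(lead["converted"] for lead in leads) / len(leads)
print(f"{conversion_rate:.0%}")  # 50%
```

If you can’t write a few lines like these against your own data, that’s a sign the objective isn’t yet quantifiable.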
Gut check: Can I define the objective of the algorithm with a quantifiable metric? The answer should be yes and should ideally NOT be a proxy for something else.
There’s a temptation to believe that all computing will be machine learning in the future. But many applications are best captured by rules. ML is impactful when the complexity of a task is too great to be encoded in a set of hand-coded rules.
Gut check: Could I solve this problem by creating a set of rules (if…then…) that a computer automatically executes? If the answer is yes, you probably don’t need ML and can start with a simple rules-based engine instead.
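As an illustration of this gut check, here’s a hypothetical rules-based engine for routing support tickets. If a handful of if…then rules like these cover the problem, you don’t need ML; ML earns its keep when the rules would multiply beyond what anyone can maintain.

```python
# A hypothetical rules-based engine: route support tickets to queues
# using simple if...then rules (keywords and queue names are invented).

def route_ticket(subject: str) -> str:
    """Route a ticket using hand-coded if...then rules."""
    text = subject.lower()
    if "refund" in text or "charge" in text:
        return "billing"
    if "password" in text or "login" in text:
        return "account"
    return "general"

print(route_ticket("Please refund my last charge"))  # billing
print(route_ticket("I forgot my password"))          # account
```

When the mapping from inputs to outputs is this crisp, a rules engine is cheaper to build, easier to debug, and fully explainable.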
Data with good signal
The more historical data you have, the more the algorithm has to learn from on day one.
For example, at integrate.ai, we’re helping a client leverage data from a cross-industry partnership to convert leads into customers. Our platform recommends which leads to focus their sales spend on to deliver the best ROI. Our algorithms learn from past conversion data: what people were like (behaviourally, psychographically, etc.) and what actions the business took to convert them. Often, partnering with a third-party source provides key insights to make ML practical, even for large enterprises with vast historical data!
Gut check: Do we (or does someone) already collect the kind of information needed to answer this question, or can we partner to get it? The answer should be yes.
The impact of an error
ML looks at the world probabilistically (“we are 80% sure that this person is going to love product x”), not deterministically (“this customer will love product x”). Accordingly, it’s best to start with problems and opportunities where people will be comfortable accepting probabilistic insights because the impact of an error is low (e.g., product recommendation), rather than ones where the impact of an error is significantly higher (e.g., cancer detection).
I’m not saying that the latter is not a candidate for ML (there are projects in progress that aim to use ML to better detect cancer, like this one), but when you’re introducing new technologies into your organization, it’s easier to get people comfortable with opportunities where the impact of an error is lower.
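One way to see what “accepting probabilistic insights” means in code: acting on a probability requires choosing a threshold, and the right threshold depends on how costly a mistake is. The numbers and threshold values below are purely illustrative.

```python
# A minimal sketch (all thresholds hypothetical): the same 80% prediction
# justifies action in a low-stakes setting but not in a high-stakes one.

def act_on_prediction(p_positive: float, error_cost: str) -> bool:
    """Decide whether to act automatically on a probabilistic prediction."""
    # Low-stakes (e.g. product recommendation): act on a moderate signal.
    # High-stakes: demand near-certainty before acting without a human.
    threshold = 0.5 if error_cost == "low" else 0.95
    return p_positive >= threshold

print(act_on_prediction(0.8, "low"))   # True: recommend the product
print(act_on_prediction(0.8, "high"))  # False: defer to a human expert
```

The harder it is for your organization to agree on that threshold, the stronger the signal that you’ve picked a high-stakes problem to start with.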
Gut check: If the model makes an incorrect prediction, what is the worst thing that can happen? The answer shouldn’t be that the impact could be far worse than what you are doing today.
A feedback loop
ML is powerful because it learns continuously. To do so, outcome data needs to be continuously fed back into the model.
For example, we are helping a few of our clients predict which customers are most likely to convert to a new product so they can focus efforts on customers who need an extra nudge. We calculate a likelihood-to-convert score for every incoming customer and get feedback on whether those customers converted. By using that feedback, our algorithm’s predictions improve over time (e.g., “I thought this customer was highly likely to convert, but they didn’t, so next time I see a customer who looks like this, I will give them a lower prediction score”).
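Here’s a toy sketch of that feedback loop, assuming a simple logistic-regression scorer (this is an illustration with invented data, not our production system): each time we learn whether a customer actually converted, we nudge the model’s weights so that similar customers get better scores next time.

```python
import math

def predict(weights, features):
    """Likelihood-to-convert score via logistic regression."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def update(weights, features, converted, lr=0.1):
    """One gradient step: shrink the gap between the score and the outcome."""
    error = predict(weights, features) - (1.0 if converted else 0.0)
    return [w - lr * error * x for w, x in zip(weights, features)]

weights = [0.0, 0.0]
customer = [1.0, 2.0]  # hypothetical feature vector for one customer

before = predict(weights, customer)
weights = update(weights, customer, converted=False)  # feedback: no conversion
after = predict(weights, customer)

print(after < before)  # True: a miss lowers the score for similar customers
```

Without the `converted` signal flowing back in, `update` can never run, and the model is frozen at day-one quality; that’s why the gut check below matters.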
Gut check: Do I reliably collect data on the outcome I’m looking to drive, and how timely is that data? The answer should be yes, with feedback arriving in real time or at least monthly.
So, there you have it. Note the multidisciplinary nature of all the steps above: we need all kinds of people, with all kinds of perspectives, talents, and skills, to realize the potential of ML, not just the people we have traditionally labeled “technologists”.
If you found this helpful (or not), please let us know! The next post in the series will focus on step two: How to define and scope your opportunity.