When to Use Machine Learning - Does Your App Really Need ML?
Machine learning applications can bring you more clients, increase sales and reduce business costs. However, if they are not used properly, they may lead to customer churn, financial losses and reputational damage.
When should you use ML in your business and when is it better to stick with traditional computing methods?
To answer this question, let's meet Tay. She is about 19, lives in the US, likes music and chatting on Twitter. She seems to be a nice American teenager. Well… she is a Holocaust denier, calls President Obama "a monkey" and offers sex to anonymous Twitter users. In fact, Tay was not a real teenager - she was an ML-based chatbot built by Microsoft. The idea was that, at the beginning, Tay would know as much as a typical teenager, and she would then learn new things by talking with Internet users. But something went wrong. After a few hours of interacting with Internet users, the bot turned into a Hitler-loving, racist, anti-feminist little monster. Microsoft had to shut Tay down.
The young Hungarian start-up "Talk-a-bot" also builds chatbots. They are ML-based, but none of them praise Hitler or use racial slurs. The Hungarian bots talk to customers on Facebook on behalf of different brands: they sell products, answer queries and comments, etc. They are very good at this. So good, in fact, that a chatbot they designed was used by more than 430,000 people in three months.
Why did a chatbot built by the giant Microsoft fail, while one made by a small Hungarian start-up works so well? The answer is data.
Machine Learning vs. Traditional Computing
Data is the key to the success (or failure) of machine learning applications. In short: in traditional software development, humans write the rules and machines simply follow them. Thus, the crucial part of the application is the hand-written algorithm inside.
Machine learning is different. It is a set of artificial intelligence techniques that allows systems to learn directly from data. How? A programmer writes a learning algorithm. Then the computer receives a training set of examples and starts learning on its own, adjusting its internal model as it sees more of the information it is processing. This means it's the computer that creates the system. It also means that the ultimate shape of the system depends on the data. If the quality of the data is poor, or if the data is biased, then so is the system - just like in the case of Microsoft's Tay.
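To make the contrast concrete, here is a minimal sketch in Python. It is only an illustration: the spam example, the function names and the tiny datasets are all invented, and a real system would use far more data and a proper ML library. The first function is a hand-written rule; the second derives its rule from labelled examples, so its behaviour changes when the data changes.

```python
# Traditional computing: a person decides the rule and hard-codes it.
def rule_based_is_spam(num_links: int) -> bool:
    return num_links > 5  # threshold picked by the programmer (hypothetical)


# Machine learning: the programmer writes only the learning procedure.
# The threshold itself is derived from labelled examples.
def learn_threshold(examples: list[tuple[int, bool]]) -> int:
    """Pick the threshold t for which 'num_links > t' makes the fewest
    mistakes on the training examples."""
    candidates = sorted({links for links, _ in examples})

    def mistakes(t: int) -> int:
        return sum((links > t) != is_spam for links, is_spam in examples)

    return min(candidates, key=mistakes)


# (number of links, labelled as spam?): invented data for illustration
clean_data = [(0, False), (1, False), (2, False), (6, True), (8, True), (9, True)]
# Same messages, but careless labelling marked most spam as harmless.
skewed_data = [(0, False), (1, False), (2, False), (6, False), (8, False), (9, True)]

clean_t = learn_threshold(clean_data)
skewed_t = learn_threshold(skewed_data)

message_links = 7
print(message_links > clean_t)   # verdict of the model trained on clean data
print(message_links > skewed_t)  # verdict of the model trained on skewed data
```

The learning code is identical in both runs. Only the labels differ, yet the model trained on clean data flags a message with 7 links as spam while the model trained on skewed data lets it through.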
When Machine Learning Works
Machine learning is used in many sectors, from retail and finance, through health care, to education and charity. Each of them adapts it to its own needs. Google, for instance, uses ML in the Gmail spam filter, and Apple uses it in its personal assistant, Siri. Machine learning helps banks detect suspicious transactions and allows insurance companies to calculate risk more accurately. Machine learning is beloved by ecommerce and marketing: Amazon, Netflix and hundreds of online shops built their recommendation engines on it. Hedge funds, such as Two Sigma or Binatix, have ML algorithms which forecast stock prices. The medical company Medecision uses ML to predict avoidable hospitalizations in diabetes patients, Schneider Electric uses it to prevent failures of oil and gas pumps, and the Zoological Society of London uses it to track endangered animals in photos taken in Africa. Have you ever seen a Facebook application such as "which celebrity do you look like"? Those are also ML-based.
There are hundreds of business applications of machine learning. In general, it solves several types of problems. The main ones are:
classification: Is this credit card transaction fraudulent or not? Is this email spam or not? Machine learning is a great tool when you need to divide objects (for example clients or products) into two or more pre-defined groups.
clustering: ML discovers patterns in chaos. It finds similarities between data points and divides objects into groups of similar items (clusters). Importantly, there is no need to define the groups in advance.
regression: ML estimates a number. On the basis of an input from a dataset (usually historical data plus other factors), it predicts the most likely numeric value of a particular quantity. It could be anything, such as stock or real estate prices, consumer behaviour, or wear and tear on a piece of equipment.
dimensionality reduction: In an ocean of information, ML can choose which data are the most significant and how they can be summarised. In practice, it is applied in such fields as photo processing and text analysis.
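The four problem types above can be sketched in a few lines of plain Python. These are deliberately tiny stand-ins chosen for readability (a nearest-average classifier, two-group clustering on single numbers, a least-squares line, and variance-based column selection), not the methods any of the companies mentioned actually use. All function names and numbers are invented for illustration.

```python
from statistics import mean, pvariance


# Classification: put a new item into one of two PRE-DEFINED groups,
# here by checking which group's average it is closer to.
def classify(amount, legit_amounts, fraud_amounts):
    d_legit = abs(amount - mean(legit_amounts))
    d_fraud = abs(amount - mean(fraud_amounts))
    return "fraud" if d_fraud < d_legit else "legit"


# Clustering: split UNLABELLED numbers into two groups (a 1-D two-means).
# Assumes the list contains at least two distinct values.
def two_means(values, iterations=10):
    c1, c2 = min(values), max(values)
    for _ in range(iterations):
        g1 = [v for v in values if abs(v - c1) <= abs(v - c2)]
        g2 = [v for v in values if abs(v - c1) > abs(v - c2)]
        c1, c2 = mean(g1), mean(g2)
    return sorted(g1), sorted(g2)


# Regression: fit a straight line y = slope * x + intercept (least squares).
def fit_line(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


# Dimensionality reduction: keep only the k columns that vary the most,
# a very simple form of feature selection.
def top_variance_columns(rows, k):
    columns = list(zip(*rows))
    ranked = sorted(range(len(columns)),
                    key=lambda i: pvariance(columns[i]), reverse=True)
    return sorted(ranked[:k])


label = classify(950, legit_amounts=[20, 35, 50], fraud_amounts=[900, 1200])
groups = two_means([1, 2, 3, 20, 21, 22])
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
kept = top_variance_columns([[1, 100, 5], [2, 100, 9], [3, 100, 1]], k=2)
print(label, groups, slope, intercept, kept)
```

Note the difference between the first two functions: classify needs examples of each group up front, while two_means receives no labels at all and finds the groups by itself.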
When Traditional Software Methods Are Better
Although machine learning gives businesses numerous new options, there are situations when it's better to stick with traditional software methods. When are you better off avoiding ML?
You don't have enough data: Machine learning is designed to work with huge amounts of data. Really huge. As a rough rule of thumb, 100k records is a good start. If the training data set is too small, the system's decisions will be unreliable, because a small sample rarely represents the full range of cases the system will meet.
Your data is too noisy: "Noise" in ML is the irrelevant or incorrect information in a dataset. If there is too much of it, the computer might memorise the noise instead of the underlying pattern. This is what happened to Tay.
You don't have much time (and money): ML is time- and resource-intensive. First, data scientists need to prepare a dataset (if they don't, see the previous point). Then, the computer needs some time to learn. Then the IT team runs tests and adjusts the algorithm. Then the computer needs some time to learn again. IT does some more testing and adjusts the algorithm. The computer goes back to learning... The cycle repeats over and over again. The longer it takes, the more you have to pay your IT specialists.
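You have a simple problem to solve: If a few explicit rules written by a programmer do the job, traditional code is cheaper to build and easier to understand than a trained model.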
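The point about noise can be illustrated with a small sketch. It assumes a toy nearest-neighbour model and invented data containing one wrongly labelled example; the function name and numbers are hypothetical. A model that trusts only the single closest example repeats the labelling mistake, while one that lets several neighbours vote is less affected by it.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Predict the label of x by majority vote among its k nearest examples."""
    nearest = sorted(train, key=lambda example: abs(example[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# (value, label): small values should be "low", large values "high".
# The example (3, "high") is a deliberate labelling error, i.e. noise.
train = [(1, "low"), (2, "low"), (3, "high"), (4, "low"),
         (10, "high"), (11, "high"), (12, "high")]

memorised = knn_predict(train, 3.2, k=1)  # trusts the single noisy neighbour
smoothed = knn_predict(train, 3.2, k=3)   # three neighbours outvote the noise
print(memorised, smoothed)
```

With k=1 the model simply echoes the mislabelled example. With k=3 the two correct neighbours win the vote. This only works while good examples outnumber bad ones, which is why both the amount and the quality of data matter.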
To sum up: Machine learning helps find patterns in the chaos of big datasets. It is worth considering when you have a complex task to solve, or if you're dealing with a large volume of data and lots of variables. But this method has its limits. It's better not to choose it if you are short on time, or if the available data is limited in amount or quality.