Data is not free at all in machine learning projects
As I mentioned above, to train a machine learning model, you need large datasets. It may seem that this is no longer a problem, since storing and processing petabytes of information has become affordable for many companies. But while storage may be cheap, collecting a sufficient amount of data takes time. Moreover, buying ready-made datasets is expensive.
Machine learning also brings problems of a different nature. Preparing data for training an algorithm is a complicated process. Here's an interesting post on how it is done. You need to know which problem you want your algorithm to solve, because whether it is classification, clustering, regression, or ranking determines how you have to plan and prepare the data.
You need to establish data collection mechanisms and consistent formatting. Then you have to reduce the data through attribute sampling, record sampling, or aggregation. You also need to decompose and rescale the data. It is a complex task that requires skilled engineers and time, so even if you had infinite disk space, the process would still be expensive.
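The reduction and rescaling steps described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a production pipeline: the records, field names, and helper functions (`record_sample`, `attribute_sample`, `aggregate`, `decompose_date`, `min_max_rescale`) are hypothetical, "decomposition" is interpreted here as splitting a composite field such as a date into simpler features, and min-max scaling is just one common rescaling choice. Real projects would typically use libraries such as pandas or scikit-learn on far larger data.

```python
import random
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical customer records; fields and values are illustrative only.
RECORDS = [
    {"city": "Berlin", "age": 25, "income": 40000, "purchases": 3, "signup": "2018-03-14"},
    {"city": "Berlin", "age": 41, "income": 65000, "purchases": 7, "signup": "2017-11-02"},
    {"city": "Warsaw", "age": 33, "income": 52000, "purchases": 5, "signup": "2018-01-20"},
    {"city": "Warsaw", "age": 29, "income": 48000, "purchases": 1, "signup": "2016-06-30"},
]


def record_sample(records, fraction, seed=0):
    """Record sampling: keep a random subset of rows (reproducible via seed)."""
    rng = random.Random(seed)
    k = max(1, round(len(records) * fraction))
    return rng.sample(records, k)


def attribute_sample(records, keep):
    """Attribute sampling: keep only the listed columns of every row."""
    return [{name: row[name] for name in keep} for row in records]


def aggregate(records, key, value):
    """Aggregation: replace all rows sharing `key` with the mean of `value`."""
    groups = defaultdict(list)
    for row in records:
        groups[row[key]].append(row[value])
    return {k: mean(vals) for k, vals in groups.items()}


def decompose_date(iso_string):
    """Decomposition: split one composite date field into simpler features."""
    d = date.fromisoformat(iso_string)
    return {"year": d.year, "month": d.month, "weekday": d.weekday()}


def min_max_rescale(values):
    """Rescaling: map numbers linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # avoid division by zero when all values are equal
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    print(attribute_sample(record_sample(RECORDS, 0.5), ["age", "income"]))
    print(aggregate(RECORDS, "city", "purchases"))
    print(decompose_date(RECORDS[0]["signup"]))
    print(min_max_rescale([row["income"] for row in RECORDS]))
```

Even in this toy form, every step hides a decision (the sampling fraction, which columns to keep, how to aggregate, which scaling to apply) that someone who understands the data has to make.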
If you plan to use personal data, you will probably face additional challenges. People around the world are increasingly aware of the importance of protecting their privacy. They may be unwilling to share their data with you, or they may file a formal complaint when they realize you have used it, even if they gave you their consent.
Personal data and big data activities have also become more difficult, risky, and costly with the introduction of new regulations protecting personal data, such as the European Union's General Data Protection Regulation (GDPR).
Machine learning technology is very young
Once again, from the outside, it looks like a fairytale. The biggest tech corporations are spending money on open source frameworks that anyone can use. Google, a subsidiary of Alphabet Inc., offers TensorFlow, while Microsoft and Facebook have jointly developed the Open Neural Network Exchange (ONNX). Businesses and individual users all around the world build data-powered systems with these tools.
However, a central problem of machine learning is that all these environments are very young. TensorFlow 1.0, its first stable version, was released in February 2017, and PyTorch, another popular library, was publicly released in early 2017. Web application frameworks are much, much older: at the time of writing, Ruby on Rails is 14 years old, and the Python-based Django is 13 years old.
On the one hand, young technology uses the most up-to-date solutions; on the other, it may not be production-ready, or may be only borderline production-ready.
You need time to achieve satisfying results, and planning is difficult
Traditional enterprise software development is pretty straightforward. You have your business goals and functionalities, you choose the technology to build them, and you can assume it will take a few months to release a working version. In machine learning, development has more layers. The engineers write a program that generates another program, which in turn learns to perform the actions you planned when setting your business goals. Adding even one or two such layers makes everything much more complicated.
The challenge is that machine learning takes much more time. You have to gather and prepare data, then train the algorithm, and there are many more uncertainties along the way. That is why, while an experienced team can estimate the time needed for a traditional website or application quite precisely, a machine learning project, for example one that provides product recommendations, can take much less or much more time than expected. Why? Because even the best machine learning engineers don't know how deep learning networks will behave when analyzing different datasets. It also means that machine learning engineers and data scientists cannot guarantee that the training process of a model can be replicated.
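A minimal sketch of why training runs can differ, assuming a toy linear model fitted with plain stochastic gradient descent instead of a deep network. The data, the helper names `make_data` and `train`, and the hyperparameters are illustrative assumptions. Here a random seed controls the initial weights and the order in which examples are visited: the same seed reproduces the same model exactly, while a different seed yields slightly different weights.

```python
import random


def make_data(n=50, seed=123):
    """Toy dataset (an illustrative assumption): y = 2x + 1 plus small noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 2 * x + 1 + rng.gauss(0, 0.1)) for x in xs]


def train(data, seed, epochs=5, lr=0.1):
    """Fit y = w*x + b with stochastic gradient descent on squared error.

    `seed` drives the two sources of randomness in this sketch:
    the initial weights and the order in which examples are visited.
    """
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    order = list(data)
    for _ in range(epochs):
        rng.shuffle(order)
        for x, y in order:
            err = (w * x + b) - y  # prediction error for one example
            w -= lr * err * x      # gradient of 0.5 * err**2 w.r.t. w
            b -= lr * err          # gradient of 0.5 * err**2 w.r.t. b
    return w, b


if __name__ == "__main__":
    data = make_data()
    print(train(data, seed=1))
    print(train(data, seed=1))  # same seed: exactly the same weights
    print(train(data, seed=2))  # different seed: slightly different weights
```

Fixing seeds is enough for toy code like this, but real deep learning frameworks add further sources of nondeterminism, such as parallel operations on GPUs, so bit-for-bit identical results are not always achievable in practice.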
Understand the limits of contemporary machine learning technology
Machine learning is likely to become a common technology soon, but its main problems have yet to be solved. Until then, engaging in an AI project is a high-risk, high-reward enterprise. You need to be patient, plan carefully, respect the challenges machine learning technology brings, and find people who truly understand machine learning and are not trying to sell you an empty promise.