10 Easy Steps to Accelerate the Machine Learning Development Process [2021 UPDATE]

Grzegorz Mrukwa

Jul 27, 2018 • 10 min read

Machine learning (ML) is a young domain focused on providing computers with training data and algorithms to imitate the way humans learn. It’s becoming more popular with businesses from all industries as it gives them a competitive advantage. Machine learning development can reduce costs, optimize business processes, and increase customer satisfaction as long as machine learning models are quickly deployed to production.

Why is a fast machine learning development process important? The quicker the product reaches the market, the faster the business will recover its development costs and start making a profit. Currently, it can take up to 18 months to deploy a single machine learning model to production, so you need to factor in a lot of time and money before your new machine learning algorithm starts bringing in revenue.

Luckily, you can accelerate the machine learning model development process - the work can be reduced to just a few months or weeks (depending on the nature and purpose of the model). Here is our list of 10 simple ways to do it:

1. Look at the business problem first

When you outsource machine learning model development to an external partner or delegate it to your internal ML division, discuss the business objectives with them up-front, not just the idea you have for the machine learning project.

One business problem can usually be solved in a variety of different ways by different machine learning algorithms. The data scientists understand which solutions are feasible, and developers know which are easier to implement than others from the technological perspective.

If you scope the machine learning algorithm by yourself, you may end up with a research project that could span more than two years when all you need at the outset is a Proof of Concept. Ask an experienced machine learning team to estimate the end-to-end solution, not just a general cost and time breakdown.

They will be able not only to assess the feasibility of your machine learning project but also to suggest a better solution. That in itself may save as much as 95% of the project time!

2. Allocate time for research

By nature, machine learning and data science require in-depth human research. The machine learning team should allocate time to sift through the available scientific publications. A lot of data challenges have already been tackled and solved, and the solutions made available to the general public.

Even when you’re working on a cutting-edge idea, the literature research will have a lot of value: it can provide building blocks for the machine learning system, offer training examples, and help to envision the final performance of the machine learning model. If you’re a data scientist, check Papers with Code, Cornell University’s arXiv, and the Two Minute Papers channel. The key is to do the research quickly, and aggregators like these will help. You will find other useful websites as well.

3. Experiment with the machine learning model early

At the outset, you may have many different hypotheses for your machine learning model development. Start experimenting and testing the hypotheses early and in short iterations. This will allow you to confirm promising ideas regarding your machine learning algorithms and reject dead ends early on. There is no point in moving to the development phase before the initial machine learning concept is validated.

If you don’t pay enough attention to validating the idea, you may end up with a machine learning algorithm that is useless from the business perspective. Deploy to production only when the prototype has been sufficiently tested - otherwise, you will waste your time.
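
One cheap way to run such a short iteration is to compare each candidate against a trivial baseline before investing more time in it. The sketch below assumes a classification task; the function names, the `min_gain` threshold, and the toy labels are hypothetical and only illustrate the idea.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train, n):
    """Predict the most frequent training label for each of n examples."""
    most_common_label = Counter(y_train).most_common(1)[0][0]
    return [most_common_label] * n


def beats_baseline(y_train, y_test, candidate_pred, min_gain=0.05):
    """True only if the candidate beats the baseline by at least min_gain."""
    baseline_acc = accuracy(y_test, majority_baseline(y_train, len(y_test)))
    candidate_acc = accuracy(y_test, candidate_pred)
    return candidate_acc - baseline_acc >= min_gain


# Toy labels, for illustration only.
y_train = [0, 0, 0, 1, 0, 1, 0, 0]
y_test = [0, 1, 0, 0, 1]
candidate_pred = [0, 1, 0, 0, 0]  # output of a hypothetical prototype

print(beats_baseline(y_train, y_test, candidate_pred))  # True (0.8 vs 0.6)
```

A hypothesis that cannot clear the baseline by a meaningful margin is a good candidate for an early rejection.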

4. Make use of existing data science prototyping frameworks

The internet offers open-source, high-level experimentation frameworks that allow data scientists to test various machine learning models before the code is written. Fast.ai, Hugging Face, or Uber’s Ludwig can help development teams deliver the prototypes quickly for initial model evaluation. You will find other useful frameworks on the web as well.

5. Ensure efficient communication across the machine learning development team

Your machine learning project team most likely consists of multiple experts from different domains, with different competencies and tasks on their hands.

A developer may find it difficult to grasp the objectives of the data science team. You need to make sure all team members can communicate clearly and efficiently - otherwise, mistakes and delays are bound to occur.

Establishing effective communication channels and strategies will allow developers, data scientists, and engineers to focus on solving the data challenges instead of handling miscommunication issues. Follow the best practices you apply in traditional software development projects.

6. Visualize the machine learning systems

Don’t underestimate the power of visualization during the brainstorming phase! As a data scientist, you want to be able to focus on model training and validation instead of constantly explaining how the machine learning model should work.

The machine learning team should prepare diagrams and schemes that visualize the data flow and describe the algorithms that make up the model.

Once you arrive at a defined vision of the machine learning model, these diagrams will be your greatest asset when the development phase comes. They will facilitate and speed up communication between teams and stakeholders. You will also find them invaluable when onboarding new employees or discussing potential project changes.

7. Plan data collection & machine learning model evaluation upfront

Machine learning algorithms must continuously learn; otherwise, they become stale. Data collection for training and retraining is part of the model development lifecycle, and it includes preparing the data for the training process (for example, adding new labels).

The machine learning, data engineering, software engineering, and product teams need to map out both the inputs and outputs of the machine learning algorithms. They should also create and implement alerts for model performance degradation so that they know when a model requires retraining.
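
A degradation alert can start out very simple. The sketch below assumes that true labels eventually arrive for production predictions, so accuracy can be tracked over a rolling window; the class name, the tolerance, and the window size are hypothetical choices, not a recommendation.

```python
from collections import deque


class DegradationAlert:
    """Track a rolling window of prediction outcomes and flag accuracy drops."""

    def __init__(self, baseline_accuracy, tolerance=0.05, window=100):
        self.baseline_accuracy = baseline_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # True means a correct prediction

    def record(self, prediction, actual):
        self.outcomes.append(prediction == actual)

    def rolling_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """True once the window is full and accuracy is below the allowed floor."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.rolling_accuracy() < self.baseline_accuracy - self.tolerance


# Illustrative run: 7 correct and 3 wrong predictions in a window of 10.
alert = DegradationAlert(baseline_accuracy=0.90, tolerance=0.05, window=10)
for prediction, actual in [(1, 1)] * 7 + [(1, 0)] * 3:
    alert.record(prediction, actual)

print(alert.needs_retraining())  # True: 0.70 is below the 0.85 floor
```

In a real system the alert would feed a monitoring or paging tool; which metric and which floor to use is exactly the kind of decision the teams should agree on upfront.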

Product owners should take an active part in these discussions to ensure all business criteria are addressed. Their role is also vital in defining and collecting the expected outputs. This step is often skipped by project teams, and skipping it is also the main cause of delays.

It’s important to remember not to test the model on the same data it was trained on. Data scientists should split the data set into training and testing subsets, or into training, testing, and validation subsets, while they continue to fine-tune the model. A good split of the data set could feature:

  • 70% training data and 30% testing data, or
  • 60% training data, 20% testing data and 20% validation data

Such a division of the data set helps reveal whether the model is overtrained (overfitted) and offers a good indication of future model performance on real-world data.
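
The two splits above can be sketched with the standard library alone. The function below is a hypothetical helper, not a library API: it shuffles the rows with a fixed seed and treats whatever is left after the training and testing shares as validation data.

```python
import random


def split_dataset(rows, train=0.6, test=0.2, seed=42):
    """Shuffle rows and split them into train, test, and validation lists.

    Rows left over after the train and test shares form the validation set,
    so train=0.7 with test=0.3 yields an empty validation set.
    """
    if train <= 0 or test < 0 or train + test > 1 + 1e-9:
        raise ValueError("train and test must be non-negative shares summing to at most 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train)
    n_test = round(len(shuffled) * test)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_test],
        shuffled[n_train + n_test:],
    )


rows = list(range(100))  # stand-in for real examples

train_set, test_set, val_set = split_dataset(rows)  # 60 / 20 / 20
print(len(train_set), len(test_set), len(val_set))  # 60 20 20

train_set, test_set, val_set = split_dataset(rows, train=0.7, test=0.3)  # 70 / 30
print(len(train_set), len(test_set), len(val_set))  # 70 30 0
```

Shuffling before splitting suits independent examples. Time-ordered data usually calls for a chronological split instead, which this sketch does not cover.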

8. Plan retraining of the machine learning model

The world is changing fast, and so is real-world data. Most machine learning models have to be systematically retrained to avoid drift. As a data scientist, you should plan the model training process and the data points for retraining at the outset. It will help you avoid delays in the deployment and maintenance of your machine learning model.

To do it right, the machine learning team, data engineers, DevOps engineers, and backend developers should get together and work out the data flow between services for retraining. Record it on a diagram to avoid misunderstandings. Don’t forget to also agree on the deployment method for the new machine learning model after it has been successfully retrained.
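
One signal that can feed such a retraining flow is a drift score on the input data. The sketch below uses the population stability index (PSI) for a single numeric feature, which is our choice for illustration and not something the process prescribes; the bin count, the floor for empty buckets, and the sample data are all assumptions.

```python
import math


def population_stability_index(reference, current, bins=10):
    """Compare two samples of one numeric feature; larger values mean more drift."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small floor keeps the logarithm defined for empty buckets.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_shares = bucket_shares(reference)
    cur_shares = bucket_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_shares, cur_shares))


# Illustrative data: training-time values vs. values shifted upwards in production.
training_values = [i / 100 for i in range(100)]
production_values = [0.5 + i / 200 for i in range(100)]

psi = population_stability_index(training_values, production_values)
print(f"PSI = {psi:.2f}")
```

Teams often pick a PSI threshold as a rule of thumb to trigger retraining; the right value depends on the feature and the model, so treat any specific number as something to validate on your own data.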

9. Use a tool stack that enables CI/CD pipelines

Moving on from the research phase to machine learning model production is more complex than traditional software development. The machine learning approach relies not only on the code but also on given data and experimentation with different algorithms and parameters. There are usually multiple experiments involved.

These tasks can be conducted much faster with a Continuous Integration/Continuous Delivery (CI/CD) workflow applied to machine learning development. The workflow will simplify the process and help multidisciplinary teams collaborate better. With a CI/CD framework, operationalizing, testing, and deploying machine learning models will be easier for developers and data scientists alike.
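
In practice, one common building block of such a workflow is an automated quality gate that a pipeline runs before a new model version is promoted. The sketch below is a minimal, tool-agnostic example; the metric name, the thresholds, and the dictionaries standing in for evaluation reports are hypothetical.

```python
def quality_gate(candidate_metrics, production_metrics,
                 min_accuracy=0.80, max_regression=0.01):
    """Return the reasons to block deployment; an empty list means the gate passes."""
    failures = []
    if candidate_metrics["accuracy"] < min_accuracy:
        failures.append("accuracy below the agreed minimum")
    regression = production_metrics["accuracy"] - candidate_metrics["accuracy"]
    if regression > max_regression:
        failures.append("candidate is worse than the production model")
    return failures


# Stand-ins for evaluation reports produced earlier in the pipeline.
candidate = {"accuracy": 0.91}
production = {"accuracy": 0.90}

print(quality_gate(candidate, production))  # []
```

A CI job could call this function and exit with a non-zero status when the list is not empty, which stops the deployment step without anyone having to inspect the numbers by hand.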

10. Benefit from pre-trained models

Looking at existing pre-trained models will certainly accelerate machine learning model development. These ready-made solutions to common problems can be used as starting points to solve similar issues. There are a lot of open-source resources - Hugging Face Transformers and TensorFlow Model Garden are two of the most reliable on the market.

Although the resources available will require tailoring to your machine learning project, they can help you save a lot of time on the initial model training as well as money needed for the infrastructure. A model trained for a different problem may also prove to be useful, especially when you are working with a small data set.

Accelerating the Machine Learning model development

Machine Learning model development is a young but fast-developing domain of artificial intelligence that businesses can use to their advantage - as long as they are focused on accelerating the machine learning development process. There are simple ways to achieve it, which include organized research and focus on validating the initial idea at the outset.

Remember to plan certain phases of the machine learning model development lifecycle up-front and to establish a solid communication process. Planning carefully will help to avoid delays. Teams can also use open-source tools for model validation and pre-trained machine learning models to ensure the algorithms are deployed to production in the minimum time.

Grzegorz Mrukwa

Data Science Manager