What’s the Role of Quality Assurance in Machine Learning Projects?

Updated Jul 2, 2024 • 13 min read

In a typical app development project, Quality Assurance professionals focus on product quality throughout the development journey. In doing their work, QA teams benefit from established and well-defined processes.

However, when it comes to machine learning projects, QA specialists also need to validate datasets that train machine learning models. The Quality Assurance methods and practices for such tasks have yet to be fully defined.

Quality Assurance specialists are responsible for monitoring, inspecting, and proposing measures to correct or improve digital products and processes. Their main role is to ensure the product does its job in the best possible way and the final output meets (and even surpasses) client expectations. However, if your company or startup is working on a machine learning project, you need to sit down with your entire project team to redefine and clarify the work of the QA team. It won’t be business as usual.

In machine learning projects, quality assurance isn’t just about testing the product so that it meets quality standards. QA specialists need to work with engineers to iterate on the ML models, test data, and algorithms. The QA team needs to set up the tools and methods that keep the datasets and data models valid and the development process running as smoothly as possible.

From my experience as a Machine Learning Engineer working together with QA specialists, I see three areas where QA teams are crucial to the success of ML projects:

Error analysis
Validating the solution on different devices
Label reviews

In this article, I’ll discuss the role of QA specialists in each of these areas. I’ll also briefly touch on the responsibilities of machine learning engineers to highlight the nature of their engagement or collaboration with QA teams.

1. Error analysis in machine learning projects

The idea of error analysis is simple: It’s about improving your machine learning model. To do that, you need to know what type of errors the model makes. Identify examples from your dataset where the model made the biggest mistakes, and check them one by one.

Your team needs to analyze these errors and their likely causes. Aggregated statistics about the issues will hint at which ones to prioritize and how to optimize the models. For an in-depth walkthrough of performing error analysis, you may refer to this course.
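As a minimal sketch of this step, assume each reviewed example is logged with a per-example loss and an error category assigned by a reviewer. The field names, category names, and values below are hypothetical and only illustrate the idea: pick the worst examples, then see which error categories dominate among them.

```python
from collections import Counter


def worst_examples(records, k):
    """Return the k records with the highest loss, worst first."""
    return sorted(records, key=lambda r: r["loss"], reverse=True)[:k]


def category_shares(records):
    """Map each error category to its share of the given records."""
    counts = Counter(r["category"] for r in records)
    total = sum(counts.values())
    # most_common() orders categories from most to least frequent.
    return {category: n / total for category, n in counts.most_common()}


# Hypothetical review log: one row per example a reviewer inspected.
review_log = [
    {"id": "img_001", "loss": 2.31, "category": "blurry image"},
    {"id": "img_002", "loss": 0.12, "category": "wrong label"},
    {"id": "img_003", "loss": 1.87, "category": "blurry image"},
    {"id": "img_004", "loss": 1.05, "category": "occlusion"},
]

top = worst_examples(review_log, k=3)
shares = category_shares(top)
```

In this toy log, two of the three worst examples are tagged as blurry images, which would suggest looking at that category first.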

Role of the QA team

Going through examples with the highest number of errors in your dataset can be time-consuming. Splitting this task into chunks and involving the QA team in machine learning and AI projects will help speed up this process.

You need to remember that identifying the causes of errors is a subjective exercise. Classifying errors will vary for every reviewer. When it comes to classification or segmentation in computer vision and machine learning, one reviewer will see a particular pattern that another wouldn’t. Thus, additional team members can broaden the set of error categories that your project team can spot. This can also function as a double check for dubious examples.

The Quality Assurance process is about understanding your clients' business needs and focusing on the quality of the final solution. Keeping this in mind will guide how you structure the error analysis process. For example, when doing an error analysis of product recommendations, you can rely on your QA specialists to identify product categories that are performing poorly. You might even discover which user demographics are most susceptible to incorrect recommendations, which will help you correct and optimize the model performance.

Role of the ML engineer

It’s important to have the proper setup in place before conducting error analysis. An ML engineer should be the one responsible for this. For example, in one of our projects at Netguru, our team performed error analysis on a predefined set of videos. To speed up the process, we added a caption to each video frame, showing its number. We then input these numbers into a spreadsheet to identify the frame. Such testing practices help simplify the QA process.

For relatively simple or small projects, you can use a shared spreadsheet to manage your error analysis. However, for bigger projects or those that require a lengthy iterative process, using an advanced tool will make the work more manageable.

In one of our projects that involved image segmentation, we moved our error analysis to Streamlit. We built a graphical interface that enabled viewing of examples in detail — the predicted mask overlaid on the image, false positives and false negatives, and image metadata. This setup gave our Quality Assurance specialist a powerful view of the examples and predictions.
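The Streamlit interface itself isn't reproduced here, but the false positive and false negative views it displays rest on a simple per-pixel comparison. Below is a rough sketch of that comparison, assuming binary masks stored as nested lists of 0s and 1s. The function name and the sample masks are illustrative, not taken from the project.

```python
def error_masks(predicted, truth):
    """Return (false_positive, false_negative) masks for two binary masks.

    Both inputs are lists of rows containing 0 (background) or 1 (object).
    """
    same_shape = len(predicted) == len(truth) and all(
        len(p_row) == len(t_row) for p_row, t_row in zip(predicted, truth)
    )
    if not same_shape:
        raise ValueError("masks must have the same shape")

    # False positive: the model marked a pixel that the label did not.
    false_pos = [
        [int(p == 1 and t == 0) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(predicted, truth)
    ]
    # False negative: the label marked a pixel that the model missed.
    false_neg = [
        [int(p == 0 and t == 1) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(predicted, truth)
    ]
    return false_pos, false_neg


predicted = [[1, 1, 0],
             [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
fp, fn = error_masks(predicted, truth)
```

Each resulting mask can then be drawn over the image in a different color, so a reviewer sees at a glance where the model over-predicts and where it misses.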

2. Validation of the machine learning model on different devices

When building machine learning models, project teams are often constrained by the resources of the devices on which the inference will take place. This is especially the case when developing models for mobile applications where the inference runs directly on the device.

Supporting more device models and operating systems adds complexity to the project because you need to ensure quality across different types of hardware and APIs. This requires close coordination with the QA team on reviewing and running test cases.

Role of the QA team

By collecting data about the clients' target market, you can discover what type of devices you’ll encounter in the final production setting. In one of our projects, the client had statistics about their mobile site entries. These contained information about the specific devices their customers used when logging onto the site.

Having a rough estimate of the computational complexity of our algorithm, our Quality Assurance specialist used public machine learning benchmarks like AI-Benchmark to identify devices that wouldn’t be able to run the solution in real-time. The image below is an example of performance ranking showing the most impactful differences across phone models, including the AI Score.

How to categorize and calculate error rates. Source: DeepLearning.AI

Engaging the QA team is also crucial when brainstorming ideas on how to handle devices that can’t run your initial implementation. In our project, we decided that the evaluation wouldn’t run in real-time on weaker devices. However, we made some tweaks in the UI design to provide a good user experience on these lower-performance devices.

One of the most important duties of the QA team in machine learning projects is to test the solution on different platforms. In machine learning, this process requires carrying out the following tasks:

Tracking app versions: As improving a machine learning model is an iterative process, you need to keep track of the specific versions deployed on any given device type. This will allow you to assess whether a change had the expected effect. This will also help your project team when communicating progress to clients, especially if they’re also performing their own tests.
Checking for issues: The things that you need to look out for will vary from one project to the next. For example, in some computer vision projects, you may not have a metric to assess the quality of the solution as it will be purely visual. In that case, having a proper setup in place that will allow you to easily compare the effects between different versions of the app can speed up the testing process.
Figuring out ways to break the model: Conduct a proper audit of a solution ready for deployment into production. Figuring out what could go wrong and testing software for these specific cases will help you to avoid headaches later on. Oftentimes, you need to balance this with limited time and use cases that you expect for the product.
Collecting metrics: Collect the right metrics for a machine learning model deployed on different devices. The range of hardware configurations and the relatively low adoption rate of ML technologies make these metrics vary across devices. When testing real-time solutions for mobile, you’ll gather standard metrics such as frames per second (FPS) and record whether the machine learning model runs on the GPU, the CPU, or some other specific ML accelerator on a particular device.
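The version tracking and metric collection described above can be sketched as follows. This assumes test runs are logged as simple records with a device name, app version, measured FPS, and the backend the model ran on. The record layout, device names, and the 5% tolerance are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean


def mean_fps_by_device(runs):
    """Average FPS for each (device, app_version) pair."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[(run["device"], run["app_version"])].append(run["fps"])
    return {key: mean(values) for key, values in grouped.items()}


def fps_regressions(runs, old, new, tolerance=0.05):
    """List devices whose mean FPS dropped by more than `tolerance`
    (as a fraction) between the `old` and `new` app versions."""
    averages = mean_fps_by_device(runs)
    devices = {device for device, _ in averages}
    flagged = []
    for device in sorted(devices):
        before = averages.get((device, old))
        after = averages.get((device, new))
        if before is None or after is None:
            continue  # device was not tested on both versions
        if after < before * (1 - tolerance):
            flagged.append(device)
    return flagged


# Hypothetical measurements collected during manual or automated test runs.
runs = [
    {"device": "phone_a", "app_version": "1.0", "fps": 30.0, "backend": "GPU"},
    {"device": "phone_a", "app_version": "1.1", "fps": 24.0, "backend": "CPU"},
    {"device": "phone_b", "app_version": "1.0", "fps": 15.0, "backend": "CPU"},
    {"device": "phone_b", "app_version": "1.1", "fps": 15.5, "backend": "CPU"},
]
regressed = fps_regressions(runs, old="1.0", new="1.1")
```

Here phone_a would be flagged: its FPS dropped after the update, and the logged backend hints that the model fell back from GPU to CPU, which is the kind of lead a QA specialist can pass on to the ML engineers.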

With a multitude of different devices, manual testing is often cumbersome. Because of this, project teams should always consider building an automated testing pipeline. This requires a collaborative team effort. Developers, ML engineers, and QA specialists need to work together when setting up automated testing scenarios built into a continuous integration process. The responsibilities of the QA team include helping design the process, writing automated tests, and collecting and analyzing results from different devices.

Role of the ML engineer

As in the case of error analysis, ML engineers should assist the QA team in developing the setup for testing, even in manual testing scenarios. For example, in one of our projects, we had a top-down process for localizing key points on detected objects. We introduced an additional debug switch to visualize the bounding boxes of the detected objects. This debug switch was helpful in the development phase of the project for the QA team, as well as for ML engineers.

Further, machine learning engineers should also help in assessing which metrics are needed in cooperation with the mobile development team. At the same time, ML engineers have the best knowledge about limitations of the machine learning models and constraints of the devices when it comes to the implementation. They need to communicate these constraints clearly to the QA team so everyone is on the same page.

3. Label reviews in machine learning data preparation process

Quality also matters when it comes to the labeling process. If provided with low-quality data, the machine learning algorithm will perform poorly. It’s essential to take note of this because labeling is typically done by an external workforce. Furthermore, more than one person often labels the data, which can lead to inconsistencies between labels.
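One common way to put a number on such inconsistencies is Cohen's kappa, which measures agreement between two labelers beyond what chance alone would produce. The article doesn't prescribe this metric, so treat the sketch below as one possible check, with made-up labels for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two labelers who labeled the same items.

    1.0 means perfect agreement; 0.0 means agreement no better than chance.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(labels_a)
    # Share of items on which both labelers chose the same class.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected by chance, from each labeler's class frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both labelers used a single, identical class throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two people for the same four images.
labeler_1 = ["cat", "cat", "dog", "dog"]
labeler_2 = ["cat", "dog", "dog", "dog"]
kappa = cohens_kappa(labeler_1, labeler_2)
```

A low score on a shared sample is a signal to revisit the labeling rules before more data gets labeled.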

Role of the QA team

Many labeling platforms offer a review process. Projects designate a reviewer who’s able to see a sample of the labeled data. It’s their responsibility to make adjustments and approve or reject the labeled assets. This allows them to spot issues early in the project. It also allows for reassessing labeling rules and aligning the labels across the tasks. The role of the Quality Assurance Specialist in this case is to conduct this review and handle the communication between the labeling team and machine learning engineers.

Review interface in Labelbox, a labeling platform

Role of the ML engineer

The ML engineer's role for this task is to prepare the labeling rules. The ML team knows best how to conduct the labeling process in a manner that generates the required data for the ML model. Discussions on any questionable labels with the QA team will also be necessary to resolve potential ambiguities.

Challenges in QA for machine learning projects

QA engineers have exceptional skills for locating weak points of technical solutions. Nevertheless, machine learning is a complex topic that’s still not widely understood even within technical organizations. QA teams may require introduction to basic ML concepts, including how to collaborate with ML engineers.

Further, project teams need to discuss the workflow and how often the solution will be iterated on, as both affect the effort required to test any given application. This means that the collaboration between the data scientists and QA engineers should start as early as possible. The QA team needs to create a strategy at the beginning of the project, which should include the selection of supported devices. In addition, the QA specialists and the ML developers need to plan the debug tools together so they have an efficient feedback loop and can optimize model performance.

Quality assurance for machine learning

The role of a Quality Assurance Specialist in a machine learning project differs from the traditional testing role in nuanced ways. Their work isn’t only about testing for quality. It’s also about empowering the engineers and developers to iterate quickly (and effectively) on the ML models and algorithms.

Among peers in the ML space, there’s already a prevalent temptation to designate data scientists as QA specialists. While they can certainly lend their expertise to QA teams, it’s important to have experienced QA personnel lead the process. At the heart of Quality Assurance is making sure that the product is suitable for its intended users. Their training and experience in this regard will be valuable in building machine learning models and ensuring a smooth development process.