Machine Learning has blossomed over the last few years and is not going to slow down any time soon.
Optimizing machine learning performance
MLOps is the discipline that focuses on managing the tools that enable Machine Learning Engineers to optimize the performance of their software. All the work, from data collection, preprocessing, experiment handling, and versioning to deployment, can be automated to some degree, and MLOps focuses on just that. This makes Machine Learning Engineers more effective and serves as an antidote to chaos.
However, choosing the best tool for the project is not an easy task. It’s a bit like a toolbox - it holds many tools, each dedicated to a particular task but useless for others. Imagine you are assembling a drawer for your bedroom - you certainly need a screwdriver (and probably some screws), but a chisel would not help much.
The same goes for Machine Learning tools - together they form more or less a toolbox, and each tool can be either very helpful or useless for a given job. What is worse, new tools come out every year, which makes it very hard to determine what is useful for a particular case. Thus, knowledge about the tools and a clear comparison can help in making the right choice.
Choosing the right machine learning tools
Although there are many solutions for ML experiments, the scope of their usage varies a lot and there is no clear comparison between them. This article will attempt to make it easier for you to choose the best ML tool for your needs. We will mainly focus here on two groups of tools:
- Experiment tracking-oriented,
- Full lifecycle-oriented.
A perceptive reader may notice that one category is actually a subset of the other. In fact, experiment tracking can be a part of full lifecycle-oriented tools. However, more often than not, experiment tracking is the most important feature, which is why it gets a category of its own.
Experiment tracking vs full lifecycle-oriented tools
Depending on your needs, you may favor either experiment tracking tools or full lifecycle-oriented ones. If your work consists of trying out different model architectures and choosing the best solution for your problem, experiment tracking tools will be your best bet. Moreover, in cases such as an ML Proof of Concept, where the project timeline is tight, every day lost on setup is painful. That is why a tool that is easy and fast to set up is the best way to organize and manage experiments in order to find the best solution to the particular problem.
If, instead, you would like to adopt a tool that helps with all the steps of the Machine Learning lifecycle - from data processing (or even data creation in some cases) to model deployment - experiment tracking-oriented tools will not be the best choice; a full lifecycle-oriented tool will serve you better. Bear in mind, however, that in those solutions the experiment tracking functionality can be quite obscure and reduced to a bare minimum.
List of experiment tracking-oriented tools
Here is the list of tools for experiment tracking:
Weights & Biases
When it comes to ease of use and setup time, Wandb is the best option of all the tools described here. Wandb makes it extremely easy to track experiments and compare them from an intuitive web interface. With a couple of lines of code, the experiment is fully reported inside Wandb. From the code standpoint, Wandb handles most of the work for you - if you want to log something to Wandb, there is one function to rule it all.
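The logging pattern described above - initialize a run once, then call a single log function with whatever needs recording - can be sketched in dependency-free Python. The `Tracker` class below is a hypothetical stand-in used to illustrate the call pattern, not the real wandb API:

```python
# Hypothetical minimal tracker imitating the "one function to rule it all"
# logging pattern; an illustrative sketch, not the real wandb API.
class Tracker:
    def __init__(self, project):
        self.project = project
        self.history = []  # one dict of metrics per logged step

    def log(self, metrics):
        """A single entry point that accepts any dict of metrics."""
        self.history.append(dict(metrics))

tracker = Tracker(project="demo-project")
for step in range(3):
    loss = 1.0 / (step + 1)  # stand-in for a real training loss
    tracker.log({"step": step, "loss": loss})
```

In Wandb the equivalent is an initialization call followed by repeated calls to its logging function; everything logged this way then shows up in the web interface, ready for comparison.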
A really nice feature is Wandb’s online plots - using them, a user can quickly diagnose and verify experiments. There is support for more advanced plotting as well - if you want to inspect deep learning model weights or their distributions, Wandb is certainly the best solution for that kind of research.
Although the main functionality of Wandb is experiment tracking, there are a couple more functionalities worth checking out. Reporting is a nice utility to quickly generate reports for the team as well as for the client. Sweeps enable the user to perform hyperparameter optimization, which is a nice addition. Wandb also enables artifact storage.
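To give a sense of how Sweeps work: a hyperparameter search is typically declared in a small YAML configuration that a wandb agent then executes against your training script. The metric and parameter names below are made up for illustration:

```yaml
# Example wandb sweep configuration; metric and parameter names are
# illustrative, not taken from any real project.
method: bayes            # search strategy: grid, random, or bayes
metric:
  name: val_loss         # metric the training script reports
  goal: minimize
parameters:
  learning_rate:
    min: 0.0001
    max: 0.1
  batch_size:
    values: [16, 32, 64]
```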
Wandb is also the go-to choice for research teams focused on discovery rather than delivery. It goes without saying that Wandb is a great companion for ML insights, and the development of the tool seems to really focus on helping researchers in their work.
If experiment tracking is the only functionality necessary and the budget is high, Wandb is the best choice. However, Wandb provides experiment tracking only, so if model deployment or other interactive functionalities, such as notebook hosting, are necessary, it is probably not the right tool.
- Great, interactive interface.
- Minimal boilerplate code.
- Handy experiment comparisons, great for research projects or validations.
- Experiment-tracking only.
- Relatively pricey.
Neptune
Neptune offers a similar experience to Wandb, as it is also focused mainly on experiment tracking. The usage is similar - Neptune offers a Python package, and with a couple of lines of code the user first connects to the Neptune server to create an experiment, then logs artifacts, which are stored inside that experiment.
Neptune also provides interactive plotting - it supports a wide range of visualization libraries, which makes it convenient for users to stay flexible in terms of their favorite plotting packages. It does not provide additional tools such as reporting or sweeps as Wandb does, but those features are generally less relevant than experiment tracking and may be implemented at the code level.
Logging in Neptune may not be as convenient as in Wandb and may require a little more boilerplate code. The support for automatic logging from common deep learning libraries is also less advanced. However, the web interface is nice and intuitive, presenting information in a clean format so you can quickly review experiment results and find the optimal choice for the given case.
- Nice, interactive interface.
- Relatively cheap.
- Handy experiment comparisons.
- May not be sufficient for full-scale, long-running projects.
- Experiment tracking only.
List of full lifecycle-oriented tools
Below you can find the list of full lifecycle-oriented tools:
MLflow
When it comes to experiment tracking, MLflow is quite basic. If the experiments involve more complex visualizations, MLflow does not help much in viewing them. Most plots are stored as artifacts rather than embedded widgets, which makes it less convenient to grasp experiment results - to view the plots, it is often necessary to download and investigate them locally. This also makes it quite hard to discover problems when an experiment does not bring the expected results, and a lot of manual work may be necessary to identify the best approach. There is also always more boilerplate code than in Wandb or Neptune.
However, in comparison with the previous tools, MLflow offers a richer set of features. MLflow’s capabilities resemble those of the full-lifecycle frameworks.
MLflow Projects is the component that packages code in a Conda or Docker environment to ensure reproducibility of code executions. This feature is very useful when the code repository needs to be deployed somewhere other than the development environment. Moreover, the MLflow Models component offers an abstraction layer for deploying Machine Learning models from the most popular ML libraries. If the model in the production environment needs to be updated, MLflow Models makes it easy to accomplish - with a couple of lines of code, the model can be deployed for inference.
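For illustration, an MLflow Project is described by an MLproject file at the repository root, which declares the environment and entry points used by `mlflow run`. The project, file, and parameter names below are illustrative:

```yaml
# MLproject file; project name, environment file, and parameters are
# illustrative placeholders.
name: example_project
conda_env: conda.yaml            # Conda spec that pins the environment
entry_points:
  main:
    parameters:
      alpha: {type: float, default: 0.5}
    command: "python train.py --alpha {alpha}"
```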
Finally, the MLflow Model Registry offers model management - a new model can be added to the registry and accessed by its identifier. Models are also versioned, so if several model versions need to be deployed, the Model Registry organizes them for us. In case of emergency, an older model can be restored thanks to versioning.
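The registry workflow boils down to mapping a model name to an ordered list of versions, with the latest served by default and older ones kept around for rollback. A dependency-free sketch of that idea (a conceptual toy, not MLflow’s actual Model Registry API):

```python
# Toy model registry illustrating versioning and rollback; a conceptual
# sketch only, not MLflow's actual Model Registry API.
class ModelRegistry:
    def __init__(self):
        self.versions = {}  # model name -> ordered list of artifacts

    def register(self, name, model):
        """Stores a new version and returns its 1-based version number."""
        self.versions.setdefault(name, []).append(model)
        return len(self.versions[name])

    def get(self, name, version=None):
        """Returns the latest version by default, or a specific one."""
        models = self.versions[name]
        return models[-1] if version is None else models[version - 1]

registry = ModelRegistry()
registry.register("classifier", "weights_v1")  # becomes version 1
registry.register("classifier", "weights_v2")  # becomes version 2

latest = registry.get("classifier")               # newest deployment
rollback = registry.get("classifier", version=1)  # emergency revert
```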
On top of that, MLflow is free, so if the budget is tight, it is a nice choice. A rule of thumb: if you only want an experiment tracking tool, MLflow offers a very basic feature set; if you also need deployment and model management functionality, MLflow can provide that for you.
- Free, open source tool.
- Offers management and versioning of machine learning models.
- Offers deployment tools for machine learning models.
- Does not make it easy to track and compare experiments.
Databricks
Databricks is a bridge between Data Engineering and Machine Learning. For Big Data solutions, it offers data engineering tools, such as Spark, to create preprocessing pipelines or compute statistics. There are tools which enable advanced analytics on the incoming data. Moreover, it is possible to define jobs which are executed to accomplish a given task - for example, scripts for exploratory data analysis of new data, or jobs for data processing.
The tool also offers data versioning, which is useful for scenarios where data is constantly updated. Imagine a situation where a Machine Learning model is trained on ever-changing data. Without versioning, reproducibility is hard to achieve, and it turns into an impossible task as time passes and the data grows.
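One simple way to see why versioning enables reproducibility: identify every dataset snapshot by a content hash, so an experiment can record exactly which data it was trained on. A minimal sketch using only the standard library (the function name is made up for the example):

```python
import hashlib
import json

def dataset_version(records):
    """Derives a stable version id from the dataset's content.

    Any change to the records yields a different id, so an experiment
    that stores this id can always name the exact data it saw.
    """
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]

snapshot = [{"x": 1, "y": 0}, {"x": 2, "y": 1}]
v1 = dataset_version(snapshot)
v2 = dataset_version(snapshot + [{"x": 3, "y": 0}])  # the data grew

assert v1 == dataset_version(snapshot)  # same content, same id
assert v1 != v2                         # new data, new id
```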
Databricks supports collaborative notebooks, where visualizations can be built directly from new data. Python, SQL, R, and Scala are supported. Notebook execution can be scheduled, so that the visualizations are updated together with the incoming data.
For experiment tracking, Databricks offers managed MLflow to train models on the data. The interface is the same as in the free MLflow version, so the experiment tracking experience does not improve much. The visualizations are basic and can be saved as artifacts attached to a specific experiment. Apart from experiment tracking, all the other functionality present in MLflow is also incorporated in Databricks - from the model registry to model deployments, all these features are available from the Databricks interface.
Generally, if your project involves Big Data and complex preprocessing pipelines, Databricks will be a great choice for end-to-end management. For simpler projects, where the data does not change much and the volume is low, the open source version of MLflow will definitely be enough.
- Great for Big Data projects.
- Does not bring much to the table for the ML part compared to the free MLflow version.
Polyaxon
Polyaxon is a robust tool for a wide range of tasks. It provides an interface to run any job on a Kubernetes cluster, which can be set up by the client or managed in the cloud. Therefore, it can serve as an experiment tracking platform for machine learning projects, but also for any other job, such as data processing, visualization, or other analytics. It can even host notebook servers or Visual Studio Code.
When it comes to the experiment tracking experience, Polyaxon is similar to MLflow. Files and plots are stored as artifacts, there are no interactive plots, and the interface is not focused on experiment comparison. It is more of a repository for experiments.
However, as an end-to-end tool, Polyaxon is great for managing many projects and for team collaboration. If the team works on many projects requiring high computational power or GPU usage, Polyaxon is a great platform to distribute resources optimally. For any task - data processing, training, or visualization - Polyaxon is a great way to save time and orchestrate computational power for Machine Learning purposes.
The main drawback of Polyaxon is its setup time, as it works on top of a Kubernetes cluster. However, once Polyaxon is in place, subsequent projects can be added easily and their setup is very short. There is also an option for cloud deployment, in which Polyaxon takes care of the infrastructure, but it costs a bit (plans are tailored individually for each client).
If you work in a team, have multiple projects, and need access to multi-core CPU or GPU machines, Polyaxon can serve as a central solution for managing ML projects. After the initial setup, you can quickly move on to actual work. On top of that, Polyaxon offers many utilities, such as notebook hosting, containerization, deployment, experiment tracking, a registry of machine learning models, and much more. If you work alone, or you just need a lightweight solution, Polyaxon is probably overkill for you.
- Great variety of applications.
- May be overkill for basic tasks.
- Not great experiment tracking functionality.
AWS SageMaker
AWS SageMaker is Amazon’s solution for building, training, and deploying Machine Learning models at scale. Using a web interface, it is possible to configure data, choose models, and spin up the infrastructure to train them.
SageMaker provides ready-to-use implementations of the most common Machine Learning algorithms - it is only necessary to provide the data (which should be stored in an S3 bucket). The models are then trained and easily deployed on AWS, and endpoints can be quickly defined from the AWS web interface. SageMaker rents computational power for experiments at a relatively low price. There is also an option to provide custom training procedures - in this case, a Docker container with the training procedure needs to be uploaded to ECR in order to execute the training.
SageMaker also offers many features which will not be found in the other tools. For example, if a large amount of data needs to be labelled, AWS provides a data annotation service - for a given cost, data labellers can be hired to prepare data for the project.
SageMaker integrates heavily with other AWS components. This is both an advantage and a disadvantage. For projects already deployed on AWS, productivity can be greatly increased, as ML solutions can be assembled from reusable building blocks provided by AWS. For projects involving different cloud vendors, SageMaker may not be the best choice.
When it comes to the experiment tracking experience, only logs from the executed job are provided, and there is no interactive experience in terms of visualizations. This makes it extremely hard to perform any research on a particular problem in the project. Generally, SageMaker is great for well-defined and well-known ML problems, but it may not help when some research needs to be performed.
SageMaker provides hosting of notebook instances, so for users accustomed to running experiments from notebooks, SageMaker would be a nice choice. It also offers processing jobs for cases other than model training - for example, if a raw dataset needs to be processed in some way, a single job can be defined for that purpose and executed on demand on data located in a particular S3 bucket.
SageMaker also offers convenient deployment of machine learning models. With easy endpoint configuration tools, models can be quickly deployed to production and served for inference.
- Great for AWS-centered projects.
- Very poor experiment tracking experience.
- Does not work with other cloud vendors.
Valohai
Valohai competes to some degree with SageMaker, Polyaxon, and Databricks, as it also covers the full lifecycle of Machine Learning projects. However, the idea is much simpler - train, evaluate, deploy, repeat. Given a GitHub repository, a notebook, or just a script, the solution is executed in the cloud and the results are saved. If the need is to quickly train a model and then deploy it, Valohai is a means to achieve exactly that. Why use it? If there is no computational power available to perform experiments, Valohai provides it. With minimal effort, the experiment can be executed, and the results can be accessed or deployed in Valohai.
This solution is great for teams with different vendor requirements. Valohai operates well with GCP, AWS or Azure, and takes care of configuration for these platforms. Valohai ensures full reproducibility of the results, as the code executed during an experiment is stored inside the experiment.
Valohai is certainly not a solution for exploring different ML approaches - hyperparameter tuning is the most exploration it can feasibly support. In other words, if the goal is to find an experiment tracking tool, Valohai will not be a good choice.
- Simplicity of use.
- Good web interface support.
- Lack of additional features besides the bare minimum for a full lifecycle solution.
FloydHub
FloydHub is very similar in its functionality to Valohai - it provides computational servers to run machine learning experiments and offers deployment services for them. It is definitely not an experiment tracking tool. The platform offers deep learning environments with pre-installed packages, which are used to run Python scripts. It also ensures full reproducibility of the results, as the executed code is stored inside the experiment.
If a Machine Learning model needs to be trained, the data needs to be uploaded first. Then, from the command line, the data is made available to the script, and the training results are stored in FloydHub. Training can also be executed from notebooks hosted on FloydHub. Once a particular machine learning model is ready for production, REST API endpoints can be defined and the model can be deployed.
The tool is extremely easy to use. The typical scenario is simple: if there is a lack of computational resources to perform training, FloydHub offers the services to execute it.
- Simplicity of use.
- Compared to Valohai, FloydHub offers notebook support.
- Lack of additional features besides the bare minimum for a full lifecycle solution.
The aspects of machine learning tools
There are many tools available to help Machine Learning Engineers with their work, and each of them is suited to a specific case. Knowing and understanding the differences may help you choose the right tool - whether you need a superior experiment tracking experience, a full machine learning lifecycle platform, or just a platform for running experiments, the options are out there. This article described only 8 different tools, but many more are available on the market. Hopefully, it showed you which aspects should be considered when looking for MLOps solutions.