Machine learning is undoubtedly one of the hottest topics in software development right now. And for good reason. Machine learning opens up a whole world of new possibilities for developers, exciting app owners and end users alike. From greater personalisation to smarter recommendations, improved search functions, intelligent assistants, and applications that can see, hear, and react – machine learning can improve an app and the experience of using it in all manner of ways.
Machine learning is a subset of artificial intelligence (AI) that gives computers the ability to “learn” – i.e. progressively improve performance on a specific task – from data without relying on rule-based programming. In other words, it is the practice of using algorithms to parse and learn from data, and then automatically make a prediction or “figure out” how to perform a certain task.
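To make "learning from data" concrete, here is a minimal sketch in plain Python, offered as an illustration rather than a production approach. It fits a straight line to a handful of made-up points using ordinary least squares and then uses the fitted line to predict an unseen value. The names fit_line and predict, and the sample data, are assumptions invented for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data that follows y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

# "Learning": the parameters come from the data, not from hand-written rules
slope, intercept = fit_line(xs, ys)


def predict(x):
    """Apply the learned parameters to a new input."""
    return slope * x + intercept


print(predict(5.0))
```

Nothing in the code states the rule "multiply by two and add one"; the relationship is recovered from the examples, which is the core idea behind the definition above.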
OK – but which programming language is the best when it comes to machine learning? If you’ve got an idea for a new project which will require machine learning capabilities, it’s important that you make the right choice, because the success (or failure) of your application can hinge upon it.
For starters, you’ll need a language with good machine learning libraries. You’ll also need good runtime performance, good tool support, a large community of programmers, and a healthy ecosystem of supporting packages.
There are many languages to choose from that tick these boxes, but today we’re going to narrow the field down to two of the most popular – Python and C++. Let’s take a look and see how they compare.
GitHub has published a ranking of the 10 most popular programming languages used for machine learning. In that ranking, Python is the most common language among machine learning repositories and the third most common language on GitHub overall.
Why is Python more popular than C++? Well, a lot of it comes down to the fact that Python is extremely easy to learn, and is also easy to use in practice when compared to C++. You don’t need years of software engineering experience to get started with Python, and it also has a huge number of libraries that are ready to use for the purposes of machine learning and data analysis.
Also, academics working in machine learning have historically implemented their models in Python and not C++, meaning that most models published in papers are publicly available in the form of implementations in Python.
Jupyter Notebooks have also been instrumental in helping student programmers learn to use Python for data science, machine learning, and research. Jupyter was designed for Julia, Python, and R (hence the name – though it was formerly known as IPython), and is an open-source web application that allows users to create and share documents that contain live code, equations, visualisations, and explanatory text. Essentially, Jupyter Notebooks are interactive textbooks, full of explanations and examples which students can test out right from their browsers.
There are many additional services offered around Jupyter Notebooks as well, such as Google Colab – Google’s free cloud service for AI developers, which also includes free access to high performance GPUs on which Jupyter Notebooks can be run. Google Colab also ties in directly with Google Drive, meaning datasets and Notebooks can be stored there, too.
With everything being free, there’s really nothing else out there with a lower cost of entry, which has undoubtedly helped with Python’s popularity as the machine learning language of choice for so many developers.
Despite Python’s popularity, there are a few areas where C++ outperforms it.
For one thing, C++ has the advantage of being a statically typed language, so type errors are caught by the compiler rather than showing up at runtime. The performance crown also goes to C++, as it compiles to more compact and faster machine code. However, there are several ways to optimise Python code so it runs more efficiently. For example, there are optimising extensions for Python such as Cython, which is essentially Python with optional static type declarations. Cython code is compiled to C/C++, and for type-annotated, performance-critical sections it can run at close to C/C++ speeds, so in practice the gap can be narrowed considerably.
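The point about type errors can be illustrated with a short sketch. The function total_length and its inputs are hypothetical and exist only for this example. Python accepts the call without complaint and only raises an error when the offending line actually executes; the equivalent mistake in C++ would be rejected at compile time.

```python
from typing import List


def total_length(items: List[str]) -> int:
    """Sum the lengths of all strings in a list."""
    total = 0
    for item in items:
        total += len(item)
    return total


# A correct call works as expected
print(total_length(["abc", "de"]))

# The type hint above is not enforced at runtime, so this call is accepted
# and only fails once len() is applied to the integer.
try:
    total_length(["abc", 42])
except TypeError as exc:
    error_message = str(exc)
    print("Caught at runtime:", error_message)
```

A static checker such as mypy can flag the second call before the program runs, which recovers some of the safety of static typing while staying in Python.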
The fact that Python is a dynamic (as opposed to static) language does have some advantages of its own, however – not least because it reduces complexity when it comes to collaborating, and optimises programmer efficiency, so you can implement functionality with less code.
Unlike C++, where code has to be compiled for each target and the major compilers apply their own, sometimes platform-specific, optimisations, Python code can be run on pretty much any platform that has an interpreter, without wasting time on platform-specific configuration.
Another factor to consider is the rise of GPU-accelerated computing. GPUs offer capabilities for parallelism, and have led to the creation of libraries such as CUDA Python and cuDNN. What this essentially means is that more and more of the actual computing for machine learning workloads is being offloaded to GPUs – and the result is that any performance advantage that C++ may have is becoming increasingly irrelevant.
Python is renowned for its concise and easily-readable code, earning it high regard for its ease-of-use and simplicity – particularly amongst new developers. The same cannot be said for C++, which is considered to be a lower-level language: it sits closer to the hardware (hence its higher performance), but is harder for humans to read.
Given the complexity of machine learning algorithms, the less a developer has to worry about the intricacies of coding, the more they can focus on what truly matters – finding solutions to problems and achieving the goals of the project.
Simplicity and readability also help when it comes to collaborative coding, or when machine learning projects need to change hands between development teams. In this sense, Python comes up trumps.
Python’s simple syntax also allows for a more natural and intuitive ETL (Extract, Transform, Load) process, and means that development is faster than in C++, allowing developers to quickly test machine learning algorithms from existing libraries without having to implement them from scratch.
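As a rough illustration of how little code an ETL step can take, here is a minimal sketch that uses only Python's standard library. The sample data, the column names, and the extract, transform and load helpers are all assumptions made up for this example, and a plain dictionary stands in for a real data store.

```python
import csv
import io

# Made-up raw input; in practice this would come from a file or an API
RAW_CSV = """name,age,score
alice,34,88.5
bob,,72.0
carol,29,91.0
"""


def extract(text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing age and convert the types."""
    cleaned = []
    for row in rows:
        if not row["age"]:
            continue  # skip incomplete records
        cleaned.append({
            "name": row["name"].title(),
            "age": int(row["age"]),
            "score": float(row["score"]),
        })
    return cleaned


def load(rows, store):
    """Load: write the cleaned rows into a store, keyed by name."""
    for row in rows:
        store[row["name"]] = row
    return store


store = load(transform(extract(RAW_CSV)), {})
print(sorted(store))
```

Each stage is a small function, so a cleaning rule can be changed or a new one added without touching the rest of the pipeline.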
For us, the clear winner between C++ and Python for machine learning is Python.
There are many reasons it’s so popular:
That all being said, specific projects need specific technologies. So, if you’re in the midst of planning a new project with machine learning capabilities and want to know whether C++, Python, or any other language will be the most appropriate, get in touch with Netguru and we’ll chat through your specific requirements and advise you on the best path forward.