If you’ve been following the latest trends in technology, you’ve probably noticed that machine learning (ML) and deep learning (DL) are no longer just buzzwords (see what the difference between ML and DL is). In fact, they’re behind some of the most impressive achievements in artificial intelligence so far, such as spotting a distant eight-planet solar system or beating human players at master-level chess. But are you also familiar with convolutional neural networks (CNNs or ConvNets) and how they contribute to AI development (see how AI differs from ML)?
CNNs are one of the most significant DL techniques for analyzing visual imagery, and they are widely applied in technologies such as facial recognition. So, to get a general idea of CNNs and what all the fuss is about, here’s a short guide covering the basics in the simplest possible terms.
Convolutional Neural Networks - what they stem from and how they work
Firstly, let’s go back to the roots of CNNs - our own eyes and brains. We are constantly analyzing what’s going on around us by seeing things, labeling them, and recognizing patterns. It all starts with the eyes, but the information processing takes place in the brain - in an area called the primary visual cortex, which makes sense of what we see. We see a table, and we know it’s a table because of the many visual examples of tables we have been shown in the past.
The hierarchical architecture of neurons in the brain plays a key role not only in remembering objects, but also in labeling them. Each layer of neurons learns to recognize different groups of characteristics. They all communicate with one another to develop a holistic comprehension of what we see.
Similar to how we learn to recognize things, machine learning algorithms also learn to recognize objects in pictures - by seeing millions of images and picking up on the patterns in them. However, a convolutional neural network (which is a type of artificial neural network) works a bit differently from the neural networks in our brains. For example, it sees an image as a grid of RGB pixels (that is, numbers), and while the layers are usually organized hierarchically, the convolution operator works in 3 dimensions: width, height and depth (the color channels).
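To make "an image is just numbers" concrete, here is a minimal Python sketch. The 2x2 picture and the helper names `shape` and `to_grayscale` are made up for illustration; real libraries store images in optimized arrays, but the width, height and depth idea is the same.

```python
# A tiny, made-up 2x2 RGB image stored as nested lists:
# tiny_image[row][column] = (red, green, blue), each value from 0 to 255.
tiny_image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]


def shape(img):
    """Return (height, width, depth) of a nested-list image."""
    return (len(img), len(img[0]), len(img[0][0]))


def to_grayscale(img):
    """Collapse the depth axis by averaging the three color channels."""
    return [[sum(pixel) / len(pixel) for pixel in row] for row in img]


print(shape(tiny_image))         # (2, 2, 3)
print(to_grayscale(tiny_image))  # [[85.0, 85.0], [85.0, 255.0]]
```

The depth of 3 corresponds to the red, green and blue channels; a grayscale picture has a depth of 1.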
All convolutional layers contain filters (or kernels) - small grids of numbers that slide over the input image, detect features, create a feature map, and pass the results to the next layer. Then the operation repeats. Filters in the first few layers handle the simplest features (like edges or curves), while the later layers combine these results and use them to detect more complex ones (like textures or entire body parts).
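The sliding-filter step described above can be sketched in a few lines of plain Python. This is a simplified illustration, not how production libraries implement it: it assumes a single-channel (grayscale) image, no padding, and a stride of 1. The function name `convolve2d`, the sample image, and the hand-picked edge filter are all made up for this example; in a real CNN the filter values are learned during training.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1).

    At each position, multiply the overlapping numbers and add them up.
    The grid of those sums is the feature map.
    """
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kernel_h):
                for dj in range(kernel_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Made-up 4x4 grayscale image: dark on the left, bright on the right.
gray_image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Hand-picked filter that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

edge_map = convolve2d(gray_image, vertical_edge)
for row in edge_map:
    print(row)  # each row prints [0, 18, 0]
```

The feature map is zero wherever the picture is flat and peaks exactly where the dark region meets the bright one, which is what "detecting an edge" means here.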
When all the feature maps are ready, they are combined to produce the final output: a set of predictions about what the machine sees in the image. And the more labeled images the network is trained on this way, the more accurate its predictions tend to become!
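One common way to turn feature maps into a prediction is to summarize each map as a single number, weigh those numbers per class, and convert the resulting scores into probabilities with softmax. The sketch below assumes exactly that recipe; the article does not say which method a given network uses, and the feature maps, weights, and labels here are invented for illustration (in a trained network, the weights are learned).

```python
import math


def global_average(feature_map):
    """Summarize one feature map as the average of all its values."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


def softmax(scores):
    """Turn raw scores into probabilities that add up to 1."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(feature_maps, weights, labels):
    """Combine feature maps into one probability per label."""
    features = [global_average(fm) for fm in feature_maps]
    scores = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return dict(zip(labels, softmax(scores)))


# Made-up outputs of two filters applied to the same image.
example_maps = [
    [[0, 18, 0], [0, 18, 0], [0, 18, 0]],  # vertical-edge filter fired
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]],     # horizontal-edge filter did not
]

# Hypothetical weights: one row per label, one column per feature map.
example_weights = [
    [0.5, -0.5],  # "vertical stripes" likes vertical edges
    [-0.5, 0.5],  # "horizontal stripes" likes horizontal edges
]

prediction = predict(
    example_maps, example_weights, ["vertical stripes", "horizontal stripes"]
)
print(prediction)
```

With these numbers, almost all of the probability goes to "vertical stripes", because only the vertical-edge feature map contains strong responses.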
The significance of CNNs in Machine Learning
CNNs (and deep learning in general) are so important in machine learning because they can learn features automatically, straight from the data. They replace the process of manual feature engineering, and therefore save a lot of time, effort, and cost. They’re currently being used not only for image classification, video recognition, or adding color to black and white pictures, but also for speech recognition, security enhancement, and intrusion detection.
CNNs considerably accelerate development in machine learning, and they allow companies to create stunning applications that make people’s lives easier and more enjoyable. Here are some great examples:
PrettyCity - our Python-based app at Netguru, in which we use neural network algorithms to detect construction cranes towering above buildings so they can be removed from pictures, making the images look prettier.
AI Scry - an iOS app that identifies and describes what’s in front of your phone’s camera.
Google Translate - now able to translate speech without transcribing it first.
It’s not an exaggeration to say that developing CNNs was a milestone in the evolution of AI. Of course, there’s still a lot of room for improvement, but being able to mimic, even loosely, how the human brain processes what it sees is captivating enough to draw even the best specialists on the market into improving the technology further.
At Netguru, we are working on this as well, so if you want to learn more about ML's pros and cons, see our article. If you are interested in our Machine Learning projects, we’ll be happy to answer your questions. Or maybe you’re thinking about developing your own ML-enabled app? Either way, just drop us a line.