It took over five hundred images, three days, and one team to release the first version of an application that makes our cities look prettier in photographs. Netguru and MicroscopeIT joined forces to take part in an AI/ML-focused hackfest. The result? An app that can automatically remove all those pesky cranes from your cityscape pictures with a single click of a button, all thanks to Deep Learning algorithms. If you've ever been to London or Warsaw, you'll know that cranes are everywhere! Even though the app has very limited use (let's be honest, you don't actually get that many cranes in pictures, do you?), it lays a solid base for cooperation on future Machine Learning projects between the two companies.
Deep Learning image recognition is a technology that will definitely shape the way we use images. It’s already present in many applications we use every day, and it helps us gain information on various aspects of many fields, in both the public and private sectors. For instance, the facial recognition technology market alone is projected to generate an estimated $9.6 billion in revenue by 2022. It will be used across different industries such as healthcare, marketing, retail, and security. Good examples are automated border control systems, which will leverage different modalities, such as face, fingerprint, or iris recognition, to streamline and speed up security procedures.
Yet, we’re still at the beginning of the bumpy journey to discover the full potential of Deep Learning algorithms and their use in image recognition. There are many challenges we need to face in this area.
First, Machine Learning is a complex statistical process, in which we input a great number of samples and try to guess what rules are behind them. This is very similar to the way we come up with weather forecasts. We analyse historical temperature data covering every season over several decades and then, using regression models, we guess what the weather will look like in the future. A regression model like this is a simple function that has been used in many areas, but it’s not Deep Learning yet.
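To make the forecasting analogy concrete, here is a minimal sketch of ordinary least-squares regression in plain Python. The temperature figures are made up for illustration, and `fit_line` is a hypothetical helper name, not something taken from the project.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the slope.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: average July temperature (in degrees Celsius) for five years.
years = [2013, 2014, 2015, 2016, 2017]
temps = [18.0, 18.5, 19.0, 19.5, 20.0]

slope, intercept = fit_line(years, temps)

# "Guess the future" by extending the fitted line one year ahead.
forecast_2018 = slope * 2018 + intercept
print(f"Forecast for 2018: {forecast_2018:.1f}")
```

The whole "model" here is two numbers, a slope and an intercept. A neural network follows the same idea of fitting parameters to examples, only with vastly more parameters.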
Deep Learning relies on a far more complicated function: an artificial neural network (ANN). ANNs have been vaguely inspired by the way our brain works. Such systems learn tasks based on examples. In plain English, they develop their recognition abilities from the material that they process.
Initially, ANNs were not able to achieve human-competitive performance on certain tasks, especially when it came to image recognition. That changed around 2011, when convolutional neural networks (CNNs), an architecture that dates back to the 1980s, started winning image recognition contests once they could be trained on graphics cards. CNNs add convolutional layers that slide small filters across a picture to pick out local patterns such as edges and textures. This makes them well suited to processing visual and other two-dimensional data.
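As a rough illustration of what a single convolutional filter does, the sketch below slides a tiny hand-written kernel over a toy image. The image, the kernel values, and the `convolve2d` name are all assumptions made for this example. In a real CNN the kernel values are learned during training rather than written by hand.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding) and return the response map.

    As in most Deep Learning libraries, the kernel is not flipped,
    so strictly speaking this is a cross-correlation.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Toy 4x4 greyscale image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written filter that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

response = convolve2d(image, kernel)
for row in response:
    print(row)
```

The response is non-zero only in the middle column, exactly where the dark half meets the bright half. Stacking many such filters in layers lets a network build up from edges to more complex shapes.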
Data scientists have been exploring the potential of this technology for the past few years. One team with great experience in the field is MicroscopeIT, a software company that specialises in image analysis, computer vision and Machine Learning. Their latest challenge was to apply Deep Learning in practice during Microsoft’s AI/ML-focused Hackfest in Prague. The team invited Netguru to be a part of this exciting journey.
The cooperation between Netguru and MicroscopeIT started with a project we developed during a three-day AI/ML-focused Hackfest organised by Microsoft in Prague. The teams, including backend developers from MicroscopeIT, Netguru’s frontend developer and a consultant from Microsoft, joined together during the event to create a simple application using Deep Learning and image recognition. And we did.
Both teams met in the Czech Republic’s capital and worked for hours every day. The outcome was an app that first uses Deep Learning to detect cranes in images, and then precisely cuts them out of the photos using an algorithm similar to those in graphics editors such as Photoshop.
Our algorithm deals with one of the most difficult tasks that can be solved with neural networks: segmentation. The simplest task, classification, gives you only a “yes or no” answer (a crane is or isn’t in the picture): you learn whether the picture includes a specific object, but not where it is. Detection is more precise and tells you roughly where that object is (e.g. a crane is in the top right corner). Finally, segmentation determines where the object is with single-pixel accuracy. An even more advanced task is instance segmentation, which can also tell apart individual objects of the same class.
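The difference between these tasks is easiest to see in the shape of the answer each one returns. The sketch below derives all three answers from a hand-made toy mask. In a real system each answer would come from a neural network, and the function names here are hypothetical.

```python
# Toy mask: 1 marks a "crane" pixel, 0 marks background.
mask = [
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
]


def crane_pixels(mask):
    """List the (row, column) coordinates of every crane pixel."""
    return [
        (r, c)
        for r, row in enumerate(mask)
        for c, value in enumerate(row)
        if value
    ]


def classify(mask):
    """Classification: a single yes/no answer for the whole picture."""
    return len(crane_pixels(mask)) > 0


def detect(mask):
    """Detection: a bounding box (top, left, bottom, right), or None."""
    coords = crane_pixels(mask)
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


def segment(mask):
    """Segmentation: the exact set of pixels that belong to the crane."""
    return set(crane_pixels(mask))


print(classify(mask))
print(detect(mask))
print(sorted(segment(mask)))
```

Classification yields one boolean, detection yields four numbers, and segmentation yields one decision per pixel. That is why segmentation is both the most useful answer for cutting an object out and the most demanding one to produce.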
Segmentation was essential in this type of application. We needed to know exactly where the crane was located, so the image processing algorithm could precisely cut it out of the photo and replace the missing pixels with new ones. The biggest challenge, and the most time-consuming task, was creating training material for Machine Learning. The process, called data annotation, involves manually adding descriptions to data. The more detailed the annotation, the better the results the network achieves, but more detail also takes more time.
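The article does not say which fill algorithm the team used, so the following is only a naive stand-in that shows the idea of replacing masked pixels with plausible new ones. It fills the hole from the outside in, giving each unknown pixel the mean of its already known neighbours. The `inpaint` name and the toy data are assumptions for this example, and production tools use far more sophisticated methods.

```python
def inpaint(image, mask):
    """Fill masked pixels from the outside in.

    Each pass assigns every unknown pixel that touches at least one known
    pixel the mean of its known 4-neighbours, then marks it as known.
    """
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    unknown = {
        (r, c) for r in range(height) for c in range(width) if mask[r][c]
    }
    while unknown:
        filled = {}
        for r, c in unknown:
            neighbours = [
                result[nr][nc]
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= nr < height
                and 0 <= nc < width
                and (nr, nc) not in unknown
            ]
            if neighbours:
                filled[(r, c)] = sum(neighbours) / len(neighbours)
        if not filled:
            break  # the whole image is masked, so there is nothing to copy
        for (r, c), value in filled.items():
            result[r][c] = value
        unknown -= set(filled)
    return result


# Toy greyscale photo: bright sky (200) with a dark vertical "crane" (30).
image = [
    [200, 200, 30, 200, 200],
    [200, 200, 30, 200, 200],
    [200, 200, 30, 200, 200],
]

# Segmentation mask marking the crane pixels to remove.
mask = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

restored = inpaint(image, mask)
for row in restored:
    print(row)
```

On a flat sky this works well because the neighbours are all the same colour. On textured backgrounds a simple average produces a visible smear, which is one reason an accurate mask matters so much.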
The event resulted in the first release of the application. The team named it PrettyCity: Cranes edition. The application enables users to automatically remove an unwanted element from a picture and view beautiful landmarks in their essence, without unnecessary noise. Something that just a few years back required advanced skills in a good photo editor and plenty of time can now be done with a couple of clicks.
Naturally, not everything always goes the way we expect it to go. Some images uploaded to the application are more difficult for the neural networks to read, just like the one below. The algorithm sometimes interprets a part of a crane as the image’s background, so here we’ve got a hook hanging from the sky.
Developers are working on the final touches right now, and soon we’ll release the PrettyCity app to the public.
The app is just an example of what Deep Learning is capable of. Users can now cut cranes from city landscapes, but the algorithms can also learn how to remove other elements, as well as perform different kinds of tasks.
You might have already tried removing unwanted crowds from your holiday pictures. No one wants to spoil their beautiful shots from vacation because some strangers walked into the frame, right?
Until recently, this was possible only with the help of Photoshop. You need several, dozens, or even hundreds of images, depending on the density of the crowd. Photoshop then computes a per-pixel statistic (typically the median) across all the photos, keeping the areas that stay identical and removing everything that changes between shots. This eliminates objects that move between frames, such as people walking through the scene.
This method has two major limitations: first, you need a large number of photos, and second, it won’t remove objects that stay still in every shot.
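The stacking approach and its second limitation can be sketched in a few lines. This example assumes the per-pixel median as the statistic, and the frames are made-up greyscale values. `median_stack` is a hypothetical name, not a Photoshop function.

```python
from statistics import median


def median_stack(frames):
    """Combine same-sized frames by taking the median of each pixel."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[r][c] for frame in frames) for c in range(width)]
        for r in range(height)
    ]


# Background is 10 everywhere. A walker (99) is in a different place in
# each frame, while someone standing still (50) stays in the bottom-left.
frames = [
    [[99, 10, 10],
     [50, 10, 10]],
    [[10, 99, 10],
     [50, 10, 10]],
    [[10, 10, 99],
     [50, 10, 10]],
]

stacked = median_stack(frames)
for row in stacked:
    print(row)
```

The walker disappears because each pixel shows the background in most frames. The person standing still survives, since every frame agrees that they belong there.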
With Deep Learning algorithms, we’re able to create systems that recognise a specified type of object and remove it from your shots much faster, without the need to take multiple photos and regardless of whether the person is moving or standing still.
Segmentation can also be used to determine the number of objects in a particular picture. It might be relatively easy to count people in a photo when you’ve got only 20 or even 50 of them. But what if you want to verify how many participants attended a public demonstration?
This topic has stirred controversy on many occasions. It is often the case that one party organises a public march, and the other tries to diminish its significance by questioning the number of people who took part.
Using Deep Learning and segmentation, our algorithms can count the people in the picture automatically, as long as individuals can be told apart in the segmentation output.
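One simple way to turn a segmentation mask into a head count is to count its connected regions. The sketch below is an assumption about how such counting could work, not a description of the team's algorithm. It uses a toy mask and a hypothetical `count_objects` helper.

```python
from collections import deque


def count_objects(mask):
    """Count groups of 1-pixels that touch horizontally or vertically."""
    height, width = len(mask), len(mask[0])
    seen = set()
    count = 0
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or (r, c) in seen:
                continue
            # Found a pixel of a new object: flood-fill the whole region.
            count += 1
            seen.add((r, c))
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for nr, nc in ((cr - 1, cc), (cr + 1, cc),
                               (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < height and 0 <= nc < width
                            and mask[nr][nc] and (nr, nc) not in seen):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
    return count


# Toy mask with three separate "people".
mask = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

print(count_objects(mask))
```

People who overlap in the photo merge into a single region and get counted once. This is where instance segmentation, which separates individual objects of the same class, becomes necessary for dense crowds.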
It can also be useful for different kinds of events. An organiser of a music festival or tech conference can determine how many people attended particular presentations or concerts. Based on such data, they can better understand the attendees’ preferences and then plan future events to match those preferences and make the most of venue capacity.
The ecommerce market is another area where Deep Learning will be put to use, and used car sales may particularly benefit from such technology.
Automatic Number Plate Recognition (ANPR) has already been widely used for security control of highly restricted areas, like military zones or the areas around top government offices, and also in parking lots to streamline the payment process. Such systems first detect vehicles and then photograph them. The number plate region is then extracted by applying segmentation to the image obtained.
The same algorithms can also be used to recognise number plates in car photos added by an owner or a dealer and automatically cut them out of the picture. The technology could improve processes on many online marketplaces, such as eBay or Amazon, and for car dealers across the world.
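Both uses start from the same plate mask and differ only in what happens next. The sketch below assumes the mask has already been produced by a segmentation model. The toy image and the `extract_plate` and `hide_plate` names are hypothetical.

```python
def plate_bounds(mask):
    """Bounding box (top, left, bottom, right) of a non-empty mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


def extract_plate(image, mask):
    """ANPR use: crop the plate region so it can be read."""
    top, left, bottom, right = plate_bounds(mask)
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def hide_plate(image, mask, fill=0):
    """Listing use: paint over the plate pixels with a flat value."""
    return [
        [fill if mask[r][c] else value for c, value in enumerate(row)]
        for r, row in enumerate(image)
    ]


# Toy greyscale car photo: the plate is the two brighter pixels.
image = [
    [5, 5, 5, 5],
    [5, 7, 8, 5],
    [5, 5, 5, 5],
]

# Assumed output of a segmentation model: 1 marks plate pixels.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

print(extract_plate(image, mask))
print(hide_plate(image, mask))
```

A security system would pass the cropped region on to character recognition, while a sales portal would publish the photo with the plate painted over.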
The uses of Deep Learning in image recognition have practically no boundaries, but there are still many impediments slowing down the process. Data annotation is not efficient and significantly increases the time needed to build apps based on Deep Learning. Also, the technology is neither cheap nor widely accessible. For instance, Nvidia dominates the market for the graphics cards used to train Deep Learning models.
If you’re interested in learning more about the technology, we recommend taking a look at this Quora thread.