How the Buzz About AI and Machine Learning in Cybersecurity Gets the Trends Wrong
The recent surge in artificial intelligence development is having a strong impact on the InfoSec sector. However, there is much more nuance to it than you may think.
Machine learning algorithms are not ready to take over on either side of the fence. They are just a powerful tool that both security specialists and their adversaries use in their arms race.
The buzz about machine learning (ML) is overwhelming. No wonder it is also present in such a profitable and advanced sector as cybersecurity. In fact, the InfoSec business has been using artificial intelligence (AI) for a very long time, and the recent boom in the adoption of ML techniques (algorithms that are trained on a data set and can then generalize the learned rules to give answers on new data) has not gone unnoticed.
You have probably come across an InfoSec company's marketing piece describing how algorithms can detect fraud, malware, or network security breaches on the fly. Alternatively, maybe you have heard a more sophisticated sales pitch about how threatened and vulnerable the world is due to imminent AI cyberattacks. Neither is true.
While ML creates opportunities for both sides of the cybersecurity arms race, the balance remains largely unchanged. The expected development of AI is unlikely to tip the scales dramatically, because both sides benefit from deep learning methods that outperform traditional rule-based solutions.
Organizations need to react to ML-based traps
The problem is the industry's inertia. Cybersecurity solutions have traditionally been expensive and time-consuming to implement, especially in big organizations. That is why most security systems still rely on old-school rule-based or signature-based technology that fails on threats it has not seen before. An attacker only needs to alter the malware so that its signature no longer matches in order to penetrate the network.
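To illustrate why exact-match signatures are brittle, here is a minimal sketch that uses a SHA-256 digest as a stand-in for a signature. The payload bytes, the `BLOCKLIST` set, and the function names are hypothetical and chosen only for this example; real products use richer signatures than a plain file hash.

```python
import hashlib

def signature(payload: bytes) -> str:
    """Return the SHA-256 digest used here as a stand-in 'signature'."""
    return hashlib.sha256(payload).hexdigest()

# Hypothetical blocklist built from one previously seen sample.
known_sample = b"example-malicious-payload-v1"
BLOCKLIST = {signature(known_sample)}

def is_blocked(payload: bytes) -> bool:
    """Signature-based check: exact match against the blocklist only."""
    return signature(payload) in BLOCKLIST

# The attacker appends a single padding byte. The payload is almost
# identical, but its digest no longer matches the stored signature.
variant = known_sample + b"\x00"

print(is_blocked(known_sample))  # True
print(is_blocked(variant))       # False
```

The one-byte change is enough to slip past the check, which is the weakness the paragraph above describes.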
Hackers can use neural networks to train their malware on historical data. It's enough to "feed" an algorithm with examples of phishing emails that were blocked and of those that succeeded in bypassing a detection system. According to specialists, this can increase the attacker's success rate against an AI system from 0.3% to an astonishing 15%.
Additionally, an ML algorithm can help phishers prepare the most effective email, one that maximizes the "conversion" rate of an attack: for example, the number of people who click a fraudulent link and log in to a fake banking site, revealing their password.
One of the most popular ways of attacking ML security systems is "data poisoning." Deep learning frameworks are no secret, and the general principle is widely known: a model is only as good as the data it was trained on. If an attacker floods the AI system with malware samples marked as "safe", the model may learn to let that malware through, and the attacker is free to roam around the compromised system until a human specialist closes the gap.
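Data poisoning can be sketched with a toy model. The example below assumes a single made-up "suspicion score" per file and a nearest-centroid classifier; the scores, labels, and function names are all hypothetical, and production systems are far more complex. It only shows how mislabeled training samples can shift a decision.

```python
from statistics import mean

def train(samples):
    """Fit a nearest-centroid model: one mean score per label."""
    by_label = {}
    for score, label in samples:
        by_label.setdefault(label, []).append(score)
    return {label: mean(scores) for label, scores in by_label.items()}

def predict(model, score):
    """Return the label whose centroid is closest to the score."""
    return min(model, key=lambda label: abs(model[label] - score))

# Hypothetical clean training set: (suspicion score, label).
clean_data = [
    (0.10, "safe"), (0.20, "safe"), (0.15, "safe"), (0.10, "safe"),
    (0.80, "malware"), (0.90, "malware"), (0.85, "malware"), (0.95, "malware"),
]

# The attacker submits malware-like samples mislabeled as "safe".
poison = [(0.70, "safe")] * 8

suspicious_file = 0.65
print(predict(train(clean_data), suspicious_file))           # malware
print(predict(train(clean_data + poison), suspicious_file))  # safe
```

The poisoned samples drag the "safe" centroid toward the malware region, so the same file is classified differently after retraining.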
How ML may be useful to protect us
Machine learning can be used to look for more sophisticated characteristics of malware than just precise rules and signatures. It makes a hacker's job much more difficult. Neural networks can learn entirely new ways of detecting anomalies, while the exact mechanism stays unknown even to the engineers supervising them. An ML model can identify malware even if it sees it for the first time.
One of the methods is the analysis of components. Cybercriminals hardly ever build their viruses and trojans from scratch. They pick up ready-to-use blocks of code, since they need many trials before they succeed with a spectacular attack that will bring them a substantial profit. They cannot afford to write much new code, and this reuse gives modern antivirus systems a chance to detect them.
As the automated solutions are usually well balanced on both sides, the biggest threat is human error. To protect against it, security specialists apply deep learning in so-called User and Entity Behaviour Analytics (UEBA). Behavioral biometric data such as keystroke dynamics can be used to spot suspicious patterns. A neural network can profile users and network nodes in an organization, trying to find anomalies that are later examined by a human team.
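As a rough sketch of the keystroke-dynamics idea, the snippet below swaps the neural network for a simple z-score check, which is an assumption made to keep the example short. The interval values (milliseconds between keystrokes), the `THRESHOLD`, and the function names are hypothetical.

```python
from statistics import mean, stdev

def build_profile(sessions):
    """Summarize a user's typing rhythm from per-session average intervals."""
    averages = [mean(session) for session in sessions]
    return mean(averages), stdev(averages)

def anomaly_score(profile, session):
    """Distance of a session's average interval from the profile, in stdevs."""
    mu, sigma = profile
    return abs(mean(session) - mu) / sigma

# Assumed cut-off: sessions beyond 3 standard deviations go to a human.
THRESHOLD = 3.0

def needs_review(profile, session):
    return anomaly_score(profile, session) > THRESHOLD

# Hypothetical baseline: milliseconds between keystrokes in past sessions.
baseline = [
    [180, 190, 200, 185],
    [175, 195, 205, 190],
    [185, 200, 195, 180],
    [190, 185, 210, 195],
]
profile = build_profile(baseline)

print(needs_review(profile, [185, 195, 190, 200]))  # False
print(needs_review(profile, [90, 110, 95, 105]))    # True
```

The flagged session is not blocked automatically; as in the text, it is handed to a human team for examination.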
Providing data to human analysts
Cybersecurity systems produce event management alerts, firewall logs, and user activity logs. You need to track each one of them to secure a network. The scale of data generated and exchanged every second leaves human security experts helpless unless they are supported by machines.
Even basic algorithms can perform beneficial tasks of processing and classification of this data. You can train them to recognize patterns, such as spam and phishing emails. Machine learning solutions can take into consideration the sender, content, and context of a message to flag it as a threat.
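One basic algorithm of this kind is a naive Bayes classifier. The sketch below is a minimal version that combines the sender's domain and the words of the message as features; context is left out for brevity. The training messages, the reserved `.example` domains, and all function names are made up for illustration.

```python
import math
from collections import Counter

def tokens(sender, body):
    """Combine the sender's domain and the body words into one feature list."""
    domain = sender.split("@")[-1].lower()
    return ["sender:" + domain] + body.lower().split()

def train(messages):
    """Count token frequencies per label for a multinomial naive Bayes model."""
    counts = {"phish": Counter(), "ok": Counter()}
    docs = Counter()
    for sender, body, label in messages:
        counts[label].update(tokens(sender, body))
        docs[label] += 1
    vocab = set(counts["phish"]) | set(counts["ok"])
    return counts, docs, vocab

def classify(model, sender, body):
    """Return the label with the highest log-probability for the message."""
    counts, docs, vocab = model
    total_docs = sum(docs.values())
    scores = {}
    for label in counts:
        total = sum(counts[label].values())
        score = math.log(docs[label] / total_docs)
        for tok in tokens(sender, body):
            # Laplace smoothing so unseen tokens do not zero out the score.
            score += math.log((counts[label][tok] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training messages: (sender, body, label).
model = train([
    ("it@corp.example", "meeting notes attached for review", "ok"),
    ("hr@corp.example", "please review the holiday schedule", "ok"),
    ("alert@bank-login.example", "verify your password now", "phish"),
    ("support@bank-login.example", "urgent verify your account password", "phish"),
])

print(classify(model, "security@bank-login.example", "urgent verify your password"))  # phish
print(classify(model, "it@corp.example", "holiday schedule attached"))                # ok
```

With only four training messages this is a toy, but the same structure scales to real mail volumes once enough labeled examples are available.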
In practice, the process stays the same. When on holiday in some exotic country, you may receive a call from your bank asking whether your card has been stolen. Behind that call are algorithms that track anomalies in your account behavior (unexpected payments in the Philippines) and send alerts to your bank's security team. A human security analyst makes the final decision to verify the alert with a call.
Language processing and predictive analysis
AI is entering new fields that are useful in cybersecurity. Natural language processing (NLP) is one of them. Recently, deep learning methods have outperformed statistical algorithms in solving some challenging NLP problems. NLP enables machines to interpret unstructured text, which makes it a powerful tool in the hands of security experts. They can use it to classify and analyze web pages, emails, or speech transcripts.
Another new trend is predictive analysis. It can help detect potential weak nodes in the network, such as users with poor security awareness. After analyzing many past incidents, ML engineers can train a neural network to flag the people who are most likely to keep their password on a sticky note. Once again, the final decision to talk with an individual is made by a human specialist.
Chasing cybercriminals in the Tor network
The usual goal of a cyberattack is to steal valuable data and disappear. Hackers find anonymous networks created to protect internet users’ privacy, such as Tor, very handy. Encrypted traffic and hidden IP addresses render traditional rule-based and IP-blocking solutions ineffective.
However, deep learning pattern recognition can be used to classify encrypted traffic. Specially trained algorithms can detect a Tor user by analyzing traffic patterns.
The human-and-machine model is the solution for the foreseeable future
There are many things computers do much better than humans. They analyze data, automate tasks, and solve equations, but they cannot think outside the box. While even the best human security specialists cannot process data and act as quickly as a machine, they can use the same data to make better intuitive decisions when facing an unusual situation.
ML algorithms are far less reliable when it comes to rare, unprecedented events. There are not enough examples of such events to train a neural network to recognize them effectively. That is why AI alone is not able to detect an unusual cyberattack. It can help with data processing, but the final decision belongs to the analyst.
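The scarcity of rare events also makes a model's quality hard to judge. The following sketch uses made-up numbers (10 attacks among 10,000 events) to show how a model that never raises an alert can still score an impressive accuracy; the data and function names are hypothetical.

```python
def accuracy(predictions, labels):
    """Share of events that were classified correctly."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def recall(predictions, labels):
    """Share of real attacks that were flagged."""
    attacks = sum(labels)
    caught = sum(1 for p, y in zip(predictions, labels) if p and y)
    return caught / attacks if attacks else 0.0

# Hypothetical event log: True marks an attack, False marks benign activity.
labels = [True] * 10 + [False] * 9990

# A "model" that has learned to call everything benign.
always_benign = [False] * len(labels)

print(f"accuracy = {accuracy(always_benign, labels):.3f}")  # accuracy = 0.999
print(f"recall   = {recall(always_benign, labels):.3f}")    # recall   = 0.000
```

An accuracy of 99.9% with zero attacks caught is why a human analyst still has to review what the model reports and what it misses.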
The same is true on the other side. ML algorithms can optimize spam and phishing campaigns and carry out DDoS and brute-force attacks, but penetrating a system in an efficient, undetectable way still requires a creative hacker who combines human analysis with the newest algorithms.
This kind of human-machine collaboration is the future of InfoSec. At the highest level, cybersecurity will remain a form of art, in which the most intelligent and creative minds compete using the best technology available.