Machine learning is trending in the security industry. A quick Google search brings up nearly every security company and many technology companies jumping on the machine learning bandwagon as the newest silver bullet against threats. But machine learning isn't new. In fact, it has been around for decades.
Machine learning is a family of algorithms. One of the simplest is a decision tree, which works through a series of yes-or-no questions, classifying things until it reaches a conclusion. This is an example of supervised machine learning: it classifies what it has been trained to classify, and it is incredibly valuable for strengthening security. If you train a machine-learning algorithm to look for certain words used together, it does a great job of identifying and combatting spam. Train it to look for specific indicators, and it can detect and stop various types of malware. We can even train it to monitor internet usage to detect malicious internet events. And we're getting better at training, taking advantage of bigger data sets and more processing power to train on more things faster, which allows us to apply machine learning to specialized problems. We now have thousands of algorithms coming together to identify a broad swath of known vulnerabilities and threats.
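To make the decision tree idea concrete, here is a minimal sketch in Python. It learns yes-or-no questions about word presence from a handful of labeled messages. The training messages, the labels, and every function name are made up for illustration, and the splitting rule (Gini impurity, ties broken alphabetically) is just one common choice, not a description of any particular product.

```python
from collections import Counter

def tokenize(text):
    """Lowercase a message and return its set of words."""
    return set(text.lower().split())

def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows):
    """Pick the word whose yes/no split most reduces impurity, or None."""
    vocab = sorted(set().union(*(words for words, _ in rows)))
    best_word, best_score = None, gini([label for _, label in rows])
    for word in vocab:
        yes = [label for words, label in rows if word in words]
        no = [label for words, label in rows if word not in words]
        if not yes or not no:
            continue  # the word does not separate anything
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(rows)
        if score < best_score:
            best_word, best_score = word, score
    return best_word

def build_tree(rows, depth=0, max_depth=3):
    """Return a leaf label, or a (word, yes_branch, no_branch) tuple."""
    majority = Counter(label for _, label in rows).most_common(1)[0][0]
    word = best_split(rows) if depth < max_depth else None
    if word is None:
        return majority
    yes = [row for row in rows if word in row[0]]
    no = [row for row in rows if word not in row[0]]
    return (word,
            build_tree(yes, depth + 1, max_depth),
            build_tree(no, depth + 1, max_depth))

def classify(tree, text):
    """Follow the yes/no path until a leaf label is reached."""
    words = tokenize(text)
    while isinstance(tree, tuple):
        word, yes_branch, no_branch = tree
        tree = yes_branch if word in words else no_branch
    return tree

# Hypothetical labeled examples; "ham" means legitimate mail.
TRAINING = [
    ("free prize inside", "spam"),
    ("claim your free prize", "spam"),
    ("free prize winner", "spam"),
    ("free webinar tomorrow", "ham"),
    ("nobel prize announced", "ham"),
    ("team meeting on friday", "ham"),
]

TREE = build_tree([(tokenize(text), label) for text, label in TRAINING])
print(TREE)
print(classify(TREE, "Claim your FREE prize today"))
```

On this toy data, neither "free" nor "prize" is suspicious on its own; the tree only reaches "spam" when both appear together. That is the "words used together" behavior in miniature, and it also shows the limit of supervision: the tree knows nothing beyond what its labeled examples taught it.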
There is also unsupervised machine learning. It helps to detect unknown threats by learning a common set of behaviors and then checking whether new activity falls within or outside of those conventions. For example, it may know that a user always logs on between 8 a.m. and 5 p.m. from San Francisco on a specific laptop. All of a sudden, the user appears to log in from Russia on a different computer and is downloading documents. Because it has never seen these behaviors before, it can flag them as potentially malicious activity.
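The login example can be sketched in a few lines of Python. This is a deliberately simple baseline-and-compare illustration, assuming a hypothetical `Login` record with an hour, a location, and a device; real products typically rely on statistical or clustering models rather than exact set membership. Note that no event is ever labeled good or bad: the baseline comes only from what has been observed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Login:
    hour: int       # hour of day, 0-23
    location: str
    device: str

def build_baseline(history):
    """Summarize past behavior; no labels are involved."""
    hours = [event.hour for event in history]
    return {
        "hours": (min(hours), max(hours)),
        "locations": {event.location for event in history},
        "devices": {event.device for event in history},
    }

def deviations(baseline, event):
    """List the attributes of an event that fall outside the baseline."""
    found = []
    earliest, latest = baseline["hours"]
    if not earliest <= event.hour <= latest:
        found.append("hour")
    if event.location not in baseline["locations"]:
        found.append("location")
    if event.device not in baseline["devices"]:
        found.append("device")
    return found

# Hypothetical history: weekday logins between 8 a.m. and 5 p.m.
HISTORY = [Login(hour, "San Francisco", "laptop-01") for hour in range(8, 17)]
BASELINE = build_baseline(HISTORY)

usual = Login(10, "San Francisco", "laptop-01")
odd = Login(3, "Russia", "unknown-pc")

print(deviations(BASELINE, usual))
print(deviations(BASELINE, odd))
```

The usual login produces no deviations, while the second one departs from the baseline on all three attributes. Whether three deviations mean an attack or a business trip is exactly the kind of question that still needs context.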
Together, these two categories show that machine learning holds great promise. Our experiences with Amazon, Google, or Facebook reinforce this perception: they learn and deliver information in real time and solve problems at a massive scale. But unlike these modern apps, security is a beast unto itself. If Amazon recommends a product that isn't a good fit, or if a Google search comes up with something that isn't relevant, it's a minor misstep. In contrast, security is a high-stakes game with adversaries continuously changing tactics, techniques, and procedures (TTPs) to infiltrate organizations and cause disruption or steal high-value data. Security practitioners must deal with a tremendous amount of noise, making response incredibly difficult. Block something in error and your entire infrastructure could shut down; the toxicity of a false positive can be massive.
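A quick back-of-the-envelope calculation shows why noise is such a problem. Every number below is hypothetical, chosen only to illustrate the arithmetic: because malicious events are rare, even a detector with a seemingly low false positive rate can produce far more false alerts than real ones.

```python
def alert_breakdown(benign_events, malicious_events,
                    detection_rate, false_positive_rate):
    """Return (true_alerts, false_alerts, precision) for one batch of events."""
    true_alerts = malicious_events * detection_rate
    false_alerts = benign_events * false_positive_rate
    precision = true_alerts / (true_alerts + false_alerts)
    return true_alerts, false_alerts, precision

# Hypothetical day: one million events, of which 100 are malicious.
true_alerts, false_alerts, precision = alert_breakdown(
    benign_events=999_900,
    malicious_events=100,
    detection_rate=0.99,        # assumed: catches 99% of malicious events
    false_positive_rate=0.01,   # assumed: flags 1% of benign events
)

print(f"real alerts:  {true_alerts:.0f}")
print(f"false alerts: {false_alerts:.0f}")
print(f"share of alerts that are real: {precision:.1%}")
```

Under these assumptions, roughly one alert in a hundred points to a real threat. If each alert also triggered an automatic block, nearly all of those blocks would hit legitimate activity.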
Determining which events warrant attention is no easy task: it's like finding needles in a haystack. While many modern business applications can rely exclusively on machine learning in a hands-off approach, security has always required attention and tuning. Machine learning is incredibly valuable for strengthening security. In fact, how we continue to bolster defenses will depend a great deal on advances in machine learning. But it isn't a panacea. What's needed is a combination of numerous and differing technologies, along with humans, working together to achieve the highest levels of security effectiveness.
We're in the early days of this collaborative approach, and it requires a different mix of skills: security experts working with data science experts and with backend professionals to query and understand the data correctly. We're seeing that with time, as they interact and learn from each other, teams get stronger, machines get stronger, and effectiveness improves. This approach helps offset the shortage of human capital the security industry is facing and allows organizations to scale faster. Efficacy rates go up because you can discover and block more threats. And you gain context, which is critical to finding the needles in the haystack, reducing the firehose of information to a more manageable subset of higher-priority, mission-critical events.
With the right information at the right time, you can make more informed decisions, faster. As we continue to improve and drive mean time to detection of threats closer to zero, we can evolve from a traditionally reactive stance to a more predictive one. By combining the patterns of behavior machine learning can detect with human expertise, we can begin to anticipate and better prepare for what may happen next, and how, where, and when.
As I said earlier, there's a lot of buzz about machine learning, and it's going to continue. The problem is that it can be hard to judge the quality of machine learning, because the process often isn't easily understood. Without visibility into how it came to an outcome, you can't understand exactly “why” it decided to detect something, which can make investigation hard. And if you get an incorrect answer you can't determine what went wrong to correct the false positive in the future. There is also no practical way to fully test machine learning in a custom environment to determine where mistakes may occur.
But instead of blindly trusting a vendor's approach to machine learning, which can be risky, ask for a proof of concept. With evidence that the solution using the algorithms will perform in your environment and provide capabilities and context that will add value to your team, you can build confidence in its use.
The promise of machine learning is to help drive down the time to detect and respond to threats as part of an overall security operation, making the humans and systems you have more effective and scalable, with the context they need to focus on real threats today, and better anticipate threats tomorrow. That's a big promise that's sure to keep machine learning at the forefront of security innovation.