Far from sci-fi depictions, artificial intelligence – through machine learning algorithms and big data – is key to defusing today's evolving cyberthreats.
Artificial intelligence (AI) is the hottest new trend that’s been around for years. While that may sound like an oxymoron, there’s a reason the buzz around AI and its subset, machine learning, hasn’t quieted down and is likely to continue for a long time. It’s uniquely critical in our fight against the ever-growing number and variety of cyberthreats and vital to our ability to scan for and remove malware.
The field of AI research began in 1956, with scientists working on the Dartmouth Summer Research Project on Artificial Intelligence. Since then, AI has been depicted in many films, typically in the form of robots with human-like thoughts and feelings. AI in real life, however, is far different from and broader than what we see in the movies.
AI is the science of making computers carry out tasks by mimicking human intelligence, and it is used in industries including healthcare, customer care, and finance. Of particular interest is machine learning, a subset of AI in which computers use algorithms to find patterns in data and make decisions based on those patterns, learning from the big data analysis they perform. At Avast, we’ve been using AI and machine learning for years to protect our users from prevalent threats. One of our engines, MDE, uses machine learning designed internally by our security specialists in 2012.
In the early days of computers and the internet, we used string-based signatures to generalize variants of a threat. Signatures require an analyst and time, and aren’t flexible enough to detect the vast variety of modern cyberthreats, which have proliferated because they’re so profitable for criminals. There just aren’t enough people, or enough time, to keep up with the sheer number of new threats, which is where AI and machine learning come into play.
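To illustrate why string-based signatures struggle with variants, here is a minimal sketch of signature matching. The signature name, the byte pattern, and the `match_signatures` helper are all hypothetical and chosen for illustration only; they do not reflect any real Avast engine or signature format.

```python
# Hypothetical signature database: detection name -> byte string that an
# analyst picked out of a known malicious sample. Illustrative only.
SIGNATURES = {
    "Example.Trojan.A": b"EVIL_PAYLOAD_V1",
}


def match_signatures(data: bytes, signatures=SIGNATURES) -> list:
    """Return the names of all signatures whose byte string occurs in data."""
    return [name for name, pattern in signatures.items() if pattern in data]


# A made-up "file" containing the known pattern, and a variant in which the
# malware author changed a single byte of that pattern.
original = b"\x4d\x5a header bytes EVIL_PAYLOAD_V1 rest of file"
variant = original.replace(b"EVIL_PAYLOAD_V1", b"EVIL_PAYLOAD_V2")

print(match_signatures(original))  # the known sample is detected
print(match_signatures(variant))   # the one-byte variant is missed
```

In this sketch, catching the variant would require an analyst to write and ship a second signature, which is the manual effort the paragraph above describes.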
Machine learning and AI are vital to security because cybercriminals around the world work around the clock to create new malware variants that often look like clean files, as well as malware that can morph, making these threats even more difficult for antivirus engines to detect. To make matters worse, criminals also sell malware on the darknet, allowing even people with little technical knowledge to alter and spread new malware strains.
Of course human analysts can analyze files to determine whether they’re malicious or not, but this requires that they analyze the files’ code, checking if they have malicious characteristics. Analyzing every file this way is physically impossible, considering we see more than a million new files every day. You wouldn’t want a malware scanner or malware remover that relied on human power alone.
Computers, on the other hand, are good at crunching numbers. To make them do what analysts used to do manually, we’ve created algorithms that convert files into suitable numeric representations. These machine learning algorithms extract certain characteristics, or “fingerprints,” from the files they receive. What gets extracted is much smaller than the original files, thus suitable for massive processing and – especially important – quick decision-making.
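As a rough illustration of converting a file into a numeric representation, the sketch below reduces raw bytes to a four-number vector. The specific features (size, byte entropy, share of printable bytes, share of distinct byte values) and the `extract_features` name are assumptions made for this example; the characteristics our engines actually extract are not described here.

```python
import math
from collections import Counter


def extract_features(data: bytes) -> list:
    """Turn raw file bytes into a small, fixed-length numeric vector.

    Features are illustrative assumptions, not a real engine's feature set:
    [size in bytes, Shannon entropy (bits per byte),
     fraction of printable ASCII bytes, fraction of distinct byte values]
    """
    size = len(data)
    counts = Counter(data)  # how often each byte value (0-255) occurs

    # Shannon entropy of the byte distribution; stays 0.0 for empty input.
    entropy = 0.0
    for count in counts.values():
        p = count / size
        entropy -= p * math.log2(p)

    printable = sum(1 for b in data if 32 <= b < 127)
    printable_ratio = printable / size if size else 0.0

    return [float(size), entropy, printable_ratio, len(counts) / 256]


# Usage: any file, whatever its size, becomes a vector of the same length.
print(extract_features(b"hello world"))
print(extract_features(bytes(range(256)) * 4))
```

Whatever the size of the input, the output always has four numbers, which is what makes this kind of representation cheap to store, compare, and feed to a learning algorithm.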
Because the threat landscape continuously evolves, we continuously update our machines’ knowledge of both clean and malicious files, so they can better differentiate between the two. To do this, our analysts convert their findings of malware authors’ new techniques into new algorithms, which our machines then use to learn how to make decisions when they receive new data.
Of course, our ability to do all this depends on the data we’re able to feed our number-crunching computers. The more data we put in, the more accurate the decisions our systems make. Thanks to our more than 400 million users worldwide, who act as sensors, we have a vast amount of information on which we perform extensive data analytics to help us determine whether a file is malicious or not. Ultimately, the combination of big data, machine learning, and AI is what enables us to provide our users with the fastest, best malware protection.
Avast collects a stream of potential false detections to which we need to react swiftly. To do this, we have a dedicated pipeline which processes the reported binaries and automatically decides whether or not a reported binary is clean.
Avast researchers use a general feature-blind learning framework for fast detection of novel malware based on diverse data sources.