From ransomware to botnets, malware takes seemingly endless forms, and it’s forever proliferating. Try as they might, the humans who defend our computers against it are drowning in the onslaught, so they are turning to AI for help.

There’s just one problem: machine-learning tools need a lot of data. That’s fine for tasks like computer vision or natural-language processing, where large, open-source data sets are available to teach algorithms what a cat looks like, say, or how words relate to one another.

In the world of malware, such a thing hasn’t existed—until now. This week, the cybersecurity firm Endgame released a large, open-source data set called EMBER (for “Endgame Malware Benchmark for Research”).

EMBER is a collection of more than a million representations of benign and malicious Windows portable executable (PE) files, a format in which malware often hides. A team at the company also released AI software that can be trained on the data set.

The idea is that if AI is to become a potent weapon in the fight against malware, it needs to know what to look for. Security firms have a sea of potential data to train their algorithms on, but that’s a mixed blessing.
