摘要

Research and business organizations are vulnerable to attack by malware, particularly advanced persistent threat malware tailored for a specific target. Malware identification is made more difficult because samples can be subtly altered to avoid detection by methods that check for an identical match to known code. Different versions of an original piece of malware form a malware family. When new malicious software is identified, reverse engineers seek to identify its origin and purpose. Knowing whether new malware is from a known family or a previously unobserved family aids the efficiency of reverse engineers. This article presents a three-stage method to classify new malware into a family by comparing its similarity to existing static traces, and assigning it to the most similar family. First, a fast filtering method creates a shortlist of samples with some similarity to the new malware, using a simple bigram comparison of the instructions. The second stage takes the call graph view of the shortlisted static traces and uses simulated annealing to estimate the graph edit distance, a measure of dissimilarity between graphs. Finally, a random forest classifier combines the previous two results to predict the family to which a new sample belongs. The paper also considers how to detect when malware is from a new family.

  • 出版日期2017-6