摘要

The volume of malwares is growing at an exponential speed nowadays. This huge growth makes it extremely hard to analyse malware manually. Most existing signatures extracting methods are based on string signatures, and string matching is not accurate and time consuming. Therefore, this paper presents AutoMal, a system for automatically extracting signatures from large-scale malwares. Firstly, the system proposes to represent the network flows by using feature hashing, which can dramatically reduce the high-dimensional feature spaces that are general in malware analysis. Then, we design a clustering and median filtering method to classify the malware vectors into different types. Finally, it introduces the signature generation algorithm based on Bayesian method. The system can extract both the byte signature and the hash signature of malwares from its network flow with low false positive and zero false negative. Our evaluation shows that AutoMal can generate strongly noise-resisted signatures that exactly depict the characteristics of malware.