Abstract

Designing effective features is a critical problem in malware analysis, and it requires a proper tradeoff between discriminative power and invariance. Previous studies have shown that features derived from binary code are fairly effective. However, existing binary-based features seldom account for obfuscation, such as relocated sections, incomplete code and redundant operations. In this paper, we propose a novel Pairwise Rotation Invariant Co-occurrence Local Binary Pattern (PRICoLBP) feature, and further extend it with the Term Frequency-Inverse Document Frequency (TFIDF) transform. Unlike other static analysis techniques, our method not only achieves better linear separability, but also appears to be more resilient to obfuscation. We evaluate PRICoLBP-TFIDF comprehensively on three datasets from several perspectives, including classification performance, classifier selection and performance against obfuscation. Furthermore, we compare PRICoLBP-TFIDF with other techniques and demonstrate that it offers an efficient and effective tradeoff between discriminative power and invariance.