摘要

A malware detection model based on a negative selection algorithm with penalty factor (NSAPF) is proposed in this paper. This model extracts a malware instruction library (MIL), containing instructions that tend to appear in malware, through deep instruction analysis with respect to instruction frequency and file frequency. From the MIL, the proposed model creates a malware candidate signature library (MCSL) and a benign program malware-like signature library (BPMSL) by splitting programs orderly into various short bit strings. Depending on whether a signature matches "self", the NSAPF further divides the MCSL into two malware detection signature libraries (MDSL1 and MDSL2), and uses these as a two-dimensional reference for detecting suspicious programs. The model classifies suspicious programs as malware and benign programs by matching values of the suspicious programs with MDSL1 and MDSL2. Introduction of a penalty factor C in the negative selection algorithm enables this model to overcome the drawback of traditional negative selection algorithms in defining the harmfulness of "self" and "nonself", and focus on the harmfulness of the code, thus greatly improving the effectiveness of the model and also enabling the model to satisfy the different requirements of users in terms of true positive and false positive rates. Experimental results confirm that the proposed model achieves a better true positive rate on completely unknown malware and a better generalization ability while keeping a low false positive rate. The model can balance and adjust the true positive and false positive rates by adjusting the penalty factor C to achieve better performance.