Abstract

Integrating features derived from different protein properties helps improve the prediction accuracy of protein structural class, but it requires handling the resulting high-dimensional data. Feature selection, which picks the informative features out of the integrated set, therefore becomes an indispensable step. This paper proposes a novel feature selection method, Partial-Maximum-Correlation-Information based Recursive Feature Elimination (PMCI-RFE), to quickly select the best feature subset from an integrated high-dimensional protein feature set and thereby improve the prediction performance of protein structural class. PMCI-RFE can also be used to identify different types of informative features for further analysis of biological relationships. The method selects informative features using the correlation information between the feature space and the class encoding space, based on the idea of orthogonal component projection in the feature space. Experimental results on six widely used benchmark datasets show that PMCI-RFE is fast and effective compared with four other state-of-the-art feature selection methods, and that it makes full use of the information from different protein properties to improve the predictability of protein structural class.