Abstract

Ensemble classifiers play a critical role in protein fold recognition. This article presents a novel hierarchical ensemble classifier named GAOEC (Genetic-Algorithm Optimized Ensemble Classifier), which is constructed in the following steps. First, a novel optimized classifier named GAET-KNN (Genetic-Algorithm Evidence-Theoretic K Nearest Neighbors) is proposed as a component classifier. Second, six component classifiers in the first layer are used to obtain a potential class index for every query protein. Third, based on the results of the first layer, every component classifier in the second layer generates a 27-dimensional vector whose elements represent the confidence degrees of the 27 folds. Finally, a genetic algorithm is used to generate weights for the outputs of the second layer, which are combined to give the final classification result. The standard percentage accuracy of GAOEC is 64.7% on a widely used benchmark dataset in which the proteins in the testing set have less than 35% sequence identity with those in the training set.
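The final fusion step described above can be illustrated with a minimal sketch. It assumes the second-layer outputs are combined as a weighted sum followed by an arg-max, which the abstract does not state explicitly. The names `fuse_confidences` and `accuracy` are hypothetical and not taken from the paper. The genetic algorithm itself is not shown; `accuracy` is only one plausible fitness function that such a search could maximise over candidate weight vectors.

```python
from typing import Sequence

NUM_FOLDS = 27  # number of fold classes, per the abstract


def fuse_confidences(confidences: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> int:
    """Return the fold index with the highest weighted confidence sum.

    Assumption: fusion is a weighted sum of the per-classifier
    27-dimensional confidence vectors (not confirmed by the source).
    """
    if len(confidences) != len(weights):
        raise ValueError("one weight per component classifier is required")
    fused = [0.0] * NUM_FOLDS
    for vector, weight in zip(confidences, weights):
        if len(vector) != NUM_FOLDS:
            raise ValueError("each confidence vector must have 27 entries")
        for fold, value in enumerate(vector):
            fused[fold] += weight * value
    # Ties are resolved in favour of the lowest fold index.
    return max(range(NUM_FOLDS), key=fused.__getitem__)


def accuracy(weights: Sequence[float],
             samples: Sequence[Sequence[Sequence[float]]],
             labels: Sequence[int]) -> float:
    """Fraction of samples whose fused prediction matches the label.

    Hypothetical fitness function for a weight search; each sample is the
    list of confidence vectors produced for one query protein.
    """
    if not samples:
        return 0.0
    hits = sum(fuse_confidences(s, weights) == y
               for s, y in zip(samples, labels))
    return hits / len(samples)
```

Under these assumptions, a weight vector that scores higher under `accuracy` on held-out proteins would be preferred by the search.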

  • Publication date: 2008-11
  • Affiliations: Xiangtan University; PLA Information Engineering University