Abstract

Selective ensemble learning selects a subset of diverse and accurate base models in order to achieve stronger generalization ability. In this paper, we propose a novel learning algorithm based on parallel optimization and hierarchical selection (PTHS). To address the problem of high dimensionality, we also propose a novel feature selection method that maximizes the sum of relevance and distance (MSRD). Specifically, the PTHS algorithm employs parallel optimization, candidate model pruning based on k-means, and a hierarchical selection framework. We combine the predictions of the base models by majority voting, employing a divide-and-conquer strategy to save computing time. In addition, the PT algorithm can transform a multi-class problem into a binary classification problem, thereby allowing our ensemble model to address multi-class problems. Empirical results show that MSRD is efficient in solving the high dimensionality problem and that PTHS outperforms other existing classification algorithms. Most importantly, our classifier achieves strong performance on several bioinformatics problems (e.g., tRNA identification and protein-protein interaction prediction), demonstrating its efficiency and robustness.
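The majority-voting step mentioned above can be illustrated with a minimal sketch. It is not the PTHS implementation: the function name `majority_vote`, the input layout (one list of predicted labels per base model), and the tie-breaking rule (smallest label wins) are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model predictions into one label per sample.

    predictions: list of label lists, one list per base model,
    all of the same length (hypothetical layout, assumed here).
    """
    if not predictions:
        raise ValueError("at least one base model is required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    combined = []
    # zip(*predictions) yields the votes of every model for one sample
    for sample_votes in zip(*predictions):
        counts = Counter(sample_votes)
        top = max(counts.values())
        # Assumed tie-break: pick the smallest label among the tied ones
        combined.append(min(label for label, c in counts.items() if c == top))
    return combined


# Three hypothetical base models voting on three samples
model_outputs = [[0, 1, 1], [0, 0, 1], [1, 0, 1]]
ensemble_labels = majority_vote(model_outputs)
```

With an odd number of binary base models no ties occur, so the tie-breaking rule only matters for even-sized or multi-class ensembles.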