Abstract

Boosting is one of the most important strategies in ensemble learning because of its ability to improve the stability and performance of weak learners. It is nonparametric, multivariate, fast and interpretable, but it is not robust against outliers. To enhance its prediction accuracy and immunize it against outliers, a modified version of the AdaBoost R2 boosting algorithm, called AdaBoost R3, was developed. In the sampling step, extremum samples were added to the boosting set. In the robustness step, a modified Huber loss function was applied to overcome the outlier problem. In the output step, a deterministic threshold was used to guarantee that bad predictions do not participate in the final output. The performance of the modified algorithm was investigated with two anticancer data sets of tyrosine kinase inhibitors, and the mechanism of inhibition was studied using the relative weighted variable importance procedure. Investigating the effect of the base learner's strength reveals that boosting is only successful with the classification and regression tree method (a weak to moderate learner) and does not have a significant effect with the radial basis functions partial least squares method (a strong base learner). AdaBoost R3 thus differs from AdaBoost R2 in three respects: extrema are introduced into the boosting set, the loss values are passed through a modified Huber loss function, and an acceptance threshold removes bad predictions from the final output. Copyright (c) 2015 John Wiley & Sons, Ltd.
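
The sketch below illustrates, in Python, how an AdaBoost.R2-style loop could incorporate the three modifications described above. The function names (fit_adaboost_r3_sketch, predict_adaboost_r3_sketch), the parameters delta and accept_threshold, and the exact forms chosen for adding extrema, for the Huber-type loss, and for the acceptance threshold are illustrative assumptions, not the authors' published formulation.

```python
# A minimal sketch of an AdaBoost.R2-style regression booster with the three
# modifications described in the abstract. All specific choices below are
# assumptions made for illustration only.
import numpy as np
from sklearn.tree import DecisionTreeRegressor  # CART base learner


def fit_adaboost_r3_sketch(X, y, n_estimators=50, delta=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    w = np.full(n, 1.0 / n)                      # boosting sample weights
    # Sampling step (assumed form): force the extreme-response samples into
    # every boosting set so the response range is always covered.
    extremes = np.array([int(np.argmin(y)), int(np.argmax(y))])
    learners, betas = [], []
    for _ in range(n_estimators):
        idx = rng.choice(n, size=n, replace=True, p=w)
        idx = np.unique(np.concatenate([idx, extremes]))
        tree = DecisionTreeRegressor(max_depth=4).fit(X[idx], y[idx])
        err = np.abs(tree.predict(X) - y)
        # Robustness step (assumed form): Huber-type loss, so large residuals
        # (potential outliers) are penalized linearly rather than quadratically.
        loss = np.where(err <= delta, 0.5 * err ** 2,
                        delta * (err - 0.5 * delta))
        loss = loss / (loss.max() + 1e-12)       # scale losses to [0, 1]
        avg_loss = float(np.sum(w * loss))
        if avg_loss >= 0.5:                      # learner too weak, stop boosting
            break
        beta = avg_loss / (1.0 - avg_loss)       # small beta = good learner
        w = w * beta ** (1.0 - loss)             # AdaBoost.R2 weight update
        w = w / w.sum()
        learners.append(tree)
        betas.append(beta)
    return learners, np.array(betas)


def predict_adaboost_r3_sketch(learners, betas, X, accept_threshold=0.4):
    # Output step (assumed form): only learners whose beta passes the
    # acceptance threshold contribute to the final weighted-median output.
    keep = np.where(betas < accept_threshold)[0]
    if keep.size == 0:                           # fall back to all learners
        keep = np.arange(len(learners))
    preds = np.array([learners[i].predict(X) for i in keep])   # (m, n_samples)
    lw = np.log(1.0 / betas[keep])               # learner weights
    # Weighted median across learners, the standard AdaBoost.R2 combination.
    order = np.argsort(preds, axis=0)
    cum = np.cumsum(lw[order], axis=0)
    med = np.argmax(cum >= 0.5 * lw.sum(), axis=0)
    sorted_preds = np.take_along_axis(preds, order, axis=0)
    return sorted_preds[med, np.arange(preds.shape[1])]
```

For example, calling fit_adaboost_r3_sketch on a descriptor matrix X and activity vector y, followed by predict_adaboost_r3_sketch, returns an outlier-resistant ensemble prediction; the weighted-median combination is carried over unchanged from AdaBoost.R2, while the three commented steps mark where the assumed modifications enter.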

  • Publication date: 2015-4