Abstract

The application of boosting to both two-class and multi-class classification problems is studied using five real chemical data sets. Each data set is randomly divided into two subsets, one for training and the other for prediction. For two-class classification, the samples in each data set are separated into a high-response class and a low-response class according to a threshold value. For three data sets (the wheat, cream and HIV data), boosting with classification and regression trees (CART) as the base learner can decrease the misclassification rate in prediction relative to a single CART. However, the results for the green tea data indicate that overfitting may occur when boosting is applied, and for the chromatographic retention data boosting performs worse than a single CART. The cream data and the HIV data are also used for multi-class classification, and on both data sets boosting performs better than a single CART. Variable importance analysis suggests that the improvement made by boosting may be due to its use of more variables, which provide more information on special types of samples in the training data.