摘要

In many real world problems, the collected data are not always numeric; rather, the data can include categorical variables. Inclusion of different types of variables may lead to complications in regression analysis. Many regression algorithms such as linear regression, support vector regression, and neural networks that train parameters of a model to identify relations between input and output variables, can easily process numeric variables; however, there are additional considerations for categorical variables. On the other hand, a decision tree algorithm estimates a target based on the specified rules; therefore, it can support categorical variables as well as numeric variables. Using this property, a new hybrid model combining a decision tree with another regression algorithm is proposed to analyze mixed data. In the proposed model, the portions explained by categorical variables in target values are estimated by the decision tree and the remaining parts are predicted by any regression algorithm trained by numerical variables. The proposed algorithm was evaluated using 12 datasets selected from real decision problems, and it was confirmed that the proposed algorithm achieved better or comparable accuracy than the comparison methods including the M5 decision tree and the evolutionary tree. In addition, the new hybrid method does not significantly increase computational complexity, even though it builds two separate models, which is an advantage that is in contrast with the M5 decision tree and the evolutionary tree.

  • 出版日期2017-10