DECISION TREE WITH BETTER CLASS PROBABILITY ESTIMATION

作者:Jiang Liangxiao*; Li Chaoqun; Cai Zhihua
来源:International Journal of Pattern Recognition and Artificial Intelligence, 2009, 23(4): 745-763.
DOI:10.1142/S0218001409007296

摘要

Traditionally, the performance of a classifier is measured by its classification accuracy or error rate. In fact, probability-based classifiers also produce the class probability estimation (the probability that a test instance belongs to the predicted class). This information is often ignored in classification, as long as the class with the highest class probability estimation is identical to the actual class. In many data mining applications, however, classification accuracy and error rate are not enough. For example, in direct marketing, we often need to deploy different promotion strategies to customers with different likelihood (class probability) of buying some products. Thus, accurate class probability estimations are often required to make optimal decisions. In this paper, we firstly review some state-of-the-art probability-based classifiers and empirically investigate their class probability estimation performance. From our experimental results, we can draw a conclusion: C4.4 is an attractive algorithm for class probability estimation. Then, we present a locally weighted version of C4.4 to scale up its class probability estimation performance by combining locally weighted learning with C4.4. We call our improved algorithm locally weighted C4.4, simply LWC4.4. We experimentally test LWC4.4 using the whole 36 UCI data sets selected by Weka. The experimental results show that LWC4.4 significantly outperforms C4.4 in terms of class probability estimation.