Abstract

Minimal-cost classification is an important issue in data mining and machine learning. Recently, many enhanced algorithms based on C4.5 have been proposed to tackle this issue. One disadvantage of these methods is that they are inefficient on medium-sized or large data sets. To overcome this problem, we present a cost-sensitive decision tree algorithm based on weighted class distribution with a batch deleting attribute mechanism (BDADT). In BDADT, a heuristic function is designed to evaluate attributes during node selection. It combines a weighted information gain ratio, a test cost, and a user-specified non-positive parameter that adjusts the effect of the test cost. Meanwhile, a batch deleting attribute mechanism is incorporated into the algorithm: while nodes are being assigned, it deletes redundant attributes according to the values of the heuristic function, which improves the efficiency of decision tree construction. Experiments are conducted on 20 UCI data sets, with test costs generated from a normal distribution as a representative test-cost setting, to evaluate BDADT. The experimental results show that the average total costs obtained by BDADT are smaller than those of the existing CS-C4.5 and CS-GainRatio algorithms. Furthermore, the proposed algorithm significantly improves the efficiency of cost-sensitive decision tree construction.
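The abstract names the ingredients of the BDADT heuristic but not its exact formula, so the following is only a minimal sketch under stated assumptions. It assumes that the "weighted class distribution" means class counts scaled by per-class weights, that the heuristic has the multiplicative form f(a) = GainRatio_w(a) * (1 + c(a))^lambda with lambda <= 0 (one common way to trade information against test cost), and that batch deletion drops every attribute scoring below a hypothetical fraction `keep_ratio` of the best score. All function and parameter names here are illustrative, not the authors' code.

```python
import math
from collections import Counter, defaultdict


def weighted_entropy(labels, class_weight):
    """Entropy of the class distribution after scaling each class count by its weight (assumed scheme)."""
    counts = Counter(labels)
    total = sum(class_weight[c] * n for c, n in counts.items())
    if total == 0:
        return 0.0
    h = 0.0
    for c, n in counts.items():
        p = class_weight[c] * n / total
        if p > 0:
            h -= p * math.log2(p)
    return h


def weighted_gain_ratio(rows, labels, attr, class_weight):
    """Gain ratio of a nominal attribute, with partition sizes measured in weighted counts."""
    parts = defaultdict(list)
    for row, y in zip(rows, labels):
        parts[row[attr]].append(y)

    def wsize(ys):
        return sum(class_weight[y] for y in ys)

    total = wsize(labels)
    if total == 0:
        return 0.0
    gain = weighted_entropy(labels, class_weight)
    split_info = 0.0
    for ys in parts.values():
        frac = wsize(ys) / total
        gain -= frac * weighted_entropy(ys, class_weight)
        if frac > 0:
            split_info -= frac * math.log2(frac)
    return gain / split_info if split_info > 0 else 0.0


def heuristic(rows, labels, attr, class_weight, test_cost, lam):
    """Hypothetical heuristic: gain ratio damped by test cost, f(a) = GR(a) * (1 + c(a)) ** lam."""
    if lam > 0:
        raise ValueError("lam must be non-positive")
    return weighted_gain_ratio(rows, labels, attr, class_weight) * (1 + test_cost[attr]) ** lam


def select_and_batch_delete(rows, labels, attrs, class_weight, test_cost, lam, keep_ratio=0.1):
    """Pick the best attribute for this node and, in one batch, drop attributes
    whose score falls at or below keep_ratio * best score (hypothetical rule)."""
    scores = {a: heuristic(rows, labels, a, class_weight, test_cost, lam) for a in attrs}
    best = max(scores, key=scores.get)
    threshold = keep_ratio * scores[best]
    survivors = [a for a in attrs if a != best and scores[a] > threshold]
    return best, survivors, scores


if __name__ == "__main__":
    rows = [{"a": 0, "b": 0, "c": 0}, {"a": 0, "b": 1, "c": 0},
            {"a": 1, "b": 0, "c": 1}, {"a": 1, "b": 1, "c": 1}]
    labels = ["n", "n", "y", "y"]
    best, survivors, scores = select_and_batch_delete(
        rows, labels, ["a", "b", "c"], {"n": 1.0, "y": 1.0},
        {"a": 1.0, "b": 1.0, "c": 3.0}, lam=-1.0)
    print(best, survivors, scores)
```

In this toy run, attributes `a` and `c` separate the classes equally well, but the cheaper `a` wins because the non-positive exponent penalizes `c`'s higher test cost; `b` carries no information and is deleted in the same pass rather than being re-evaluated at every descendant node. The threshold rule is an assumption chosen only to illustrate batch deletion.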

  • Publication date: 2017-2-1
  • Affiliation: Minnan Normal University