Abstract

The simplicity and interpretability of decision tree induction make it one of the more widely used machine learning methods for data classification. However, for continuous-valued (real and integer) attribute data, there is room for further improvement in classification accuracy, complexity, and tree scale. We propose a new K-ary partition discretization method that uses no more than K-1 cut points and is based on Gaussian membership functions and the expected class number. We also propose a new K-ary crisp decision tree induction algorithm for continuous-valued attributes that combines the proposed discretization method with the Gini index. Experimental results and non-parametric statistical tests on 19 real-world datasets showed that the proposed algorithm outperforms four conventional approaches in terms of classification accuracy, tree scale, and particularly tree depth. In terms of the number of nodes, the decision tree produced by the proposed method tends to be more balanced than those produced by the other four methods. The complexity of the proposed algorithm was relatively low.
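
As a rough illustration of how a K-ary partition of a continuous attribute can be scored with the Gini index, the following minimal sketch computes the weighted Gini impurity of the K intervals induced by at most K-1 cut points. It is not the proposed algorithm: the cut points are supplied by hand rather than derived from Gaussian membership functions, and the function names `gini_impurity` and `k_ary_split_gini`, the toy data, and the tie-breaking convention are all assumptions made for this example.

```python
from bisect import bisect_right
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_c^2) over the class labels in one branch."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def k_ary_split_gini(values, labels, cut_points):
    """Weighted Gini impurity after a K-ary split on one continuous attribute.

    K-1 cut points yield K intervals. A value equal to a cut point is placed
    in the interval to its right (an arbitrary convention for this sketch).
    """
    cuts = sorted(cut_points)
    branches = [[] for _ in range(len(cuts) + 1)]
    for value, label in zip(values, labels):
        branches[bisect_right(cuts, value)].append(label)
    n = len(labels)
    # Empty branches contribute nothing to the weighted sum.
    return sum(len(b) / n * gini_impurity(b) for b in branches if b)


# Hypothetical toy data: one continuous attribute, three classes.
values = [1.0, 1.5, 4.0, 4.2, 8.0, 9.1]
labels = ["a", "a", "b", "b", "c", "c"]

print(k_ary_split_gini(values, labels, [3.0, 6.0]))  # 3-ary split, every branch pure
print(k_ary_split_gini(values, labels, [5.0]))       # binary split, left branch mixed
```

On this toy data the 3-ary split separates the classes completely, whereas the binary split leaves one mixed branch. A single multiway node can therefore replace a chain of binary tests, which is one way a K-ary tree can end up shallower.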