Abstract

Discretization is the process of converting continuous attributes into a discrete format so that signals can be represented for further data processing in learning systems. The main concern in discretization is to find an optimal representation of continuous values, using a limited number of intervals, that effectively characterizes the data while minimizing information loss. In this paper, we propose a novel class-attribute interdependency discretization algorithm, termed NCAIC, which takes into account the data distribution and the interdependency between all classes and attributes. In our solution, the upper approximation of rough sets is applied as a core part of the discretization algorithm, and class-attribute mutual information is used to automatically control and adjust the scope of discretization of continuous attributes. We report experiments comparing NCAIC with five other discretization algorithms on 13 benchmark datasets from the UCI database, using the well-known C4.5 decision tree as the classifier. The results demonstrate that, in general, our proposed algorithm outperforms the other tested discretization algorithms in terms of classification performance.
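To make the notion of class-attribute mutual information concrete, the sketch below computes the standard Shannon mutual information I(C; D), in bits, between the class variable C and a discretized attribute D. It works from a class-by-interval count table (a "quanta matrix"). This is only an illustration under stated assumptions: the abstract does not give NCAIC's exact formula, and the code is not the authors' method. It omits the rough-set upper approximation entirely, and the helper names `quanta_matrix` and `class_attribute_mutual_information` are hypothetical.

```python
import bisect
import math
from typing import Hashable, List, Sequence


def quanta_matrix(values: Sequence[float], labels: Sequence[Hashable],
                  cut_points: Sequence[float],
                  classes: Sequence[Hashable]) -> List[List[int]]:
    """Hypothetical helper: count samples per (class, interval).

    cut_points must be sorted ascending. A value v falls in interval r,
    where r is the number of cut points <= v. Values equal to a cut point
    therefore go to the interval on its right.
    """
    n_intervals = len(cut_points) + 1
    row_of = {c: i for i, c in enumerate(classes)}
    q = [[0] * n_intervals for _ in classes]
    for v, y in zip(values, labels):
        r = bisect.bisect_right(cut_points, v)
        q[row_of[y]][r] += 1
    return q


def class_attribute_mutual_information(quanta: Sequence[Sequence[int]]) -> float:
    """Shannon mutual information I(C; D) in bits from a quanta matrix.

    quanta[i][r] is the number of samples of class i in interval r.
    """
    total = sum(sum(row) for row in quanta)
    if total == 0:
        return 0.0
    class_totals = [sum(row) for row in quanta]
    interval_totals = [sum(col) for col in zip(*quanta)]
    mi = 0.0
    for i, row in enumerate(quanta):
        for r, count in enumerate(row):
            if count == 0:
                continue  # by convention 0 * log 0 contributes nothing
            p_joint = count / total
            p_class = class_totals[i] / total
            p_interval = interval_totals[r] / total
            mi += p_joint * math.log2(p_joint / (p_class * p_interval))
    return mi
```

As a usage example, take values `[1, 2, 3, 4]` with labels `['a', 'a', 'b', 'b']`. A single cut at 2.5 separates the classes perfectly, giving the quanta matrix `[[2, 0], [0, 2]]` and 1 bit of mutual information. With no cut points, every sample lands in one interval and the mutual information is 0. A discretizer guided by this quantity can therefore prefer cut points that preserve more class information.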