A semi-random multiple decision-tree algorithm for mining data streams

作者:Hu Xue Gang*; Li Pei Pei; Wu Xin Dong; Wu Gong Qing
来源:Journal of Computer Science and Technology, 2007, 22(5): 711-724.
DOI:10.1007/s11390-007-9084-9

摘要

Mining with streaming data is a hot topic in data mining. When performing classification on data streams, traditional classification algorithms based on decision trees, such as ID3 and C4.5, have a relatively poor efficiency in both time and space due to the characteristics of streaming data. There are some advantages in time and space when using random decision trees. An incremental algorithm for mining data streams, SRMTDS (Semi-Random Multiple decision Trees for Data Streams), based on random decision trees is proposed in this paper. SRMTDS uses the inequality of Hoeffding bounds to choose the minimum number of split-examples, a heuristic method to compute the information gain for obtaining the split thresholds of numerical attributes, and a Naive Bayes classifier to estimate the class labels of tree leaves. Our extensive experimental study shows that SRMTDS has an improved performance in time, space, accuracy and the anti-noise capability in comparison with VFDTc, a state-of-the-art decision-tree algorithm for classifying data streams.