Abstract

Detecting concept drifts and reducing the impact of noise are challenging but valuable tasks for inductive learning in real-world data stream applications, especially when the time and space overheads must be kept low. Although many inductive learning algorithms based on ensemble classification models have been proposed for handling concept-drifting data streams, little attention has been paid to detecting the different types of concept drift while also handling the influence of noise. Motivated by this, in this article we present a new lightweight inductive algorithm for concept drift detection based on an ensemble of random decision trees (named CDRDT), which distinguishes various types of concept drift in noisy data streams. CDRDT uses small data chunks of variable size to generate random decision trees incrementally. It then applies the Hoeffding bound inequality and the principle of statistical quality control to detect the different types of concept drift and noise. Extensive studies on synthetic and real streaming data demonstrate that CDRDT can effectively and efficiently detect concept drifts in noisy streaming data. Our algorithm therefore provides a feasible reference framework for classifying concept-drifting data streams with noise.
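The abstract names two statistical tools without showing how they fit together. As a rough illustration only, and not CDRDT's actual detection rule, the sketch below combines a Hoeffding bound with Shewhart-style p-chart control limits on a chunk's classification error rate. The idea is to separate a rise in error large enough to suggest drift from a smaller rise that merely leaves the control band. The function names, the three labels (`stable`, `warning`, `drift`), and the defaults `delta = 0.05` and `k = 3` are hypothetical choices made for this example.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """With probability at least 1 - delta, the true mean of n i.i.d. values
    bounded in an interval of width value_range lies within the returned
    epsilon of the observed mean."""
    if n <= 0 or not 0.0 < delta < 1.0:
        raise ValueError("need n > 0 and 0 < delta < 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def control_limits(p_ref: float, n: int, k: float = 3.0) -> tuple[float, float]:
    """Shewhart p-chart limits (p_ref +/- k * sigma) for an error rate
    measured on chunks of n examples, clipped to [0, 1]."""
    sigma = math.sqrt(p_ref * (1.0 - p_ref) / n)
    return max(0.0, p_ref - k * sigma), min(1.0, p_ref + k * sigma)


def classify_change(p_ref: float, p_new: float, n: int,
                    delta: float = 0.05, k: float = 3.0) -> str:
    """Hypothetical three-way labelling of a new chunk's error rate p_new
    against a reference error rate p_ref (both in [0, 1], so range = 1)."""
    eps = hoeffding_bound(1.0, delta, n)
    _, upper = control_limits(p_ref, n, k)
    if p_new - p_ref > eps:
        # Increase exceeds the Hoeffding bound: treat as a likely concept drift.
        return "drift"
    if p_new > upper:
        # Outside the control band but within the Hoeffding bound: flag it,
        # since it may be noise or the start of a gradual drift.
        return "warning"
    return "stable"


if __name__ == "__main__":
    for p_new in (0.15, 0.17, 0.25):
        print(p_new, classify_change(p_ref=0.1, p_new=p_new, n=200))
```

With a reference error of 0.1 on chunks of 200 examples, the Hoeffding margin is about 0.087 while the upper control limit is about 0.164, so error rates of 0.15, 0.17, and 0.25 fall into the `stable`, `warning`, and `drift` bands respectively. The order of the checks is a design choice of this sketch: the stricter Hoeffding test is applied first so that a large jump is never downgraded to a warning.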