Abstract

Data streams have recently been studied extensively because they arise in many applications, such as sensor networks, web click streams, and network flows. One of the most important challenges in data streams is concept change, where the underlying data distribution changes over time. The vast majority of research on data stream mining is devoted to labeled data, whereas in real-world practice labels are rarely available to the learning algorithm. Moreover, most methods that detect changes in unlabeled data streams handle only numerical data, and they face considerable difficulty as the dimensionality of the data increases. In this paper, we present a Precise Statistical approach for Concept Change Detection in unlabeled data streams, abbreviated as PSCCD, which detects changes using an exchangeability test. This hypothesis test is derived from a martingale and is based on Doob's Maximal Inequality. The advantages of our approach are threefold. First, it does not require a sliding window over the data stream, whose size is a well-known tuning difficulty. Second, it works well on multi-dimensional data streams. Third, it is applicable to different types of data, including categorical, numerical, and mixed-attribute data streams. To evaluate these advantages, we conduct extensive experiments under different settings and specifications. The results are promising.
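To give a rough sense of how a martingale-based exchangeability test can flag a change without a sliding window, here is a minimal Python sketch. It is not the PSCCD algorithm itself, whose details the abstract does not give. It uses a generic randomized power martingale over rank-based p-values, and the function name `detect_change`, the strangeness measure (distance to the running mean, numerical data only), and the values of `epsilon` and `threshold` are all illustrative assumptions. Under exchangeability the martingale starts at 1 and is non-negative, so Doob's Maximal Inequality bounds the probability that it ever reaches `threshold` by `1 / threshold`.

```python
import random


def detect_change(stream, epsilon=0.92, threshold=20.0, rng=None):
    """Return the index of the first alarm, or None if no alarm is raised.

    Illustrative sketch only: strangeness is the distance to the running
    mean, which is an assumption and works for 1-D numerical data only.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so runs are repeatable
    seen = []
    martingale = 1.0  # starts at 1, stays non-negative
    for t, x in enumerate(stream):
        seen.append(x)
        center = sum(seen) / len(seen)
        # Strangeness of every point seen so far, including the new one.
        scores = [abs(v - center) for v in seen]
        newest = scores[-1]
        greater = sum(1 for a in scores if a > newest)
        equal = sum(1 for a in scores if a == newest)  # counts the new point
        # Randomized p-value: ties are broken by a uniform draw.
        p = (greater + rng.random() * equal) / len(scores)
        p = max(p, 1e-12)  # guard against a zero draw
        # Power martingale update: small p-values make the value grow.
        martingale *= epsilon * p ** (epsilon - 1.0)
        if martingale >= threshold:
            return t
    return None
```

The whole history contributes to each p-value, so no window size has to be chosen. A practical detector would also reset the martingale after an alarm and would need a strangeness measure suited to categorical or mixed-attribute data, neither of which this sketch attempts.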

  • Publication date: 2011-8