摘要

The occurrence of outliers in real-world phenomena is quite usual. If these anomalous data are not properly treated, unreliable models can be generated. Many approaches in the literature are focused on a posteriori detection of outliers. However, a new methodology to a priori predict the occurrence of such data is proposed here. Thus, the main goal of this work is to predict the occurrence of outliers in time series, by using, for the first time, imbalanced classification techniques. In this sense, the problem of forecasting outlying data has been transformed into a binary classification problem, in which the positive class represents the occurrence of outliers. Given that the number of outliers is much lower than the number of common values, the resultant classification problem is imbalanced. To create training and test sets, robust statistical methods have been used to detect outliers in both sets. Once the outliers have been detected, the instances of the dataset are labeled accordingly. Namely, if any of the samples composing the next instance are detected as an outlier, the label is set to one. As a study case, the methodology has been tested on electricity demand time series in the Spanish electricity market, in which most of the outliers were properly forecast.

  • 出版日期2016-9