Abstract

Random projection is widely used as a method of dimension reduction. In recent years, its combination with standard techniques of regression and classification has been explored. Here, we examine its use for anomaly detection in high-dimensional settings, in conjunction with principal component analysis (PCA) and corresponding subspace detection methods. We assume a so-called spiked covariance model for the underlying data generation process and a Gaussian random projection. We adopt a hypothesis testing perspective of the anomaly detection problem, with the test statistic defined to be the magnitude of the residuals of a PCA analysis. Under the null hypothesis of no anomaly, we characterize the relative accuracy with which the mean and variance of the test statistic from compressed data approximate those of the corresponding test statistic from uncompressed data. Furthermore, under a suitable alternative hypothesis, we provide expressions that allow for a comparison of statistical power for detection. Finally, whereas these results correspond to the ideal setting in which the data covariance is known, we show that it is possible to obtain the same order of accuracy when the covariance of the compressed measurements is estimated using a sample covariance, as long as the number of measurements is of the same order of magnitude as the reduced dimensionality. We illustrate the practical impact of our results in the context of predicting volume anomalies in Internet traffic data.
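
As a rough illustration of the setup described above, the following is a minimal sketch, not the authors' implementation. It assumes a known spiked covariance, a Gaussian projection scaled by 1/sqrt(p), and a test statistic equal to the squared norm of the PCA residual. The dimensions, spike sizes, scaling convention, and the helper name `pca_residual_stat` are all illustrative assumptions.

```python
import numpy as np

def pca_residual_stat(x, cov, k):
    """Squared norm of the part of x orthogonal to the top-k eigenvectors of cov."""
    _, eigvecs = np.linalg.eigh(cov)      # eigenvalues returned in ascending order
    top = eigvecs[:, -k:]                 # top-k principal directions
    resid = x - top @ (top.T @ x)         # remove the principal-subspace component
    return float(resid @ resid)

rng = np.random.default_rng(0)
n, p, k = 200, 50, 3                      # original dim, reduced dim, spikes (assumed values)

# Spiked covariance: identity plus k high-variance directions (assumed spike sizes).
spikes = np.array([50.0, 30.0, 20.0])
U, _ = np.linalg.qr(rng.standard_normal((n, k)))
sigma = np.eye(n) + U @ np.diag(spikes) @ U.T

# Gaussian random projection; the 1/sqrt(p) scaling is one common convention.
A = rng.standard_normal((p, n)) / np.sqrt(p)
sigma_c = A @ sigma @ A.T                 # covariance of the compressed data y = A x

# One observation drawn under the null hypothesis (no anomaly).
x = rng.multivariate_normal(np.zeros(n), sigma)
q_full = pca_residual_stat(x, sigma, k)          # statistic from uncompressed data
q_comp = pca_residual_stat(A @ x, sigma_c, k)    # statistic from compressed data
print(q_full, q_comp)
```

Repeating the last three lines over many draws gives empirical means and variances of the two statistics, which is the kind of comparison the abstract refers to. Replacing `sigma_c` with a sample covariance of compressed measurements would mimic the estimated-covariance setting.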

  • Publication date: 2013-11