Abstract

Public clouds are computing platforms on which scalable and elastic information technology (IT)-enabled capabilities are provided as a service to external customers using Internet technologies. Using public cloud services can reduce costs and broaden the choice of technologies, but it also means that users have only limited information about the underlying system. Anomaly detection at the user end must therefore be non-intrusive, which makes it difficult. This is especially true during DevOps operations, because the effects of anomalies and of these operations are often indistinguishable, so anomalies are easily masked. In this paper, we focus on a successful public cloud, Amazon Web Services (AWS), and on a representative DevOps operation, the rolling upgrade, and we report an anomaly detection approach that detects anomalies effectively in this setting. Our approach requires only the metrics data and logs that most public clouds officially provide. We use support vector machines (SVMs) to train multiple classifiers from monitored data, one for each system environment, and use the log information to select the most suitable classifier. Moreover, our detection looks for anomalies in every time interval, called a window, so the features include not only selected indicative performance metrics but also the entropy and the moving average of the metrics data in each window. Our experimental evaluation systematically demonstrates the effectiveness of our approach.
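The pipeline described above (per-window features, one SVM classifier per system environment, and log-driven classifier selection) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not define how entropy or the moving average is computed, which SVM variant or kernel is used, or which log entries identify the environment, so this sketch assumes: Shannon entropy over equal-width bins of a window's samples; a moving average over the last `k` samples of the window; a linear SVM trained with Pegasos (stochastic subgradient descent on the L2-regularized hinge loss), swapped in so the example needs only the Python standard library (a library SVM such as scikit-learn's `SVC` could be used instead); and a hypothetical keyword rule on log lines. All function names, parameters, and environment labels are hypothetical.

```python
import math
import random
from collections import Counter


def shannon_entropy(values, num_bins=10):
    """Shannon entropy (bits) of one window, using equal-width bins (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant window carries no information
    width = (hi - lo) / num_bins
    # The maximum value would land in bin num_bins; clamp it into the last bin.
    counts = Counter(min(int((v - lo) / width), num_bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def moving_average(values, k=3):
    """Mean of the last k samples of the window (assumed definition)."""
    tail = values[-k:]
    return sum(tail) / len(tail)


def window_features(metric_series, window_size):
    """Build one feature vector per non-overlapping window.

    metric_series maps a metric name to an equally long list of samples.
    For each metric, the vector holds [latest sample, entropy, moving average].
    """
    names = sorted(metric_series)
    length = len(metric_series[names[0]])
    vectors = []
    for start in range(0, length - window_size + 1, window_size):
        vec = []
        for name in names:
            w = metric_series[name][start:start + window_size]
            vec.extend([w[-1], shannon_entropy(w), moving_average(w)])
        vectors.append(vec)
    return vectors


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM trained with Pegasos (SGD on the L2-regularized hinge loss).

    Labels must be -1 (normal) or +1 (anomalous). A constant 1.0 is appended
    to each sample so the last weight acts as a (regularized) bias term.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink
            if margin < 1.0:  # hinge loss is active: step toward the sample
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (anomaly) or -1 (normal) for one feature vector."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score > 0 else -1


def environment_from_logs(log_lines, upgrade_keyword):
    """Hypothetical rule: any log line containing the keyword means an upgrade is running."""
    if any(upgrade_keyword in line for line in log_lines):
        return "rolling_upgrade"
    return "normal"


def detect(classifiers, log_lines, upgrade_keyword, features):
    """Select the classifier for the current environment, then classify the window."""
    env = environment_from_logs(log_lines, upgrade_keyword)
    return predict(classifiers[env], features)
```

In use, one would call `window_features` on labeled monitoring data collected under each environment, train one classifier per environment with `train_linear_svm`, store the classifiers in a dict keyed by the environment labels, and call `detect` once per new window. Keeping a separate classifier for the upgrade environment reflects the abstract's point that upgrade-induced changes in the metrics would otherwise look like anomalies to a single model.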