Abstract

With the wide adoption of Hadoop-based big data platforms in industry, monitoring and maintaining Hadoop clusters has become increasingly critical, which brings high demand for automatic or semi-automatic health checking and diagnostics. Anomaly detection plays an essential role in monitoring Hadoop clusters and provides the basis for automatic alert generation. Most existing anomaly detection methods for Hadoop clusters are rule-based and rely on characterizing anomalies with domain knowledge and an understanding of the specific target system, which creates barriers to practical application in complex business platforms. In this paper, we leverage statistical analysis to discover critical components in execution data collected through the Hadoop JobTracker. A two-stage anomaly detection method is proposed by integrating PCA and DBSCAN. This method requires no prior knowledge of the target nodes and jobs. Experiments are conducted to detect buggy jobs based on real execution data collected from three Hadoop clusters operated by one of the largest e-commerce companies in the world. The results show that our method effectively detects anomalies without specifying the features of anomalies in advance.
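To make the two-stage idea concrete, the following is a minimal sketch of a PCA-then-DBSCAN pipeline, where jobs that DBSCAN marks as density noise are reported as anomalies. The abstract does not give the paper's parameter settings, so the feature matrix, component count, and `eps`/`min_samples` values below are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch of a two-stage detection pipeline: PCA followed by
# DBSCAN. All parameter values (n_components, eps, min_samples) and the
# synthetic data are illustrative assumptions, not the paper's settings.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

def detect_anomalies(job_metrics, n_components=3, eps=0.5, min_samples=5):
    """Return a boolean mask marking jobs flagged as anomalous.

    job_metrics: (n_jobs, n_features) array of per-job execution
    metrics, e.g. values collected through the Hadoop JobTracker.
    """
    # Stage 1: standardize, then project onto the leading principal
    # components to suppress correlated and noisy dimensions.
    scaled = StandardScaler().fit_transform(job_metrics)
    reduced = PCA(n_components=n_components).fit_transform(scaled)

    # Stage 2: density-based clustering. DBSCAN labels points that fall
    # in no dense region as noise (-1); these are reported as anomalies,
    # so no prior characterization of "anomalous" is required.
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(reduced)
    return labels == -1

# Toy usage: a tight cloud of "normal" jobs plus a few scattered
# "buggy" jobs with very different metric values.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 0.5, size=(200, 10))
buggy = rng.uniform(5.0, 10.0, size=(5, 10))
mask = detect_anomalies(np.vstack([normal, buggy]))
print(f"{int(mask.sum())} of {mask.size} jobs flagged as anomalous")
```

The appeal of treating DBSCAN noise points as anomalies, as the abstract suggests, is that nothing about anomalous behavior has to be specified in advance: only dense regions of normal behavior are learned from the data itself.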

Full Text