Network-Wide Anomaly Event Detection and Diagnosis With perfSONAR

Authors: Zhang Yuanxun*; Debroy Saptarshi; Calyam Prasad
Source: IEEE Transactions on Network and Service Management, 2016, 13(3): 666-680.
DOI: 10.1109/TNSM.2016.2546943

Abstract

High-performance computing (HPC) environments supporting data-intensive applications need multidomain network performance measurements from open frameworks such as perfSONAR. Network-wide correlated anomaly events that impact data throughput must be detected and reported quickly and accurately, along with a root-cause analysis for remediation. In this paper, we present a novel network anomaly event detection and diagnosis scheme for network-wide visibility that improves the accuracy of root-cause analysis. We address analysis limitations that arise when complete network topology information is unavailable and when measurement probes are mis-calibrated, which leads to erroneous diagnosis. Our proposed scheme fuses perfSONAR time-series path measurement data from multiple domains using principal component analysis (PCA), transforming the data for accurate detection of correlated and uncorrelated anomaly events. We quantify the certainty of such detection using a measurement data sanity check that involves: 1) measurement data reputation analysis to qualify the measurement samples and 2) a filter framework to prune potentially misleading samples. Lastly, using actual perfSONAR one-way delay measurement traces, we show our proposed scheme's effectiveness in diagnosing the root cause of critical network performance anomaly events.
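
To make the PCA step in the abstract concrete, here is a minimal sketch of generic PCA residual-subspace anomaly scoring on a matrix of path measurements. It is not the paper's algorithm: the function names `pca_residual_scores` and `flag_anomalies`, the choice of `k`, and the median/MAD threshold are all assumptions made for illustration, and the reputation analysis and filter framework are not modeled.

```python
import numpy as np

def pca_residual_scores(X, k):
    """Squared prediction error (SPE) of each time step.

    X: (timesteps, paths) array, e.g. one-way delay per path (assumed layout).
    k: number of principal components treated as the "normal" subspace
       (an illustrative parameter, not a value taken from the paper).
    """
    Xc = X - X.mean(axis=0)                      # center each path's series
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    P = vt[:k].T                                 # (paths, k) top-k principal axes
    residual = Xc - Xc @ P @ P.T                 # part outside the normal subspace
    return np.sum(residual ** 2, axis=1)         # one score per time step

def flag_anomalies(scores, n_sigmas=3.0):
    """Flag time steps whose score exceeds median + n_sigmas * scaled MAD.

    The median/MAD rule is an assumed, simple robust threshold.
    """
    med = np.median(scores)
    mad = np.median(np.abs(scores - med))
    return scores > med + n_sigmas * 1.4826 * mad
```

Measurements that move together across paths are absorbed by the top `k` components, so a time step with a large residual score behaves unlike the shared pattern and is a candidate anomaly event. In practice `k` and the threshold would need tuning against real traces.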

  • Publication date: 2016-9