Post-Deployment Anomaly Detection and Diagnosis in Networked Embedded Systems by Program Profiling and Symptom Mining

Authors: Dong, Wei*; Luo, Luyao; Chen, Chun; Bu, Jiajun; Liu, Xue; Liu, Yunhao
Source: IEEE Transactions on Parallel and Distributed Systems, 2016, 27(12): 3588-3601.
DOI: 10.1109/TPDS.2016.2542815

Abstract

Detecting and diagnosing anomalies in networked embedded systems such as sensor networks is difficult because of variable workloads and severe resource constraints. In this paper, we focus on how to aid bug diagnosis after the system has been deployed. We observe that most node-level debugging tools can provide detailed program information inside a node but cannot detect when and where a problem occurs in the network. Conversely, most network-level diagnosis tools can effectively detect a problem at the network level but cannot narrow it down within the node, because they lack detailed program information. To close this gap, we propose D2, a new method for post-deployment anomaly detection and diagnosis in networked embedded systems that combines program profiling and symptom mining. D2 employs binary instrumentation to perform lightweight function count profiling. Based on these statistics, D2 uses a PCA (Principal Component Analysis) based approach to detect network anomalies automatically. Compared with previous methods, D2 points programmers closer to the most likely causes through a novel approach that combines statistical tests and program call graph analysis. We implement our method on TinyOS 2.1.1 and evaluate its effectiveness through case studies in the development of a working sensor network. Results show that our method helps programmers diagnose problems quickly in real-world sensor network systems while incurring an acceptable overhead on the running system.
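
To make the idea of PCA-based detection over function-count statistics concrete, the following is a minimal sketch and not the D2 implementation. The abstract does not state how many components D2 keeps, how it sets thresholds, or which functions it counts, so everything here is an assumption: the three counted functions (`send`, `receive`, `timer_fired`), the sample counts, the use of a single principal component found by power iteration, and the threshold of three times the largest training residual are all hypothetical choices made for illustration.

```python
import math

def center(rows):
    """Subtract the per-column mean; return (centered rows, means)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows], means

def first_component(centered, iters=200):
    """Estimate the top principal direction by power iteration on X^T X."""
    dim = len(centered[0])
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        # w = X^T (X v), computed without forming the covariance matrix.
        proj = [sum(a * b for a, b in zip(row, v)) for row in centered]
        w = [sum(p * row[j] for p, row in zip(proj, centered))
             for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def residual_score(row, means, v):
    """Squared distance from the centered row to the line spanned by v."""
    c = [x - m for x, m in zip(row, means)]
    p = sum(a * b for a, b in zip(c, v))
    return sum((a - p * b) ** 2 for a, b in zip(c, v))

# Hypothetical per-interval call counts: [send, receive, timer_fired].
# Under normal operation the counts scale together with the workload.
normal = [
    [20, 10, 31],
    [40, 21, 60],
    [60, 30, 89],
    [80, 39, 120],
    [30, 15, 46],
]

centered, means = center(normal)
direction = first_component(centered)

train_scores = [residual_score(r, means, direction) for r in normal]
# Assumed rule of thumb, not taken from the paper.
threshold = 3.0 * max(train_scores)

# A node that keeps sending and receiving while its timer rarely fires
# breaks the usual proportion between the counts.
suspect = [60, 30, 10]
anomaly_score = residual_score(suspect, means, direction)

print("threshold:", threshold)
print("suspect score:", anomaly_score)
print("anomalous:", anomaly_score > threshold)
```

The design choice worth noting is that the score measures what the principal component fails to explain. Workload changes move all counts together along the principal direction and leave the residual small, whereas a fault that distorts the proportion between function counts produces a large residual regardless of load.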