Data Inconsistency Evaluation for Cyberphysical System

Authors: Wang, Hao*; Li, Jianzhong; Gao, Hong
Source: International Journal of Distributed Sensor Networks, 2016, 12(8): 9496878.
DOI: 10.1177/155014779496878

Abstract

Cyberphysical systems (CPSs) are widely used to collect data, but the data they collect is often dirty in practice. We study how to evaluate data inconsistency, a major concern when assessing the quality of data and of its source. This paper is the first study of the data inconsistency evaluation problem for CPSs based on conditional functional dependencies (CFDs). Given a database instance D of n tuples and a set Σ of r CFDs, data inconsistency is defined as the ratio of the size of a minimum culprit in D to n, where a culprit is a set of tuples leading to integrity errors. First, we analyze the complexity and inapproximability of the minimum culprit problem. Then, we provide a practical algorithm, based on the independent residual subgraph, that gives a 2-approximation of the data dirtiness in O(rn log n) time. To handle large dynamic data, we provide a compact B-tree-based structure for storing the independent residual subgraph so that inconsistency can be updated efficiently. Finally, we test our algorithm on both synthetic and real-life datasets; the experimental results show the scalability of our algorithm and the quality of the evaluation result.
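
To make the definition concrete, the following is a minimal sketch and not the paper's independent-residual-subgraph algorithm. It makes several assumptions that are not stated in the abstract: each dependency is a plain functional dependency (a CFD whose pattern is all wildcards), violations occur only between pairs of tuples, and a culprit is read as a set of tuples whose removal leaves no violation. Under those assumptions a culprit is a vertex cover of the conflict graph, so the classic maximal-matching heuristic yields a set at most twice the minimum size. The names `violating_pairs`, `approx_culprit`, and `inconsistency` are hypothetical, and the pair enumeration is quadratic in the worst case rather than O(rn log n).

```python
from collections import defaultdict


def violating_pairs(tuples, lhs, rhs):
    """Return index pairs (i, j), i < j, that agree on lhs but differ on rhs.

    Assumption: the dependency is a plain FD lhs -> rhs (no constant patterns).
    """
    groups = defaultdict(list)
    for idx, t in enumerate(tuples):
        groups[tuple(t[a] for a in lhs)].append(idx)

    pairs = []
    for members in groups.values():
        for x in range(len(members)):
            for y in range(x + 1, len(members)):
                i, j = members[x], members[y]
                if tuples[i][rhs] != tuples[j][rhs]:
                    pairs.append((i, j))
    return pairs


def approx_culprit(pairs):
    """Maximal-matching vertex cover: at most twice the minimum cover size."""
    cover = set()
    for i, j in pairs:
        # Take both endpoints of every conflict not yet covered.
        if i not in cover and j not in cover:
            cover.update((i, j))
    return cover


def inconsistency(tuples, dependencies):
    """Estimate |culprit| / n using the approximate culprit."""
    if not tuples:
        return 0.0
    pairs = []
    for lhs, rhs in dependencies:
        pairs.extend(violating_pairs(tuples, lhs, rhs))
    return len(approx_culprit(pairs)) / len(tuples)


# Illustrative data (made up for this sketch, not from the paper).
rows = [
    {"cc": "44", "zip": "EH4", "street": "Mayfield"},
    {"cc": "44", "zip": "EH4", "street": "Crichton"},
    {"cc": "44", "zip": "EH4", "street": "Mayfield"},
    {"cc": "01", "zip": "07974", "street": "Mtn Ave"},
]
deps = [(("cc", "zip"), "street")]
print(inconsistency(rows, deps))
```

In this toy instance the smallest culprit is the single tuple at index 1, while the sketch selects tuples 0 and 1, which illustrates the factor-of-two gap that a 2-approximation permits.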