A Measure-Theoretic Foundation for Data Quality

Authors: Bronselaer Antoon*; De Mol Robin; De Tre Guy
Source: IEEE Transactions on Fuzzy Systems, 2018, 26(2): 627-639.
DOI:10.1109/TFUZZ.2017.2686807

Abstract

In this paper, a novel framework for data quality is proposed that adopts a measure-theoretic treatment of the problem. Instead of considering a specific setting in which quality must be assessed, our approach takes the concept of measurement as a more formal starting point. The basic assumption of the framework is that the highest possible quality can be described by means of a set of predicates. Quality of data is then measured by evaluating those predicates and combining their evaluations. This combination is based on a capacity function (i.e., a fuzzy measure) that models, for each combination of predicates, the capacity with respect to the quality of the data. It is shown that expressing quality on an ordinal scale yields a high degree of interpretability and a compact representation of the measurement function. Within this purely ordinal framework for measurement, it is shown that reasoning about quality beyond the ordinal level naturally originates from uncertainty about predicate evaluation. It is discussed how the proposed framework is positioned with respect to other approaches, with particular attention to the aggregation of measurements. The practical usability of the framework is discussed for several well-known dimensions of data quality and demonstrated in a use-case study about clinical trials.
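The following is a minimal sketch of the idea described in the abstract, not the authors' implementation. It assumes Boolean predicates and reads the quality level directly off the capacity of the set of satisfied predicates, with capacity values taken from a finite ordinal scale. The scale `LEVELS`, the predicates `complete`, `valid_age` and `valid_email`, and the capacity values are all hypothetical. To illustrate how uncertainty about predicate evaluation can yield more than an ordinal verdict, `quality_distribution` swaps in independent probabilities per predicate. That modeling choice is an assumption of this sketch and not necessarily the uncertainty model used in the paper.

```python
from itertools import combinations, product
from typing import Callable, Dict, FrozenSet, Mapping

# Hypothetical ordinal quality scale, ordered from worst to best.
LEVELS = ["bad", "poor", "fair", "good", "excellent"]

Record = Mapping[str, object]
Predicate = Callable[[Record], bool]
Capacity = Dict[FrozenSet[str], int]  # set of predicate names -> index into LEVELS


def check_capacity(capacity: Capacity, names) -> None:
    """Verify the capacity is defined on every subset, maps the empty set to the
    lowest level and the full set to the highest, and is monotone under inclusion."""
    subsets = {frozenset(c) for r in range(len(names) + 1) for c in combinations(names, r)}
    assert set(capacity) == subsets, "capacity must be defined on every subset"
    assert capacity[frozenset()] == 0
    assert capacity[frozenset(names)] == len(LEVELS) - 1
    for a in subsets:
        for b in subsets:
            if a <= b:
                assert capacity[a] <= capacity[b], f"not monotone: {set(a)} vs {set(b)}"


def quality(record: Record, predicates: Mapping[str, Predicate], capacity: Capacity) -> str:
    """Evaluate every predicate, then read the quality level off the capacity
    of the set of predicates the record satisfies."""
    satisfied = frozenset(name for name, p in predicates.items() if p(record))
    return LEVELS[capacity[satisfied]]


def quality_distribution(probs: Mapping[str, float], capacity: Capacity) -> Dict[str, float]:
    """Assumption: each predicate holds with a known probability, independently of
    the others. Returns the induced probability of each ordinal quality level."""
    names = list(probs)
    dist = {level: 0.0 for level in LEVELS}
    for outcome in product([True, False], repeat=len(names)):
        weight = 1.0
        for name, holds in zip(names, outcome):
            weight *= probs[name] if holds else 1.0 - probs[name]
        satisfied = frozenset(n for n, holds in zip(names, outcome) if holds)
        dist[LEVELS[capacity[satisfied]]] += weight
    return dist


# Hypothetical predicates describing the "highest possible quality" of a person record.
PREDICATES: Dict[str, Predicate] = {
    "complete": lambda r: all(v is not None for v in r.values()),
    "valid_age": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
    "valid_email": lambda r: isinstance(r.get("email"), str) and "@" in r["email"],
}


def S(*names: str) -> FrozenSet[str]:
    """Shorthand for building a set of predicate names."""
    return frozenset(names)


# Hypothetical capacity: completeness together with a plausible age matters most.
CAPACITY: Capacity = {
    S(): 0,
    S("complete"): 1,
    S("valid_age"): 1,
    S("valid_email"): 0,
    S("complete", "valid_age"): 3,
    S("complete", "valid_email"): 2,
    S("valid_age", "valid_email"): 1,
    S("complete", "valid_age", "valid_email"): 4,
}

check_capacity(CAPACITY, list(PREDICATES))
print(quality({"name": "Ann", "age": 34, "email": "ann@example.org"}, PREDICATES, CAPACITY))  # excellent
print(quality({"name": None, "age": 40, "email": "x@y.z"}, PREDICATES, CAPACITY))  # poor
```

Only the order of the capacity values matters here, which mirrors the purely ordinal reading in the abstract. With `quality_distribution({"complete": 1.0, "valid_age": 1.0, "valid_email": 0.5}, CAPACITY)`, half of the probability mass lands on "excellent" and half on "good". This is one way uncertain predicate evaluations can carry information beyond a single ordinal level.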

Publication date: 2018-4