Discovering Conservation Rules

Authors: Golab Lukasz*; Karloff Howard; Korn Flip; Saha Barna; Srivastava Divesh
Source: IEEE Transactions on Knowledge and Data Engineering, 2014, 26(6): 1332-1348.
DOI: 10.1109/TKDE.2012.171

Abstract

Many applications process data in which there exists a "conservation law" between related quantities. For example, in traffic monitoring, every incoming event, such as a packet's entering a router or a car's entering an intersection, should ideally have an immediate outgoing counterpart. We propose a new class of constraints, Conservation Rules, that express the semantics and characterize the data quality of such applications. We give confidence metrics that quantify how strongly a conservation rule holds and present approximation algorithms (with error guarantees) for the problem of discovering a concise summary of subsets of the data that satisfy a given conservation rule. Using real data, we demonstrate the utility of conservation rules and we show order-of-magnitude performance improvements of our discovery algorithms over naive approaches.

  • Publication date: 2014-6
  • Affiliation: AT&T Labs