AN EFFICIENT ENTITY RESOLUTION METHOD FOR LARGE RELATIONS

Authors: Li, Yakun*; Wang, Hongzhi; Gao, Hong; Li, Jianzhong
Source: International Journal of Cooperative Information Systems, 2013, 22(1): UNSP 1350006.
DOI:10.1142/S0218843013500068

Abstract

Entity resolution (ER) is the task of finding the data objects that refer to the same real-world entity. When ER is performed on relations, the crucial operator is record matching, which judges whether two tuples refer to the same real-world entity. Record matching is a long-standing problem, but with the massive and complex data found in today's applications, current methods cannot meet the requirements. We present sequence-rule-based record matching (SeRe-Matching), which takes into account both which attributes should be used in record matching and how important each of them is. We also modify the Bloom filter, which greatly increases checking speed. In the best case, the algorithm reduces the complexity of entity resolution to O(n). Extensive experiments were performed to evaluate our methods.
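To illustrate how a Bloom filter can speed up the checking step in a one-pass matcher, the sketch below uses a *standard* Bloom filter as a fast negative check in front of an exact key lookup. It is not the modified Bloom filter described in the paper, whose details are not given here. The ordered list `RULES`, the helpers `rule_key` and `resolve`, and the exact matching on normalized attribute values are all hypothetical assumptions for illustration; SeRe-Matching's actual rules and attribute weighting are not reproduced.

```python
import hashlib
from typing import Dict, List, Optional, Sequence, Tuple


class BloomFilter:
    """Standard Bloom filter: m bits, k positions per key (no false negatives)."""

    def __init__(self, m: int = 1 << 16, k: int = 4) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, key: str) -> List[int]:
        # Double hashing: derive k bit positions from one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, key: str) -> bool:
        # False means "definitely never added"; True may be a false positive.
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))


# Hypothetical matching rules, ordered from most to least important.
RULES: List[Tuple[str, ...]] = [("email",), ("name", "birth_date")]


def rule_key(index: int, rule: Sequence[str], record: Dict[str, str]) -> Optional[str]:
    """Build a normalized key for one rule, or None if any attribute is missing."""
    values = [str(record.get(attr, "")).strip().lower() for attr in rule]
    if any(v == "" for v in values):
        return None
    return f"{index}|" + "|".join(values)


def resolve(records: List[Dict[str, str]],
            rules: Sequence[Tuple[str, ...]] = RULES) -> List[int]:
    """Assign an entity id to every record in a single pass (hypothetical helper)."""
    bloom = BloomFilter()
    exact: Dict[str, int] = {}  # confirms Bloom hits, filtering out false positives
    entity_ids: List[int] = []
    next_id = 0
    for record in records:
        keys: List[str] = []
        for i, rule in enumerate(rules):
            key = rule_key(i, rule, record)
            if key is not None:
                keys.append(key)
        match: Optional[int] = None
        for key in keys:  # try rules in order of importance
            if bloom.might_contain(key) and key in exact:
                match = exact[key]
                break
        if match is None:  # no rule matched: start a new entity
            match = next_id
            next_id += 1
        for key in keys:
            bloom.add(key)
            exact.setdefault(key, match)
        entity_ids.append(match)
    return entity_ids


if __name__ == "__main__":
    people = [
        {"email": "a@x.org", "name": "Ann Lee", "birth_date": "1990-01-01"},
        {"email": "A@x.org ", "name": "Ann Li", "birth_date": "1990-01-01"},
        {"email": "", "name": "ann lee", "birth_date": "1990-01-01"},
        {"email": "b@y.org", "name": "Bob Roe", "birth_date": "1985-05-05"},
    ]
    print(resolve(people))  # [0, 0, 0, 1]
```

Each record is visited once and each rule costs a constant number of hash operations, so this sketch runs in expected linear time in the number of records. Most lookups for unseen keys stop at the Bloom filter, and only its positives reach the exact store. Exact-key matching is far stricter than similarity-based record matching, so the sketch conveys only the shape of a one-pass, rule-ordered approach.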
