摘要

When a table containing individual data is published, disclosure of sensitive information should be prohibitive. Since simply removing identifiers such as name and social security number may reveal the sensitive information by linking attacks which joins the published table with other tables on some attributes, the notion of k-anonymity which makes each record in the table be indistinguishable with k-1 other records by suppression or generalization has been proposed previously. It is shown to be NP-hard to k-anonymize a table minimizing information loss. The approximation algorithms with up to O(k) approximation ratio were proposed when generalization is used for anonymization. In this paper, we propose several approximation algorithms for k-anonymity with generalizing the attribute values by hierarchies that guarantee O(logk) approximation ratio and perform significantly better than the traditional algorithms. Since suppression of attributes is a special case of generalization of attributes with the hierarchies of two-level trees where the root nodes are '*' character, our approximation result works also for suppression methods. We next provide O(beta logk) approximate algorithms which gracefully adjust their running time according to the tolerance beta(>= 1) of the approximation ratios. We also present the approximate algorithms for both k-anonymity and C diversity with generalizing the attribute values by hierarchies. Experimental results confirm that our approximation algorithms perform significantly etter than traditional approximation algorithms.

  • 出版日期2010-12