Abstract

As one of the basic research methods of bioinformatics, DNA motif finding is of great significance to the study of gene expression regulation mechanisms and the discovery of biologically functional sites. However, because DNA data are highly sensitive, the risk of privacy disclosure during motif finding has become a bottleneck in genetic research. Meanwhile, traditional privacy-preserving data mining methods cannot handle DNA sequences directly, and existing private motif finding methods usually reduce the utility of the results. To solve these problems, we propose a high-utility motif finding algorithm based on ε-differential privacy, a rigorous privacy definition that provides meaningful guarantees even in the presence of arbitrary external information. Our solution uses the closed frequent pattern set to reduce redundant motifs in the result set and to obtain accurate motif results while satisfying ε-differential privacy. Furthermore, a post-processing method based on the best linear unbiased estimate is used to improve the utility of the noisy consolidated motif supports. Experiments on real-life DNA sequence datasets confirm that our algorithm outperforms existing algorithms in terms of utility.