摘要

The Individual Haplotyping MFR problem is a computational problem that, given a set of DNA sequence fragment data of an individual, induces the corresponding haplotypes by dropping the minimum number of fragments. Bafna, Istrail, Lancia, and Rizzi proposed an algorithm of time O(2(2k)m(2)n + 2(3k)m(3)) for the problem, where m is the number of fragments, n is the number of SNP sites, and k is the maximum number of holes in a fragment. When there are mate-pairs in the input data, the parameter k can be as large as 100, which would make the Bafna-Istrail-Lancia-Rizzi algorithm impracticable. The current paper introduces a new algorithm PM-MFR of running time O(nk(2)3(k2) + m logm + mk(1)), where k(1) is the maximum number of SNP sites that a fragment covers (k(1) is smaller than n), and k(2) is the maximum number of fragments that cover a SNP site (k(2) is usually about 10). Since the time complexity of the algorithm PM-MFR is not directly related to the parameter k, the algorithm solves the Individual Haplotyping MFR problem with mate-pairs more efficiently and is more practical in real biological applications.