摘要

The focus of this paper is the frequent gene team problem. Given a quorum parameter mu and a set of m genomes, the problem is to find gene teams that occur in at least R. of the given genomes. In this paper, a new algorithm is presented. Previous solutions are efficient only when mu is small. Unlike previous solutions, the presented algorithm does not rely on examining every combination of mu genomes. Its time complexity is independent of mu. Under some realistic assumptions, the practical running time is estimated to be O(m(2)n(2) lg n), where n is the maximum length of the input genomes. Experiments showed that the presented algorithm is extremely efficient. For any mu, it takes less than 1 second to process 100 bacterial genomes and takes only 10 minutes to process 2,000 genomes. The presented algorithm can be used as an effective tool for large scale genome analyses.