Abstract

In this article, we propose a new method for computing rare maximal exact matches between multiple sequences. A rare match between k sequences S(1),...,S(k) is a string that occurs at most t(i) times in the sequence S(i), where the t(i) > 0 are user-defined thresholds. First, the suffix tree of one of the sequences (the reference sequence) is built, and the other sequences are then matched separately against this suffix tree. Second, the resulting pairwise exact matches are combined into multiple exact matches. A clever implementation of this method yields a very fast and space-efficient program. This program can be applied to several comparative genomics tasks, such as the identification of synteny blocks between whole genomes.
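To make the definition concrete, the following is a minimal brute-force sketch in Python. It is not the suffix-tree method described above and makes no attempt at its speed or space efficiency: it simply enumerates substrings of the reference sequence, applies the thresholds t(i), and keeps the occurrence tuples that cannot be extended to the left or right. The function names (`occurrences`, `is_maximal`, `rare_multi_mems`), the `min_len` parameter, and the requirement that a match occur at least once in every sequence are assumptions made for this illustration.

```python
from itertools import product


def occurrences(text, pattern):
    """Return every start position of pattern in text (overlaps allowed)."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def is_maximal(seqs, starts, length):
    """True if the match cannot be extended by one character on either side."""
    # Left side: blocked by a sequence boundary or by differing characters.
    left_blocked = any(p == 0 for p in starts) or len(
        {s[p - 1] for s, p in zip(seqs, starts)}
    ) > 1
    # Right side: same test on the character just past the match.
    right_blocked = any(p + length == len(s) for s, p in zip(seqs, starts)) or len(
        {s[p + length] for s, p in zip(seqs, starts)}
    ) > 1
    return left_blocked and right_blocked


def rare_multi_mems(seqs, thresholds, min_len=1):
    """Brute-force rare maximal exact matches among all sequences in seqs.

    seqs[0] plays the role of the reference sequence. Returns a sorted list
    of (length, (start in seqs[0], start in seqs[1], ...)) tuples.
    Assumption: a match must occur at least once, and at most thresholds[i]
    times, in seqs[i].
    """
    ref = seqs[0]
    seen = set()
    results = set()
    for i in range(len(ref)):
        for j in range(i + min_len, len(ref) + 1):
            word = ref[i:j]
            if word in seen:
                continue
            seen.add(word)
            occ = [occurrences(s, word) for s in seqs]
            # Rarity filter: 1 <= number of occurrences <= t(i) in every sequence.
            if any(not o or len(o) > t for o, t in zip(occ, thresholds)):
                continue
            # Each combination of one occurrence per sequence is a candidate.
            for starts in product(*occ):
                if is_maximal(seqs, starts, len(word)):
                    results.add((len(word), starts))
    return sorted(results)
```

For example, `rare_multi_mems(["xabcy", "zabcw", "qabcr"], [1, 1, 1])` reports the single match `abc` of length 3 starting at position 1 in each sequence. With `["abab", "ab"]` and thresholds `[1, 1]` nothing is reported, because `ab` occurs twice in the first sequence; raising the first threshold to 2 admits both occurrences.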

  • Publication date: 2008-5
  • Affiliation: Shanghai Center for Bioinformation Technology