Abstract

With the rapid growth in the amount of data registered in biological sequence databases, there is an increasing need for fast sequence comparison methods that scale to large sequences. Alignment is the usual approach to sequence comparison, but its high time complexity may make it unsuitable for very large sequences such as whole genomes. In this article, we propose a semi-alignment-free method of sequence comparison based on word frequency distributions. The method uses alignment only partially, to measure word frequencies, and combines it with ideas from fuzzy set theory. Experiments with ten bacterial genome sequences demonstrated that the fuzzy measurements make it easier to discriminate between close relatives and distant relatives.
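
The abstract does not state the word length, the distance measure, or exactly how partial alignment and fuzzy membership are combined, so the following is only a minimal sketch of the general idea under our own assumptions, not the authors' method. It counts overlapping DNA words of length `k`. In the fuzzy variant, each occurrence of a word also contributes partially to every word within a small Hamming distance (a gap-free "alignment" of two equal-length words) with hypothetical membership `1 - d/k`. Sequences are then compared by the Euclidean distance between normalised word frequency distributions. The names `word_counts`, `fuzzy_word_counts`, `frequency_vector`, `distance` and the parameter `max_mismatch` are illustrative.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGT"


def all_words(k):
    """All 4**k DNA words of length k, in a fixed lexicographic order."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def word_counts(seq, k):
    """Exact counts of overlapping length-k words; words containing non-ACGT symbols are skipped."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        w = seq[i:i + k]
        if all(c in ALPHABET for c in w):
            counts[w] += 1
    return counts


def fuzzy_word_counts(seq, k, max_mismatch=1):
    """Hypothetical fuzzy counts (an assumption, not from the article):
    each occurrence of word v adds membership 1 - d/k to word w when
    their Hamming distance d is at most max_mismatch."""
    exact = word_counts(seq, k)
    fuzzy = {}
    for w in all_words(k):
        total = 0.0
        for v, c in exact.items():
            d = sum(a != b for a, b in zip(w, v))  # gap-free comparison of equal-length words
            if d <= max_mismatch:
                total += c * (1 - d / k)
        fuzzy[w] = total
    return fuzzy


def frequency_vector(counts, k):
    """Normalise counts to a frequency distribution over all 4**k words."""
    words = all_words(k)
    total = sum(counts.get(w, 0) for w in words)
    return [counts.get(w, 0) / total if total else 0.0 for w in words]


def distance(s1, s2, k, fuzzy=False):
    """Euclidean distance between the (exact or fuzzy) word frequency distributions of s1 and s2."""
    if fuzzy:
        count = lambda s: fuzzy_word_counts(s, k)
    else:
        count = lambda s: word_counts(s, k)
    p = frequency_vector(count(s1), k)
    q = frequency_vector(count(s2), k)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


if __name__ == "__main__":
    a = "ACGTACGTTGCA"
    b = "ACGTACGATGCA"  # one substitution relative to a
    c = "TTTTGGGGCCCC"  # unrelated composition
    for fz in (False, True):
        label = "fuzzy" if fz else "exact"
        print(label, distance(a, b, 3, fz), distance(a, c, 3, fz))
```

Building the full `4**k` word list keeps both vectors aligned on the same coordinates, but it is only practical for small `k`; a genome-scale implementation would need a sparser representation.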

  • Publication date: 2014-8
