A New Measure for Similarity Searching in DNA Sequences

Authors: Zhang Yusen*; Chen Wei
Source: MATCH-Communications in Mathematical and in Computer Chemistry, 2011, 65(2): 477-488.

Abstract

The present study develops a new mathematical model for comparing DNA sequences. In place of the classical distances, it introduces a new distance based on the absolute frequency of dinucleotides in large DNA sequences. The proposed distance, which requires neither homologous sequences nor prior sequence alignment, is used to search a database for similar sequences. The method was tested on a set of 39 DNA sequences and a set of 63 DNA sequences. Sensitivity and selectivity are computed to evaluate and compare the performance of the proposed distance measure. Analysis of real data shows that it is an efficient, highly selective, and highly sensitive comparison algorithm that can rapidly determine the relative dissimilarity of DNA sequences in a large dataset.
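
The abstract does not state the distance formula, so the following is only a minimal sketch of the general idea of an alignment-free search over dinucleotide counts. It assumes overlapping dinucleotides over the alphabet A, C, G, T, and it uses a plain Euclidean distance between count vectors as a stand-in, not the distance proposed in the paper. The names `dinucleotide_counts`, `distance`, and `search` are hypothetical.

```python
from collections import Counter
from itertools import product
import math

# The 16 possible dinucleotides over A, C, G, T, in a fixed order.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_counts(seq):
    """Return the absolute counts of overlapping dinucleotides in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    return [counts.get(d, 0) for d in DINUCLEOTIDES]


def distance(a, b):
    """Euclidean distance between count vectors.

    Assumption: a placeholder metric, not the paper's proposed distance.
    """
    va, vb = dinucleotide_counts(a), dinucleotide_counts(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))


def search(query, database, top=3):
    """Rank database entries (name -> sequence) by distance to the query."""
    ranked = sorted(database.items(), key=lambda kv: distance(query, kv[1]))
    return [name for name, _ in ranked[:top]]
```

Because each sequence is reduced to a 16-element vector, no alignment step is involved, and comparing a query against every database entry costs time linear in the total sequence length.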