Abstract

Motif comparison plays a key role in clustering redundant motifs and in mapping motifs to transcription factors using previously characterized motif databases. Most existing algorithms decompose the similarity of two motifs into the sum of the similarities of their aligned positions, under a position-independence assumption. However, such a comparison is unreasonable when the two motifs differ greatly in length. In this paper, we present a novel feature extraction method that extracts statistical information about per-position information content and pairwise nucleotide dependencies. We then combine these two kinds of information into a single formula, called the probability similarity scoring schema (PS3). Results on a simulated dataset generated from the JASPAR database demonstrate that our method outperforms other methods, and experiments on a real dataset from human kidney tissue show that our method finds many motifs that occur not only in human tissue but also in related species such as mouse and rat. This suggests that it is a possible approach for elucidating DNA motifs by exploiting cross-species sequence conservation.
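To make the first kind of feature more concrete, the sketch below computes the standard per-position information content of a DNA motif (2 bits minus the Shannon entropy of the column, assuming a uniform background) and reduces the resulting profile to a mean and a variance, which do not depend on motif length. This is only an illustration: the function names, the pseudocount, and the choice of mean and variance as summary statistics are assumptions, not the PS3 formula itself, which the abstract does not specify.

```python
import math


def column_information_content(counts, pseudocount=0.5):
    """Information content in bits of one motif column.

    `counts` holds the A, C, G, T counts for the column. A uniform
    background is assumed, so the maximum is 2 bits. The pseudocount
    is an illustrative choice, not a value taken from the paper.
    """
    total = sum(counts) + 4 * pseudocount
    if total <= 0:
        raise ValueError("column has no counts")
    probs = [(c + pseudocount) / total for c in counts]
    # Terms with p == 0 contribute nothing to the entropy.
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 - entropy


def information_profile(pfm, pseudocount=0.5):
    """Per-position information content for a motif given as a list of columns."""
    return [column_information_content(col, pseudocount) for col in pfm]


def summarize_profile(profile):
    """Hypothetical length-independent summary: mean and variance of the profile."""
    n = len(profile)
    mean = sum(profile) / n
    variance = sum((x - mean) ** 2 for x in profile) / n
    return mean, variance


# Toy motif: one uninformative column and one fully conserved column.
toy_pfm = [[5, 5, 5, 5], [20, 0, 0, 0]]
toy_profile = information_profile(toy_pfm, pseudocount=0.0)
toy_summary = summarize_profile(toy_profile)
```

Because the summary is computed over the whole profile instead of over aligned positions, two motifs of different lengths yield feature vectors of the same size, which is the property that motivates extracting features before comparing motifs.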

Full Text