Using Markov model to improve word normalization algorithm for biological sequence comparison

Dai, Qi<sup>*</sup>; Liu, Xiaoqing; Yao, Yuhua; Zhao, Fukun

doi:10.1007/s00726-011-0906-2

Abstract

Statistical measures for sequence comparison face two crucial problems: the overlapping structure of words and the background information of words in biological sequences. Word normalization in the improved composition vector method takes these problems into account and achieves better performance in evolutionary analysis. This word normalization is desirable but not sufficient, because it assumes that the four bases A, C, T, and G occur randomly with equal probability. This paper proposes an improved word normalization that uses a Markov model to estimate the exact k-word distribution from the observed biological sequence, and can therefore adjust for the background information of the k-word frequencies in that sequence. The improved word normalization was tested in three experiments and compared with the existing word normalization. The experimental results confirm that the improved word normalization, which uses a Markov model to estimate the exact k-word distribution in biological sequences, is more efficient.
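To illustrate the idea of estimating a k-word background from the observed sequence rather than from equiprobable bases, here is a minimal Python sketch. It is not the paper's estimator, which the abstract does not spell out; it assumes a plain order-`order` Markov chain fitted by counting overlapping words, and the function names `count_words` and `markov_expected_prob` and the `order` parameter are hypothetical.

```python
from collections import Counter


def count_words(seq, k):
    """Count overlapping k-words in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def markov_expected_prob(seq, word, order=1, alphabet="ACGT"):
    """Estimate P(word) under an order-`order` Markov chain fitted to seq.

    Assumption (not from the paper): the start probability is the observed
    frequency of the first `order` letters, and each transition probability
    is estimated from counts of overlapping (order + 1)-words.
    """
    k = len(word)
    if order < 1 or k <= order or len(seq) <= order:
        raise ValueError("need 1 <= order < len(word) and len(seq) > order")

    context_counts = count_words(seq, order)
    transition_counts = count_words(seq, order + 1)

    # Probability of the initial context, from observed frequencies.
    prob = context_counts[word[:order]] / (len(seq) - order + 1)

    # Multiply by one transition probability per remaining letter.
    for i in range(order, k):
        context = word[i - order:i]
        total = sum(transition_counts[context + b] for b in alphabet)
        if total == 0:
            return 0.0  # context never followed by any letter in seq
        prob *= transition_counts[context + word[i]] / total
    return prob


if __name__ == "__main__":
    seq = "ACGTACGTACGT"
    uniform = 0.25 ** 3  # background under the equal-chance assumption
    print("uniform:", uniform)
    print("markov :", markov_expected_prob(seq, "ACG", order=1))
```

On a strongly periodic toy sequence such as the one above, the Markov estimate for a word that actually recurs differs sharply from the equal-chance value, which is the kind of background adjustment the abstract describes.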

Publication date: 2012-5
Affiliations: Zhejiang Sci-Tech University; Hangzhou Dianzi University


