Abstract

Based on the frequency and the position distribution entropy of existing k-words, we construct a modified statistical method for k-words, which we call the Existing-k-word method. The method consists of two parts. The first extracts the k-words that actually occur in the proteins, rather than all 20^k possible k-words. The second designs a feature vector consisting of the frequencies and the position distribution entropies of the existing k-words. The proposed method is then applied to two datasets: nine ND5 (NADH dehydrogenase subunit 5) proteins and twenty-four transferrin protein sequences. The results illustrate the utility of the proposed method.
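
The two parts of the method can be illustrated with a minimal sketch. The abstract does not give the formula for the position distribution entropy, so the code assumes one plausible form: the Shannon entropy of a word's 1-based occurrence positions, each normalized by their sum. The function name `existing_kword_features` and the frequency normalization by the number of windows are likewise illustrative assumptions, not the authors' definitions.

```python
from collections import defaultdict
from math import log2


def existing_kword_features(seq, k):
    """Map each k-word that occurs in seq to (frequency, position_entropy).

    Assumptions (not taken from the paper):
      * frequency = occurrences / number of length-k windows
      * position_entropy = Shannon entropy of the 1-based positions,
        each divided by the sum of the word's positions
    """
    n_windows = len(seq) - k + 1
    positions = defaultdict(list)

    # Part 1: record only the k-words that actually occur,
    # instead of enumerating all 20**k possible words.
    for i in range(n_windows):
        positions[seq[i:i + k]].append(i + 1)

    # Part 2: build the (frequency, entropy) pair for each existing k-word.
    features = {}
    for word, pos in positions.items():
        total = sum(pos)
        entropy = -sum((p / total) * log2(p / total) for p in pos)
        features[word] = (len(pos) / n_windows, entropy)
    return features


if __name__ == "__main__":
    for word, (freq, ent) in sorted(existing_kword_features("MKMK", 2).items()):
        print(word, round(freq, 3), round(ent, 3))
```

Because only observed k-words are stored, the dictionary holds at most `len(seq) - k + 1` entries regardless of k. To compare several proteins, the per-sequence dictionaries would need to be aligned over the union of their k-words, with missing words filled by zeros.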

  • Publication date: 2015