Abstract

Information about a protein's subcellular location provides useful clues about its function, so identifying subcellular location has become an important research area in proteomics. Focusing on the prediction of protein subcellular location, this paper studies protein sequence encoding and the design of classification algorithms. We propose a novel method for protein sequence encoding. First, we introduce a feature extraction scheme based on a moment descriptor (MD), which models position weights and captures the strong correlations within the protein sequence. Then, according to the physicochemical properties of amino acids, we classify the twenty amino acids into 4 categories and obtain 40 features that describe tripeptide composition. The proposed methods, which were developed on a large set of low-identity sequences, are very simple. Prediction results obtained with KNN and SVM classifiers show that our method performs better than some existing methods.
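To make the reduced-alphabet tripeptide encoding concrete, the following is a minimal sketch under stated assumptions. The four-group partition in `GROUPS` is a hypothetical grouping by side-chain property, not necessarily the one used in this paper, and the function names are illustrative. The sketch counts all 4^3 = 64 reduced tripeptides; how the 40 features reported above are selected is not specified in the abstract and is not reproduced here.

```python
from itertools import product

# Hypothetical 4-group partition of the 20 amino acids by side-chain
# property (an assumption for illustration, not the paper's grouping).
GROUPS = {
    "H": "AVLIMFWPG",  # nonpolar / hydrophobic
    "P": "STCYNQ",     # polar, uncharged
    "B": "KRH",        # basic
    "A": "DE",         # acidic
}
AA_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map each residue to its group label, skipping unknown symbols."""
    return "".join(AA_TO_GROUP[aa] for aa in seq.upper() if aa in AA_TO_GROUP)


def tripeptide_composition(seq):
    """Return normalized frequencies of all 4**3 = 64 reduced tripeptides."""
    reduced = reduce_sequence(seq)
    keys = ["".join(t) for t in product(GROUPS, repeat=3)]
    counts = dict.fromkeys(keys, 0)
    # Slide a window of length 3 over the reduced sequence.
    for i in range(len(reduced) - 2):
        counts[reduced[i:i + 3]] += 1
    total = max(len(reduced) - 2, 1)  # avoid division by zero on short input
    return [counts[k] / total for k in keys]
```

A feature vector produced this way could then be passed to any standard KNN or SVM implementation; the choice of classifier settings is outside the scope of this sketch.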
