A modified Markov clustering approach to unsupervised classification of protein sequences

Szilagyi Laszlo<sup>*</sup>; Medves Lehel; Szilagyi Sandor M

doi:10.1016/j.neucom.2010.02.023

Abstract

In this paper we propose a modified Markov clustering algorithm for efficient and accurate clustering of large protein sequence databases, based on previously evaluated sequence similarity criteria. The proposed modification consists of an exponentially decreasing inflation rate: strong inflation in the first iterations quickly establishes the hard structure of the clusters, and weaker inflation thereafter produces fine partitions. The algorithm was tested and validated on the whole SCOP95 database and on randomly selected subsets of 10-50% of it; it generally converges within 12-14 iteration cycles and provides clusters of high quality. Furthermore, a novel generalized formula for the inflation operation is given, and an efficient matrix symmetrization technique is presented, in order to improve the partition quality at relatively little extra computational cost. Finally, an extra speedup is achieved by excluding isolated proteins from further processing. The proposed method performs better than previous solutions in both partition quality and computational load.
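The idea of a decreasing inflation rate can be illustrated with a minimal sketch. The abstract does not state the schedule, the generalized inflation formula, or the symmetrization step, so the code below uses only the standard Markov clustering expansion and inflation operations together with a hypothetical schedule `r_k = r_end + (r_start - r_end) * exp(-decay * k)`. The function names, parameter values, and toy similarity matrix are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def inflation_schedule(r_start, r_end, decay, n_iter):
    """Hypothetical exponentially decreasing inflation rates (an assumption)."""
    k = np.arange(n_iter)
    return r_end + (r_start - r_end) * np.exp(-decay * k)


def normalize_columns(m):
    """Rescale each column to sum to 1 (column-stochastic matrix)."""
    sums = m.sum(axis=0, keepdims=True)
    sums[sums == 0] = 1.0  # leave all-zero columns unchanged
    return m / sums


def mcl_decaying_inflation(similarity, r_start=2.5, r_end=1.5,
                           decay=0.5, max_iter=50, tol=1e-8):
    """Standard MCL loop with a per-iteration inflation rate.

    Returns one label per column: the row index holding the most mass.
    """
    m = normalize_columns(np.asarray(similarity, dtype=float))
    for r in inflation_schedule(r_start, r_end, decay, max_iter):
        previous = m
        m = m @ m                              # expansion
        m = normalize_columns(np.power(m, r))  # inflation with rate r
        if np.abs(m - previous).max() < tol:
            break
    return m.argmax(axis=0)


# Toy similarity matrix (made up): two tight groups joined by one weak link.
toy = np.zeros((6, 6))
toy[:3, :3] = 1.0
toy[3:, 3:] = 1.0
toy[2, 3] = toy[3, 2] = 0.01

labels = mcl_decaying_inflation(toy)
print(labels)
```

With these assumed settings the early iterations use a rate near 2.5 and later ones approach 1.5, which mirrors the strong-then-weak inflation described in the abstract; suitable values for real protein similarity data would have to come from the full paper.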

Publication date: 2010-08
