A novel clustering method via nucleotide-based Fourier power spectrum analysis

Zhao Bo; Duan Victor; Yau Stephen S T<sup>*</sup>

doi:10.1016/j.jtbi.2011.03.029

Abstract

A novel clustering method is proposed to classify genes or genomes. The method represents genomic data naturally as binary indicator sequences, one for each nucleotide (adenine (A), cytosine (C), guanine (G), and thymine (T)). The discrete Fourier transform is then applied to these indicator sequences to calculate the power spectrum of each nucleotide. Mathematical moments are calculated for each of these spectra to create a multidimensional vector in a Euclidean space for each gene or genome sequence. Thus, each gene or genome sequence is realized as a geometric point in the Euclidean space. Finally, pairwise Euclidean distances between these points (i.e., genome sequences) are calculated to cluster the gene or genome sequences. The method is applied to three data sets: 34 coronavirus genomes, 118 of the known strains of human rhinovirus (HRV), and 30 bacterial genomes. A distance matrix is computed for each of the three sets, giving the distance from each point to every other. We used the complete linkage clustering algorithm to build phylogenetic trees that indicate how the distances among these sequences correspond to the evolutionary relationships among them. This genome representation provides a powerful and efficient method to classify genomes and is much faster than the widely acknowledged multiple sequence alignment method.
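The pipeline described in the abstract (indicator sequences, Fourier power spectra, moments, pairwise Euclidean distances) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the DFT is a direct O(n^2) evaluation chosen for readability, and the moment definition (a plain mean of powers of the spectrum, orders 1 to 3) is an assumption, since the abstract does not state the normalization used in the paper. Sequences are assumed to be non-empty.

```python
import cmath
import math

NUCLEOTIDES = "ACGT"


def indicator_sequence(seq, nucleotide):
    """Binary indicator sequence: 1 where `nucleotide` occurs in `seq`, else 0."""
    return [1 if base == nucleotide else 0 for base in seq.upper()]


def power_spectrum(signal):
    """Power spectrum |U(k)|^2 from a direct O(n^2) discrete Fourier transform."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def moment_vector(seq, orders=(1, 2, 3)):
    """Concatenate spectral moments of the A, C, G and T indicator sequences.

    Assumption: the j-th moment is the plain mean of PS(k)**j over all k.
    The paper's own normalization is not given in the abstract.
    """
    features = []
    for nucleotide in NUCLEOTIDES:
        ps = power_spectrum(indicator_sequence(seq, nucleotide))
        for j in orders:
            features.append(sum(p ** j for p in ps) / len(ps))
    return features


def distance_matrix(sequences):
    """Pairwise Euclidean distances between the moment vectors."""
    points = [moment_vector(s) for s in sequences]
    return [[math.dist(p, q) for q in points] for p in points]


# Toy sequences invented for illustration; they are not data from the paper.
toy_sequences = ["ACGTACGTAC", "ACGTACGTAA", "GGGGCCCCGG"]
toy_matrix = distance_matrix(toy_sequences)
```

For the clustering step, one option (an assumption, not something the abstract specifies) is to pass the condensed form of such a matrix to `scipy.cluster.hierarchy.linkage` with `method="complete"`. For genome-length sequences the direct DFT above would be far too slow, and a fast Fourier transform would be the natural replacement.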

Publication date: 2011-6-21
Affiliation: Tsinghua University


Last updated: 2018-08-02 13:51
