Abstract

In recent years, digital signal processing has been widely applied in genomics. One such application is the identification of protein-coding regions. Where is a protein coded? How much is encoded? Where are growth and development regulated? These questions can be answered by classifying the regions of a DNA sequence as exons or introns. Signal processing methods operate on numerical signals, whereas a DNA sequence is symbolic; the sequence must therefore be converted from symbols to numbers during data preprocessing, prior to analysis. The bases in a DNA sequence are represented by four letters, A, G, C, and T, and each letter is assigned a numeric value. Several numerical mapping techniques exist in the literature. This paper proposes a novel numerical mapping approach for converting the symbolic sequence to numerical values, in which each codon is mapped using an improved fractional derivative of the Shannon equation. Three methods are used to predict exon regions: singular value decomposition (SVD), discrete Fourier transform (DFT), and short-time Fourier transform (STFT). The performance of the proposed mapping technique is evaluated with each of these three methods. The proposed technique identified protein-coding regions more successfully than the predominant existing mapping techniques under the SVD, DFT, and STFT methods.
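To make the symbolic-to-numeric step and the DFT-based prediction concrete, the following is a minimal sketch under stated assumptions. It does not implement the proposed fractional-derivative codon mapping, whose formula is not given in this abstract; it substitutes the well-known Voss binary indicator mapping (one 0/1 sequence per base) as a stand-in. The function names (`voss_indicators`, `period3_score`) and the scoring ratio are illustrative choices, not taken from the paper. The sketch relies on the commonly reported property that protein-coding regions tend to show a spectral peak at frequency index N/3.

```python
import cmath

BASES = "ACGT"


def voss_indicators(seq):
    """Map a DNA string to four binary indicator sequences (Voss mapping).

    Assumption: this is a stand-in for the paper's codon mapping.
    """
    seq = seq.upper()
    return {b: [1.0 if ch == b else 0.0 for ch in seq] for b in BASES}


def dft_power(signal, k):
    """Squared magnitude of the k-th DFT coefficient of a real signal."""
    n = len(signal)
    coeff = sum(
        x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal)
    )
    return abs(coeff) ** 2


def spectral_power(seq, k):
    """Total power at frequency index k, summed over the four indicators."""
    return sum(dft_power(sig, k) for sig in voss_indicators(seq).values())


def period3_score(seq):
    """Ratio of the power at k = N/3 to the mean power over k = 1..N/2.

    Hypothetical score: larger values suggest stronger period-3 behaviour.
    """
    n = len(seq)
    if n < 3 or n % 3 != 0:
        raise ValueError("sequence length must be a positive multiple of 3")
    peak = spectral_power(seq, n // 3)
    ks = range(1, n // 2 + 1)
    mean = sum(spectral_power(seq, k) for k in ks) / len(ks)
    # Treat a numerically empty spectrum as having no period-3 component.
    return peak / mean if mean > 1e-12 else 0.0
```

In practice the score would be computed over a sliding window along the sequence, so that windows with a pronounced period-3 peak can be flagged as candidate exons; swapping in a different numerical mapping only requires replacing `voss_indicators`.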

  • Publication date: 2018-4