An SVM-based approach to discover microRNA precursors in plant genomes

Authors: Wang, Yi*; Jin, Cheqing; Zhou, Minqi; Zhou, Aoying
Source: 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2011, Shenzhen, Guangdong, China, 2011-05-24 to 2011-05-27.
DOI: 10.1007/978-3-642-28320-8_26

Abstract

MicroRNAs (miRNAs) are noncoding RNAs of ∼22 nucleotides that play versatile regulatory roles in multicellular organisms. Because cloning methods for miRNA identification are biased towards abundant miRNAs, computational approaches are a useful complement for identifying miRNAs whose expression is highly tissue- and time-specific. In this paper, we propose a novel Support Vector Machine (SVM) based detector, named MiR-PD, to identify miRNA precursors (pre-miRNAs) in plants. The classifier is built on twelve features of pre-miRNAs: five global features and seven sub-structure features. Trained on 790 plant pre-miRNAs and 7,900 pseudo pre-miRNAs, MiR-PD achieves 96.43% accuracy in five-fold cross-validation. Tested on 441 newly identified plant pre-miRNAs and 62,883 pseudo pre-miRNAs, MiR-PD reports an accuracy of 99.71% with 77.55% sensitivity and 99.87% specificity. These results suggest that the detector can feasibly be applied genome-wide to identify novel miRNAs in plants, especially species-specific miRNAs, without relying on phylogenetic conservation.
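
The three test metrics quoted in the abstract follow the standard confusion-matrix definitions, and a minimal Python sketch can show how they relate. The abstract does not report raw counts, so the counts below are hypothetical values back-calculated from the stated percentages and set sizes (441 real and 62,883 pseudo pre-miRNAs). The class and function names are illustrative and are not taken from MiR-PD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # real pre-miRNAs predicted as pre-miRNAs
    fn: int  # real pre-miRNAs predicted as pseudo
    tn: int  # pseudo pre-miRNAs predicted as pseudo
    fp: int  # pseudo pre-miRNAs predicted as pre-miRNAs


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of real pre-miRNAs that are recovered."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of pseudo pre-miRNAs that are correctly rejected."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all test sequences that are classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


# ASSUMPTION: hypothetical counts chosen to match the reported percentages;
# the source gives only the set sizes and the rounded metrics.
counts = ConfusionCounts(tp=342, fn=99, tn=62_801, fp=82)

print(f"sensitivity = {sensitivity(counts):.2%}")
print(f"specificity = {specificity(counts):.2%}")
print(f"accuracy    = {accuracy(counts):.2%}")
```

Because pseudo pre-miRNAs outnumber real ones by roughly 143 to 1 in this test set, accuracy is dominated by specificity, which is why sensitivity is reported separately.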