Accurate Prediction of Transposon-Derived piRNAs by Integrating Various Sequential and Physicochemical Features

Luo, Longqiang; Li, Dingfang; Zhang, Wen*; Tu, Shikui; Zhu, Xiaopeng; Tian, Gang

doi:10.1371/journal.pone.0153268

Abstract

Background: Piwi-interacting RNA (piRNA) is the largest class of small non-coding RNA molecules. Predicting transposon-derived piRNAs can enrich research on small ncRNAs and help to further understand the mechanism of gamete generation.

Methods: In this paper, we attempt to differentiate transposon-derived piRNAs from non-piRNAs based on their sequential and physicochemical features by using machine learning methods. We explore six sequence-derived features, namely spectrum profile, mismatch profile, subsequence profile, position-specific scoring matrix, pseudo dinucleotide composition and local structure-sequence triplet elements, and systematically evaluate their performance for transposon-derived piRNA prediction. Finally, we consider two approaches, direct combination and ensemble learning, to integrate useful features and build high-accuracy prediction models.

Results: We construct three datasets, covering three species (Human, Mouse and Drosophila), and evaluate the prediction models by 10-fold cross validation. In the computational experiments, direct combination models achieve AUCs of 0.917, 0.922 and 0.992 on Human, Mouse and Drosophila, respectively; ensemble learning models achieve AUCs of 0.922, 0.926 and 0.994 on the three datasets.

Conclusions: Compared with other state-of-the-art methods, our methods achieve better performance. The proposed methods are promising for transposon-derived piRNA prediction. The source code and datasets are available in S1 File.
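As an illustration of the first feature named in the Methods, a spectrum profile is commonly computed as the vector of k-mer occurrence frequencies over a sequence. The sketch below is a minimal example under that assumption and is not the authors' implementation (their code is in S1 File). The function name `spectrum_profile`, the RNA alphabet `ACGU`, the default `k = 2`, and the normalization by the number of valid windows are all illustrative choices.

```python
from itertools import product
from typing import List

ALPHABET = "ACGU"  # assumption: RNA alphabet; DNA-style "T" is mapped to "U"


def spectrum_profile(seq: str, k: int = 2) -> List[float]:
    """Return normalized k-mer frequencies, ordered lexicographically over ALPHABET."""
    seq = seq.upper().replace("T", "U")
    # Enumerate all 4**k possible k-mers in a fixed order so vectors are comparable.
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = {kmer: 0 for kmer in kmers}
    total = 0
    # Slide a window of length k over the sequence and count each k-mer.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    # Assumption: frequencies are normalized to sum to 1.
    return [counts[kmer] / total for kmer in kmers]
```

A feature vector of this kind has length 4^k, so `spectrum_profile(seq, 3)` yields 64 values; vectors for several values of k could be concatenated before being passed to a classifier.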

Publication date: 2016-4-13
Affiliation: Wuhan University

