Abstract

RNA-Seq has gradually become a routine approach for characterizing transcriptomes across organisms, cell types and conditions. Consequently, data analysis has become a major burden, which calls for an easy-to-learn workflow that can cope with the growing demand from laboratories around the world. We report an all-in-one solution called hppRNA, composed of four scenarios: pre-mapping, core-workflow, post-mapping and sequence variation detection. It is written as a series of individual Perl and R scripts that rely on well-established, preinstalled software, and it handles single-end or paired-end and unstranded or stranded sequencing data alike. It features six independent core-workflows built on dozens of popular, state-of-the-art tools, namely TopHat-Cufflinks-Cuffdiff, Subread-featureCounts-DESeq2, STAR-RSEM-EBSeq, Bowtie-eXpress-edgeR, kallisto-sleuth and HISAT-StringTie-Ballgown, and it is embedded in Snakemake, a modern pipeline management system. The core function of the pipeline is to turn raw FASTQ files into gene/isoform expression matrices and lists of differentially expressed genes or isoforms, and to identify fusion genes, single nucleotide polymorphisms, long noncoding RNAs and circular RNAs. Last but not least, the pipeline is specifically designed for systematic analysis of a large set of samples in one go, which makes it well suited to researchers who intend to deploy it on their local servers. The scripts and the user manual are freely available at https://sourceforge.net/projects/hpprna/

  • Publication date: 2018-7