Abstract

We have developed Phoenix 2, a ribosomal RNA gene sequence analysis pipeline that can process large-scale datasets of more than one hundred environmental samples containing more than one million reads in total. Large datasets are handled rapidly by removing redundant sequences, pre-partitioning the remaining sequences, clustering each partition in parallel, and then merging the resulting clusters. The pipeline combines open-source software tools with custom-developed Perl scripts. For our project we use hardware-accelerated searches; the pipeline can be reconfigured to run on generic computing infrastructure only, at a considerable cost in speed. Phoenix 2 produces a comprehensive set of analysis results, including taxonomic annotations from multiple methods, alpha diversity indices, beta diversity measurements, and a number of visualizations. To date, the pipeline has been used to analyze more than 1500 environmental samples from a wide variety of microbial communities, which are part of our Hydrocarbon Metagenomics Project (http://www.hydrocarbonmetagenomics.com). The software package can be installed as a local software suite with a web interface. Phoenix 2 is freely available from http://sourceforge.net/projects/phoenix2.
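
The four speed-up steps named in the abstract (removing redundant sequences, pre-partitioning, clustering each partition in parallel, and merging the clusters) can be illustrated with a minimal Python sketch. This is not the Phoenix 2 implementation, which is built from open-source tools and Perl scripts. Every function name, the prefix-based partitioning key, the greedy clustering rule, the `difflib` similarity measure, and the 0.97 threshold are assumptions chosen only for illustration, and the merge step here is a plain concatenation of per-partition results.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher


def dereplicate(reads):
    """Collapse identical reads into {sequence: abundance}."""
    return Counter(reads)


def partition(unique_seqs, prefix_len=4):
    """Bin sequences by their first prefix_len bases (an assumed partitioning key)."""
    bins = defaultdict(list)
    for seq in unique_seqs:
        bins[seq[:prefix_len]].append(seq)
    return bins


def cluster_partition(seqs, counts, threshold=0.97):
    """Greedy clustering within one bin, visiting the most abundant sequences first.

    A sequence joins the first existing seed it is similar enough to;
    otherwise it becomes the seed of a new cluster.
    """
    clusters = []  # list of (seed, members)
    for seq in sorted(seqs, key=lambda s: (-counts[s], s)):
        for seed, members in clusters:
            ratio = SequenceMatcher(None, seed, seq, autojunk=False).ratio()
            if ratio >= threshold:
                members.append(seq)
                break
        else:
            clusters.append((seq, [seq]))
    return clusters


def run_pipeline(reads, prefix_len=4, threshold=0.97, workers=4):
    """Return a list of (seed sequence, total read count) pairs."""
    counts = dereplicate(reads)
    bins = partition(counts, prefix_len)
    # Threads keep the sketch simple; CPU-bound work would normally use
    # a process pool or separate cluster jobs instead.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_bin = pool.map(
            lambda seqs: cluster_partition(seqs, counts, threshold),
            bins.values(),
        )
        # Merge step (simplified): concatenate the clusters from all bins.
        merged = [cluster for clusters in per_bin for cluster in clusters]
    return [(seed, sum(counts[m] for m in members)) for seed, members in merged]
```

Calling `run_pipeline(reads)` on a list of read strings returns one `(seed, abundance)` pair per cluster. Because sequences in different bins are never compared, the sketch trades some clustering accuracy for the ability to process bins independently, which is the general idea behind partitioning before clustering.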

  • Publication date: 2013-9-20