ADaCGH2: parallelized analysis of (big) CNA data

Author: Diaz Uriarte Ramon*
Source: Bioinformatics, 2014, 30(12): 1759-1761.
DOI: 10.1093/bioinformatics/btu099

Abstract

Motivation: Studies of genomic DNA copy number alteration can involve datasets with several million probes and thousands of subjects. Analyzing these data with currently available software (e.g. as available from BioConductor) can be extremely slow and may not be feasible because of memory requirements.

Results: We have developed a BioConductor package, ADaCGH2, that parallelizes the main segmentation algorithms (using forking on multicore computers, or message passing interface and similar mechanisms on clusters of computers) and uses ff objects for reading and data storage. We show examples of data with 6 million probes per array; we can analyze data that would otherwise not fit in memory, and compared with the non-parallelized versions we can achieve speed-ups of 25-40 times on a 64-core machine.
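
The fork-based parallelization mentioned in the abstract can be illustrated with a minimal sketch. This is not ADaCGH2's code or API: it is written in Python rather than R, the function names `smooth_array` and `process_arrays` are hypothetical, and a running median stands in for a real segmentation algorithm. It only shows the general pattern of handing each array to a separate forked worker; disk-backed storage such as ff objects is not shown.

```python
import multiprocessing as mp
from statistics import median


def smooth_array(log_ratios, half_window=1):
    """Toy stand-in for a segmentation routine: running median over one array.

    Hypothetical helper for illustration only; real segmentation methods
    are considerably more involved.
    """
    n = len(log_ratios)
    return [
        median(log_ratios[max(0, i - half_window): min(n, i + half_window + 1)])
        for i in range(n)
    ]


def process_arrays(arrays, workers=2):
    """Run smooth_array on every array in parallel; results keep input order."""
    # The "fork" start method is only available on Unix-like systems.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=workers) as pool:
        return pool.map(smooth_array, arrays)
```

Because each array is processed independently, the work divides cleanly across cores; the achievable speed-up then depends on how evenly the arrays are distributed and on the overhead of moving data to and from the workers.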

  • Publication date: 2014-06-15