ARACHNE: A whole-genome shotgun assembler

Authors: Batzoglou S; Jaffe DB; Stanley K; Butler J; Gnerre S; Mauceli E; Berger B; Mesirov JP; Lander ES*
Source: Genome Research, 2002, 12(1): 177-189.
DOI: 10.1101/gr.208902

Abstract

We describe a new computer system, called ARACHNE, for assembling genome sequence using paired-end whole-genome shotgun reads. ARACHNE has several key features, including an efficient and sensitive procedure for finding read overlaps, a procedure for scoring overlaps that achieves high accuracy by correcting errors before assembly, read merger based on forward-reverse links, and detection of repeat contigs by forward-reverse link inconsistency. To test ARACHNE, we created simulated reads providing ~10-fold coverage of the genomes of H. influenzae, S. cerevisiae, and D. melanogaster, as well as human chromosomes 21 and 22. The assemblies of these simulated reads yielded nearly complete coverage of the respective genomes, with a small number of contigs joined into a smaller number of supercontigs (or scaffolds). For example, analysis of the D. melanogaster genome yielded ~98% coverage with an N50 contig length of 324 kb and an N50 supercontig length of 5143 kb. The assembly accuracy was high, although not perfect: small errors occurred at a frequency of roughly 1 per 1 Mb (typically, deletion of ~1 kb in size), with a very small number of other misassemblies. The assembly was rapid: the Drosophila assembly required only 21 hours on a single 667 MHz processor and used 8.4 Gb of memory.
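
The abstract reports N50 contig and supercontig lengths without defining the statistic. As a minimal sketch, assuming the commonly used definition (the largest length L such that contigs of length at least L together contain at least half of all assembled bases), N50 could be computed as below. The function name `n50` and the toy lengths are illustrative assumptions, not taken from the paper, and the paper's exact definition may differ.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Assumed definition: the largest length L such that contigs of
    length >= L together contain at least half of the total bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point halves.
        if 2 * running >= total:
            return length


# Made-up contig lengths (arbitrary units), for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Under this definition, a few long contigs dominate the statistic, which is why N50 is often preferred to the mean contig length when summarizing assembly contiguity.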

  • Publication date: 2002-1
