Space-efficient and exact de Bruijn graph representation based on a Bloom filter

Chikhi Rayan*; Rizk Guillaume

doi:10.1186/1748-7188-8-22

Abstract

Background: The de Bruijn graph data structure is widely used in next-generation sequencing (NGS). Many programs, e.g. de novo assemblers, rely on an in-memory representation of this graph. However, current techniques for representing the de Bruijn graph of a human genome require a large amount of memory (>= 30 GB).

Results: We propose a new encoding of the de Bruijn graph, which occupies an order of magnitude less space than current representations. The encoding is based on a Bloom filter, with an additional structure to remove critical false positives.

Conclusions: Minia, an assembler implementing this structure, performed a complete de novo assembly of human genome short reads using 5.7 GB of memory in 23 hours.
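The encoding described in the Results can be illustrated with a minimal Python sketch. It is not Minia's implementation. It assumes that a "critical false positive" is a k-mer that the Bloom filter accepts, that is not a true node, and that is adjacent to a true node. The names `BloomFilter`, `ExactDeBruijnGraph`, `kmers`, and `neighbors` are hypothetical, reverse complements are ignored, and the filter size is chosen arbitrarily.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over strings (illustrative, not tuned)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def kmers(read, k):
    """Set of all length-k substrings of a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def neighbors(kmer):
    """Possible successors and predecessors of a k-mer (one-base shifts)."""
    for base in "ACGT":
        yield kmer[1:] + base   # successor
        yield base + kmer[:-1]  # predecessor


class ExactDeBruijnGraph:
    """Bloom filter plus an explicit set of critical false positives."""

    def __init__(self, true_kmers, num_bits, num_hashes=4):
        self.bloom = BloomFilter(num_bits, num_hashes)
        for km in true_kmers:
            self.bloom.add(km)
        # Assumption: only false positives adjacent to true nodes are stored.
        self.critical_fp = {
            nb
            for km in true_kmers
            for nb in neighbors(km)
            if nb in self.bloom and nb not in true_kmers
        }

    def has_node(self, kmer):
        return kmer in self.bloom and kmer not in self.critical_fp

    def neighbors_of(self, kmer):
        # Exact when kmer is a true node: every false-positive neighbor
        # of a true node was recorded in critical_fp at construction time.
        return sorted({nb for nb in neighbors(kmer) if self.has_node(nb)})


if __name__ == "__main__":
    nodes = kmers("ACGTACGGATTACA", 4)
    # A deliberately tiny filter, so that false positives are likely.
    graph = ExactDeBruijnGraph(nodes, num_bits=64, num_hashes=2)
    print(graph.neighbors_of("ACGT"))
```

In this sketch, the answers are exact only for traversals that start from a true node and follow `neighbors_of`. A membership query on an arbitrary k-mer can still return a false positive, which is why the extra set needs to hold only the false positives adjacent to true nodes rather than all of them.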

Publication date: 2013-09-16
