Abstract

Objectives: Whole genome sequencing (WGS) has revolutionized the subtyping of Legionella pneumophila, but calling the traditional sequence-based type from genomic data is hampered by the presence of multiple copies of the mompS locus. We propose a novel bioinformatics solution that overcomes this limitation, ensuring the feasibility of WGS for cluster investigation. Methods: We designed an approach based on the alignment of raw reads to a reference sequence. With WGS, reads originating from either of the two mompS copies cannot be differentiated. Therefore, when non-identical copies were present, we applied a read-filtering strategy based on read alignment to a reference sequence via unique 'anchors'. If the minimal read coverage (>= 3X) was achieved after filtering, a consensus sequence was built from the mapped reads and the sequence-based typing allele was called from it. The entire procedure was implemented in a Perl script. Results: The method was validated on a diverse sample of 265 L. pneumophila genomes comprising 59 different sequence types (STs) and 23 mompS variants; 57 of the 265 (22%) had non-identical mompS copies. mompS calling was successful in 237 of the 265 samples (89.4%), and no erroneous calls occurred. Among the 109 samples meeting quality requirements, the success rate was 98.1%. The method was superior to alternative approaches. Conclusions: As WGS becomes more accessible, technical difficulties in routine clinical and surveillance work will arise. The case of mompS in L. pneumophila is an example of such limitations, which necessitate the development of novel computational solutions that meet end-user demands.
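The Methods steps (filter reads by anchor, check coverage, build a consensus, call the allele) can be illustrated with a minimal Python sketch. This is not the authors' Perl script. It assumes, purely for illustration, that reads are already aligned without indels and given as (start, sequence) pairs in reference coordinates, that an 'anchor' is a short flanking sequence present only next to the target mompS copy, and that alleles are looked up by exact sequence match. The function names, the toy sequences, and the allele labels are all hypothetical.

```python
from collections import Counter

MIN_COVERAGE = 3  # the ">= 3X" threshold stated in the abstract


def filter_reads_by_anchor(aligned_reads, anchor):
    """Keep only reads whose sequence contains the anchor (assumed copy-specific)."""
    return [(start, seq) for start, seq in aligned_reads if anchor in seq]


def build_consensus(aligned_reads, region_start, region_end, min_coverage=MIN_COVERAGE):
    """Majority-vote consensus over [region_start, region_end).

    Returns None if any position is covered by fewer than min_coverage reads.
    """
    columns = [Counter() for _ in range(region_end - region_start)]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if region_start <= pos < region_end:
                columns[pos - region_start][base] += 1
    if any(sum(col.values()) < min_coverage for col in columns):
        return None
    return "".join(col.most_common(1)[0][0] for col in columns)


def call_allele(consensus, allele_db):
    """Exact-match lookup; None means no call (low coverage or unknown sequence)."""
    if consensus is None:
        return None
    return allele_db.get(consensus)


# Toy data: positions 0-3 hold the anchor, positions 4-7 the typed region.
ANCHOR = "GGCC"
ALLELE_DB = {"ACGT": "allele_A", "ATGT": "allele_B"}  # hypothetical labels
reads = [
    (0, "GGCCACGT"), (0, "GGCCACGT"), (0, "GGCCACGT"),   # span the anchor
    (4, "ATGT"), (4, "ATGT"), (4, "ATGT"), (4, "ATGT"),  # other copy, no anchor
]

unfiltered_call = call_allele(build_consensus(reads, 4, 8), ALLELE_DB)
anchored = filter_reads_by_anchor(reads, ANCHOR)
filtered_call = call_allele(build_consensus(anchored, 4, 8), ALLELE_DB)
low_cov_call = call_allele(build_consensus(anchored[:2], 4, 8), ALLELE_DB)
```

In this toy case the unfiltered reads are dominated by the other copy, so the call changes once only anchor-spanning reads are kept, and no call is made when fewer than three anchored reads remain. Returning no call rather than a low-confidence allele mirrors the abstract's emphasis on avoiding erroneous calls.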

  • Publication date: 2017-5