Graph analysis of fragmented long-read bacterial genome assemblies

Cited by: 2
Authors
Marijon, Pierre [1]
Chikhi, Rayan [2]
Varre, Jean-Stephane [1]
Affiliations
[1] Univ Lille, CNRS, Cent Lille, INRIA,CRIStAL,UMR 9189, F-59000 Lille, France
[2] CNRS, Inst Pasteur, C3BI, USR 3756, F-75015 Paris, France
Keywords
MICROBIAL GENOMES; SINGLE;
DOI
10.1093/bioinformatics/btz219
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Long-read genome assembly tools are expected to reconstruct bacterial genomes nearly perfectly; however, they still produce fragmented assemblies in some cases. It would be beneficial to understand whether these cases are intrinsically impossible to resolve, or if assemblers are at fault, implying that genomes could be refined or even finished with little to no additional experimental cost.
Results: We propose a set of computational techniques to assist inspection of fragmented bacterial genome assemblies, through careful analysis of assembly graphs. By finding paths of overlapping raw reads between pairs of contigs, we recover potential short-range connections between contigs that were lost during the assembly process. We show that our procedure recovers 45% of missing contig adjacencies in fragmented Canu assemblies, on samples from the NCTC bacterial sequencing project. We also observe that a simple procedure based on enumerating weighted Hamiltonian cycles can suggest likely contig orderings. In our tests, the correct contig order is ranked first in half of the cases and within the top-three predictions in nearly all evaluated cases, providing a direction for finishing fragmented long-read assemblies.
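The idea of ranking contig orderings by enumerating weighted Hamiltonian cycles can be illustrated with a small sketch. This is not the authors' implementation: the function name `rank_contig_orders`, the toy contig names, and the edge weights are all hypothetical, contig orientation is ignored, and a lower total weight is assumed to mean a more plausible ordering. Brute-force enumeration like this is only practical for a handful of contigs.

```python
from itertools import permutations


def rank_contig_orders(contigs, weight):
    """Enumerate Hamiltonian cycles over `contigs` and sort them by total weight.

    `weight` maps frozenset({a, b}) to the (hypothetical) cost of placing
    contigs a and b next to each other; missing pairs are treated as
    "no connection", so cycles that need them are skipped.
    """
    first, rest = contigs[0], tuple(contigs[1:])
    seen = set()
    ranked = []
    for perm in permutations(rest):
        cycle = (first,) + perm
        # A cycle and its reverse describe the same circular order; keep one.
        key = min(cycle, (first,) + perm[::-1])
        if key in seen:
            continue
        seen.add(key)
        total = 0
        for a, b in zip(key, key[1:] + (first,)):
            w = weight.get(frozenset((a, b)))
            if w is None:  # no edge between a and b: not a valid cycle
                break
            total += w
        else:
            ranked.append((total, key))
    ranked.sort()
    return ranked


# Toy example: four contigs, with cheap edges along the ring A-B-C-D-A.
contigs = ["A", "B", "C", "D"]
weight = {
    frozenset("AB"): 1, frozenset("BC"): 1,
    frozenset("CD"): 1, frozenset("DA"): 1,
    frozenset("AC"): 5, frozenset("BD"): 5,
}
ranked = rank_contig_orders(contigs, weight)
for total, order in ranked:
    print(total, " -> ".join(order))
```

In this toy graph the ring A, B, C, D has the lowest total weight and is listed first; the two orderings that use the expensive cross edges follow.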
Pages: 4239-4246
Page count: 8