HASLR: Fast Hybrid Assembly of Long Reads

Cited by: 36
Authors
Haghshenas, Ehsan [1, 2]
Asghari, Hossein [1, 2]
Stoye, Jens [3, 4]
Chauve, Cedric [5, 6]
Hach, Faraz [2, 7]
Affiliations
[1] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada
[2] Vancouver Prostate Ctr, Vancouver, BC V6H 3Z6, Canada
[3] Bielefeld Univ, Fac Technol, Bielefeld, Germany
[4] Bielefeld Univ, Ctr Biotechnol, Bielefeld, Germany
[5] Simon Fraser Univ, Dept Math, Burnaby, BC V5A 1S6, Canada
[6] Univ Bordeaux, LaBRI, Bordeaux, France
[7] Univ British Columbia, Dept Urol Sci, Vancouver, BC V5Z 1M9, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
GENOME; ACCURATE;
DOI
10.1016/j.isci.2020.101389
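Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification code
07; 0710; 09
Abstract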
Third-generation sequencing technologies from companies such as Oxford Nanopore and Pacific Biosciences have paved the way for building more contiguous and potentially gap-free assemblies. The larger effective length of their reads has provided a means to overcome the challenges of short to mid-range repeats. Currently, accurate long read assemblers are computationally expensive, whereas faster methods are not as accurate. Moreover, despite recent advances in third-generation sequencing, researchers still tend to generate accurate short reads for many of the analysis tasks. Here, we present HASLR, a hybrid assembler that uses error-prone long reads together with high-quality short reads to efficiently generate accurate genome assemblies. Our experiments show that HASLR is not only the fastest assembler but also the one with the lowest number of misassemblies on most of the samples, while being on par with other assemblers in terms of contiguity and accuracy.
Pages: 34