LoRDEC: accurate and efficient long read error correction

Cited by: 552
Authors
Salmela, Leena [1, 2]
Rivals, Eric [3, 4, 5]
Affiliations
[1] Univ Helsinki, Dept Comp Sci, FI-00014 Helsinki, Finland
[2] Univ Helsinki, Helsinki Inst Informat Technol, FI-00014 Helsinki, Finland
[3] LIRMM, F-34095 Montpellier 5, France
[4] CNRS, Inst Biol Computat, F-34095 Montpellier 5, France
[5] Univ Montpellier, F-34095 Montpellier 5, France
Funding
Academy of Finland
Keywords
BASIC LOCAL ALIGNMENT; GENOME ASSEMBLIES;
DOI
10.1093/bioinformatics/btu538
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: PacBio single molecule real-time sequencing is a third-generation sequencing technique producing long reads, with comparatively lower throughput and higher error rate. Errors include numerous indels and complicate downstream analysis like mapping or de novo assembly. A hybrid strategy that takes advantage of the high accuracy of second-generation short reads has been proposed for correcting long reads. Mapping of short reads on long reads provides sufficient coverage to eliminate up to 99% of errors, but at the expense of prohibitive running times and considerable amounts of disk and memory space.
Results: We present LoRDEC, a hybrid error correction method that builds a succinct de Bruijn graph representing the short reads, and seeks a corrective sequence for each erroneous region in the long reads by traversing chosen paths in the graph. In comparison, LoRDEC is at least six times faster and requires at least 93% less memory or disk space than available tools, while achieving comparable accuracy.
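The idea in the Results paragraph (build a de Bruijn graph from the short reads, then replace each erroneous long-read region with a sequence spelled by a path in that graph) can be illustrated with a toy sketch. This is not the LoRDEC implementation: a plain Python set stands in for the succinct graph, the path search is an exhaustive depth-first search rather than a selective one, and all names (`solid_kmers`, `bridge`, `correct_read`, `min_count`, `max_extra`) are hypothetical. It assumes k >= 2 and ignores reverse complements.

```python
from collections import Counter


def solid_kmers(short_reads, k, min_count=2):
    """Nodes of the de Bruijn graph: k-mers seen at least min_count times."""
    counts = Counter(r[i:i + k] for r in short_reads
                     for i in range(len(r) - k + 1))
    return {kmer for kmer, c in counts.items() if c >= min_count}


def edit_distance(a, b):
    """Levenshtein distance with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def bridge(kmers, source, target, weak, max_extra=3):
    """Search paths source -> target and return the spelled inner sequence
    closest (in edit distance) to the weak region, or None if none exists."""
    k = len(source)
    limit = len(weak) + k + max_extra   # maximum number of appended bases
    best = None
    stack = [source]                    # each entry is the string spelled so far
    while stack:
        spelled = stack.pop()
        steps = len(spelled) - k
        if steps >= k and spelled.endswith(target):
            inner = spelled[k:-k]
            d = edit_distance(inner, weak)
            if best is None or d < best[0]:
                best = (d, inner)
        if steps < limit:
            for base in "ACGT":
                if spelled[-(k - 1):] + base in kmers:   # edge exists in graph
                    stack.append(spelled + base)
    return None if best is None else best[1]


def correct_read(long_read, kmers, k):
    """Replace regions between solid k-mers by the best bridging path."""
    solid = [i for i in range(len(long_read) - k + 1)
             if long_read[i:i + k] in kmers]
    if not solid:
        return long_read
    prev = solid[0]
    out = long_read[:prev + k]
    for pos in solid[1:]:
        if pos - prev == 1:             # consecutive solid k-mers: keep the base
            out += long_read[pos + k - 1]
        elif pos >= prev + k:           # gap: try to bridge it through the graph
            weak = long_read[prev + k:pos]
            fix = bridge(kmers, long_read[prev:prev + k],
                         long_read[pos:pos + k], weak)
            out += (weak if fix is None else fix) + long_read[pos:pos + k]
        else:                           # overlapping, non-adjacent: skip this k-mer
            continue
        prev = pos
    return out + long_read[prev + k:]


if __name__ == "__main__":
    genome = "ATGCGTACCTGAAGTCGA"
    reads = [genome[i:i + 8] for i in range(len(genome) - 7)] * 2
    graph = solid_kmers(reads, k=4)
    # Long read with one substitution (C -> G at index 8).
    print(correct_read("ATGCGTACGTGAAGTCGA", graph, k=4))  # ATGCGTACCTGAAGTCGA
```

The exhaustive search is exponential in the worst case, so a practical tool would bound the number of explored branches; the sketch only shows why a graph of accurate short-read k-mers is enough to propose a replacement for a noisy region without aligning every short read to every long read.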
Pages: 3506-3514
Page count: 9