Quality-score guided error correction for short-read sequencing data using CUDA

Cited by: 7
Authors
Shi, Haixiang [1]
Schmidt, Bertil [1]
Liu, Weiguo [1]
Mueller-Wittig, Wolfgang [1]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
Source
ICCS 2010 - INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, PROCEEDINGS | 2010, Vol. 1, Issue 1
Keywords
DNA sequencing; CUDA; high-throughput short-read assembly; bioinformatics; TECHNOLOGY;
DOI
10.1016/j.procs.2010.04.125
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
Recently introduced sequencing technologies can produce massive amounts of short-read data. Detection and correction of sequencing errors in this data is an important but time-consuming pre-processing step for de-novo genome assembly. In this paper, we demonstrate how the quality-score value associated with each base-call can be integrated in a CUDA-based parallel error correction algorithm. We show that quality-score guided error correction can improve the assembly accuracy of several datasets from the NCBI SRA (Short-Read Archive) in terms of N50-values as well as runtime. We further propose a number of improvements to our previously published CUDA-EC algorithm that improve its runtime by a factor of up to 1.88. (C) 2010 Published by Elsevier Ltd.
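The abstract reports assembly accuracy in terms of N50-values. As a reminder of what that metric measures, here is a minimal Python sketch of the standard N50 definition: the largest contig length L such that contigs of length at least L together cover at least half of the total assembled length. The function name `n50` and the sample lengths are illustrative assumptions; this is not code from the paper or from CUDA-EC.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (0 if empty).

    Hypothetical helper for illustration only.
    """
    # Walk the contigs from longest to shortest.
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig where the cumulative length
        # reaches at least half of the total assembled length.
        if 2 * running >= total:
            return length
    return 0


if __name__ == "__main__":
    # Made-up contig lengths, not data from the paper.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

A higher N50 for the same read set generally indicates a less fragmented assembly, which is why it can serve as a proxy for the effect of error correction on assembly quality.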
Pages: 1123-1132
Number of pages: 10