SOAP3-dp: Fast, Accurate and Sensitive GPU-Based Short Read Aligner

Cited by: 81
Authors
Luo, Ruibang [1]
Wong, Thomas [1]
Zhu, Jianqiao [1, 5]
Liu, Chi-Man [1]
Zhu, Xiaoqian [2]
Wu, Edward [1]
Lee, Lap-Kei [1]
Lin, Haoxiang [3]
Zhu, Wenjuan [3]
Cheung, David W. [1]
Ting, Hing-Fung [1]
Yiu, Siu-Ming [1]
Peng, Shaoliang [2]
Yu, Chang [3]
Li, Yingrui [3]
Li, Ruiqiang [4]
Lam, Tak-Wah [1]
Affiliations
[1] Univ Hong Kong, HKU BGI Bioinformat Algorithms & Core Technol Res, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China
[2] Natl Univ Def Technol, Sch Comp Sci, Changsha, Hunan, Peoples R China
[3] BGI Shenzhen, Shenzhen, Guangdong, Peoples R China
[4] Peking Univ, Peking Tsinghua Ctr Life Sci, Biodynam Opt Imaging Ctr, Sch Life Sci, Beijing 100871, Peoples R China
[5] Univ Wisconsin, Dept Comp Sci, Madison, WI 53706 USA
Source
PLOS ONE | 2013 / Vol. 8 / Issue 05
Keywords
ALIGNMENT; SEQUENCE; FRAMEWORK; EFFICIENT; ULTRAFAST; TOOL;
DOI
10.1371/journal.pone.0065632
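Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09
Abstract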
To tackle the exponentially increasing throughput of Next-Generation Sequencing (NGS), most existing short-read aligners can be configured to favor speed at the expense of accuracy and sensitivity. SOAP3-dp, by leveraging the computational power of both CPU and GPU with optimized algorithms, delivers high speed and sensitivity simultaneously. Compared with widely adopted aligners including BWA, Bowtie2, SeqAlto, CUSHAW2 and GEM, and with the GPU-based aligners BarraCUDA and CUSHAW, SOAP3-dp was found to be two to tens of times faster, while maintaining the highest sensitivity and lowest false discovery rate (FDR) on Illumina reads of different lengths. Unlike its predecessor SOAP3, which does not allow gapped alignment, SOAP3-dp by default tolerates alignment similarity as low as 60%. Real-data evaluation using the human genome demonstrates SOAP3-dp's power to enable more authentic variants and longer indels to be discovered. Fosmid sequencing shows a 9.1% FDR on newly discovered deletions. SOAP3-dp natively supports the BAM file format and provides the same scoring scheme as BWA, which enables it to be integrated into existing analysis pipelines. SOAP3-dp has been deployed on Amazon-EC2, NIH-Biowulf and Tianhe-1A.
Pages: 11