GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping

被引:64
作者
Alser, Mohammed [1 ]
Hassan, Hasan [2 ,3 ]
Xin, Hongyi [4 ]
Ergin, Oguz [2 ]
Mutlu, Onur [3 ]
Alkan, Can [1 ]
机构
[1] Bilkent Univ, Dept Comp Engn, TR-06800 Ankara, Turkey
[2] TOBB Univ Econ & Technol, Ankara, Turkey
[3] Swiss Fed Inst Technol, Dept Comp Sci, CH-8092 Zurich, Switzerland
[4] Carnegie Mellon Univ, Dept Comp Sci, Pittsburgh, PA 15213 USA
关键词
ACCURATE;
D O I
10.1093/bioinformatics/btx342
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Motivation: High throughput DNA sequencing (HTS) technologies generate an excessive number of small DNA segments - called short reads- that cause significant computational burden. To analyze the entire genome, each of the billions of short reads must be mapped to a reference genome based on the similarity between a read and 'candidate' locations in that reference genome. The similarity measurement, called alignment, formulated as an approximate string matching problem, is the computational bottleneck because: (i) it is implemented using quadratic-time dynamic programming algorithms and (ii) the majority of candidate locations in the reference genome do not align with a given read due to high dissimilarity. Calculating the alignment of such incorrect candidate locations consumes an overwhelming majority of a modern read mapper's execution time. Therefore, it is crucial to develop a fast and effective filter that can detect incorrect candidate locations and eliminate them before invoking computationally costly alignment algorithms. Results: We propose GateKeeper, a new hardware accelerator that functions as a pre-alignment step that quickly filters out most incorrect candidate locations. GateKeeper is the first design to accelerate pre-alignment using Field-Programmable Gate Arrays (FPGAs), which can perform pre-alignment much faster than software. When implemented on a single FPGA chip, GateKeeper maintains high accuracy (on average >96%) while providing, on average, 90-fold and 130-fold speedup over the state-of-the-art software pre-alignment techniques, Adjacency Filter and Shifted Hamming Distance (SHD), respectively. The addition of GateKeeper as a pre-alignment step can reduce the verification time of the mrFAST mapper by a factor of 10.
引用
收藏
页码:3355 / 3363
页数:9
相关论文
共 39 条
  • [21] Li Ming, 2004, J Bioinform Comput Biol, V2, P417, DOI 10.1142/S0219720004000661
  • [22] Lindner M.S, 2016, BIOINFORMATICS
  • [23] SOAP3: ultra-fast GPU-based parallel alignment tool for short reads
    Liu, Chi-Man
    Wong, Thomas
    Wu, Edward
    Luo, Ruibang
    Yiu, Siu-Ming
    Li, Yingrui
    Wang, Bingqiang
    Yu, Chang
    Chu, Xiaowen
    Zhao, Kaiyong
    Li, Ruiqiang
    Lam, Tak-Wah
    [J]. BIOINFORMATICS, 2012, 28 (06) : 878 - 879
  • [24] SOAP3-dp: Fast, Accurate and Sensitive GPU-Based Short Read Aligner
    Luo, Ruibang
    Wong, Thomas
    Zhu, Jianqiao
    Liu, Chi-Man
    Zhu, Xiaoqian
    Wu, Edward
    Lee, Lap-Kei
    Lin, Haoxiang
    Zhu, Wenjuan
    Cheung, David W.
    Ting, Hing-Fung
    Yiu, Siu-Ming
    Peng, Shaoliang
    Yu, Chang
    Li, Yingrui
    Li, Ruiqiang
    Lam, Tak-Wah
    [J]. PLOS ONE, 2013, 8 (05):
  • [25] Marco-Sola S, 2012, NAT METHODS, V9, P1185, DOI [10.1038/NMETH.2221, 10.1038/nmeth.2221]
  • [26] A GENERAL METHOD APPLICABLE TO SEARCH FOR SIMILARITIES IN AMINO ACID SEQUENCE OF 2 PROTEINS
    NEEDLEMAN, SB
    WUNSCH, CD
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 1970, 48 (03) : 443 - +
  • [27] Hardware Acceleration of Short Read Mapping
    Olson, Corey B.
    Kim, Maria
    Clauson, Cooper
    Kogon, Boris
    Ebeling, Carl
    Hauck, Scott
    Ruzzo, Walter L.
    [J]. 2012 IEEE 20TH ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM), 2012, : 161 - 168
  • [28] Efficient q-gram filters for finding all ε-matches over a given length
    Rasmussen, KR
    Stoye, J
    Myers, EW
    [J]. JOURNAL OF COMPUTATIONAL BIOLOGY, 2006, 13 (02) : 296 - 308
  • [29] IDENTIFICATION OF COMMON MOLECULAR SUBSEQUENCES
    SMITH, TF
    WATERMAN, MS
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 1981, 147 (01) : 195 - 197
  • [30] Three Ages of FPGAs: A Retrospective on the First Thirty Years of FPGA Technology
    Trimberger, Stephen M.
    [J]. PROCEEDINGS OF THE IEEE, 2015, 103 (03) : 318 - 331