SneakySnake: a fast and accurate universal genome pre-alignment filter for CPUs, GPUs and FPGAs

被引:42
作者
Alser, Mohammed [1 ,2 ]
Shahroodi, Taha [1 ]
Gomez-Luna, Juan [1 ,2 ]
Alkan, Can [3 ]
Mutlu, Onur [1 ,2 ,3 ,4 ]
机构
[1] Swiss Fed Inst Technol, Dept Comp Sci, CH-8006 Zurich, Switzerland
[2] Swiss Fed Inst Technol, Dept Informat Technol & Elect Engn, CH-8006 Zurich, Switzerland
[3] Bilkent Univ, Dept Comp Engn, TR-06800 Ankara, Turkey
[4] Carnegie Mellon Univ, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA
关键词
SEQUENCE ALIGNMENT; READ;
D O I
10.1093/bioinformatics/btaa1015
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Motivation: We introduce SneakySnake, a highly parallel and highly accurate pre-alignment filter that remarkably reduces the need for computationally costly sequence alignment. The key idea of SneakySnake is to reduce the approximate string matching (ASM) problem to the single net routing (SNR) problem in VLSI chip layout. In the SNR problem, we are interested in finding the optimal path that connects two terminals with the least routing cost on a special grid layout that contains obstacles. The SneakySnake algorithm quickly solves the SNR problem and uses the found optimal path to decide whether or not performing sequence alignment is necessary. Reducing the ASM problem into SNR also makes SneakySnake efficient to implement on CPUs, GPUs and FPGAs. Results: SneakySnake significantly improves the accuracy of pre-alignment filtering by up to four orders of magnitude compared to the state-of-the-art pre-alignment filters, Shouji, GateKeeper and SHD. For short sequences, SneakySnake accelerates Edlib (state-of-the-art implementation of Myers's bit-vector algorithm) and Parasail (stateof-the-art sequence aligner with a configurable scoring function), by up to 37.7 x and 43.9 x (>12 x on average), respectively, with its CPU implementation, and by up to 413 x and 689 x (>400 x on average), respectively, with FPGA and GPU acceleration. For long sequences, the CPU implementation of SneakySnake accelerates Parasail and KSW2 (sequence aligner of minimap2) by up to 979 x (276.9 x on average) and 91.7 x (31.7 x on average), respectively. As SneakySnake does not replace sequence alignment, users can still obtain all capabilities (e.g. configurable scoring functions) of the aligner of their choice, unlike existing acceleration efforts that sacrifice some aligner capabilities.
引用
收藏
页码:5282 / 5290
页数:9
相关论文
共 35 条
[1]  
Alser M., 2017, Transactions on Internet Research, V13, P33
[2]  
Alser M, 2020, ARXIV PREPRINT ARXIV
[3]   Accelerating Genome Analysis: A Primer on an Ongoing Journey [J].
Alser, Mohammed ;
Bingol, Zulal ;
Cali, Damla Senol ;
Kim, Jeremie ;
Ghose, Saugata ;
Alkan, Can ;
Mutlu, Onur .
IEEE MICRO, 2020, 40 (05) :65-75
[4]   Shouji: a fast and efficient pre-alignment filter for sequence alignment [J].
Alser, Mohammed ;
Hassan, Hasan ;
Kumar, Akash ;
Mutlu, Onur ;
Alkan, Can .
BIOINFORMATICS, 2019, 35 (21) :4255-4263
[5]   GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping [J].
Alser, Mohammed ;
Hassan, Hasan ;
Xin, Hongyi ;
Ergin, Oguz ;
Mutlu, Onur ;
Alkan, Can .
BIOINFORMATICS, 2017, 33 (21) :3355-3363
[6]  
[Anonymous], 1966, Soviet Physics Doklady
[7]  
Auton A., 2015, Nature, V526, P68, DOI DOI 10.1038/NATURE15393
[8]   GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis [J].
Cali, Damla Senol ;
Kalsi, Gurpreet S. ;
Bingol, Zulal ;
Firtina, Can ;
Subramanian, Lavanya ;
Kim, Jeremie S. ;
Ausavarungnirun, Rachata ;
Alser, Mohammed ;
Gomez-Luna, Juan ;
Boroumand, Amirali ;
Nori, Anant ;
Scibisz, Allison ;
Subramoney, Sreenivas ;
Alkan, Can ;
Ghose, Saugata ;
Mutlu, Onur .
2020 53RD ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO 2020), 2020, :951-966
[9]   Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions [J].
Cali, Damla Senol ;
Kim, Jeremie S. ;
Ghose, Saugata ;
Alkan, Can ;
Mutlu, Onur .
BRIEFINGS IN BIOINFORMATICS, 2019, 20 (04) :1542-1559
[10]   Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory [J].
Chaisson, Mark J. ;
Tesler, Glenn .
BMC BIOINFORMATICS, 2012, 13