RabbitSAlign: Accelerating Short-Read Alignment for CPU-GPU Heterogeneous Platforms

Cited by: 0
Authors
Yan, Lifeng [1]
Yin, Zekun [1]
Li, Jinjin [1]
Yang, Yang [1]
Zhang, Tong [1]
Zhu, Fangjin [1]
Duan, Xiaohui [1]
Schmidt, Bertil [2]
Liu, Weiguo [1]
Affiliations
[1] Shandong Univ, Sch Software, Jinan, Peoples R China
[2] Johannes Gutenberg Univ Mainz, Inst Comp Sci, Mainz, Germany
Source
BIOINFORMATICS RESEARCH AND APPLICATIONS, PT II, ISBRA 2024 | 2024 / Vol. 14955
Keywords
Next-generation sequencing; Read alignment; GPUs; High-performance bio-computing; GENOME;
DOI
10.1007/978-981-97-5131-0_8
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Short-read alignment is a critical, yet time-consuming step in many next-generation sequencing data analysis pipelines. Most approaches follow the seed-and-extend strategy, where seeding usually involves a large number of random memory accesses, and extension of seeds relies on computationally expensive alignment algorithms, resulting in long runtimes. Recently, Strobealign has reached state-of-the-art alignment speed while maintaining high accuracy through an innovative seeding strategy. Yet, there is still room for further optimization, especially on modern CPU-GPU heterogeneous platforms. In this paper, we present RabbitSAlign, a new GPU-accelerated short-read aligner based on Strobealign. By optimizing inefficient operations in the seeding process and utilizing GPUs to accelerate the extension process, RabbitSAlign doubles the processing speed on real biological datasets compared to Strobealign. It surpasses the performance of the highly optimized BWA-MEM2 and NVIDIA Parabricks by a factor of at least four, while also being an order of magnitude faster than the widely used BWA-MEM and Bowtie2. Additionally, RabbitSAlign features highly competitive accuracy on both simulated and real biological data. Remarkably, it can process a 30x human genome sequencing dataset in merely 18 min. C++ sources are available at https://github.com/RabbitBio/RabbitSAlign.
Pages: 83-94
Page count: 12