Hardware acceleration of BWA-MEM genomic short read mapping for longer read lengths

Cited by: 94

Authors:

Houtgast, Ernst Joachim [1,2]

Sima, Vlad-Mihai [2]

Bertels, Koen [1]

Al-Ars, Zaid [1]

Affiliations:

[1] Delft University of Technology, Computer Engineering Lab, Mekelweg 4, NL-2628 CD Delft, Netherlands

[2] Bluebee, Laan Zuid Hoorn 57, NL-2289 DC Rijswijk, Netherlands

Source:

Computational Biology and Chemistry | 2018 / Volume 75

Keywords:

Acceleration; BWA-MEM; FPGA; GPU; Short read mapping; Systolic array;

DOI:

10.1016/j.compbiolchem.2018.03.024

Chinese Library Classification (CLC) number:

Q [Biological Sciences];

Discipline classification codes:

07; 0710; 09

Abstract:

We present our work on hardware-accelerated genomics pipelines, using either FPGAs or GPUs to accelerate execution of BWA-MEM, a widely used algorithm for genomic short read mapping. The mapping stage can take up to 40% of overall processing time for genomics pipelines. Our implementation offloads the Seed Extension function, one of the main BWA-MEM computational functions, onto an accelerator. Sequencers typically output reads with a length of 150 base pairs. However, read length is expected to increase in the near future. Here, we investigate the influence of read length on BWA-MEM performance using data sets with read lengths of up to 400 base pairs, and introduce methods to ameliorate the impact of longer read lengths. For the industry-standard 150 base pair read length, our implementation achieves an up to two-fold increase in overall application-level performance for systems with at most twenty-two logical CPU cores. Longer read lengths require commensurately bigger data structures, which directly impacts accelerator efficiency. The two-fold performance increase is sustained for read lengths of at most 250 base pairs. To improve performance, we perform a classification of the inefficiency of the underlying systolic array architecture. By eliminating idle regions as much as possible, efficiency is improved by up to 95%. Moreover, adaptive load balancing intelligently distributes work between host and accelerator to ensure that use of an accelerator always results in a performance improvement, which in GPU-constrained scenarios provides up to 45% more performance. © 2018 Elsevier Ltd. All rights reserved.
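The link between read length, array size, and idle regions can be illustrated with a back-of-the-envelope model. The sketch below is not the classification used in the paper; it assumes a textbook linear systolic array in which each processing element (PE) handles one read column, reference symbols stream through one per cycle, and data must traverse every PE even when the read is shorter than the array. The function name and parameters are hypothetical.

```python
def systolic_utilization(read_len: int, ref_len: int, num_pes: int) -> float:
    """Fraction of PE-cycles that compute a useful alignment-matrix cell.

    Simplified model (an assumption, not the paper's architecture):
    - one PE per read column, so the read must fit in the array;
    - one reference symbol enters the array per cycle;
    - every symbol passes through all num_pes PEs before the result is ready.
    """
    if read_len <= 0 or ref_len <= 0 or num_pes <= 0:
        raise ValueError("all lengths must be positive")
    if read_len > num_pes:
        raise ValueError("this simplified model requires read_len <= num_pes")

    useful_cells = read_len * ref_len   # cells of the alignment matrix
    cycles = ref_len + num_pes - 1      # stream in, then drain through all PEs
    return useful_cells / (num_pes * cycles)


if __name__ == "__main__":
    # An array sized for 400 bp reads, fed reads of different lengths.
    for read_len in (150, 250, 400):
        u = systolic_utilization(read_len, ref_len=read_len, num_pes=400)
        print(f"read length {read_len}: utilization {u:.2f}")
```

Under these assumptions, an array dimensioned for the longest supported read spends most of its PE-cycles idle on shorter reads, which is the kind of inefficiency the abstract says is reduced by eliminating idle regions.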

Pages: 54-64

Page count: 11
