ASW: Accelerating Smith–Waterman Algorithm on Coupled CPU–GPU Architecture

被引:0
作者
Huihui Zou
Shanjiang Tang
Ce Yu
Hao Fu
Yusen Li
Wenjie Tang
机构
[1] Tianjin University,College of Intelligence and Computing
[2] Nankai University,School of Computing
[3] National University of Defense Technology,College of Systems Engineering
来源
International Journal of Parallel Programming | 2019年 / 47卷
关键词
Smith–Waterman algorithm; Heterogeneous processors; APU; Load balancing;
D O I
暂无
中图分类号
学科分类号
摘要
Smith–Waterman algorithm (SW) is a popular dynamic programming algorithm widely used in bioinformatics for local biological sequence alignment. Due to the O(n2)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$O(n^2)$$\end{document} high time and space complexity of SW and growing size of biological data, it is crucial to accelerate SW for high performance. In view of the GPU high efficiency in science computation, many existing studies (e.g., CUDAlign, CUDASW++) speedup SW with GPU. However, the strong data dependency makes SW communication intensive, and the previous works fail to fully leverage the heterogeneous capabilities of the GPU machine for either the neglect of the CPU ability or the low bandwidth of PCI-e. In this paper, we propose ASW, which aims at accelerating SW algorithm with accelerated processing unit (APU), a heterogeneous processor integrates CPU and GPU in a single die and share the same memory. This coupled CPU–GPU architecture is more suitable for frequent data exchanging due to the elimination of PCI-e bus. For the full utilization of both CPU and GPU in APU system, ASW partitions the whole SW matrix into blocks and dynamically dispatches each block to CPU and GPU for the concurrent execution. A DAG-based dynamic scheduling method is presented to dispatch the workload automatically. Moreover, we also design a time cost model to determine the partition granularity in the matrix division phase. We have evaluated ASW on AMD A12 platform and our results show that ASW achieves a good performance of 7.2 GCUPS (gigacells update per second).
引用
收藏
页码:388 / 402
页数:14
相关论文
共 44 条
[1]  
Altschul SF(1990)Basic local alignment search tool J. Mol. Biol. 215 403-410
[2]  
Gish W(2012)AMD fusion APU: Llano IEEE Micro 32 28-37
[3]  
Miller W(2016)Cudalign 4.0: incremental speculative traceback for exact chromosome-wide alignment in GPU clusters IEEE Trans. Parallel Distrib. Syst. 27 2838-2850
[4]  
Myers EW(2013)Revisiting co-processing for hash joins on the coupled CPU–GPU architecture Proc. VLDB Endow. 6 889-900
[5]  
Lipman DJ(2014)In-cache query co-processing on coupled CPU–GPU architectures Proc. VLDB Endow. 8 329-340
[6]  
Branover A(1985)Rapid and sensitive protein similarity searches Science 227 1435-1441
[7]  
Foley D(2013)Cudasw++ 3.0: accelerating Smith–Waterman protein database search by coupling CPU and GPU SIMD instructions BMC Bioinform. 14 117-453
[8]  
Steinman M(1970)A general method applicable to the search for similarities in the amino acid sequence of two proteins J. Mol. Biol. 48 443-197
[9]  
De Oliveira Sandes EF(1981)Identification of common molecular subsequences J. Mol. Biol. 147 195-73
[10]  
Miranda G(2010)OpenCL: a parallel programming standard for heterogeneous computing systems Comput. Sci. Eng. 12 66-872