Parametrizing Multicore Architectures for Multiple Sequence Alignment

Cited by: 2
Authors
Isaza, Sebastian [1]
Sanchez, Friman [2]
Cabarcas, Felipe [3, 4]
Ramirez, Alex [3, 4]
Gaydadjiev, Georgi [1]
Affiliations
[1] Delft Univ Technol, Comp Engn Lab, NL-2600 AA Delft, Netherlands
[2] Tech Univ Catalonia, Comp Architecture Dept, Barcelona, Spain
[3] Tech Univ Catalonia, Barcelona, Spain
[4] Barcelona Supercomp Ctr, Barcelona, Spain
Source
PROCEEDINGS OF THE 2011 8TH ACM INTERNATIONAL CONFERENCE ON COMPUTING FRONTIERS (CF 2011) | 2011
Keywords
multicore architectures; hardware accelerators; ClustalW; multiple sequence alignment; bioinformatics; SMITH-WATERMAN; SENSITIVITY
DOI
10.1145/2016604.2016642
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Subject classification code
081202
Abstract
Sequence alignment is one of the fundamental tasks in bioinformatics. Due to the exponential growth of biological data and the computational complexity of the algorithms used, high-performance computing systems are required. Although multicore architectures have the potential to exploit the task-level parallelism found in these workloads, efficiently harnessing systems with hundreds of cores requires a deep understanding of the applications and the architecture. When incorporating large numbers of cores, performance scalability will likely saturate shared hardware resources like buses and memories. In this paper we evaluate the performance impact of various configurations of an accelerator-based multicore architecture with the aim of revealing and quantifying the bottlenecks. Then, we compare against a multicore using general-purpose processors and discuss the performance gap. Our target application is ClustalW, one of the most popular programs for Multiple Sequence Alignment. Different input data sets are characterized and we show how they influence performance. Simulation results show that due to the high computation-to-communication ratio and the transfer of data in large chunks, memory latency is well tolerated. However, bandwidth is critical to achieving maximum performance. Using a 32 KB cache configuration with 4 banks can capture most of the memory traffic and therefore avoid expensive off-chip transactions. On the other hand, using a hardware queue for task synchronization allows us to handle a large number of cores. Finally, we show that using a simple load balancing strategy, we can increase performance of general-purpose cores by 28%.
Pages: 10