Fine-grain parallelism using multi-core, Cell/BE, and GPU Systems

Cited by: 17
Authors
Unknown
Affiliations
[1] SiPS, INESC-ID/IST, Universidade Técnica de Lisboa, 1000-029 Lisbon
[2] CASPER, Department of Computer Science, University of Cyprus, CY 1678 Nicosia
[3] Exelixis Lab, Scientific Computing Group, Heidelberg Institute for Theoretical Studies, D-69118 Heidelberg
[4] National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, IL 61801
Keywords
Multi-core processors; Multi-core accelerators; Performance evaluation; Fine-grain parallelism; Scientific workloads; Database workloads; DNA-SEQUENCES; GRAPHICS; PERFORMANCE; INFERENCE; DYNAMICS
DOI
10.1016/j.parco.2011.08.002
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Currently, we are facing a situation where applications exhibit increasing computational demands and where a large variety of parallel processor systems are available. In this paper we focus on exploiting fine-grain parallelism for three applications with distinct characteristics: a Bioinformatics application (MrBayes), a Molecular Dynamics application (NAMD), and a database application (TPC-H). We assess, side-by-side, the performance of the three applications on general-purpose multi-core processors, the Cell Broadband Engine (Cell/BE), and Graphics Processing Units (GPU). Our results indicate that application performance depends on the characteristics of the parallel architectures and on the computational requirements of the core functions of the respective applications. For MrBayes the best overall performance is achieved on general-purpose multi-core processors, for NAMD on the Cell/BE, and for TPC-H on GPUs. (C) 2011 Elsevier B.V. All rights reserved.
Pages: 365-390
Number of pages: 26