A framework for exploiting task and data parallelism on distributed memory multicomputers

Cited by: 48
Authors
Ramaswamy, S
Sapatnekar, S
Banerjee, P
Affiliations
[1] UNIV MINNESOTA, DEPT ELECT ENGN, MINNEAPOLIS, MN 55455 USA
[2] NORTHWESTERN UNIV, CTR PARALLEL & DISTRIBUTED COMP, EVANSTON, IL 60208 USA
Funding
National Aeronautics and Space Administration (NASA)
Keywords
task parallel; data parallel; allocation; scheduling; HPF; distributed memory; convex programming;
DOI
10.1109/71.642945
Chinese Library Classification (CLC)
TP301 [Theory and Methods]
Subject Classification Code
081202
Abstract
Distributed Memory Multicomputers (DMMs), such as the IBM SP-2, the Intel Paragon, and the Thinking Machines CM-5, offer significant advantages over shared memory multiprocessors in terms of cost and scalability. Unfortunately, utilizing all the available computational power in these machines requires a tremendous programming effort from users, which creates a need for sophisticated compiler and run-time support for distributed memory machines. In this paper, we explore a new compiler optimization for regular scientific applications: the simultaneous exploitation of task and data parallelism. Our optimization is implemented as part of the PARADIGM HPF compiler framework we have developed. The intuitive idea behind the optimization is the use of task parallelism to control the degree of data parallelism of individual tasks. This provides increased performance because data parallelism yields diminishing returns as the number of processors is increased. By controlling the number of processors used for each data parallel task in an application and by executing these tasks concurrently, we make program execution more efficient and, therefore, faster. A practical implementation of a task and data parallel execution scheme on a distributed memory multicomputer also involves data redistribution, which incurs an overhead. However, as our experimental results show, this overhead is not a problem; executing a program using task and data parallelism together can be significantly faster than executing it using data parallelism alone. This makes our proposed optimization practical and extremely useful.
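The diminishing-returns argument in the abstract can be made concrete with a small back-of-the-envelope model. The sketch below (Python, purely illustrative; the Amdahl-style speedup curve, the equal task workloads, and the fixed redistribution cost are assumptions for demonstration, not figures from the paper) compares running two data-parallel tasks one after another on the full machine against running them concurrently on half the processors each:

# Toy model (not from the paper): illustrates why combining task and data
# parallelism can beat pure data parallelism when data-parallel speedup
# shows diminishing returns. The speedup curve, task workloads, and
# redistribution overhead below are all hypothetical.

def speedup(p, serial_fraction=0.05):
    """Amdahl-style speedup of one data-parallel task on p processors."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

def pure_data_parallel(work_a, work_b, p):
    """Run the two tasks one after another, each on all p processors."""
    return work_a / speedup(p) + work_b / speedup(p)

def task_plus_data_parallel(work_a, work_b, p_a, p_b, redistribution=2.0):
    """Run the tasks concurrently on disjoint processor subsets,
    paying a fixed cost to redistribute data between phases."""
    t_a = work_a / speedup(p_a)
    t_b = work_b / speedup(p_b)
    return max(t_a, t_b) + redistribution

if __name__ == "__main__":
    P = 32
    work_a = work_b = 100.0  # equal task workloads (arbitrary time units)
    seq = pure_data_parallel(work_a, work_b, P)
    mix = task_plus_data_parallel(work_a, work_b, P // 2, P // 2)
    print(f"data parallelism alone : {seq:6.2f}")
    print(f"task + data parallelism: {mix:6.2f}")

With a 5 percent serial fraction and 32 processors, the concurrent schedule finishes in about 12.9 time units versus about 15.9 for pure data parallelism, even after paying the assumed redistribution cost; this mirrors the paper's finding that redistribution overhead need not negate the benefit.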
Pages: 1098-1116
Page count: 19