Communication-Aware Load Balancing for Parallel Applications on Clusters

Cited by: 13
Authors
Qin, Xiao [1]
Jiang, Hong [2]
Manzanares, Adam [1]
Ruan, Xiaojun [1]
Yin, Shu [1]
Affiliations
[1] Auburn Univ, Samuel Ginn Coll Engn, Shelby Ctr Engn Technol, Dept Comp Sci & Software Engn, Auburn, AL 36849 USA
[2] Univ Nebraska Lincoln, Dept Comp Sci & Engn, Lincoln, NE 68588 USA
Funding
US National Science Foundation;
Keywords
Cluster; communication-aware computing; parallel computing; load balancing; DESIGN; I/O; ARCHITECTURES; ALGORITHMS; SYSTEMS; MYRINET;
DOI
10.1109/TC.2009.108
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Cluster computing has emerged as a primary and cost-effective platform for running parallel applications, including communication-intensive applications that transfer a large amount of data among the nodes of a cluster via the interconnection network. Conventional load balancers have proven effective in increasing the utilization of CPU, memory, and disk I/O resources in a cluster. However, most of the existing load-balancing schemes ignore network resources, leaving an opportunity to improve the effective bandwidth of networks on clusters running parallel applications. For this reason, we propose a communication-aware load-balancing technique that is capable of improving the performance of communication-intensive applications by increasing the effective utilization of networks in cluster environments. To facilitate the proposed load-balancing scheme, we introduce a behavior model for parallel applications with large requirements of network, CPU, memory, and disk I/O resources. Our load-balancing scheme can make full use of this model to quickly and accurately determine the load induced by a variety of parallel applications. Simulation results generated from a diverse set of both synthetic bulk synchronous and real parallel applications on a cluster show that our scheme significantly improves the performance, in terms of slowdown and turn-around time, over existing schemes by up to 206 percent (with an average of 74 percent) and 235 percent (with an average of 82 percent), respectively.
Pages: 42-52
Number of pages: 11