Asynchronous Work Stealing on Distributed Memory Systems

Cited by: 10
Authors
Li, Shigang [1]
Hu, Jingyuan [2]
Cheng, Xin [1]
Zhao, Chongchong [1]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Comp & Commun Engn, Beijing, Peoples R China
[2] Univ Technol Troyes, Troyes, France
Source
PROCEEDINGS OF THE 2013 21ST EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING | 2013
Keywords
asynchronous work stealing; distributed memory; task granularity; UPC;
DOI
10.1109/PDP.2013.35
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Work stealing is a popular policy for dynamic load balancing of irregular applications. However, the communication overhead incurred by work stealing may make it less efficient, especially on distributed memory systems. In this work we propose an asynchronous work stealing (AsynchWS) strategy which exploits opportunities to overlap communication with local residual tasks. Profiling information is collected locally to optimize task granularity and guide the asynchronous work stealing. AsynchWS is implemented in Unified Parallel C (UPC), which effectively supports non-blocking one-sided communication and facilitates the implementation. Experiments are conducted on a 32-node Xeon X5650 cluster using a set of irregular applications. Results show up to 16% better performance than state-of-the-art strategies on distributed memory.
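The overlap idea in the abstract can be illustrated with a toy, single-worker simulation. This is only a sketch under stated assumptions, not the AsynchWS implementation: the function name `simulate`, the fixed `threshold` trigger, the fixed `steal_latency`, and the fixed `stolen_batch` are all hypothetical (the paper guides stealing with locally collected profiling information, which is not modeled here). Each task is assumed to take one time step, and `threshold=0` stands in for synchronous stealing, where a request is issued only once the local queue is empty.

```python
def simulate(local_tasks, steal_latency, stolen_batch, threshold, steals):
    """Return (total_steps, idle_steps) for one simulated worker.

    Hypothetical model: every task costs one step; a steal request needs
    `steal_latency` (>= 1) steps to arrive and delivers `stolen_batch`
    tasks; at most `steals` requests are issued in total.
    """
    queue = local_tasks
    in_flight = None  # steps left before the in-flight steal arrives
    steps = idle = 0
    while True:
        # Issue a non-blocking steal request once the local queue is low.
        if in_flight is None and steals > 0 and queue <= threshold:
            in_flight = steal_latency
            steals -= 1
        if queue == 0 and in_flight is None:
            break  # no local work left and nothing more to steal
        # One time step: run a residual local task if any, else wait idle.
        if queue > 0:
            queue -= 1
        else:
            idle += 1
        steps += 1
        # Advance the in-flight request; stolen tasks land when it completes.
        if in_flight is not None:
            in_flight -= 1
            if in_flight == 0:
                queue += stolen_batch
                in_flight = None
    return steps, idle


sync_steps, sync_idle = simulate(10, 3, 10, threshold=0, steals=2)
async_steps, async_idle = simulate(10, 3, 10, threshold=3, steals=2)
print(sync_steps, sync_idle)    # 36 6
print(async_steps, async_idle)  # 30 0
```

In this toy setting, issuing the request while three residual tasks remain hides the three-step latency entirely, whereas waiting for an empty queue leaves the worker idle for the full latency of each steal.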
Pages: 198-202
Page count: 5