Balancing Parallel Applications on Multi-core Processors Based on Cache Partitioning

Times cited: 2
Authors
Suo, Guang [1]
Yang, Xue-jun [1]
Affiliation
[1] National University of Defense Technology, School of Computer, Parallel & Distributed Processing Laboratory, Changsha, Hunan, People's Republic of China
Source
2009 IEEE International Symposium on Parallel and Distributed Processing with Applications, Proceedings | 2009
Keywords
Multi-core Processor; Cache Partitioning; Load Balancing; Shared Cache
DOI
10.1109/ISPA.2009.37
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Load balancing is an important problem for parallel applications. Many recent supercomputers are built from multi-core processors whose cores usually share the last-level cache. This causes two problems: on the one hand, memory accesses from different cores conflict with each other in the shared cache; on the other hand, different cores carry different workloads, which results in load imbalance. In this paper, we present a novel technique for balancing parallel applications on multi-core processors based on cache partitioning, which allocates different portions of the shared cache exclusively to different cores. The intuition is to partition the shared cache among cores according to their workloads: a heavily loaded core receives a larger share of the cache than a lightly loaded core, so the heavily loaded core runs faster. We present two algorithms, an initial cache partitioning algorithm (ICP) and a dynamic cache partitioning algorithm (DCP). ICP determines the best partition when the application starts, while DCP adjusts that initial partition as the load balance changes during execution. Experimental results show that our cache-partitioning-based load balancing mechanism reduces running time by 7% on average.
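The idea of giving heavily loaded cores more of the shared cache can be illustrated with a small sketch. It assumes a way-partitioned last-level cache (16 ways in the example) and per-core workload estimates, both of which are assumptions not taken from the abstract. The functions `initial_partition` and `rebalance_step` are hypothetical stand-ins for the roles of ICP and DCP. They are not the authors' algorithms, whose actual metrics and decision rules are not given here.

```python
from typing import List


def initial_partition(workloads: List[float], total_ways: int) -> List[int]:
    """Hypothetical ICP-like step: split cache ways in proportion to workload.

    Every core keeps at least one way. The remaining ways are shared out
    proportionally, and any ways left after rounding down go to the cores
    with the largest fractional remainders.
    """
    n = len(workloads)
    if total_ways < n:
        raise ValueError("need at least one way per core")
    spare = total_ways - n  # ways left after the one-way minimum per core
    total_load = sum(workloads)
    if total_load <= 0:
        shares = [spare / n] * n  # no load information: split evenly
    else:
        shares = [spare * w / total_load for w in workloads]
    alloc = [1 + int(s) for s in shares]
    leftover = total_ways - sum(alloc)
    # Hand out the remaining ways to the cores with the largest remainders.
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


def rebalance_step(alloc: List[int], run_times: List[float]) -> List[int]:
    """Hypothetical DCP-like step: move one way from the fastest to the slowest core.

    run_times are assumed to be measured per-core times for the last phase.
    A core is never reduced below one way.
    """
    new = list(alloc)
    slowest = max(range(len(new)), key=lambda i: run_times[i])
    donors = [i for i in range(len(new)) if new[i] > 1 and i != slowest]
    if not donors:
        return new  # nothing can be taken without starving a core
    fastest = min(donors, key=lambda i: run_times[i])
    new[fastest] -= 1
    new[slowest] += 1
    return new


if __name__ == "__main__":
    ways = initial_partition([4, 2, 1, 1], 16)
    print(ways)  # [7, 4, 3, 2]
    print(rebalance_step(ways, [10.0, 8.0, 6.0, 9.0]))  # [8, 4, 2, 2]
```

In this sketch, the initial split gives the most heavily loaded core the largest share of the cache. Each rebalancing step then shifts capacity toward whichever core lagged in the last phase. This is one simple policy among many possible ones.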
Pages: 190-195
Number of pages: 6
References
15 in total
[1] AMD, 2006, MULT PROC NEXT EV CO
[2] Bhandarkar M, 2001, LECT NOTES COMPUT SC, V2074, P108
[3] Boneti C, BALANCING HPC APPL S, P1
[4] Iyer R, 2004, ICS 04, P257
[5] Iyer R, 2007, PERF E R SI, V35, P25
[6] Kim SB, 2004, 13th International Conference on Parallel Architecture and Compilation Techniques, Proceedings, P111
[7] Kongetira P, Aingaran K, Olukotun K, 2005, Niagara: A 32-way multithreaded SPARC processor, IEEE MICRO, V25, N2, P21-29
[8] Magnusson PS, Christensson M, Eskilson J, Forsgren D, Hållberg G, Högberg J, Larsson F, Moestedt A, Werner B, 2002, Simics: A full system simulation platform, COMPUTER, V35, N2, P50
[9] Matick RE, Heller TJ, Ignatowski M, 2001, Analytical analysis of finite cache penalty and cycles per instruction of a multiprocessor memory hierarchy using miss rates and queuing theory, IBM JOURNAL OF RESEARCH AND DEVELOPMENT, V45, N6, P819-842
[10] NPB, NASA NAS PAR BENCHM