Efficient inter-core power and thermal balancing for multicore processors

Times cited: 3
Authors
Cebrian, Juan M. [1]
Sanchez, Daniel [1]
Aragon, Juan L. [1]
Kaxiras, Stefanos [2]
Affiliations
[1] Univ Murcia, Dept Comp Architecture, Murcia, Spain
[2] Uppsala Univ, Dept Informat Technol, Uppsala, Sweden
Funding
European Union Seventh Framework Programme (FP7)
Keywords
Power consumption; Power budget; Power tokens; Chip multiprocessor; Management; Performance; Hardware; Voltage
DOI
10.1007/s00607-012-0236-6
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Nowadays the market is dominated by processor architectures that integrate multiple cores per chip. These architectures behave differently depending on the applications they run (parallel, multiprogrammed, or sequential), but all of them run into the so-called power and temperature wall. For future technology nodes (below 22 nm) and a fixed die size, it remains uncertain what fraction of the processor can be powered on simultaneously. Power-saving and power-budget mechanisms can help control precisely how much power the processor dissipates. An initial analysis shows that legacy power-saving techniques work well for matching a power budget in thread-independent and multiprogrammed workloads, but not in parallel workloads. When running parallel shared-memory applications, sacrificing some performance in a single core (thread) to improve energy efficiency can unintentionally delay the remaining cores (threads) at synchronization points (locks/barriers), degrading global performance. To solve this problem, we propose power token balancing (PTB), which aims to match an external power constraint accurately by balancing the power consumed among the different cores. Experimental results show that PTB matches a predefined power budget (50% of the original peak power) more accurately than other mechanisms such as DVFS. For a 16-core CMP, the total energy consumed above the budget is reduced to only 8%, at the cost of only a 3% energy increase (overhead). We also introduce a novel mechanism named "Nitro", which overclocks the core that enters a critical section (delimited by locks) so that the lock is released as soon as possible. Experimental results show that Nitro reduces the execution time of lock-intensive applications by more than 4% by raising the clock frequency by 15% in selected program phases that together account for 22% of the total execution time.
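The token-pool idea behind PTB can be illustrated with a small sketch. It is not the paper's implementation: the equal per-core share of the global budget, the proportional split of the spare-token pool, and the function names `ptb_grant`, `static_grant`, and `energy_over_budget` are all hypothetical choices made for illustration. Cores below their share donate unused tokens to a pool, and cores above their share draw from it; any demand still uncovered would have to be removed by a local power-saving mechanism (e.g., DVFS), which the sketch does not model. The last function is one plausible reading of the "energy consumed above the budget" metric, computed from a sampled power trace.

```python
"""Hypothetical sketch of power-token balancing (PTB) for one control interval.

Assumptions (not taken from the paper): every core gets an equal share of the
global budget, spare tokens go to a shared pool, and the pool is split among
over-budget cores in proportion to their deficits.
"""
from typing import List


def ptb_grant(demands: List[float], global_budget: float) -> List[float]:
    """Return the power tokens granted to each core for this interval."""
    share = global_budget / len(demands)  # equal per-core share (assumption)
    # Cores below their share donate the unused part to a shared pool.
    pool = sum(max(0.0, share - d) for d in demands)
    # How much each over-budget core exceeds its share.
    deficits = [max(0.0, d - share) for d in demands]
    total_deficit = sum(deficits)
    # Fraction of every deficit the pool can cover (all of it if the pool suffices).
    cover = 1.0 if total_deficit <= pool else pool / total_deficit
    return [min(d, share) + cover * x for d, x in zip(demands, deficits)]


def static_grant(demands: List[float], global_budget: float) -> List[float]:
    """Baseline without balancing: each core is capped at its own share."""
    share = global_budget / len(demands)
    return [min(d, share) for d in demands]


def energy_over_budget(power_trace: List[float], budget: float, dt: float) -> float:
    """Energy spent above the budget, from power samples taken every dt seconds."""
    return sum(max(0.0, p - budget) for p in power_trace) * dt


if __name__ == "__main__":
    demands = [10.0, 20.0, 40.0, 40.0]  # hypothetical per-core demands in tokens
    print(static_grant(demands, 80.0))  # [10.0, 20.0, 20.0, 20.0]: 10 tokens unused
    print(ptb_grant(demands, 80.0))     # [10.0, 20.0, 25.0, 25.0]: whole budget used
    print(energy_over_budget([90.0, 70.0, 100.0], 80.0, 0.5))  # 15.0
```

With these made-up demands, a static per-core cap leaves 10 tokens of the budget unused while the two busiest cores are throttled to 20 tokens each; in a parallel program those slowed threads would reach locks and barriers late and stall the others, which is the effect the abstract says PTB is meant to avoid. The abstract does not state how PTB divides the pool among over-budget cores, so the proportional split here is only one option.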
We conclude with an analysis of the thermal effects of PTB in different CMP configurations, using realistic power numbers and heatsink/fan configurations. Results show that PTB not only balances temperature across the cores, reducing the temperature gradient and increasing signal reliability, but also reduces both average and peak temperatures by 28-30% for the studied benchmarks when a peak power budget of 50% is exceeded.
Pages: 537-566
Page count: 30