Minimizing Thermal Variation Across System Components

Cited by: 15
Authors
Zhang, Kaicheng [1]
Ogrenci-Memik, Seda [1]
Memik, Gokhan [1]
Yoshii, Kazutomo [2]
Sankaran, Rajesh [2]
Beckman, Pete [2]
Affiliations
[1] Northwestern Univ, Dept EECS, Evanston, IL 60208 USA
[2] Argonne Natl Lab, Math & Comp Sci Div, Argonne, IL 60439 USA
Source
2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS) | 2015
DOI
10.1109/IPDPS.2015.37
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Thermal overheating is a serious concern in modern supercomputing systems. Elevated temperature levels reduce the reliability and the lifetime of the underlying hardware and increase their power consumption. Previous studies on mitigating thermal hotspots at the hardware and run-time system levels have typically used approaches that trade off performance for reduced operating temperatures. In this paper, we first show that in a large-scale system, physical attributes cause an uneven temperature distribution. We then develop a model to characterize the thermal behavior of a complex system using various machine learning methods. We propose to improve application placement by incorporating thermal awareness into the decision-making process. Specifically, our system predicts the thermal condition of the system based on application mapping and uses these predictions to mitigate thermal hotspots without any performance loss. We provide two versions of our prediction mechanism. On a two-node configuration, these models achieve 72.5% and 78.8% success rates in their predictions, respectively. In other words, the scheduling decisions of our models result in a task placement that has a lower maximum average temperature. Overall, the more aggressive scheme reduces the average peak temperature by up to 11.9 °C (2.3 °C on average) without any performance degradation.
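The placement idea in the abstract (predict each node's temperature for a candidate application mapping, then pick the mapping with the lowest predicted peak) can be illustrated with a minimal sketch for a two-node configuration. This is not the authors' model: the linear predictor, the node parameters, the application power figures, and all names (`NODES`, `APPS`, `predict_temp`, `peak_temp`, `best_placement`) are hypothetical stand-ins for the learned predictor the paper describes.

```python
from itertools import permutations

# Hypothetical per-node thermal model: predicted temperature (deg C) =
# idle temperature + heating coefficient * application power draw.
# All numbers are illustrative assumptions, not measurements from the paper.
NODES = {
    "node0": {"idle_c": 40.0, "c_per_watt": 0.20},  # assumed well cooled
    "node1": {"idle_c": 45.0, "c_per_watt": 0.30},  # assumed poorly cooled
}
APPS = {"app_hot": 100.0, "app_cool": 40.0}  # assumed power draw in watts


def predict_temp(node, watts):
    """Toy stand-in for a learned thermal predictor."""
    params = NODES[node]
    return params["idle_c"] + params["c_per_watt"] * watts


def peak_temp(mapping):
    """Highest predicted node temperature under an app-to-node mapping."""
    return max(predict_temp(node, APPS[app]) for app, node in mapping.items())


def best_placement():
    """Enumerate every one-to-one mapping and keep the one with the coolest peak."""
    apps = list(APPS)
    candidates = [dict(zip(apps, nodes)) for nodes in permutations(NODES)]
    return min(candidates, key=peak_temp)


placement = best_placement()
```

Under these assumed parameters, the hotter application is assigned to the better-cooled node. Because only the mapping changes and neither application is throttled, the choice costs no performance, which is the property the abstract emphasizes.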
Pages: 1139-1148
Number of pages: 10