Online Resource Management in Thermal and Energy Constrained Heterogeneous High Performance Computing

Cited by: 6
Authors
Oxley, Mark A. [1]
Pasricha, Sudeep [2]
Maciejewski, Anthony A. [1]
Siegel, Howard Jay [1, 2]
Burns, Patrick J. [3]
Affiliations
[1] Colorado State Univ, Dept Elect & Comp Engn, Ft Collins, CO 80523 USA
[2] Colorado State Univ, Dept Comp Sci, Ft Collins, CO 80523 USA
[3] Colorado State Univ, Informat Technol, Ft Collins, CO 80523 USA
Source
2016 IEEE 14TH INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, 14TH INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, 2ND INTL CONF ON BIG DATA INTELLIGENCE AND COMPUTING AND CYBER SCIENCE AND TECHNOLOGY CONGRESS (DASC/PICOM/DATACOM/CYBERSC | 2016
Keywords
heterogeneous computing; resource management; thermal-aware computing; energy-aware computing; HPC; DVFS; DATA CENTERS; POWER;
DOI
10.1109/DASC-PICom-DataCom-CyberSciTec.2016.111
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Operators of high-performance computing (HPC) facilities face conflicting trade-offs between the operating temperature of the facility, reliability of compute nodes, energy costs, and computing performance. Intelligent management of the HPC facility typically involves taking a proactive approach by predicting the thermal implications of allocating tasks to different cores around the facility. This offers the benefit of operating the HPC facility at a hotter CRAC temperature while avoiding hotspots. However, such an approach can be a time-consuming process that requires complicated air flow models to be calculated for every mapping decision. We propose a framework in which offline analysis is used to assist an online resource manager by predicting the thermal implications of mapping a given workload. The goal is to maximize the reward earned from completing tasks by their individual deadlines throughout the day, while adhering to a daily energy budget and temperature threshold constraints. We show that our proposed techniques can earn significantly greater reward than traditional load balancing and thermal management schemes.
Pages: 604-611
Page count: 8