Toward Physics-Guided Safe Deep Reinforcement Learning for Green Data Center Cooling Control

Cited by: 11

Authors:

Wang, Ruihang [1]

Zhang, Xinyi [1]

Zhou, Xin [1]

Wen, Yonggang [1]

Tan, Rui [1]

Affiliations:

[1] Nanyang Technological University, Singapore

Source:

2022 13TH ACM/IEEE INTERNATIONAL CONFERENCE ON CYBER-PHYSICAL SYSTEMS (ICCPS 2022) | 2022

Funding:

National Research Foundation, Singapore

Keywords:

Data center; safe reinforcement learning; energy efficiency; thermal safety

DOI:

10.1109/ICCPS54341.2022.00021

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Deep reinforcement learning (DRL) has shown good performance in tackling Markov decision process (MDP) problems. As DRL optimizes a long-term reward, it is a promising approach to improving the energy efficiency of data center cooling. However, enforcing the thermal safety constraint during DRL's state exploration is a major challenge. The widely adopted reward shaping approach adds a negative reward when an exploratory action results in unsafety. Thus, the agent needs to experience sufficient unsafe states before it learns how to prevent unsafety. In this paper, we propose a safety-aware DRL framework for single-hall data center cooling control. It applies offline imitation learning and online post-hoc rectification to holistically prevent thermal unsafety during online DRL. In particular, the post-hoc rectification searches for the minimum modification to the DRL-recommended action such that the rectified action will not result in unsafety. The rectification is designed based on a thermal state transition model that is fitted using historical safe operation traces and is able to extrapolate the transitions to unsafe states explored by DRL. Extensive evaluation for chilled water and direct expansion cooled data centers in two climate conditions shows that our approach saves 22.7% to 26.6% of total data center power compared with conventional control and reduces safety violations by 94.5% to 99% compared with reward shaping.
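
The post-hoc rectification step described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the linear model `predict_next_temp` and its coefficients, the function name `rectify_action`, the single scalar action (a cooling setpoint), and the use of a simple grid search in place of whatever search procedure the authors actually use.

```python
from typing import Callable, Sequence


def predict_next_temp(temp_c: float, setpoint_c: float, it_load_kw: float) -> float:
    """Hypothetical thermal transition model with made-up coefficients.

    The paper fits its model from historical safe operation traces;
    this linear form is only a stand-in for illustration.
    """
    return 0.6 * temp_c + 0.4 * setpoint_c + 0.02 * it_load_kw


def rectify_action(
    proposed: float,
    temp_c: float,
    it_load_kw: float,
    temp_limit_c: float,
    candidates: Sequence[float],
    model: Callable[[float, float, float], float] = predict_next_temp,
) -> float:
    """Return the candidate closest to `proposed` whose predicted next
    temperature stays at or below `temp_limit_c`."""
    safe = [a for a in candidates if model(temp_c, a, it_load_kw) <= temp_limit_c]
    if not safe:
        # No candidate is predicted safe: fall back to the candidate
        # with the lowest predicted temperature.
        return min(candidates, key=lambda a: model(temp_c, a, it_load_kw))
    # Minimum modification: the safe candidate nearest to the proposal.
    return min(safe, key=lambda a: abs(a - proposed))


# Candidate setpoints from 15.0 to 25.0 degrees C in 0.5 degree steps.
candidates = [15.0 + 0.5 * i for i in range(21)]
rectified = rectify_action(25.0, 26.0, 100.0, 27.1, candidates)
print(rectified)
```

In this sketch a proposal that the model already predicts to be safe is returned unchanged, so the rectification only intervenes when the DRL-recommended action is predicted to violate the limit.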


Pages: 159-169

Page count: 11
