Gnu-RL: A Precocial Reinforcement Learning Solution for Building HVAC Control Using a Differentiable MPC Policy

Cited by: 117

Authors:

Chen, Bingqing [1]

Cai, Zicheng [2]

Berges, Mario [1]

Affiliations:

[1] Carnegie Mellon University, Pittsburgh, PA 15213, USA

[2] Dell Technologies, Austin, TX, USA

Source:

BuildSys '19: Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, 2019

Keywords:

Deep Reinforcement Learning; HVAC Control; Model-Predictive Control; PART

DOI:

10.1145/3360322.3360849

Chinese Library Classification (CLC) number:

TP301 [Theory, methods]

Subject classification code:

081202

Abstract:

Reinforcement learning (RL) was first demonstrated to be a feasible approach to controlling heating, ventilation, and air conditioning (HVAC) systems more than a decade ago. However, there has been limited progress towards a practical and scalable RL solution for HVAC control. While one can train an RL agent in simulation, it is not cost-effective to create a model for each thermal zone or building. Likewise, existing RL agents generally take a long time to learn and are opaque to expert interrogation, making them unattractive for real-world deployment. To tackle these challenges, we propose Gnu-RL: a novel approach that enables practical deployment of RL for HVAC control and requires no prior information other than historical data from existing HVAC controllers. To achieve this, Gnu-RL adopts a recently-developed Differentiable Model Predictive Control (MPC) policy, which encodes domain knowledge on planning and system dynamics, making it both sample-efficient and interpretable. Prior to any interaction with the environment, a Gnu-RL agent is pre-trained on historical data using imitation learning, which enables it to match the behavior of the existing controller. Once it is put in charge of controlling the environment, the agent continues to improve its policy end-to-end, using a policy gradient algorithm. We evaluate Gnu-RL on both an EnergyPlus model and a real-world testbed. In both experiments, our agents were directly deployed in the environment after offline pre-training on expert demonstration. In the simulation experiment, our approach saved 6.6% energy compared to the best published RL result for the same environment, while maintaining a higher level of occupant comfort. Next, Gnu-RL was deployed to control the HVAC of a real-world conference room for a three-week period. Our results show that Gnu-RL saved 16.7% of cooling demand compared to the existing controller and tracked temperature set-point better.
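The two-phase training scheme in the abstract can be made concrete with a small Python sketch. This is not the authors' implementation. Several pieces are assumptions of this example, as are every name and constant in it:

- a scalar linear zone model x_next = a*x + b*u;
- a one-step MPC with a closed-form solution, standing in for a full differentiable MPC layer;
- a hypothetical proportional rule as the "existing controller";
- an assumed imitation loss that mixes action matching with one-step state prediction;
- plain REINFORCE as the policy-gradient algorithm.

The sketch illustrates the property the abstract relies on. Because the MPC action is a differentiable function of the model parameters (a, b), the same parameters can be fitted offline to historical data and then updated online from rewards. Gradients are written out analytically so that no autodiff library is needed.

```python
import random

# Toy sketch only (not Gnu-RL's code). Hypothetical setup: x is a scalar
# zone-temperature deviation, u a conditioning command, and the learnable
# model is x_next = a * x + b * u with parameters theta = (a, b).
Q, R = 1.0, 0.1            # assumed comfort and energy weights in the MPC cost
X_REF = 0.5                # assumed set-point, in the same units as x
A_TRUE, B_TRUE = 0.9, 0.5  # hypothetical "real" plant used to generate data

def plant_step(x, u, rng):
    return A_TRUE * x + B_TRUE * u + rng.gauss(0.0, 0.01)

def expert(x):
    """Hypothetical existing controller: a proportional rule."""
    return 0.6 * (X_REF - x)

def make_dataset(rng, n=100):
    """Hypothetical historical log of (state, expert action, next state)."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        u = expert(x)
        data.append((x, u, plant_step(x, u, rng)))
    return data

def mpc_action(theta, x):
    """One-step MPC: argmin_u Q*(a*x + b*u - X_REF)**2 + R*u**2, in closed form."""
    a, b = theta
    return Q * b * (X_REF - a * x) / (Q * b * b + R)

def mpc_action_grad(theta, x):
    """Analytic d(action)/d(a, b), i.e. differentiating through the MPC solution."""
    a, b = theta
    d = Q * b * b + R
    return -Q * b * x / d, Q * (X_REF - a * x) * (R - Q * b * b) / (d * d)

def imitation_loss_and_grad(theta, data, lam=1.0):
    """Assumed loss: action matching + lam * one-step state prediction error."""
    a, b = theta
    loss = ga = gb = 0.0
    for x, u_exp, x_next in data:
        e_u = mpc_action(theta, x) - u_exp
        e_x = a * x + b * u_exp - x_next
        du_da, du_db = mpc_action_grad(theta, x)
        loss += e_u ** 2 + lam * e_x ** 2
        ga += 2 * e_u * du_da + 2 * lam * e_x * x
        gb += 2 * e_u * du_db + 2 * lam * e_x * u_exp
    n = len(data)
    return loss / n, (ga / n, gb / n)

def pretrain(theta, data, lr=0.05, steps=1000):
    """Phase 1: offline imitation learning by plain gradient descent."""
    for _ in range(steps):
        _, (ga, gb) = imitation_loss_and_grad(theta, data)
        theta = (theta[0] - lr * ga, theta[1] - lr * gb)
    return theta

def reinforce(theta, rng, episodes=200, horizon=20, sigma=0.1, lr=1e-4, energy_w=0.1):
    """Phase 2: online REINFORCE with Gaussian exploration around the MPC action."""
    for _ in range(episodes):
        x, traj = rng.uniform(-1.0, 1.0), []
        for _ in range(horizon):
            mu = mpc_action(theta, x)
            u = rng.gauss(mu, sigma)
            x_next = plant_step(x, u, rng)
            reward = -((x_next - X_REF) ** 2) - energy_w * u * u
            traj.append((x, u, mu, reward))
            x = x_next
        ga = gb = ret = 0.0
        for x, u, mu, reward in reversed(traj):
            ret += reward                    # undiscounted reward-to-go
            score = (u - mu) / sigma ** 2    # d log N(u; mu, sigma^2) / d mu
            du_da, du_db = mpc_action_grad(theta, x)
            ga += ret * score * du_da
            gb += ret * score * du_db
        theta = (theta[0] + lr * ga, theta[1] + lr * gb)  # gradient ascent
    return theta

if __name__ == "__main__":
    rng = random.Random(0)
    data = make_dataset(rng)
    theta = pretrain((0.5, 0.5), data)  # offline, before any interaction
    print("imitation loss:", imitation_loss_and_grad(theta, data)[0])
    theta = reinforce(theta, rng)       # online fine-tuning
    print("fine-tuned (a, b):", theta)
```

In this toy, the learned quantities are the model parameters (a, b) themselves rather than the weights of a generic network. That is one way to read the abstract's claim that an MPC-structured policy is interpretable. The small REINFORCE learning rate only keeps this unbaselined, high-variance estimator stable; it is not a tuned value.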

Pages: 316-325

Page count: 10

Number of references: 50

