A Multi-Agent Reinforcement Learning Approach to Price and Comfort Optimization in HVAC-Systems

Cited by: 14

Authors:

Blad, Christian [1,2,4]

Bogh, Simon [1]

Kallesoe, Carsten [2,3]

Affiliations:

[1] Aalborg Univ, Dept Mat & Prod, Robot & Automat Grp, DK-9220 Aalborg, Denmark

[2] Grundfos, Technol & Innovat, Control Dept, DK-8850 Bjerringbro, Denmark

[3] Aalborg Univ, Dept Elect Syst, DK-9220 Aalborg, Denmark

[4] Fibigerstr 16, DK-9220 Aalborg, Denmark

Source:

ENERGIES | 2021 / Vol. 14 / Issue 22

Keywords:

deep reinforcement learning; artificial intelligence; HVAC-systems; underfloor heating; energy in buildings; predictive analytics; DEMAND RESPONSE; BUILDINGS; ENERGY;

DOI:

10.3390/en14227491

Chinese Library Classification:

TE [Petroleum and Natural Gas Industry]; TK [Energy and Power Engineering];

Discipline classification codes:

0807 ; 0820 ;

Abstract:

This paper addresses the challenge of minimizing training time for the control of Heating, Ventilation, and Air-conditioning (HVAC) systems with online Reinforcement Learning (RL). This is done by developing a novel Multi-Agent Reinforcement Learning (MARL) approach for HVAC systems. In this paper, the environment formed by the HVAC system is formulated as a Markov Game (MG) in a general-sum setting. The MARL algorithm is designed with a decentralized structure, where only relevant states are shared between agents, and actions are shared in a sequence that is sensible from a system point of view. The simulation environment is a domestic house located in Denmark and designed to resemble an average house. The heat source in the house is an air-to-water heat pump, and the HVAC system is an Underfloor Heating (UFH) system. The house is subjected to weather changes from a data set collected in Copenhagen in 2006, spanning the entire year except for June, July, and August, when heating is not required. It is shown that: (1) when comparing Single-Agent Reinforcement Learning (SARL) and MARL, training time can be reduced by 70% for a UFH system with four temperature zones, (2) the agent can learn and generalize over seasons, (3) the cost of heating can be reduced by 19%, equivalent to 750 kWh of electric energy per year for an average Danish domestic house, compared to a traditional control method, and (4) oscillations in the room temperature can be reduced by 40% when comparing the RL control methods with a traditional control method.


Pages: 20

References (42 in total)

[1]

Banerjee B., 2003, The Conference on Autonomous Agents and Multiagent Systems, P686

[2] Autonomous HVAC Control, A Reinforcement Learning Approach [J].

Barrett, Enda ;

Linder, Stephen .

MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT III, 2015, 9286 :3-19

[3] DYNAMIC PROGRAMMING [J].

BELLMAN, R .

SCIENCE, 1966, 153 (3731) :34-&

[4] Multiagent Reinforcement Learning: Rollout and Policy Iteration [J].

Bertsekas, Dimitri .

IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2021, 8 (02) :249-272

[5] Control of HVAC-systems with Slow Thermodynamic Using Reinforcement Learning [J].

Blad, C. ;

Koch, S. ;

Ganeswarathas, S. ;

Kallesoe, C. S. ;

Bogh, S. .

29TH INTERNATIONAL CONFERENCE ON FLEXIBLE AUTOMATION AND INTELLIGENT MANUFACTURING (FAIM 2019): BEYOND INDUSTRY 4.0: INDUSTRIAL ADVANCES, ENGINEERING EDUCATION AND INTELLIGENT MANUFACTURING, 2019, 38 :1308-1315

[6]

Blad C, 2020, IEEE/SICE I S SYS IN, P938, DOI 10.1109/SII46433.2020.9026189

[7] Multiagent learning using a variable learning rate [J].