Mixed Deep Reinforcement Learning Considering Discrete-continuous Hybrid Action Space for Smart Home Energy Management

Cited by: 47

Authors:

Huang, Chao [1,2,3]

Zhang, Hongcai [1,4]

Wang, Long [2,3]

Luo, Xiong [2,3]

Song, Yonghua [1,4]

Affiliations:

[1] Univ Macau, State Key Lab Internet Things Smart City, Macau, Peoples R China

[2] Univ Sci & Technol Beijing, Sch Comp & Commun Engn, Beijing 100083, Peoples R China

[3] Univ Sci & Technol Beijing, Shunde Grad Sch, Foshan 528399, Peoples R China

[4] Univ Macau, Dept Elect & Comp Engn, Macau, Peoples R China

Source:

JOURNAL OF MODERN POWER SYSTEMS AND CLEAN ENERGY | 2022, Vol. 10, No. 3

Funding:

National Natural Science Foundation of China; Beijing Natural Science Foundation

Keywords:

Home appliances; HVAC; Reinforcement learning; Costs; Aerospace electronics; Renewable energy sources; Task analysis; Demand response; deep reinforcement learning; discrete-continuous action space; home energy management; safe reinforcement learning; SYSTEM; ELECTRICITY;

DOI:

10.35833/MPCE.2021.000394

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification codes:

0808; 0809

Abstract:

This paper develops deep reinforcement learning (DRL) algorithms for optimizing the operation of a home energy system that consists of photovoltaic (PV) panels, a battery energy storage system, and household appliances. Model-free DRL algorithms can efficiently handle the difficulty of energy system modeling and the uncertainty of PV generation. However, the discrete-continuous hybrid action space of the considered home energy system challenges existing DRL algorithms, which are designed for either discrete actions or continuous actions. Thus, a mixed deep reinforcement learning (MDRL) algorithm is proposed, which integrates the deep Q-learning (DQL) algorithm and the deep deterministic policy gradient (DDPG) algorithm. The DQL algorithm deals with discrete actions, while the DDPG algorithm handles continuous actions. The MDRL algorithm learns the optimal strategy through trial-and-error interactions with the environment. However, unsafe actions, which violate system constraints, can incur great cost. To handle this problem, a safe-MDRL algorithm is further proposed. Simulation studies demonstrate that the proposed MDRL algorithm can efficiently handle the challenge posed by the discrete-continuous hybrid action space in home energy management. Compared with benchmark algorithms on the test dataset, the proposed MDRL algorithm reduces the operation cost while maintaining human thermal comfort. Moreover, the safe-MDRL algorithm greatly reduces the loss of thermal comfort that the MDRL algorithm incurs in the learning stage.
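The division of labor described in the abstract, with a DQL-style component choosing the discrete action and a DDPG-style component choosing the continuous one, can be illustrated with a minimal action-selection sketch. Everything in it is an assumption made for illustration: the function name `select_hybrid_action`, its parameters, the epsilon-greedy and Gaussian-noise exploration, and the bound clipping are hypothetical and are not taken from the paper. The abstract does not state how the two learners are coupled or how safe-MDRL enforces constraints, so the clipping below only stands in for a simple action bound and is not the paper's safety mechanism.

```python
import random


def select_hybrid_action(q_values, actor_output, epsilon, noise_std,
                         low, high, rng=random):
    """Return (discrete_index, continuous_value) for one time step.

    Hypothetical helper, not the paper's algorithm:
    q_values     -- Q-value estimates, one per discrete action
    actor_output -- deterministic continuous action proposed by an actor
    epsilon      -- exploration probability for the discrete part
    noise_std    -- std. dev. of Gaussian exploration noise (continuous part)
    low, high    -- assumed bounds on the continuous action
    """
    # Discrete part (DQL-style): epsilon-greedy over the Q-values.
    if rng.random() < epsilon:
        discrete = rng.randrange(len(q_values))
    else:
        discrete = max(range(len(q_values)), key=lambda i: q_values[i])

    # Continuous part (DDPG-style): actor output plus exploration noise.
    noise = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
    noisy = actor_output + noise

    # Assumed bound handling: clip the continuous action into [low, high].
    continuous = min(max(noisy, low), high)
    return discrete, continuous


if __name__ == "__main__":
    # Greedy, noise-free call: picks the largest Q-value and clips 2.5 to 1.0.
    print(select_hybrid_action([0.1, 0.9, 0.3], 2.5, 0.0, 0.0, -1.0, 1.0))
```

In a home energy setting, the discrete index might correspond to an appliance on/off decision and the continuous value to a power setpoint; that mapping is likewise an assumption, since the abstract does not list the individual actions.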

Pages: 743-754

Page count: 12

