Bi-Level Off-Policy Reinforcement Learning for Two-Timescale Volt/VAR Control in Active Distribution Networks

Cited by: 35
Authors
Liu, Haotian [1 ]
Wu, Wenchuan [1 ]
Wang, Yao [2 ]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, State Key Lab Power Syst, Beijing 100084, Peoples R China
[2] State Grid Jilin Elect Power Co Ltd, Changchun 130021, Jilin, Peoples R China
Funding
US National Science Foundation;
Keywords
Volt/var control; reinforcement learning; bi-level; multi-timescale; active distribution networks; distribution systems; optimization;
DOI
10.1109/TPWRS.2022.3168700
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
In Volt/Var control (VVC) of active distribution networks (ADNs), both slow-timescale discrete devices (STDDs, e.g., on-load tap changers) and fast-timescale continuous devices (FTCDs, e.g., distributed generators) are involved and must be coordinated in time sequence. Traditional two-timescale VVC optimizes STDDs and FTCDs based on accurate system models, but it is sometimes impractical because the modeling effort is unaffordable. In this paper, a novel bi-level off-policy reinforcement learning (RL) method is proposed to solve this problem in a model-free manner. A bi-level Markov decision process (BMDP) is defined, and separate agents are set up for the slow- and fast-timescale sub-problems. For the fast-timescale sub-problem, we adopt an off-policy RL method with high sample efficiency. For the slow one, we develop an off-policy multi-discrete soft actor-critic (MDSAC) algorithm to address the curse of dimensionality caused by the many STDDs. To mitigate the non-stationarity issue in the two agents' training, we propose a multi-timescale off-policy correction (MTOPC) method based on the importance sampling technique. Comprehensive numerical studies demonstrate not only that the proposed method achieves stable and satisfactory optimization of both STDDs and FTCDs without any model information, but also that it outperforms existing VVC methods involving both STDDs and FTCDs.
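To illustrate the multi-discrete policy idea named in the abstract, the sketch below shows one common way to factorize a policy over many discrete devices so that the action space grows additively (sum of per-device tap counts) rather than multiplicatively (their product). This is a minimal illustrative sketch in PyTorch, not the authors' MDSAC implementation; the names MultiDiscretePolicy, obs_dim, and n_taps_per_device are our own assumptions.

    import torch
    import torch.nn as nn

    class MultiDiscretePolicy(nn.Module):
        """Factorized categorical policy over several discrete devices
        (e.g., OLTC tap positions); illustrative sketch only."""
        def __init__(self, obs_dim, n_taps_per_device):
            super().__init__()
            self.trunk = nn.Sequential(
                nn.Linear(obs_dim, 128), nn.ReLU(),
                nn.Linear(128, 128), nn.ReLU(),
            )
            # One categorical head per device: total output size is the
            # SUM of per-device action counts, not their PRODUCT, which
            # is what eases the curse of dimensionality with many STDDs.
            self.heads = nn.ModuleList(
                nn.Linear(128, n) for n in n_taps_per_device
            )

        def forward(self, obs):
            h = self.trunk(obs)
            dists = [torch.distributions.Categorical(logits=head(h))
                     for head in self.heads]
            actions = torch.stack([d.sample() for d in dists], dim=-1)
            # The joint log-probability factorizes into a per-device sum.
            log_prob = sum(d.log_prob(a)
                           for d, a in zip(dists, actions.unbind(dim=-1)))
            return actions, log_prob

    # Usage: three hypothetical devices with 17, 9, and 5 tap positions.
    policy = MultiDiscretePolicy(obs_dim=32, n_taps_per_device=[17, 9, 5])
    actions, log_prob = policy(torch.randn(4, 32))  # batch of 4 observations

Because the joint log-probability and entropy both decompose into per-device sums, the standard soft actor-critic entropy regularization extends directly to such a factorized policy, and an importance-sampling correction in the spirit of MTOPC would reweight off-policy samples using ratios built from these log-probabilities.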
Pages: 385-395
Number of pages: 11