Reward shaping-based deep reinforcement learning for look-ahead dispatch with rolling-horizon

Times cited: 0

Authors:

Xu, Hongsheng [1]

Xu, Yungui [1]

Wang, Ke [1]

Li, Yaping [2]

Al Ahad, Abdullah [1]

Affiliations:

[1] Hohai Univ, Sch Elect & Power Engn, Nanjing 211100, Peoples R China

[2] China Elect Power Res Inst, Dept Power Automat, Nanjing 210000, Peoples R China

Source:

INTERNATIONAL JOURNAL OF ELECTRICAL POWER & ENERGY SYSTEMS | 2025 / Vol. 168

Keywords:

Look-ahead dispatch; Rolling-horizon; Deep reinforcement learning; Reward shaping; Soft actor-critic

DOI:

10.1016/j.ijepes.2025.110673

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Discipline codes:

0808; 0809

Abstract:

The increasing penetration of renewable energy makes it harder to design an effective and adaptable model-driven look-ahead dispatch (LAD) method. Recently, deep reinforcement learning (DRL) methods have shown great potential for building dispatching agents with self-learning ability, owing to their superior generalization, adaptability, and computational efficiency. However, existing DRL-based LAD methods overlook the discounting effect when calculating the immediate total reward for LAD, and they pay little attention to trial-and-error reward design or to expected discounted returns that can reflect the true performance metrics of LAD. This paper therefore proposes novel reward shaping (RS)-based DRL algorithms for the rolling-horizon LAD problem. A method is proposed for accurately estimating the look-ahead discount factor that best matches different look-ahead horizons (LAHs). Shaped reward functions are designed, and an RS-based regularization built on a potential function is also proposed. Case studies on the SG 126-bus and IEEE 118-bus systems demonstrate the effectiveness of the proposed improvements, as well as the superiority and adaptability of the improved DRL algorithms in both training and testing performance.
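Two ideas from the abstract, discounting the rewards collected over a look-ahead window and shaping them with a potential function, can be illustrated with a short Python sketch. It is not the authors' algorithm. The discount factor here comes from the common effective-horizon heuristic gamma = 1 - 1/H, which is an assumption and may differ from the paper's estimation method. The names `discount_for_horizon`, `shaped_reward`, and `discounted_window_return` are hypothetical. The shaping term uses the standard potential-based form F(s, s') = gamma * Phi(s') - Phi(s), with Phi supplied by the caller.

```python
# Hypothetical sketch, not the paper's algorithm.
# Assumptions: gamma is derived from the look-ahead horizon (LAH) with the
# effective-horizon heuristic H = 1 / (1 - gamma), and the potential
# function Phi is supplied by the caller.
from typing import Callable, Sequence, TypeVar

S = TypeVar("S")  # any state representation


def discount_for_horizon(lah_steps: int) -> float:
    """Return gamma such that the effective horizon 1 / (1 - gamma) equals lah_steps."""
    if lah_steps < 1:
        raise ValueError("look-ahead horizon must be at least one step")
    return 1.0 - 1.0 / lah_steps


def shaped_reward(r: float, phi_s: float, phi_next: float, gamma: float) -> float:
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s)."""
    return r + gamma * phi_next - phi_s


def discounted_window_return(
    rewards: Sequence[float],
    states: Sequence[S],
    potential: Callable[[S], float],
    gamma: float,
) -> float:
    """Discounted sum of shaped rewards over one look-ahead window.

    `states` holds s_0, ..., s_H, i.e. one more entry than `rewards`.
    """
    if len(states) != len(rewards) + 1:
        raise ValueError("states must have exactly one more entry than rewards")
    total = 0.0
    for k, r in enumerate(rewards):
        f = shaped_reward(r, potential(states[k]), potential(states[k + 1]), gamma)
        total += gamma**k * f  # discount step k of the window
    return total
```

Because the shaping terms telescope, the shaped window return equals the plain discounted return plus gamma^H * Phi(s_H) - Phi(s_0). For example, with rewards [1, 2, 3], scalar states [0, 1, 2, 3], Phi(s) = 10s and gamma = 0.5, the plain discounted return is 2.75 and the shaped return is 2.75 + 0.125 * 30 - 0 = 6.5. For a truncated window the gamma^H * Phi(s_H) term remains, so the shaped objective is not exactly equivalent to the unshaped one; in the infinite-horizon case with bounded Phi and gamma < 1 that term vanishes, and the leftover -Phi(s_0) does not depend on the actions taken.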

Pages: 21

Cited references: 45
