Online regulation control of pulsed power loads via supercapacitor with deep reinforcement learning utilizing a long short-term memory network and attention mechanism
Cited by: 1
Authors:
Shang, Chengya [1]
Fu, Lijun [1]
Xiao, Haipeng [1]
Lin, Yunfeng [1]
Affiliations:
[1] Naval Univ Engn, Natl Key Lab Electromagnet Energy, Wuhan 430033, Peoples R China
Keywords:
Attention mechanism;
Deep reinforcement learning;
Pulse power load;
Ship power system;
Supercapacitor regulation control;
ENERGY-STORAGE;
COORDINATION;
DOI:
10.1016/j.est.2024.114080
Chinese Library Classification (CLC):
TE [Petroleum and Natural Gas Industry];
TK [Energy and Power Engineering];
Subject Classification Codes:
0807;
0820;
Abstract:
The integration of pulse power loads (PPLs) presents substantial challenges to the stable operation of DC ship power systems (SPSs), and current model-based control strategies, both offline and online, are susceptible to model inaccuracies and parameter uncertainties. This article proposes a novel deep reinforcement learning (DRL) method to address the online PPL regulation problem in real time for SPSs. Taking into account the ramp-up limitation of the charging current of supercapacitor-based energy storage systems (ESSs), the online PPL regulation model is formulated with the goals of fast supercapacitor charging, rapid bus-voltage regulation, and proportional distribution of the generator load current. Then, a twin delayed deep deterministic policy gradient (TD3) combined with a bi-directional long short-term memory (Bi-LSTM) network and an attention mechanism (AM), referred to as the Bi-LSTM-AM-TD3 algorithm, is applied to optimize the generator output voltage and the ESS charging current. The proposed method improves the agent's ability to extract features from state data and enhances its control performance. A case study based on a historical operational dataset of a DC SPS is analyzed. The numerical results indicate that the proposed method improves the reward by 8.43%, 9.72%, and 20.16% compared to TD3, DDPG, and PI-based methods, respectively. Additionally, it shows a 5.75% improvement in the reward and a 23.19% reduction in convergence time compared to the agent without AM. The effectiveness of the proposed method under continuous PPL scenarios and migration scenarios is also validated. Finally, the algorithm's performance is tested on a laboratory-scale platform.
Pages: 19
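To make the network structure named in the abstract more concrete, the sketch below shows one plausible way a TD3 actor could combine a Bi-LSTM with an additive attention layer over a window of past system states before emitting the two control actions (generator output voltage reference and ESS charging current reference). This is a minimal illustrative sketch under assumed dimensions and layer choices, not the authors' published implementation; all class and variable names (e.g. BiLSTMAttentionActor, state_dim, hidden_dim) are hypothetical.

```python
# Illustrative sketch (assumption, not the paper's code): a TD3-style actor whose
# feature extractor is a Bi-LSTM followed by additive attention over the sequence.
import torch
import torch.nn as nn


class BiLSTMAttentionActor(nn.Module):
    def __init__(self, state_dim: int, action_dim: int = 2,
                 hidden_dim: int = 64, max_action: float = 1.0):
        super().__init__()
        # Bi-directional LSTM over the window of past system states
        self.lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        # Additive attention scores over the LSTM outputs
        self.attn = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        # Policy head mapping the attended feature to bounded actions
        self.head = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
            nn.Tanh(),
        )
        self.max_action = max_action

    def forward(self, state_seq: torch.Tensor) -> torch.Tensor:
        # state_seq: (batch, seq_len, state_dim)
        h, _ = self.lstm(state_seq)                   # (batch, seq_len, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)  # attention over time steps
        context = (weights * h).sum(dim=1)            # (batch, 2*hidden)
        return self.max_action * self.head(context)   # actions scaled to [-max, max]


# Usage example with hypothetical dimensions: 8 state features, 20-step window.
actor = BiLSTMAttentionActor(state_dim=8, action_dim=2)
dummy_states = torch.randn(4, 20, 8)
print(actor(dummy_states).shape)  # torch.Size([4, 2])
```

In a full TD3 setup this actor would be paired with twin critics, target networks, delayed policy updates, and target policy smoothing; only the feature-extraction front end described in the abstract is sketched here.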