Autonomous Input Voltage Sharing Control and Triple Phase Shift Modulation Method for ISOP-DAB Converter in DC Microgrid: A Multiagent Deep Reinforcement Learning-Based Method
Cited by: 35
Authors:
Zeng, Yu [1]
Pou, Josep [1]
Sun, Changjiang [2]
Mukherjee, Suvajit [3]
Xu, Xu [2,4]
Gupta, Amit Kumar [3]
Dong, Jiaxin [1]
Affiliations:
[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore
[2] Nanyang Technol Univ, Rolls Royce NTU Corp Lab, Singapore 639798, Singapore
[3] Rolls Royce Singapore Private Ltd, Singapore 638673, Singapore
[4] Xian Jiaotong Liverpool Univ, Sch Adv Technol, Dept Elect & Elect Engn, Suzhou 215123, Peoples R China
Funding:
National Research Foundation, Singapore;
Keywords:
Microgrids;
Voltage control;
Stress;
Uncertainty;
Minimization;
Inductors;
Training;
Input-series output-parallel-connected dual active bridge (ISOP-DAB) converter;
input voltage sharing (IVS);
multiagent twin-delayed deep deterministic policy gradient (MA-TD3);
triple phase shift modulation;
BIDIRECTIONAL DC/DC CONVERTER;
REACTIVE POWER;
CONTROL STRATEGY;
OPTIMIZATION;
TRANSFORMER;
DOI:
10.1109/TPEL.2022.3218900
Chinese Library Classification (CLC) number:
TM [Electrical engineering];
TN [Electronic technology, communication technology];
Subject classification codes:
0808 ;
0809 ;
Abstract:
This article proposes a multiagent (MA) deep reinforcement learning (DRL) based autonomous input voltage sharing (IVS) control and triple phase shift modulation method for input-series output-parallel (ISOP) dual active bridge (DAB) converters to solve the three challenges: the uncertainties of the dc microgrid, the power balance problem, and the current stress minimization of the converter. Specifically, the control and modulation problem of the ISOP-DAB converter is formed as a Markov game with several DRL agents. Subsequently, the MA twin-delayed deep deterministic policy gradient (MA-TD3) algorithm is applied to train the DRL agents in an offline manner. After the training process, the multiple agents can provide online control decisions for the ISOP-DAB converter to balance the IVS, and minimize the current stress among different submodules. Without accurate model information, the proposed method can adaptively obtain the optimal modulation variable combinations in a stochastic and uncertain environment. Simulation and experimental results verify the effectiveness of the proposed MA-TD3-based algorithm.
Pages: 2985-3000
Number of pages: 16