Batch reinforcement learning;
cloud-edge collaboration;
distribution network reconfiguration;
multi-agent deep reinforcement learning;
safe reinforcement learning;
DOI:
10.1109/TPWRS.2023.3296463
CLC classification:
TM (Electrical Engineering);
TN (Electronics and Communication Technology);
Discipline codes:
0808 ;
0809 ;
Abstract:
Network reconfiguration can maintain the optimal operation of distribution networks with increasing penetration of distributed generation (DG). However, network reconfiguration problems may not be solved quickly by traditional methods in large-scale distribution networks. In this context, a cloud-edge collaboration framework based on multi-agent deep reinforcement learning (MADRL) is proposed, in which the MADRL model is trained centrally in the cloud center and executed in a decentralized manner on edge servers, reducing both the training cost and the execution latency of MADRL. In addition, a discrete multi-agent soft actor-critic algorithm (MASAC) is introduced as the basic algorithm to address the non-stationary environment problem in MADRL. Then, online safe learning and offline safe learning are combined for the practical distribution network reconfiguration task, so that the neural networks of MADRL are updated under constraints. Specifically, a novel offline algorithm called multi-agent constraints penalized Q-learning (MACPQ) is proposed to reduce the cost of the trial-and-error process of MADRL while allowing agents to pre-train their policies from a historical dataset subject to constraints. Meanwhile, a new online MADRL method called primal-dual MASAC is proposed to further improve the performance of agents by directly interacting with the physical distribution network under safe action exploration. Finally, the superiority of the proposed methods is verified on the IEEE 33-bus system and a practical 445-bus system.
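The primal-dual idea behind the online safe-learning method can be illustrated with a minimal sketch. This is not the paper's algorithm; the function name, learning rate, cost limit, and the toy sequence of measured constraint costs are all illustrative assumptions. The Lagrange multiplier on the safety constraint is raised by dual ascent when the policy's average constraint cost exceeds its limit, and relaxed back toward zero once the policy operates safely within the limit:

```python
# Hypothetical sketch of the dual-ascent step used in primal-dual
# constrained RL (illustrative only; not the paper's implementation).

def dual_ascent_step(lam, avg_cost, cost_limit, lr=0.1):
    """Increase the multiplier lam when the constraint is violated
    (avg_cost > cost_limit); decrease it, floored at 0, otherwise."""
    return max(0.0, lam + lr * (avg_cost - cost_limit))

# Toy trace of per-round average constraint costs for a policy that
# gradually becomes safe; the multiplier rises, then decays.
lam = 0.0
cost_limit = 1.0
for avg_cost in [2.0, 1.8, 1.2, 0.9, 0.8]:
    lam = dual_ascent_step(lam, avg_cost, cost_limit)
```

In a full primal-dual method, the primal step would then penalize the policy objective by `lam` times the expected constraint cost, so the multiplier automatically balances reward maximization against constraint satisfaction.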