Automatic train operation speed control based on ASP-SAC algorithm

Cited by: 0
Authors
Liu, Bohong [1]
Lu, Tian [1]
Affiliations
[1] School of Automation and Electrical Engineering, Lanzhou Jiaotong University, Lanzhou
Keywords
ASP-SAC algorithm; automatic train operation; multi-objective control; reinforcement learning; speed control
DOI
10.19713/j.cnki.43-1423/u.T20231620
Abstract
With the green transformation of economic development and the rapid advancement of artificial intelligence, urban rail transit has become an important mode of daily travel for residents. Beyond safety, efficiency, and punctuality, the energy consumption and ride comfort of train operation are drawing increasing attention, and a well-designed operation strategy can achieve automatic speed control of trains under these multiple requirements. Reinforcement learning, as an intelligent decision-making method, is well suited to this control problem. First, based on a comprehensive analysis of technical, safety, and passenger-experience factors, the Soft Actor-Critic (SAC) algorithm was extended into the Action-State experience Prioritized Soft Actor-Critic (ASP-SAC) method, which applies expert-experience action segmentation and state information entropy to the problem of automatic train operation speed control. Second, the problem was formalized as a Markov decision process: a train operation environment was established, and the state space, action space, and reward function for goal-oriented control were defined. Finally, the ASP-SAC method was validated on data from a section of the Beijing Subway Yizhuang Line and compared with other algorithms in the same environment. The results show that the method is feasible for automatic train operation speed control under multiple objectives, with an efficiency improvement of 22.73% over the unimproved algorithm and 29.17% over the PPO algorithm. The method also outperforms the SAC, DQN, PPO, and PID algorithms in timeliness, precision, and energy efficiency while ensuring safety and comfort during train operation, reducing energy consumption by 3.64%, 5.62%, 4.38%, and 7.35%, respectively. Furthermore, the method is robust and offers a useful reference for automatic train operation speed control. © 2024, Central South University Press. All rights reserved.
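The abstract names the main ingredients of ASP-SAC (expert-experience action segmentation and state-information-entropy experience prioritization) and the MDP formulation (state space, action space, reward function) without giving their exact forms. The Python sketch below is a minimal, hypothetical rendering of those ideas for a simplified point-mass train: the dynamics coefficients, reward weights, the phase boundaries in expert_action_segment, and the binned visit-count entropy proxy in EntropyPrioritizedBuffer are all illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

class TrainEnv:
    """Point-mass train MDP: state = (position m, speed m/s, elapsed time s)."""

    def __init__(self, section_length=1200.0, schedule_time=110.0,
                 mass=2.0e5, speed_limit=22.0, dt=1.0):
        self.L, self.T = section_length, schedule_time
        self.m, self.v_lim, self.dt = mass, speed_limit, dt
        self.reset()

    def reset(self):
        self.x = self.v = self.t = self.a_prev = 0.0
        return np.array([self.x, self.v, self.t], dtype=np.float32)

    def step(self, u):
        """u in [-1, 1]: normalized traction (+) / braking (-) command."""
        F = float(np.clip(u, -1.0, 1.0)) * 3.0e5            # peak force, N
        resist = 2.0e3 + 40.0 * self.v + 0.6 * self.v ** 2  # Davis-type resistance
        a = (F - resist) / self.m
        v_new = max(self.v + a * self.dt, 0.0)
        self.x += 0.5 * (self.v + v_new) * self.dt
        self.t += self.dt
        energy = max(F, 0.0) * self.v * self.dt             # traction energy, J
        # Multi-objective reward: energy use, comfort (change in acceleration),
        # speed-limit safety, and punctuality at the end of the section.
        r = -1e-7 * energy - 0.1 * abs(a - self.a_prev)
        if v_new > self.v_lim:
            r -= 10.0                                       # safety violation
        done = self.x >= self.L or self.t > 2.0 * self.T
        if done:
            r -= 0.5 * abs(self.t - self.T)                 # punctuality penalty
        self.v, self.a_prev = v_new, a
        return np.array([self.x, self.v, self.t], dtype=np.float32), r, done


def expert_action_segment(u, x, L):
    """Clip the policy action to an expert-style phase envelope
    (traction, then cruise/coast, then braking) over the section."""
    if x < 0.3 * L:
        return float(np.clip(u, 0.2, 1.0))    # acceleration phase
    if x < 0.8 * L:
        return float(np.clip(u, -0.2, 0.4))   # cruise / coast phase
    return float(np.clip(u, -1.0, 0.0))       # braking phase


class EntropyPrioritizedBuffer:
    """Replay buffer that samples transitions in proportion to the information
    content -log p(state bin), so rarely visited states are replayed more often."""

    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self.data, self.keys = [], []
        self.counts = {}

    def _bin(self, s):
        # Coarse (position, speed) grid used only for visit counting.
        return (int(s[0] // 50.0), int(s[1] // 2.0))

    def add(self, transition):
        if len(self.data) >= self.capacity:        # evict the oldest transition
            self.counts[self.keys.pop(0)] -= 1
            self.data.pop(0)
        key = self._bin(transition[0])
        self.counts[key] = self.counts.get(key, 0) + 1
        self.data.append(transition)
        self.keys.append(key)

    def sample(self, batch_size, rng=np.random):
        total = sum(self.counts.values())
        w = np.array([-np.log(self.counts[k] / total) + 1e-6 for k in self.keys])
        w /= w.sum()
        idx = rng.choice(len(self.data), size=batch_size, p=w)
        return [self.data[i] for i in idx]
```

In the full method, one would expect actions sampled from the SAC policy to pass through the expert envelope before execution, and the actor/critic updates to draw minibatches from the entropy-weighted buffer rather than uniformly; both details here are assumptions based on the abstract's description.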
Pages: 2637-2648
Number of pages: 11