Soft Actor-Critic Deep Reinforcement Learning for Train Timetable Collaborative Optimization of Large-Scale Urban Rail Transit Network Under Dynamic Demand
Times cited: 0
Authors: Wen, Longhui [1]; Hu, Liyang [2]; Zhou, Wei [1]; Ren, Gang [2]; Zhang, Ning [1]
Affiliations:
[1] Southeast Univ, Intelligent Transportat Syst Res Ctr, Nanjing 211189, Peoples R China
[2] Southeast Univ, Jiangsu Prov Collaborat Innovat Ctr Modern Urban T, Sch Transportat, Jiangsu Key Lab Urban ITS, Nanjing 211189, Peoples R China
Funding:
National Natural Science Foundation of China;
Keywords:
Rails;
Schedules;
Collaboration;
Real-time systems;
Dynamic scheduling;
Optimal scheduling;
Artificial intelligence;
Time-varying systems;
Synchronization;
Urban rail transit;
deep reinforcement learning;
train timetable collaborative optimization;
soft actor-critic;
TIME-DEPENDENT DEMAND;
METRO SYSTEM;
MODEL;
SYNCHRONIZATION;
COORDINATION;
ALGORITHM;
DOI:
10.1109/TITS.2025.3525538
Chinese Library Classification (CLC) number:
TU [Architectural Science];
Discipline code:
0813;
Abstract:
To address the problem of coordinating train timetables across a large-scale urban rail transit (URT) network, this paper proposes an adaptive real-time control framework with flexible train scheduling, based on the Soft Actor-Critic (SAC) deep reinforcement learning (DRL) method. First, by analyzing dynamic passenger travel behavior (e.g., entering and exiting stations, transferring) and train operation events (e.g., dispatching, running between stations, dwelling at stations), the control problem is modeled as a Markov Decision Process (MDP), and an efficient URT simulation environment is constructed. Then, subject to constraints such as train capacity and dispatch intervals, a train scheduling model is developed that minimizes both passenger costs and operating costs. The real-time state of the URT system is represented by the number of passengers present at each platform, and the train dispatch intervals on all lines serve as the decision variables, for which a SAC-based solution algorithm is developed. Finally, experiments on a large-scale URT network comprising 10 lines demonstrate the effectiveness of the proposed framework, which outperforms other reinforcement learning algorithms and traditional heuristic optimization algorithms. Compared with TD3, the second-best algorithm, the proposed approach reduces average passenger waiting time by 1.63% (2.09 seconds) while using 49 fewer trains (a 2.97% reduction).
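The abstract frames the control problem as an MDP whose state is the passenger count at each platform, whose actions are per-line dispatch intervals, and whose objective trades passenger cost against operating cost under train-capacity and dispatch-interval constraints. The toy environment below is a minimal sketch of that structure under heavily simplified assumptions: deterministic arrivals, a fixed-length decision epoch, no transfers or running/dwell dynamics, and made-up parameter values (`arrival_rate`, `capacity`, `h_min`, `h_max`, `epoch`, `w_pass`, `w_oper`). `ToyURTEnv` is hypothetical and is not the paper's simulation environment.

```python
from dataclasses import dataclass


@dataclass
class ToyURTEnv:
    """Hypothetical, deterministic URT-style MDP; all parameters are assumptions."""
    platforms_per_line: list         # number of platforms on each line, e.g. [2, 3]
    arrival_rate: float = 2.0        # passengers/min arriving at each platform (assumed)
    capacity: int = 100              # max boarders per train at one platform (assumed)
    h_min: float = 2.0               # shortest allowed dispatch interval, min (assumed)
    h_max: float = 10.0              # longest allowed dispatch interval, min (assumed)
    epoch: float = 60.0              # length of one decision step, min (assumed)
    w_pass: float = 1.0              # cost weight per passenger-minute of waiting (assumed)
    w_oper: float = 50.0             # cost weight per dispatched train (assumed)

    def __post_init__(self):
        self.reset()

    def reset(self):
        # State: passengers currently waiting at every platform of every line.
        self.waiting = [[0.0] * n for n in self.platforms_per_line]
        return self.observe()

    def observe(self):
        # Flatten per-line platform counts into one state vector.
        return [p for line in self.waiting for p in line]

    def step(self, headways):
        """Apply one dispatch interval per line; return (state, reward, info)."""
        pax_minutes, trains_used = 0.0, 0
        for l, h in enumerate(headways):
            h = min(max(h, self.h_min), self.h_max)   # dispatch-interval constraint
            trains = int(self.epoch // h)             # trains dispatched on line l this epoch
            trains_used += trains
            for i, queued in enumerate(self.waiting[l]):
                demand = queued + self.arrival_rate * self.epoch
                boarded = min(demand, trains * self.capacity)  # crude capacity constraint
                left = demand - boarded
                # Assume boarders wait h/2 on average; those left behind wait the whole epoch.
                pax_minutes += boarded * h / 2 + left * self.epoch
                self.waiting[l][i] = left
        # Reward is the negative weighted sum of passenger cost and operating cost.
        reward = -(self.w_pass * pax_minutes + self.w_oper * trains_used)
        return self.observe(), reward, {"trains": trains_used, "pax_minutes": pax_minutes}
```

For example, `ToyURTEnv(platforms_per_line=[2, 3]).step([5.0, 20.0])` clips the second interval to 10 minutes, dispatches 12 + 6 = 18 trains, and returns a reward of -3300 under these assumed weights. Shortening intervals lowers the waiting term but raises the train count, which is the trade-off an agent would have to learn.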
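The abstract does not specify the SAC network architecture, reward scaling, or hyperparameters. As a hedged illustration of the generic SAC ingredients such a solver relies on, the sketch below shows (a) a reparameterized tanh-squashed Gaussian action rescaled to an assumed dispatch-interval range [`h_min`, `h_max`], together with its log-probability, and (b) the clipped double-Q soft Bellman target used in standard SAC. The function names, the interval bounds, and the defaults `gamma=0.99` and `alpha=0.2` are assumptions; the Q-values would come from critic networks that are omitted here.

```python
import math


def squashed_gaussian_sample(mu, log_std, eps, h_min=2.0, h_max=10.0):
    """Hypothetical actor head: map Gaussian noise to per-line dispatch intervals.

    Returns (headways, log_prob). log_prob includes the tanh change-of-variables
    term but, as is common in SAC implementations, ignores the constant
    affine rescale to [h_min, h_max].
    """
    headways, log_prob = [], 0.0
    for m, ls, e in zip(mu, log_std, eps):
        std = math.exp(ls)
        u = m + std * e                                   # reparameterization trick
        log_prob += -0.5 * e * e - ls - 0.5 * math.log(2 * math.pi)  # log N(u; m, std)
        a = math.tanh(u)                                  # squash into (-1, 1)
        log_prob -= math.log(1.0 - a * a + 1e-6)          # tanh correction, eps for stability
        headways.append(h_min + 0.5 * (a + 1.0) * (h_max - h_min))
    return headways, log_prob


def soft_q_target(reward, done, q1_next, q2_next, log_prob_next, gamma=0.99, alpha=0.2):
    """Clipped double-Q soft Bellman target:
    y = r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi(a'|s'))."""
    soft_value = min(q1_next, q2_next) - alpha * log_prob_next
    return reward + gamma * (1.0 - float(done)) * soft_value
```

The entropy term `-alpha * log_prob` rewards the policy for keeping its dispatch intervals stochastic, which is what distinguishes SAC from deterministic actor-critic methods such as TD3.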
Pages: 15