Limited Information Aggregation for Collaborative Driving in Multi-Agent Autonomous Vehicles

Cited by: 4
Authors
Liang, Qingyi [1, 2]
Liu, Jia [2]
Jiang, Zhengmin [2, 3]
Yin, Jianwen [2]
Xu, Kun [2]
Li, Huiyun [2]
Affiliations
[1] Southern Univ Sci & Technol, Shenzhen 518055, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, CAS Key Lab Human Machine Intelligence Synergy Sys, Shenzhen 518055, Peoples R China
[3] Univ Chinese Acad Sci, Beijing 101408, Peoples R China
Keywords
Collaboration; Observability; Autonomous vehicles; Training; Decision making; Data mining; Uncertainty; Multi-agent autonomous vehicles; collaborative driving; information aggregation
DOI
10.1109/LRA.2024.3410159
Chinese Library Classification (CLC) number
TP24 [Robotics]
Discipline classification codes
080202; 1405
Abstract
Multi-agent reinforcement learning (MARL) methods have emerged as a promising solution for multi-agent collaborative driving in intersection and roundabout scenarios. However, these methods need large amounts of training data obtained from interaction with a driving simulator, and learning from limited interaction remains significantly underdeveloped. In this letter, we propose an efficient MARL method to address this challenge. Our method enables each vehicle to receive limited messages from surrounding vehicles, which are then used to augment the input representation of the local driving policy. By predicting the next-step state based on the current augmented local state and action, our approach enhances the decision-making capability of each vehicle. Specifically, we design a Self-supervised Message Attention Encoding (SMAE) module that utilizes an attention mechanism to aggregate the received messages and local observations, generating a compact representation. Then, this representation is used in a self-supervised module to predict the next-step state. By jointly training the encoder module and the prediction module, each vehicle effectively leverages the most relevant components of the aggregated representation to improve the learning efficiency of the driving policy and alleviate issues related to partial observability in making driving decisions. To validate the effectiveness of our approach, we conduct experiments using an open-source autonomous driving simulator. The simulation results demonstrate that our proposed method outperforms the IPPO, MAPPO, and CoPO algorithms in terms of success rate, route completion rate, crash rate, and other relevant metrics.
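The aggregation-then-prediction idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' SMAE implementation: the abstract does not give the architecture, so the choices here are assumptions made for illustration only. The local observation serves directly as the attention query, the messages serve as keys and values without learned projections, and the prediction head is a hypothetical linear map `W`. All function names and dimensions are invented for the example.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def aggregate(local_obs, messages):
    """Scaled dot-product attention (assumed form): the local observation is
    the query; received messages act as both keys and values."""
    scale = math.sqrt(len(local_obs))
    weights = softmax([dot(local_obs, m) / scale for m in messages])
    context = [sum(w * m[i] for w, m in zip(weights, messages))
               for i in range(len(local_obs))]
    # Augmented representation: local observation concatenated with context.
    return local_obs + context, weights


def predict_next_state(W, representation, action):
    """Hypothetical linear prediction head over [representation, action]."""
    features = representation + action
    return [dot(row, features) for row in W]


def prediction_loss(predicted, target):
    """Mean squared error, used here as the self-supervised signal."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


random.seed(0)
obs_dim, act_dim, n_msgs = 4, 2, 3  # illustrative sizes, not from the paper
local_obs = [random.uniform(-1, 1) for _ in range(obs_dim)]
messages = [[random.uniform(-1, 1) for _ in range(obs_dim)]
            for _ in range(n_msgs)]
action = [0.1, -0.3]
next_obs = [random.uniform(-1, 1) for _ in range(obs_dim)]

representation, weights = aggregate(local_obs, messages)
W = [[0.0] * (2 * obs_dim + act_dim) for _ in range(obs_dim)]  # untrained head
loss = prediction_loss(predict_next_state(W, representation, action), next_obs)
```

In a trained system the loss would be minimized jointly over the encoder and the prediction head, alongside the policy objective; the sketch only evaluates it once for an untrained, all-zero head.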
Pages: 6624-6631
Number of pages: 8