Language-Driven Policy Distillation for Cooperative Driving in Multi-Agent Reinforcement Learning

Cited by: 0

Authors:

Liu, Jiaqi [1,2]

Xu, Chengkai [1,2]

Hang, Peng [1,2]

Sun, Jian [1,2]

Ding, Mingyu [3]

Zhan, Wei [4]

Tomizuka, Masayoshi [4]

Affiliations:

[1] Tongji Univ, Coll Transportat, Minist Educ, Shanghai 201804, Peoples R China

[2] Tongji Univ, Key Lab Rd & Traff Engn, Minist Educ, Shanghai 201804, Peoples R China

[3] Univ North Carolina, Dept Comp Sci, Chapel Hill, NC 27599 USA

[4] Univ Calif Berkeley, Dept Mech Engn, Berkeley, CA 94706 USA

Source:

IEEE ROBOTICS AND AUTOMATION LETTERS | 2025 / Vol. 10 / No. 05

Keywords:

Decision making; Cognition; Safety; Aerospace electronics; Reinforcement learning; Large language models; Vehicle dynamics; Costs; Transportation; Training; Cooperative decision-making; large language model; multi-agent reinforcement learning

DOI:

10.1109/LRA.2025.3551098

Chinese Library Classification (CLC) number:

TP24 [Robotics];

Discipline classification code:

080202; 1405

Abstract:

Cooperative driving among Connected and Autonomous Vehicles (CAVs) is crucial for improving the efficiency and safety of transportation systems. Learning-based methods such as Multi-Agent Reinforcement Learning (MARL) have demonstrated strong capabilities in cooperative decision-making tasks, but existing MARL approaches still face challenges in learning efficiency and performance. In recent years, Large Language Models (LLMs) have advanced rapidly and shown remarkable abilities in various sequential decision-making tasks. To enhance the learning capabilities of cooperative agents while keeping decision-making efficient and cost-effective, we propose LDPD, a language-driven policy distillation method for guiding MARL exploration. In this framework, an LLM-based teacher agent trains smaller student agents to achieve cooperative decision-making through its own decision-making demonstrations. The teacher agent enhances the observation information of CAVs, uses LLMs to perform complex cooperative decision-making reasoning, and leverages carefully designed decision-making tools to reach expert-level decisions, providing high-quality teaching experiences. The student agents then distill the teacher's prior knowledge into their own models through gradient-based policy updates. Experiments demonstrate that the students rapidly improve their capabilities with minimal guidance from the teacher and eventually surpass the teacher's performance, and that our approach achieves better performance and learning efficiency than baseline methods.
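The teacher-to-student step described in the abstract can be illustrated with a toy sketch. This is not the authors' LDPD implementation: the abstract does not give the loss, network, or environment, so the sketch assumes a tabular softmax student trained by cross-entropy imitation of teacher demonstrations. The names `teacher_action`, `distill`, and `student_action` are hypothetical, and the fixed rule in `teacher_action` only stands in for the LLM-based teacher.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def teacher_action(state):
    # Hypothetical stand-in for the LLM teacher: a fixed rule, not a real LLM call.
    return state % 3


def distill(num_states=4, num_actions=3, steps=2000, lr=0.5, seed=0):
    """Fit a tabular softmax student to teacher demonstrations.

    Assumption: plain cross-entropy imitation, used here only for illustration.
    """
    rng = random.Random(seed)
    logits = [[0.0] * num_actions for _ in range(num_states)]
    for _ in range(steps):
        s = rng.randrange(num_states)   # sample a state
        a_star = teacher_action(s)      # teacher demonstration for that state
        probs = softmax(logits[s])
        # Gradient of -log p(a_star | s) w.r.t. the logits is probs - one_hot(a_star).
        for a in range(num_actions):
            grad = probs[a] - (1.0 if a == a_star else 0.0)
            logits[s][a] -= lr * grad
    return logits


def student_action(logits, state):
    """Greedy action of the distilled student."""
    row = logits[state]
    return max(range(len(row)), key=row.__getitem__)


student = distill()
agreement = sum(
    student_action(student, s) == teacher_action(s) for s in range(4)
) / 4
print(f"student/teacher agreement: {agreement:.2f}")
```

In this toy setting the student can only match the teacher. Surpassing it, as the abstract reports, would require further reinforcement learning on environment reward, which the sketch does not model.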


Pages: 4292-4299

Number of pages: 8
