AoI-Aware Resource Allocation for Platoon-Based C-V2X Networks via Multi-Agent Multi-Task Reinforcement Learning
Times cited: 32
Authors:
Parvini, Mohammad [1]
Javan, Mohammad Reza [2]
Mokari, Nader [1]
Abbasi, Bijan [1]
Jorswieck, Eduard A. [3]
Affiliations:
[1] Tarbiat Modares Univ, Dept Elect & Comp Engn, Tehran 1411713116, Iran
[2] Shahrood Univ Technol, Fac Elect Engn, Shahrood 3619995161, Iran
[3] TU Braunschweig, Inst Commun Technol, D-2338106 Braunschweig, Germany
Keywords: Resource management; Cams; Long Term Evolution; Wireless communication; Vehicle dynamics; Task analysis; Interference; V2X; AoI; Platoon cooperation; MARL; MANAGEMENT; COMMUNICATION; VEHICLES
DOI: 10.1109/TVT.2023.3259688
Chinese Library Classification (CLC): TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline codes: 0808; 0809
Abstract:
This paper investigates the problem of age of information (AoI) aware radio resource management for a platooning system. Multiple autonomous platoons exploit cellular vehicle-to-everything (C-V2X) communication technology to disseminate cooperative awareness messages (CAMs) to their followers while ensuring timely delivery of safety-critical messages to the Road-Side Unit (RSU). To lower the computational load at the RSU and cope with dynamic channel conditions, we exploit a distributed resource allocation framework based on multi-agent reinforcement learning (MARL), in which each platoon leader (PL) acts as an agent and interacts with the environment to learn its optimal policy. Motivated by the existing RL literature, we propose two novel MARL frameworks based on the multi-agent deep deterministic policy gradient (MADDPG): Modified MADDPG and Modified MADDPG with task decomposition. Both algorithms train two kinds of critics: a global critic, which estimates the global expected reward and motivates the agents toward cooperative behavior, and an exclusive local critic for each agent, which estimates that agent's local individual reward. Furthermore, in the second algorithm, the holistic individual reward of each agent is decomposed, according to the tasks the agent has to accomplish, into multiple sub-reward functions, and the task-wise value functions are learned separately. Numerical results indicate the effectiveness of the proposed algorithms, in terms of AoI performance and CAM transmission probability, compared with other contemporary RL frameworks, e.g., federated reinforcement learning (FRL).
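The dual-critic and task-decomposition ideas in the abstract can be illustrated with a minimal sketch. This is not the authors' Modified MADDPG: it replaces the neural critics with lookup tables and TD(0) updates. The task names (rsu_aoi, cam_delivery), the sub-reward values, the hyperparameters alpha and gamma, and the weighted mix in mixed_critic_signal are all hypothetical. The sketch only shows how separately learned task-wise values can be summed into one local value and then combined with a global estimate.

```python
# Hypothetical sketch: tabular stand-ins for the neural critics described in the abstract.
# Task names, learning rate, discount factor, and the weighting rule are assumptions.


class TaskwiseLocalCritic:
    """One value table per task; the holistic local value is their sum."""

    def __init__(self, tasks, alpha=0.1, gamma=0.9):
        self.tasks = tuple(tasks)
        self.alpha = alpha  # learning rate (assumed)
        self.gamma = gamma  # discount factor (assumed)
        self.values = {}    # (task, state) -> estimated task-wise value

    def value(self, task, state):
        return self.values.get((task, state), 0.0)

    def update(self, state, sub_rewards, next_state):
        # Independent TD(0) update per task, each driven only by its own sub-reward.
        for task in self.tasks:
            target = sub_rewards[task] + self.gamma * self.value(task, next_state)
            td_error = target - self.value(task, state)
            self.values[(task, state)] = self.value(task, state) + self.alpha * td_error

    def total_value(self, state):
        # Recombine the task-wise estimates into a single local value.
        return sum(self.value(task, state) for task in self.tasks)


def mixed_critic_signal(global_value, local_value, weight=0.5):
    # Assumed convex mix of the cooperative (global) and individual (local) estimates.
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * global_value + (1.0 - weight) * local_value


if __name__ == "__main__":
    critic = TaskwiseLocalCritic(tasks=("rsu_aoi", "cam_delivery"))
    # Hypothetical sub-rewards: a negative AoI penalty and +1 for a delivered CAM.
    critic.update("s0", {"rsu_aoi": -2.0, "cam_delivery": 1.0}, "s1")
    print(critic.value("rsu_aoi", "s0"), critic.value("cam_delivery", "s0"))
    print(mixed_critic_signal(global_value=2.0, local_value=critic.total_value("s0")))
```

The fixed weighted mix is only one possible way to combine a global and a local critic; this record does not state how the two critics enter the actor update in the paper.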
Pages: 9880-9896
Number of pages: 17