Characterizing and Optimizing the End-to-End Performance of Multi-Agent Reinforcement Learning Systems

Times Cited: 0
Authors
Gogineni, Kailash [1 ]
Mei, Yongsheng [1 ]
Gogineni, Karthikeya
Wei, Peng [1 ]
Lan, Tian [1 ]
Venkataramani, Guru [1 ]
Affiliations
[1] George Washington University, Washington, DC 20052, USA
Source
2024 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION, IISWC 2024 | 2024
Funding
U.S. National Science Foundation (NSF)
Keywords
Multi-Agent Systems; Performance Analysis; Reinforcement Learning; Performance Optimization
DOI
10.1109/IISWC63097.2024.00028
Chinese Library Classification (CLC)
TP301 [Theory, Methods]
Discipline Code
081202
Abstract
Multi-Agent Reinforcement Learning (MARL) systems make it possible to model and control multiple autonomous decision-making agents simultaneously. During online training, MARL algorithms involve performance-intensive computations, such as the exploration and exploitation phases, that operate over a large observation-action space and a very large number of training steps. Understanding and mitigating these performance limiters is key to the practical adoption of MARL. In this paper, we first present a detailed characterization of MARL workloads under different multi-agent settings. Our experimental analysis identifies a critical bottleneck that limits scaling: the mini-batch sampling of transition data. To mitigate this issue, we explore a series of optimization strategies. First, we investigate cache locality-aware sampling, which prioritizes intra-agent neighbor transitions over randomly drawn transitions in the baseline MARL algorithms. Next, we explore importance sampling techniques that preserve the learning performance and sampling distribution while capturing the neighbors of important transitions. Finally, we design an additional algorithmic optimization that reorganizes the transition data layout to improve cache locality across agents during mini-batch sampling. We evaluate our optimizations using popular MARL workloads on multi-agent particle games. Our work highlights several opportunities for enhancing the performance of multi-agent systems, with end-to-end training time improvements ranging from 8.2% (3 agents) to 20.5% (24 agents) over the baseline MADDPG, affirming the value of deeply understanding MARL performance bottlenecks and mitigating them effectively.
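
To make the two locality ideas in the abstract concrete, the following is a minimal Python/NumPy sketch, not the authors' implementation: the function name locality_aware_sample, the neighborhood size, and the structured-array replay layout are illustrative assumptions. It draws mini-batch indices as short contiguous runs instead of fully random positions, and stores all agents' data for a timestep in one row so that a single sampled index reads adjacent memory.

import numpy as np

def locality_aware_sample(buffer_size, batch_size, neighborhood=4, rng=None):
    # Draw batch_size indices as contiguous runs of `neighborhood` transitions:
    # random "anchor" positions are expanded into runs of consecutive indices,
    # so each mini-batch touches fewer distinct cache lines than uniform sampling.
    rng = rng if rng is not None else np.random.default_rng()
    n_anchors = batch_size // neighborhood
    anchors = rng.integers(0, buffer_size - neighborhood, size=n_anchors)
    return (anchors[:, None] + np.arange(neighborhood)).ravel()

# Hypothetical interleaved replay layout: one row holds every agent's
# transition for a timestep, so reading one sampled index pulls all agents'
# data from adjacent memory instead of hopping across per-agent buffers.
n_agents, obs_dim, act_dim, buffer_size = 3, 16, 4, 100_000
row_dtype = [("obs",      np.float32, (n_agents, obs_dim)),
             ("act",      np.float32, (n_agents, act_dim)),
             ("rew",      np.float32, (n_agents,)),
             ("next_obs", np.float32, (n_agents, obs_dim))]
replay = np.zeros(buffer_size, dtype=row_dtype)

idx = locality_aware_sample(buffer_size, batch_size=256)
minibatch = replay[idx]  # contiguous index runs -> better spatial locality

Sampling contiguous runs trades some sample randomness for spatial locality while keeping the anchors uniformly random; this is in the spirit of the paper's neighbor-prioritizing strategy, though the abstract does not specify the exact trade-off the authors use.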
Pages: 224-235
Page count: 12