Unified Automatic Control of Vehicular Systems With Reinforcement Learning

Cited by: 25

Authors:

Yan, Zhongxia [1]

Kreidieh, Abdul Rahman [2]

Vinitsky, Eugene [3]

Bayen, Alexandre M. [4]

Wu, Cathy [5,6]

Affiliations:

[1] MIT, Dept Elect Engn & Comp Sci, Cambridge, MA 02139 USA

[2] Univ Calif Berkeley, Dept Civil & Environm Engn, Berkeley, CA 94720 USA

[3] Univ Calif Berkeley, Dept Mech Engn, Berkeley, CA 94720 USA

[4] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA 94720 USA

[5] MIT, Dept Civil & Environm Engn, Lab Informat & Decis Syst, Cambridge, MA 02139 USA

[6] MIT, Inst Data Syst & Soc, Cambridge, MA 02139 USA

Source:

IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING | 2023 / Vol. 20 / Issue 02

Keywords:

Mobile traffic control; automated vehicles; reinforcement learning; Mixed autonomy; multi-agent systems; AUTONOMOUS VEHICLES; COORDINATION; NETWORKS; IMPACT; ROAD;

DOI:

10.1109/TASE.2022.3168621

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Emerging vehicular systems with increasing proportions of automated components present opportunities for optimal control to mitigate congestion and increase efficiency. There has been a recent interest in applying deep reinforcement learning (DRL) to these nonlinear dynamical systems for the automatic design of effective control strategies. Despite conceptual advantages of DRL being model-free, studies typically nonetheless rely on training setups that are painstakingly specialized to specific vehicular systems. This is a key challenge to efficient analysis of diverse vehicular and mobility systems. To this end, this article contributes a streamlined methodology for vehicular microsimulation and discovers high performance control strategies with minimal manual design. A variable-agent, multi-task approach is presented for optimization of vehicular Partially Observed Markov Decision Processes. The methodology is experimentally validated on mixed autonomy traffic systems, where fractions of vehicles are automated; empirical improvement, typically 15-60% over a human driving baseline, is observed in all configurations of six diverse open or closed traffic systems. The study reveals numerous emergent behaviors resembling wave mitigation, traffic signaling, and ramp metering. Finally, the emergent behaviors are analyzed to produce interpretable control strategies, which are validated against the learned control strategies.

As vehicular systems such as real-world traffic systems and robotic warehouses become increasingly automated, optimizing vehicle movements sees an increasing potential to reduce congestion and increase efficiency. For many vehicular systems, simulations of varying fidelity are commonly used for analysis and optimization without the need to deploy real vehicles. This article describes a unified and practical approach for optimal control of vehicles in arbitrary simulated vehicular systems while permitting partial automation, where the behavior of fractions of vehicles at given times can be modelled but not controlled. As illustrated by the diverse traffic systems considered in this article, the presented methodology emphasizes ease of application within any simulated vehicular system while minimizing manual efforts by the practitioner. The control inputs consist of local information around each automated vehicle, while the control outputs are commands for longitudinal acceleration and lateral lane change. Experimental results are presented for relatively small simulated traffic systems, though the methodology can be adapted to larger vehicular systems with minor modifications. Experimentally optimized behaviors provide insights to the practitioner which may assist in designing simplified and interpretable control strategies. Implementation in real-world systems depends on two requirements: 1) a reliable fallback mechanism for ensuring safety of vehicles, and 2) sufficient fidelity of the simulator for simulated behaviors to transfer. These requirements are under active research for traffic systems and may be practical in some robotic settings. To facilitate robust transfer of policies from simulated to real-world systems, future extensions of this work may inject additional randomization into simulation while reducing the unmodeled stochasticity of targeted real-world systems as much as possible.

Citation

Pages: 789-804

Page count: 16

