GOPS: A general optimal control problem solver for autonomous driving and industrial control applications

Cited by: 37
Authors
Wang, Wenxuan [1 ]
Zhang, Yuhang [1 ]
Gao, Jiaxin [1 ]
Jiang, Yuxuan [1 ]
Yang, Yujie [1 ]
Zheng, Zhilong [1 ]
Zou, Wenjun [1 ]
Li, Jie [1 ]
Zhang, Congsheng [1 ]
Cao, Wenhan [1 ]
Xie, Genjin [1 ]
Duan, Jingliang [1 ]
Li, Shengbo Eben [1 ]
Affiliations
[1] Tsinghua Univ, Sch Vehicle & Mobil, Beijing 100084, Peoples R China
Source
COMMUNICATIONS IN TRANSPORTATION RESEARCH | 2023, Vol. 3
Keywords
Industrial control; Reinforcement learning; Approximate dynamic programming; Optimal control; Neural network; Benchmark;
DOI
10.1016/j.commtr.2023.100096
CLC number
U [Transportation];
Subject classification
08 ; 0823 ;
Abstract
Solving optimal control problems is a fundamental requirement of industrial control tasks. Existing methods such as model predictive control often suffer from heavy online computational burdens. Reinforcement learning (RL) has shown promise in computer and board games but has yet to be widely adopted in industrial applications due to a lack of accessible, high-accuracy solvers. Current RL solvers are often developed for academic research and require a significant amount of theoretical knowledge and programming skill. Moreover, many of them support only Python-based environments and are limited to model-free algorithms. To address this gap, this paper develops the General Optimal control Problem Solver (GOPS), an easy-to-use RL solver package that aims to build real-time, high-performance controllers for industrial fields. GOPS is built with a highly modular structure that retains a flexible framework for secondary development. Considering the diversity of industrial control tasks, GOPS also includes a conversion tool that allows Matlab/Simulink to be used for environment construction, controller design, and performance validation. To handle large-scale problems, GOPS can automatically create various serial and parallel trainers by flexibly combining embedded buffers and samplers. It offers a variety of common approximate functions for policies and value functions, including polynomials, multilayer perceptrons, and convolutional neural networks. Additionally, constrained and robust algorithms for special industrial control systems with state constraints and model uncertainties are also integrated into GOPS. Several examples, including linear quadratic control, the inverted double pendulum, vehicle tracking, a humanoid robot, obstacle avoidance, and active suspension control, are tested to verify the performance of GOPS.
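The abstract lists linear quadratic control among the benchmark examples. As a minimal, self-contained sketch of that problem class (using only NumPy; every identifier below is illustrative and not part of the GOPS API), the discrete-time LQR gain can be obtained by iterating the Riccati equation:

```python
import numpy as np

# Illustrative only: a from-scratch discrete-time LQR solver, showing the
# kind of optimal control problem an RL/ADP solver like GOPS targets.
# None of the names below come from the GOPS package.

def lqr_gain(A, B, Q, R, iters=500):
    """Iterate the discrete-time Riccati equation and return the
    feedback gain K of the optimal policy u = -K x."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

# Double integrator: state [position, velocity], scalar force input.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.eye(2)
R = np.array([[1.0]])

K = lqr_gain(A, B, Q, R)

# Closed-loop rollout: the optimal feedback drives the state to the origin.
x = np.array([[1.0], [0.0]])
for _ in range(200):
    x = (A - B @ K) @ x
```

For linear dynamics with quadratic cost this iteration converges to the exact optimal controller; the value of such solvers lies in extending the same optimal-control formulation to nonlinear, constrained, and uncertain systems where no closed-form gain exists.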
Pages: 15