Self-Learning Cruise Control Using Kernel-Based Least Squares Policy Iteration

Cited by: 72
Authors
Wang, Jian [1 ]
Xu, Xin [1 ]
Liu, Daxue [1 ]
Sun, Zhenping [1 ]
Chen, Qingyang [1 ]
Affiliations
[1] Natl Univ Def Technol, Coll Mechatron & Automat, Changsha 410073, Hunan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Approximate dynamic programming (ADP); autonomous land vehicle (ALV); cruise control; kernel-based least squares policy iteration (KLSPI); reinforcement learning; speed control; LONGITUDINAL CONTROL; REINFORCEMENT; DESIGN;
DOI
10.1109/TCST.2013.2271276
Chinese Library Classification (CLC): TP [Automation and computer technology]
Discipline code: 0812
Abstract
This paper presents a novel learning-based cruise controller for autonomous land vehicles (ALVs) with unknown dynamics and external disturbances. The learning controller consists of a time-varying proportional-integral (PI) module and an actor-critic learning control module based on kernel machines. The learning objective of the cruise control is to make the vehicle's longitudinal velocity follow a smoothed, spline-based speed profile with the smallest possible error. The parameters of the PI module are adaptively tuned according to the vehicle's state and the action policy of the learning control module. Using state-transition data collected from the vehicle under various initial policies, the action policy of the learning control module is optimized offline by kernel-based least squares policy iteration (KLSPI). The effectiveness of the proposed controller was tested on an ALV platform during long-distance driving in urban traffic and autonomous driving on off-road terrain. The experimental results show that the learning control method realizes data-driven controller design and optimization based on KLSPI, and that the controller's performance adapts to different road conditions.
Pages: 1078-1087
Number of pages: 10
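The offline policy optimization described in the abstract can be sketched in code. The following is a minimal illustration of least squares policy iteration with Gaussian (RBF) kernel features on a toy longitudinal speed-tracking problem; it is not the paper's implementation. The one-dimensional error dynamics, the discretized acceleration set, the kernel centers, and all numeric constants are assumptions made for the sketch. Transitions are collected offline under a random behavior policy; each iteration then performs a least-squares temporal-difference (LSTD-Q) evaluation of the current greedy policy and improves the policy greedily, as in LSPI.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy longitudinal error model (assumed, not from the paper):
# err = v - v_ref, and err' = err + (a - DRAG * err) * DT
DT, DRAG, GAMMA = 0.1, 0.05, 0.95
ACTIONS = np.array([-1.0, 0.0, 1.0])      # discretized acceleration commands
CENTERS = np.linspace(-5.0, 5.0, 11)      # RBF centers on the velocity error

def phi(err, a_idx):
    """Kernel features: one RBF block per discrete action."""
    f = np.zeros(len(CENTERS) * len(ACTIONS))
    block = np.exp(-0.5 * (err - CENTERS) ** 2)
    f[a_idx * len(CENTERS):(a_idx + 1) * len(CENTERS)] = block
    return f

def step(err, a_idx):
    """One step of the error dynamics; reward penalizes tracking error."""
    nxt = err + (ACTIONS[a_idx] - DRAG * err) * DT
    return nxt, -nxt ** 2

def greedy(w, err):
    """Greedy action under the current Q-function weights."""
    return int(np.argmax([phi(err, a) @ w for a in range(len(ACTIONS))]))

# Collect transitions offline with a random behavior policy
data = []
err = rng.uniform(-4, 4)
for _ in range(2000):
    a = int(rng.integers(len(ACTIONS)))
    nxt, r = step(err, a)
    data.append((err, a, r, nxt))
    err = nxt if abs(nxt) < 5 else rng.uniform(-4, 4)

# LSPI: alternate LSTD-Q policy evaluation and greedy improvement
w = np.zeros(len(CENTERS) * len(ACTIONS))
for _ in range(10):
    A = 1e-3 * np.eye(len(w))             # small ridge term for stability
    b = np.zeros(len(w))
    for s, a, r, s2 in data:
        f = phi(s, a)
        f2 = phi(s2, greedy(w, s2))
        A += np.outer(f, f - GAMMA * f2)
        b += r * f
    w = np.linalg.solve(A, b)

# The learned greedy policy should drive the error toward zero:
# decelerate when the error is positive, accelerate when negative.
print(ACTIONS[greedy(w, 3.0)], ACTIONS[greedy(w, -3.0)])
```

In the paper's setting this optimized policy would act as the learning control module alongside the adaptively tuned PI module; here the sketch only shows the KLSPI-style offline iteration itself.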