Self-Learning Cruise Control Using Kernel-Based Least Squares Policy Iteration

Cited by: 72
Authors
Wang, Jian [1 ]
Xu, Xin [1 ]
Liu, Daxue [1 ]
Sun, Zhenping [1 ]
Chen, Qingyang [1 ]
Affiliations
[1] Natl Univ Def Technol, Coll Mechatron & Automat, Changsha 410073, Hunan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Approximate dynamic programming (ADP); autonomous land vehicle (ALV); cruise control; kernel-based least squares policy iteration (KLSPI); reinforcement learning; speed control; LONGITUDINAL CONTROL; REINFORCEMENT; DESIGN;
DOI
10.1109/TCST.2013.2271276
Chinese Library Classification
TP [Automation technology; computer technology];
Discipline Code
0812;
Abstract
This paper presents a novel learning-based cruise controller for autonomous land vehicles (ALVs) with unknown dynamics and external disturbances. The controller consists of a time-varying proportional-integral (PI) module and an actor-critic learning control module based on kernel machines. The learning objective is to make the vehicle's longitudinal velocity follow a smoothed spline-based speed profile with the smallest possible error. The parameters of the PI module are adaptively tuned according to the vehicle's state and the action policy of the learning control module. Using state-transition data collected while the vehicle is driven by various initial policies, the action policy of the learning control module is optimized offline by kernel-based least squares policy iteration (KLSPI). The effectiveness of the proposed controller was tested on an ALV platform during long-distance driving in urban traffic and autonomous driving on off-road terrain. The experimental results show that the proposed method enables data-driven controller design and optimization based on KLSPI and that the controller's performance adapts to different road conditions.
Pages: 1078-1087
Page count: 10