Towards Enabling Deep Learning Techniques for Adaptive Dynamic Programming

Cited by: 0
Authors
Ni, Zhen [1 ]
Malla, Naresh [1 ]
Zhong, Xiangnan [2 ]
Affiliations
[1] South Dakota State University, Department of Electrical Engineering and Computer Science, Brookings, SD 57007, USA
[2] University of Rhode Island, Department of Electrical, Computer and Biomedical Engineering, Kingston, RI 02881, USA
Source
2017 International Joint Conference on Neural Networks (IJCNN) | 2017
Keywords
Deep Learning; deep reinforcement learning (DRL); adaptive dynamic programming (ADP); experience replay; computational intelligence; Markov decision process; TIME NONLINEAR-SYSTEMS; EXPERIENCE REPLAY; NEURAL-NETWORK; GAME;
DOI
Not available
CLC number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Human-level control through deep learning and deep reinforcement learning has revealed unique and powerful potential, most visibly in the highly complex game of Go. AlphaGo, developed by Google DeepMind, defeated a top Go player earlier this year. The scientific and technological advances behind the success of AlphaGo have attracted researchers from multiple areas, including machine learning, artificial intelligence, and computational intelligence. Adaptive dynamic programming (ADP) methods share fundamental principles with reinforcement learning and show strong performance on continuous-time and continuous-state systems. Deep learning techniques can likewise be integrated into ADP designs. In this paper, we discuss the key techniques and components of deep reinforcement learning and then present successful applications to computer games and maze navigation. Future opportunities for deep-learning-enabled ADP are discussed at the end.
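The keywords single out experience replay as one of the key deep reinforcement learning components the paper discusses. As an illustration only (this sketch does not appear in the paper; the class and parameter names are assumptions), a minimal uniform-sampling replay buffer of the kind used in deep Q-learning can be written as:

```python
# Illustrative sketch of an experience-replay buffer (not from the paper).
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out first

    def push(self, state, action, reward, next_state, done):
        # Record one interaction with the environment.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform random sampling breaks the temporal correlation between
        # consecutive transitions, which stabilizes neural-network training.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)


if __name__ == "__main__":
    buf = ReplayBuffer(capacity=100)
    for t in range(50):
        buf.push(state=t, action=0, reward=1.0, next_state=t + 1, done=False)
    states, actions, rewards, next_states, dones = buf.sample(batch_size=8)
    print(len(states))  # -> 8
```

Training on such decorrelated mini-batches, rather than on consecutive steps, is one of the stabilizing mechanisms behind deep Q-networks and is the point of contact with ADP designs that the paper explores.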
Pages: 2828-2835
Number of pages: 8
References
64 in total
[51] Si, J., et al. (Eds.), Handbook of Learning and Approximate Dynamic Programming, 2004.
[52] Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., Hassabis, D., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, no. 7587, pp. 484-489, 2016.
[53] Springenberg, J. T., et al., "Striving for Simplicity: The All Convolutional Net," arXiv preprint arXiv:1412.6806, 2015. DOI: 10.48550/arXiv.1412.6806.
[54] Tieleman, T., "Lecture 6.5-rmsprop," COURSERA: Neural Networks for Machine Learning, vol. 4, p. 26, 2012.
[55] Wang, B., Proc. 2016 International Joint Conference on Neural Networks (IJCNN), p. 3550, 2016. DOI: 10.1109/IJCNN.2016.7727655.
[56] Wang, D., Li, C., Liu, D., Mu, C., "Data-based robust optimal control of continuous-time affine nonlinear systems with matched uncertainties," Information Sciences, vol. 366, pp. 121-133, 2016.
[57] Wang, D., Liu, D., Wei, Q., Zhao, D., Jin, N., "Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming," Automatica, vol. 48, no. 8, pp. 1825-1832, 2012.
[58] Wang, F.-Y., Zhang, H., Liu, D., "Adaptive Dynamic Programming: An Introduction," IEEE Computational Intelligence Magazine, vol. 4, no. 2, pp. 39-47, 2009.
[59] Wawrzynski, P., Tanwani, A. K., "Autonomous reinforcement learning with experience replay," Neural Networks, vol. 41, pp. 156-167, 2013.
[60] Werbos, P. J., "Backpropagation through time: what it does and how to do it," Proceedings of the IEEE, vol. 78, no. 10, pp. 1550-1560, 1990.