Deep Reinforcement Learning: From Q-Learning to Deep Q-Learning

Cited by: 27
Authors
Tan, Fuxiao [1 ]
Yan, Pengfei [2 ]
Guan, Xinping [3 ]
Affiliations
[1] Fuyang Normal Univ, Sch Comp & Informat Engn, Fuyang 236037, Anhui, Peoples R China
[2] Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China
[3] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai 200240, Peoples R China
Source
NEURAL INFORMATION PROCESSING (ICONIP 2017), PT IV | 2017, Vol. 10637
Funding
National Natural Science Foundation of China
Keywords
Deep reinforcement learning; Q-learning; Deep Q-learning; Convolutional neural networks; NEURAL-NETWORKS; ALGORITHM;
DOI
10.1007/978-3-319-70093-9_50
CLC number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As two of the most active branches of machine learning, deep learning and reinforcement learning both play a vital role in artificial intelligence. Deep reinforcement learning, which combines the two, is a method of artificial intelligence that is much closer to human learning. Q-learning, one of the most basic reinforcement learning algorithms, operates over discrete states and actions: a behavior policy generates an action, and from the reward and the next state produced by the interaction between that action and the environment, the optimal Q-function can be obtained. Building on Q-learning and convolutional neural networks, deep Q-learning with experience replay is developed in this paper. To ensure convergence of the value function, a discount factor is included in its definition, and the temporal-difference method is introduced to train the Q-function (value function). Finally, a detailed procedure for implementing deep reinforcement learning is presented.
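The abstract describes tabular Q-learning: a behavior policy (here, epsilon-greedy) generates actions, and a temporal-difference update with a discount factor drives the Q-function toward its optimum. The toy corridor environment, the hyperparameter values, and all names below are illustrative assumptions, not taken from the paper; this is a minimal sketch of the update rule the abstract refers to.

```python
import random
from collections import defaultdict

random.seed(0)  # deterministic run for this sketch

# Hypothetical toy environment (not from the paper): a 1-D corridor of 5
# cells; the agent starts at cell 0 and earns reward 1 for reaching cell 4.
N_STATES = 5
ACTIONS = [0, 1]  # 0 = move left, 1 = move right

def step(state, action):
    """Environment transition: returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

Q = defaultdict(float)                 # tabular Q[(state, action)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1  # learning rate, discount factor, exploration rate

def choose_action(state):
    """Epsilon-greedy: the behavior policy that generates an action."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

for episode in range(500):
    s, done = 0, False
    while not done:
        a = choose_action(s)
        s2, r, done = step(s, a)
        # Temporal-difference update toward the target r + gamma * max_a' Q(s', a');
        # the discount factor gamma keeps the value function bounded.
        target = r + (0.0 if done else gamma * max(Q[(s2, b)] for b in ACTIONS))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

# After training, the greedy policy moves right in every non-terminal state.
policy = [max(ACTIONS, key=lambda a: Q[(st, a)]) for st in range(N_STATES - 1)]
```

Deep Q-learning, as the abstract describes, replaces the table `Q` with a convolutional network mapping raw observations to action values and trains on minibatches sampled from an experience-replay buffer, which breaks the correlation between consecutive transitions.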
Pages: 475-483
Page count: 9
References (17 total)
[1] Arel, Itamar; Rose, Derek C.; Karnowski, Thomas P. Deep Machine Learning - A New Frontier in Artificial Intelligence Research. IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE, 2010, 5(4): 13-18.
[2] Bengio, Yoshua; Courville, Aaron; Vincent, Pascal. Representation Learning: A Review and New Perspectives. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2013, 35(8): 1798-1828.
[3] Hinton, G. E.; Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. SCIENCE, 2006, 313(5786): 504-507.
[4] Hinton, Geoffrey E.; Osindero, Simon; Teh, Yee-Whye. A fast learning algorithm for deep belief nets. NEURAL COMPUTATION, 2006, 18(7): 1527-1554.
[5] Kaelbling, L. P.; Littman, M. L.; Moore, A. W. Reinforcement learning: A survey. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 1996, 4: 237-285.
[6] LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey. Deep learning. NATURE, 2015, 521(7553): 436-444.
[7] Liu, Derong; Wang, Ding; Wang, Fei-Yue; Li, Hongliang; Yang, Xiong. Neural-Network-Based Online HJB Solution for Optimal Robust Guaranteed Cost Control of Continuous-Time Uncertain Nonlinear Systems. IEEE TRANSACTIONS ON CYBERNETICS, 2014, 44(12): 2834-2847.
[8] Liu, Derong; Wei, Qinglai. Policy Iteration Adaptive Dynamic Programming Algorithm for Discrete-Time Nonlinear Systems. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2014, 25(3): 621-634.
[9] Liu, Derong; Wang, Ding; Li, Hongliang. Decentralized Stabilization for a Class of Continuous-Time Nonlinear Interconnected Systems Using Online Learning Optimal Control Approach. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2014, 25(2): 418-428.
[10] Mnih, V.; et al. arXiv, 2013.