Adaptive critic designs

Times cited: 883
Authors
Prokhorov, DV
Wunsch, DC
Affiliation
[1] Applied Computational Intelligence Laboratory, Department of Electrical Engineering, Texas Tech University, Lubbock
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 1997 / Vol. 8 / No. 5
Funding
National Science Foundation (USA)
Keywords
adaptive critic design (ACD); backpropagation; control; DHP; dynamic programming; GDHP; HDP; heuristic dynamic programming; neural network; neurocontrol; reinforcement learning;
DOI
10.1109/72.623201
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We discuss a variety of adaptive critic designs (ACD's) for neurocontrol. These are suitable for learning in noisy, nonlinear, and nonstationary environments. They have common roots as generalizations of dynamic programming for neural reinforcement learning approaches. Our discussion of these origins leads to an explanation of three design families: Heuristic dynamic programming (HDP), dual heuristic programming (DHP), and globalized dual heuristic programming (GDHP). The main emphasis is on DHP and GDHP as advanced ACD's. We suggest two new modifications of the original GDHP design that are currently the only working implementations of GDHP. They promise to be useful for many engineering applications in the areas of optimization and optimal control. Based on one of these modifications, we present a unified approach to all ACD's. This leads to a generalized training procedure for ACD's.
Pages: 997-1007
Page count: 11