Training strategies for critic and action neural networks in dual heuristic programming method

Cited by: 0
Authors
Lendaris, GG
Paintz, C
Source
1997 IEEE International Conference on Neural Networks, Vols. 1-4 | 1997
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper discusses strategies for, and details of, training procedures for the Dual Heuristic Programming (DHP) methodology defined in [6]. DHP and the related approximate dynamic programming approaches (HDP, DHP, GDHP), all members of the Adaptive Critic Design (ACD) family, have been discussed in some detail in [2], [4], [5]. The example application is the inverted pendulum problem as defined in [1]; this "plant" has been successfully controlled using DHP, as reported in [4]. The main recent reference on training procedures for ACDs is [2]. The present paper proposes and investigates several alternative procedures and compares their performance in terms of convergence speed and the quality of the resulting controller design. A promising modification is to introduce a real, separate copy of the critic network (criticNN#2) for computing the "desired output" values; very importantly, criticNN#2 is trained differently from criticNN#1. The idea is to provide the "desired outputs" from a stable platform during an epoch while criticNN#1 is being adapted. At the end of the epoch, criticNN#2 is made identical to the then-current adapted state of criticNN#1, and a new epoch starts. In this way, criticNN#1 and the actionNN can both be trained on-line simultaneously during each epoch, with faster overall convergence than the older approach. Further, the measures used herein suggest that this yields a "better" controller design (the actionNN).
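The two-critic epoch scheme described in the abstract can be illustrated with a minimal sketch. It rests on strong simplifying assumptions that are not from the paper: a hypothetical scalar linear plant x' = A*x + B*u with quadratic cost U = Q*x^2 + R*u^2 stands in for the inverted pendulum; linear "networks" (critic lambda(x) = w*x estimating dJ/dx, action u = k*x) stand in for neural networks; states are sampled uniformly rather than along trajectories; and the action update uses the frozen criticNN#2, since the abstract does not say which critic the actionNN uses. The names `train_dhp` and `lqr_reference` and all numeric settings are hypothetical.

```python
import random

# Hypothetical scalar plant x' = A*x + B*u, stage cost U = Q*x^2 + R*u^2,
# discount GAMMA. Illustrative values only, not the paper's pendulum setup.
A, B, Q, R, GAMMA = 1.1, 1.0, 1.0, 1.0, 0.9


def train_dhp(epochs=20, steps_per_epoch=200, lr_critic=0.1, lr_action=0.05, seed=0):
    """Two-critic DHP sketch: criticNN#2 is frozen during an epoch and supplies
    the desired outputs; it is synced to criticNN#1 at the end of each epoch."""
    rng = random.Random(seed)
    w1 = 0.0  # criticNN#1 (being adapted): lambda(x) = w1 * x
    w2 = 0.0  # criticNN#2 (frozen copy): used only to form targets
    k = 0.0   # actionNN: u = k * x
    for _ in range(epochs):
        for _ in range(steps_per_epoch):
            x = rng.uniform(-1.0, 1.0)
            u = k * x
            x_next = A * x + B * u
            lam_next = w2 * x_next  # costate at x_next from the frozen critic
            # DHP desired output: total derivative of U + GAMMA*J(x_next) w.r.t. x,
            # including the path through the action u = k*x.
            lam_target = 2 * Q * x + k * (2 * R * u) + GAMMA * (A + B * k) * lam_next
            # criticNN#1: gradient step on 0.5 * (w1*x - lam_target)^2.
            w1 -= lr_critic * (w1 * x - lam_target) * x
            # actionNN: push dU/du + GAMMA * dJ(x_next)/du toward zero.
            # Assumption: this sketch uses the frozen criticNN#2 here.
            dq_du = 2 * R * u + GAMMA * B * lam_next
            k -= lr_action * dq_du * x
        # End of epoch: criticNN#2 := criticNN#1 (a weight copy for real networks).
        w2 = w1
    return w1, w2, k


def lqr_reference(iters=500):
    """Discounted scalar Riccati iteration, used only as a sanity check.
    J(x) = p*x^2, so the ideal critic weight is 2*p."""
    p = 0.0
    for _ in range(iters):
        p = Q + GAMMA * A * A * p - (GAMMA * A * B * p) ** 2 / (R + GAMMA * B * B * p)
    k = -GAMMA * A * B * p / (R + GAMMA * B * B * p)
    return p, k


if __name__ == "__main__":
    w1, w2, k = train_dhp()
    p_star, k_star = lqr_reference()
    print(f"critic weight {w2:.3f} vs 2p* = {2 * p_star:.3f}")
    print(f"action gain   {k:.3f} vs k*  = {k_star:.3f}")
```

Because this toy plant is linear with quadratic cost, the result can be checked against the discounted Riccati solution: after training, `w2` should lie close to 2p* and `k` close to k*, giving a stable closed loop even though the open-loop plant (A = 1.1) is unstable. Freezing criticNN#2 makes the targets a fixed function of the state within each epoch, so criticNN#1 regresses onto a stationary target; with real networks the end-of-epoch sync would be a full weight copy (for example `copy.deepcopy`) rather than a float assignment.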
Pages: 712-717
Page count: 6