Training strategies for critic and action neural networks in dual heuristic programming method

Cited by: 0

Authors:

Lendaris, GG

Paintz, C

Affiliation:

Source:

1997 IEEE INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, VOLS 1-4 | 1997

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper discusses strategies for, and details of, training procedures for the Dual Heuristic Programming (DHP) methodology, defined in [6]. This and other approximate dynamic programming approaches (HDP, DHP, GDHP) have been discussed in some detail in [2], [4], [5], all being members of the Adaptive Critic Design (ACD) family. The example application used is the inverted pendulum problem, as defined in [1]. This "plant" has been successfully controlled using DHP, as reported in [4]. The main recent reference on training procedures for ACDs is [2]. The present paper suggests and investigates several alternative procedures and compares their performance with respect to convergence speed and quality of the resulting controller design. A promising modification is to introduce a real copy of the criticNN (criticNN#2) for making the "desired output" calculations; very importantly, this criticNN#2 is trained differently from criticNN#1. The idea is to provide the "desired outputs" from a stable platform during an epoch while adapting criticNN#1. Then, at the end of the epoch, criticNN#2 is made identical to the then-current adapted state of criticNN#1, and a new epoch starts. In this way, both criticNN#1 and the actionNN can be trained simultaneously on-line during each epoch, with faster overall convergence than the older approach. Further, the measures used herein suggest that a "better" controller design (the actionNN) results.
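
The two-critic scheme described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: a scalar linear plant stands in for the inverted pendulum, the critic is a single weight rather than a neural network, the actionNN is omitted entirely (the plant is treated as already closed-loop), and the names `A`, `Q`, `GAMMA`, and `train_critic` are hypothetical. The sketch only shows the bookkeeping: criticNN#2 stays frozen while supplying the "desired outputs" during an epoch, and is overwritten with criticNN#1 at the end of the epoch.

```python
import random

# Hypothetical scalar plant and utility (NOT the inverted pendulum of the paper):
#   x(t+1) = A * x(t),   U(x) = Q * x**2,   discount factor GAMMA.
A, Q, GAMMA = 0.9, 1.0, 0.9


def train_critic(epochs=40, steps_per_epoch=200, lr=0.1, seed=0):
    """Train a linear critic lambda_hat(x) = w1 * x with a frozen copy w2 for targets."""
    rng = random.Random(seed)
    w1 = 0.0  # stands in for criticNN#1: adapted at every step
    w2 = w1   # stands in for criticNN#2: frozen within an epoch
    for _ in range(epochs):
        for _ in range(steps_per_epoch):
            x = rng.uniform(-1.0, 1.0)
            x_next = A * x
            # DHP-style target for dJ/dx:
            #   dU/dx + GAMMA * (dx_next/dx) * lambda(x_next),
            # with lambda(x_next) evaluated by the frozen copy w2.
            target = 2.0 * Q * x + GAMMA * A * (w2 * x_next)
            error = w1 * x - target
            w1 -= lr * error * x  # gradient step on 0.5 * error**2
        w2 = w1  # end of epoch: make criticNN#2 identical to criticNN#1
    return w1


w = train_critic()
# Fixed point of w = 2*Q + GAMMA * A**2 * w for this toy problem.
w_exact = 2.0 * Q / (1.0 - GAMMA * A * A)
print(f"learned w = {w:.4f}, closed-form w = {w_exact:.4f}")
```

In this toy setting the learned weight can be checked against the closed-form fixed point, which is only possible because the plant and critic are linear; the paper's neural-network setting offers no such reference value.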

Pages: 712-717

Page count: 6