An improved algorithm for learning long-term dependency problems in adaptive processing of data structures

Cited by: 31
Authors
Cho, SY [1]
Chi, ZR
Siu, WC
Tsoi, AC
Affiliations
[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Ctr Multimedia Signal Proc, Kowloon, Hong Kong, Peoples R China
[2] Univ Wollongong, Informat Technol Serv, Wollongong, NSW 2522, Australia
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 2003 / Vol. 14 / No. 04
Keywords
adaptive processing of data structures; backpropagation through structure (BPTS); least-squares method; long-term dependency
DOI
10.1109/TNN.2003.813831
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
For the past decade, many researchers have explored the use of neural-network representations for the adaptive processing of data structures. One of the most popular learning formulations of data structure processing is backpropagation through structure (BPTS). The BPTS algorithm has been successfully applied to a number of learning tasks that involve structural patterns such as logo and natural scene classification. The main limitations of the BPTS algorithm are attributed to slow convergence speed and the long-term dependency problem for the adaptive processing of data structures. In this paper, an improved algorithm is proposed to solve these problems. The idea of this algorithm is to optimize the free learning parameters of the neural network in the node representation by using least-squares-based optimization methods in a layer-by-layer fashion. Not only can fast convergence speed be achieved, but the long-term dependency problem can also be overcome since the vanishing of gradient information is avoided when our approach is applied to very deep tree structures.
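The contrast drawn in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' algorithm; it is a minimal illustration under simplifying assumptions: a chain (a degenerate tree) of scalar tanh nodes that all share the same weight and pre-activation, and a single linear output weight. The function names `backprop_gradient_factor` and `least_squares_output_weight` are hypothetical and chosen only for this example.

```python
import math


def backprop_gradient_factor(depth, weight=0.5, pre_activation=1.0):
    """Magnitude of d(root state)/d(leaf state) along a chain of tanh nodes.

    Assumption: every level has the same scalar weight and pre-activation,
    so each level contributes the factor |weight| * tanh'(pre_activation).
    """
    per_level = abs(weight) * (1.0 - math.tanh(pre_activation) ** 2)
    return per_level ** depth


def least_squares_output_weight(states, targets):
    """Closed-form weight w minimizing sum((w * s - t) ** 2).

    Assumption: a single linear output weight with no bias term. The
    solution is computed directly, without propagating a gradient
    through the depth of the structure.
    """
    numerator = sum(s * t for s, t in zip(states, targets))
    denominator = sum(s * s for s in states)
    return numerator / denominator


# Gradient information shrinks geometrically as the structure gets deeper.
factors = {d: backprop_gradient_factor(d) for d in (1, 5, 10, 20)}
for d, f in factors.items():
    print(f"depth={d:2d}  gradient factor={f:.3e}")

# A least-squares fit of one linear weight is obtained in a single step.
w = least_squares_output_weight([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(f"least-squares weight: {w}")
```

In this toy setting the backpropagated factor decays geometrically with depth, whereas the least-squares step does not depend on depth at all. The layer-by-layer method described in the abstract is more general than this one-weight example.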
Pages: 781-793
Number of pages: 13
Related papers
23 in total
[1]   TRAINING NEURAL NETS WITH THE REACTIVE TABU SEARCH [J].
BATTITI, R ;
TECCHIOLLI, G .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 1995, 6 (05) :1185-1200
[2]   LEARNING LONG-TERM DEPENDENCIES WITH GRADIENT DESCENT IS DIFFICULT [J].
BENGIO, Y ;
SIMARD, P ;
FRASCONI, P .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 1994, 5 (02) :157-166
[3]   Input-output HMM's for sequence processing [J].
Bengio, Y ;
Frasconi, P .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 1996, 7 (05) :1231-1249
[4]   A Layer-by-Layer Least Squares based recurrent networks training algorithm: Stalling and escape [J].
Cho, SY ;
Chow, TWS .
NEURAL PROCESSING LETTERS, 1998, 7 (01) :15-25
[5]   Training multilayer neural networks using fast global learning algorithm - least-squares and penalized optimization methods [J].
Cho, SY ;
Chow, TWS .
NEUROCOMPUTING, 1999, 25 (1-3) :115-131
[6]  
CHO SY, ICSP 2002
[7]  
FRASCONI P, 2001, FIELD GUIDE DYNAMICA, P351
[8]   A general framework for adaptive processing of data structures [J].
Frasconi, P ;
Gori, M ;
Sperduti, A .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 1998, 9 (05) :768-786
[9]  
GILES CL, 1998, ADAPTIVE PROCESSING
[10]  
Goller C, 1996, IEEE IJCNN, P347, DOI 10.1109/ICNN.1996.548916