Organization of the state space of a simple recurrent network before and after training on recursive linguistic structures

Cited by: 10
Authors
Cernansky, Michal
Makula, Matej
Benuskova, Lubica
Affiliations
[1] Slovak Tech Univ, Fac Informat & Informat Technol, Bratislava 842 16, Slovakia
[2] Comenius Univ, Fac Math Phys & Informat, Dept Appl Informat, Bratislava 842 48, Slovakia
Keywords
recurrent neural networks; linguistic structures; next-symbol prediction; state space analysis; language processing; Markovian architectural bias; neural prediction machines; variable length Markov model
DOI
10.1016/j.neunet.2006.01.020
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recurrent neural networks are often employed in the cognitive science community to process symbol sequences that represent various natural language structures. The aim is to study possible neural mechanisms of language processing and to aid in the development of artificial language processing systems. We used data sets containing recursive linguistic structures and trained the Elman simple recurrent network (SRN) on the next-symbol prediction task. Concentrating on neuron activation clusters in the recurrent layer of the SRN, we investigate the organization of the network state space before and after training. Given an SRN and a training stream, we construct predictive models, called neural prediction machines, that directly employ the state-space dynamics of the network. We demonstrate two important properties of the representations of recursive symbol series in the SRN. First, the clusters of recurrent activations that emerge before training are meaningful and correspond to Markov prediction contexts. We show that the prediction states that naturally arise in an SRN initialized with small random weights approximately correspond to the states of Variable Memory Length Markov Models (VLMMs) based on individual symbols (i.e. words). Second, we demonstrate that during training the SRN reorganizes its state space according to word categories and their grammatical subcategories, and that next-symbol prediction is again based on the VLMM strategy. However, after training, prediction is based on word categories and their grammatical subcategories rather than on individual words. Our conclusions hold for small depths of recursion that are comparable to human performance. The methods of SRN training and state-space analysis introduced in this paper are of a general nature and can be used to investigate the processing of any other symbolic time series by means of an SRN. (c) 2006 Elsevier Ltd. All rights reserved.
Pages: 236-244
Page count: 9
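
The central construction described in the abstract, a neural prediction machine built on top of the recurrent state space, can be illustrated with a short sketch. The code below is a minimal, hypothetical illustration rather than the authors' implementation: it drives an untrained Elman-style recurrent layer with small random weights over a symbol sequence, clusters the recurrent activations with plain k-means, and estimates a smoothed next-symbol distribution per cluster, which is the "Markov prediction context" idea discussed in the paper. All function names (elman_states, build_npm), the toy 4-symbol alphabet, and the hyperparameters are assumptions made for this example.

```python
import numpy as np

def elman_states(symbols, n_symbols, n_hidden=16, weight_scale=0.1, seed=0):
    """Run an untrained Elman-style recurrent layer over a symbol sequence.

    Small random weights give the contractive (Markovian) dynamics the paper
    discusses; the returned rows are the recurrent activations h_t.
    (Illustrative sketch, not the authors' architecture or settings.)
    """
    rng = np.random.default_rng(seed)
    W_in = rng.normal(0.0, weight_scale, (n_hidden, n_symbols))   # input -> hidden
    W_rec = rng.normal(0.0, weight_scale, (n_hidden, n_hidden))   # hidden -> hidden
    h = np.zeros(n_hidden)
    states = []
    for s in symbols:
        x = np.zeros(n_symbols)
        x[s] = 1.0                                # one-hot encoding of the symbol
        h = np.tanh(W_in @ x + W_rec @ h)         # Elman recurrent update
        states.append(h.copy())
    return np.array(states)

def kmeans(points, k, n_iter=50, seed=0):
    """Plain k-means; returns a cluster index for every point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def build_npm(symbols, n_symbols, k=8, alpha=1.0):
    """Neural prediction machine: cluster the states h_t, count which symbol
    follows each cluster, and turn the counts into Laplace-smoothed
    predictive distributions P(next symbol | cluster)."""
    states = elman_states(symbols, n_symbols)
    labels = kmeans(states, k)
    counts = np.full((k, n_symbols), alpha)       # smoothed co-occurrence counts
    for t in range(len(symbols) - 1):
        counts[labels[t], symbols[t + 1]] += 1.0  # state at time t predicts symbol t+1
    return labels, counts / counts.sum(axis=1, keepdims=True)

if __name__ == "__main__":
    # Toy corpus over a 4-symbol alphabet standing in for word categories.
    rng = np.random.default_rng(1)
    corpus = rng.integers(0, 4, size=500)
    labels, npm = build_npm(corpus, n_symbols=4)
    print("prediction contexts (clusters):", npm.shape[0])
    print("P(next | cluster 0):", np.round(npm[0], 3))
```

Run on a corpus with genuine sequential structure (rather than the random toy corpus above), the per-cluster distributions would play the role of the prediction contexts that the paper compares against variable memory length Markov models before and after SRN training.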