Bootstrapping language acquisition

Cited by: 47
Authors
Abend, Omri [1,2,3]
Kwiatkowski, Tom [1,4]
Smith, Nathaniel J. [1,5]
Goldwater, Sharon [1]
Steedman, Mark [1]
Affiliations
[1] Univ Edinburgh, Informat, Edinburgh, Midlothian, Scotland
[2] Hebrew Univ Jerusalem, Dept Comp Sci, Jerusalem, Israel
[3] Hebrew Univ Jerusalem, Dept Cognit Sci, Jerusalem, Israel
[4] Google Res, Mountain View, CA USA
[5] Univ Calif Berkeley, Berkeley Inst Data Sci, Berkeley, CA 94720 USA
Keywords
Language acquisition; Syntactic bootstrapping; Semantic bootstrapping; Computational modeling; Bayesian model; Cross-situational learning; COMPUTATIONAL MODEL; GRAMMAR; WORDS; CUE; INFORMATION; EMERGENCE; MECHANISM; SELECTION; TURKISH; SYNTAX
DOI
10.1016/j.cognition.2017.02.009
Chinese Library Classification (CLC)
B84 [Psychology]
Discipline codes
04; 0402
Abstract
The semantic bootstrapping hypothesis proposes that children acquire their native language through exposure to sentences of the language paired with structured representations of their meaning, whose component substructures can be associated with words and syntactic structures used to express these concepts. The child's task is then to learn a language-specific grammar and lexicon based on (probably contextually ambiguous, possibly somewhat noisy) pairs of sentences and their meaning representations (logical forms). Starting from these assumptions, we develop a Bayesian probabilistic account of semantically bootstrapped first-language acquisition in the child, based on techniques from computational parsing and interpretation of unrestricted text. Our learner jointly models (a) word learning: the mapping between components of the given sentential meaning and lexical words (or phrases) of the language, and (b) syntax learning: the projection of lexical elements onto sentences by universal construction-free syntactic rules. Using an incremental learning algorithm, we apply the model to a dataset of real syntactically complex child-directed utterances and (pseudo) logical forms, the latter including contextually plausible but irrelevant distractors. Taking the Eve section of the CHILDES corpus as input, the model simulates several well-documented phenomena from the developmental literature. In particular, the model exhibits syntactic bootstrapping effects (in which previously learned constructions facilitate the learning of novel words), sudden jumps in learning without explicit parameter setting, acceleration of word-learning (the "vocabulary spurt"), an initial bias favoring the learning of nouns over verbs, and one-shot learning of words and their meanings. The learner thus demonstrates how statistical learning over structured representations can provide a unified account for these seemingly disparate phenomena. (C) 2017 Elsevier B.V. All rights reserved.
Pages: 116-143
Page count: 28