The Acquisition of Noun and Verb Categories by Bootstrapping From a Few Known Words: A Computational Model

Cited by: 6
Authors
Brusini, Perrine [1, 2]
Seminck, Olga [3 ]
Amsili, Pascal [3 ]
Christophe, Anne [2 ]
Affiliations
[1] Univ Liverpool, Dept Psychol Sci, Liverpool, Merseyside, England
[2] PSL Univ, Ecole Normale Super, Ctr Natl Rech Sci, Lab Sci Cognit & Psycholinguist, Paris, France
[3] Univ Sorbonne Nouvelle, PSL Univ, Ecole Normale Super, Ctr Natl Rech Sci, Lab Langues Textes Traitements Informat Cognit La, Paris, France
Funding
UK Economic and Social Research Council;
Keywords
language development; acquisition of syntax; computational modeling; semantic seed; noun; verb; French; FREQUENT FRAMES; GRAMMATICAL CATEGORIES; SYNTACTIC CATEGORIES; FUNCTIONAL MORPHEMES; GENDER INFORMATION; SOUND PATTERNS; YOUNG-CHILDREN; INFANTS KNOW; LANGUAGE; DETERMINERS;
DOI
10.3389/fpsyg.2021.661479
Chinese Library Classification number
B84 [Psychology];
Discipline classification code
04 ; 0402 ;
Abstract
While many studies have shown that toddlers can detect syntactic regularities in speech, the learning mechanism that allows them to do so remains largely unclear. In this article, we use computational modeling to assess the plausibility of a context-based learning mechanism for the acquisition of nouns and verbs. We hypothesize that infants can assign basic semantic features, such as "is-an-object" and/or "is-an-action," to the very first words they learn, then use these words, the semantic seed, to ground proto-categories of nouns and verbs. The contexts in which these words occur would then be exploited to bootstrap the noun and verb categories: an unknown word is assigned to the class that has been observed most frequently in the corresponding context. To test our hypothesis, we designed a series of computational experiments that used French corpora of child-directed speech and semantic seeds of different sizes. We partitioned these corpora into training and test sets: the model extracted the two-word contexts of the seed from the training sets, then used them to predict the syntactic category of content words in the test sets. This very simple algorithm proved highly efficient in a categorization task: even the smallest semantic seed (only 8 nouns and 1 verb known) yielded very high precision (~90% for new nouns; ~80% for new verbs). Recall, in contrast, was low for small seeds and increased with seed size. Interestingly, the contexts used most often by the model featured function words, which is in line with what we know about infants' language development. Crucially, for the learning method we evaluated here, all initialization hypotheses (a semantic seed and the ability to analyse contexts) are plausible and fit the developmental literature. While this experiment cannot prove that infants indeed use this learning mechanism, it demonstrates the feasibility of a realistic learning hypothesis, using an algorithm that requires very few computational and memory resources. Altogether, this supports the idea that a probabilistic, context-based mechanism can be very efficient for the acquisition of syntactic categories in infants.
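The categorization step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that a "two-word context" means the two words immediately preceding the target (the abstract does not specify this), and the names `train_contexts`, `predict`, `TOY_UTTERANCES`, and `TOY_SEED`, as well as the toy sentences, are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy data, not taken from the corpora used in the article.
TOY_UTTERANCES = [
    ["le", "chien", "mange"],
    ["tu", "veux", "le", "ballon"],
    ["il", "mange", "la", "pomme"],
]
# Hypothetical semantic seed: the few words whose category is already known.
TOY_SEED = {"chien": "noun", "ballon": "noun", "mange": "verb"}


def train_contexts(utterances, seed):
    """Count how often each two-word left context precedes a seed noun or verb.

    Assumption: the context is the two preceding words, with "<s>" padding
    at the start of an utterance.
    """
    counts = defaultdict(Counter)
    for words in utterances:
        padded = ["<s>", "<s>"] + list(words)
        for i in range(2, len(padded)):
            category = seed.get(padded[i])
            if category is not None:
                counts[(padded[i - 2], padded[i - 1])][category] += 1
    return counts


def predict(counts, context):
    """Return the category seen most often in this context, or None if unseen."""
    seen = counts.get(context)
    if not seen:
        return None  # no prediction: this lowers recall, not precision
    return seen.most_common(1)[0][0]


counts = train_contexts(TOY_UTTERANCES, TOY_SEED)
noun_guess = predict(counts, ("<s>", "le"))      # context of "chien"
verb_guess = predict(counts, ("<s>", "il"))      # context of "mange"
no_guess = predict(counts, ("la", "petite"))     # context never seen with a seed word
```

Returning `None` for unseen contexts mirrors the pattern reported in the abstract: a small seed covers few contexts, so many test words receive no prediction (low recall), while the predictions that are made can still be accurate (high precision).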
Pages: 18
Related papers
103 in total
[1]   The role of discourse novelty in early word learning [J].
Akhtar, N ;
Carpenter, M ;
Tomasello, M .
CHILD DEVELOPMENT, 1996, 67 (02) :635-645
[2]  
[Anonymous], 2013, Significance, DOI 10.1111/J.1740-9713.2013.00708.X
[3]   Meaning from syntax: Evidence from 2-year-olds [J].
Arunachalam, Sudha ;
Waxman, Sandra R. .
COGNITION, 2010, 114 (03) :442-446
[4]   14-month-olds exploit verbs' syntactic contexts to build expectations about novel words [J].
Babineau, Mireille ;
Shi, Rushen ;
Christophe, Anne .
INFANCY, 2020, 25 (05) :719-733
[5]   Familiar words can serve as a semantic seed for syntactic bootstrapping [J].
Babineau, Mireille ;
de Carvalho, Alex ;
Trueswell, John ;
Christophe, Anne .
DEVELOPMENTAL SCIENCE, 2021, 24 (01)
[6]   Modeling children's early grammatical knowledge [J].
Bannard, Colin ;
Lieven, Elena ;
Tomasello, Michael .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2009, 106 (41) :17284-17289
[7]  
Bates D.M., 2007, **DATA OBJECT**
[8]   Fitting Linear Mixed-Effects Models Using lme4 [J].
Bates, Douglas ;
Maechler, Martin ;
Bolker, Benjamin M. ;
Walker, Steven C. .
JOURNAL OF STATISTICAL SOFTWARE, 2015, 67 (01) :1-48
[9]   Early Word Comprehension in Infants: Replication and Extension [J].
Bergelson, Elika ;
Swingley, Daniel .
LANGUAGE LEARNING AND DEVELOPMENT, 2015, 11 (04) :369-380
[10]   The acquisition of abstract words by young infants [J].
Bergelson, Elika ;
Swingley, Daniel .
COGNITION, 2013, 127 (03) :391-397