Word Acquisition in Neural Language Models

Cited by: 20
Authors
Chang, Tyler A. [1 ,2 ]
Bergen, Benjamin K. [1 ]
Affiliations
[1] Univ Calif San Diego, Dept Cognit Sci, San Diego, CA 92093 USA
[2] Univ Calif San Diego, Halicioglu Data Sci Inst, San Diego, CA 92093 USA
Keywords
ENGLISH; BIRTH;
DOI
10.1162/tacl_a_00444
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Subject Classification Codes
081104; 0812; 0835; 1405
摘要
We investigate how neural language models acquire individual words during training, extracting learning curves and ages of acquisition for over 600 words on the MacArthur-Bates Communicative Development Inventory (Fenson et al., 2007). Drawing on studies of word acquisition in children, we evaluate multiple predictors for words' ages of acquisition in LSTMs, BERT, and GPT-2. We find that the effects of concreteness, word length, and lexical class are pointedly different in children and language models, reinforcing the importance of interaction and sensorimotor experience in child language acquisition. Language models rely far more on word frequency than children, but, like children, they exhibit slower learning of words in longer utterances. Interestingly, models follow consistent patterns during training for both unidirectional and bidirectional models, and for both LSTM and Transformer architectures. Models predict based on unigram token frequencies early in training, before transitioning loosely to bigram probabilities, eventually converging on more nuanced predictions. These results shed light on the role of distributional learning mechanisms in children, while also providing insights for more human-like language acquisition in language models.
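The abstract describes extracting per-word learning curves and ages of acquisition during training. As a rough sketch of how an age of acquisition might be read off a surprisal curve (the function name, toy numbers, and 50%-of-the-way threshold are illustrative assumptions, not the authors' exact curve-fitting procedure):

```python
def age_of_acquisition(steps, surprisals, chance_surprisal):
    """Estimate a word's age of acquisition: the first training step at
    which surprisal has fallen at least 50% of the way from chance-level
    surprisal to the word's final (end-of-training) surprisal."""
    final = surprisals[-1]
    threshold = chance_surprisal - 0.5 * (chance_surprisal - final)
    for step, s in zip(steps, surprisals):
        if s <= threshold:
            return step
    return None

# Toy learning curve: surprisal falls from a chance level of ~17 bits
# toward 5 bits over training.
steps = [0, 100, 200, 300, 400]
surprisals = [17.0, 15.0, 10.5, 6.0, 5.0]
print(age_of_acquisition(steps, surprisals, chance_surprisal=17.0))  # → 200
```

A midpoint-crossing rule like this is a simple stand-in for fitting a sigmoid to the full curve, which is more robust to noisy checkpoint-to-checkpoint surprisal fluctuations.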
Pages: 1-16