Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment

Cited by: 172
Authors
Keuleers, Emmanuel [1]
Stevens, Michael [1]
Mandera, Pawel [1]
Brysbaert, Marc [1]
Affiliations
[1] Univ Ghent, Dept Expt Psychol, B-9000 Ghent, Belgium
Keywords
Prevalence; Herdan's law; Bilingualism; Frequency; Ageing; Crowdsourcing; FREQUENCY; RECOGNITION; LEXICON; RATINGS; DECLINE; MEMORY; NORMS
DOI
10.1080/17470218.2015.1022560
Chinese Library Classification (CLC) number
B84 [Psychology]
Subject classification code
04; 0402
Abstract
We use the results of a large online experiment on word knowledge in Dutch to investigate variables influencing vocabulary size in a large population and to examine the effect of word prevalence (the percentage of a population knowing a word) as a measure of word occurrence. Nearly 300,000 participants were presented with about 70 word stimuli (selected from a list of 53,000 words) in an adapted lexical decision task. We identify age, education, and multilingualism as the most important factors influencing vocabulary size. The results suggest that the accumulation of vocabulary throughout life and in multiple languages mirrors the logarithmic growth of number of types with number of tokens observed in text corpora (Herdan's law). Moreover, the vocabulary that multilinguals acquire in related languages seems to increase their first language (L1) vocabulary size and outweighs the loss caused by decreased exposure to L1. In addition, we show that corpus word frequency and prevalence are complementary measures of word occurrence covering a broad range of language experiences. Prevalence is shown to be the strongest independent predictor of word processing times in the Dutch Lexicon Project, making it an important variable for psycholinguistic research.
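The definition of prevalence given in the abstract (the percentage of a population knowing a word) can be illustrated with a minimal Python sketch. This is not the authors' scoring procedure, which the abstract does not describe. The function name word_prevalence, the response format, and the toy responses are all assumptions made up for illustration.

```python
from collections import defaultdict


def word_prevalence(responses):
    """Return {word: percentage of responding participants who said they know it}.

    `responses` is an iterable of (participant_id, word, knows_word) tuples.
    This format is a hypothetical simplification, not the study's data format.
    """
    known = defaultdict(int)  # number of "I know this word" answers per word
    seen = defaultdict(int)   # number of participants who were shown the word
    for _participant, word, knows_word in responses:
        seen[word] += 1
        if knows_word:
            known[word] += 1
    return {word: 100.0 * known[word] / seen[word] for word in seen}


# Made-up toy responses, NOT data from the study.
toy_responses = [
    ("p1", "huis", True),
    ("p2", "huis", True),
    ("p3", "huis", True),
    ("p4", "huis", True),
    ("p1", "amanuensis", True),
    ("p2", "amanuensis", False),
    ("p3", "amanuensis", False),
    ("p4", "amanuensis", False),
]

prevalence = word_prevalence(toy_responses)
```

In this toy example every participant knows "huis" while only one in four knows "amanuensis", so the two words receive very different prevalence values regardless of how often either appears in a text corpus.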
Pages: 1665-1692
Number of pages: 28
Related papers
34 in total
[1]   Contextual diversity, not word frequency, determines word-naming and lexical decision times [J].
Adelman, James S. ;
Brown, Gordon D. A. ;
Quesada, Jose F. .
PSYCHOLOGICAL SCIENCE, 2006, 17 (09) :814-823
[2]  
[Anonymous], 2007, The Language Teacher
[3]   Visual word recognition of single-syllable words [J].
Balota, DA ;
Cortese, MJ ;
Sergent-Marshall, SD ;
Spieler, DH ;
Yap, MJ .
JOURNAL OF EXPERIMENTAL PSYCHOLOGY-GENERAL, 2004, 133 (02) :283-316
[4]   The English Lexicon Project [J].
Balota, David A. ;
Yap, Melvin J. ;
Cortese, Michael J. ;
Hutchison, Keith A. ;
Kessler, Brett ;
Loftis, Bjorn ;
Neely, James H. ;
Nelson, Douglas L. ;
Simpson, Greg B. ;
Treiman, Rebecca .
BEHAVIOR RESEARCH METHODS, 2007, 39 (03) :445-459
[5]   Concreteness ratings for 40 thousand generally known English word lemmas [J].
Brysbaert, Marc ;
Warriner, Amy Beth ;
Kuperman, Victor .
BEHAVIOR RESEARCH METHODS, 2014, 46 (03) :904-911
[6]   Moving beyond Kucera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English [J].
Brysbaert, Marc ;
New, Boris .
BEHAVIOR RESEARCH METHODS, 2009, 41 (04) :977-990
[7]  
Doran Harold, 2007, JOURNAL OF STATISTICAL SOFTWARE, V20
[8]   Smart Phone, Smart Science: How the Use of Smartphones Can Revolutionize Research in Cognitive Science [J].
Dufau, Stephane ;
Dunabeitia, Jon Andoni ;
Moret-Tatay, Carmen ;
McGonigal, Aileen ;
Peeters, David ;
Alario, F. -Xavier ;
Balota, David A. ;
Brysbaert, Marc ;
Carreiras, Manuel ;
Ferrand, Ludovic ;
Ktori, Maria ;
Perea, Manuel ;
Rastle, Kathy ;
Sasburg, Olivier ;
Yap, Melvin J. ;
Ziegler, Johannes C. ;
Grainger, Jonathan .
PLOS ONE, 2011, 6 (09)
[9]   Using Mechanical Turk to Obtain and Analyze English Acceptability Judgments [J].
Gibson, Edward ;
Piantadosi, Steve ;
Fedorenko, Kristina .
LANGUAGE AND LINGUISTICS COMPASS, 2011, 5 (08) :509-524
[10]   More use almost always means a smaller frequency effect: Aging, bilingualism, and the weaker links hypothesis [J].
Gollan, Tamar H. ;
Montoya, Rosa I. ;
Cera, Cynthia ;
Sandoval, Tiffany C. .
JOURNAL OF MEMORY AND LANGUAGE, 2008, 58 (03) :787-814